Sequence ontology data missing from output and ETL step

I have a working version of OT running locally, but it seems that the Sequence Ontology data is not being processed by the platform-etl-backend code. I have the so-inputs, but I don’t seem to get any Sequence Ontology data in the results, nor do I see it in the Google Cloud output folder for many of the releases.

Is the SO data being treated differently and not loaded into Elasticsearch? Perhaps it is handled the same way as the EFO disease hierarchy, which seems to be provided as a standalone JSON file that the UI parses directly…

My UI doesn’t show values like “SO_0001628” as the nicely parsed “intron variant”; it always shows N/A. I do see the values being parsed on the OT main release, so I guess I am missing something?

Thanks, thon

Yes, SO data is treated a little differently. If you used the Platform Output Support (POS) project, you need to update a field in the configuration files to point to the correct place. The field config_direct_json should be set to point to where you downloaded the inputs. For the 22.09 release, the correct configuration value would be gs://open-targets-data-releases/22.09/input

This definitely isn’t intuitive: keep an eye on this issue, which tracks updating the ETL so that all the outputs are in the output directory and it’s clearer what POS is doing.

You can see in POS that some files are copied directly from the input directory and then used to populate the databases.
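If you want to check locally that the copied SO file actually contains the labels the UI needs, here is a minimal sketch. It assumes the file is JSON lines with one term per line and that each record has `id` and `label` fields; those field names, and the helper names `load_so_labels` and `so_label`, are assumptions for illustration, so adjust them to whatever your downloaded input really contains.

```python
import json


def load_so_labels(path):
    """Build an SO id -> label lookup from a JSON-lines file.

    Assumes one JSON object per line with "id" and "label" keys
    (hypothetical field names; check your actual input file).
    """
    labels = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            labels[record["id"]] = record["label"]
    return labels


def so_label(labels, so_id):
    """Return the label for an SO id, or "N/A" when the id is unknown."""
    return labels.get(so_id, "N/A")
```

If every id you look up comes back as "N/A", the file is probably empty, in a different shape than assumed, or not the one your databases were loaded from.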


Thanks, that did the trick! I am using the ES/CH scripts directly and not using the image building scripts, but it helped me get it working locally…