Thank you for this new release and for making all this data available.
After downloading and checking the data, I think that protein expression (at least per cell type) is missing. The same seems to apply to mouse model literature references. This information has NA values for a few targets I checked on the website (Open Targets Platform, Open Targets Platform).
Hi @pgodard, thanks again for reporting this bug! We have addressed the issue at the code level and also applied a patch to the released data. All our distributions (UI, BigQuery, API and FTP) now contain the correct dataset.
We made major updates to our data processing pipelines with the latest release, so please keep us posted if you encounter anything suspicious. Thank you.
The missing literature references on the Mouse Phenotype widget have been addressed under #4169. However, the corresponding fix in the data has not been rolled out yet because it has a number of complicated side effects. It will ship with the next release, which is due in the second half of March.
Thanks for continuing to improve and develop Open Targets. It has been a great resource for research.
I was checking the tables on BigQuery and noticed that the evidence table has only a little over 1,000 rows, far fewer than the number shown in the release notes (32,515,132). Could someone verify that the table was uploaded correctly? Thank you!
Hello, and welcome to the Open Targets community! Yes, this is a bug. It has been reported in this community post. Although we fixed it on our end immediately, Google only syncs the data on the first day of each month, so we need to wait about another week to see the effect. We are keeping an eye on the updated data and will let you know if anything remains outstanding. In the meantime, please don't hesitate to reach out if you see anything strange.
I can confirm that the Platform dataset has been updated on BigQuery! For example, this query counts the evidence records per data source for BRCA2 (ENSG00000139618):
SELECT
  evidence.datasourceId,
  COUNT(*) AS record_count
FROM
  `bigquery-public-data.open_targets_platform.evidence_*` AS evidence
WHERE
  evidence.targetId = "ENSG00000139618"
GROUP BY
  evidence.datasourceId
ORDER BY
  record_count DESC;
Evidence from each source is stored in a separate table named evidence_{sourceId}, so to read all evidence you need to query all of these tables at once with a wildcard: FROM `bigquery-public-data.open_targets_platform.evidence_*` AS evidence. This also means that, from now on, you don't need to apply a filter to work with source-specific evidence; you can read the corresponding table directly.
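As a sketch of the wildcard approach, the query below narrows `evidence_*` to a subset of sources using BigQuery's `_TABLE_SUFFIX` pseudo-column, which holds the part of each table name matched by `*`. The suffixes 'chembl' and 'eva' are illustrative assumptions based on the evidence_{sourceId} naming; please check the dataset for the actual table names before relying on them.

```sql
-- Sketch: count BRCA2 evidence from selected sources only.
-- _TABLE_SUFFIX is the part of the table name matched by the wildcard,
-- e.g. 'chembl' for a table named evidence_chembl.
-- Assumption: 'chembl' and 'eva' are illustrative suffixes, not a verified list.
SELECT
  _TABLE_SUFFIX AS source_table,
  COUNT(*) AS record_count
FROM
  `bigquery-public-data.open_targets_platform.evidence_*`
WHERE
  _TABLE_SUFFIX IN ('chembl', 'eva')
  AND targetId = 'ENSG00000139618'
GROUP BY
  source_table
ORDER BY
  record_count DESC;
```

For a single source, naming its table directly in the FROM clause (for example, the hypothetical `evidence_chembl`) avoids the wildcard altogether.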
One caveat: for some reason, Google syncs data without dropping tables that no longer exist in our datasets, so the old evidence table is still there. Please don't use it; read all evidence through evidence_* instead. We'll reach out to Google to make sure this table is not carried over into the next release.
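A minimal sketch for checking the uploaded tables, assuming the dataset's `__TABLES__` metadata view is readable for this public dataset. The first query lists every table whose name starts with `evidence` together with its row count, which should make the stale `evidence` table visible; the second sums only the `evidence_` tables, which is the figure to compare with the total in the release notes.

```sql
-- Sketch: list evidence tables and their row counts from dataset metadata.
-- The stale `evidence` table, if still present, appears in this list too.
SELECT
  table_id,
  row_count
FROM
  `bigquery-public-data.open_targets_platform.__TABLES__`
WHERE
  STARTS_WITH(table_id, 'evidence')
ORDER BY
  table_id;

-- Total rows across the per-source tables only (excludes the stale table).
SELECT
  SUM(row_count) AS total_evidence_rows
FROM
  `bigquery-public-data.open_targets_platform.__TABLES__`
WHERE
  STARTS_WITH(table_id, 'evidence_');
```

STARTS_WITH is used rather than LIKE because `_` is a single-character wildcard in LIKE patterns, whereas STARTS_WITH compares the prefix literally.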