Hi,
I am struggling a bit to understand the (new) association scores, and to what extent they are comparable. I just looked at two of the strongest and best-known associations in cancer: EGFR in lung cancer and BRAF in melanoma. In your web interface, the association of BRAF with cutaneous melanoma has a score of 0.82 (mouse-over in the Associated diseases/Table pane), while the association of EGFR with lung adenocarcinoma has a score of 23.3 (same mouse-over). Are they comparable at all, in the sense that the evidence for an association between EGFR and lung cancer is much stronger than the evidence for an association between BRAF and melanoma? And is there a global scale for the scores, or is the scale “local” to each target and its respective associations?
Hi @sigven! I’ve responded in our GitHub issue tracker but also wanted to post here for other Community members.
Regarding the differences between the associations displayed in the user interface and the associations available in our dataset downloads: our data and technical teams have investigated the issue. Both the scores in the user interface and the scores in the downloadable associations datasets are correct and valid. The difference arises because the two are computed with slightly different algorithms, using different normalisation and harmonic sum strategies. We expect the rankings in the user interface and in the datasets to be broadly similar, although they will not be identical.
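For readers wondering how a normalisation choice can put the same evidence on a different scale, here is a minimal sketch of a harmonic sum with and without normalisation. It is an illustration only, not the Platform's actual code: the function names, the 1/i² weighting, and the choice of normalisation ceiling are all assumptions made for this example.

```python
def harmonic_sum(scores):
    """Weighted sum in which the i-th strongest score is divided by i**2.

    Assumption for illustration: scores are evidence scores in [0, 1].
    """
    ordered = sorted(scores, reverse=True)  # strongest evidence first
    return sum(s / (i ** 2) for i, s in enumerate(ordered, start=1))


def normalised_harmonic_sum(scores, max_terms=1000):
    """Harmonic sum divided by an assumed theoretical maximum.

    The ceiling is the harmonic sum of `max_terms` perfect scores (all 1.0);
    `max_terms` is a hypothetical parameter, not a documented setting.
    """
    ceiling = sum(1 / (i ** 2) for i in range(1, max_terms + 1))
    return min(harmonic_sum(scores) / ceiling, 1.0)


evidence = [1.0, 1.0, 1.0]  # made-up evidence scores
raw = harmonic_sum(evidence)                # not bounded by 1
scaled = normalised_harmonic_sum(evidence)  # always within [0, 1]
print(raw, scaled)
```

In this sketch the raw and normalised values differ in scale but come from the same inputs, so a raw score from one place cannot be compared directly with a normalised score from another.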
We will be harmonising our approach in our next release, 21.06, which is scheduled for the end of June. From that release onwards, the user interface and the datasets will provide the same data.
Just wanted to let you know that with our recent Platform 21.06 release, we have harmonised our approach, so the scores in our datasets and user interface should now be the same. We have also simplified the associations datasets, which now include only the targetId, diseaseId, score, and evidenceCount fields, so please note that the file size for the 21.06 data will be smaller than for the data from our 21.04 release.
Thank you for being so patient while we fixed the mismatch in the association scores. Last week, our back-end team applied a fix and regenerated the files, which now match what is available in the UI.
You can find the files on our FTP server in both JSON and Parquet formats. Alternatively, you can access the data through our BigQuery instance, open-targets-prod.
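If it helps anyone checking a specific pair against the UI, here is a rough sketch of looking up one association in the JSON download. It assumes the JSON files contain one record per line with the targetId, diseaseId, score, and evidenceCount fields mentioned above. The helper name, the identifiers, and the numbers in the sample are placeholders, not real data.

```python
import json


def find_association(lines, target_id, disease_id):
    """Return the first record matching a target/disease pair, or None.

    `lines` is any iterable of JSON strings, e.g. an open file handle.
    Assumption: one JSON object per line.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record["targetId"] == target_id and record["diseaseId"] == disease_id:
            return record
    return None


# Placeholder records standing in for lines read from a downloaded file.
sample_lines = [
    '{"targetId": "TARGET_A", "diseaseId": "DISEASE_X", "score": 0.5, "evidenceCount": 12}',
    '{"targetId": "TARGET_B", "diseaseId": "DISEASE_Y", "score": 0.25, "evidenceCount": 3}',
]

match = find_association(sample_lines, "TARGET_B", "DISEASE_Y")
print(match)
```

With a real file you would pass the open file handle in place of `sample_lines` and compare the returned score with the value shown on mouse-over in the UI.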