Where to find the harmonic sum calculations in the code?

njeanray · 6 August 2021 09:02

Hello Open Targets Community,

I’d like to know where to find the harmonic sum calculations and, more generally, where the scores are calculated in the code. In which module can I find this information?

Thanks in advance,

Best regards,
Nathalie

ahercules · 9 August 2021 12:01

Hi @njeanray!

The harmonic sum calculations for the overall, datasource, and datatype association scores can be found in the Association.scala module.

Individual pieces of evidence are scored in the evidence_datasource_parsers module or ETL configuration.

For example, the PhenoDigm evidence is scored in the PhenoDigm.py module. The resulting resource_score field is then used in the ETL configuration to calculate the association scores.

However, for a datasource like ClinGen, the individual pieces of evidence are scored in the ETL configuration file. The confidence string generated by the ClinGen module is used and mapped to a score.
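As a rough illustration of the second approach, mapping a confidence string to a score can be sketched in Python as a simple lookup. This is only a sketch: the score values, the `confidence` field name, and the `score_evidence` helper are hypothetical, and the real mapping is whatever the ETL configuration file defines.

```python
# Hypothetical label-to-score table for illustration only;
# the actual values are defined in the ETL configuration file.
CONFIDENCE_TO_SCORE = {
    "Definitive": 1.0,
    "Strong": 1.0,
    "Moderate": 0.5,
    "Limited": 0.01,
}


def score_evidence(confidence, default=0.0):
    """Return the score mapped to a confidence label, or `default` if unmapped."""
    return CONFIDENCE_TO_SCORE.get(confidence, default)


# Hypothetical evidence records; "confidence" is an assumed field name.
evidence = [
    {"confidence": "Definitive"},
    {"confidence": "Moderate"},
    {"confidence": "Unknown label"},
]
scores = [score_evidence(e["confidence"]) for e in evidence]
print(scores)
```

Keeping the mapping in configuration rather than in the parser means the scores can be retuned without regenerating the evidence.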

Note that the ETL configuration also sets the default weights and the datasource-specific weights.

Cheers,

Andrew

pkoneill · 10 March 2022 20:24

Hi @ahercules,

Could you say a bit more about how the harmonic sum scores are calculated? I ask because the scores are sensitive to the order of elements in the score vector, and I’m curious where this ordering is defined. I’m having some trouble reproducing overall association scores and wondering if it’s due to misordering. I’m using the implied ordering in the figure on this page (Target - disease associations - Open Targets Platform Documentation), with genomics_england corresponding to i=1 and phenodigm to i=22, but I’m not sure if that’s what’s intended.

Thanks in advance.

ahercules · 12 March 2022 12:36

Hi @pkoneill!

Unfortunately, I have recently left the Open Targets team, but will try and answer your question below. Feel free to tag the help desk team – @SirTarget – for further assistance.

Details on how associations are scored – overall, by data source, by data type – can be found in the Association.scala file that is part of the Platform ETL pipeline.

To understand how the individual evidence scores are prepared and sorted prior to scoring the association, please look at line 165 of the Association.scala file, where the prepareEvidences function uses Spark’s repartitionByRange and sortWithinPartitions functions to sort the evidence before returning the evidence set. This returned evidence set is then used in other functions to calculate the direct and indirect association scores on an overall, per data source, and per data type basis.
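To show why the sort matters, here is a minimal Python sketch (not the ETL’s Scala code). It assumes the harmonic sum form described in the Platform documentation, where scores are sorted in descending order and the i-th score is divided by i squared. The `harmonic_sum` function and the example scores are made up for illustration, and the pipeline’s weighting and normalisation steps are left out.

```python
def harmonic_sum(scores, sort_descending=True):
    """Sum of s_i / i**2 over the scores, with i starting at 1.

    Assumes the descending sort described in the Platform documentation;
    pass sort_descending=False to see the effect of using the input order.
    """
    ordered = sorted(scores, reverse=True) if sort_descending else list(scores)
    return sum(s / (i ** 2) for i, s in enumerate(ordered, start=1))


# Made-up evidence scores for one target-disease pair.
scores = [0.2, 0.9, 0.5]

sorted_total = harmonic_sum(scores)                           # 0.9 + 0.5/4 + 0.2/9
unsorted_total = harmonic_sum(scores, sort_descending=False)  # 0.2 + 0.9/4 + 0.5/9

print(sorted_total, unsorted_total)
```

Because the first position carries full weight and later positions are divided by 4, 9, and so on, the same scores give a different total in a different order. Sorting by score value, rather than by a fixed datasource order, is the behaviour this sketch assumes.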

When reviewing Platform association scores, it is important to note that the association score is not a confidence assessment of the target-disease association, but rather an assessment of the availability of data for a given target-disease association.

I hope this helps answer your question and as I said, the Open Targets help desk team can provide further assistance – just tag their profile handle @SirTarget.

Cheers,

~ Andrew
