First, thanks a lot for curating this wonderful platform! I am currently working on drug target prioritization and am looking for some evidence on target-disease associations. Looking at the Drugs scores & clinical trial evidence, I am a bit confused as to how exactly the scores are calculated. For example, there is one “Phase I (Completed)” trial for MTOR-melanoma association, and according to the table here the raw score would be 0.1. How was it then transformed to 0.06?
Also, is the score of a particular phase assigned once the phase starts, even if the trial is still recruiting?
Regarding the first question, there is a difference between the evidence score (in this case, one single clinical trial) and the score of ChEMBL as a datasource for a particular target-disease pair. 0.1 is the former and 0.06 is the latter. More info here
About the status of the trial: yes, at the moment we score a completed trial and a recruiting study in the same way. It has been a matter of discussion, but we agreed that a completed trial does not give us much more certainty, as it might not have met its primary endpoint.
Thanks a lot for the clarification! Could you explain the harmonic sum a bit more (e.g., what does S_k refer to)? I still do not quite understand how this calculation transforms 0.1 into 0.06.
There is one detail that is not well documented. After the harmonic sum is calculated, it is divided by its maximum theoretical value (e.g. the harmonic sum of a vector of size 1000 filled with “1”). This number is close to 1.6. So when 0.1 is divided by ~1.6, the result is the 0.06 that you can see at the datasource level.
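To make the arithmetic concrete, here is a minimal sketch of that normalisation. It assumes the harmonic sum weights the i-th largest evidence score by 1/i^2, which is how the Open Targets documentation describes it but is not spelled out in this thread. The function names are made up for illustration and are not part of any Open Targets API.

```python
def harmonic_sum(scores, max_terms=1000):
    """Sum of s_i / i^2 over scores sorted in descending order.

    Assumption: the weight of the i-th largest score is 1 / i^2.
    """
    ranked = sorted(scores, reverse=True)[:max_terms]
    return sum(s / (i ** 2) for i, s in enumerate(ranked, start=1))


def max_harmonic_sum(size=1000):
    """Maximum theoretical value: a vector of `size` scores all equal to 1."""
    return harmonic_sum([1.0] * size, max_terms=size)


def datasource_score(scores, size=1000):
    """Harmonic sum divided by its maximum theoretical value."""
    return harmonic_sum(scores, max_terms=size) / max_harmonic_sum(size)


print(round(max_harmonic_sum(), 2))        # 1.64, the "close to 1.6" denominator
print(round(datasource_score([0.1]), 2))   # 0.06 for a single Phase I trial
```

Under this assumption, each term in the numerator is simply the k-th largest evidence score divided by k^2, not a mean of the first k scores, so a second trial with score 0.1 would add 0.1 / 4 to the numerator before normalisation.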
We will work on making this clearer in the documentation, which at the moment lacks some important details. Sorry for that.
Thanks for getting back to me! It would be very helpful if the documentation could be updated. Ah, I see what the denominator is now! As a follow-up on the numerator: are the S_k’s just the harmonic mean of the first k trials’ scores, with the raw scores sorted in descending order when there are multiple trials (pieces of evidence)? Is this correct?
Really sorry for bothering you again but I tried multiple average options for S_k and none fit the scores well enough. Could you kindly point me to the definition of S_k? Thanks in advance!
Hi, I just wanted to ask if it would be possible for the documentation to display a more polished formula in the harmonic sum part, with the meaning of each symbol clearly labeled. It would be even better if an example could be included. I’m still struggling to see whether there is some mapping between the score and my own threshold requirement, and have turned to querying the raw evidence directly for help. Yet there are a couple of small inconsistencies (a couple of drug-target pairs are not found when querying the raw evidence downloaded from FTP for some specific disease, probably because the identifiers of disease descendants I used did not include some specific identifiers…), so it would still be very helpful to understand how the scores are calculated.