I have read the Nature Genetics paper entitled “Genetic factors associated with reasons for clinical trial stoppage”.
I am interested in the analysis of the link between target gene features (e.g. constraint, loss-of-function intolerance, RNA specificity) and clinical trial stoppage due to safety. I have looked through the GitHub repo but could not find an answer to my question.
Let's focus on gene constraint for simplicity. I understand that many drugs have several gene targets, and each of them has a separate gene constraint score from gnomAD. My question is how the several gene constraint values for one drug are aggregated into a single score in order to study the relationship between the "overall drug gene constraint score" and clinical trial stoppage. The same question applies to the loss-of-function intolerance (pLI) and RNA specificity analyses.
Many thanks
@irene@ochoa, tagging you since you seem to be the main contributors to the GitHub repo.
That is a great question. In our study we don't aggregate any annotation by clinical trial; instead, we split the information before calculating the enrichments. This way, each piece of evidence is analysed independently. For example, a clinical trial that stopped due to safety reasons can contribute several different annotation values if the tested drug has multiple targets, as you mention, or if the study uses multiple drugs as interventions.
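To make the per-evidence approach concrete, here is a minimal Python sketch. It assumes a simplified long-format table with one row per (trial, target) pair and a binarised constraint annotation. The `Evidence` fields, the toy NCT/target identifiers, and the 2×2 odds ratio with a Haldane-Anscombe correction are illustrative assumptions, not the study's actual enrichment pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One hypothetical (trial, target) row: a trial whose drug has k targets yields k rows."""
    nctid: str
    target: str
    constrained: bool          # hypothetical binarised gene-constraint annotation
    stopped_for_safety: bool   # trial stopped due to safety/side effects


def odds_ratio(rows):
    """Odds ratio of safety stoppage for constrained vs. unconstrained targets.

    Every (trial, target) row is counted independently; nothing is averaged
    or otherwise aggregated per trial or per drug.
    """
    a = sum(r.constrained and r.stopped_for_safety for r in rows)
    b = sum(r.constrained and not r.stopped_for_safety for r in rows)
    c = sum((not r.constrained) and r.stopped_for_safety for r in rows)
    d = sum((not r.constrained) and (not r.stopped_for_safety) for r in rows)
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid division by zero
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


# Toy data (hypothetical IDs): NCT_A tests a drug with two targets that carry
# different constraint annotations, so it contributes two separate rows.
rows = [
    Evidence("NCT_A", "T1", True, True),
    Evidence("NCT_A", "T2", False, True),
    Evidence("NCT_B", "T3", True, True),
    Evidence("NCT_C", "T4", True, True),
    Evidence("NCT_D", "T5", True, False),
    Evidence("NCT_E", "T6", True, False),
    Evidence("NCT_F", "T7", False, False),
    Evidence("NCT_F", "T8", False, False),
    Evidence("NCT_G", "T9", False, True),
]

print(odds_ratio(rows))  # 1.5
```

Here trial `NCT_A` contributes two rows with opposite constraint annotations and both are counted, so no single "overall" score per drug or trial is ever needed.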
We have made available here the table that shows how we annotated every trial for which we have information about both the studied condition and the target that the drug modulates. On that page, you can download the data directly or explore it programmatically.
The following query returns a few examples of studies with multiple drugs/targets, together with the different annotation values we have for them.
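```sql
-- Trials stopped for safety/side effects, with the distinct targets and
-- annotation values attached to each one
SELECT
    nctid,
    COUNT(DISTINCT targetId) AS num_targets,                    -- targets modulated by the trial's drug(s)
    STRING_AGG(DISTINCT gc, ', ') AS distinct_gcs,              -- distinct gene-constraint values
    STRING_AGG(DISTINCT lof_tolerance, ', ') AS distinct_lofs,  -- distinct LoF-tolerance values
    prediction
FROM
    train
WHERE
    nctid IS NOT NULL
    AND isStopped = 'stopped'
    AND prediction = 'Safety_Sideeffects'
GROUP BY
    nctid, prediction
ORDER BY
    num_targets DESC
LIMIT 20;
```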
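If you prefer to explore a downloaded copy of the table locally rather than through SQL, the same grouping can be sketched in plain Python. This assumes each row is a dict keyed by the column names used in the query; the example rows and their annotation labels are hypothetical, and the real table's value formats may differ.

```python
from collections import defaultdict


def multi_target_safety_trials(rows, limit=20):
    """Pure-Python analogue of the SQL query: per trial stopped for safety/side
    effects, count distinct targets and list distinct annotation values.

    `rows` is any iterable of dicts keyed by the query's column names
    (nctid, targetId, gc, lof_tolerance, isStopped, prediction).
    """
    groups = defaultdict(lambda: {"targets": set(), "gc": set(), "lof": set()})
    for r in rows:
        if (r.get("nctid") is not None
                and r.get("isStopped") == "stopped"
                and r.get("prediction") == "Safety_Sideeffects"):
            g = groups[r["nctid"]]
            # Like COUNT(DISTINCT ...) and STRING_AGG, None (NULL) values are skipped
            if r.get("targetId") is not None:
                g["targets"].add(r["targetId"])
            if r.get("gc") is not None:
                g["gc"].add(str(r["gc"]))
            if r.get("lof_tolerance") is not None:
                g["lof"].add(str(r["lof_tolerance"]))

    results = [
        (nctid, len(g["targets"]),
         ", ".join(sorted(g["gc"])), ", ".join(sorted(g["lof"])))
        for nctid, g in groups.items()
    ]
    # ORDER BY num_targets DESC; ties broken by nctid for a deterministic order
    results.sort(key=lambda t: (-t[1], t[0]))
    return results[:limit]


# Hypothetical rows: NCT_X has two targets with different annotations,
# NCT_Z is filtered out because it did not stop for safety reasons.
safety = {"isStopped": "stopped", "prediction": "Safety_Sideeffects"}
example_rows = [
    {"nctid": "NCT_X", "targetId": "ENSG_1", "gc": "constrained",
     "lof_tolerance": "LoF intolerant", **safety},
    {"nctid": "NCT_X", "targetId": "ENSG_2", "gc": "unconstrained",
     "lof_tolerance": "LoF tolerant", **safety},
    {"nctid": "NCT_Y", "targetId": "ENSG_3", "gc": "constrained",
     "lof_tolerance": "LoF tolerant", **safety},
    {"nctid": "NCT_Z", "targetId": "ENSG_4", "gc": "constrained",
     "lof_tolerance": "LoF tolerant", "isStopped": "stopped",
     "prediction": "Other_Reason"},
]

for row in multi_target_safety_trials(example_rows):
    print(row)
# ('NCT_X', 2, 'constrained, unconstrained', 'LoF intolerant, LoF tolerant')
# ('NCT_Y', 1, 'constrained', 'LoF tolerant')
```

Because the query filters on a single `prediction` value, grouping by `nctid` alone is equivalent to grouping by `nctid, prediction`; the distinct values are sorted here only to make the output reproducible, since `STRING_AGG` does not guarantee an order.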