Question regarding gene target aggregation in Nature Genetics paper

Hello,

I have read the Nature Genetics paper entitled “Genetic factors associated with reasons for clinical trial stoppage”.

I am interested in the analysis of the link between target gene features (e.g. constraint, loss-of-function intolerance, RNA specificity) and clinical trial stoppage due to safety. I have looked through the GitHub repo but have not been able to find an answer to my question.

Let’s focus on gene constraint for simplicity. I understand that many drugs have several gene targets, each with its own gene constraint score from gnomAD. My question is how the constraint values for a single drug are aggregated into one score in order to study the relationship between the “overall drug gene constraint score” and clinical trial stoppage. The same question applies to the loss-of-function intolerance (pLI) and RNA specificity analyses.

Many thanks

@irene @ochoa tagging you since you seem to be the main contributors to the GitHub repo.

Hi @agamemnonc!

That is a great question. In our study we don’t aggregate any annotation by clinical trial. Instead, we split the information before calculating the enrichments, so that each piece of evidence is analysed independently. For example, a single clinical trial that stopped for safety reasons can contribute several records with different annotation values if the tested drug has multiple targets, as you point out, or if the study uses multiple drugs as the intervention.
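To make the splitting idea concrete, here is a minimal toy sketch in Python. It is not the code used in the paper: the trial records, the gene names, the boolean `constrained` annotation, and the helper names `split_by_target` and `odds_ratio` are all invented for illustration, and a simple corrected odds ratio stands in for whatever enrichment statistic is actually computed.

```python
from collections import Counter

# Toy trial records. Every identifier and value here is made up.
# Each target maps to a hypothetical boolean annotation ("constrained").
trials = [
    {"nctid": "NCT_A", "stopped_for_safety": True,
     "targets": {"GENE1": True, "GENE2": False}},
    {"nctid": "NCT_B", "stopped_for_safety": False,
     "targets": {"GENE1": True}},
    {"nctid": "NCT_C", "stopped_for_safety": True,
     "targets": {"GENE3": True}},
    {"nctid": "NCT_D", "stopped_for_safety": False,
     "targets": {"GENE2": False, "GENE4": False}},
]


def split_by_target(trials):
    """Return one row per (trial, target) pair instead of one row per trial."""
    rows = []
    for trial in trials:
        for target, constrained in trial["targets"].items():
            rows.append(
                (trial["nctid"], target, constrained, trial["stopped_for_safety"])
            )
    return rows


def odds_ratio(rows):
    """Odds ratio of the annotation vs. safety stoppage over the split rows.

    Adds 0.5 to every cell (Haldane-Anscombe correction) to avoid
    division by zero in small tables.
    """
    counts = Counter((constrained, stopped) for _, _, constrained, stopped in rows)
    a = counts[(True, True)]    # constrained target, stopped for safety
    b = counts[(True, False)]   # constrained target, not stopped for safety
    c = counts[(False, True)]   # unconstrained target, stopped for safety
    d = counts[(False, False)]  # unconstrained target, not stopped for safety
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


rows = split_by_target(trials)
for row in rows:
    print(row)
print("odds ratio:", round(odds_ratio(rows), 3))
```

The point of the sketch is only that a trial with two targets contributes two rows, each carrying its own annotation value, so no per-trial aggregate score is ever needed.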

We have made available here the table showing how we annotated every trial for which we know both the studied condition and the target that the drug modulates. On that page, you can download the data directly or explore it programmatically.

This query returns a few examples of studies with multiple drugs/targets, along with the different annotation values we have for them.

-- One row per trial stopped for safety, listing the distinct
-- annotation values found across its targets.
SELECT
    nctid,
    COUNT(DISTINCT targetId) AS num_targets,
    STRING_AGG(DISTINCT gc, ', ') AS distinct_gcs,
    STRING_AGG(DISTINCT lof_tolerance, ', ') AS distinct_lofs,
    prediction
FROM
    train
WHERE
    nctid IS NOT NULL
    AND isStopped = 'stopped'
    AND prediction = 'Safety_Sideeffects'
GROUP BY
    nctid, prediction
-- Trials with the most targets first
ORDER BY
    num_targets DESC
LIMIT 20;

Thank you for your question!
Irene

@irene Thank you so much for your detailed response and for taking the time to share a query for this. It is much appreciated!

And congrats on publishing this paper, it is really great work!
