Excluding Non-Regular Diseases from Open Targets API Results

Hi everyone,

I need help with some code I’m writing.

I’ve already written code that retrieves all the associated diseases for each target UniProt ID in a list. Looking at the results (the query is below), I’ve noticed that some of the retrieved entries are not “regular” diseases: for example “lymphocyte count,” “blood protein measurement,” “body mass index,” and other related terms.

So, my question is: Is there a way to filter out these “not regular” diseases and only retrieve “real” diseases?

I have seen in the Ontology Lookup Service (OLS) that these “not regular” diseases are not in the “disease” ontology tree (for example, lymphocyte count “EFO_0004587” is under “measurement”), while proper diseases (such as Alzheimer’s disease “MONDO_0004975” or rheumatoid arthritis “EFO_0000685”) are under the “disease” ontology tree.

Is this the right way to filter out “not regular” diseases? Can I do that using the Open Targets Platform API? Or do I have to somehow query OLS with the disease ID?

This is the code I’m using to fetch target associated diseases:

query retrieve_diseases_from_target_UniprotId($queryTerms:String!) {   
  mapIds(queryTerms: [$queryTerms]) {
    mappings {
      hits {
        id
        name
        object {
          ... on Target {
            id
            approvedSymbol
            approvedName
            associatedDiseases(
              page: { index: 0, size: 30000 }
              orderByScore: "score"
              BFilter: ""
              aggregationFilters: []
              enableIndirect: false
            ) {
              count
              rows {
                disease {
                  id
                  name
                }
                score
              }
            }
          }
        }
      }
    }
  }
}

Is there any flag I can use to only retrieve proper diseases?

Thank you so much :smiley:
Vittorio

Hi @Vittorio_lembo,

It sounds like you are trying to filter out any phenotype that has Measurement (EFO_0001444) as a parent term.

All the child terms of this phenotype are listed on the profile page for Measurement. Can you use this information to filter them out?

Hi @hcornu ,

Yes, exactly. In addition, I want to exclude “diseases” that fall under other branches, such as medical procedure (EFO_0002571) and phenotype (EFO_0000651), and maybe others.
I think the best way to do this is to first select only the terms I’m interested in. So, I think the steps are:

  1. Retrieve each efoId for all diseases stored in OTP. (Is there a way to do this with a script?)
  2. For each fetched efoId, retrieve the first ancestor(s).

Is this the best way to do it? What do you think?

Thank you,
Vittorio

Hi @Vittorio_lembo,

I think you are on the right track in using the ontology tree to identify terms that are not of interest. The principle is that if you want to exclude anything under the measurement branch, you should filter out any disease that has this term as an ancestor.

I would: 1) store the terms you want to discard in a list; 2) query the API to fetch the diseases (and their ancestors) linked to your targets. I only tested it with a single UniProt ID, but the core of it would be something like this:

query retrieveDiseasesFromUniprotId($uniprotId: String!) {
  mapIds(queryTerms: [$uniprotId]) {
    # Map Uniprot ID to Ensembl
    mappings {
      hits {
        id
        name
        # Retrieve associated diseases for the Ensembl term
        object {
          ... on Target {
            associatedDiseases {
              count
              rows {
                disease {
                  id
                  ancestors
                }
              }
            }
          }
        }
      }
    }
  }
}
{
  "uniprotId": "Q7Z403"
}

and finally 3) parse the API response to exclude any disease whose ancestors list contains one of the terms on your discard list.
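
For the parsing step, here is a minimal Python sketch, assuming the response has already been loaded as a dict with the same shape as the query above. The function name `keep_real_diseases` and the `sample` response are hypothetical: the sample is hand-written to mimic the structure (including a placeholder hit ID and illustrative ancestor lists) and is not real API output. The excluded root IDs are the ones mentioned in this thread.

```python
from typing import Any

# Root terms whose descendants should be discarded (IDs quoted in this thread).
EXCLUDED_ROOTS = frozenset({
    "EFO_0001444",  # measurement
    "EFO_0002571",  # medical procedure
    "EFO_0000651",  # phenotype
})


def keep_real_diseases(response: dict[str, Any],
                       excluded: frozenset = EXCLUDED_ROOTS) -> list[dict[str, Any]]:
    """Return disease entries that are not, and do not descend from, an excluded term."""
    kept = []
    for mapping in response["data"]["mapIds"]["mappings"]:
        for hit in mapping["hits"]:
            # Hits that are not targets come back without associatedDiseases.
            target = hit.get("object") or {}
            rows = (target.get("associatedDiseases") or {}).get("rows", [])
            for row in rows:
                disease = row["disease"]
                # Check the term itself as well as all of its ancestors.
                lineage = {disease["id"], *(disease.get("ancestors") or [])}
                if lineage.isdisjoint(excluded):
                    kept.append(disease)
    return kept


# Hand-written mock response (NOT real API output), same shape as the query.
sample = {"data": {"mapIds": {"mappings": [{"hits": [{
    "id": "ENSG_PLACEHOLDER",
    "name": "placeholder",
    "object": {"associatedDiseases": {"count": 2, "rows": [
        {"disease": {"id": "EFO_0004587", "ancestors": ["EFO_0001444"]}},
        {"disease": {"id": "MONDO_0004975", "ancestors": ["EFO_0000408"]}},
    ]}},
}]}]}}}

print([d["id"] for d in keep_real_diseases(sample)])
```

The term's own ID is checked together with its ancestors, so an association to one of the excluded root terms itself is dropped as well.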

Let me know if it helps. To avoid doing so many queries, you could accomplish the same thing using the datasets (target, diseases, and associations).
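
If you go the datasets route, the same logic could be sketched roughly as follows, once the disease and association records are loaded as lists of dicts (for example from the downloaded files). The field names `id`, `ancestors`, and `diseaseId`, and both helper functions, are assumptions to check against the dataset schema, and the records below are hand-written placeholders rather than real data.

```python
def allowed_disease_ids(disease_records, excluded_roots):
    """Collect IDs of diseases that are neither an excluded root nor a descendant of one."""
    allowed = set()
    for record in disease_records:
        lineage = {record["id"], *(record.get("ancestors") or [])}
        if lineage.isdisjoint(excluded_roots):
            allowed.add(record["id"])
    return allowed


def filter_associations(association_records, allowed_ids):
    """Keep only association rows whose disease is in the allowed set."""
    return [row for row in association_records if row["diseaseId"] in allowed_ids]


# Placeholder records (NOT real data); field names are assumptions.
diseases = [
    {"id": "EFO_0004587", "ancestors": ["EFO_0001444"]},
    {"id": "MONDO_0004975", "ancestors": ["EFO_0000408"]},
]
associations = [
    {"targetId": "ENSG_PLACEHOLDER", "diseaseId": "EFO_0004587", "score": 0.5},
    {"targetId": "ENSG_PLACEHOLDER", "diseaseId": "MONDO_0004975", "score": 0.4},
]

allowed = allowed_disease_ids(diseases, {"EFO_0001444"})
print(filter_associations(associations, allowed))
```

Building the allowed set once from the disease records means each association row needs only a set lookup, with no per-target API call.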

Best,
Irene
