What kind of score is returned when using the "search" endpoint?

Hi there,

I would like to query OpenTargets via GraphQL based on general search terms, e.g. “breast cancer” (I do not have a suitable EFO ID at hand for all search terms). As such, I am using the search endpoint. However, I noticed that the score that is returned does not have a range between 0 and 1 as stated in the documentation (and which is the case when using the more specific endpoints). Can you please explain what this score is actually about?

This is my query:

query simpleQuery{
  search(queryString: "glioma", entityNames: ["target"], page: {index: 0, size: 10}) {
    hits {
      id
      name
      entity
      score
    }
  }
}

and these are some of my results:

{
  "data": {
    "search": {
      "hits": [
        {
          "id": "ENSG00000108231",
          "name": "LGI1",
          "entity": "target",
          "score": 873.07935
        },
        {
          "id": "ENSG00000025293",
          "name": "PHF20",
          "entity": "target",
          "score": 353.00244
        },
...
]
    }
  }
}

I would be happy if you could help me with that.

Best,
Cindy Perscheid

Hello @CPerscheid! :wave:

Welcome to the Open Targets Community! :tada:

When using the search endpoint, the score is a measure of how well your search term matched the returned result. It is not related to the target-disease association scores, which is why it does not fall between 0 and 1. To access the target-disease association scores, you would need to use the target or disease endpoint and pass the relevant ID.
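As a rough illustration of querying the disease endpoint with an ID, here is a minimal Python sketch. The endpoint URL and the field names (`associatedTargets`, `rows`, `approvedSymbol`) are assumptions about the current schema and should be checked in the GraphQL playground; `association_scores` and `post_graphql` are hypothetical helper names, not part of any Open Targets client.

```python
import json
import urllib.request

# Assumed public GraphQL endpoint; verify against the current API documentation.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Field names below are assumptions about the schema, not confirmed in this thread.
ASSOCIATIONS_QUERY = """
query diseaseAssociations($efoId: String!, $size: Int!) {
  disease(efoId: $efoId) {
    id
    name
    associatedTargets(page: {index: 0, size: $size}) {
      rows {
        target { id approvedSymbol }
        score
      }
    }
  }
}
"""


def post_graphql(query, variables, url=API_URL):
    """Send one GraphQL request and return the decoded JSON response."""
    payload = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def association_scores(efo_id, size=10, transport=post_graphql):
    """Return (target_id, symbol, score) tuples for a single disease ID."""
    result = transport(ASSOCIATIONS_QUERY, {"efoId": efo_id, "size": size})
    disease = result["data"]["disease"]
    if disease is None:  # the ID was not recognised
        return []
    return [
        (row["target"]["id"], row["target"]["approvedSymbol"], row["score"])
        for row in disease["associatedTargets"]["rows"]
    ]
```

Calling `association_scores(your_efo_id)` with an ID you have already resolved would then return the association scores rather than search relevance scores. The `transport` argument is only there so the function can be tried out without network access.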

I hope this helps answer your question — and feel free to respond below if you have any further questions about the search endpoint for our team.

Thank you,

Andrew :slight_smile:

Dear @ahercules,

thanks for letting me know. I used the function get_associations_for_disease from the Python OpenTargetsClient before, which automatically resolved the corresponding IDs for a disease search term. I suppose I will have to do this task now on my own via the general search query first to get a valid EFO ID for my search term and then use it for the actual query?
If so: what would be the maximum score if we had an exact match, and at what level could I consider a returned result to be a good match (aka: how is the score calculated? Is it some string distance measure?)?

Best,

Cindy

Hi @CPerscheid!

Getting the right EFO ID for a set of disease labels is a problem we face every day at Open Targets (and it is not easy!).

Given your use case, I would suggest using OnToma, a tool that we developed in-house and use for exactly that purpose.
Both the Python client and the CLI are easy to integrate into your code or to run as an independent step of your data processing. Here is the documentation: OnToma documentation

Unlike the API, OnToma only returns results of high confidence and quality. It is therefore a more complex algorithm than the one implemented in the search endpoint, and it saves you the overhead of making many API queries.

If you are interested in the association scores for target/disease pairs, I would suggest that you:

  1. generate a table with all your disease labels plus the EFO IDs output by OnToma;
  2. join that table with the associations dataset from our Data Downloads page.

This can be rapidly done using Pandas.
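A minimal sketch of that join, assuming the mapping table has columns `label` and `efo_id` and that the associations dataset exposes `diseaseId`, `targetId` and `score` (all column names and the toy values here are assumptions for illustration; check them against the files you actually download):

```python
import pandas as pd

# Hypothetical result of the mapping step: one row per disease label.
labels = pd.DataFrame(
    {
        "label": ["disease A", "disease B", "unmapped label"],
        "efo_id": ["EFO_A", "EFO_B", None],
    }
)

# Toy stand-in for the downloaded associations dataset; in practice this
# would be loaded from the downloaded files, e.g. with pd.read_parquet(...).
associations = pd.DataFrame(
    {
        "diseaseId": ["EFO_A", "EFO_A", "EFO_B"],
        "targetId": ["ENSG_1", "ENSG_2", "ENSG_1"],
        "score": [0.82, 0.41, 0.27],
    }
)


def attach_association_scores(labels, associations):
    """Join disease labels to association scores via their EFO IDs."""
    mapped = labels.dropna(subset=["efo_id"])  # drop labels without a mapping
    merged = mapped.merge(
        associations, left_on="efo_id", right_on="diseaseId", how="inner"
    )
    return (
        merged[["label", "efo_id", "targetId", "score"]]
        .sort_values(["label", "score"], ascending=[True, False])
        .reset_index(drop=True)
    )


result = attach_association_scores(labels, associations)
```

An inner join keeps only labels that have both an EFO ID and at least one association; switching to `how="left"` would keep mapped labels with no associations as rows with empty scores.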
I hope we have been of help. Please reach out if you have further questions!

Best,
Irene

Hi @irene,

OnToma sounds great, thanks for pointing it out! I will definitely try it out. :slight_smile:

Best,
Cindy