Hi Eléonore,
And what would be a way to get the gene id for the target (not Ensembl)?
When querying the GraphQL API, the only way to retrieve associations is to provide either an Ensembl gene ID or an EFO disease ID. These are the identifiers our infrastructure uses for genes and diseases, so if you have gene names or symbols, you need to map them first. For the mapping, you can use either our downloadable flat files or Ensembl’s REST API, which lets you map gene names and symbols to Ensembl IDs.
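As a rough sketch of the Ensembl route, the snippet below looks up a gene symbol through Ensembl's `xrefs/symbol` REST endpoint and keeps the gene-level hits. The function names `build_symbol_url` and `symbol_to_ensembl` are made up for this illustration, the species is assumed to be human, and the code assumes the `httr` and `jsonlite` packages are installed.

```r
# Illustrative helpers (not part of any library).
# Assumes the httr and jsonlite packages are installed.

# Build the lookup URL for a gene symbol (species defaults to human):
build_symbol_url = function(symbol, species = 'homo_sapiens'){
  paste0('https://rest.ensembl.org/xrefs/symbol/', species, '/',
         utils::URLencode(symbol, reserved = TRUE))
}

# Query Ensembl and return the matching Ensembl gene IDs:
symbol_to_ensembl = function(symbol, species = 'homo_sapiens'){
  response = httr::GET(build_symbol_url(symbol, species), httr::accept_json())
  httr::stop_for_status(response)
  hits = jsonlite::fromJSON(httr::content(response, as = 'text', encoding = 'UTF-8'))
  # No match: the API returns an empty array
  if (length(hits) == 0) return(character(0))
  # Keep gene-level identifiers only (drops transcripts etc.)
  hits$id[hits$type == 'gene']
}
```

Calling `symbol_to_ensembl('BRCA2')` should give you the Ensembl gene ID(s) for that symbol, which you can then pass on as the target ID in your GraphQL query. A symbol can map to more than one gene, so it is worth checking the length of the result.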
When you build your GraphQL request, you can specify which fields you are interested in, so you can add the disease name or even the disease ancestors. Having the disease names can help you find your disease of interest if you are querying with a fixed target. The documentation of the GraphQL schema can be found here.
An example R implementation for retrieving associated diseases for a given target looks as follows:
library('jsonlite')
library('httr')
get_associations = function(target_id){
  query_url = 'https://api.platform.opentargets.org/api/v4/graphql'
  # Building query:
  request_body = list(
    operationName = 'TargetAssociationsQuery',
    variables = list(
      ensemblId = target_id,
      index = 0,
      size = 10000,
      sortBy = 'score',
      filter = '',
      aggregationFilters = list()
    ),
    query = '
      query TargetAssociationsQuery($ensemblId: String!, $index: Int!, $size: Int!, $filter: String, $sortBy: String!, $aggregationFilters: [AggregationFilter!]) {
        target(ensemblId: $ensemblId) {
          id
          approvedSymbol
          approvedName
          associatedDiseases(page: {index: $index, size: $size}, orderByScore: $sortBy, BFilter: $filter, aggregationFilters: $aggregationFilters) {
            count
            rows {
              disease {
                id
                name
              }
              score
              datatypeScores {
                componentId: id
                score
              }
            }
          }
        }
      }
    '
  )
  # Retrieve data:
  response = POST(query_url, body=request_body, encode='json')
  # Parse data:
  char = rawToChar(response$content)
  data = jsonlite::fromJSON(char)
  # Extracting associations:
  associations = data$data$target$associatedDiseases$rows
  # Adding target and disease columns to dataframe:
  associations$targetSymbol = data$data$target$approvedSymbol
  associations$targetId = data$data$target$id
  associations$diseaseId = associations$disease$id
  associations$diseaseName = associations$disease$name
  # Dropping unused columns and return:
  return(associations[, c('targetId', 'targetSymbol', 'diseaseId', 'diseaseName', 'score', 'datatypeScores')])
}
target_id = 'ENSG00000065361'
get_associations(target_id)
Extracting the datatype scores for each association requires a bit of work with the dataframe, so you might want to use tidyverse.
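One possible way to do that, sketched with `tidyr`: unnest the `datatypeScores` list column and spread the datatypes into one column each. The toy dataframe below only mimics the shape of what `get_associations()` returns; its IDs, names and scores are made-up placeholders, not real data.

```r
library(tidyr)

# Toy data with the same shape as the get_associations() output.
# All IDs, names and scores here are placeholders for illustration.
associations = data.frame(
  targetId = 'ENSG_PLACEHOLDER',
  diseaseId = c('EFO_PLACEHOLDER_1', 'EFO_PLACEHOLDER_2'),
  diseaseName = c('disease one', 'disease two'),
  score = c(0.7, 0.4)
)
associations$datatypeScores = list(
  data.frame(componentId = c('genetic_association', 'literature'), score = c(0.8, 0.3)),
  data.frame(componentId = 'literature', score = 0.5)
)

# names_sep avoids a clash between the overall score and the datatype score:
long = unnest(associations, datatypeScores, names_sep = '_')

# One column per datatype; datatypes without evidence become NA:
wide = pivot_wider(
  long,
  names_from = datatypeScores_componentId,
  values_from = datatypeScores_score
)
print(wide)
```

This gives one row per association, with the overall `score` kept alongside a separate column for each datatype.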