Hello there community,
I need to retrieve publication dates for 10 proteins, and for each one I am running the following query:
```python
query_string = """
query targetAnnotation($ensemblId: String!, $cursor: String) {
  target(ensemblId: $ensemblId) {
    approvedSymbol
    literatureOcurrences(startYear: 1989, endYear: 2025, cursor: $cursor) {
      cursor
      rows {
        publicationDate
      }
    }
  }
}
"""
```
I pass the cursor to a recursive function, because each query returns only 25 dates at a time, and I keep calling it until the returned cursor is None. I picked up this approach from a couple of previous discussions.
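For reference, here is a minimal sketch of the same cursor loop written iteratively rather than recursively. It assumes Python's standard `urllib` and the public Open Targets Platform endpoint `https://api.platform.opentargets.org/api/v4/graphql`. The names `fetch_page` and `fetch_all_dates` are hypothetical, and `$cursor` is declared nullable (`String`) so the first request can send `None`. A recursive version adds one stack frame per page of 25 rows, so it reaches Python's default recursion limit of 1000 after roughly 25,000 dates. A `while` loop has no such limit.

```python
import json
import urllib.request

# Public Open Targets Platform GraphQL endpoint (v4).
OT_GRAPHQL_URL = "https://api.platform.opentargets.org/api/v4/graphql"

QUERY = """
query targetAnnotation($ensemblId: String!, $cursor: String) {
  target(ensemblId: $ensemblId) {
    approvedSymbol
    literatureOcurrences(startYear: 1989, endYear: 2025, cursor: $cursor) {
      cursor
      rows {
        publicationDate
      }
    }
  }
}
"""


def fetch_page(ensembl_id, cursor=None):
    """POST one page request and return (dates, next_cursor)."""
    payload = json.dumps(
        {"query": QUERY, "variables": {"ensemblId": ensembl_id, "cursor": cursor}}
    ).encode("utf-8")
    request = urllib.request.Request(
        OT_GRAPHQL_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    block = body["data"]["target"]["literatureOcurrences"]
    dates = [row["publicationDate"] for row in block["rows"]]
    return dates, block["cursor"]


def fetch_all_dates(ensembl_id, fetch=fetch_page):
    """Follow the cursor in a plain loop (no recursion) and collect every date."""
    all_dates = []
    cursor = None
    while True:
        dates, cursor = fetch(ensembl_id, cursor)
        all_dates.extend(dates)
        # Stop when the API returns no further cursor (or, defensively, an empty page).
        if cursor is None or not dates:
            return all_dates
```

The `fetch` parameter exists only so the loop can be exercised without network access. In normal use you would call `fetch_all_dates("ENSG...")` with the default.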
The code works, but it is extremely slow over my time period of 1989-2025. It ran for about 45 minutes, appending each page of results to a list, and then crashed. By that point it was still on the first protein.
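Two hedged observations on the speed and the crash. First, if the recursive function keeps one stack frame per page, the crash may simply be Python's recursion limit (1000 by default), which a plain loop avoids. Second, one way to cut wall-clock time is to split 1989-2025 into single-year ranges and fetch the (protein, year) pairs concurrently with `concurrent.futures.ThreadPoolExecutor`. This sketch assumes the query is rewritten to take `startYear`/`endYear` as variables, that both bounds are inclusive, and that `fetch_range` is a hypothetical wrapper around the cursor loop for one protein and one year range. The total number of pages stays the same. Only the waiting overlaps, so `max_workers` should stay modest to avoid overloading the public API.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_in_parallel(ensembl_ids, fetch_range, start_year=1989, end_year=2025, max_workers=4):
    """Return {ensembl_id: [dates...]} by fetching one year at a time in parallel.

    fetch_range(ensembl_id, year_from, year_to) is a hypothetical callable that runs
    the full cursor loop with startYear=year_from and endYear=year_to (assumed
    inclusive) and returns a list of publication dates.
    """
    # One job per (protein, year) pair, e.g. 10 proteins x 37 years = 370 jobs.
    jobs = [
        (ensembl_id, year)
        for ensembl_id in ensembl_ids
        for year in range(start_year, end_year + 1)
    ]

    def run(job):
        ensembl_id, year = job
        return fetch_range(ensembl_id, year, year)

    results = {ensembl_id: [] for ensembl_id in ensembl_ids}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in job order, so each protein's years stay in order.
        for (ensembl_id, _year), dates in zip(jobs, pool.map(run, jobs)):
            results[ensembl_id].extend(dates)
    return results
```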
I also read that large queries like this should be run in OT’s BigQuery. However, ‘literatureOcurrences’ is not available in the ‘targets’ schema there, even though the other fields from the GraphQL API are.
Can someone help me achieve this? Ideally I need a dataframe with proteins as column names and their publication dates as row values, but other structures are also fine.
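A minimal sketch of both layouts with pandas, assuming the dates have already been collected into a dict that maps each protein name to a list of `publicationDate` strings (the argument name `dates_by_protein` and both function names are hypothetical). Because the proteins will have different numbers of publications, the wide frame pads the shorter columns with `NaT`. The long (tidy) frame, with one row per publication, avoids the padding and is usually easier to group or count by year.

```python
import pandas as pd


def dates_to_frame(dates_by_protein):
    """Wide layout: one column per protein, shorter columns padded with NaT."""
    return pd.DataFrame(
        {
            protein: pd.Series(pd.to_datetime(dates, errors="coerce"))
            for protein, dates in dates_by_protein.items()
        }
    )


def dates_to_long_frame(dates_by_protein):
    """Long layout: one row per (protein, publication date) pair."""
    rows = [
        (protein, date)
        for protein, dates in dates_by_protein.items()
        for date in dates
    ]
    frame = pd.DataFrame(rows, columns=["protein", "publication_date"])
    # Unparseable dates become NaT instead of raising.
    frame["publication_date"] = pd.to_datetime(frame["publication_date"], errors="coerce")
    return frame
```

For example, `dates_to_long_frame(d).groupby("protein")["publication_date"].count()` gives the number of publications per protein.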
Europe PMC and PubMed return similar results on the fly, but the results differ depending on whether you search with a UniProt ID, an HGNC symbol or a synonym. I suspect the OT search does a better job in this respect.
Thanks in advance,
R