I’m new to the platform (and to both GraphQL and SQL). I have a list of genetic variants as rsIDs, for which I would like to retrieve the table of “assigned genes” that is shown on the website when I query each variant individually. I understand that to do this, I will first need to map the rsIDs to Open Targets variant IDs (e.g. 1_154453788_C_T).
I’m having a hard time working out how to do this from your documentation (perhaps I’m looking in the wrong place!), so I would appreciate some advice on how to do this, or pointers to relevant tutorials and schemas. From what I understand, the best way to achieve this would be via the BigQuery instance, but the relevant tables and column names are a mystery to me at the moment.
Thanks in advance.
For your query you would need to use two of our datasets:
- The variant index dataset, to map your rsIDs to our variant notation.
- The variant to gene (V2G) scored dataset, to annotate your variants with their associated genes and scores.
As always, if you have a large number of variants to annotate, I’d encourage you to use our dataset dumps rather than the API, as they are easier to work with. To get the variant information, you can join your rsIDs with our variant index using the
alt_allele (we build our variant ID by concatenating these four columns). To get the V2G scores, you can then use our scored dataset to join the harmonised variant IDs with the columns
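The two joins described above can be sketched in plain Python. This is a minimal illustration over rows already loaded from the dumps (e.g. as lists of dicts), not the official pipeline: the column names `rs_id`, `chr_id`, `position`, `ref_allele`, `gene_id` and `overall_score` are assumptions (only `alt_allele` is named above), so check them against the actual schema before use. The variant ID is built by joining the four ID columns with underscores, matching the 1_154453788_C_T format.

```python
from collections import defaultdict

# Hypothetical column names: apart from alt_allele, these are assumptions,
# not taken from the official schema of the variant index or V2G datasets.
ID_COLUMNS = ("chr_id", "position", "ref_allele", "alt_allele")


def build_variant_id(row):
    """Join the four ID columns with '_' (e.g. 1_154453788_C_T)."""
    return "_".join(str(row[col]) for col in ID_COLUMNS)


def map_rsids(rsids, variant_index_rows):
    """Return {rsID: [variant IDs]}; one rsID may match several alleles."""
    wanted = set(rsids)
    mapping = defaultdict(list)
    for row in variant_index_rows:
        if row["rs_id"] in wanted:
            mapping[row["rs_id"]].append(build_variant_id(row))
    return dict(mapping)


def assigned_genes(variant_ids, v2g_rows):
    """Return {variant ID: [(gene_id, score), ...]} sorted by score, descending.

    V2G rows are joined on the same four columns; if a gene appears in
    several rows for one variant, only its highest overall_score is kept.
    """
    wanted = set(variant_ids)
    best = defaultdict(dict)
    for row in v2g_rows:
        vid = build_variant_id(row)
        if vid not in wanted:
            continue
        gene, score = row["gene_id"], row["overall_score"]
        if score > best[vid].get(gene, float("-inf")):
            best[vid][gene] = score
    return {
        vid: sorted(genes.items(), key=lambda kv: kv[1], reverse=True)
        for vid, genes in best.items()
    }
```

Because one rsID can correspond to several alleles, `map_rsids` returns a list per rsID; flatten those lists before passing them to `assigned_genes`. For large inputs, the same logic would typically be written as a join in pandas, Spark or BigQuery SQL rather than a Python loop.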
As explained in the documentation, you should be able to follow this approach either by downloading the data from the FTP or by using BigQuery.
I hope you find this helpful!
Thanks Irene, this is very useful. More generally, where is the best place to look to find out which variables are stored in which datasets? I have read the docs, but perhaps I have missed where a schema is conveniently available (i.e. other than the one stored within the individual datasets).
Hi. I am doing the same thing, and when I try to wget the variant index, it downloads the first few files and then says “Skipping directory ‘variant-index’.” without downloading the necessary data. Have you seen this before?
Hi @Melissa_B, I haven’t come across this problem. Is it solved? Are you getting the data from the FTP?
I usually do something like
wget --recursive --no-parent --no-host-directories --cut-dirs 8 ftp://ftp.ebi.ac.uk/pub/databases/opentargets/genetics/latest//lut/variant-index/
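If wget keeps skipping the directory, one alternative is to fetch the files with Python's standard `ftplib` module. This is a minimal sketch, assuming the variant-index directory is flat (files only, no subdirectories) and that anonymous login is accepted; it does not reproduce wget's recursion or `--cut-dirs` behaviour. The function name `download_flat_dir` is hypothetical.

```python
import ftplib
import os

HOST = "ftp.ebi.ac.uk"
# Directory from the wget command above, with the double slash normalised.
REMOTE_DIR = "/pub/databases/opentargets/genetics/latest/lut/variant-index"


def download_flat_dir(ftp, remote_dir, local_dir):
    """Download every file listed in remote_dir into local_dir.

    Assumes the remote directory is flat; a subdirectory entry would make
    RETR fail with ftplib.error_perm.
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    saved = []
    for name in ftp.nlst():
        base = name.rsplit("/", 1)[-1]  # some servers return full paths
        if base in (".", ".."):
            continue
        target = os.path.join(local_dir, base)
        with open(target, "wb") as fh:
            ftp.retrbinary(f"RETR {base}", fh.write)
        saved.append(target)
    return saved
```

To run it against the real server, open a connection with `ftplib.FTP(HOST)`, call `login()` with no arguments for anonymous access, and pass that connection to `download_flat_dir(ftp, REMOTE_DIR, "variant-index")`. Taking the connection as a parameter keeps the function easy to test without network access.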
Thank you for the suggestion. We are working on an optimised version of our pipelines in which, among other significant changes, documentation of our datasets and of the logic used to generate them is a key component. Stay tuned!