Query associated loci for a list of UKBB endophenotypes

esebesty · 2 September 2021 17:22

Hi,

I just downloaded the full Open Targets dataset in Parquet format and set up sparklyr with R. The demo script works fine. However, I'm a bit lost on how to run a systematic query for a list of UKBB endophenotypes and get a data.frame or tibble with the association information: essentially the “Association Information” table that is available, for example, here, using “Red blood cell (erythrocyte) distribution width” as the query.

Is there a more detailed description of the data structure somewhere, or more detailed example scripts?

Thanks for any pointers!

ochoa · 3 September 2021 09:22

Hi esebesty,

It does look like you almost have the answer.

Something we are currently working on is the best way to communicate what fields are available (or relevant) for each dataset.

In the meantime, you can see which fields are available for each data source in our JSON schema. You will be interested in the “ot_genetics_portal” dataset fields. For example, “projectId” tells you which entries come from UKBB GWAS.
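As a rough illustration of that filtering, here is a minimal sketch in plain Python, assuming the evidence rows have already been loaded as dictionaries (for example, collected from the Parquet files). The field names `datasourceId`, `projectId`, `diseaseFromSource`, `targetId`, `studyId`, `variantId` and `score`, the `projectId` values in `UKBB_PROJECTS`, and the helper `select_ukbb_evidence` are all assumptions for illustration; check them against the JSON schema. In sparklyr, the same three conditions could be passed to dplyr's `filter()` on the Spark table.

```python
from typing import Iterable

# Assumptions: field names follow the Platform evidence JSON schema, and UKBB GWAS
# are tagged with these projectId values. Both must be checked against the schema.
DATASOURCE = "ot_genetics_portal"
UKBB_PROJECTS = {"NEALE", "SAIGE"}  # hypothetical values
KEEP_COLUMNS = ("diseaseFromSource", "targetId", "studyId", "variantId", "score")


def select_ukbb_evidence(rows: Iterable[dict], traits: Iterable[str],
                         projects: set = UKBB_PROJECTS) -> list:
    """Return genetics-portal evidence from UKBB studies whose reported trait is in `traits`."""
    # Normalise trait names so stray whitespace or case differences do not break matching.
    wanted = {t.strip().lower() for t in traits}
    selected = []
    for row in rows:
        if row.get("datasourceId") != DATASOURCE:
            continue  # evidence from another data source
        if row.get("projectId") not in projects:
            continue  # GWAS not tagged as UKBB
        trait = (row.get("diseaseFromSource") or "").strip().lower()
        if trait in wanted:
            # Keep only the columns needed for an association-style table.
            selected.append({col: row.get(col) for col in KEEP_COLUMNS})
    return selected
```

Matching on the reported trait string is fragile; mapping your trait list to disease identifiers first and filtering on those is probably more robust.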

Keep in mind that what you see through the Open Targets Platform are all the GWAS-significant loci with an L2G score > 0.05. This is the best dataset if your analysis focuses on potentially causal genes. If, instead, your focus is on the actual signals, you should probably access the Genetics Portal data directly.
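To make the gene-versus-signal distinction concrete, the sketch below groups evidence rows by (study, lead variant) signal and lists the candidate genes above an L2G cut-off, since one signal can be assigned to several genes. It assumes hypothetical in-memory rows with `studyId`, `variantId` and `targetId` fields and assumes the L2G score sits in a field called `score` (configurable via `score_field`); the helper name `summarise` is also hypothetical. The default cut-off mirrors the 0.05 quoted above and can be raised for a stricter gene list.

```python
from collections import defaultdict

L2G_CUTOFF = 0.05  # cut-off quoted for the Platform; raise it for a stricter gene list


def summarise(rows, score_field="score", cutoff=L2G_CUTOFF):
    """Map each (study, lead variant) signal to the sorted genes scoring above `cutoff`."""
    genes_per_signal = defaultdict(set)
    for row in rows:
        score = row.get(score_field)  # assumed location of the L2G score
        if score is None or score <= cutoff:
            continue  # missing score or below the L2G threshold
        genes_per_signal[(row["studyId"], row["variantId"])].add(row["targetId"])
    return {signal: sorted(genes) for signal, genes in genes_per_signal.items()}
```

The number of keys is the number of distinct signals, while the total number of listed genes counts gene-level associations.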

I hope this helps.

David

esebesty · 6 September 2021 07:32

Thanks, that was useful. However, I'm still missing some information, in particular the “Credible Set Size” and “LD Set Size” columns from the study summary page, and the Gene Prioritization details, such as the “Variant Pathogenicity” and “Distance” columns. I guess they are present in some other data sources. Thanks!

ochoa · 9 September 2021 09:54

The Open Targets Platform does not provide that level of granularity. Instead, you can find this information in the Open Targets Genetics data. More information on how to access it can be found in the documentation.
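One way to combine the two sources is a left join keyed on study and lead variant. The sketch below assumes a hypothetical per-locus table from the Genetics data with columns `study_id`, `lead_variant_id`, `credible_set_size` and `ld_set_size`; these names, the helper `attach_locus_details`, and the assumption that both sources use the same study and variant identifiers are illustrative only and should be checked against the Genetics documentation.

```python
def attach_locus_details(evidence, locus_table):
    """Left-join Platform evidence rows to per-locus details keyed by (study, lead variant)."""
    # Index the (hypothetical) Genetics locus table for constant-time lookups.
    index = {(r["study_id"], r["lead_variant_id"]): r for r in locus_table}
    joined = []
    for ev in evidence:
        details = index.get((ev["studyId"], ev["variantId"]), {})
        # Unmatched evidence rows are kept, with None for the missing locus details.
        joined.append({**ev,
                       "credible_set_size": details.get("credible_set_size"),
                       "ld_set_size": details.get("ld_set_size")})
    return joined
```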
