BigQuery data genetics credset table identifiers

Hi there,

I have a question about the Google BigQuery dataset bigquery-public-data.open_targets_genetics. In the variant_disease_credset table, I am not sure how to identify each credible set, as there is no unique identifier for them. I understand that a credible set is a group of variants that is likely to contain the causal variant for the phenotype studied. How can I identify the set of variants that belong to a specific credible set?

Also, I am not sure what "tag variant" refers to, or why there is no p-value for the lead variant. Should I look for it in another table? I am interested in finding the marginal and conditional p-values for the lead variant.

Does postprob_cumsum represent the combined PIP for a credible set?

Thank you very much for your help!

Hi,

I would recommend not using the data from bigquery-public-data.open_targets_genetics, as that resource has been superseded by the new Platform, where the genetics data is integrated into other Platform resources (as a result, the Genetics Portal has already been shut down). Please use bigquery-public-data.open_targets_platform instead, the most up-to-date Open Targets dataset, which is also publicly available on BigQuery.

Colocalisation data can be found in two tables, one for each colocalisation method used: colocalisation_coloc and colocalisation_ecaviar. These tables contain study locus identifiers, so you’ll be able to look up the corresponding credible sets in the credible_set table for more detailed statistics. For more information on the contents of these tables, see the column and dataset descriptions on the Platform downloads page.
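As a rough illustration of the lookup described above, here is a minimal Python sketch that builds a query joining one of the colocalisation tables to credible_set. The table names are the ones mentioned above; the column names (studyLocusId, leftStudyLocusId, rightStudyLocusId) and the helper functions are assumptions for illustration, so please check them against the schema on the Platform downloads page before relying on this.

```python
# Sketch only. Table names come from the reply above; the column names
# studyLocusId, leftStudyLocusId and rightStudyLocusId are ASSUMPTIONS
# that should be verified against the published schema.
DATASET = "bigquery-public-data.open_targets_platform"


def credible_sets_for_coloc_sql(method: str = "coloc", limit: int = 10) -> str:
    """Build SQL joining a colocalisation table to credible_set (hypothetical helper)."""
    if method not in ("coloc", "ecaviar"):
        raise ValueError("method must be 'coloc' or 'ecaviar'")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT\n"
        "  coloc.leftStudyLocusId,\n"
        "  coloc.rightStudyLocusId,\n"
        "  cs.*\n"
        f"FROM `{DATASET}.colocalisation_{method}` AS coloc\n"
        f"JOIN `{DATASET}.credible_set` AS cs\n"
        "  ON cs.studyLocusId = coloc.leftStudyLocusId  -- assumed join key\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str) -> list:
    """Run the query; needs the google-cloud-bigquery package and GCP credentials."""
    from google.cloud import bigquery  # imported lazily so the builder works without it

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Print the generated SQL; paste it into the BigQuery console or pass it to run_query().
    print(credible_sets_for_coloc_sql("coloc", limit=5))
```

The same query text can be pasted directly into the BigQuery console; swapping "coloc" for "ecaviar" targets the other colocalisation table.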

Please let us know if there’s anything else we could help with!


Hi Daniel,

Thank you so much for your reply; it is very helpful. I will explore that data, as it seems much more complete than what I was using. I’ll let you know if I have any other questions.

Have a nice day

Best wishes,

Estefania