I’m looking at the parquet files with L2G calculations.
Each row gives a study ID and a gene, for example:
FINNGEN_R5_C3_OTHER_SKIN_EXALLC → ENSG00000034677
Is there a pre-mapped EFO term for each study ID? What is the best way to retrieve the mapped EFO term from an L2G study ID?
This question was originally asked on GitHub and has been posted here so that answers can benefit the whole community of users.
The mapping from study IDs to EFO terms is contained in the study table, which can be found in the files on the FTP site for a genetics release. For example, for the latest release:
http://ftp.ebi.ac.uk/pub/databases/opentargets/genetics/22.02.01/lut/study-index/
You can find this entry:
{"study_id":"FINNGEN_R5_C3_OTHER_SKIN_EXALLC","pub_date":"2021-5-11","pub_author":"FINNGEN_R5","trait_reported":"Other malignant neoplasms of skin (=non-melanoma skin cancer) (all cancers excluded)","ancestry_initial":["European=184388"],"ancestry_replication":[],"n_initial":184388,"n_replication":0,"n_cases":10382,"num_assoc_loci":25,"has_sumstats":true,"source":"FINNGEN","trait_efos":["EFO_0009260"],"trait_category":"cell proliferation disorder"}
In general, each study can be mapped to one or more EFO terms.
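As a rough illustration of the lookup, here is a minimal Python sketch. It assumes the study index files hold one JSON record per line, as the entry above suggests, and it stands in for the L2G parquet rows with plain (study ID, gene ID) tuples rather than assuming any parquet column names. The function names `build_efo_lookup` and `annotate_l2g_rows` are made up for this example, and the sample record is the entry quoted above with some fields trimmed.

```python
import json

# One record from the study index entry quoted above (fields trimmed for brevity).
# Assumption: the real files contain one JSON object per line, like this.
SAMPLE_STUDY_INDEX = (
    '{"study_id":"FINNGEN_R5_C3_OTHER_SKIN_EXALLC",'
    '"trait_reported":"Other malignant neoplasms of skin (=non-melanoma skin cancer) (all cancers excluded)",'
    '"trait_efos":["EFO_0009260"],'
    '"trait_category":"cell proliferation disorder"}\n'
)


def build_efo_lookup(lines):
    """Map study_id -> list of EFO IDs from JSON-lines study index records."""
    lookup = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # A study may carry one or more EFO terms; treat a missing field as none.
        lookup[record["study_id"]] = record.get("trait_efos") or []
    return lookup


def annotate_l2g_rows(rows, lookup):
    """Attach EFO IDs to (study_id, gene_id) pairs; unmapped studies get []."""
    return [(study_id, gene_id, lookup.get(study_id, [])) for study_id, gene_id in rows]


lookup = build_efo_lookup(SAMPLE_STUDY_INDEX.splitlines())

# Hypothetical stand-in for rows read from the L2G parquet files.
l2g_rows = [("FINNGEN_R5_C3_OTHER_SKIN_EXALLC", "ENSG00000034677")]
annotated = annotate_l2g_rows(l2g_rows, lookup)
print(annotated)
```

With the real data, you would pass the lines of the downloaded study index files to `build_efo_lookup` and feed in the study ID and gene ID values read from the L2G parquet files. Because a study can map to several EFO terms, the lookup keeps a list per study rather than a single value.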