Dear Team,
I’d like to recommend introducing the gene-disease association score from DisGeNET to the OT Platform. I look forward to hearing feedback from your internal discussion.
https://www.disgenet.org/dbinfo
Thanks.
Shicheng
irene
24 May 2022 13:20
Hi @Shicheng_Guo and thank you for the reference.
DisGeNET seems to be very much in line with our objectives, and it is a vast resource to explore. I’ve created a ticket in our issue tracker to keep track of this investigation:
opened 01:15PM - 24 May 22 UTC
Enhancement
Data
Platform
## Context
DisGeNET is a discovery platform containing one of the largest publicly available collections of genes and variants associated to human diseases.
This aim is very much in line with that of Open Targets so it would be useful to explore their gene/disease associations dataset. This is an extract of their documentation:
> The gene-disease information in DisGeNET is organized according to the types of source databases:
> - CURATED: GDAs from UniProt, PsyGeNET, Orphanet, the CGI, CTD (human data), ClinGen, and the Genomics England PanelApp.
> - ANIMAL MODELS: GDAs from RGD, MGD, and CTD (mouse and rat data)
> - INFERRED: GDAs from the Human Phenotype Ontology, and GDAs inferred from VDAs reported by Clinvar, the GWAS catalog and GWAS db
> - ALL: GDAs from previous sources and from LHGDN and BeFree
## Tasks
1. Explore the overlap with Open Targets. Many of the sources are already in OT.
To access the data:
- I've uploaded a copy of their db dump here: `gs://ot-team/irene/disgenet/disgenet_2020.db.zip`
Evidence is in the `geneDiseaseNetwork` table. Entities are not mapped, but they can be extracted:
- gene symbols are found in the `geneAttributes` table. These can be fetched by joining on `geneNID`
- disease labels are found in the `diseaseAttributes` table. These can be fetched by joining on `diseaseNID`. We can run OnToma on the labels. Alternatively, they provide a lookup table with cross-references between disease labels and different ontologies, available at `gs://ot-team/irene/disgenet/disease_mappings.tsv`
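
The two joins described above could be sketched roughly as follows. This is only a minimal illustration: the table names and the `geneNID` / `diseaseNID` keys come from the notes above, but the column names `geneName` and `diseaseName` and the helper `fetch_gene_disease_pairs` are assumptions that should be checked against the actual schema of the dump.

```python
import sqlite3


def fetch_gene_disease_pairs(conn, gene_col="geneName", disease_col="diseaseName", limit=10):
    """Return (gene symbol, disease label) tuples, one per evidence row.

    Joins geneDiseaseNetwork to geneAttributes on geneNID and to
    diseaseAttributes on diseaseNID. The default column names are
    assumptions and may differ in the real dump.
    """
    # Column names are interpolated into the SQL text, so accept only
    # plain identifiers to avoid building a malformed or unsafe query.
    for name in (gene_col, disease_col):
        if not name.isidentifier():
            raise ValueError(f"unexpected column name: {name!r}")

    query = f"""
        SELECT g.{gene_col}, d.{disease_col}
        FROM geneDiseaseNetwork AS n
        JOIN geneAttributes AS g ON g.geneNID = n.geneNID
        JOIN diseaseAttributes AS d ON d.diseaseNID = n.diseaseNID
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()
```

After downloading and unzipping the dump, a call such as `fetch_gene_disease_pairs(sqlite3.connect("disgenet_2020.db"))` would return the first few mapped pairs; the local file name here is assumed from the archive name. The resulting disease labels could then be passed to OnToma or matched against the cross-reference table.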
## Notes
- The database hasn't been updated since May 2020.
- Their data is open access under an Attribution-NonCommercial-ShareAlike license.
- For the overlapping sources, it is important to note that DisGeNET doesn't capture as much granularity as we do. For PanelApp, for example, they have 20k d/t evidence, which is very close to the number we have; however, they don't report information on inheritance patterns as we do.
Many thanks,
Irene