Due to a problem in the ontology, a number of rare diseases are duplicated in the 22.06 release. EFO is currently undergoing a cleanup of the Orphanet terms to assign them to their corresponding MONDO ID when appropriate. As part of this process, a large number of terms have been duplicated in the ontology.
Hi team! I wanted to thank you for the v22.09 update; it’s really great to see the additions and the clean-up of these duplicate terms!
However, it seems that while the duplicate entries have been removed from the diseases dataset, the entries with retired IDs have been removed from the diseaseToPhenotype dataset entirely, instead of being remapped to the new term.
For example, “Familial dilated cardiomyopathy” was one of those rare diseases with duplicate entries: Orphanet_217607 and MONDO_0016333. Now only the MONDO ID is contained in the diseases data. But diseaseToPhenotype previously had over 60 phenotypes associated with the Orphanet ID. Now there are no phenotypes associated with either the MONDO ID or the (retired) Orphanet ID.
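In case it helps with reproducing this, here is a minimal sketch of the kind of check I mean. It assumes the diseaseToPhenotype rows have been loaded as dictionaries with `disease` and `phenotype` keys (my assumption about the field names). The function name `phenotype_counts` is made up for illustration, and the rows below are toy placeholders, not real release data.

```python
from collections import defaultdict


def phenotype_counts(rows, disease_ids):
    """Count distinct phenotypes per disease ID.

    `rows` is any iterable of dicts with "disease" and "phenotype" keys
    (assumed field names, adjust to the actual schema).
    """
    seen = defaultdict(set)
    for row in rows:
        if row["disease"] in disease_ids:
            seen[row["disease"]].add(row["phenotype"])
    # IDs that never appear in the rows are reported with a count of 0.
    return {disease_id: len(seen[disease_id]) for disease_id in disease_ids}


# Toy rows for illustration only; the phenotype IDs are placeholders.
rows = [
    {"disease": "Orphanet_300751", "phenotype": "HP_EXAMPLE_1"},
    {"disease": "Orphanet_300751", "phenotype": "HP_EXAMPLE_2"},
    {"disease": "Orphanet_300751", "phenotype": "HP_EXAMPLE_2"},
]

counts = phenotype_counts(
    rows, ["MONDO_0016333", "Orphanet_217607", "Orphanet_300751"]
)
print(counts)
```

Run against the real dataset, a count of zero for both MONDO_0016333 and Orphanet_217607 would match what I am describing.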
The related rare disease of “Familial dilated cardiomyopathy with conduction defect due to LMNA mutation” only ever had a single Orphanet ID (Orphanet_300751). This still exists in both the diseases and the diseaseToPhenotype datasets.
Please let me know if you want any more information about this issue!
But the underlying problem has not been fixed. There are still no phenotypes associated with any of the rare diseases that were previously duplicated. As a test case, the current ID for Familial Dilated Cardiomyopathy (MONDO_0016333) has no phenotypes associated with it, whereas the previous ID (Orphanet_217607) did have phenotypes.
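To illustrate the behaviour I was expecting, here is a rough sketch of remapping retired IDs rather than dropping their rows. This is not how the actual pipeline works as far as I know. The function `remap_retired_ids` and the `disease`/`phenotype` keys are my own assumptions, the mapping contains only the one pair mentioned above, and the phenotype IDs are placeholders.

```python
def remap_retired_ids(rows, retired_to_current):
    """Rewrite retired disease IDs to their current ID, dropping exact duplicates.

    `rows` is an iterable of dicts with "disease" and "phenotype" keys
    (assumed field names). Rows whose disease ID is not in the mapping
    are kept unchanged.
    """
    remapped = []
    seen = set()
    for row in rows:
        disease = retired_to_current.get(row["disease"], row["disease"])
        key = (disease, row["phenotype"])
        if key in seen:
            # The same disease/phenotype pair may already exist under the new ID.
            continue
        seen.add(key)
        remapped.append({**row, "disease": disease})
    return remapped


# The one retired -> current pair from this thread.
retired_to_current = {"Orphanet_217607": "MONDO_0016333"}

# Toy rows for illustration only; the phenotype IDs are placeholders.
old_rows = [
    {"disease": "Orphanet_217607", "phenotype": "HP_EXAMPLE_1"},
    {"disease": "Orphanet_217607", "phenotype": "HP_EXAMPLE_2"},
    {"disease": "MONDO_0016333", "phenotype": "HP_EXAMPLE_1"},
    {"disease": "Orphanet_300751", "phenotype": "HP_EXAMPLE_3"},
]

new_rows = remap_retired_ids(old_rows, retired_to_current)
for row in new_rows:
    print(row["disease"], row["phenotype"])
```

With something along these lines, the phenotypes formerly attached to the Orphanet ID would end up under the MONDO ID instead of disappearing.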
Hoping this bug can be reopened, and the rare disease phenotypes recaptured!