Clarification needed: disease_hpo dataset appears to contain CHEBI terms instead of HPO terms

Hello Open Targets Team,

I am working with the disease_hpo dataset, which is described in the documentation as containing Human Phenotype Ontology (HPO) information. However, when inspecting the dataset, I see entries that reference CHEBI identifiers (e.g., CHEBI_10545, CHEBI_11596), which correspond to chemical entities rather than HPO phenotype terms.

Please refer to this image:

Hello! What you are seeing might be confusing, but it is not unexpected! Not all entries in HPO have an HP prefix. HP imports terms from other ontologies, such as CHEBI. This is common practice among ontologies whose scopes overlap. Look up the term "stigmastane derivative" on the Ontology Lookup Service (OLS), specifically within the HP ontology. You'll see CHEBI_131702, which appears not only in HP but also in other ontologies! If you look at the data, you'll see that about 61% of terms (19,034 of 31,115) have the HP prefix, while the remaining terms are spread across many different prefixes:

+------+-----+
|prefix|count|
+------+-----+
|    HP|19034|
|UBERON| 5624|
|    GO| 2518|
| CHEBI| 1845|
|    CL| 1177|
|  PATO|  566|
|    PR|  206|
| MPATH|   75|
|   NBO|   64|
|HsapDv|    5|
|    RO|    1|
+------+-----+
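
If you want to reproduce this kind of breakdown yourself, here is a minimal Python sketch. It assumes identifiers follow the `PREFIX_NUMBER` form seen above (a `:` separator is handled too). The helper names `id_prefix` and `prefix_counts` and the sample IDs are hypothetical, for illustration only, and not part of any Open Targets tooling. The last part recomputes the HP share from the counts in the table.

```python
from collections import Counter


def id_prefix(term_id: str) -> str:
    """Return the ontology prefix of an identifier such as 'CHEBI_10545'."""
    # Normalise a CURIE-style ':' separator to '_' before splitting.
    return term_id.replace(":", "_").split("_", 1)[0]


def prefix_counts(term_ids):
    """Count identifiers per prefix, most frequent first."""
    return Counter(id_prefix(t) for t in term_ids).most_common()


# Illustrative sample only; in practice these would come from the dataset's ID column.
sample_ids = ["HP_0000118", "HP_0001250", "CHEBI_10545", "CHEBI_11596", "UBERON_0000955"]
counts = prefix_counts(sample_ids)
for prefix, count in counts:
    print(f"{prefix}\t{count}")

# Counts copied from the table above, used to check the share of HP-prefixed terms.
table_counts = {
    "HP": 19034, "UBERON": 5624, "GO": 2518, "CHEBI": 1845, "CL": 1177, "PATO": 566,
    "PR": 206, "MPATH": 75, "NBO": 64, "HsapDv": 5, "RO": 1,
}
hp_share = table_counts["HP"] / sum(table_counts.values())
print(f"HP share: {hp_share:.1%}")
```

Splitting on the first separator only keeps the prefix intact even if the local part of an identifier happens to contain another underscore.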

Thank you very much for clarifying!