Hello Open Targets Team,
I am working with the disease_hpo dataset, which is described in the documentation as containing Human Phenotype Ontology (HPO) information. However, when inspecting the dataset, the entries appear to reference CHEBI identifiers (e.g., CHEBI_10545, CHEBI_11596), which correspond to chemical entities rather than HPO phenotype terms.
Please refer to this image:
Hello! What you are seeing might be confusing, but it is not unexpected. Not all entries in HPO have an HP prefix: HP imports terms from other ontologies, such as CHEBI. This is common practice among ontologies whose scopes overlap. Look up the term "stigmastane derivative" on the Ontology Lookup Service (OLS), specifically in the HP ontology. You’ll see CHEBI_131702, which appears not only in HP but in other ontologies as well. If you look at the data, you’ll see that ~60% of terms have the HP prefix, while the remaining terms are spread across many different prefixes:
+------+-----+
|prefix|count|
+------+-----+
| HP|19034|
|UBERON| 5624|
| GO| 2518|
| CHEBI| 1845|
| CL| 1177|
| PATO| 566|
| PR| 206|
| MPATH| 75|
| NBO| 64|
|HsapDv| 5|
| RO| 1|
+------+-----+
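If you want to reproduce this kind of breakdown yourself, here is a minimal sketch in plain Python. It assumes the identifiers follow the PREFIX_NUMBER pattern seen above (for example CHEBI_10545). The function name count_prefixes and the sample list are made up for illustration and are not part of the Open Targets dataset or tooling.

```python
from collections import Counter


def count_prefixes(term_ids):
    """Count ontology prefixes in identifiers such as 'HP_0000118'."""
    # Assumption: everything before the first underscore is the ontology prefix.
    prefixes = (term_id.split("_", 1)[0] for term_id in term_ids)
    return Counter(prefixes)


# Hypothetical sample of identifiers, for illustration only.
sample_ids = [
    "HP_0000118",
    "HP_0001250",
    "CHEBI_10545",
    "CHEBI_11596",
    "UBERON_0000955",
]

counts = count_prefixes(sample_ids)

# Print prefixes from most to least frequent.
for prefix, count in counts.most_common():
    print(f"{prefix}: {count}")
```

To run this on the real data, you would pass in the column of phenotype identifiers from your copy of disease_hpo instead of the sample list.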
Thank you very much for clarifying!