A mismatch can occur between a drug's maximum phase and the phases listed per indication because the two values come from different data sources.
The maximum clinical trial phase we show is the result of a curation process at ChEMBL, in which curators check several databases that hold drug information from different countries.
Here is a non-exhaustive list of the sources they use for data annotation:
- USAN applications (US) (for molecule structures & list of clinical candidates)
- INN applications (WHO)
- FDA Orange Book
- ATC classification (additional source of approved drugs)
- BNF (additional source of approved drugs)
- FDA new drug approvals
However, indications are drawn from fewer sources, which can lead to discrepancies, as you rightly point out. In this graph, you can see that a large share of the indication annotations comes from clinicaltrials.gov data:
In conclusion, despite the discrepancy, I would not say that the overall maximum clinical phase is incorrect, since it is the value backed by the broadest set of sources. ChEMBL is aware of these mismatches and is working on enriching its indication data to minimise them.
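If you want to spot these cases in your own data, a minimal sketch along these lines may help. It assumes you have already extracted the drug-level maximum phase and a mapping of indication to phase; the function name `phase_gap`, the record layout and the example values are all hypothetical and do not reflect any actual ChEMBL or Open Targets schema.

```python
def phase_gap(max_phase, indication_phases):
    """Return how far the drug-level max phase exceeds the highest
    phase recorded for any single indication (0 means they agree)."""
    # If no indication is annotated, treat the highest indication phase as 0.
    highest_indication_phase = max(indication_phases.values(), default=0)
    return max_phase - highest_indication_phase


# Hypothetical record: the values below are made up for illustration only.
drug = {
    "max_phase": 4,
    "indications": {"asthma": 3, "chronic obstructive pulmonary disease": 2},
}

gap = phase_gap(drug["max_phase"], drug["indications"])
if gap > 0:
    print(f"Max phase is {gap} phase(s) above the best-annotated indication")
```

A positive gap here would suggest that the drug-level value comes from a source that the indication annotations do not cover, rather than that either value is wrong.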
Thank you very much for your question!