It seems that for some drug entities, the maximum clinical trial phase shown at the top of the page exceeds the maximum phase listed for every one of their indications. For example:
BARASERTIB
GANITUMAB
Is the maximum clinical trial phase incorrect in these cases, or am I missing some piece of information?
A mismatch between a drug's maximum phase and its per-indication phases can occur because the two values are drawn from different data sources.
The maximum clinical trial phase we display comes from a curation process run by ChEMBL, which checks several databases of drug information from different countries.
Here is a non-exhaustive list of the sources they use for data annotation:
USAN applications (US) (for molecule structures & list of clinical candidates)
INN applications (WHO)
FDA Orange Book
ATC classification (additional source of approved drugs)
BNF (additional source of approved drugs)
FDA new drug approvals
Indications, however, are drawn from fewer sources, which can lead to the discrepancies you rightly point out. As this graph shows, a large share of the indication annotations comes from ClinicalTrials.gov data:
In conclusion, despite the discrepancy, I would not say that the overall maximum clinical phase is incorrect, since it is the value supported by the widest range of sources. ChEMBL is aware of this and is working on enriching its data to minimise such cases.
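To make the mechanism concrete, here is a minimal Python sketch using made-up records. The drug name `DRUG_X`, the phase values, the record layout, and the choice of ClinicalTrials.gov as the only indication source are all hypothetical and do not reflect ChEMBL's actual schema or pipeline. The drug-level maximum is taken over every source, while the per-indication maxima only use indication sources, so the drug-level value can exceed every indication-level value but can never fall below them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PhaseRecord:
    """One hypothetical phase annotation for a drug from a single source."""
    drug: str
    source: str
    phase: int                        # illustrative scale: 1-4, 4 = approved
    indication: Optional[str] = None  # None if the source gives no indication


# Hypothetical: only these sources contribute indication-level annotations.
INDICATION_SOURCES = {"ClinicalTrials.gov"}


def max_phase(records: List[PhaseRecord], drug: str) -> int:
    """Drug-level maximum phase, taken over records from every source."""
    return max(r.phase for r in records if r.drug == drug)


def max_phase_by_indication(records: List[PhaseRecord], drug: str) -> Dict[str, int]:
    """Per-indication maximum phase, using only indication-source records."""
    result: Dict[str, int] = {}
    for r in records:
        if r.drug == drug and r.source in INDICATION_SOURCES and r.indication:
            result[r.indication] = max(result.get(r.indication, 0), r.phase)
    return result


def has_mismatch(records: List[PhaseRecord], drug: str) -> bool:
    """True if the drug-level max exceeds every indication-level max."""
    by_indication = max_phase_by_indication(records, drug)
    return max_phase(records, drug) > max(by_indication.values(), default=0)


# Made-up records: a curated source reports phase 3, but the indication
# source only has phase 2 and phase 1 trials.
records = [
    PhaseRecord("DRUG_X", "USAN", 3),
    PhaseRecord("DRUG_X", "ClinicalTrials.gov", 2, "indication A"),
    PhaseRecord("DRUG_X", "ClinicalTrials.gov", 1, "indication B"),
]

print(max_phase(records, "DRUG_X"))                # 3
print(max_phase_by_indication(records, "DRUG_X"))  # {'indication A': 2, 'indication B': 1}
print(has_mismatch(records, "DRUG_X"))             # True
```

Because the indication records are a subset of all records, the only possible disagreement is the one seen here: a drug-level phase higher than any phase listed per indication.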
Thanks a lot for explaining this! Makes sense. It might still be a bit confusing for users of the interface, though, since the maximum clinical trial phase is not based solely on the indications listed. I would have stated that the maximum phase is curated from a range of sources (rather than only from the listed indications, as the interface currently states), but I guess it's not a big deal.