Missing targetClass id in 23.06+ targets (JSON and Parquet)

In the 23.06+ targets data, every targetClass id has the invalid value 0.

For example, you can confirm the problem with the JSON content below.


In targets from 23.02 and earlier, it correctly contains a non-zero targetClass id.
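
One way to check this on a local copy of the targets JSON is to tally the targetClass ids, as in the rough sketch below. It assumes the files are JSON Lines (one target record per line) and that each record has an optional `targetClass` array whose entries carry an `id` field; the sample records are made up for illustration and are not real data.

```python
import json
from collections import Counter


def count_target_class_ids(json_lines):
    """Tally how often each targetClass id occurs across target records."""
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: targetClass may be absent or null for some targets.
        for entry in record.get("targetClass") or []:
            counts[entry.get("id")] += 1
    return counts


# Hypothetical records for illustration only; not real Open Targets data.
sample = [
    '{"id": "TARGET_A", "targetClass": [{"id": 0, "label": "Enzyme", "level": "l1"}]}',
    '{"id": "TARGET_B", "targetClass": [{"id": 0, "label": "Kinase", "level": "l2"}]}',
    '{"id": "TARGET_C"}',
]

counts = count_target_class_ids(sample)
print(counts)
```

To run it on a real file, pass an open file handle instead of `sample`. If the only key in the result is 0, the release is affected.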

I understand that the targetClass id comes from ChEMBL.
I am currently checking whether the targetClass id has been lost on the ChEMBL side.
But the download is very slow …
Before I finish that check, may I ask whether there is a problem with the data conversion?

Hi @mnagaku and welcome to our Community!

Thanks for bringing this up! You are right that this annotation comes directly from ChEMBL. I took a closer look at this issue, and it turns out that the zero targetClass IDs aren’t something we introduced. This ID corresponds to the _metadata.protein_classification.protein_class_id field from this ChEMBL dataset, and it has been set to zero since ChEMBL32.

I’m reaching out to the ChEMBL team to check if this was intentional or if something changed on their side. I’ll keep you posted once I hear back from them.

Thank you!


@mnagaku I forgot to add that if you’re interested in extracting target classes, even though this ID is constant, you can use targetClass.label. I hope that is useful!
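
As a rough sketch of that workaround, the snippet below collects the targetClass labels per target. It assumes JSON Lines input with a top-level `id` and an optional `targetClass` array whose entries have a `label` field; the sample records are hypothetical, not real data.

```python
import json


def target_class_labels(json_lines):
    """Map each target id to the list of its targetClass labels."""
    labels = {}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: targetClass may be absent or null for some targets.
        entries = record.get("targetClass") or []
        labels[record["id"]] = [e["label"] for e in entries if e.get("label")]
    return labels


# Hypothetical records for illustration only; not real Open Targets data.
sample = [
    '{"id": "TARGET_A", "targetClass": [{"id": 0, "label": "Enzyme", "level": "l1"},'
    ' {"id": 0, "label": "Kinase", "level": "l2"}]}',
    '{"id": "TARGET_B", "targetClass": null}',
]

labels_by_target = target_class_labels(sample)
print(labels_by_target)
```

Targets without a classification end up with an empty list, so they can be told apart from targets that have labels.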


I would like to use IDs instead of labels for data analysis reasons.