Missing paper from text mining

Dear Open Targets team,

I have a question about the literature text mining resource: why was a specific paper not picked up by Europe PMC text mining, when it seems to me that it meets all of the required criteria? Details below.

If I search for PORCN and then look at the evidence for epilepsy, I get no evidence from the Europe PMC data source.

However, by manually querying PubMed, I found a 2022 paper that mentions the ‘PORCN’ gene and ‘epilepsy’ in the same sentence of the abstract (the first sentence of the Results subsection):
Novel insights into PORCN mutations, associated phenotypes and pathophysiological aspects - PubMed (nih.gov)

My understanding is that text mining should have spotted this co-occurrence of the gene name and the disease name within the same sentence. ‘PORCN’ is the canonical Ensembl gene name, and the word ‘epilepsy’ is clearly spelled out in the sentence. So why was this paper not found by text mining?

Hi, this is a tricky issue. I looked into this publication, and we do have both matches and cooccurrences (and therefore disease-to-target evidence) from it, including from the sentence you were referring to. However, I could not find a cooccurrence between epilepsy and PORCN. This is not expected and is something we need to investigate further.

Matches from this sentence:

+--------+--------------------+--------+-------------------+----+
|    pmid|                text| section|              label|type|
+--------+--------------------+--------+-------------------+----+
|35101074|We report two cas...|ABSTRACT|    periventricular|  DS|
|35101074|We report two cas...|ABSTRACT|developmental delay|  DS|
|35101074|We report two cas...|ABSTRACT|       microcephaly|  DS|
|35101074|We report two cas...|ABSTRACT|              PORCN|  GP|
|35101074|We report two cas...|ABSTRACT|           epilepsy|  DS|
+--------+--------------------+--------+-------------------+----+

Cooccurrences:

+--------+--------------------+--------+------+-------------------+
|    pmid|                text| section|label1|             label2|
+--------+--------------------+--------+------+-------------------+
|35101074|We report two cas...|ABSTRACT| PORCN|developmental delay|
+--------+--------------------+--------+------+-------------------+
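To make the gap concrete, here is a minimal Python sketch that compares the two tables above. It rests on a naive assumption: that every gene/protein (GP) match in a sentence should pair with every disease (DS) match in that same sentence. It is not the actual pipeline logic, which may apply further filtering, and the function name `expected_pairs` is purely illustrative.

```python
from itertools import product

# Matches copied from the table above: (label, type) for the one abstract sentence.
matches = [
    ("periventricular", "DS"),
    ("developmental delay", "DS"),
    ("microcephaly", "DS"),
    ("PORCN", "GP"),
    ("epilepsy", "DS"),
]

# Cooccurrences copied from the table above: (label1, label2).
observed = {("PORCN", "developmental delay")}


def expected_pairs(sentence_matches):
    """Naive assumption: pair every GP match with every DS match in the sentence."""
    genes = [label for label, kind in sentence_matches if kind == "GP"]
    diseases = [label for label, kind in sentence_matches if kind == "DS"]
    return set(product(genes, diseases))


# Pairs the naive rule predicts but the cooccurrence table does not contain.
missing = sorted(expected_pairs(matches) - observed)
for gene, disease in missing:
    print(f"missing: {gene} / {disease}")
```

Under that assumption, the PORCN / epilepsy pair is not the only one absent from the cooccurrence table: the pairs with microcephaly and periventricular are missing as well.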

To track this problem, I have opened a ticket on our issue tracker, where you can follow its progress.

Thanks for reporting.
