Field "targetFromSource" in JSONs from https://ftp.ebi.ac.uk/pub/databases/opentargets/platform/latest/output/etl/json/evidence/sourceId=europepmc

Hello!

I used Open Targets v21.04 in my research.
In particular, I was interested in the evidence derived from Europe PMC. The JSON files available for download (/pub/databases/opentargets/platform/21.04/output/etl/json/evidence/sourceId=europepmc) had a field named “targetFromSource”, which contained the target name exactly as it appears in the corresponding fragment of the publication. I found this field very useful for my research.

Unfortunately, the “targetFromSource” field is absent from later releases of Open Targets.

Is it somehow possible to obtain “targetFromSource” for the Europe PMC evidence from Open Targets v22.11?

Is there any chance that the “targetFromSource” field will return?

Thank you in advance, regards.

Hi Pavel,
We are working on implementing a new ingestion pipeline for ePMC data, which explains the absence of the “targetFromSource” field in the mentioned JSON file for 22.11.

However, you can get this information from the keywordId and label fields of the literature matches dataset, for all entries of type GP (gene-protein). This dataset contains all entity matches in the literature. For the equivalent dataset restricted to sentences where a target and a disease are found in the same sentence (similar to evidence), look at the cooccurrences dataset in the same location instead. All of these are available via FTP and Google Cloud Platform.

❯ gsutil cat gs://open-targets-data-releases/22.11/output/literature-etl/json/matches/part-00484-0f17b441-4b7d-410b-8f8b-c798d2aa9d7c-c000.json | head -1 | jq '.'
{
  "pmid": "21850163",
  "pmcid": "PMC3144728",
  "pubDate": "2011-01-01",
  "date": "2011-01-01",
  "year": 2011,
  "month": 1,
  "day": 1,
  "organisms": [
    "Homo sapiens",
    "DBA/2J mice",
    "human",
    "mouse",
    "animal"
  ],
  "section": "table",
  "text": "cDNA FLJ55673, highly similar to Complement factor B",
  "trace_source": "gs://otar025-epmc/22.01/full-text/NMP_PMC3140016_PMC3149998.xml_chunk11.xml.jsonl",
  "endInSentence": 52,
  "label": "Complement factor B",
  "labelN": "bcomplemfactor",
  "startInSentence": 33,
  "type": "GP",
  "keywordId": "ENSG00000243649",
  "isMapped": true
}
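
To pull out just the gene/protein mentions, something along these lines might work. This is only a minimal sketch: it assumes each line of a matches part file is one JSON record shaped like the example above, and the function name gene_protein_mentions is hypothetical, not part of any Open Targets tooling.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def gene_protein_mentions(
    lines: Iterable[str],
) -> Iterator[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """Yield (pmid, label, keywordId) for mapped matches of type GP.

    Assumption: the input is line-delimited JSON, one match per line,
    with the same field names as the example record above.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("type") != "GP":
            continue  # keep gene/protein matches only
        if not record.get("isMapped"):
            continue  # keep only matches mapped to an identifier
        # "label" is the name as written in the text,
        # "keywordId" is the identifier it was mapped to.
        yield record.get("pmid"), record.get("label"), record.get("keywordId")


# A trimmed copy of the example record above, as one JSON line.
example_line = (
    '{"pmid": "21850163", "label": "Complement factor B", '
    '"type": "GP", "keywordId": "ENSG00000243649", "isMapped": true}'
)
for pmid, label, keyword_id in gene_protein_mentions([example_line]):
    print(pmid, label, keyword_id, sep="\t")
```

For a real part file, the same function can be given an open file handle instead of the one-element list, since it only needs an iterable of lines.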

I hope this helps!
Best wishes,
Annalisa

Hi Annalisa,

This surely helps a lot!
Thanks for the quick and informative reply.

Kind regards, Pavel.