Extracting PubMed/PMC information from the Open Targets Platform for a list of gene symbols

Is it possible to extract PubMed/PMC information for a list of gene symbols from Open Targets?

Hi @nic.coltman and welcome to the Community!

In our datasets, we maintain information on those publications that support our various annotations. However, the PubMed/PMC IDs are in different fields depending on the dataset you are browsing.
Could you please specify what kind of information you are interested in?


Hi Irene,

Thanks for the introduction - I'm glad to be a part of this community!

In essence, I wish to do a little text mining. I was hoping to take a subset of the tractability dataset (probably the targets with small molecule clinical precedence to start with, as that subset is small) and find as many PMIDs/PMCIDs as possible for them. As the Open Targets datasets are pretty well curated and annotated, I was hoping to use them as a starting point, although I realise that not every publication associated with a target is necessarily curated within the Open Targets database.

Thanks in advance,

Hi @nic.coltman,
Thank you for your reply. If I understood your message correctly, you are interested in pulling a list of publications that mention a set of targets of interest. This is not the kind of information you will find in the datasets, as they usually include only the PubMed IDs that support a specific annotation, for example the publications that support the PanelApp assessment linking PTEN to Cowden syndrome.

If you are interested in how a target is reported in the literature, I encourage you to take a look at our bibliography widget (docs - the section describing the Literature-based similar entities is what you might be looking for).
The dataset that populates this widget and describes the occurrences of a given target in the literature is available here: /pub/databases/opentargets/platform/22.04/output/literature-etl/parquet/matches/
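
In case it helps, here is a minimal Python sketch of how the matches dataset could be used to collect publication IDs for a list of targets. Please treat the details as assumptions to check against the schema of the files you download: the column names (`keywordId`, `type`, `pmid`, `pmcid`), the `"GP"` value marking gene/protein matches, and the idea that targets are keyed by Ensembl gene ID rather than by gene symbol (so a list of symbols would need mapping to Ensembl IDs first). The function names are hypothetical, and the demo rows are made up so the example runs without any download.

```python
from collections import defaultdict

# Assumed column names; verify them against the schema of your release.
ID_COL = "keywordId"   # assumed: Ensembl gene ID for target matches
TYPE_COL = "type"      # assumed: "GP" marks gene/protein matches
PMID_COL = "pmid"
PMCID_COL = "pmcid"


def publications_by_target(rows, target_ids):
    """Collect the distinct PMIDs and PMCIDs that mention each target.

    `rows` is any iterable of dict-like records, one per literature match.
    """
    wanted = set(target_ids)
    result = defaultdict(lambda: {"pmid": set(), "pmcid": set()})
    for row in rows:
        # Keep only gene/protein matches for the targets of interest.
        if row.get(TYPE_COL) != "GP" or row.get(ID_COL) not in wanted:
            continue
        entry = result[row[ID_COL]]
        if row.get(PMID_COL):
            entry["pmid"].add(str(row[PMID_COL]))
        if row.get(PMCID_COL):
            entry["pmcid"].add(str(row[PMCID_COL]))
    return dict(result)


def load_matches(path, target_ids):
    """Read only the needed columns and rows from a local copy of the
    parquet files (hypothetical helper; requires the pyarrow package)."""
    import pyarrow.parquet as pq  # third-party, imported lazily

    table = pq.read_table(
        path,
        columns=[ID_COL, TYPE_COL, PMID_COL, PMCID_COL],
        filters=[(TYPE_COL, "==", "GP"), (ID_COL, "in", list(target_ids))],
    )
    return table.to_pylist()


if __name__ == "__main__":
    # Made-up rows for illustration only; the PMIDs/PMCIDs are not real hits.
    demo_rows = [
        {"keywordId": "ENSG00000171862", "type": "GP", "pmid": "111", "pmcid": "PMC1"},
        {"keywordId": "ENSG00000171862", "type": "DS", "pmid": "222", "pmcid": None},
    ]
    print(publications_by_target(demo_rows, ["ENSG00000171862"]))
```

With a local copy of the files, something like `publications_by_target(load_matches("matches/", ids), ids)` would give one set of PMIDs and one set of PMCIDs per target. Filtering while reading keeps memory use down, since the dataset holds one row per match rather than one row per publication.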

Thanks for your question!