Is it possible to extract PubMed/PMC information based on a list of gene symbols listed in Open Targets?
Hi @nic.coltman and welcome to the Community!
In our datasets, we maintain information on those publications that support our various annotations. However, the PubMed/PMC IDs are in different fields depending on the dataset you are browsing.
Could you please specify what kind of information you are interested in?
Thanks for the introduction - I’m glad to be a part of this community!
In essence, I wish to do a little text mining. I was hoping to take a subset of the tractability dataset (probably the targets with small molecule clinical precedence to start with, as that subset is small) and find as many PMIDs/PMCIDs for it as possible. As the Open Targets datasets are pretty well curated and annotated, I was hoping to use them as a starting point, although I realise that not every publication associated with a target is necessarily curated within the Open Targets database.
Thanks in advance,
Thank you for your reply. If I understood your message correctly, you are interested in pulling a list of publications that address a set of targets of interest. This is not the kind of information you will find in the annotation datasets, as there we usually include only the PubMed IDs that support a specific annotation: for example, the publications that support the PanelApp assessment linking PTEN to Cowden syndrome.
If you are interested in how a target is reported in the literature, I encourage you to take a look at our bibliography widget (docs - the section describing the Literature-based similar entities is what you might be looking for).
The dataset that populates this widget and describes the occurrences of a given target in the literature is available here: /pub/databases/opentargets/platform/22.04/output/literature-etl/parquet/matches/
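Once you have downloaded the matches files, collecting the publication IDs per target could look roughly like the sketch below. Please treat it as a starting point rather than a reference implementation: the field names (`keywordId`, `type`, `pmid`, `pmcid`) and the `"GP"` value for gene/protein matches are assumptions you should check against the actual schema, the helper `collect_publications` is hypothetical, and the sample rows are made-up placeholders rather than real records. Also note that targets are identified by Ensembl gene IDs, so your gene symbols need to be mapped to those first.

```python
from collections import defaultdict


def collect_publications(rows, target_ids, entity_type="GP"):
    """Group PubMed/PMC identifiers by target.

    rows: iterable of dicts, one per literature match.
    target_ids: Ensembl gene IDs of the targets of interest.
    entity_type: value of the `type` field marking gene/protein matches
                 (assumed to be "GP"; verify against the dataset schema).
    """
    wanted = set(target_ids)
    hits = defaultdict(lambda: {"pmid": set(), "pmcid": set()})
    for row in rows:
        # Skip matches for other entity types (e.g. diseases, drugs).
        if row.get("type") != entity_type:
            continue
        target = row.get("keywordId")
        if target not in wanted:
            continue
        # A publication may lack a PMC ID, so keep the two ID types apart.
        for field in ("pmid", "pmcid"):
            value = row.get(field)
            if value:
                hits[target][field].add(str(value))
    # Sets remove the duplicates caused by repeated mentions in one paper.
    return {
        target: {field: sorted(ids) for field, ids in fields.items()}
        for target, fields in hits.items()
    }


# Placeholder rows for illustration only (identifiers are not real).
sample_rows = [
    {"keywordId": "ENSG00000171862", "type": "GP", "pmid": "PMID_1", "pmcid": "PMC_1"},
    {"keywordId": "ENSG00000171862", "type": "GP", "pmid": "PMID_1", "pmcid": "PMC_1"},
    {"keywordId": "ENSG00000171862", "type": "GP", "pmid": "PMID_2", "pmcid": None},
    {"keywordId": "DISEASE_X", "type": "DS", "pmid": "PMID_3", "pmcid": None},
    {"keywordId": "OTHER_GENE", "type": "GP", "pmid": "PMID_4", "pmcid": None},
]

result = collect_publications(sample_rows, ["ENSG00000171862"])
print(result)
```

To feed it real data, one option is to read the parquet files with `pandas.read_parquet` (which needs a parquet engine such as `pyarrow` installed) and pass `df.to_dict("records")` as `rows`. For the full dataset you will probably want to filter on the target IDs while reading instead of loading everything into memory.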
Thanks for your question!