You need to download the whole directory and read all the partitions at once (no need for a `for` loop). Each individual file in that directory is just a chunk of the data (used for parallelised computing).
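For example, in R, `arrow::open_dataset()` reads a directory of parquet partitions as a single dataset. A minimal sketch, assuming you have downloaded the target directory locally (the `"targets"` path is illustrative):

```r
library(arrow)
library(dplyr)

# open_dataset() treats all the part-*.parquet chunks in the
# directory as one dataset, so no loop over files is needed
targets <- arrow::open_dataset("targets")

# Query lazily with dplyr verbs, then collect() into a single tibble
id_to_symbol <- targets |>
  select(id, approvedSymbol) |>
  collect()
```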
You might also benefit from looking at the `searchTarget` dataset (`/pub/databases/opentargets/platform/23.09/output/etl/parquet/searchTarget`). This dataset feeds the Open Targets Platform search, and it contains a lookup with all the alternative IDs we use for every target. The data is derived from the target dataset, so it is essentially the same information, but it may be in a friendlier format if you want to build mappings beyond the approvedSymbol.
In our upcoming release there will also be a GraphQL API endpoint to perform the same task.
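(The snippet for that upcoming endpoint was not included above. As an illustrative sketch only, not the announced endpoint: the Platform's existing GraphQL `search` query can already resolve a symbol to a target ID, and can be called from R, e.g. with `httr`.)

```r
library(httr)

# Existing Platform GraphQL search query, restricted to targets;
# here we map the symbol "BRCA1" to its Ensembl target ID
query <- '
query targetSearch($q: String!) {
  search(queryString: $q, entityNames: ["target"]) {
    hits { id name entity }
  }
}'

res <- httr::POST(
  "https://api.platform.opentargets.org/api/v4/graphql",
  body = list(query = query, variables = list(q = "BRCA1")),
  encode = "json"
)
httr::content(res)
```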
Thanks for your advice. I figured out that the error was happening for the following reason: I used read_parquet() from the 'arrow' package to read the parquet files. arrow::read_parquet() reads a parquet file into a tibble. I then used unnest() from tidyr to expand the list-columns. Apparently, some entries in these list-columns can be NULL or NA; if so, unnest() silently drops those rows. To fix this behaviour, one must specify keep_empty = TRUE inside unnest().
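A minimal sketch of the fix described above (the file path and the `synonyms` list-column are illustrative):

```r
library(arrow)
library(tidyr)

# read_parquet() returns a tibble; list-columns may contain NULL entries
targets <- arrow::read_parquet("targets/part-00000.parquet")

# By default unnest() drops rows whose list-column entry is NULL/empty;
# keep_empty = TRUE keeps each such row as a single row of NAs instead
targets_long <- tidyr::unnest(targets, cols = synonyms, keep_empty = TRUE)
```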