L2G JSON download for Open Targets Genetics

Hi,

I am unable to find Open Targets Genetics data in JSON format. I am particularly interested in l2g and lut/study-index, but I can only see Parquet files here: Index of /pub/databases/opentargets/genetics/latest

Are there JSON files for these Genetics datasets?

Many thanks,
Sid

Hi @Sid and welcome to our Community!

In Open Targets Genetics, we only export our datasets in Parquet format, not JSON. Parquet stores columnar data efficiently, which matters when handling large datasets like ours.

Although we don’t offer JSON files directly, the data in Parquet files is equally valid and comprehensive, and you can always convert Parquet to JSON. Let me know if I can help with your specific use case.

Best,
Irene

Thanks @irene for your reply. That makes sense. My use case is that I want to load this data into a Postgres database, and loading JSON data is easier and faster.

I can load Parquet files in Python using pyarrow and, from there, use pandas to either load the data into the database or convert it to JSON. However, loading Parquet files in Python is extremely slow, and I am wondering if there is a more efficient way.

Do you have any suggestions or ideas for my use case?

Kind regards,
Sid

Hi @Sid,

If you’re using Python, you should be able to load Parquet files with pandas directly by specifying the directory. Pandas uses pyarrow as the backend to read Parquet, and I didn’t find it slow. Bear in mind that you’ll be loading the data into memory; however, since you’re working with the L2G and study index datasets, this shouldn’t cause any problems.

I hope that helps!
Irene

Hi @irene ,

Thanks for the suggestion! I was not aware of the pandas function to read Parquet files. I will give that a try, and that should fix my problem. Many thanks for your help!

Kind regards,
Sid