It would be very helpful for us to know when records in ClinVar were first submitted. This is important for us because the timing of emerging evidence is something we attempt to account for frequently. I expect that this falls outside of the province of typical Open Targets (OT) use cases, but I wanted to document the request anyways.
This timing is important for us for a number of reasons, and one that could eventually be important to other OT users is trend analysis. This is possible with a lot of evidence you all are collecting via linked publications, but clinical genetics associations stick out to me as one of the more important ones where using publications is neither possible (most of the time) nor necessary.
Including the submission dates for all the SCVs that go into the RCVs, which are then used to generate the gene+disease evidence records in OT, would be a great way to do this. A useful summarization of this data might include the submitter and a summary of submission dates. We do this internally and note differences like this between trends in OMIM vs commercial testing centers:
This kind of analysis would be much easier if OT propagated more of that ClinVar information. It would also help all of us to better understand how to weight various sources of associations like this against OMIM, which is clearly waning in its ability to curate associations at the rate they are being observed [1].
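A minimal sketch of the kind of per-submitter summary described above, assuming SCV-level records are already available as dictionaries. The field names (`submitter`, `date_created`), the submitter label "Testing lab A", and all dates are invented for illustration and are not taken from ClinVar or OT.

```python
from collections import Counter
from datetime import date

# Hypothetical SCV-level records; field names and values are illustrative only.
example_records = [
    {"submitter": "OMIM", "date_created": "2015-02-10"},
    {"submitter": "OMIM", "date_created": "2016-07-01"},
    {"submitter": "Testing lab A", "date_created": "2019-01-15"},
    {"submitter": "Testing lab A", "date_created": "2019-11-30"},
    {"submitter": "Testing lab A", "date_created": "2021-04-02"},
]


def summarize_by_submitter(scv_records):
    """Return {submitter: {"first": date, "last": date, "per_year": Counter}}."""
    summary = {}
    for rec in scv_records:
        created = date.fromisoformat(rec["date_created"])
        entry = summary.setdefault(
            rec["submitter"],
            {"first": created, "last": created, "per_year": Counter()},
        )
        # Track the earliest and latest dates seen for this submitter.
        entry["first"] = min(entry["first"], created)
        entry["last"] = max(entry["last"], created)
        # Count submissions per calendar year to expose trends over time.
        entry["per_year"][created.year] += 1
    return summary


summary = summarize_by_submitter(example_records)
for submitter, stats in summary.items():
    print(submitter, stats["first"], stats["last"], dict(stats["per_year"]))
```

The per-year counts are what would make a comparison of trends between submitters possible. Anything finer-grained would need more of the SCV metadata than this sketch assumes.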
There’s an ongoing effort in the background to date disease target evidence, which will hopefully open the possibilities for a wide range of useful applications. It’s not a straightforward process for many sources, and it mostly relies on the publication date of the supporting paper.
For ClinVar, together with our partners at EVA, we decided to date RCVs by the first submission date (DateCreated of SCV); see details here. Capturing all details from ClinVar is beyond our scope, but we believe this field is a good proxy for the novelty of the variant-to-disease relationship. This new field is expected to be available in the next release.
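As a rough illustration of the dating rule described above (an RCV is dated by the earliest DateCreated among its SCVs), here is a small sketch. The record layout, accession placeholders, and dates are hypothetical, and this is not the actual EVA/OT pipeline code.

```python
from datetime import date

# Hypothetical SCV records: each one points at the RCV it contributes to.
# Accessions and dates are placeholders, not real ClinVar data.
example_scvs = [
    {"scv": "SCV_A", "rcv": "RCV_1", "date_created": "2016-03-01"},
    {"scv": "SCV_B", "rcv": "RCV_1", "date_created": "2014-05-20"},
    {"scv": "SCV_C", "rcv": "RCV_2", "date_created": "2020-09-12"},
]


def first_submission_dates(scv_records):
    """Map each RCV to the earliest DateCreated among its SCVs."""
    earliest = {}
    for rec in scv_records:
        created = date.fromisoformat(rec["date_created"])
        rcv = rec["rcv"]
        # Keep the oldest creation date seen so far for this RCV.
        if rcv not in earliest or created < earliest[rcv]:
            earliest[rcv] = created
    return earliest


rcv_dates = first_submission_dates(example_scvs)
for rcv, first_seen in sorted(rcv_dates.items()):
    print(rcv, first_seen.isoformat())
```

Taking the minimum over SCVs means a later resubmission from another lab never moves the RCV date forward, which matches the idea of using the field as a proxy for when the relationship first emerged.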
Hey @dsuveges, that’s awesome! We’ll certainly look for that in the next release then.
together with our partners at EVA, we decided to date RCVs by the first submission date (DateCreated of SCV)
This is how we’ve discussed doing it internally too FWIW. We have a request out to ClinVar support to see just how immutable those submission dates are and whether they’re relative to submission or their initial submission processing. I think the difference there should be trivial but I’m not certain yet – will report back if we hear anything unintuitive on it.
Hi Eric, it’s great to hear you were thinking about the same thing; it’s really reassuring. Please let us know how ClinVar responds, as we might revisit our plans accordingly.
They did mention that submission dates change when submissions are revised, as do the versions associated with them, so submission dates would be a very bad choice for finding the earliest dates attributable to an SCV/RCV. As you noted, DateCreated gets you closer to the emergence of a relationship and they shared 20220707_data_release_notes.pdf (page 3) as a particularly helpful reference. The most notable info this adds in addition to what’s already in Include submission or creation date in ClinVar evidence · Issue #393 · EBIvariation/CMAT · GitHub is this info for SCV dates:
SCV records:
First in ClinVar
This field has been reported in the past as the date the SCV was created in our internal database.
It was updated to represent the date the SCV record was first publicly available in ClinVar.
The July release has a bug that miscalculates this date as the first time the SCV record was generated for public release, rather than actual date of public release; we expect this bug to be fixed in the weekly release for July 17.
So I think it may be relatively recent that it’s possible to use this field in this way.