Getting "cannot resolve '`statisticalTestTail`'" error when processing evidence files for release 23.02

When I try to process the “evidence” files with the opentargets-etl-backend program, I get an error that I cannot figure out how to resolve:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`statisticalTestTail`' given input columns: [alleleOrigins, allelicRequirements, ancestry, ancestryId, beta, betaConfidenceIntervalLower, betaConfidenceIntervalUpper, biologicalModelAllelicComposition, biologicalModelGeneticBackground, biologicalModelId, biomarkerName, biomarkers, biosamplesFromSource, clinicalPhase, clinicalSignificances, clinicalStatus, cohortDescription, cohortId, cohortPhenotypes, cohortShortName, confidence, contrast, datasourceId, datatypeId, diseaseCellLines, diseaseFromSource, diseaseFromSourceId, diseaseFromSourceMappedId, diseaseId, diseaseModelAssociatedHumanPhenotypes, diseaseModelAssociatedModelPhenotypes, drugFromSource, drugId, drugResponse, excludedBiotype, literature, log2FoldChangePercentileRank, log2FoldChangeValue, mutatedSamples, oddsRatio, oddsRatioConfidenceIntervalLower, oddsRatioConfidenceIntervalUpper, pValueExponent, pValueMantissa, pathways, pmcIds, projectId, publicationFirstAuthor, publicationYear, reactionId, reactionName, resolvedDisease, resolvedTarget, resourceScore, sex, significantDriverMethods, sourceId, statisticalMethod, statisticalMethodOverview, studyCases, studyCasesWithQualifyingVariants, studyId, studyOverview, studySampleSize, studyStartDate, studyStopReason, studyStopReasonCategories, targetFromSource, targetFromSourceId, targetId, targetInModel, targetInModelEnsemblId, targetInModelMgiId, targetModulation, textMiningSentences, urls, variantAminoacidDescriptions, variantFunctionalConsequenceFromQtlId, variantFunctionalConsequenceId, variantHgvsId, variantId, variantRsId]; line 1 pos 0;
'Project [datasourceId#123, targetId#676, alleleOrigins#101, allelicRequirements#102, ancestry#103, ancestryId#104, beta#105, betaConfidenceIntervalLower#106, betaConfidenceIntervalUpper#107, biologicalModelAllelicComposition#108, biologicalModelGeneticBackground#109, biologicalModelId#110, biomarkerName#111, biomarkers#112, biosamplesFromSource#113, clinicalPhase#114L, clinicalSignificances#115, clinicalStatus#116, cohortDescription#117, cohortId#118, cohortPhenotypes#119, cohortShortName#120, confidence#121, contrast#122, ... 59 more fields]

It seems to be a data-dependent error, but I am not sure how to check for that or how to fix it. This is the 23.02 release data, and I am using the most recent JAR, which I compiled myself (15FEB2023 commit).

Hi @thondeboer!

Have you managed to resolve this? If not, please could you provide more details so someone on the team can take a look?

Thank you!

Hi @thondeboer,

This is caused by a configuration issue: the key data-sources-exclude is missing for the evidence step. To fix it, please run the ETL using the platform.conf file.
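If it helps anyone else who hits this, below is a rough way to check whether the configuration file you pass to the ETL defines that key at all. It is only a sketch, not part of the ETL: the function name is made up for illustration, it does a plain line-by-line text scan rather than a real HOCON parse, and the "evidences" block in the sample text is an assumed layout, so please compare against the actual platform.conf for your release.

```python
import re

# Key name taken from the reply above.
REQUIRED_KEY = "data-sources-exclude"


def config_defines_key(config_text: str, key: str = REQUIRED_KEY) -> bool:
    """Return True if a non-comment line appears to assign `key`.

    Hypothetical helper: a plain text scan, not a HOCON parser. It accepts
    `key = ...`, `key : ...`, `key += ...`, `key { ...` and dotted paths
    such as `section.key = ...`.
    """
    pattern = re.compile(
        r'^\s*(?:[\w"-]+\.)*"?' + re.escape(key) + r'"?\s*(?:\+?=|:|\{)'
    )
    for line in config_text.splitlines():
        stripped = line.strip()
        # Skip comment lines so a commented-out key does not count.
        if stripped.startswith("#") or stripped.startswith("//"):
            continue
        if pattern.match(line):
            return True
    return False


# Sample text with an ASSUMED layout, for illustration only.
sample_with_key = """
evidences {
  data-sources-exclude = []
}
"""

sample_without_key = """
evidences {
  # data-sources-exclude = []
  data-sources = []
}
"""

print(config_defines_key(sample_with_key))
print(config_defines_key(sample_without_key))
```

To check a real file, read it first, for example with pathlib.Path("platform.conf").read_text(), and pass the text to the function. If the key is not found, the configuration probably predates the change described above.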

Thank you!
