After setting up our API, we found several values concerning expression patterns which we can’t understand. I didn’t find any explanation in the documentation, so I wanted to ask here: how are the levels for RNA and protein calculated, and can we use them to rank targets by likelihood of expression?
Now about the specific fields you see in the API. For RNA expression:
value is a normalised TPM (transcripts per million) count for all transcripts of a given gene in a given tissue
level is a bin number (1 to 10), which is mentioned in the documentation above as “Binned value of expression”. A value of -1 means expression was below a threshold and the measurement was discarded
zscore is a tissue specificity score, which is mentioned in the documentation above as “Tissue specificity”
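To make the -1 convention concrete, here is a minimal Python sketch that drops RNA records whose expression fell below the threshold. Only the field names value, level and zscore come from the API; the list-of-dicts shape, the "tissue" key and all numbers are invented for illustration, so adapt them to whatever your client actually returns.

```python
# Hypothetical records: the "tissue" key and every number here are made up.
# Only the field names "value", "level" and "zscore" come from the API.
rna_records = [
    {"tissue": "liver", "value": 120.5, "level": 8, "zscore": 3},
    {"tissue": "lung", "value": 0.2, "level": -1, "zscore": -1},
    {"tissue": "brain", "value": 15.0, "level": 4, "zscore": 0},
]


def detected_rna(records):
    """Keep only records whose expression passed the threshold (level != -1)."""
    return [r for r in records if r["level"] != -1]


kept = detected_rna(rna_records)
print([r["tissue"] for r in kept])
```

Filtering on level rather than on value keeps the sketch independent of the actual threshold, which is not stated here.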
For protein expression:
level is a categorical variable: 0 - Not detected/below threshold, 1 - Low expression, 2 - Medium, 3 - High
reliability is a technical flag passed on from the HPA data which reflects whether the value in the level field is reliable enough. You can discard values with “reliability = False”
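As a rough illustration of the two protein fields, the Python sketch below translates the numeric level into its category and skips unreliable entries. The dict shape and the helper name describe_protein are assumptions made for this example; only the field names level and reliability and the four categories come from the description above.

```python
# Category labels as listed above for the protein "level" field.
PROTEIN_LEVEL_LABELS = {
    0: "Not detected/below threshold",
    1: "Low",
    2: "Medium",
    3: "High",
}


def describe_protein(record):
    """Return the label for a protein record, or None if it is flagged unreliable.

    `record` is a hypothetical dict with the keys "level" and "reliability".
    """
    if not record["reliability"]:
        return None  # discard values with reliability = False
    return PROTEIN_LEVEL_LABELS[record["level"]]


print(describe_protein({"level": 3, "reliability": True}))
print(describe_protein({"level": 2, "reliability": False}))
```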
So, in conclusion: yes, you could use the “level” field to rank targets by expression in a given tissue, but just keep in mind that this field has different ranges for RNA and protein expression.
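If it helps, ranking within one tissue could look roughly like the Python sketch below. The "target" and "tissue" keys, the gene names and the helper rank_by_level are placeholders invented for this example, not part of the API. Because the level ranges differ, the idea is to run it separately on RNA and on protein records rather than mixing the two.

```python
def rank_by_level(records, tissue):
    """Return target ids for one tissue, highest level first.

    Records with a negative level (the RNA "discarded" marker, -1) are skipped.
    `records` is a hypothetical list of dicts with "target", "tissue", "level".
    """
    in_tissue = [r for r in records if r["tissue"] == tissue and r["level"] >= 0]
    ranked = sorted(in_tissue, key=lambda r: r["level"], reverse=True)
    return [r["target"] for r in ranked]


# Invented RNA records, for illustration only.
rna = [
    {"target": "GENE_A", "tissue": "liver", "level": 7},
    {"target": "GENE_B", "tissue": "liver", "level": -1},
    {"target": "GENE_C", "tissue": "liver", "level": 9},
    {"target": "GENE_A", "tissue": "lung", "level": 10},
]

print(rank_by_level(rna, "liver"))
```

For protein records you would probably also want to drop entries with reliability = False before ranking; a level of 0 is kept here and simply sorts last.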
Finally, in case you are interested in the fine technical details, here is the source code of the module which produces these datasets: GitHub link