What’s Missing in Missing Heritability?
“The issue is basically that there are traits where patterns of inheritance within the population strongly imply that most of the variation is due to genes, but attempts to ascertain which specific genetic variants are responsible for this variation have failed to yield much. For example, with height you have a trait which is ~80-90% heritable in Western populations, which means that a substantial majority of the population-wide variation is attributable to genes. But geneticists feel very lucky if they can detect a variant which can account for 1% of the variance.” - Razib Khan
Many common, chronic diseases have a significant genetic component, including type 2 diabetes, Crohn’s disease, rheumatoid arthritis and even obesity. After these diseases were predicted to be strongly heritable, they became targets for genetic research. However, that research has produced far more failures than successes in discerning the underlying genetic architecture of complex traits. What is missing heritability, where has it gone, and how can we find it?
Starting from Somewhere
The human genome is big, about 3 billion base pairs. In terms of genetic research, that means it’s not feasible to say, “I’m going to study cancer!”, pick a place, and start looking. Instead, for the past decade or so scientists have been correlating diseases with regions of the genome, building a map using just two primary tools: genetic markers from a wide range of genetic regions, and a method for measuring how strongly a particular region correlates with a disease of interest. That method has most commonly been the Genome-Wide Association Study, or GWAS, although other methods have recently been gaining traction.

All a GWAS does is provide a correlation hypothesis. When a GWAS is run, thousands of samples - in the study of diseases, usually cases and controls - are genotyped on a SNP array. After the genotyping, a statistical test (like the Pearson chi-squared test) gives a p-value for each SNP: the probability of seeing an association at least as strong as the one observed if the SNP were actually unrelated to the disease state. If that p-value is very low, the SNP is added to a list of associated SNPs.
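As a rough illustration of the per-SNP test described above, here is a minimal sketch of a Pearson chi-squared test on a 2x2 table of allele counts (cases vs. controls, minor vs. major allele). The function name and the counts are hypothetical, and an allelic test is only one of several ways a study might test a SNP; this is not the method of any particular paper.

```python
import math

def allelic_chi_squared(case_minor, case_major, control_minor, control_major):
    """Pearson chi-squared test (1 degree of freedom) on a 2x2 allele-count table.

    Assumes every row and column total is non-zero.
    Returns (chi-squared statistic, p-value).
    """
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele and disease status were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles each.
chi2, p = allelic_chi_squared(600, 1400, 500, 1500)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

In a real study this test is repeated for hundreds of thousands of SNPs, so the bar for "very low" is set far below the usual 0.05; a threshold of 5e-8 is the common convention.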
What’s the problem? First, GWAS provides no evidence of causation. That means that it’s possible that the correlation observed is not actually with the disease we’re interested in, but with some other variable that happens to align well with the disease - a confounding factor, if you will. Although this can be avoided somewhat through massive sample sizes (which in themselves pose problems), all GWAS can ever provide is a measure of how well something correlates with something else. That brings me to the second - and probably more important - problem: the list of associated SNPs for most diseases can only explain a fraction of predicted heritability.

Figure of a classic Manhattan plot from a GWAS study on pancreatic cancer published in Nature Genetics.
That, fundamentally, is the crux of the problem - why isn’t GWAS picking up more heritability than it is? Is it “missing heritability” in that we don’t know enough about our genomes to deduce where it comes from - or do we simply know of too few variants to explain complex disease?
Maybe the Model’s Wrong: The Profound Impact of Considering Epistasis
The easiest conclusion to draw, of course, is the latter - that GWAS hasn’t been done enough yet, and we haven’t found all the variants that explain heritability. While that could easily be true, I’m skeptical. I see genes as the starting point - a massive book of code, if you will, that makes our whole system run. Belabouring the book analogy, some people can read books better than others; those with learning difficulties, for example, will read more slowly and make more mistakes, and those with dyslexia may misread whole words, which can change the meaning of an entire sentence or paragraph. Of course, the words on the page haven’t changed - but the ability to read them effectively has. The same is true, I think, of genetics in the human system; just because the underlying code is “right” doesn’t mean that the body can read it, interpret it, and act on it correctly. The pathway from genes to disease is long and complex, and I think that too much emphasis is often placed on genetics and not enough on the multitude of steps that happen afterward to elicit the diseased phenotype. Remember that disease is not the result of genetics directly - it’s (usually) the result of the proteins encoded by those genes. Reading about things like epistasis, RNA editing, and epigenetics has only deepened my skepticism.
In a recent PNAS paper, Eric Lander and his co-authors discuss epistasis in the context of missing heritability. They argue that the additive model adopted in the study of complex disease genetics can’t be exactly right: the additive effects of associated SNPs aren’t explaining everything because an additive model doesn’t take complex interactions into account. They postulate that once interactions between genes are considered, the heritability left to explain is much smaller than anticipated, and thus the share of it that identified variants explain becomes larger.
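The arithmetic behind that last point can be written out. The symbols here are labels chosen for illustration (an assumption, not necessarily the paper’s own notation): h²_known is the heritability explained by identified variants, h²_apparent is the estimate obtained from population data under an additive model, and h²_true is the heritability actually attributable to additive effects.

```latex
\pi_{\text{explained}} = \frac{h^2_{\text{known}}}{h^2_{\text{true}}},
\qquad
\pi_{\text{missing}} = 1 - \pi_{\text{explained}}

% If interactions inflate the estimate, then h^2_{apparent} > h^2_{true}, so
\frac{h^2_{\text{known}}}{h^2_{\text{apparent}}}
  < \frac{h^2_{\text{known}}}{h^2_{\text{true}}}
```

With purely hypothetical numbers: if known variants explain 0.1 of the variance and the apparent heritability is 0.8, they seem to account for 12.5% of it; if the true figure is 0.4, the same variants account for 25%.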
However, completely denying the additive risk model poses its own problems. That’s why Lander and his fellow authors haven’t denied it; they’ve extended it by introducing the limiting pathway (LP) model, which reduces to the original model for certain additive traits but provides flexibility for complex interactions to exist among other traits. The paper itself says it best:
“In short, genetic interactions may greatly inflate the apparent heritability without being readily detectable by standard methods. Thus, current estimates of missing heritability are not meaningful, because they ignore genetic interactions.
The results show that mistakenly assuming that a trait is additive can seriously distort inferences about missing heritability. From a biological standpoint, there is no a priori reason to expect traits to be additive. Biology is filled with nonlinearity: The saturation of enzymes with substrate concentration and receptors with ligand concentration yield sigmoid response curves; cooperative binding of proteins gives rise to sharp transitions; the outputs of pathways are constrained by rate-limiting inputs; and genetic networks exhibit bistable states.”
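To make the idea of inflated apparent heritability concrete, here is a small simulation sketch. It rests on several simplifying assumptions that are mine, not the paper’s exact specification: the trait is the minimum of k independent pathway values, each pathway is a standard normal sum of an additive genetic part and an unshared environmental part, identical twins share all genetic values while fraternal twins share them with correlation 0.5, and there is no shared environment. Apparent heritability is estimated with the classic twin formula 2(r_MZ - r_DZ); the additive share is approximated by the squared correlation between the trait and the summed pathway genetic values. All function and parameter names are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_twins(k, h2_pathway, genetic_sharing, n_pairs, rng):
    """Simulate twin pairs whose trait is the minimum of k pathway values.

    genetic_sharing is the correlation of genetic values within a pair
    (1.0 for identical twins, 0.5 for fraternal twins).
    Returns (traits of twin A, traits of twin B, summed genetic values of twin A).
    """
    g_sd = math.sqrt(h2_pathway)
    e_sd = math.sqrt(1.0 - h2_pathway)
    shared_w = math.sqrt(genetic_sharing)
    unique_w = math.sqrt(1.0 - genetic_sharing)
    traits_a, traits_b, gsum_a = [], [], []
    for _ in range(n_pairs):
        za, zb, g_total = [], [], 0.0
        for _ in range(k):
            shared = rng.gauss(0.0, 1.0)
            ga = g_sd * (shared_w * shared + unique_w * rng.gauss(0.0, 1.0))
            gb = g_sd * (shared_w * shared + unique_w * rng.gauss(0.0, 1.0))
            # Each pathway value = genetic part + independent environmental part.
            za.append(ga + e_sd * rng.gauss(0.0, 1.0))
            zb.append(gb + e_sd * rng.gauss(0.0, 1.0))
            g_total += ga
        traits_a.append(min(za))  # the limiting pathway sets the trait
        traits_b.append(min(zb))
        gsum_a.append(g_total)
    return traits_a, traits_b, gsum_a

def heritability_estimates(k, h2_pathway=0.5, n_pairs=40000, seed=1):
    """Return (apparent heritability from twins, additive share of trait variance)."""
    rng = random.Random(seed)
    mz_a, mz_b, mz_g = simulate_twins(k, h2_pathway, 1.0, n_pairs, rng)
    dz_a, dz_b, _ = simulate_twins(k, h2_pathway, 0.5, n_pairs, rng)
    apparent = 2.0 * (pearson(mz_a, mz_b) - pearson(dz_a, dz_b))
    additive = pearson(mz_a, mz_g) ** 2
    return apparent, additive
```

Calling heritability_estimates(1) should give two roughly equal numbers, since with a single pathway the trait is purely additive. With k = 2 or more, the twin-based estimate should come out above the additive share: the gap is heritability that no list of additive variants could ever account for.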
So Why Not Epigenetics?
DNA has also been shown to have environmentally-driven plasticity. Epigenetics - or the study of heritable changes in gene expression or cellular phenotype that don’t involve changes in the underlying DNA sequence - is another crucial place where heritability could be missing. It’s quite easy, in theory, to conceive of a modification like DNA methylation or histone modification that could alter gene expression without affecting the underlying nucleotide sequence of the genome - however, many scientists doubt that’s where missing heritability is hiding.
So why is epistasis in and epigenetics out? To answer that, it’s easiest to look at the mathematical models published on the subject. Epistasis very nicely helps to explain the familial clustering of diseases with a large genetic component; epigenetics, according to many models, does not. Transgenerational epigenetic inheritance is often modeled through the gain and loss of DNA modifications at specific rates, with each modification contributing multiplicatively to disease risk. Under these models, epigenetics does not help to explain much of heritability: the short life-span of these modifications, so to speak, minimises their effect on heritability, and if they do persist they are likely to be in linkage disequilibrium with SNPs that have already been identified by GWAS.
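The "short life-span" argument can be sketched with a toy two-state model. This is a deliberate simplification for illustration, not the published model: a single mark is either present or absent, it is gained with probability `gain` and lost with probability `loss` in each generation, and the multiplicative effect on disease risk is ignored. The function names and rates are hypothetical.

```python
def mark_probability(generations, gain, loss, start_marked):
    """Probability that a descendant carries the mark after some generations."""
    p = 1.0 if start_marked else 0.0
    for _ in range(generations):
        # Marked lineages keep the mark with prob (1 - loss);
        # unmarked lineages acquire it with prob gain.
        p = p * (1.0 - loss) + (1.0 - p) * gain
    return p

def ancestral_signal(generations, gain, loss):
    """How much the ancestor's state still shifts the descendant's probability.

    For this two-state chain the difference equals (1 - gain - loss) ** generations.
    """
    return (mark_probability(generations, gain, loss, True)
            - mark_probability(generations, gain, loss, False))

# Hypothetical rates: a mark that is rarely gained but lost half the time.
for n in range(1, 5):
    print(n, round(ancestral_signal(n, gain=0.01, loss=0.5), 4))
```

The faster marks are gained and lost, the faster the signal from an ancestor shrinks toward zero, which is why unstable modifications contribute little to resemblance between relatives in models of this kind.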
Conclusions: Missing Heritability Today
The amazing thing about genetics, to me, is how much gray area there still is. I mean, it’s not in every subject that a single published paper can change the entire research paradigm of a subset of scientists. Do we know where missing heritability comes from? Of course not, otherwise it wouldn’t still be missing. Do we know where to start looking? Well, after the PNAS paper and other research, yes - I think revising the current model to incorporate nonlinear interactions between genes, and looking seriously at epigenetics’ implications for recurrence risk, is a good place to start. I also think that computation - and particularly translating experimental results into computationally tractable form - will play a huge part in modern genetics, because the two are inherently linked through the generation and analysis of massive amounts of data. Can we ever find the heritability we’re missing? Maybe. But I think to do that we would need to know the ins and outs of inheritance and the stuff that makes up our genome much better than we currently do.
I’ll leave you with this thought: The PNAS paper mentioned here is a highly controversial one, but I think that it’s important to emphasise it because so often in mathematical modeling the data is crafted to fit a pre-existing model and not the other way around. Mathematical modeling can be extraordinarily useful in biology, especially genetics, provided it doesn’t hold us back. If our models stop fitting the data properly, it’s time to adapt them - and thus, having tried the simplest additive explanation, I think the consideration of nonlinear epistatic interactions in our genome is a step forward in the search for missing heritability - or perhaps realising heritability isn’t missing at all.

All references used are cited in-line, including the PNAS paper mentioned and an excellent epigenetic mathematical model that made for an interesting read.
