The Genetics of What Makes Us Human
What in our genome makes our species unique and why has it been so hard to find?
Scholars of human origins are concerned with understanding why and how the traits that set humans apart from our primate relatives, like complex language and an exceptional capacity to invent and innovate, evolved. Understanding the genetic bases of these observable human traits, or phenotypes, could provide important clues as to when during our evolutionary history particular traits arose and under what conditions they spread. Since the sequencing of the human genome over 10 years ago, scientists have generated a deluge of genomic data. Nevertheless, the genetic basis of many traits of interest remains unclear. Why has uncovering what genetically defines our species proved so difficult and what do we actually know about the genetic foundations of what makes humans unique?
The Human Genome Sequence
February 2001 saw the culmination of a scientific endeavor hailed as the most significant achievement of the era: the sequencing of the human genome. This long-clamored-for undertaking was predicted to swiftly reveal the genetic bases of most human diseases, revolutionizing medicine, as well as divulge the genetic underpinnings of the traits distinguishing humans. But, even from the moment of its publication, the sequence seemed to offer more questions than answers. One of the most surprising initial findings was the identification of far fewer genes in the human genome than expected: researchers predicted the genome to contain ~100,000 genes, yet both the private and public research teams found fewer than 30,000. This finding defied the logic of the time holding that humans, as highly complex organisms, should have several times more genes than “simple” organisms like the fruit fly, Drosophila, for which the number of genes in the genome was known to be ~13,000. Yet, biologists were left to ponder the reality that humans have approximately the same number of genes as the roundworm. This low gene count also meant that only ~1.5% of the genome’s 3.2 billion nucleotide bases comprised protein coding genes.
Although humbled, scientists were eager to mine the prized sequence for genetic variants responsible for human phenotypes, particularly medical conditions. The most widespread tool used to do so was the Genome-Wide Association Study, or GWAS. Performing a GWAS involves genotyping genetic variants from across the genome in a large number of people, usually hundreds or more, with a similar genetic background. These variants, known as Single Nucleotide Polymorphisms, or SNPs, may covary with traits of interest. If so, an “associated” SNP can direct scientists to the “causal variant” that forms the genetic basis of that trait (for more details on how GWASs work, see Box A).
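As a rough illustration of what it means for a SNP to “covary” with a trait, the sketch below compares allele counts at a single SNP between people with a condition (cases) and without it (controls) using a chi-square test. The function names and the counts are hypothetical, invented for this example; real GWASs rely on dedicated software, correct for population structure, and demand far stricter significance thresholds because so many SNPs are tested at once.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele and trait were independent.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles per person.
chi2 = allelic_chi_square(case_alt=600, case_ref=1400, control_alt=500, control_ref=1500)
print(f"chi-square = {chi2:.2f}, p = {p_value_1df(chi2):.2g}")
```

A small p-value only flags the SNP as associated with the trait; as the text notes, the associated SNP may simply sit near the causal variant rather than being the cause itself.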
Hundreds of GWASs have been performed following the sequencing of the human genome probing the basis of all kinds of traits, from diabetes to psychiatric disorders. The hope was that GWASs would uncover the genetic basis of common complex human diseases the way earlier genetic mapping methods like pedigree studies and the use of genetic markers like restriction fragment length polymorphisms (RFLPs) had for single-gene Mendelian disorders. However, after nearly a decade of GWASs, the proportion of genetic variance accounted for by associated variants for even the most heritable of human traits is extremely low. For example, in the case of height, a trait that shows ~80% heritability in humans, only ~5% of the variance has been accounted for in GWASs. This puzzling outcome has been dubbed the “missing heritability” problem. This impression of missing heritability is certainly due in large part to overly simplistic assumptions of underlying genome biology used in GWASs, as well as their omission of rare variants and difficulties in distinguishing variants of small effect from false positives (see Box B).
Just as the genome sequence failed to provide scientists and physicians with a roadmap of consequential genetic variance in living humans, it has hardly yielded a catalog of all the genetic variants encoding the traits that set humans apart from our primate relatives. Of course, the genomes of other species are needed for comparative evolutionary analyses and, following the publication of the human sequence, biologists pushed for and succeeded in sequencing the genomes of the mouse in 2002; our closest living relative, the chimpanzee, in 2005; and the rhesus macaque, a primate often used as a human model in biomedical research, in 2007.
What these efforts most strikingly demonstrated was the extremely high level of sequence identity, or genetic similarity, among humans and other mammals. The high levels of genetic similarity between humans and chimps that geneticists previously estimated based on protein sequences were confirmed in comparisons of the two species’ genomes, with current estimates of genomic identity ranging from 96%-99% (see Box C). “Comparative genomics” analyses have also revealed how generally evolutionarily conserved, or unchanging, proteins are across animals. For example, the catalogs of protein-coding sequences in the human and mouse genomes differ by only ~300 genes despite at least 75 million years of independent evolution for both species. Further, the genes found in one species that are missing in another tend to belong to large “gene families,” or groups of highly similar genes that grow and differentiate by copying themselves within the genome. Such observations have led scientists to the conclusion that species are not distinguished by the possession of different genes.
So if species largely possess the same genes and these genes differ by very few DNA changes, what makes species different?
Although the chimpanzee and human genomes contain almost all the same genes, scientists have discovered several notable cases of gene inactivation along the human lineage. Gene inactivation occurs when a gene mutates in such a way that the sequence can no longer be read by the cell. Although the gene sequence still exists in the genome, the gene is no longer functional and is referred to as a pseudogene. Scientists generally think that this kind of mutation will only spread in a species if the loss of the gene’s function is either neutral or beneficial.
The very first confirmed fixed genetic difference between humans and chimpanzees identified was the deactivation of the CMAH gene, which encodes an enzyme necessary to produce a form of sialic acid molecule found on the surface of most mammalian cells. These molecules are only found at very low levels in the brains of other species, which prompted researchers to speculate that their complete absence in humans could have been beneficial for brain development, but the exact impact of the deactivation of this gene remains unknown.
Several other examples of genes that have been deactivated in humans since the split with chimpanzees are now known, including MYH16, which encodes a component of the muscles of the jaw. The loss-of-function mutation was dated to around 2.4 million years ago, or the approximate age of the emergence of the genus Homo, which led anthropologists to reason that its loss could have co-occurred with the emergence of habitual human food processing and, as a consequence, the end of the need for a muscular chewing apparatus. This hypothesis was particularly appealing because one of the anatomical features distinguishing members of the genus Homo from earlier hominins is a more delicate jaw. The scientists who discovered the deactivation of MYH16 further argued that the decreased musculature of the face and cranium that this change produced allowed for the expansion of the braincase in early Homo and thereby enabled the increase in brain size observed in our genus.
However, a later study comparing a larger portion of the gene sequence across more species concluded that the deactivating mutation in the human lineage occurred much earlier—at approximately 5.3 million years ago. This change long predates the appearance of reduced jaw size in the human fossil record and raises the question of what the true phenotypic impact of the deactivation of this gene could have been. Other researchers have raised doubts as to whether the loss of this gene’s function would even have had a large impact on muscle mass, as muscle fiber density is highly plastic and activity-dependent.
Another gene whose function was lost in humans that could be linked to a human-specific phenotype is KRTHAP1, a gene that encodes a type of keratin expressed in chimpanzee hair follicles. The deactivation of this gene is dated to ~240,000 years ago and though the precise consequence of the loss of this protein remains uncertain, it is easy to imagine that it could have contributed to humans’ drastic reduction in body hair relative to other primates.
Copy Number Variation
In addition to gene loss, differences in the number of copies of a gene in the genome can affect phenotype. A major finding in the new era of genome biology is just how widespread this type of variation is. More copies of a gene generally means more of the gene’s product, which can alter anatomical structures or physiology.
For example, humans have gained a copy of SRGAP2, a gene involved in the growth of neurons during development. Researchers believe that this change results in the greater density of dendritic spines per unit length observed in the neurons of the human neocortex. These structures are what allow contact among neurons, so this increased density in humans may boost neocortical connectivity. Humans also have several more copies than chimpanzees of AQP7, a gene that contributes to the conversion of fat to energy. The impact of the human-specific duplications of AQP7 is uncertain, but biologists have speculated that they may contribute to humans’ impressive capacity for endurance exercise.
Gene deactivation and copy number variation are conspicuous changes, making them relatively straightforward to identify. The task of recognizing meaningful changes to functional protein sequences is much more difficult. To return to the question of where the differences between species lie when so much of their genomes are identical: while the average chimpanzee and human may share over 98% of their DNA, this still translates to over 35 million single nucleotide variants, or SNVs, and 5 million “indels” (sequence that is “inserted” or “deleted” in one species’ genome compared with the genome of the other species). Although most of these changes do not occur within protein coding genes, ~80% of genes differ by at least one amino acid substitution between humans and chimps. But most of these changes will have either no effect or a negligible one on protein structure and function.
This leaves scientists to search for consequential changes by scanning genes and the DNA sequence surrounding them for evidence of natural selection. One of the most straightforward ways to do this is to see whether a sequence shows more changes than would be expected by chance, as determined by the average number of differences in other parts of the genome. If so, this offers evidence of “accelerated evolution” and may translate to phenotypic evolution.
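The logic of this comparison can be sketched in a few lines. The example below asks how surprising an observed number of changes in a region would be if changes accumulated there at the genome-wide background rate, using a Poisson model. This is a simplified stand-in for the statistical tests used in practice, and the region length, background rate, and function names are hypothetical values chosen for illustration.

```python
import math


def poisson_upper_tail(observed, expected):
    """Probability of seeing at least `observed` events when `expected` are expected (Poisson)."""
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)


def acceleration_test(region_length, observed_changes, background_rate):
    """Return the expected number of changes and the p-value for an excess of changes."""
    expected = region_length * background_rate
    return expected, poisson_upper_tail(observed_changes, expected)


# Hypothetical region: 500 bases, 18 observed changes, and an assumed
# background of 0.012 changes per base between the two species.
expected, p = acceleration_test(region_length=500, observed_changes=18, background_rate=0.012)
print(f"expected about {expected:.1f} changes, observed 18, p = {p:.2g}")
```

A low p-value suggests the region is changing faster than the background, but it cannot by itself separate positive selection from other causes such as locally elevated mutation rates.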
One of the most famous examples of accelerated gene evolution in humans is that of FOXP2. Abnormal disruption of this gene can cause speech and other linguistic disabilities, suggesting that FOXP2 function is essential for normal human language. Intrigued, scientists studying this gene discovered that it is highly evolutionarily conserved, meaning that nearly all vertebrates have the same version of this gene. Such a high degree of similarity across the tree of life usually indicates that a gene is critical for essential functioning and therefore cannot afford to evolve. But despite remaining essentially unchanged for around 500 million years, the human FOXP2 gene contains three amino acid changes since our evolutionary lineage split from that of the mouse. Even more fascinating is that two of these changes are entirely human-specific, meaning that they occurred just in the last ~6 million years since our evolutionary lineage diverged from chimpanzees. All this offers strong evidence that there is something special about the human version of FOXP2 and that it might contribute to our unique language abilities.
FOXP2 is a transcription factor, meaning that the protein it encodes regulates the expression of other genes. Most transcription factors regulate many genes and FOXP2 appears to be no exception. One of the genes FOXP2 targets is CNTNAP2, which is also associated with language development and disabilities. Laboratory studies have demonstrated that SRPX2, another gene regulated by FOXP2, is active in synaptogenesis, or the growth of neural connections in the brain.
Scientists seeking to illuminate the specific function of the human version of FOXP2 created mice with the human gene sequence. These mice were completely healthy, but the pitch of their squeaks was altered. Anatomically, they showed greater synaptic plasticity and a modified shape of the neurons in the striatum, a brain region involved in fine motor control. These findings offer striking experimental support for FOXP2’s role in language and cognition, and offer tantalizing clues as to how FOXP2 functions mechanistically.
At least two genes associated with brain development and size show evidence of accelerated evolution on the human lineage: MCPH1 and ASPM. Disruptions in these genes are associated with primary microcephaly, a condition in which the size of the brain, and particularly the cerebral cortex, is greatly reduced. Some scientists hypothesize that microcephaly is essentially a reversal of the human brain to an ancestral state, implying that changes to MCPH1 and ASPM during hominin evolution caused the three-fold enlargement of the human brain that occurred since humans' and chimpanzees’ ancestors diverged. However, detailed comparative studies of these genes' sequences in humans and a large sample of other primates suggest that MCPH1 and ASPM have been evolving rapidly in several primate lineages, not only humans, obscuring their true role in human brain evolution.
Since whole genome sequences became available, several studies have searched the entire genomes of humans and chimpanzees to identify many potentially positively selected genes at once. Genes continually found to be enriched, or overrepresented, in these studies are involved in gametogenesis (sex cell production), sensory perception, immune response, and transcription. Although offering intriguing glimpses into our evolutionary past, these studies generally do not explore the functional outcome of the observed changes, which limits our ability to clearly link these genetic changes with specific phenotypes.
The repeated finding that transcription factors are among the genes evolving the most quickly among species offers us an intriguing clue. Because, like FOXP2, transcription factors regulate the expression of other genes, changes in these genes may have widespread downstream consequences. In 1975, Mary-Claire King and Allan Wilson published a paper destined to become a classic in molecular biology, hypothesizing that changes in gene regulation, or when and where genes are turned on and off, were responsible for a greater proportion of observed difference between humans and chimpanzees than changes to the encoded proteins themselves. Over the last nearly 40 years, this hypothesis has continued to receive consideration and attention, especially in evolutionary developmental biology, also known as “evo devo,” a recently expanding branch of life science.
The idea that regulatory changes could effect major differences among organisms is based on knowledge gained in recent decades of how transcription factors orchestrate the complex and intricate process of development. Fertilization of an egg by a sperm sets off a chain reaction of transcription factor activity, resulting in growth factors being turned on and off at just the right time, in just the right sequence, and in just the right part of the embryo. It is easy to imagine that slight changes in transcription factor activity during the domino-like process of development could result in dramatic changes to the anatomy of the animal.
Many of the researchers who emphasize the evolutionary importance of this kind of regulatory change during development believe that changes to the transcription factor gene sequences themselves are not as critical, though, as are changes to their binding sites, the stretches of noncoding DNA that act as genomic anchors for transcription factors. These binding sites are often in the vicinity of the genes that the transcription factor regulates and consist of just a few nucleotide bases. Transcription factors possess a DNA binding domain complementary to the binding site, allowing them to attach and affect the nearby gene’s expression, often either by attracting or blocking the cellular factors necessary for transcription. As a result, change to transcription factor binding sites may provide an evolutionary mechanism to “fine tune” gene expression.
Despite the logical appeal of phenotypic evolution through regulatory change, few empirical examples offer convincing evidence for this process. This is certainly the case at least partly because transcription factor binding sites, as noncoding DNA, are simply more difficult to study currently than protein coding sequences. We have “codon” tables to tell us when a nucleotide base change results in an amino acid change and pretty good models for predicting how an amino acid change will affect protein structure. When it comes to binding sites, we usually cannot even recognize where one occurs. Luckily, this is changing. Scientists have recently performed experiments with many transcription factors to identify where across the genome they bind.
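To show why coding changes are comparatively easy to interpret, the sketch below builds the standard genetic code as a lookup table and classifies a single-base change as synonymous (same amino acid) or nonsynonymous (different amino acid). The function name and the example codons are chosen for illustration only; predicting what an amino acid change does to a protein’s structure is a separate and much harder step that this example does not attempt.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each
# position. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon, position, new_base):
    """Classify a single-base change at `position` (0, 1, or 2) within a codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "nonsynonymous"
    return mutated, before, after, kind


# A third-position change that leaves the amino acid (leucine, L) unchanged.
print(classify_change("CTT", 2, "C"))
# A second-position change that swaps glutamate (E) for valine (V).
print(classify_change("GAG", 1, "T"))
```

No comparable lookup table exists for binding sites, which is part of why changes in noncoding DNA are so much harder to interpret.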
Another clever way to identify binding sites of potential evolutionary significance involves comparing the noncoding regions of the genomes of many species to find blocks of largely conserved, or unchanged, sequence. A lack of change over lengthy evolutionary timescales indicates critical function in noncoding regions even more so than in coding regions, as changes to noncoding DNA are generally expected to occur more freely. Several research groups searched through regions demonstrating great similarity across the tree of life for sequences showing human-specific changes and, by extension, functional differences distinguishing humans from all other species.
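The search described here can be sketched on a toy alignment. The example below scans aligned sequences column by column and reports positions where every non-human species carries the same base but the human sequence differs. The sequences, species set, and function name are invented for illustration; real analyses use whole-genome alignments and statistical models of substitution rates rather than a simple identity check.

```python
def human_specific_changes(alignment, human="human"):
    """Return positions where all non-human sequences agree but the human base differs."""
    others = [seq for name, seq in alignment.items() if name != human]
    hits = []
    for i, human_base in enumerate(alignment[human]):
        column = {seq[i] for seq in others}
        # Conserved among the other species, yet changed in humans.
        if len(column) == 1 and human_base not in column:
            hits.append(i)
    return hits


# Toy alignment of equal-length sequences (hypothetical data).
toy_alignment = {
    "human":   "ACGTTGCAAC",
    "chimp":   "ACGTAGCATC",
    "mouse":   "ACGTAGCATC",
    "chicken": "ACGAAGCATC",
}
print(human_specific_changes(toy_alignment))  # positions counted from 0
```

Position 3 is skipped because the chicken sequence differs from the other non-human species there, so that column is not conserved to begin with.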
One such functional noncoding region found to be evolving very quickly in humans, named HACNS1 (Human Accelerated Conserved Non-Coding Sequence 1), is practically identical in all land-living vertebrates, but sustained 16 base changes just in the time since the human-chimpanzee divergence. Researchers created strains of transgenic mice carrying different versions of this sequence: one with the human version, one with the chimpanzee version, and one with the rhesus monkey version. By inserting the sequence upstream of a reporter gene, which produces a visible substance in the tissue where it is expressed, the researchers were able to see whether the different versions of the sequence regulated the gene's expression differently. All three versions of the sequence produced gene expression in the ears, eyes, and limbs during embryonic development, but the human version was distinguished by driving much higher expression levels throughout more of the limb. Although this study did not reveal the natural role of this sequence in human development, its activity in the limbs is fascinating given the uniqueness of human hands and legs.
Since this study was published, over 60 other binding sites showing human-specific rapid evolution have been experimentally demonstrated to be involved in embryonic development, offering a considerable amount of empirical evidence for the evolution of regulatory change during development. Binding site loss can also have a major impact on phenotype, with a few cases now known in humans. For example, humans lack a binding site found in chimpanzees in the vicinity of the AR gene, which encodes the male sex hormone receptor. This binding site drives expression of AR in the genitalia of chimpanzees, and its loss in humans appears to account for the absence in human males of penile spines (bristle-like structures common in primates and other mammals). Another missing binding site in humans flanks GADD45G, a gene known to be active in tumor suppression. In chimpanzees, this binding site inhibits expression of GADD45G in the forebrain. Tumor suppressors often limit cell growth and scientists hypothesize that the loss of this sequence in humans could enable increased neuronal growth in these brain regions.
The Other Nucleic Acid
Another kind of noncoding DNA that has turned out to be important is noncoding RNA genes. RNA has long featured in molecular biology as a kind of intermediary molecule, carrying information transcribed from the DNA to be translated into a protein. While this kind of messenger RNA (mRNA) is important, it turns out that noncoding DNA, as well as protein-coding sequence, is transcribed into RNA molecules. One of the unanticipated discoveries following the sequencing of the genome was “ubiquitous transcription,” or the observation that, although only ~1.5% of the genome encodes proteins, nearly all of it is transcribed by cellular factors into RNA molecules. Together with improved techniques for sequencing RNA molecules, this finding has led to the naming of many new “species” of RNA, including microRNA, piwi-interacting RNA (piRNA), small interfering RNA (siRNA), and small nuclear RNA (snRNA). Unlike the familiar RNAs involved in protein synthesis, most of these recently discovered RNAs seem to be involved in gene regulation, either by enabling or blocking gene transcription, or by interacting with mRNA transcripts.
HAR1 (Human Accelerated Region 1), the first noncoding RNA “gene” discovered to have undergone extremely rapid evolution on the human lineage, accumulated only two base changes in the 300 million years since the ancestors of chimpanzees and chickens diverged, but 18 differences in the ~5 million years since humans diverged from chimpanzees. Using microscopic fluorescent tags designed to glow when HAR1 was transcribed, scientists were able to localize the expression of HAR1 to a certain class of neurons in the cerebral cortex during embryonic development. That a genetic feature so distinctive in humans could contribute to human brain development is intriguing, although clues as to the precise effect of HAR1 on the cerebral cortex are still being sought.
How Far Have We Come?
A little over a decade after the first draft of the human genome was published, distinctive human genetic traits of many different, and sometimes unexpected, kinds have been identified. But an emerging pattern is that we cannot seem to find the exact genetic basis for the phenotypes we are interested in, and neither can we seem to link the genetic changes we do find to clear, specific phenotypes. The first issue has led some to call the sequencing of the genome a disappointment. Hindsight, however, makes the expectation that the genome would be a biological “Rosetta Stone” seem naive. Of course, it has provided us with a multitude of hieroglyphics, but no means by which to translate them. The “decoding” of the genome will only be obtained through years of basic science. Clearly, as the above discussion of RNA illustrates, genome biology turned out to be much more complex than biologists anticipated. It’s worth keeping in mind that, although there were far fewer genes in the genome than expected, we still do not know the function of many human genes. But the knowledge we’ve gained over the past decade about genomic processes and architecture has been astounding and was obviously only possible using the genome sequence as a tool.
Progress in demonstrating the phenotypic consequence of human genetic traits is likely to be slower than we would like. This is because the kinds of experiments that can clearly reveal the impact of genetic changes are unethical to perform in humans. We cannot genetically engineer human embryos to carry the ancestral form of a gene or regulatory region to see what phenotypic changes are produced. As a result, many scientists have focused on traits that naturally vary in living humans. Indeed, this area of research has flourished since the sequencing of the genome, revealing much about notable human traits including lactase persistence (which gives some people the ability to digest milk past infancy), malaria resistance, tolerance for low oxygen conditions, and starch digestion, all of which appear to have evolved under positive selection.
But to get at the shared traits that define us as a species, researchers are left to either rely on abnormally occurring phenotypes, like microcephaly, or perform experiments in cell lines and model organisms like mice, as was done for FOXP2. Although the relevance to human phenotypes of results obtained working with lab animals is not always straightforward, as experimental techniques are ever becoming more sophisticated, we should be optimistic about near future findings. Although much challenging work lies ahead, we are undeniably entering a golden age of discovery for human genomics.