What Does the Human Genome Tell Us?
Why is it important to apply scientific discoveries to real-world problems?
The completion of the Human Genome Project was just the beginning of genomic study. As with so many things in the realm of science, more questions begged to be answered. Although we knew the sequence of bases in the human genome, what did it mean? What can the human genome tell us? How does it compare to other genomes? How can the HGP make a difference in our lives?
It always feels good to reach a goal, right? But there’s more to reaching a goal than the feeling of satisfaction. A good scientist considers what use their discoveries can be to the human race, to the environment, and to other species. The impact of the HGP is no different.
Once the HGP provided the sequence of nucleotide bases in the human genome, the next step was to interpret it so we could understand what it all means. A process called gene annotation takes the raw DNA sequence and identifies the locations of genes and other coding regions in a genome. It also attempts to determine what those genes do.
Gene annotation involves three main steps.
•First, scientists identify the noncoding regions of the genome. This step is important because it limits the areas of the genome that scientists need to analyze and focuses their efforts on the essential coding sections.
•The next step is identifying elements on the genome, a process called gene prediction. Gene prediction identifies regions that encode genes, such as those that code for proteins and other regions that regulate gene activity.
•Finally, scientists link biological information to these elements.
Annotating the human genome is an extremely complicated task! Before the HGP started, scientists estimated the human genome had between 40,000 to 100,000 genes—maybe more. After the project was finished, that estimate dropped to between 20,000 to 25,000 genes.
Scientists also discovered that only about 1 percent of human DNA is made up of protein-coding genes. The other 99 percent of the genome is noncoding.
Scientists once believed that noncoding DNA was junk and had no purpose. However, they have learned that some noncoding sequences control gene activity. For example, some noncoding DNA controls where and when genes are turned off and on. Other areas of noncoding DNA hold the instructions to make RNA molecules, which act as messengers carrying instructions from DNA to the cell. At least some noncoding DNA isn’t junk at all! We don’t know yet what might be revealed about other noncoding DNA.
Even within a single human gene, there are coding and noncoding regions. The human genome consists of small segments of DNA called exons that code for proteins.
The lengths of exons and introns vary greatly. A gene may have 10,000 bases, but only about 1,000 bases are actually part of the coding sequence.
In addition, the distribution of genes on a chromosome is not uniform. Some regions may have a large number of genes, while other regions on a chromosome have far fewer. These are just some of the factors that make identifying genes in the human genome so difficult.
The HGP produced the sequence of the 3 billion pairs of nucleotide bases in the human genome. The sequence gave scientists a blueprint of the genome. Then, they had to figure out how the blueprint worked.
In September 2003, National Human Genome Research Institute (NHGRI) launched a project named the Encyclopedia of DNA Elements (ENCODE). The goal of ENCODE was to identify and describe all of the functional parts of the human genome. By doing so, scientists hoped to gain insight on how the genome actually worked.
During the next decade, hundreds of researchers from labs worldwide linked more than 80 percent of the human genome sequence to specific biological functions. They mapped more than 4 million regulatory regions where proteins interacted with DNA. These findings gave them a much better understanding of how proteins turn genes on and off. They realized that the areas of DNA that did not code for a protein were not just junk DNA, but instead played a role as a type of genetic switch.
credit: DoD photo by Elaine Wilson
In September 2012, ENCODE published its results in a catalog of genetic data. The ENCODE catalog is like an operating manual for the human genome.
Scientists have used the data from the ENCODE project in disease research. For example, many regions of the human genome that do not contain protein-coding genes have been associated with disease.
These DNA regions often contain regulatory areas. Instead of affecting the protein itself, genetic changes in these areas may affect how much of a protein is made or when it is made. The disease may occur because abnormal amounts of the protein are being made.
Identifying the regulatory areas of the human genome can also explain why different types of cells have different properties. Why do muscle cells contract and generate force, while liver cells break down food? Although they both have the same complete set of DNA, muscle cells turn on specific genes, while liver cells turn on other genes.
With more research, investigators hope to better understand how the regulatory regions of DNA may control this process.
Humans come in many sizes and shapes. They have different personalities, skills, and interests. All of these characteristics make each person unique. However, at the genetic level, any two humans are about 99.9 percent identical. On average, the genomes of any two people differ by only one base out of every thousand bases. That tiny, 0.1 percent of variation in your genes makes you different from your brother or your neighbor!
Why do we look the way we do? Why are some people better at sports while others are better at art? Why do some people develop a disease while others do not? These are just a few of the questions that scientists hope to answer by studying the tiny differences in the human genome.
To better understand variation in the human genome, a group of researchers led by the NIH launched the International HapMap Project in October 2002. The $100-million project included scientists from public agencies and private organizations from around the world. Using the genome sequence provided by the HGP, the project attempted to map common human genome variations using four population groups: the Yoruba from Nigeria, Japanese in Tokyo, Han Chinese in Beijing, and Utah citizens with northern and western European ancestry.
credit: Tiago R. Magalhães, Jillian P. Casey, Judith Conroy, Regina Regan, Darren J. Fitzpatrick, Naisha Shah, João Sobral, Sean Ennis (CC BY 2.5)
In the HapMap project, researchers created a catalog of common genetic variations in humans. A genetic variation is the difference in DNA between one person and another. To identify variations, scientists looked for sites in the genome where the DNA sequence was different by a single base—A, C, G, or T. This type of variation is called a single nucleotide polymorphism (SNP) and is the most common type of DNA sequence variation in the human genome.
Most of the time, an SNP does not cause an important biological change in an organism, just as the meaning of a word does not change when a single letter is changed, such as in “realize” and “realise.” In some cases, the change of a single base can slightly change the gene’s function.
Other times, it can drastically change a gene’s function. Think about how the change in one letter can change a word’s meaning, such as “cat” and “sat.”
Researchers believe there are about 10 million SNPs in the human genome. Looking for all of these variations would be extremely time consuming and expensive. Luckily, scientists discovered a way to reduce the workload. As part of the HapMap project, researchers discovered that groups of SNPs often cluster in the same areas on chromosomes. These clusters are called haplotypes.
When completed in 2005, the HapMap project created a map of where these haplotype blocks could be found on chromosomes. It described specific variations common in the human genome and where these variations were located on chromosomes.
By comparing the HapMap details of different people, scientists could identify and compare areas of things such as genetic variation between two individuals and learn which of them is more likely to get cancer or which person is more likely to avoid developing dementia. This map made it easier for scientists to search for genetic variations and link a specific gene to a disease.
Although the main focus of the HGP was to sequence the human genome to provide a map to learn more about genes and disease, some scientists working on other species—plants, animals, fungi, bacteria, and viruses—are using the knowledge gained from the HPG in their own genomic studies. This has led to a branch of genomic research called comparative genomics.
Why would they want to do that? By comparing the characteristics that define organisms, they can identify similar and different areas in the genomes.
Comparative genomics can also help scientists study evolutionary changes among various organisms. This research can help them identify the genes that are the same among species as well as the genes that are different—those that give each species its unique characteristics.
credit: Martin Cooper (CC BY 2.0)
Scientists use high-powered computers to compare the human genome to the genomes of other species. By comparing the DNA sequences in the two genomes, they can determine how closely related humans are to other species.
For example, our closest relative, the chimpanzee, has a genome that is approximately 96 percent identical to the human genome, according to a National Geographic study. That shows how it takes only a tiny percentage of genomic difference to make a large impact.
Scientists can also use comparative genomics to study and understand disease. For example, although the chimpanzee genome is about 96 percent identical to the human genome, chimpanzees do not get certain human diseases, such as malaria and AIDS. Comparing the sequence of genes involved in these diseases might help scientists understand why humans are susceptible to these diseases while chimps are not.
Funny enough, the genome of the fruit fly is about 60 percent identical to the human genome! These two organisms, which look and behave very differently, share a core set of genes. By comparing the two genomes, researchers have discovered that about two-thirds of the human genes known to be linked to cancer have an equivalent gene in the fruit fly. In addition, when scientists inserted a human gene linked to early-onset Parkinson’s disease into fruit flies, the flies showed symptoms similar to those observed in humans with the disease.
Scientists hope that this discovery will one day lead to fruit flies being used as a new model to test treatments for Parkinson’s disease.
Comparative genomics may also help scientists better understand Earth’s evolutionary tree. Plants, animals, fungi, and bacteria may all look and behave differently, but they are all living creatures with genomes made of DNA, which holds the genes that code for thousands of proteins in each organism.
As the tools for sequencing genomes become more advanced, scientists will be able to compare more and more species, which might lead to discovering unknown connections among the species. Plus, this information could help scientists find new ways to conserve rare and endangered species.
How many photos do you have on your phone or other device? Most people have hundreds, if not thousands of pictures! That’s a lot of data, and it can be hard to find the one photo you’re looking for in a sea of photos you don’t want. Scientists have the same problem with their scientific data.
The HGP and the sequencing of other genomes has created a flood of data that scientists need to process, analyze, and store. To do so, they have turned to bioinformatics, a branch of biology that combines computer science, statistics, and math to understand biological processes.
It includes databases and knowledge bases to store, recall, and arrange data. Bioinformatics also develops methods and computer software tools to help scientists understand vast amounts of biological and genomic data. Bioinformatics can take enormous data sets and make sense of them.
Using high-powered computers, scientists can gather, compile, manipulate, and analyze large amounts of biological and genomic data. The HGP is one example of how bioinformatics can be used. Remember, the human genome has more than 3 billion base pairs. It’s essentially impossible to map all these pairs by hand!
However, using bioinformatics to compile, store, and analyze the data from human genome, scientists have been able to identify genome patterns in disease development. This has led to advances in treatments for different diseases.
We’ve touched on how different genetic markers can indicate whether or not a person will be more likely to get a disease at some point in their lifetime. In the next chapter, we’ll dive deeper into the impact the human genome has on health issues.
KEY QUESTIONS
•Why is data collection and storage such a major part of the Human Genome Project?
•Is there any danger in studying the genetic similarities and differences of different groups of people? Research times in history when people used science to harm instead of help.
There are a number of physical traits that make every person different. Comparing the genomes of different individuals can identify where differences exist in DNA and the effects these differences have on physical characteristics and health. Understanding these small DNA changes can help improve our knowledge of how the genome works. In this activity, you will investigate some physical characteristics and their variations and consider if they are examples of genetic variation.
•To start, crush the freesia flowers in a small jar. Keep the jar sealed until used later in this activity.
•Test the following characteristics on at least 20 people. Record your observations in your science journal.
•Bitter taste: Ask the subject to place a PTC strip on the tip of their tongue. Does it taste bitter or sweet?
•Coriander taste: What does the person taste when they put a small amount of coriander on their tongue? Does it have an herby taste or a soapy taste?
•Smell: Open the jar of crushed flowers and ask the subject to sniff. Can they smell anything?
•Earwax type: Use a wet wipe or tissue to gently wipe the entrance to the ear canal. Do not put anything inside the ear! Is there any earwax present? Is it dry or wet and sticky?
•Freckles: Does the subject have freckles?
•Dimples: Does the subject have dimples when they smile?
•zHand clasping: Have the subject clasp their hands together. Do they put their right thumb over the left or the left thumb over the right one?
•Create a data collection chart to organize your data.
•Once you have collected data from at least 20 people, create a bar graph to show the distribution of the different characteristics. What characteristics do you think are the most linked by genes? Which do you think could be due to other factors, such as education or environment? Choose one of the tested characteristics and research if scientists have linked it to a gene.
To investigate more, explore other characteristics that may be controlled by genes. How can you test for them? Also, try conducting this activity using people from several generations of at least two different families. What do your results tell you about genetic variation and heredity?
Some mutations are linked to genetic disorders. Some diseases are linked to a mutation in a single gene, while others are caused by mutations on several genes, helped along with environmental factors. In this activity, you will explore how easily DNA mutations can occur.
•To start, divide your classmates into groups of at least 10 students. Have them stand in a single file line. Make sure there is enough space between each person so they cannot hear the instructions given to the person in front of them.
•Give the first student in each line the same set of instructions. For example, tell them to draw a series of symbols or perform a series of actions. Make sure the rest of the students in the line do not hear the instructions.
•Have the first student whisper the instructions to the second student in line. The second student will whisper to the third student and continue down the line until the last student receives the instructions. How does this model what happens to DNA during several rounds of replication?
•The last student in line should perform the instructions that they received. What does this step represent?
•Have the first student perform the original instructions that they received. How does this compare to what the last student did? What types of mutations occurred in the instructions between the first and last students? How could these changes in the genome affect human health?
To investigate more, change the set of instructions or change how you deliver the instructions. How does this affect the mutations that arise? Why do you think this occurs?