Chapter 3 image

What Does the Human Genome Tell Us?

image

Why is it important to apply scientific discoveries to real-world problems?

 

image

Although scientific inquiry is a noble pursuit on its own, when the knowledge gained through the field can be applied to goals such as curing diseases, the entire human race benefits.

The completion of the Human Genome Project was just the beginning of genomic study. As with so many things in the realm of science, more questions begged to be answered. Although we knew the sequence of bases in the human genome, what did it mean? What can the human genome tell us? How does it compare to other genomes? How can the HGP make a difference in our lives?

It always feels good to reach a goal, right? But there’s more to reaching a goal than the feeling of satisfaction. A good scientist considers what use their discoveries can be to the human race, to the environment, and to other species. The impact of the HGP is no different.

Even after the monumental achievement of completing the HGP, there was more to be done.

Once the HGP provided the sequence of nucleotide bases in the human genome, the next step was to interpret it so we could understand what it all means. A process called gene annotation takes the raw DNA sequence and identifies the locations of genes and other coding regions in a genome. It also attempts to determine what those genes do.

In this way, gene annotation takes the genome sequence and tries to make sense of it. It makes the nucleotide sequence meaningful.

Gene annotation involves three main steps.

First, scientists identify the noncoding regions of the genome. This step is important because it limits the areas of the genome that scientists need to analyze and focuses their efforts on the essential coding sections.

The next step is identifying elements on the genome, a process called gene prediction. Gene prediction identifies regions that encode genes, such as those that code for proteins and other regions that regulate gene activity.

Finally, scientists link biological information to these elements.

THE SEARCH FOR GENES

Annotating the human genome is an extremely complicated task! Before the HGP started, scientists estimated the human genome had between 40,000 to 100,000 genes—maybe more. After the project was finished, that estimate dropped to between 20,000 to 25,000 genes.

1000 GENOMES PROJECT

 

The original HGP sequenced DNA from only a handful of individuals. Launched in 2008, the 1000 Genomes Project sequenced the genomes of more than 2,500 individuals from 26 different populations worldwide. With this information, scientists put together the largest and most detailed catalog of human genetic variation available to date. Medical researchers hope this information will help them identify the genomic variations that are linked to rare and common diseases in people worldwide.

image

credit: Lavanya Rishishwar and I. King Jordan (CC BY 4.0 International)

Scientists also discovered that only about 1 percent of human DNA is made up of protein-coding genes. The other 99 percent of the genome is noncoding.

Scientists once believed that noncoding DNA was junk and had no purpose. However, they have learned that some noncoding sequences control gene activity. For example, some noncoding DNA controls where and when genes are turned off and on. Other areas of noncoding DNA hold the instructions to make RNA molecules, which act as messengers carrying instructions from DNA to the cell. At least some noncoding DNA isn’t junk at all! We don’t know yet what might be revealed about other noncoding DNA.

Even within a single human gene, there are coding and noncoding regions. The human genome consists of small segments of DNA called exons that code for proteins.

Exons are separated by noncoding sequences of DNA called introns.

The lengths of exons and introns vary greatly. A gene may have 10,000 bases, but only about 1,000 bases are actually part of the coding sequence.

In addition, the distribution of genes on a chromosome is not uniform. Some regions may have a large number of genes, while other regions on a chromosome have far fewer. These are just some of the factors that make identifying genes in the human genome so difficult.

THE ENCODE PROJECT

The HGP produced the sequence of the 3 billion pairs of nucleotide bases in the human genome. The sequence gave scientists a blueprint of the genome. Then, they had to figure out how the blueprint worked.

In September 2003, National Human Genome Research Institute (NHGRI) launched a project named the Encyclopedia of DNA Elements (ENCODE). The goal of ENCODE was to identify and describe all of the functional parts of the human genome. By doing so, scientists hoped to gain insight on how the genome actually worked.

As with the Human Genome Project, NHGRI committed to sharing ENCODE results with the public.

During the next decade, hundreds of researchers from labs worldwide linked more than 80 percent of the human genome sequence to specific biological functions. They mapped more than 4 million regulatory regions where proteins interacted with DNA. These findings gave them a much better understanding of how proteins turn genes on and off. They realized that the areas of DNA that did not code for a protein were not just junk DNA, but instead played a role as a type of genetic switch.

You can explore the ENCODE catalog at this website. This site is used by scientists and much of it is very advanced, but it’s always interesting to see actual applications of the concepts we’re talking about in this book!Image

imageENCODEImage

 

GENE GENIUS

It confirmed that the number of genes that coded for proteins was between 20,000 to 25,000. It also identified thousands of other genes that had important roles in cells.

GENE GENIUS

A genome browser is a database that stores and displays genomic data from a genome that has been sequenced, assembled, and annotated.

image

That 0.1 percent variation in your genes is what makes you different!

credit: DoD photo by Elaine Wilson

Noncoding DNA turns genes on and off to control when and where proteins are produced.

In September 2012, ENCODE published its results in a catalog of genetic data. The ENCODE catalog is like an operating manual for the human genome.

Scientists have used the data from the ENCODE project in disease research. For example, many regions of the human genome that do not contain protein-coding genes have been associated with disease.

These DNA regions often contain regulatory areas. Instead of affecting the protein itself, genetic changes in these areas may affect how much of a protein is made or when it is made. The disease may occur because abnormal amounts of the protein are being made.

Identifying the regulatory areas of the human genome can also explain why different types of cells have different properties. Why do muscle cells contract and generate force, while liver cells break down food? Although they both have the same complete set of DNA, muscle cells turn on specific genes, while liver cells turn on other genes.

With more research, investigators hope to better understand how the regulatory regions of DNA may control this process.

TINY DIFFERENCE

Humans come in many sizes and shapes. They have different personalities, skills, and interests. All of these characteristics make each person unique. However, at the genetic level, any two humans are about 99.9 percent identical. On average, the genomes of any two people differ by only one base out of every thousand bases. That tiny, 0.1 percent of variation in your genes makes you different from your brother or your neighbor!

That 0.1 percent genetic variation also holds the answer to many biological questions.

Why do we look the way we do? Why are some people better at sports while others are better at art? Why do some people develop a disease while others do not? These are just a few of the questions that scientists hope to answer by studying the tiny differences in the human genome.

INTERNATIONAL HAPMAP PROJECT

To better understand variation in the human genome, a group of researchers led by the NIH launched the International HapMap Project in October 2002. The $100-million project included scientists from public agencies and private organizations from around the world. Using the genome sequence provided by the HGP, the project attempted to map common human genome variations using four population groups: the Yoruba from Nigeria, Japanese in Tokyo, Han Chinese in Beijing, and Utah citizens with northern and western European ancestry.

HELPFUL VARIATIONS

 

Some genetic variations develop because they are a benefit to the survival of an organism. For example, the genetic variation of the beta-globin gene that causes sickle cell disease is more commonly found in people who lived or had ancestors who lived in regions where malaria was common. Malaria is a serious and sometimes fatal disease that resembles the flu. It is caused by a parasite and is often transmitted from one person to another through the bite of a mosquito. People with the sickle cell variation of the beta-globin gene were protected against the malaria infection. In areas where malaria was a deadly killer, having the sickle cell variation was a good thing—you had a better chance of surviving and passing this genetic variation on to your children.

image

Genetic similarities between 51 worldwide human populations

credit: Tiago R. Magalhães, Jillian P. Casey, Judith Conroy, Regina Regan, Darren J. Fitzpatrick, Naisha Shah, João Sobral, Sean Ennis (CC BY 2.5)

While the Human Genome Project focused on the 99.9 percent of similarities in the human genome, the HapMap project focused on the 0.1 percent of variation.

In the HapMap project, researchers created a catalog of common genetic variations in humans. A genetic variation is the difference in DNA between one person and another. To identify variations, scientists looked for sites in the genome where the DNA sequence was different by a single base—A, C, G, or T. This type of variation is called a single nucleotide polymorphism (SNP) and is the most common type of DNA sequence variation in the human genome.

Most of the time, an SNP does not cause an important biological change in an organism, just as the meaning of a word does not change when a single letter is changed, such as in “realize” and “realise.” In some cases, the change of a single base can slightly change the gene’s function.

Other times, it can drastically change a gene’s function. Think about how the change in one letter can change a word’s meaning, such as “cat” and “sat.”

Researchers believe there are about 10 million SNPs in the human genome. Looking for all of these variations would be extremely time consuming and expensive. Luckily, scientists discovered a way to reduce the workload. As part of the HapMap project, researchers discovered that groups of SNPs often cluster in the same areas on chromosomes. These clusters are called haplotypes.

When completed in 2005, the HapMap project created a map of where these haplotype blocks could be found on chromosomes. It described specific variations common in the human genome and where these variations were located on chromosomes.

By comparing the HapMap details of different people, scientists could identify and compare areas of things such as genetic variation between two individuals and learn which of them is more likely to get cancer or which person is more likely to avoid developing dementia. This map made it easier for scientists to search for genetic variations and link a specific gene to a disease.

HOW DO WE COMPARE?

Although the main focus of the HGP was to sequence the human genome to provide a map to learn more about genes and disease, some scientists working on other species—plants, animals, fungi, bacteria, and viruses—are using the knowledge gained from the HPG in their own genomic studies. This has led to a branch of genomic research called comparative genomics.

The HapMap project is also mapping certain common genetic variations in specific populations around the world. This information helps scientists understand more about the disease risks of certain groups of people as well as how the environment has affected the genome.

 

GENE GENIUS

Researchers have found three key genes involved in inflammation in humans that appear to be absent in the chimpanzee genome. This absence might explain why chimps respond differently to certain illnesses.

GENE GENIUS

Despite their small size, mice and rats have genomes that are about the same size and contain a similar number of genes as humans. The human genome is about 85 percent identical to the rat and mouse genomes. These similarities explain why scientists have found using mice and rats in laboratory studies to be valuable.

Comparative genomics is an area of study in which scientists compare the complete genome sequences of different species.

Why would they want to do that? By comparing the characteristics that define organisms, they can identify similar and different areas in the genomes.

Comparative genomics can also help scientists study evolutionary changes among various organisms. This research can help them identify the genes that are the same among species as well as the genes that are different—those that give each species its unique characteristics.

image

This fruit fly is about 60 percent genetically identical to humans!

credit: Martin Cooper (CC BY 2.0)

Scientists use high-powered computers to compare the human genome to the genomes of other species. By comparing the DNA sequences in the two genomes, they can determine how closely related humans are to other species.

For example, our closest relative, the chimpanzee, has a genome that is approximately 96 percent identical to the human genome, according to a National Geographic study. That shows how it takes only a tiny percentage of genomic difference to make a large impact.

Scientists can also use comparative genomics to study and understand disease. For example, although the chimpanzee genome is about 96 percent identical to the human genome, chimpanzees do not get certain human diseases, such as malaria and AIDS. Comparing the sequence of genes involved in these diseases might help scientists understand why humans are susceptible to these diseases while chimps are not.

One day, the insights learned from comparing the two genomes could lead to new ways to treat and prevent these diseases and others.

Funny enough, the genome of the fruit fly is about 60 percent identical to the human genome! These two organisms, which look and behave very differently, share a core set of genes. By comparing the two genomes, researchers have discovered that about two-thirds of the human genes known to be linked to cancer have an equivalent gene in the fruit fly. In addition, when scientists inserted a human gene linked to early-onset Parkinson’s disease into fruit flies, the flies showed symptoms similar to those observed in humans with the disease.

MUTANTS!

 

DNA’s sequence of nucleotide bases is the code that carries the instructions for building proteins. While a change in one nucleotide in a long string of DNA may not seem like a big deal, it can have a real effect on an organism. There are three main types of mutation.

Substitution: One base is substituted for another in a DNA sequence.

Insertion: An entire base is added to the DNA sequence.

Deletion: An entire base is removed from the DNA sequence.

Mutations often occur when the cell makes a mistake while copying DNA for cell division. Other times, mutations occur when environmental agents called mutagens damage DNA. Mutagens such as ultraviolet light, radiation, and some chemicals can damage DNA by altering nucleotide bases so they look like other nucleotide bases. When DNA is copied, the damaged base might pair with an incorrect base, causing a mutation. Most of the time, mutations make no noticeable change in a gene’s expression function. Other times, a gene mutation causes a serious effect. If the mutation affects the gene’s ability to make a critical protein, then the cells that use that protein might not be able to work properly. This is one cause of cancer.

GENE GENIUS

A selective sweep is a mutation that occurs in a population and is so beneficial that it spreads through the population within a few hundred generations and eventually becomes the “normal” sequence. Your skin color is an example of selective sweep in humans. In Africa, our human ancestors had dark skin to protect against the sun’s direct ultraviolet rays. As our human ancestors moved from Africa to regions with less direct sunlight, the dark skin pigments were no longer needed for survival. Groups of early humans that moved to Asia and Europe gradually lost their dark pigment and their skin became lighter in color.

Scientists hope that this discovery will one day lead to fruit flies being used as a new model to test treatments for Parkinson’s disease.

Comparative genomics may also help scientists better understand Earth’s evolutionary tree. Plants, animals, fungi, and bacteria may all look and behave differently, but they are all living creatures with genomes made of DNA, which holds the genes that code for thousands of proteins in each organism.

As the tools for sequencing genomes become more advanced, scientists will be able to compare more and more species, which might lead to discovering unknown connections among the species. Plus, this information could help scientists find new ways to conserve rare and endangered species.

SO MUCH DATA!

How many photos do you have on your phone or other device? Most people have hundreds, if not thousands of pictures! That’s a lot of data, and it can be hard to find the one photo you’re looking for in a sea of photos you don’t want. Scientists have the same problem with their scientific data.

The HGP and the sequencing of other genomes has created a flood of data that scientists need to process, analyze, and store. To do so, they have turned to bioinformatics, a branch of biology that combines computer science, statistics, and math to understand biological processes.

Bioinformatics uses various computer technologies to better manage and study the vast amounts of biological and genomic data.

It includes databases and knowledge bases to store, recall, and arrange data. Bioinformatics also develops methods and computer software tools to help scientists understand vast amounts of biological and genomic data. Bioinformatics can take enormous data sets and make sense of them.

Using high-powered computers, scientists can gather, compile, manipulate, and analyze large amounts of biological and genomic data. The HGP is one example of how bioinformatics can be used. Remember, the human genome has more than 3 billion base pairs. It’s essentially impossible to map all these pairs by hand!

However, using bioinformatics to compile, store, and analyze the data from human genome, scientists have been able to identify genome patterns in disease development. This has led to advances in treatments for different diseases.

We’ve touched on how different genetic markers can indicate whether or not a person will be more likely to get a disease at some point in their lifetime. In the next chapter, we’ll dive deeper into the impact the human genome has on health issues.

TEXT TO WORLD

Look around at your friends and classmates. How are you all different? In what ways are you similar?

KEY QUESTIONS

Why is data collection and storage such a major part of the Human Genome Project?

Is there any danger in studying the genetic similarities and differences of different groups of people? Research times in history when people used science to harm instead of help.

Inquire & Investigateimage

Ideas for Suppliesimage

small jar with lid

fresh freesia flowers

PTC strips (used to test taste)

fresh, chopped coriander

small mirror

tissues or wet wipes

INVESTIGATE GENETIC VARIATION

There are a number of physical traits that make every person different. Comparing the genomes of different individuals can identify where differences exist in DNA and the effects these differences have on physical characteristics and health. Understanding these small DNA changes can help improve our knowledge of how the genome works. In this activity, you will investigate some physical characteristics and their variations and consider if they are examples of genetic variation.

 

To start, crush the freesia flowers in a small jar. Keep the jar sealed until used later in this activity.

Test the following characteristics on at least 20 people. Record your observations in your science journal.

Bitter taste: Ask the subject to place a PTC strip on the tip of their tongue. Does it taste bitter or sweet?

Coriander taste: What does the person taste when they put a small amount of coriander on their tongue? Does it have an herby taste or a soapy taste?

Smell: Open the jar of crushed flowers and ask the subject to sniff. Can they smell anything?

Earwax type: Use a wet wipe or tissue to gently wipe the entrance to the ear canal. Do not put anything inside the ear! Is there any earwax present? Is it dry or wet and sticky?

Freckles: Does the subject have freckles?

Dimples: Does the subject have dimples when they smile?

zHand clasping: Have the subject clasp their hands together. Do they put their right thumb over the left or the left thumb over the right one?

Create a data collection chart to organize your data.

Once you have collected data from at least 20 people, create a bar graph to show the distribution of the different characteristics. What characteristics do you think are the most linked by genes? Which do you think could be due to other factors, such as education or environment? Choose one of the tested characteristics and research if scientists have linked it to a gene.

imageInquire & Investigate

VOCAB LABimage

Write down what you think each word means. What root words can you find to help you? What does the context of the word tell you?

bioinformatics, comparative genomics, exon, gene annotation, intron, and mutation.

Compare your definitions with those of your friends or classmates. Did you all come up with the same meanings? Turn to the text and glossary if you need help.

To investigate more, explore other characteristics that may be controlled by genes. How can you test for them? Also, try conducting this activity using people from several generations of at least two different families. What do your results tell you about genetic variation and heredity?

Inquire & Investigateimage

HOW MUTATIONS OCCUR

Some mutations are linked to genetic disorders. Some diseases are linked to a mutation in a single gene, while others are caused by mutations on several genes, helped along with environmental factors. In this activity, you will explore how easily DNA mutations can occur.

 

To start, divide your classmates into groups of at least 10 students. Have them stand in a single file line. Make sure there is enough space between each person so they cannot hear the instructions given to the person in front of them.

Give the first student in each line the same set of instructions. For example, tell them to draw a series of symbols or perform a series of actions. Make sure the rest of the students in the line do not hear the instructions.

Have the first student whisper the instructions to the second student in line. The second student will whisper to the third student and continue down the line until the last student receives the instructions. How does this model what happens to DNA during several rounds of replication?

The last student in line should perform the instructions that they received. What does this step represent?

Have the first student perform the original instructions that they received. How does this compare to what the last student did? What types of mutations occurred in the instructions between the first and last students? How could these changes in the genome affect human health?

To investigate more, change the set of instructions or change how you deliver the instructions. How does this affect the mutations that arise? Why do you think this occurs?