Chapter 11

WHAT DOES THE FUTURE HOLD?

Iain McDonald, Michelle Leonard

The future of genetic genealogy offers a mixture of points of hope and potential problems, arising from both the testing companies and the populace. To put together this chapter, we have drawn on information supplied by experts in the field, leading people in the major testing companies and experiences with existing test-takers.

Protecting the future on a personal level

People keep dying. Unfortunately, this nasty habit can have important ramifications for others. Unless you dig them up, or a sample already exists in a lab, it is hard to test DNA from someone who is already dead. Thus it is important to get a sample of DNA from any aged relatives you may want to test in future and send it to a lab for a simple DNA test, after which they will store it securely.

It is also hard to get access to the DNA results of the dead. By default, most companies only allow access to people with the right combination of username and password. Assigning a beneficiary who can control your account after your death is a good precaution to take now, even if you feel quite well. This can be done through major companies (e.g. Family Tree DNA, who provide an option to add the name, phone number and email address of a beneficiary under the Beneficiary Information tab within Account Settings). However, precautions should also be taken on an individual level, e.g. ensuring your email will be checked by someone after your death, and providing a sealed document with relevant login details for your descendants to manage any accounts, data, subscriptions, etc.

Changes in DNA testing on a societal level

Like it or not, society is becoming more litigious and genetic testing is becoming more widespread and more precise. There is concern among some that positive genetic predisposition to certain conditions could become a factor in medical or life insurance quotations. As further conditions are identified, the number of medically relevant genes will increase. Genealogical DNA databases are also being used increasingly by law enforcement agencies to identify criminals.

Medical aspects are less concerning for Y-DNA. Y-DNA disorders are usually linked to male fertility issues, as the remaining genes have largely lost their functionality. Mitochondrial disorders are numerous, but future issues are likely to centre around autosomal DNA. For future generations, archived autosomal DNA may be an important source of information on both genealogical inheritance and inherited diseases. However, while a person is alive, sharing autosomal or mitochondrial DNA data risks publicly revealing your and your family’s susceptibility to inherited diseases. Aside from making private information public, in future this may affect insurance costs, unless data is properly pseudonymised so that it cannot be associated with you. Data should be shared only where you and your immediate family are happy with the level of privacy you retain, and you should make yourself aware of how organisations you share your data with will use it.

More generally, it is becoming increasingly important that persons handling DNA results for others (including genealogists and project administrators) adhere to the same ethical principles of confidentiality and data security that are expected among many other professions. Our ability to continue research relies on people sharing their DNA test results with each other, or at least with trusted intermediaries. Restrictions on this sharing may come either at a personal level, or be imposed by testing companies, by case law, or by broader privacy laws.

Legislation often arises as a result of individual cases of misunderstanding or malpractice. The onus is increasingly on individual researchers, project administrators and testing companies to prevent these situations. We have to balance the desire for privacy of individuals with the fact that most genetic test-takers want to be contacted. The key here is to provide clear, informed, written consent for any action taken. Personal information should remain private, unless the individual has consented to sharing it – this includes individual researchers taking genetic data from relatives, and information in family trees. There should also be a strict separation maintained between personal and genetic data where possible, so that genetic information cannot be traced back to a living individual, except where that individual has expressly consented to that.

Improvements to the status quo

Database size

Identifying relationships relies on a large database and deep testing by those in that database. Predictable improvements come from the recruitment of new test-takers, and from upgrading of existing tests by older test-takers.

A little over a decade ago, the prevailing wisdom was that autosomal DNA could not be used for genetic genealogy. Fast forward to the present and autosomal DNA tests are by far the most popular on the market, with over 26 million test-takers across the databases by early 2019.

Family Tree DNA’s Y-DNA database has increased by an estimated 200,000 test-takers in the last year, with the number of sequencing tests approximately doubling. And each of these test-takers is typically testing deeper than their predecessors.

It is very difficult to predict how many people may test over the next few years, but it is clear that databases will continue to grow swiftly and database size will play an ever-increasing role in how successfully DNA testing can be used for genealogical purposes. Fuelling this is an increasing acceptance of DNA testing in genealogy, as the answers it can bring become more accurate. This results from test-takers seeking out new matches, wider marketing by testing companies and the wholesale decrease in price of genetic testing.

There will be many benefits to larger databases, including improved admixture percentages as reference panels increase and algorithms become more sophisticated. Additionally, the number of adoption and unknown ancestor mysteries that can be solved will increase dramatically as databases expand. The flipside is that many more people will discover hitherto unknown family secrets, making it more important than ever that people understand these possibilities before testing. The number of brick walls broken down will increase rapidly and confirmation of your ancestry via DNA will become easier to achieve. It is likely we will reach the point that most people with ancestry from areas where testing is most prevalent (North America, UK, Ireland, Australia, New Zealand, Scandinavia, etc.) will find close or reasonably close relatives have already tested when they enter the databases.

Another major benefit could be ancestor identification via DNA testing alone. Testing companies may develop the ability to automate personal family trees for test-takers based solely on their DNA matches. It is likely that the early days of such a system would require substantial trial and error and further traditional research. It would, however, be an amazing development for adoptees, those with recent unknown ancestor mysteries and test-takers who have yet to construct detailed family trees. Ancestry’s ‘New Ancestor Discoveries’ (NADs) beta feature is an early forerunner of such a potential tool. Large-scale sharing of both DNA and family tree data would be required to fully realise this future application; it would be essential to have millions of segments collaboratively mapped back to millions of ancestors, so that when new test-takers share any of the collectively mapped segments they can be immediately informed which ancestors their DNA segments have been assigned to.
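
As a purely illustrative sketch of the lookup described above, the Python below checks a new test-taker's segment against a small table of segments already mapped to ancestors. The `MappedSegment` structure, the function name and all the data are hypothetical, invented here for illustration; no testing company's actual system is being described. It also assumes that overlapping positions are enough to suggest an ancestor, whereas a real tool would have to confirm that the segment is genuinely shared and comes from the same parental side.

```python
from typing import NamedTuple


class MappedSegment(NamedTuple):
    """A segment that has already been assigned to an ancestor (hypothetical)."""
    chromosome: int
    start: int      # start position in base pairs
    end: int        # end position in base pairs
    ancestor: str


def ancestors_for_segment(chromosome, start, end, mapped):
    """Return the ancestors whose mapped segments overlap the given region."""
    return sorted({
        seg.ancestor
        for seg in mapped
        # Two ranges overlap when each one starts before the other ends.
        if seg.chromosome == chromosome and seg.start < end and start < seg.end
    })


# Entirely made-up example data.
MAPPED = [
    MappedSegment(1, 10_000_000, 25_000_000, "Ancestor A"),
    MappedSegment(1, 40_000_000, 55_000_000, "Ancestor B"),
    MappedSegment(7, 5_000_000, 18_000_000, "Ancestor C"),
]

# A new test-taker shares a segment on chromosome 1.
print(ancestors_for_segment(1, 20_000_000, 30_000_000, MAPPED))
```

With this invented data the shared segment overlaps only the first mapped segment, so the lookup suggests ‘Ancestor A’ as a candidate. The suggestion would still need checking against traditional research.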

Universal family tree

As DNA and genealogy continue to integrate over the coming years there is great potential for a universal family tree that incorporates traditional family trees with DNA test results. There are several websites already attempting to create universal family trees (e.g. FamilySearch, FindMyPast, WikiTree, Geni) but none have integrated with DNA databases yet and the fact that there are several, as opposed to just one, is a limiting factor. WikiTree and Geni, however, allow you to attach information about tests taken. Living DNA recently announced the ‘One Family One World Project’ which aims, over a five-year period, to create a single worldwide family tree based on DNA. It is an ambitious but exciting project and it will be interesting to see how it develops.

Reconstructing the DNA of our ancestors

Reconstructing the DNA of our ancestors will become the goal of many a genetic genealogist over the coming years and is actually already partially achievable. When we undertake chromosome mapping we assign segments of our DNA back to our ancestors and, in essence, reconstruct parts of their genome. The objective is to reconstruct as much as we possibly can, but that largely depends on both how many descendants our ancestors have, and how many of those descendants have DNA tested. No matter how far the technology advances, if your second great-grandparents only had one child there is no way you can ever fully reconstruct their genomes. For ancestors with a large number of descendants, however, it may be possible to reconstruct all or, more realistically, a significant proportion. Rising database numbers will, once again, be essential to this endeavour. The more descendants of the ancestors we wish to reconstruct that test, the more of their DNA we will have to work with and the easier it will become to rebuild their genomes.
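
The dependence on the number of children can be illustrated with a rough calculation. The sketch below assumes that each child independently inherits one of the parent's two copies of any given stretch of DNA with probability one half, so the expected share of a parent's genome passed to at least one of n children is 1 − (1/2)^n. This is a simplified upper limit, not a description of any company's method: it ignores DNA lost in later generations and assumes every child's line survives and is tested. The function name is our own.

```python
def expected_recoverable_fraction(num_children: int) -> float:
    """Expected share of one parent's genome carried by at least one child.

    Assumes each child independently inherits either copy of any given
    stretch of DNA with probability 1/2 (a simplification).
    """
    if num_children < 0:
        raise ValueError("num_children must be non-negative")
    return 1.0 - 0.5 ** num_children


for n in (1, 2, 5, 10):
    print(f"{n:2d} children: {expected_recoverable_fraction(n):.1%}")
```

Under these assumptions a single child carries only half of each parent's genome, which is why an ancestor with one child can never be fully reconstructed, whereas ten children would between them be expected to carry about 99.9%.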

Back in 2014 Ancestry experimented with reconstructing the genomes of a particular man and his two wives who lived over 200 years ago. David Speegle was a perfect subject for such a study as he fathered twenty-five children with wives Winifred Crawford and Nancy Garren. Using their DNA Circles feature Ancestry identified that a large number of their descendants (stemming from their 150 grandchildren) had already tested. Approximately 50% of the genomes of David, Winifred and Nancy were reconstructed during this study and it is easy to see how useful this could be if replicated across many other ancestors or ancestral couples. It was also determined that one of the three passed down the genes for blue eyes and male-pattern baldness to their descendants. The potential to find out which genes specific ancestors passed down to us could be a fascinating development.

Another interesting and successful attempt to reconstruct a genome was publicised recently when a large portion of the DNA of Iceland’s first known black man, Hans Jonatan, was recreated via his living descendants. 38% of his maternal African side was stitched together using segments from 182 of his descendants.

It is likely that more sophisticated tools will be developed in the coming years to assist in the reconstruction of ancestral genomes and convert them into a format compatible with the major testing companies or third-party tools. GEDmatch already provides a Tier 1 tool, named Lazarus, which fulfils this function to a certain extent, but it has a number of limitations. If the eventual outcome could be the existence of virtual kits for long-deceased ancestors in the commercial databases, that would be a game-changer for using DNA for genealogy and breaking down brick walls further back in time, as it would circumvent the current 5–7 generation limitation of autosomal DNA.

We can start reconstructing our ancestors right now by testing as many of their descendants, particularly older generation relatives, as possible, but the ways of going about this will become more sophisticated and perhaps even automated over time.

DNA-only ancestors

There are many places in the world where a dearth of records renders tracing family trees past a particular date almost impossible, yet DNA from our mystery distant ancestors can be traced. We can call these ancestors ‘DNA-only ancestors’ and over time we might find a place for them on our trees. It is perfectly plausible that we could know basic information about DNA-only ancestors and create simple identifiers for them e.g. ‘fifth great-grandfather Leonard lived in Ireland during the early to mid-1700s and had several children including fourth great-grandfather Jeremiah’. Despite the fact there is no documentation in existence that can provide an actual name for fifth great-grandfather Leonard, I could add him to my tree and map DNA segments back to him, especially if DNA helps me identify his other children and their descendants.

DNA phenotyping

DNA phenotyping is the prediction of physical appearance and biogeographical ancestry using DNA alone. This would have seemed like science fiction a few years ago, but it is actually coming to pass right now and in the future could be used to reconstruct the facial features and traits of our ancestors. Perhaps in time we will even be using these techniques in conjunction with reconstructed ancestral genomes we have built ourselves. It is already possible to predict eye colour, skin colour, hair colour, facial structure, freckles and ancestry composition, although the jury is out on the overall accuracy of this technology.

A forensics company named Parabon NanoLabs is currently at the forefront of this field and offers the Parabon Snapshot Advanced DNA Analysis service, which has been used to create 3D facial reconstructions of both perpetrators and victims using crime scene DNA. Just recently a 3D photo-fit that Parabon created of a suspect in a murder case helped identify the offender.

This kind of DNA technology has also been used in high-profile ancient DNA cases like Cheddar Man, Richard III and a soldier from the Battle of Dunbar. Not only were the latter’s facial features reconstructed, but it was also possible to work out that he had lived in south-west Scotland in the 1630s and had gone through periods of poor nutrition in his childhood.

These DNA techniques are almost certainly going to become a new aspect of genetic genealogy and, in time, regular consumers may be able to obtain digital 3D facial reconstructions for their ancestors. This would be especially interesting for those who lived prior to photography or for whom there are no surviving photographs.

It may not just be the appearance of our ancestors we will be able to predict in the future though – we may be able to learn about other traits and which ancestors we inherited these traits from. This would enable us to build up a more complete picture of our ancestors from their DNA and learn more about ourselves in the process.

This cutting-edge technology is bound to improve and evolve over time with more accurate facial reconstructions and better trait predictions for both ourselves and our ancestors. It will be very interesting to see how it affects DNA testing and the future of genetic genealogy.

Deeper testing

Mitochondrial DNA can now be fully sequenced. Test-takers who have fully sequenced their mtDNA have no new mutations to find and progress can only be made by obtaining closer matches, to get a better understanding of the geographical distribution of one’s ‘close’ relatives.

Y-DNA tests are still getting deeper (see p231-2), but the increasing uptake of sequencing tests is providing the most significant improvements. These create branches (sub-clades) in the Y-DNA family tree progressively closer to the present, allowing more recent ancestry to be explored. We call this ‘closing the genetic-genealogical gap’.

On average, the database size has to double before a person finds a match closer than their present closest relation. For Y-DNA or mtDNA, this step typically represents a relationship only a few centuries closer to the present. The genetic-genealogical gap is typically 1,000–3,000 years for most people, so closing it completely may take a long time.
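
To give a feel for the scale involved, the sketch below counts how many doublings would be needed to close a gap of a given size. It assumes that every doubling brings the closest match a fixed number of years nearer to the present; the figure of 300 years is our own stand-in for ‘a few centuries’ and is not a measured value, and real progress will vary greatly between lineages.

```python
import math


def doublings_to_close_gap(gap_years: float, step_years: float) -> int:
    """Database doublings needed if each doubling brings the closest
    match `step_years` nearer to the present (an assumed average)."""
    if gap_years < 0 or step_years <= 0:
        raise ValueError("gap_years must be >= 0 and step_years must be > 0")
    return math.ceil(gap_years / step_years)


STEP_YEARS = 300  # assumption: stands in for "a few centuries" per doubling

for gap in (1000, 3000):
    n = doublings_to_close_gap(gap, STEP_YEARS)
    print(f"gap of {gap} years: {n} doublings, database about {2 ** n}x larger")
```

On these assumptions a 1,000-year gap needs four doublings (a database sixteen times larger) and a 3,000-year gap needs ten (roughly a thousand times larger).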

Unlike mtDNA, there are still unread sections of Y-DNA. Current high-definition tests (e.g. Family Tree DNA’s Big Y-700) cannot estimate the age at which two people are related to much better than a factor of about two (e.g. between 1,000 and 2,000 years ago). Combining many tests gives slightly better results. A growing fraction of Y-DNA test-takers need advances in technology to find answers more precisely than this.

The potential for new and cheaper tests

With mtDNA fully sequenced, new technologies are limited to autosomal and Y-DNA. In all tests, we can hope for a reduction in price. Once the sizeable outlay of machinery costs is recovered, the fundamental cost of tests is limited by the staff expense of running and analysing tests, plus the effective ‘rent’ for the machine’s bench space. Many people will pay US$100, but only a small fraction will pay US$500 or more for autosomal or Y-DNA sequencing tests. So the current focus is on making the existing sequencing tests cheaper.

The market for whole genome sequencing (WGS) is driven largely by the medical information in these tests. Genealogy is a secondary priority. Very little has so far been made of the inheritance of autosomal SNPs, and we could see this as a major emerging facet of autosomal DNA research once enough WGS tests are collected.

Improving tests requires either increasing read depth, by effectively repeating the same test more times, or increasing read length through new technologies. WGS tests can now be bought with between 4x and 60x read depth, and reads of 150 to 250 base pairs. However, for genealogical relationships closer to the present, people may ultimately have to look towards deeper Y-DNA testing.

The Y chromosome has a highly repetitive structure and strong similarities between itself and other chromosomes: it is the length of the reads which currently limits how much of the chromosome can be read. Full Genomes Corp’s YElite 2.1 test, WGS tests and Family Tree DNA’s Big Y-700 test currently test around 14 million base pairs (Mbp), reaching one mutation per ninety years. Full Genomes Corp is trialling long-read technologies giving 20 Mbp (one mutation every sixty years), but at a price of around $3,000. The full Y-DNA sequence is about 57 million base pairs long. Depending on the mutation rate in these regions, we may ultimately be able to attain one mutation every twenty years.
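
These figures are consistent with a simple scaling in which the average interval between mutations is inversely proportional to the number of base pairs read. The sketch below takes the 14 Mbp and ninety-year figures as its reference point and assumes a uniform mutation rate per base pair across the chromosome, which, as noted above, may not hold in the regions that are currently unread. The function name is our own.

```python
def years_per_mutation(coverage_mbp: float,
                       ref_coverage_mbp: float = 14.0,
                       ref_years: float = 90.0) -> float:
    """Average years between mutations for a given coverage.

    Scales the reference interval inversely with the base pairs read,
    assuming a uniform mutation rate per base pair (a simplification).
    """
    if coverage_mbp <= 0:
        raise ValueError("coverage_mbp must be positive")
    return ref_years * ref_coverage_mbp / coverage_mbp


for label, mbp in (("current sequencing tests", 14),
                   ("long-read trial", 20),
                   ("full Y chromosome", 57)):
    years = years_per_mutation(mbp)
    print(f"{label}: {mbp} Mbp, one mutation per ~{years:.0f} years")
```

The scaling gives about 63 years at 20 Mbp and about 22 years at 57 Mbp, in line with the rounded figures of sixty and twenty years quoted above.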

Simultaneously, complex mutations are now being routinely extracted from sequencing tests. Read length also dictates the ability to extract complex mutations: at least 700 Y-STRs can be extracted from most NGS tests, including Big Y and YElite, and Family Tree DNA’s Big Y-700 test holds around 750 Y-STRs. Initial results indicate at least 6,000 Y-STRs can be read from longer-read tests. Insertions, deletions and multi-nucleotide polymorphisms can increasingly be extracted, meaning that these rarer but still useful mutations can be more generally used in determining the closeness and accuracy of relationships.

If a combination of Y-STR plus Y-SNP data from sequencing tests becomes de rigueur, we will approach a situation where one mutation occurs roughly every generation. If long-read technology becomes more standard, we may typically see more than one mutation per generation. If it becomes affordable, we can start to probe individual generations of families, putting together a real family tree in the absence of complete paper records. However, this still appears financially prohibitive for at least several years into the future. Identifying a specific generation will also still rely on finding genetic distances between test-takers on two surviving lines (e.g. descendants of two brothers for Y-DNA tests), which becomes increasingly unlikely as generations pass and lines die out.

Input from the academic community

Major advances can be expected from studies like the 100,000 Genomes project. Meanwhile, existing datasets such as the People of the British Isles project and the Irish DNA Atlas project have yet to be fully exploited. Commercial companies like Living DNA are enlarging their comparison database by testing populations outside Europe.

Many academic datasets come with additional privacy restrictions, because the participants have not given their consent for individual results to be published. Nevertheless, they will provide results including very detailed population distributions and Y-DNA and mtDNA haplogroup trees.

In ancient DNA, the 1000 Ancient Genomes project (Uppsala University, funded to 2021) should greatly expand the number of ancient genomes we have for comparison, while prominent researchers such as David Reich are known to have further data. This will more accurately tie individual Y-DNA and mtDNA haplogroups to specific historical periods and cultures, and better map the migrations that have led to different autosomal admixtures.

Summary

Genetic genealogy will likely never solve some problems and will never replace traditional genealogy. Getting names onto a family tree will always rely on the written record that provides that name. With few exceptions, confidently putting people onto a family tree cannot be done with genetic genealogy alone.

The small but significant risk of diktats from litigation-averse testing companies, or of poorly considered or poorly implemented privacy laws, could restrict or effectively halt the sharing of information, but can be avoided by responsible action within the community.

However, the gap between what traditional and genetic genealogy can provide is closing, meaning the two fields are becoming increasingly inseparable. Testing databases are growing and, at least for autosomal and Y-DNA, tests are becoming cheaper and more detailed in their results. More people are starting to see genealogical mysteries unravelled that would never have previously been possible. The future of genetic genealogy remains bright: never in our history have our ancestors been able to speak to us so clearly.