3
Molecular Biology

3.1 Introduction

All organisms, from bacteria to blue whales, reproduce using DNA. Even the majority of viruses contain DNA. The only exceptions, the RNA viruses, must convert their RNA into DNA before they can replicate within their host cells. DNA is therefore quite literally ‘the stuff of life’, as it contains all the information that makes us who we are. Indeed, some biologists suggest that the sole reason for any organism's existence is to ensure that its DNA is replicated and survives into the next generation.

In protozoa, fungi, and multicellular plants and animals, DNA occurs in the membrane‐bound nucleus and the mitochondria. Plants also contain DNA in their chloroplasts. Therefore, in multicellular organisms, with the exception of certain specialised cells that lack these organelles (e.g. mature mammalian red blood cells), every cell in the body contains DNA. In humans, unless heteroplasmy occurs, this DNA is identical in every cell, does not change during a person's lifetime, and is unique (identical twins excepted) to an individual.

DNA is composed of four nucleotide bases (adenine, thymine, cytosine, and guanine), and phosphate and sugar molecules. Nuclear DNA takes the form of a ladder twisted into the shape of a double helix in which the rails are composed of alternating sugar and phosphate molecules, whilst the nucleotides act as rungs joining the two rails together. Adenine is always joined to thymine and cytosine is always joined to guanine. Mitochondrial DNA is arranged slightly differently to nuclear DNA and will be dealt with later.

Within the nucleus of eukaryotic cells, DNA is found in structures called chromosomes. Human cells contain 23 pairs of chromosomes that vary in shape and size. Twenty‐two of these pairs are ‘autosomal chromosomes’. These contain the information that directs the development of the body (body shape, hair colour, etc.). The remaining pair of chromosomes is the X and Y ‘sex chromosomes’ that control the development of the internal and external reproductive organs. Each chromosome contains a strand of tightly coiled DNA. The DNA strand is divided into small units called genes and each gene occupies a particular site on the strand called its ‘locus’ (plural ‘loci’). The total genetic information within a cell is referred to as its ‘genome’.

In humans, there are about 35 000–45 000 genes and, on average, each comprises about 3000 nucleotides, although there is a great deal of variation. These genes code for proteins that determine our hair and eye colour, the enzymes that digest our food, and every heritable characteristic. Surprisingly, only a small proportion of the genome actually codes for anything, and between these coding regions lie long stretches of repetitive non‐coding DNA that exhibit a great deal of variability. The alternative forms of a gene are called ‘alleles’, and a person carries two alleles of each gene, one on each of the pair of chromosomes. If DNA profiling detects only one allele, this is usually interpreted as a consequence of a person inheriting the same allele from both parents. If three or more alleles are detected, then this indicates that the sample contains DNA from more than one individual.
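The interpretation rule in the preceding paragraph can be written as a minimal sketch. The function name and the returned labels are illustrative assumptions rather than part of any profiling software, and real casework also has to consider artefacts such as allele drop‐out.

```python
def interpret_locus(alleles):
    """Classify one locus by the number of distinct alleles detected."""
    distinct = set(alleles)
    if not distinct:
        return "no result"
    if len(distinct) == 1:
        # Usually read as the same allele inherited from both parents
        return "homozygous"
    if len(distinct) == 2:
        return "heterozygous"
    # Three or more alleles indicate DNA from more than one individual
    return "mixture"


print(interpret_locus([12, 12]))      # one distinct allele
print(interpret_locus([12, 14]))      # two distinct alleles
print(interpret_locus([12, 14, 15]))  # three distinct alleles
```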

Because the sequence of nucleotides along the nuclear DNA chain is unique to all of us and is the same in every cell in our body, it is similar to a ‘barcode’ for identification. Although the entire human genome can be sequenced, it is not necessary to go to these lengths in order to identify an individual. Indeed, most of the DNA in every one of us is virtually the same, so sequencing all of it would not necessarily improve the reliability of identification. Instead, forensic scientists concentrate on regions of the genome that exhibit a high degree of variability. The textbooks of Buckleton et al. (2016) and Butler (2014) provide detailed accounts of the analysis of DNA in forensic contexts and Butler (2015) provides a good review.

3.2 DNA Sampling

We lose cells all the time: for example, whenever we blow our nose, brush our teeth, defecate, or comb our hair. Our skin cells are also constantly being sloughed off. Consequently, because virtually all our cells contain DNA, we are also shedding DNA into the environment. It is therefore possible to isolate DNA from a wide variety of sources (Table 3.1). Indeed, it is so easy to leave a trail of DNA that crime scene investigators must wear masks, and disposable over‐suits and over‐shoes to avoid contaminating the evidence they are collecting. In addition, the DNA profiles of crime scene investigators are held on a Staff Elimination Database. This means that if their DNA is recovered as a consequence of contamination, it can be excluded from the analysis. DNA contamination can also occur when transporting a dead body or via mortuary instruments. Therefore, it is preferable to take DNA samples from a corpse before it is moved. Similarly, all DNA samples need to be kept apart from the moment they are collected to avoid cross‐contamination. For example, contamination can occur if samples from a victim and a suspect are transported in the same container (even if they are in separate bags) or processed at the same time. Current analytical techniques, such as DNA‐17, can detect minute amounts of DNA and this increases the risk of detecting contaminants acquired during the collection, storage, and processing of samples. The results of DNA analysis therefore need to be interpreted with care. For example, our DNA can be recovered from bed sheets, even if we lay on them for only a short time. This means that one can prove that a man and a woman shared a bed, but not that they shared it at the same time.

Table 3.1 Potential sources of human DNA for forensic analysis.

  • Body fluids: blood, semen, saliva, urine, faeces, vomit
  • Tissues: skin, bone, hair, organs, fingernail scrapings
  • Fingerprints
  • Weapons
  • Bites
  • Discarded chewing gum
  • Drug packages spat out after storage in the mouth
  • Cigarette butts
  • Handkerchiefs and discarded tissues
  • Used envelopes and stamps
  • Cutlery
  • Used cups, mugs, bottled, or canned drinks
  • Clothing and bedlinen
  • Hairbrushes
  • Toothbrushes
  • Shoes and other footwear
  • Plasters
  • Used syringes

There are many means of collecting DNA and the best one depends upon the nature of the sample being tested – for example, whether it is a body fluid, a shirt, or a knife handle. If the sample is dry and hard, then dry swabbing with a sterile DNA‐free cotton/nylon/foam swab may be sufficient. However, if the biological material is dry, then a moist swab is more effective. Adhesive tapes, such as Scenesafe FAST™ tape and BVDA Gellifters®, which are used to collect trace evidence, can also be useful for collecting DNA (Hess and Haas 2017). The M‐Vac forensic DNA collection system is a patented device that applies a sterile saline solution to the evidence item and at the same time vacuums it up into a collecting bottle. The sample is then filtered to remove debris and sent to the laboratory for DNA sequencing. It is said to be better than many traditional sampling methods at extracting touch DNA from porous materials, such as brick and concrete.

The collection, transport, and/or storage of liquid and tissue samples present logistical problems, but these can be overcome by using FTA® cards. These cards contain chemicals that lyse any cells in the sample and immobilise and stabilise the DNA and RNA that are released. The cards also contain chemicals that preserve the DNA and RNA and thereby allow the samples to be stored at room temperature for long periods (>15 years). The cards can be pressed against liquid samples (e.g. saliva or semen) or the liquid can be dropped onto them. Tissue samples or blood clots can be squashed onto the cards. The nature of the substrate affects the ability to extract DNA from fingerprints (Ostojic and Wurmbach 2017). Kirgiz and Calloway (2017) found that FTA cards are useful for collecting DNA from fingerprints left on solid non‐porous surfaces, such as metal door handles and car steering wheels. To analyse a sample after collection, a small disc is punched from the card, washed with FTA purification reagent, and then rinsed with TE−1 buffer. The disc is then dried, after which it can be subjected to PCR or Next Generation Sequencing.

In the UK, the Human Tissue Act states that it is illegal to take a sample of a person's DNA without their consent, except under certain conditions (e.g. to prevent or detect a crime or to facilitate a medical diagnosis). It is therefore illegal in the UK for a man to take DNA samples surreptitiously from his offspring to determine whether he really is the father. Similar rules apply in other countries, although there is a lot of variation.

3.3 DNA Analysis

Molecular biology is currently one of the most rapidly advancing areas of science. Therefore, methods of DNA analysis are constantly being refined and new technologies introduced. It is essential to keep a sense of proportion and distinguish between a genuinely useful advance and hype. More information obtained more quickly has the potential to overwhelm rather than speed up an investigation if it is not genuinely useful or if one is not in a position to do anything with the extra information.

3.3.1 Polymerase Chain Reaction

Kary Mullis invented the polymerase chain reaction (PCR) in 1983 and it has since become one of the most powerful techniques in molecular biology. It is an enzymatic process that enables a particular sequence of bases along a strand of DNA to be isolated and amplified (copied) without affecting the surrounding regions. This makes it useful in forensic casework in which DNA samples are frequently limited in both quantity and quality. For instance, PCR has been applied to the identification of DNA from saliva residues on envelopes, stamps, drink cans, and cigarette butts. It also has the advantages of being sensitive and rapid. However, PCR is not suitable for the analysis of long strands of DNA, so it cannot be used in the older Restriction Fragment Length Polymorphism (RFLP) analyses in which the strands often contain thousands of bases.

Once a region of the DNA molecule is identified as worthy of investigation, the flanking sequences are ascertained so that PCR primers can be designed to identify the beginning and end of the sequence. The primers consist of short sequences of DNA that bind or hybridise onto their complementary sequences on the test DNA sample. Once the primers have been designed, the PCR process is carried out, as outlined in Table 3.2.

Table 3.2 Summary of reactions involved in the polymerase chain reaction (PCR).

  1. Incubate the sample at 94–97 °C to separate the DNA helix into two separate strands and denature the DNA.
  2. Reduce the temperature to 50–60 °C to allow the primers to ‘anneal’ to the DNA.
  3. Raise the temperature to 70–72 °C to initiate the ‘polymerisation’ stage, in which Taq DNA polymerase enzyme uses the DNA template identified by the primers and the nucleotides adenine, guanine, cytosine, and thymine as building blocks to reproduce a complementary copy of the template (Figures 3.1 and 3.2).
  4. Repeat the procedure in successive cycles of denaturing, annealing, and polymerisation, so that in a short time the original sequence is ‘amplified’ thousands, or even millions of times.
  5. After the amplification step, the PCR products (sometimes called ‘amplicons’) are separated on the basis of their length. In the past, this was done using flat bed gel electrophoresis, but this is being superseded by capillary electrophoresis as it is faster and can be automated. It is common practice to investigate several different sequences at the same time in a process referred to as ‘multiplexing’. This is achieved by designing primers that produce allele size ranges that do not overlap (Figure 3.3).

Figure 3.1 Diagrammatic representation of the PCR thermal cycling process. Each cycle takes about five minutes and the whole process lasts about three hours.

Source: Reproduced from Butler (2005), © 2005 Elsevier, with permission.

Figure 3.2 Diagrammatic representation of the amplification step of the PCR process.

Source: Reproduced from Butler (2005), © 2005 Elsevier, with permission.

Figure 3.3 Diagrammatic representation of the multiplex PCR process. (a) The arrows represent three sets of primers that have been designed to amplify three different loci: Locus A, Locus B, and Locus C. (b) The three loci are different sizes and can therefore be resolved easily on the basis of size separation.

Source: Reproduced from Butler (2005), © 2005 Elsevier, with permission.

Taq DNA polymerase is so‐called because it was discovered in the thermophilic bacterium Thermus aquaticus. It is not harmed by the denaturation part of the PCR cycle, which removes the need to add fresh enzyme after each denaturation. Consequently, an excess of Taq DNA polymerase, primers, and nucleotides is added at the start of the process, and the specificity of the reaction is controlled by adjusting the annealing temperature.
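The effect of repeating the cycle in Table 3.2 can be illustrated with a short calculation. This is a sketch that assumes idealised amplification: the `efficiency` parameter is an assumption introduced here (1.0 means every template is copied in every cycle), and real reactions fall short of it. The cycle count is derived from the timings quoted in the caption to Figure 3.1.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copies of the target after `cycles` rounds of PCR.

    efficiency=1.0 is perfect doubling; lower values model a reaction
    in which only a fraction of templates is copied in each cycle.
    """
    return initial_copies * (1 + efficiency) ** cycles


# About three hours at about five minutes per cycle (Figure 3.1)
cycles_in_run = (3 * 60) // 5

for n in (10, 20, 30):
    print(n, "cycles:", pcr_copies(1, n), "copies from one template")
print("cycles in a three-hour run:", cycles_in_run)
```

Under perfect doubling, a single template passes one thousand copies after 10 cycles and one million after 20, which is why a few dozen cycles suffice.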

The primers are labelled with differently coloured fluorescent dyes, and therefore, after capillary electrophoresis, the PCR products can be detected by exposure to a laser beam that induces fluorescence at specific emission wavelengths that are then detected with a recording CCD camera. The results are printed out as a trace, referred to as ‘an electropherogram’ (Figure 3.4). Sometimes the machine may misinterpret a colour (e.g. it mistakes blue for yellow) and this gives rise to false peaks; this phenomenon is called ‘bleed through’ or ‘pull up’. It can be recognised by careful analysis of the electropherograms across the colour spectrum. Other potential sources of error are ‘stutter peaks’ that occur immediately in front of (commonly) or after (less commonly) a real peak. Stutter peaks are easy to identify and are excluded from the interpretation when the sample is derived from a single person, but if the sample contains mixed DNA it can be difficult to discern a stutter peak from a real one. In addition, random flashes may occur owing to air bubbles, contaminants, and other interferences, thereby resulting in background ‘noise’ that may mask small peaks or even be mistaken for peaks themselves. Re‐running the sample will usually identify ‘false peaks’ because they are unlikely to occur in exactly the same place twice. Surprisingly, there appear to be no universally accepted guidelines concerning the lower accepted limits that distinguish a ‘true peak’ from ‘background’.
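The reasoning that a false peak is unlikely to recur in the same place can be sketched as a comparison of two runs. The peak positions (in bases), the `tolerance` value, and the function name are hypothetical choices for illustration, not settings taken from any instrument.

```python
def reproducible_peaks(run_a, run_b, tolerance=0.5):
    """Return the peaks of run_a that reappear in run_b.

    A peak counts as reproduced if run_b contains a peak within
    `tolerance` bases of it (an assumed, illustrative window).
    """
    return [p for p in run_a if any(abs(p - q) <= tolerance for q in run_b)]


first_run = [105.1, 132.0, 160.4]   # hypothetical peak positions
second_run = [105.0, 160.5, 171.2]

print(reproducible_peaks(first_run, second_run))
```

In this example the peaks near 105 and 160 bases appear in both runs, whereas the peak at 132.0 is not reproduced and would be treated with suspicion.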

Figure 3.4 Electropherograms of autosomal STR profiles. (a) An SGM Plus profile of a man (note the two peaks for the amelogenin locus). The profile is displayed in green, blue, and yellow channels of a four‐colour fluorescent system. The red channel was used for the size marker and is not shown here. Most of the STR loci are heterozygous (i.e. there are two peaks) and the alleles are evenly matched (i.e. the peaks are the same size). The number beneath each peak indicates the size of each allele in repeat units. (b) Part of an SGM Plus profile of a mixed sample. Only the green channel is shown. The sample is obviously a mixture as three alleles are present at D8S1179 and four alleles at D21S11. The minor component of a mixture is only identifiable if it is present above the level of ‘background noise’. The amelogenin peaks are of approximately the same size, indicating that this is a mixture of DNA from two men. If the mixture were from a man and a woman, the X amelogenin peak would have been higher than the Y peak. (c) Part of an STR profile following low copy number (LCN) testing. This process can lead to heterozygote imbalance at some loci. For example, note the differences in peak height at D21S11 and D18S51. Owing to stochastic effects (see text), no two amplifications of the same sample behave identically and, therefore, duplicate LCN PCR amplifications should be undertaken and only those alleles present in both electropherograms should be recorded.

Source: Reproduced from Jobling and Gill (2004), © 2004 Macmillan Publishers Ltd., with permission.

3.3.2 Quantitative (Real Time) PCR

The Quantitative (Real Time) PCR technique is based on the PCR process and is designed to both quantify and amplify the targeted DNA. Two of the commonest means of quantification are the inclusion of a fluorescent dye (e.g. SYBR® Green) in the PCR reaction that intercalates with the DNA as it is produced and the TaqMan® assay (Figure 3.5). In the intercalation assay, the binding of SYBR Green to double‐stranded DNA brings about an increase in fluorescence. Therefore, the more amplicons are produced, the greater the fluorescence detected. Because the dye binds to any double‐stranded DNA molecule, the intercalation method is non‐specific (i.e. it will not distinguish between DNA molecules). In the TaqMan assay, oligonucleotide probes (TaqMan probes) are added that bind to a specific internal region of DNA between the forward and reverse PCR primers. The probes have a ‘reporter’ dye attached to their 5′ end and a ‘quencher’ dye attached to their 3′ end. When a high energy dye (reporter dye) is in close proximity to a low energy dye (quencher dye), there is a transfer of energy from the high energy dye to the low energy dye. This is what happens in the intact probe and it results in the fluorescence of the reporter dye being low or undetectable. When, during the PCR process, the Taq DNA polymerase replicates a template on which a TaqMan probe is bound, the enzyme (which has 5′‐nuclease activity) splits the probe, thereby separating the reporter dye and the quencher dye. Hence the fluorescence of the reporter dye increases, whilst the fluorescence of the quencher dye decreases. The method is extremely sensitive and can detect as little as a two‐fold increase in the level of a DNA sequence. Because custom‐designed primers are used, this method is more specific than the intercalation method but in both cases, with each cycle of the PCR process, more DNA is produced and this is measured as an increase in fluorescence.

The DNA product is therefore ‘quantified’ as it accumulates in ‘real time’ and hence the terms ‘real time’ and ‘quantitative’ PCR (various abbreviations are used including qPCR, RT‐PCR, and QRT‐PCR). Once sufficient DNA is produced, it can be sequenced or used for Southern Blotting.
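One way to see why a two‐fold difference is detectable is to ask at which cycle the accumulating product would cross a fixed detection threshold. The sketch below assumes perfect doubling in every cycle and treats fluorescence as proportional to the number of copies. The idea of a detection threshold, together with the function name and the numbers used, is introduced here purely for illustration and is not taken from the text.

```python
def threshold_cycle(start_copies, threshold_copies, max_cycles=40):
    """First cycle at which the doubling product reaches the threshold.

    Returns None if the threshold is not reached within max_cycles.
    """
    copies = start_copies
    for cycle in range(1, max_cycles + 1):
        copies *= 2  # idealised doubling of the target in each cycle
        if copies >= threshold_copies:
            return cycle
    return None


# Two hypothetical samples that differ two-fold in starting template
print(threshold_cycle(100, 1e9))
print(threshold_cycle(200, 1e9))
```

Under these assumptions, the sample with twice the starting template crosses the threshold one cycle earlier, so the difference in starting quantity appears as a shift of a single cycle.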

Figure 3.5 Diagrammatic representation of the TaqMan assay.

Source: Reproduced from Butler (2005), © 2005 Elsevier, with permission.

3.3.3 Pyrosequencing

Pyrosequencing identifies nucleotides through the generation of light. First a sequencing primer is hybridised onto a single‐stranded DNA template (e.g. a PCR‐amplified fragment of mitochondrial DNA coding for 16S rRNA). This is then incubated with the enzymes DNA polymerase, ATP sulphurylase, luciferase (which is chemiluminescent), and apyrase and the substrates adenosine 5′ phosphosulphate and luciferin. The reaction cascade is initiated by sequentially adding each of the four nucleotides (adenine, cytosine, guanine, and thymine) in their triphosphate form (adenine triphosphate, etc.). Let us say that the first nucleotide added is adenine triphosphate. If the first unpaired base of the template is thymine, then the adenine triphosphate binds to it and in the process releases its phosphate moiety as pyrophosphate (PPi). The enzyme ATP sulphurylase then combines the PPi with adenosine 5′ phosphosulphate to form ATP. The enzyme luciferase then uses this ATP to break down luciferin and in the process releases energy in the form of light, which is detected and measured. Finally, the enzyme apyrase breaks down any unbound nucleotides and ATP, thereby allowing the next nucleotide triphosphate to be added. Consequently, by sequentially adding nucleotides, one can identify the first unpaired base on the template, since light will only be produced when they complement one another. Therefore, pyrosequencing works by identifying one base at a time along the length of a strand of DNA and causing the release of light whenever a ‘match’ occurs. Obviously, as a means of discriminating between species or individuals, this technique is most reliable when there are many nucleotide differences between their DNA templates. Pyrosequencing is fast, accurate, and easily automated, so it can be used for large‐scale surveys. In addition, it is quantitative since the amount of light formed relates to the amount of nucleotide base binding to the DNA template.

However, according to Børsting and Morling (2015), the use of pyrosequencing for human DNA samples in forensic case work is limited by its ability to sequence only short lengths of DNA and the restricted capability of current machines to undertake multiplexing. Nevertheless, Bus et al. (2016) demonstrate that it is possible to undertake multiplex pyrosequencing of InDel (insertion/deletion) markers for individual identification.
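The dispensing cycle described in this section can be mimicked with a toy simulation. It is a sketch under stated assumptions: the enzyme cascade is reduced to a count of ‘light units’, one per nucleotide incorporated; strand orientation is ignored; and the dispensing order, the function name, and the example template are all hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pyrosequence(template, dispensation_order="ACGT", max_rounds=50):
    """Dispense one nucleotide at a time against a template strand.

    Returns (read, pyrogram): `read` is the strand synthesised so far and
    `pyrogram` lists (nucleotide, light_units) for every dispensation.
    """
    position = 0
    read = []
    pyrogram = []
    for _ in range(max_rounds):
        for nucleotide in dispensation_order:
            light = 0
            # Incorporate while the dispensed nucleotide complements
            # the next unpaired base of the template
            while (position < len(template)
                   and COMPLEMENT[template[position]] == nucleotide):
                light += 1
                position += 1
            pyrogram.append((nucleotide, light))
            read.append(nucleotide * light)
            if position == len(template):
                return "".join(read), pyrogram
    return "".join(read), pyrogram


read, pyrogram = pyrosequence("TTGCA")
print(read)
print(pyrogram)
```

For the template TTGCA, the first dispensation of adenine is incorporated twice and so yields two light units, which reflects the quantitative nature of the signal. A nucleotide that does not complement the next unpaired base yields none.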

3.3.4 Next Generation Sequencing

Next Generation Sequencing (NGS), also referred to as ‘massively parallel sequencing’, promises to revolutionise our approach to DNA analysis. This is because it enables the sequencing of whole genomes relatively quickly and cheaply. A single sequencing step therefore provides information on both short tandem repeat (STR) loci and single nucleotide polymorphisms (SNPs), and hence an indication of individual identity, externally visible characteristics (e.g. eye colour), and biogeographical ancestry. However, nuclear DNA and mitochondrial DNA are difficult to analyse simultaneously, owing to the large differences in the amount of DNA present in the nucleus and the mitochondria. Although NGS has many forensic applications, it poses serious problems of how the information should be stored and how the relevant data should be identified from among the huge amount of sequence data generated. For this reason, some workers propose moving to Cloud‐based storage and analysis (Celesti et al. 2017).

There are several NGS technologies that exhibit different strengths and weaknesses in terms of factors such as sample preparation, cost per run, number of reads per run, and the lengths of the reads. However, they all operate on the basis of sequencing millions of strands of DNA in parallel. For reviews, see Kuiper (2016), Levy and Myers (2016), and Yang et al. (2014). An alternative approach is to use shotgun metagenomics followed by NGS, in which the whole genomes of all DNA present in the sample are isolated, fragmented, and then sequenced (Jovel et al. 2016).

In 2019 several European forensic laboratories already employed NGS and its use will undoubtedly become more widespread in the future. In addition to the extra information gained from NGS, the availability of robotic platforms facilitates consistency in quality and reproducibility and increases sample throughput. NGS can be employed in many aspects of forensic science, but in the field of human identification, it must be capable of being used in conjunction with existing DNA databases. The more widespread adoption of NGS in forensic science will also require new agreed protocols for how NGS data should be obtained, stored, and analysed.

Although identical twins share the same STR DNA profile, NGS can reveal minor differences that arise through germline mutations. For example, in an experimental study, Weber‐Lehmann et al. (2014) used NGS to distinguish which identical twin was the true father of a child. The true father and the child shared five SNPs that were absent from the father's identical twin brother.

NGS is increasingly used for diagnosing genetic disorders and could be useful for identifying the cause of Sudden Unexplained Death Syndrome (SUDS). As its name suggests, SUDS manifests itself when an adolescent or young adult who was apparently otherwise healthy dies suddenly, often in their sleep, for no apparent reason. This inevitably results in suspicions that the person was the victim of a malicious act or had been taking drugs. SUDS is often associated with rare genetic disorders that cause abnormalities in heart function. Evidence of pathological changes is difficult to detect at autopsy and Brion et al. (2015) describe how NGS identifies genetic mutations linked to heart malfunction.

3.3.5 Fourth‐Generation Sequencing

Fourth‐generation sequencing (FGS) technology is being promoted as the natural successor to NGS (Feng et al. 2015). It is based on sequencing single molecules of DNA, RNA, or proteins by passing them through nanopore channels. The apparatus typically consists of two fluid filled chambers separated by a membrane within which is embedded a nanopore slightly larger than the molecule of interest. The specimen is placed in one of the chambers and the application of an electric current induces it to move through the nanopore to the opposite chamber. As a molecule moves through a pore it causes changes in the frequency, duration, and amplitude of the ionic current. Each base causes a different change in the current. Therefore, the sequence of nucleotides in a strand of DNA or RNA can be determined as it passes through a pore. Many pore types and membranes are available and they all have their advantages and disadvantages for different applications.

Some workers state that FGS has the potential to sequence a whole genome in less than 15 minutes for less than US $1000 (some claim it may ultimately drop to ~US $100). FGS technology offers many potential advantages in forensic science. In particular, it requires little sample preparation, removes the need for a PCR step, enables long lengths of DNA (10 000–50 000 bases) to be read in a single run, and can identify extremely low levels of DNA. Portable FGS machines that can be used in field situations, such as MinION made by Oxford Nanopore Technologies, are already available. However, at the time of writing, there remain many practical problems to overcome with regard to reproducibility. In part, this is because the molecules move so fast through the pores that it is difficult to record the base sequences faithfully.

3.4 Molecular Markers

There is a variety of molecular markers currently in use in forensic science to identify an individual, their gender, and their personal characteristics (Table 3.3). In addition, markers are also available to determine our health and probability of succumbing to certain diseases. This creates ethical issues about how much information we should seek to obtain and how data should be stored and shared between ‘interested parties’. The different markers vary in their effectiveness, susceptibility to degradation, and ease and cost of analysis. STR analysis forms the basis of most forensic DNA investigations and the other methods are utilised if these are more appropriate (e.g. the DNA is badly degraded) or more information is required because the suspect's STR profile is not on a DNA database.

Table 3.3 Advantages and disadvantages of the most commonly used methods of forensic DNA profiling.

Genetic marker: Autosomal STRs
  Advantages: Small sample size; slight DNA degradation not a problem; excellent discrimination
  Disadvantages: Discrimination seriously reduced if DNA badly degraded

Genetic marker: Autosomal SNPs
  Advantages: Extremely small sample size; discrimination possible with badly degraded DNA; tests for ancestry available
  Disadvantages: Discrimination power lower than for STRs; ability to distinguish mixed DNA profiles lower than STRs

Genetic marker: Y‐linked STRs
  Advantages: Male specific, therefore useful for mixed gender DNA samples; small sample size; slight DNA degradation not a problem
  Disadvantages: Low discrimination, especially among male relatives; discrimination seriously reduced if DNA badly degraded

Genetic marker: Y‐linked SNPs
  Advantages: Male specific, therefore useful for mixed gender DNA samples; very small sample size; slight DNA degradation not a problem
  Disadvantages: Low discrimination; difficult to distinguish individuals if DNA from more than two males present

Genetic marker: SINEs
  Advantages: Gender determination not affected by allele deletion; tests for racial origin possible
  Disadvantages: Limited number of studies to confirm effectiveness

Genetic marker: Mitochondrial DNA
  Advantages: Very small sample size; effective with even badly degraded DNA; heteroplasmy may facilitate identification
  Disadvantages: Lower discrimination than STRs; limited discrimination if individuals are maternally related; heteroplasmy may prevent accurate identification

3.4.1 Short Tandem Repeat Markers

Short Tandem Repeat Markers (STRs), also referred to as ‘microsatellites’ or ‘simple sequence repeats (SSRs)’, are brief lengths of the non‐coding region of the human genome consisting of fewer than 400 base pairs (hence ‘short’), in which there are 3–15 repeated units, each of 3–7 base pairs (hence ‘tandem repeats’). These STR sequences, or ‘markers’, can be divided into three categories: ‘simple’, ‘compound’, and ‘complex’. Simple STRs are those in which the repeats are identical in length and sequence. Compound STRs consist of two or more adjacent simple repeats, whilst complex STRs have several repeat blocks of different unit length and variable intervening sequences. STRs occur on all 22 pairs of autosomal chromosomes and the X and Y sex chromosomes.
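For a simple STR, the quantity of interest is the number of consecutive copies of the repeat unit. The following sketch counts the longest such run in a sequence. The motif GATA, the flanking bases, and the function name are illustrative assumptions, and compound or complex STRs would need a more elaborate description.

```python
def count_tandem_repeats(sequence, motif):
    """Return the longest run of consecutive copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence) - len(motif) + 1):
        run = 0
        pos = start
        # Step through the sequence one repeat unit at a time
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Hypothetical allele: five GATA units between two short flanking regions
allele = "CCTA" + "GATA" * 5 + "TTGC"
print(count_tandem_repeats(allele, "GATA"))
```

Two alleles at the same locus that differ in this count give PCR products of different lengths, which is what the electrophoresis step resolves.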

STRs vary between individuals, this diversity resulting from the effects of mutation, independent chromosomal variation and recombination. However, STRs found on the Y chromosome exhibit less diversity than those on other chromosomes, because they do not undergo recombination. Consequently, STR diversity on Y‐chromosomes results solely from mutation. Because we inherit half our chromosomes from our mother and half from our father, each STR locus comprises two DNA components (sequence lengths). Therefore, in a method based on 17 STR loci, there are 34 DNA sequence lengths.

The shortness of the STR markers means that only small amounts of DNA are required. The marker sequences are easily amplified using PCR and their shortness reduces the risk of differential amplification. Because PCR amplification occurs in a non‐linear manner, reproducibility is affected by stray impurities and the shorter the sequence, the less risk there is of this occurring.

Human STR testing kits include a marker at the amelogenin locus to enable sex determination. Amelogenin is a substance involved in the organisation and biomineralization of enamel in developing teeth. In humans, the gene is present on both sex chromosomes, but the copy on the X chromosome is six base pairs shorter than that on the Y chromosome. Consequently, following PCR and electrophoresis, males, being heterozygous (XY), produce two peaks (or bands), whilst females, being homozygous (XX), produce a single peak (Figure 3.4). The test is not totally reliable and problems arise if there is a deletion of the amelogenin gene on the Y chromosome, which is an important consideration in some ethnic groups, such as Malay and Indian populations (Chang et al. 2003).
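The logic of the amelogenin test, including its main weakness, can be sketched as follows. The product sizes of 106 and 112 base pairs are assumed values chosen only to show the six base pair difference, since actual sizes depend on the primers in the kit. The function name and result labels are likewise hypothetical.

```python
def infer_sex(peak_sizes_bp, x_size=106, y_size=112):
    """Interpret amelogenin peaks (sizes in base pairs).

    x_size and y_size are assumed, illustrative product sizes that
    differ by the six base pairs described in the text.
    """
    peaks = set(peak_sizes_bp)
    has_x = x_size in peaks
    has_y = y_size in peaks
    if has_x and has_y:
        return "male"
    if has_x:
        # A single X peak is the expected female result, but a male with
        # a deletion of Y amelogenin would look the same
        return "female (or male with Y-amelogenin deletion)"
    return "inconclusive"


print(infer_sex([106, 112]))
print(infer_sex([106]))
```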

There are over 2000 STR markers suitable for genetic mapping, but only a few of them are used routinely for forensic DNA profiling. Up until July 2014, the laboratories providing forensic science services in the UK used the SGMplus™ system (SGM+) that utilised 10 autosomal STR markers plus the amelogenin locus to enable sex determination. Subsequently, the DNA‐17 profiling methodology was adopted. This was partly a consequence of the need to standardise DNA profiling information throughout Europe. DNA‐17 multiplexes utilise the same 10 STR markers and amelogenin locus as those used by SGM+, but these are supplemented by 6 more STR loci.

The underlying chemistry of DNA‐17 is modified from that employed by SGM+ to improve PCR amplification and reduce the risk posed by inhibitory chemicals. For example, the DNA‐17 methodology incorporates several ‘mini‐STRs’, which were obtained by developing primers that generate a shortened PCR product of an STR marker. These STR markers are therefore identifiable from a shorter sequence of DNA, which gives a better chance of recovering a profile from degraded DNA. In addition, DNA‐17 exhibits better sensitivity than SGM+ through increasing the number of PCR cycles. This means that STR profiles can be generated from less than 250 picogrammes (250 × 10⁻¹² g) of DNA; levels this low are sometimes referred to as ‘low template DNA’. There are about 6–7 pg of DNA in a typical mammalian cell and some commentators state that, if pushed to its limits, DNA‐17 sequencing can generate an STR profile from as few as two cells. In addition, the latest instruments operate so fast that they can sequence a sample and provide results to compare against a DNA database within three hours.
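The quantities quoted above can be related by simple arithmetic. The sketch below takes 6.5 pg per cell, the midpoint of the 6–7 pg range given in the text, as an assumed working value.

```python
PG_PER_CELL = 6.5  # assumed midpoint of the 6-7 pg range quoted in the text


def cells_for_mass(mass_pg, pg_per_cell=PG_PER_CELL):
    """Approximate number of cells that contain `mass_pg` of DNA."""
    return mass_pg / pg_per_cell


def mass_for_cells(cells, pg_per_cell=PG_PER_CELL):
    """Approximate DNA mass (pg) contained in `cells` cells."""
    return cells * pg_per_cell


print(round(cells_for_mass(250)))  # cells corresponding to 250 pg
print(mass_for_cells(2))           # pg of DNA in two cells
```

On this basis, 250 pg corresponds to roughly 38 cells, whereas two cells contain only about 13 pg of DNA.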

In the UK, several manufacturers are accredited to make the DNA‐17 sequencing kits used by Forensic Service Providers. Although the kits all use the same STR markers, their profiling chemistries are not identical and involve different primers. In the vast majority of cases, this has no impact on the DNA profiles that are generated. However, in rare circumstances, two kits might yield different sequence lengths at a particular STR locus (e.g. 10, 10 cf. 10,11) or in one of them an STR locus may not be amplified (e.g. 10, 0). This is referred to as a ‘discordance’ or ‘discordant event’ and is caused by a ‘Primer Binding Site Mutation’ (PBSM). A PBSM can result in a primer being unable to bind or binding poorly at a particular locus. Therefore, in one kit, the presence of a PBSM at an STR marker means that the marker is either not amplified or it is poorly replicated, whilst in another kit, because the primer for that locus has a different sequence, it is able to bind and the STR locus is amplified faithfully. If it is suspected that a near match between two profiles is a consequence of one kit being affected by the presence of a PBSM, then one sample should be re‐analysed so that both profiles are generated using the same kit.
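A discordance of this kind can be found by comparing, locus by locus, the profiles that two kits produce from the same sample. The sketch below assumes each profile is held as a mapping from locus name to a pair of allele designations; the function name and the example data are hypothetical. A difference flagged in this way is only a candidate PBSM, since a genuine difference between donors would look the same.

```python
def find_discordances(profile_a, profile_b):
    """Return the loci at which two profiles report different alleles.

    Each profile maps a locus name to a tuple of allele designations.
    Only loci present in both profiles are compared.
    """
    discordant = {}
    for locus in sorted(set(profile_a) & set(profile_b)):
        alleles_a = tuple(sorted(profile_a[locus]))
        alleles_b = tuple(sorted(profile_b[locus]))
        if alleles_a != alleles_b:
            discordant[locus] = (alleles_a, alleles_b)
    return discordant


# Hypothetical results for one sample typed with two different kits.
kit_1 = {"D3S1358": (15, 16), "D16S539": (10, 10)}
kit_2 = {"D3S1358": (16, 15), "D16S539": (10, 11)}
print(find_discordances(kit_1, kit_2))
# {'D16S539': ((10, 10), (10, 11))}
```

Sorting the alleles before comparison means that the order in which a kit reports them does not create a false discordance.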

If all the STR markers in a DNA sample are identified, then it is said to yield a ‘full profile’. Saliva samples (or similar) taken under controlled conditions should always yield a full profile. Increasing the number of markers included in a DNA profile reduces the possibility of two unrelated DNA samples sharing the same full profile by chance – although this value is put at one in a thousand million (10⁹) for both SGM+ and DNA‐17. DNA samples taken from crime scenes are often degraded and only some of the STR markers are identifiable. The sample is then said to yield a ‘partial profile’ and therefore provides a ‘partial match’ to a profile held on a DNA database or donated by a suspect. It is then necessary to calculate the strength of this match (specific match probability) using the approved statistical tests. It should be noted that because it incorporates more STR markers, even a partial profile obtained using DNA‐17 might provide greater discriminating power than a full match using SGM+. Provided there are 19–32 matching loci, the match probability of the partial DNA‐17 profile with the full profile of the suspect would still be recorded as one in a thousand million.

The increase in the number of markers and improved sensitivity offered by DNA‐17 are useful in many forensic contexts. To begin with, because one must match 17 markers, it is easier to eliminate suspects from an enquiry. Even close relatives are highly unlikely to share the same profile, although monozygotic (identical) twins would not be distinguished. The sensitivity means that DNA‐17 can be used to investigate ‘cold cases’ from which it was not previously possible to extract DNA profiles. Similarly, criminals now often make efforts to avoid leaving DNA evidence, so only minute amounts may be present at the crime scene. There is, however, a downside, and the ability to identify profiles from both degraded and low template DNA means that DNA‐17 is much more likely to identify contaminant profiles from individuals who have nothing to do with the case being investigated. For example, a package found at a crime scene could reveal the DNA profiles of three people, but it cannot be assumed that all or any of them are relevant to the investigation. The question is now moving from ‘can one detect DNA’ to ‘can one identify the DNA that is relevant to the investigation’. It should also be noted that the ability to detect trace levels of DNA is influenced by factors such as the swab type used. Therefore, extreme care is needed when collecting, processing, and interpreting DNA evidence.

To facilitate data analysis, profile interpretation software can estimate the probability that a mixed profile is a consequence of two or more persons being involved or of contamination (Bright et al. 2015). Nevertheless, a competent defence lawyer would exploit any ambiguity in DNA profile interpretation to suggest that their client's DNA was present as a consequence of secondary transfer, or, alternatively, deposited before or after the criminal act took place.

In the USA, DNA profiling was initially based on 13 autosomal STR markers plus a gender marker. However, the CODIS Core Loci Working Group (CODIS = Combined DNA Index System) recommended that this should be increased to 20 autosomal STR markers and the amelogenin gender indicator (Table 3.4). Therefore, the number of autosomal markers shared by the European and US systems has doubled from 7 to 14. This means that it is easier to compare unknown profiles with those stored on both the UK and US DNA databases. This can be useful when investigating international criminal organisations and terrorism. However, it does require government approval and police forces cannot gain automatic access to the DNA database of another country.

Table 3.4 STR loci used in the SGM+, DNA‐17, CODIS, and expanded CODIS systems.

SGM+         DNA‐17       CODIS (13 autosomal loci)   CODIS (20 autosomal loci)
vWA          vWA          vWA                         vWA
D8S1179      D8S1179      D8S1179                     D8S1179
D21S11       D21S11       D21S11                      D21S11
D18S51       D18S51       D18S51                      D18S51
TH01         TH01         TH01                        TH01
FGA          FGA          FGA                         FGA
D3S1358      D3S1358      D3S1358                     D3S1358
D16S539      D16S539      D16S539                     D16S539
D2S1338      D2S1338      -                           D2S1338
D19S433      D19S433      -                           D19S433
-            SE33         -                           -
-            D2S441       -                           D2S441
-            D10S1248     -                           D10S1248
-            D22S1045     -                           D22S1045
-            D1S1656      -                           D1S1656
-            D12S391      -                           D12S391
-            -            D5S818                      D5S818
-            -            D13S317                     D13S317
-            -            D7S820                      D7S820
-            -            CSF1PO                      CSF1PO
-            -            TPOX                        TPOX
Amelogenin   Amelogenin   Amelogenin                  Amelogenin

A DNA profile report often takes the form of a table of alleles, and a hypothetical example obtained using the ProfilerPlus™ system is illustrated in Table 3.5. Only a restricted number of loci are shown. Currently, many more loci would be analysed, but the principle remains the same. The numbers identify the alleles present at each locus (for an STR, the number of repeat units). For example, in suspect 1, the locus D3S1358 is heterozygous and expresses alleles 15 and 16, whilst in suspect 2 the locus is homozygous and allele 16 is expressed on both chromosomes. Suspects 1 and 2 have different profiles to that found in the semen stain and are therefore classed as ‘exclusions’, whilst the profile of suspect 3 is the same as that found in the stain and is therefore ‘an inclusion’. The statistical frequency of that combination of alleles occurring in the population is then calculated by reference to a sample population. For example, for the locus VWA, if 8% of Englishmen expressed allele 15 and 21.6% expressed allele 16, the frequency of this pair of alleles would therefore be 2 × 0.08 × 0.216 = 0.0346, or 3.46% of the male English population. If the frequencies at all the loci are multiplied together, then the frequency estimate for the whole DNA profile will be extremely small – perhaps one in a hundred million or more. Obviously, a great deal depends on the sample population used to generate the allele frequencies and corrections may need to be made to allow for this. For example, certain allele combinations will be common among close relatives, sub‐populations or ethnic groups but might be rare in the population at large. Consequently, the frequency of certain DNA characteristics may be extremely low as a national average but common among family members or an ethnic group.
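The frequency calculation can be sketched in a few lines. The example below assumes Hardy–Weinberg proportions (2pq for a heterozygote, p² for a homozygote) and independent inheritance of the loci, so that the per-locus frequencies are combined by multiplication. The function names are illustrative, the VWA figures are those used in the text, and the second locus in the usage example is invented. No correction for relatedness or population substructure is applied.

```python
from math import prod


def genotype_frequency(allele_1, allele_2, freqs):
    """Expected genotype frequency at one locus under Hardy-Weinberg assumptions."""
    p = freqs[allele_1]
    if allele_1 == allele_2:
        return p * p                   # homozygote: p squared
    return 2 * p * freqs[allele_2]     # heterozygote: 2pq


def profile_frequency(profile, allele_freqs):
    """Multiply the per-locus genotype frequencies (assumes independent loci)."""
    return prod(
        genotype_frequency(a, b, allele_freqs[locus])
        for locus, (a, b) in profile.items()
    )


vwa_freqs = {15: 0.08, 16: 0.216}  # illustrative figures from the text
print(round(genotype_frequency(15, 16, vwa_freqs), 4))  # 0.0346

# Hypothetical two-locus profile; the FGA frequencies are invented.
freqs = {"VWA": vwa_freqs, "FGA": {19: 0.05, 25: 0.1}}
profile = {"VWA": (15, 16), "FGA": (19, 25)}
print(profile_frequency(profile, freqs))
```

Each additional locus multiplies the estimate by a number well below one, which is why the frequency of a complete profile becomes so small.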

Table 3.5 Table of alleles illustrating the hypothetical DNA profile of a semen stain and that of three suspects.

         Allele loci
Sample   D3S1358   VWA     FGA     D8S1179   D21S11   D18S51   D5S818   D13S317   D7S820   Amel
Stain    16,18     16,16   19,25   13,14     29,30    17,17    11,11    10,11     9,10     XY
Sus 1    15,16     16,16   19,25   13,14     29,30    14,17    11,11    10,11     9,10     XY
Sus 2    16,16     15,16   21,23   14,14     27,28    17,17    10,11    8,9       8,9      XY
Sus 3    16,18     16,16   19,25   13,14     29,30    17,17    11,11    10,11     9,10     XY

Amel = amelogenin test for gender; Sus = suspect number.
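The comparison summarised in Table 3.5 can be reproduced mechanically. The sketch below encodes the autosomal rows of the table (amelogenin is omitted because all four samples are XY) and classifies each suspect as an inclusion only if every locus agrees with the stain. The first locus is written D3S1358, as in the accompanying text, and the function names are illustrative.

```python
LOCI = ["D3S1358", "VWA", "FGA", "D8S1179", "D21S11",
        "D18S51", "D5S818", "D13S317", "D7S820"]


def parse_row(row):
    """Turn '16,18 16,16 ...' into {locus: (allele, allele)} with sorted alleles."""
    genotypes = [tuple(sorted(int(a) for a in cell.split(",")))
                 for cell in row.split()]
    return dict(zip(LOCI, genotypes))


def classify(stain, suspect):
    """Return 'inclusion' if every locus agrees with the stain, else 'exclusion'."""
    agrees = all(stain[locus] == suspect[locus] for locus in LOCI)
    return "inclusion" if agrees else "exclusion"


stain = parse_row("16,18 16,16 19,25 13,14 29,30 17,17 11,11 10,11 9,10")
suspects = {
    "Sus 1": parse_row("15,16 16,16 19,25 13,14 29,30 14,17 11,11 10,11 9,10"),
    "Sus 2": parse_row("16,16 15,16 21,23 14,14 27,28 17,17 10,11 8,9 8,9"),
    "Sus 3": parse_row("16,18 16,16 19,25 13,14 29,30 17,17 11,11 10,11 9,10"),
}
for name, profile in suspects.items():
    print(name, classify(stain, profile))
# Sus 1 exclusion
# Sus 2 exclusion
# Sus 3 inclusion
```

Suspect 1 differs from the stain at only two loci (D3S1358 and D18S51), which shows that a single confirmed difference is enough for an exclusion in this simplified scheme.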

DNAboost™ is a technique that was developed by the UK Forensic Science Service (FSS) in 2009 that applies computer analysis to DNA profiles. Following the demise of the FSS in 2012, the procedure was updated and re‐launched by the National DNA Database®. The programme facilitates the assembly of every potential allele combination within a sample and their comparison against SGM+ and DNA‐17 profiles held on the National DNA Database. This approach is useful when the sample contains DNA from two or more individuals that cannot be resolved or in which the DNA levels are low and/or degraded.
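The idea of assembling every potential allele combination can be illustrated for a single locus. The sketch below is not the DNAboost™ algorithm, whose details are not given here. It simply lists every pair of genotypes that would together account for exactly the alleles observed in a two-person mixture, and it ignores allele drop-out, drop-in, and peak heights. The function name is hypothetical.

```python
from itertools import combinations_with_replacement


def two_person_combinations(observed_alleles):
    """List genotype pairs that together explain exactly the observed alleles."""
    observed = set(observed_alleles)
    # Every genotype that can be built from the observed alleles.
    genotypes = list(combinations_with_replacement(sorted(observed), 2))
    pairs = []
    for g1, g2 in combinations_with_replacement(genotypes, 2):
        if set(g1) | set(g2) == observed:
            pairs.append((g1, g2))
    return pairs


# Three alleles observed at one locus in a mixed sample (hypothetical data).
for pair in two_person_combinations([14, 15, 16]):
    print(pair)
```

For three observed alleles this yields six candidate pairs of contributors, and for four observed alleles only three. Each candidate genotype could then be searched against a database, which is the kind of comparison the paragraph above describes.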

3.4.2 Y‐Short Tandem Repeat Markers

Over 200 STR markers are present on the human Y chromosome and commercial kits for use in forensic science are currently available for around 27 of them. They are useful in cases of rape and sexual assault, where there are mixed male and female DNA profiles and therefore separating the two is a major challenge. They are also used for paternity testing and familial searching (Kayser 2017).

Unlike conventional STR analysis, there is typically only one peak or band for each STR type in Y‐STR analysis and these can only originate from DNA from a male. In the case of multiple sexual assaults, more peaks are found, depending upon the number of men involved. The simultaneous detection of multiple Y‐STR loci produces additional genetic information without consuming additional DNA, and NGS technology permits STR, Y‐STR, and SNP markers to be identified in a single assay. A further advantage of Y‐STR analysis is that it enables DNA profiles to be made in cases of sexual assault in which the man did not produce sperm. This might be a consequence of a medical condition or vasectomy. In these circumstances, the absence of sperm means that only a small amount of male autosomal DNA is present and the female's autosomal DNA would swamp this. By specifically targeting the Y chromosome STR markers, one can focus on the minute amount of male DNA present.
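Because each man normally contributes a single allele per Y‐STR locus, the largest number of distinct alleles seen at any one locus gives a lower bound on the number of male contributors. The sketch below assumes single-copy loci and ignores stutter artefacts and mutation. The function name and the example alleles are hypothetical.

```python
def minimum_male_contributors(y_str_peaks):
    """Lower bound on male contributors: the largest allele count at any locus."""
    return max((len(set(alleles)) for alleles in y_str_peaks.values()), default=0)


# Hypothetical Y-STR results from a mixed sample.
mixture = {"DYS19": [14, 15], "DYS390": [24], "DYS391": [10, 11]}
print(minimum_male_contributors(mixture))  # 2
```

The result is a minimum only: men who share a paternal line, or who happen to share alleles at every locus examined, would not be distinguished.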

Y‐STRs are linked and inherited together as a haplotype. Therefore, they exhibit lower variability than autosomal STR markers and their discriminatory power is not as good. Unless a mutation occurs, all male relatives (sons, fathers, brothers, etc.) share the same profile. This is a problem when the suspect comes from an inbred population or a criminal family. Nevertheless, a subset of Y‐STRs has been identified that can distinguish fathers and sons in many but not all instances (Jobling and Tyler‐Smith 2017). In addition, like other DNA profiling techniques, the value of Y‐STR analysis can be as great in excluding suspects as identifying a culprit.

3.4.3 Mitochondrial DNA

Mitochondria are intracellular organelles that generate about 90% of the energy that cells need to survive. The numbers of mitochondria found in a human cell depend upon its energy needs and vary from zero in the mature red blood cell to over 1000 in a muscle cell. They are thought to descend from bacteria that evolved a symbiotic relationship with pre‐ or early eukaryotic cells many hundreds of millions of years ago. With time, the symbiotic relationship became permanent, but the legacy is reflected by present‐day mitochondria retaining their own bacterial‐type ribosomes and their own DNA (referred to as mtDNA) that is distinct from that found in the cell nucleus. Each mitochondrion contains between 2 and 10 copies of the mtDNA genome.

The inheritance of mtDNA also differs from nuclear DNA, in that it is inherited exclusively from the maternal side. This is because the sperm head is the only bit of a spermatozoon that enters the egg at the time of fertilisation. Usually, the spermatozoon's tail and the mid‐piece (which is the only bit containing mitochondria) shear off as the head enters the egg's perivitelline space. Occasionally, a few mid‐piece mitochondria are incorporated at fusion, but the egg subsequently destroys them. Consequently, the only mtDNA present in a developing embryo is derived from the egg. This means that mtDNA sequencing is an excellent means of identifying the correct mother or grandmother of a child in cases of disputed or unknown parenthood. A tragic instance of where this was necessary occurred in Argentina following the collapse of the military dictatorship. During the dictatorship there was a so‐called ‘Dirty War’ (Guerra Sucia) lasting from 1976 to 1983, in which thousands of men and women ‘disappeared’ (being thrown out of an aeroplane into the sea was reportedly common). Supporters of the dictatorship adopted many of the babies of those who were ‘disappeared’. Following the return of democratic government, it was possible to unite grandmothers with their grandchildren by using a combination of mtDNA and STR analysis (Penchaszadeh 1997).

Human mtDNA is a circular DNA molecule that contains 16 569 base pairs that code for 37 genes, that in turn code for the synthesis of 2 ribosomal RNAs, 22 transfer RNAs, and 13 proteins. Unlike nuclear DNA, the mitochondrial genome is extremely compact and about 93% of the DNA represents coding sequences. The remaining, non‐coding region is called the control region or displacement loop (D‐loop). The D‐loop region consists of about 1100 base pairs and exhibits a higher mutation rate than the coding region and about 5–10 times the rate of mutation within nuclear DNA. The mutations occur as substitutions, in which one nucleotide is replaced by another: the length of the loop region is not changed. The mutations result from mtDNA being exposed to high levels of mutagenic free oxygen radicals that are generated during the mitochondrion's energy generating oxidative phosphorylation process. The substitutions persist because mtDNA lacks the DNA repair mechanisms that are found in nuclear DNA. The mutations result in sequence differences between even closely related individuals and make analyses of the D‐loop region an effective means of identification. Because the mtDNA is inherited only from the mother, it also allows tracing of a direct genetic line. Furthermore, unlike the inheritance of nuclear DNA, there are no complications owing to recombination. Different mitochondrial lineages, called haplogroups, can be used to identify the course of human migrations. For example, in Europe there are currently said to be 10–12 major haplogroups. A phylogenetic tree of human mtDNA (PhyloTree) is available at www.phylotree.org that provides sequence information on the various human mitochondrial haplogroups. In 2015, it listed 5437 haplogroups (van Oven 2015).

Unfortunately, autosomal SNP analysis and haplogroup analysis often provide different predictions of ancestry from the same sample. Consequently, some workers consider mtDNA haplogroup analysis to be of limited use in forensic science as an indicator of ethnicity (Emery et al. 2015). Nevertheless, the advent of NGS means that it is now possible to sequence the whole mitochondrial genome and the extra information may increase the usefulness of the mtDNA analysis in forensic science. For example, Sturk‐Andreaggi et al. (2017) describe how an analysis tool called AQME (AFDIL‐QIAGEN mtDNA Expert) can identify haplogroups from NGS profiles.

The D‐loop contains two regions, together comprising about 610 base pairs, known as hypervariable region 1 (HV1) and hypervariable region 2 (HV2). It is these two regions that are normally examined in mtDNA analysis by PCR amplification using specific primers designed to base pair to their ends. This is then followed by DNA sequence analysis. Because of the high rate of substitutions, analysis of just these short regions is sufficient to differentiate between closely related sequences. Mitochondrial DNA varies by about 1–2.3% between unrelated individuals (Inman and Rudin 1997). Although mtDNA sequencing does not have the discriminating power of STR DNA profiling, it can prove effective where STR DNA analysis fails.

The mtDNA sequence of all the mitochondria in any one individual is usually identical – this is referred to as ‘homoplasmy’. However, in some people, differences in base sequences occur at one or more locations (Figure 3.6). These differences arise from an individual containing two or more genetically distinct types of mitochondria. This condition is known as ‘heteroplasmy’ and it can have a significant impact in forensic investigations (Lo et al. 2005). Heteroplasmy was once considered rare, but it is now believed to occur in 10–20% of the population. To make matters worse, heteroplasmy is not expressed to the same extent in all the tissues of the body. For example, two hairs from an individual might have different proportions of the base pairs contributing to the heteroplasmy and this might result in exclusion rather than a match (Linch et al. 2001). This is because heteroplasmy may result from the high mutation rate and may arise either through inheritance at the germ line level or during somatic cell mitosis and mtDNA replication.

Figure 3.6 Diagrammatic representation of (a) heteroplasmy and (b) homoplasmy at position 16093. In (a), the nucleotides cytosine (C) and thymine (T) are present at position 16093, whilst in (b), only thymine is found.

Source: Reproduced from Butler (2005), © 2005 Elsevier, with permission.

Mitochondrial DNA analysis is often used where the sample does not contain much nuclear DNA. Because there are numerous mitochondria in a single cell and each mitochondrion contains multiple copies of the mitochondrial genome, it is possible to extract far more mtDNA than nuclear DNA. Epithelial cells, which are the commonest cell type used in forensic casework, contain an average of 5000 molecules of mtDNA (Bogenhagen and Clayton 1974). Mitochondrial DNA analysis does, however, suffer from a number of problems. For example, all maternally related individuals are likely to have the same mtDNA sequences, so the discriminating powers are limited compared to autosomal STR analysis. Heteroplasmy is either a problem or a useful trait, depending on the circumstances. It creates problems because a mixed sequence is typical of where more than one individual contributed to the DNA profile. A difference of only one base pair between the mtDNA profile of the sample and the suspect is considered insufficient to prove either a match or exclusion, whilst a difference in two or more base pairs is grounds for exclusion. By contrast, heteroplasmy provides an identifying characteristic where the suspect expresses the same heteroplasmy characteristics as the sample.
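The interpretation rule described above (no differences, one difference, or two or more) can be written as a small function. The sketch assumes that the two sequences are already aligned and of equal length, which is reasonable where the differences are substitutions. It does not attempt to model heteroplasmy, and the function name, result labels, and example sequences are hypothetical.

```python
def compare_mtdna(sample_seq, suspect_seq):
    """Count base differences between two aligned sequences and apply the rule."""
    if len(sample_seq) != len(suspect_seq):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(sample_seq, suspect_seq) if a != b)
    if differences == 0:
        verdict = "consistent"      # the suspect cannot be excluded
    elif differences == 1:
        verdict = "inconclusive"    # neither a match nor an exclusion
    else:
        verdict = "exclusion"       # two or more differences
    return differences, verdict


# Invented 12-base fragments, for illustration only.
print(compare_mtdna("ACCTAGGATCCA", "ACCTAGGATCCA"))  # (0, 'consistent')
print(compare_mtdna("ACCTAGGATCCA", "ACCTAGGATTCA"))  # (1, 'inconclusive')
print(compare_mtdna("ACCTAGGATCCA", "ACTTAGGATTCA"))  # (2, 'exclusion')
```

In casework a ‘consistent’ result would still need to be weighed against how common that sequence is among maternally unrelated people.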

Mitochondrial DNA analysis detects differences in sequences and is therefore more time‐consuming and costly than STR analysis that detects differences in lengths. In addition, the rarity of mtDNA sequences must be determined by empirical studies and the results are less statistically reliable. Finally, owing to the high copy number per cell, there is a high risk of contamination and cross‐contamination associated with mtDNA sequencing.

3.4.4 RNA

Ribonucleic acid (RNA) is chemically similar to DNA, but is a single stranded molecule, has the sugar ribose (as opposed to deoxyribose) in its backbone, and incorporates the base uracil (as opposed to thymine). Humans, like all eukaryotic organisms, contain four types of RNA, each of which is produced in a process called transcription from information coded for in DNA. Each type of RNA has a specific function:

Messenger RNA (mRNA) carries the information that is necessary to make a particular protein from a gene on the DNA molecule to ribosomal RNA.

Ribosomal RNA (rRNA), as its name suggests, is found in the organelles known as ribosomes that are located in the cytoplasm. During translation, rRNA binds to transfer RNA and catalyses the formation of peptide bonds between amino acids in order to make proteins.

Transfer RNA (tRNA) molecules transfer specific amino acids to rRNA during protein synthesis and place them in the correct orientation on the mRNA.

MicroRNA (miRNA) molecules are small and consist of about 22 nucleotides. They do not code for anything and their function is to silence mRNA and regulate gene expression after transcription has taken place. They therefore reduce or stop the production of specific proteins.

The co‐extraction and analysis of DNA and RNA presents logistical problems. There are several commercially available kits for the extraction of RNA from forensic samples, but they all have their individual strengths and weaknesses (Grabmüller et al. 2015). Although of fundamental significance to normal cell function, RNA is not commonly employed as a forensic indicator. This is partly because RNA is less stable than DNA and it is rapidly broken down after death by a combination of autolysis and microbial decomposition. The rate of RNA degradation depends partly upon its location and it is possible that the extent of the degradation could be used as an indication of the PMI or the age of a bloodstain (Kim et al. 2017; Lech et al. 2016). So far, this work remains at the preliminary stages.

Because mRNA and miRNA are cell‐specific, they can indicate sex (van den Berge and Sijen 2017) and the provenance of body fluids (van den Berge et al. 2016b). The small size of miRNA molecules means that they are less vulnerable to degradation and therefore have potential as forensic indicators (Sauer et al. 2017; Sirker et al. 2017). The identification of body fluids is relevant where the investigator needs to distinguish between menstrual blood and that shed from a wound. This arises in cases of alleged sexual abuse/rape where bloodstains are found on clothing or bedding. These cases are especially problematic when the alleged abuse occurs within families, since the presence of a father's DNA on a daughter's duvet or underwear has little significance. Although all cells contain the same DNA, they produce different proteins, depending upon their function and therefore express different mRNA profiles. For menstrual blood, the enzymes matrix metalloproteinase 7 and matrix metalloproteinase 10 are effective markers (Juusola and Ballantyne 2007). These enzymes are expressed during menstruation when they are responsible for the breakdown of components of the extracellular matrix. This breakdown occurs as part of normal endometrial remodelling and the enzymes are absent during the early and mid‐secretory phases. It is worth remembering that individual differences and health factors can affect tests such as these, so any results should always be treated with care. For example, matrix metalloproteinases, including matrix metalloproteinase 7, are over‐expressed in certain cancer cells.

3.4.5 Single Nucleotide Polymorphism Markers

Single nucleotide polymorphisms (SNPs) arise from differences in a single base unit (Figure 3.7) and are the commonest form of genetic variation. They occur throughout the genome, including the X and Y sex chromosomes and mitochondrial DNA (Mehta et al. 2017). We all have our own distinctive pattern of SNPs and this, therefore, provides a means of identification. The stability of SNPs, compared to STRs, means that they are less likely to be lost between generations and they are sometimes used in paternity cases.

Figure 3.7 Diagrammatic representation of a single nucleotide polymorphism.

Mini‐sequencing enables the base at a given SNP to be determined. Once the bases at several sites at different loci are known, one can produce a profile similar to that of an STR profile. Using allele frequencies for each SNP, the likelihood of two persons sharing the same SNP profile can be estimated. Because the maximum number of alleles at each site is only 4 (A, C, G, or T), 50–100 SNPs must be examined to achieve the same discriminatory power as STR‐based profiling. However, microarray hybridisation allows numerous SNP loci to be examined simultaneously, and this speeds up the process. NGS technology offers the potential to identify simultaneously over 100 SNP loci and numerous STR loci in a single sequencing step. This will undoubtedly lead to a greater use of SNP analyses in criminal investigations. For example, SNaPshot® assays that provide evidence of personal appearance and ethnic origin can be conducted using NGS (see later).
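The estimate mentioned above can be sketched for an idealised case. The code below assumes that each SNP has only two alleles, that genotypes occur in Hardy–Weinberg proportions, and that the loci are inherited independently. None of these assumptions comes from the text and real panels will perform less well. The function names are illustrative.

```python
from math import prod


def locus_match_probability(p):
    """Chance that two unrelated people share a genotype at one two-allele SNP.

    `p` is the frequency of one allele; the other allele has frequency 1 - p.
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)


def profile_match_probability(allele_freqs):
    """Combine independent loci by multiplying the per-locus probabilities."""
    return prod(locus_match_probability(p) for p in allele_freqs)


print(locus_match_probability(0.5))           # 0.375
print(profile_match_probability([0.5] * 50))  # very small for 50 ideal SNPs
```

Even in the most favourable case, with both alleles equally common, a single SNP leaves a 37.5% chance of a coincidental match, which is why many loci must be combined.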

Because SNP analysis requires minute quantities of sample and the segment size can be even smaller than that needed for STR analysis, the technique provides information, even when the DNA is severely degraded. However, its effectiveness is compromised in mixed DNA samples, because it is difficult to distinguish which SNP belonged to which person. Furthermore, a quantitative test would be required in this context and this is not possible with some of the current SNP assays. There would be less of a problem if the mixture were composed of DNA from a single male and a single female, because Y‐linked SNPs could only originate from the male. However, if more than one male contributed to the DNA sample or the sexes were the same (e.g. male rape), their separation is difficult.

3.4.6 Mobile Element Insertion Polymorphisms

Mobile elements are DNA sequences that can change their position within the genome. Several different types of mobile element exist, but it is the short interspersed elements (SINEs) and in particular the Alu elements, that are most commonly used in forensic analyses. Alu elements are about 300 nucleotides long and most are fixed at a particular locus. However, a few subfamilies are polymorphic for insertion presence/absence and can be used to determine genetic relationships between populations and paternity. For example, the Innotyper® 21 genotyping kit utilises 20 Alu markers and the amelogenin marker. This kit has been used to distinguish ethnicity from among various ethnic groups in South Africa (Ristow et al. 2017) and South America (Moura‐Neto et al. 2017). Mobile elements are also employed in the Innoquant® kit that is used to determine the quantity and quality of the DNA in a forensic sample (Pineda et al. 2014). This enables samples to be screened before they are subjected to STR analysis to identify those that are unlikely to yield a profile. The kit utilises two differently sized elements: the Yb8 Alu element that is 80 bp and the intra SVA element that is 207 bp. There are numerous copies of both elements within the genome and therefore they are readily detected. The amount of Yb8 Alu is used to quantify the amount of DNA present, the amount of intra SVA provides a measure of amplifiable DNA, and the ratio of the two provides an indication of the extent to which the DNA has degraded.
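The use of two differently sized targets can be sketched as a simple ratio. In the example below the direction of the ratio (short target divided by long target) and the units are assumptions, and no interpretation thresholds are given because none are stated in the text. The function name is hypothetical.

```python
def degradation_ratio(short_target_ng_per_ul, long_target_ng_per_ul):
    """Ratio of the short-target to the long-target DNA estimate.

    A value near 1 suggests largely intact DNA; larger values suggest that
    long fragments have been lost relative to short ones.
    """
    if long_target_ng_per_ul <= 0:
        return float("inf")  # no long target detected
    return short_target_ng_per_ul / long_target_ng_per_ul


print(degradation_ratio(1.0, 0.25))  # 4.0
```

The short target still reports the total quantity of DNA, whilst the ratio indicates how much of it is likely to amplify at longer STR loci.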

3.5 DNA Databases

The UK National DNA Database (NDNAD) (Table 3.7) was established in April 1995, following a recommendation from the Royal Commission on Criminal Justice in 1993. Scotland and Northern Ireland have their own DNA databases, but export profiles to the NDNAD. The National DNA Database Strategy Board, which contains representatives from a wide range of ‘interested parties’, provides oversight and governance of how the NDNAD operates. The Board includes ‘representatives of the National Police Chief's Council, the Home Office, the DNA Ethics Group, the Association of Police and Crime Commissioners, the Forensic Science Regulator, the Information Commissioner's Office, the Biometrics Commissioner, representatives from the police and devolved administrations of Scotland and Northern Ireland and such other members as might be invited’. The intention is to ensure that the information on the DNA database is kept secure and not abused.

Table 3.7 Information stored on individuals on the NDNAD.

  1. Name, gender, date of birth, and ethnic appearance as described by the arresting officer – this latter information can be of dubious authenticity.
  2. Type of DNA sample used (e.g. mouth swab, blood, hair, semen)
  3. Type of DNA test employed (e.g. DNA‐17, SGMplus™)
  4. DNA profile: in the case of STR analysis, this would consist of a string of two‐digit numbers and the amelogenin sex indicator
  5. Data on the police force that collected the sample
  6. Arrest summons number: this provides a link to the Police National Computer, which stores criminal record and police intelligence information
  7. A unique bar code that identifies the record and provides a link to the stored DNA sample
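The items in Table 3.7 map naturally onto a simple record structure. The sketch below is a hypothetical illustration of how such a record might be represented. The class name, field names, and example values are invented and do not describe the actual NDNAD schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatabaseRecord:
    """Hypothetical container mirroring the items listed in Table 3.7."""
    name: str
    gender: str
    date_of_birth: date
    ethnic_appearance: str       # as described by the arresting officer
    sample_type: str             # e.g. "mouth swab"
    test_type: str               # e.g. "DNA-17"
    profile: dict                # locus -> allele pair; amelogenin as "X"/"Y"
    police_force: str
    arrest_summons_number: str   # link to the Police National Computer
    barcode: str                 # link to the stored DNA sample


record = DatabaseRecord(
    name="A. N. Other",
    gender="male",
    date_of_birth=date(1980, 1, 1),
    ethnic_appearance="not recorded",
    sample_type="mouth swab",
    test_type="DNA-17",
    profile={"D3S1358": (15, 16), "Amelogenin": ("X", "Y")},
    police_force="Example Constabulary",
    arrest_summons_number="00/0000/00/000000A",
    barcode="00000000",
)
print(record.test_type, record.profile["D3S1358"])
```

Note that the profile itself is only a short set of numbers. The personal details that make the record sensitive sit alongside it.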

The NDNAD was the first national DNA database to be established. Since then, over 60 countries have brought in their own DNA databases. All of these databases utilise STR profiles, although there is some variation in the nature and number of loci employed.

Different countries have their own rules concerning the collection and storage of DNA and fingerprint evidence. In the UK, the rules were substantially changed following the introduction of the Protection of Freedoms Act 2012: DNA and fingerprint provisions. Currently, a DNA sample collected from a suspect or a victim as part of an investigation (e.g. a saliva sample) can be stored for only six months, after which it must be destroyed. This allows time for the DNA to be analysed and the sequence uploaded onto the NDNAD, but prevents samples being databanked. Evidence collected from a crime scene, such as a knife stained with blood, can be retained indefinitely. This has proved extremely useful in cold case reviews. For example, in 1982, Yiannoulla Yianni was raped and murdered in her home in Camden, North London, but the investigation was unable to make any progress. The breakthrough came in December 2015, when James Warnock was charged with distributing indecent images of children from his computer and consequently a routine DNA sample was taken from him and the profile added to the NDNAD. It was then automatically checked against profiles obtained from criminal investigations. DNA evidence did not form part of the original investigation, because the technology for identifying individuals was not yet developed. However, semen samples had been retained from stained bedsheets and an STR profile was subsequently made as part of a cold case review. However, it had to wait until Warnock's DNA was added to the NDNAD for the link to be made. In 2016, 34 years after the offence, he was brought to trial, found guilty, and jailed for life.

The retention times of DNA profiles and fingerprints are the same and depend upon the age of the person and whether they are charged but not convicted or charged and convicted of a recordable offence. Recordable offences are so‐named because they are those for which the police are required to keep a record. Overall, they are serious offences that could result in a custodial sentence being imposed if found guilty. However, the list of recordable offences also includes those that do not carry a prison sentence, such as begging. Profiles and prints can be retained indefinitely if a person of any age is found guilty of a recordable offence, regardless of whether they are sent to prison or given a caution. If an adult is convicted of a minor recordable offence, then his/her profile and prints can be retained indefinitely. For a person aged under 18 years of age, convicted of a minor recordable offence, the retention period depends upon whether it is a first or second offence. For a first offender, the retention period is five years, but for a repeat offender, it is indefinite. If a person is not convicted, the retention period depends upon the nature of the charge and varies from a maximum of five years (three years plus a two‐year extension if granted by a District Judge) to being removed from the database. Profiles and prints collected as part of a speculative search (e.g. all the adult males living at an address) can only be retained if they provide a match to a criminal offence. In this case, their retention depends upon the nature of the crime and whether a conviction is obtained.

Removing DNA profiles from a DNA database is far more difficult than adding them. Despite the legislation to facilitate the removal of DNA profiles of innocent people and those charged with minor offences and highly unlikely to re‐offend, in 2016 the NDNAD contained the profiles of 12.5% of all men and 3% of all women in the UK. By contrast, the law enforcement agencies of some countries actively attempt to obtain DNA profiles of as many people as possible, regardless of whether they engaged in criminal activity. For example, on 2 July 2015, Kuwait's National Assembly passed Law number 78/2015 that requires all citizens, residents, and visitors to provide a DNA sample for inclusion in the police DNA database. This approach to populating DNA databases is often justified on the grounds that ‘if you haven't done anything wrong, you have nothing to fear’. This indicates a rather naïve understanding of the justice system, since innocent people are sometimes convicted of crimes they did not commit. Similarly, being the subject of a police investigation because your DNA profile is similar to the partial profile recovered from, say, a terrorist bomb, is an extremely stressful experience and can lead to family breakdown and financial loss, even if never charged. In addition, what counts as ‘wrong’ varies with time and between societies. We cannot predict how attitudes will change with time. For example, in the UK it is not long ago that homosexuality was a criminal offence, it was virtually impossible for a wife to prosecute her husband for rape, foxhunting was a popular country pursuit, a child could be caned in school for misbehaviour, and smoking was permitted on aeroplanes.

People opposed to DNA databases argue that the information could be used to identify personal characteristics such as risk of contracting cancer or a degenerative disease. These could then be made available to third parties, such as employers or insurance companies. At present, DNA profiles based on STR loci consist of a string of numbers that cannot provide meaningful information on personal characteristics. However, the adoption of NGS technology could lead to a person's whole genome, or a substantial proportion of it, being stored. This would then be available for future analysis. It is perfectly feasible that within a few years it will be possible to determine a great deal about an individual's appearance, character, and susceptibility to mental illness and disease from a DNA sample.

3.5.1 Who Should Have Access to the Data Stored on a DNA Database?

Access to information on a DNA database would be of benefit to many official national and international organisations. At one level, it could be argued that the information should be made available in any situation in which a crime was committed or could be committed in the future. For example, investigations into people trafficking, drug smuggling, and international terrorism could all benefit from greater cooperation between countries and security organisations. However, the more people who have access to information, the greater the opportunity for both mistakes and misuse to occur. Currently, the number of people who have access to the NDNAD is carefully controlled but this cannot be guaranteed for the future. In particular, an obvious development aim is to design hand‐held or mobile devices that would enable police to generate DNA profiles from a crime scene or an individual and upload them onto the NDNAD for direct comparison. This would inevitably require more people to gain access and this in turn would facilitate the planting of evidence and the illegal downloading of, or tampering with, stored information. For example, persons on witness protection schemes are sometimes provided with a new name and identity, but they cannot be provided with a new DNA profile.

3.6 Confounding Factors in DNA Analysis

For a successful criminal prosecution, it is often necessary to obtain reliable DNA evidence. It is therefore important to be aware of factors that limit its effectiveness. There are two basic problems to overcome: the first is to obtain a DNA profile and the second is to be sure that if a profile is obtained, it is relevant to the investigations. Although it is now possible to obtain DNA profiles from extremely small amounts of DNA, if the sample is severely degraded, it may yield limited information or none at all. The presence of inhibitors can also interfere with the extraction and sequencing of DNA, although recent advances have made this less of a problem than it was in the past. By contrast, the sensitivity of current DNA sequencing technologies means that distinguishing relevant and non‐relevant profiles resulting from contamination is becoming a serious issue.

3.6.1 DNA Degradation

After we die, the DNA within our tissues and body fluids starts to decay, unless these are rapidly dried or frozen. The speed of DNA degradation depends upon the taphonomic conditions, but currently it is impossible to estimate accurately the PMI from DNA analysis. In addition to natural decay, exposure to high temperatures and chemical treatment compromises the extraction of DNA. For example, whilst it is sometimes possible to obtain DNA from improvised explosive devices after detonation or from used bullet casings, the profiles are often incomplete. Similarly, an item covered in bleach or thrown into a river may not yield DNA. The degradation of DNA proceeds uniformly across the genome and there are no regions that are noticeably more resistant to degradation than others (Hanssen et al. 2017).

Evidence of DNA degradation is often exhibited by a progressive decline in peak height with increasing sequence length. This is because longer sequences are more vulnerable to the effects of degradation. If the profile contains DNA from more than one individual, this problem can be exacerbated, because the two (or more) DNA samples may not degrade at the same rate or in an identical manner. If one of a pair of alleles fails to be recorded, this is referred to as ‘allele dropout’. Consequently, a heterozygous individual may appear homozygous at one or more gene loci. Allele dropouts are often a feature of low levels of DNA (100–125 pg DNA) in the sample. Similarly, an additional allele may be observed – this is referred to as ‘allele drop‐in’. Allele drop‐in results from contamination and becomes obvious when the allele does not appear in repeated independent PCR reactions. There is an increased chance of allele drop‐ins occurring when the number of PCR cycles is increased.
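The idea that a drop‐in allele 'does not appear in repeated independent PCR reactions' can be illustrated with a simple consensus rule. The following is a minimal sketch only: the allele calls are hypothetical, and the threshold of two replicates (`min_count`) is an assumption chosen for illustration rather than a recommended laboratory standard.

```python
from collections import Counter

def consensus_alleles(replicates, min_count=2):
    """Return alleles seen in at least `min_count` replicate PCRs at one locus."""
    counts = Counter(allele for rep in replicates for allele in set(rep))
    return sorted(a for a, n in counts.items() if n >= min_count)

# Hypothetical allele calls at a single STR locus from three independent PCRs.
replicates = [
    {14, 17},      # both alleles detected
    {14},          # allele 17 missing: possible allele dropout
    {14, 17, 19},  # allele 19 seen once only: possible allele drop-in
]

confirmed = consensus_alleles(replicates)
unconfirmed = sorted(set().union(*replicates) - set(confirmed))

print("Reproducible alleles:", confirmed)
print("Not reproduced (possible drop-in):", unconfirmed)
```

Note that a single replicate taken alone (the second one) would make this heterozygous sample appear homozygous, which is the dropout problem described above.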

3.6.2 DNA Transfer

If a DNA profile is the main or sole source of incriminating evidence, then it will be subject to robust questioning during court proceedings. When that profile is obtained from low template DNA, then the intensity of the questioning is likely to increase. This is because our DNA can be transferred by both animate and inanimate means onto objects we never physically came into contact with (Meakin and Jamieson 2013).

There are two basic mechanisms by which DNA is transferred: ‘direct’ and ‘indirect’. Direct transfer relates to DNA that is deposited on an object when we touch it, or when we speak, cough, or sneeze in its vicinity. That is, the transfer involves only the donor and the object receiving the DNA. Indirect transfer relates to DNA that is transferred from the donor onto an object through the action of an intermediary. For example, person A shakes the hand of person B and person B then unintentionally transfers person A's DNA onto a knife that he uses to kill someone. Person A's DNA is then detected on the knife and he is implicated in the crime. Needless to say, there are numerous ways in which DNA can become accidentally or intentionally transferred. For example, DNA can be transferred from blood‐ or semen‐stained clothing onto previously ‘clean’ clothing when they are machine or hand washed together (Voskoboinik et al. 2017). This needs to be considered when investigating criminal offences that take place within a family setting or among people who live together. It is even possible for DNA to transfer from the outside of a police evidence bag to the exhibit inside when it is examined (Fonneløp et al. 2016).

There is uncertainty concerning the extent to which people differ in their propensity to shed epithelial DNA from their hands (Taylor et al. 2016). In addition, the amount of DNA we transfer onto objects we touch is affected by how frequently we wash our hands and our commitment to personal hygiene – such as our propensity to pick our nose, wiggle a finger in an ear, lick our fingers after eating, and other even less savoury practices. It will therefore probably not come as a surprise to some that men tend to leave more cellular DNA on the objects they touch than women (Lacerenza et al. 2016). The amount of touch DNA recovered will also depend upon the physical and chemical nature of the object we touch and the environmental conditions and length of time between deposition and sampling.

As a general rule, the dominant DNA profile recovered from an object is probably going to be the one that is most important. However, as always in forensic science, it is essential to keep an open mind. For example, several studies demonstrate that the last person to handle an object may not be the one to yield the dominant DNA profile (e.g. van den Berge et al. 2016a). However, there are contradictory reports concerning how frequently this occurs in real‐life scenarios. In simulation studies, Pfeifer and Wiegand (2017) found that when two people used a range of hand‐held tools, one after the other, the first user's DNA was either not detected or only found as a minor component. However, if the second user wore gloves, then the first user's DNA was found in 37% of tests. Gloves are a good way of transferring DNA between objects. For this reason, crime scene investigators should change gloves between handling objects. Similarly, DNA can be transferred via fingerprint brushes and therefore these should not be used more than once if it is intended to extract DNA from the revealed prints.

There are numerous studies on the potential for DNA transfer under a range of conditions (e.g. Fonneløp et al. 2017). However, in an understandable attempt to control the number of variables, many of them use clean implements or clothing. In real life, firearms, tools, phones, knives, clothing, etc. are seldom clean. In addition, numerous people often handle them before the person who committed the crime picked them up. This not only complicates the analysis, but could also lead to the arrest of someone who has a criminal record (or is simply on a DNA database) but had nothing to do with the crime. Alternatively, the person responsible for a crime could argue that the low levels of their DNA recovered from a crime scene resulted from contamination or were planted by the police or somebody wishing to frame them. Taylor et al. (2017) provide a statistical approach to identifying transfer DNA.

3.7 Evidence from Molecular Markers

3.7.1 DNA Profiling

The process of sequencing an individual's DNA is known as ‘DNA profiling’ or ‘DNA typing’. Human DNA profiles are stored on computer databases – in the UK, this is the NDNAD, whilst in America it is the Combined DNA Index System (CODIS). Although the term ‘DNA fingerprinting’ is sometimes used, it is not really appropriate. This is because the courts and the scientific community accept fingerprints as unique identifying features. For example, identical twins do not have the same fingerprints, even though they share the same DNA profile. Consequently, experts presenting DNA‐based evidence in court talk about probabilities of a match between two samples rather than stating ‘yes, they match’ or ‘no, they do not match’. The possibility that ‘an evil twin brother’ committed the crime does not belong solely in the realms of fiction. Several cases have arisen in the UK, America, and elsewhere, in which DNA recovered from a crime scene could have been derived from either of two brothers. Often, these can be resolved by other evidence, such as fingerprints or one of the twins being in prison at the time of the offence. However, in the absence of such evidence and if the twins fail to cooperate with the police, it is an extremely difficult situation for the prosecution to resolve. It also indicates that DNA databases need to be updated so as to keep track of where twin matches could occur. The number of women who give birth to twins (or more) is increasing, although there are notable variations between countries and racial groups. Part of the increase is a consequence of in vitro fertilisation (IVF): about 1% of women conceive twins (or more) naturally, whilst for women who have IVF this figure increases to 24%. In the UK, 10 500–11 000 women per annum have multiple births. About one‐third of twins are identical (i.e. monozygotic), and this means that a large number of people in the population would be judged to have very similar (fraternal twins) or identical (identical twins) DNA profiles using current STR markers. In 2016, the NDNAD contained the profiles of over 8000 identical twins and 10 identical triplets.

If the DNA profile of a suspect differs significantly from that retrieved from the evidence, it is said to be an ‘exclusion’. This is a useful result because, in conjunction with other evidence, it enables police to rule a suspect out of their enquiries or a convicted person to prove his or her innocence. If the interpretation of the DNA profile presents problems, such as owing to contamination or DNA degradation, the results are said to be ‘inconclusive’. This means that further evidence must be sought before a suspect can be either excluded or confirmed as being relevant to the enquiries. If the profiles are deemed to match, then the suspect's profile is called ‘an inclusion’ and the significance of the match has to be calculated by quantifying the ‘random match probability (Pm)’. This is a statistical test that estimates the chance of two unrelated people sharing the same DNA profile. Where the markers are not linked (e.g. autosomal STRs), and are therefore inherited independently, the Pm can be estimated by multiplying the individual allele frequencies in a sample population. Consequently, the more loci that are included in the analysis and the greater the heterozygosity of each of these loci, then the smaller will be the value of the Pm and the greater will be the probability that the two profiles originated from the same person. Although the procedure for comparing profiles is based upon statistical analyses, Jeanguenat et al. (2017) argue that there remains an element of subjectivity and therefore the potential for bias which needs to be taken account of in the decision‐making process. Similarly, it is preferable that those who undertake the DNA sequencing that generates the profiles are not informed of the case that is being investigated.
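The multiplication of frequencies across unlinked loci can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions, not a casework method: it assumes Hardy–Weinberg proportions (a homozygote has frequency p², a heterozygote 2pq), independence between loci, and entirely hypothetical allele frequencies. The function names are invented for this example.

```python
def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p*p if homozygous, 2*p*q if heterozygous."""
    return p * p if q is None else 2 * p * q

def random_match_probability(profile):
    """Multiply genotype frequencies across loci assumed to be independent."""
    pm = 1.0
    for allele_freqs in profile:
        pm *= genotype_frequency(*allele_freqs)
    return pm

# Hypothetical three-locus profile; each tuple holds the population
# frequency of the allele(s) observed at that locus.
profile = [
    (0.10, 0.20),  # heterozygote: 2 * 0.10 * 0.20 = 0.04
    (0.25,),       # homozygote:   0.25 * 0.25     = 0.0625
    (0.05, 0.20),  # heterozygote: 2 * 0.05 * 0.20 = 0.02
]

pm = random_match_probability(profile)
print(f"Pm is roughly 1 in {round(1 / pm):,}")

# Adding a fourth locus can only shrink Pm further.
pm_four = random_match_probability(profile + [(0.15, 0.30)])
print(f"With a fourth locus: roughly 1 in {round(1 / pm_four):,}")
```

Even with only three loci and fairly common alleles, the value falls quickly, which is why profiles based on many loci give such small Pm values.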

Despite the high discriminating power of DNA‐based evidence, in the absence of other corroborating evidence, it is unlikely to be accepted by a court as proof that a person was (or was not) responsible for a particular crime. The Pm value can be compromised by several factors that need to be taken into account. First, if the DNA has begun to degrade, it may not provide a full profile. Second, both the victim of a crime and the suspect may be closely related (many serious crimes are committed by people related to their victim) and therefore share several alleles by descent. Similarly, they may both originate from the same sub‐population, some of which are characterised by high levels of inter‐marriage. Discriminating between the remains of closely related people who die in natural or manmade disasters can sometimes be problematic for the same reasons.

3.7.2 The Prosecutor's Fallacy and the Defence Attorney's Fallacy

A fallacy occurs when one's reasoning relies upon a misunderstanding or misrepresentation of the facts, and the argument one is presenting therefore becomes invalid. The terms ‘prosecutor's fallacy’ and ‘defence attorney's fallacy’ were originally coined by Thompson and Schumann (1987) and relate to the inappropriate presentation of statistical evidence in court proceedings. The terms are often used in American literature and although they are relevant to the presentation of any type of evidence in which numerical data are involved, they are especially appropriate to the presentation of molecular evidence. The prosecutor's fallacy occurs when the prosecution attempts to prove a suspect's guilt solely on the basis of associative evidence. Associative evidence is that which involves matching material from two or more sources. For example, it might be fibres retrieved from a suspect and those found on his victim, DNA profiles from a semen sample and a suspect, or soil samples found on a shoe and those found at a crime scene. To illustrate the fallacy, let us assume that a match is found between the DNA profile of a suspect and that recovered from a bloodstain found on a windowsill following a burglary. It would be incorrect for the prosecution to claim that ‘the DNA profile recovered from the windowsill indicates there is only a 1 in 10 million chance that the suspect is innocent’. This is wrong, because the prosecutor is assuming the guilt of the suspect and ignoring all other evidence. For example, the suspect might have been in hospital undergoing heart surgery at the time of the offence. Even assuming that the suspect had the opportunity to commit the offence, the prosecutor is assuming that because the probability of finding matching DNA profiles if the suspect is innocent is small, then it follows that the probability of the suspect being innocent is also small.

The defence attorney's fallacy is the mirror image of the prosecutor's fallacy and assumes that associative evidence is not relevant. For example, it arises when the defence argues that if the frequency of a particular DNA profile is one‐in‐a‐million and the UK has a population of 60 million, then there are at least 59 people other than the suspect who might have committed the crime. This assumes that all 60 had an equal likelihood of leaving the DNA profile found at the crime scene. This is clearly impossible because of those 59, some will be babies, others might be disabled or in nursing homes, and most would be many miles from the crime scene.

Current STR DNA profiling techniques result in extremely low Pm values and therefore the statistical probability that a person chosen at random might have a particular profile becomes vanishingly small. This makes the prosecutor's fallacy an extremely easy trap to fall into.
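The difference between the two fallacies can be made concrete with Bayes' theorem, using the one‐in‐a‐million profile frequency and the population of 60 million quoted above. The sketch below is illustrative only. It assumes that the true source always matches, that Pm is the only source of coincidental matches, and that the 'prior' values (the probability that the suspect is the source before the DNA is considered) are invented for the purpose of the example.

```python
from fractions import Fraction

def expected_matches(population, profile_frequency):
    """Expected number of people in the population who carry the profile."""
    return population * profile_frequency

def posterior_source_probability(prior, pm):
    """P(suspect is the source | profiles match), by Bayes' theorem.

    Assumes the true source matches with probability 1 and a non-source
    matches by coincidence with probability pm.
    """
    return prior / (prior + (1 - prior) * pm)

population = 60_000_000
pm = Fraction(1, 1_000_000)

# Defence framing: about 60 people in the UK would match.
matches = expected_matches(population, pm)

# If every member of the population were an equally likely source (the
# defence's implicit assumption), the suspect is very unlikely to be it.
uniform = posterior_source_probability(Fraction(1, population), pm)

# If other evidence (opportunity, location) had already narrowed the
# field to a hypothetical 100 plausible people, the picture reverses.
narrowed = posterior_source_probability(Fraction(1, 100), pm)

print("Expected matching people:", matches)
print(f"Posterior with uniform prior:  {float(uniform):.4f}")
print(f"Posterior with narrowed prior: {float(narrowed):.6f}")
```

Neither posterior equals Pm or 1 minus Pm. The probability that the suspect is the source depends on the other evidence in the case, which is precisely what both fallacies ignore.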

3.7.3 Forensic Applications of DNA Profiling

DNA profiling to identify assailants in serious crimes such as rape and homicide gains the most attention in the media. However, it is routinely used in minor crimes such as burglary and car theft. Indeed, one of the principal purposes of setting up the NDNAD was to facilitate the solving of so‐called volume crimes. DNA profiling is also used for paternity testing and is sometimes used to establish family relationships among people claiming visas and/or asylum, although in the latter instance it needs to be used with extreme care (e.g. Karlsson and Holmlund 2007). DNA profiling is used in insurance company fraud investigations, where there is a dispute over who the driver was in a traffic accident. For example, when a car is in an accident and the airbags are activated, they deliver a powerful blow. This, coupled with the forces from the accident, means that the bags become contaminated with saliva, nasal mucus, vomit and, not infrequently, blood. Consequently, DNA profiles obtained from the bags can demonstrate who was sitting where at the time of the accident. Similarly, where there are claims that food was contaminated with blood or other body fluids, it is not unusual for these to prove to have originated from the complainant. The disaffected employee responsible for intentionally contaminating food with spit, urine, and other body fluids during processing in the factory or kitchen can also be identified by DNA profiling.

DNA profiling is not restricted to solving crimes involving human victims and it is used in investigations ranging from badger baiting to bioterrorism (Table 3.8). For example, Lorenzini (2005) relates an interesting case in which it was used to solve a case of wildlife poaching. A poacher had snared a wild boar in an Italian National Park and then killed it with a knife, leaving the corpse under a bush so that he could collect it after nightfall. Whilst he was away, conservation officers found the body and attempted to arrest the poacher when he returned. However, the poacher claimed that the boar was already dead when he found it and they had to let him go – though they confiscated the boar's body. A post‐mortem on the boar indicated that the knife wound was inflicted when the animal was still alive and a search of the poacher's home yielded a bloodstained homemade knife. DNA analysis indicated that the blood originated from a wild boar rather than a domestic pig and that the DNA profile matched that of the dead boar. As a result, the suspect was successfully prosecuted for animal cruelty and poaching.

Table 3.8 The application of DNA sequencing to forensic investigations.

Information obtained from DNA analysis | Forensic application
Personal identity | Unidentified body, homicide, theft, sexual assault
Paternity/maternity | Disputed parenthood
Identification of plant and animal species | Illegal wildlife trade, fraud
Identification of individual animals and plants | Linking animal/plant to a person, dognapping
Genealogy of animals and plants | Mislabelling/fraud (e.g. passing wild caught animals as captive born). Disputed pedigree
Identification of archaea, bacteria, and viruses | Bioterrorism, food contamination
Cause of death | Sudden unexplained death syndrome (SUDS)

The popularity of TV dramas and documentaries featuring forensic science has led to concerns that the public will develop unreal expectations of forensic investigations (Cino 2017). In particular, in the real world, investigations are seldom quick and they are also fallible. Furthermore, if the DNA is badly degraded, it may be impossible to obtain a full profile, or any profile at all. There is also a risk that so much faith is placed in DNA evidence that its absence (or presence) could lead both investigators and juries to discount other sources of evidence. In addition, there are concerns that criminals are becoming more careful at avoiding leaving DNA evidence.

3.7.4 Familial Searching

Despite the increasing size of DNA databases, many DNA profiles recovered from crime scenes fail to yield a match. However, people who are genetically related share similar DNA profiles and the degree of similarity is a reflection of the closeness of relatedness. Therefore, by narrowing the search for similarity from ‘compared to all records on the database’ to ‘compared to a particular subset based on profile characteristics, ethnicity, and residence’, it is sometimes possible to identify a family link. This is based on the following broad social assumptions:

  1. Most crimes, whether they are petty burglary or murder, are committed by people who live locally and/or know the area. Therefore, if a match does not occur on the DNA database, the search focuses on those of the right age, etc. living nearby.
  2. Criminal behaviour often runs in families and a child raised among people who commit crimes is more likely to offend than one who was raised by a non‐offending family. Therefore, even though person X is not on the DNA database, their brother might be.
  3. The majority of both offenders and victims of crime live in poor socio‐economic conditions. These are also people who tend to remain close to the place where they were born. For example, many criminal gangs are highly territorial, their ‘patch’ being based on where they grew up, and some of their members may be related.

Needless to say, these are sweeping generalisations, but in the absence of other leads, they provide an initial line of enquiry. Once a potential family link is established, those members not on the DNA database can be requested to donate DNA and this may lead to the identification of the culprit. Inevitably, this approach is controversial, because families/communities who have little trust in the police feel that they are being unfairly picked on or it is a subterfuge to get their DNA profiles onto the DNA database. It can also cause serious family problems through revealing instances of previously unknown paternity or non‐paternity and who is on the DNA database. The alternative of requesting DNA samples from everyone living in the vicinity of a crime is even more controversial, as well as being a large, costly, and time‐consuming exercise.
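The principle that relatives 'share similar DNA profiles' can be sketched as a simple allele‐sharing count used to rank database records against a crime scene profile. This is a deliberately crude illustration. The locus names are real STR loci, but the allele values, record names, and scoring function are all hypothetical, and the sketch is not a description of how the NDNAD or any other database actually conducts familial searches.

```python
def shared_alleles(profile_a, profile_b):
    """Count alleles shared between two profiles (at most two per locus)."""
    total = 0
    for locus, alleles_a in profile_a.items():
        remaining = list(profile_b.get(locus, ()))
        for allele in alleles_a:
            if allele in remaining:
                remaining.remove(allele)  # each allele can be matched once only
                total += 1
    return total

# Hypothetical crime scene profile: locus -> the two alleles observed.
crime_scene = {
    "D3S1358": (15, 16), "vWA": (17, 18), "FGA": (21, 24), "TH01": (6, 9.3),
}

# Hypothetical database records (none is a full match).
database = {
    "record_A": {"D3S1358": (15, 17), "vWA": (17, 19), "FGA": (21, 22), "TH01": (6, 7)},
    "record_B": {"D3S1358": (14, 18), "vWA": (14, 16), "FGA": (19, 23), "TH01": (8, 9)},
    "record_C": {"D3S1358": (16, 18), "vWA": (14, 15), "FGA": (22, 24), "TH01": (7, 8)},
}

# Rank records by how many alleles they share with the crime scene profile.
ranked = sorted(
    database,
    key=lambda name: shared_alleles(crime_scene, database[name]),
    reverse=True,
)
for name in ranked:
    print(name, shared_alleles(crime_scene, database[name]))
```

In this example, record_A shares one allele at every locus, which is the pattern expected between a parent and child. Unrelated people can also share alleles by chance, so a high score is no more than a lead that requires further investigation.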


3.7.5 Determination of Ethnicity

Determining ethnicity is useful when attempting to identify a skeletonised body or building up the profile of a suspect whose DNA is not on a database. For example, in Europe, if we know that a body belongs to a person of Asian heritage, then we can focus our attentions on a smaller group of people. However, this work is controversial because of fears that it could be used for racial profiling and the targeting of specific groups of society. All ethnic groups share the same alleles used in DNA‐17 and CODIS profiles, although in some groups certain alleles are more or less frequent than in others. Therefore, although one can make tentative inferences about a person's ancestry from their STR profile, it is not an especially reliable technique for this purpose. By contrast, SNPs and to a lesser extent mitochondrial DNA are better ‘ancestry informative markers’. For example, various combinations of SNPs (often referred to as ‘panels of SNPs’) can be used to infer a person's geographical or ethnic origin (Bulbul et al. 2016; Kidd et al. 2014). However, there is a need for more comprehensive databases and more discriminatory SNP loci (Soundararajan et al. 2016). According to Phillips et al. (2016), the results obtained using currently available SNP panels need to be treated with caution.

Several commercial organisations offer ‘ancestry testing’. One submits a saliva swab and they provide you with a report stating that you are, for example, 40% European, 20% African, and 20% Asian. Some offer greater specificity and, for example, state which country in Africa or Europe contributed to those elements. Many companies do not state which markers or algorithms they use and the same sample submitted to different companies can give different ancestries. The reporting of ancestry as percentage composition is of dubious statistical worth and fosters the dangerous concept that there is such a thing as 100% racial purity. Although an SNP or mtDNA haplotype might be referred to as being an ‘ancestry informative marker’, it does not automatically mean that two people expressing that marker share a direct ancestor. For example, there have been claims that various famous people, including Eva Braun (Adolf Hitler's girlfriend), must have been Jewish, because they expressed the mtDNA N1b1 haplogroup. Whilst this haplogroup is commonly expressed in Ashkenazi Jews, it is not unique to them. Its absence does not mean that a person is not Jewish, nor does its presence indicate that they are Jewish. Of course, if we trace our family tree back through the generations, we have an ever increasing number of ancestors: go back 3 generations and we have 8 ancestors, 4 generations and we have 16 ancestors, and by 7 generations, we have 128 ancestors. Assuming that a generation is 25 years, then that means 200 years ago we would have had 256 ancestors (8 generations). The question therefore rapidly becomes which ancestor do you wish to choose to define your ancestry?
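The arithmetic behind the growing number of ancestors is simply a doubling at each generation. The short sketch below reproduces the figures quoted above, using the same assumption of 25 years per generation. It gives the number of ancestral 'slots' in the family tree; the number of distinct individuals can be smaller, because the same person may appear in more than one line of descent.

```python
def ancestors_at_generation(n):
    """Number of ancestral positions n generations back (two parents each)."""
    return 2 ** n

def generations_in(years, generation_length=25):
    """Whole generations spanned by a number of years (25 years assumed)."""
    return years // generation_length

for n in (3, 4, 7):
    print(f"{n} generations back: {ancestors_at_generation(n)} ancestors")

n_200 = generations_in(200)
print(f"200 years = {n_200} generations: {ancestors_at_generation(n_200)} ancestors")
```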

The ability to predict any characteristic from DNA sequences depends upon the reliability of the markers and the database used. Therefore, the absence of certain markers from a person's DNA profile does not necessarily mean that they cannot have had Viking or Hausa ancestors. Instead, it means that they do not have the markers that are being used to determine them.

3.7.6 Determination of Physical Appearance

Being able to narrow down the list of suspects is a crucial part of any criminal investigation. Eyewitness statements are valuable, but they may not be available and are notoriously unreliable. If a full STR profile is obtained from a crime scene that proves a perfect match to that of a named individual on a DNA database, then the police know exactly who they are looking for. However, if there is no match, then the only information is whether the suspect is a man or a woman. Our physical characteristics, sometimes referred to as our phenotypic expression (e.g. skin and hair colour, iris pigmentation, height, and shape of the jaw), result from the interaction of many genes with one another and are also influenced by age, environment, disease, and diet. Kayser (2015) provides a detailed review of how DNA analysis can determine phenotypic expression.

Many forensic studies on phenotypic expression focus on pigmentation, although this is a complex phenomenon and influenced by over 120 genes that interact with one another and have varying effects. The melanocortin 1 receptor (MC1R) gene exhibits considerable variability among Caucasian individuals and has been studied in detail. MC1R resides on the surface of melanocytes and controls their production of the pigments eumelanin (black‐brown) and pheomelanin (red‐yellow). People with red hair express MC1R polymorphisms that reduce the ability of the melanocytes to synthesise eumelanin/overproduce pheomelanin. Consequently, the melanocytes produce mainly pheomelanin and the individual tends to have red hair and pale, freckled skin. The prediction of physical appearance is mostly based on SNP analysis followed by statistical prediction modelling. For example, 36 SNPs are currently used to predict skin pigmentation (Walsh et al. 2017). Similarly, HIrisPlex analysis predicts the probabilities of eye and hair colour in a single multiplex assay. Six of the SNPs in the assay predict blue and brown eye colour with an accuracy of more than 94%, whilst 22 of the SNPs predict whether hair is a light or dark shade of black, blond, brown, or red (Walsh et al. 2014). This information helps with the identification of skeletonised remains or suspects whose DNA is not on a database. However, a suspect can reduce the likelihood of recognition by colouring their hair or putting on a wig and wearing coloured contact lenses to disguise their eye colour.

Traits such as height and weight are particularly recalcitrant to prediction, owing to complicating environmental factors, such as diet. For example, between 1957 and 1977, the average height of Japanese men increased by 4.3 cm and that of women by 2.7 cm. Similar increases are seen in other populations that have experienced improvements in diet and health care provision. To be of practical use, a predictive test needs to predict a character within a relatively narrow range. For example, predicting that the suspect was a male between 1.65 and 1.85 m tall and weighed 65–90 kg would simply indicate that he was within the range of most adult men in England and Wales.

The Parabon® Snapshot™ DNA Phenotyping System (‘Snapshot’) is a commercially available DNA analysis service that analyses a large number of SNPs to predict the colour of a person's skin, hair, and eyes, as well as their gender, freckling, face shape, and genetic ancestry. Because it is not currently possible to predict a person's age, weight, or height from DNA, the computer‐generated portrait assumes that they are 25 and have an average body mass index (i.e. 22). To date, most commentators consider the predictions based on colorations to be reliable, but are less convinced by the predictions of face shape and genetic ancestry (Wolinsky 2015).

Lippert et al. (2017) combined whole genome NGS and 3D photographs of 1061 volunteers to develop a model to predict facial traits, skin colour, age, height, and weight. The study is highly controversial, with some commentators stating that the authors claim too much from a limited dataset (Reardon 2017). What is not in doubt is that as NGS develops, it will become easier to sequence simultaneously large numbers of SNPs and thereby gain a better understanding of a person's physical and physiological traits. This is generating serious concerns about the privacy of personal information (Toom et al. 2016).

3.7.7 Determination of Personality Traits

The factors determining whether a person develops criminal tendencies are exceedingly complex and beyond the scope of this book. However, certain genetic characteristics expressed in combination with life experiences can predispose a person to particular behaviours (Lo et al. 2017). For example, the presence of certain long interspersed element (LINE)‐1 insertions in the DNA of neuronal cells in the pre‐frontal cortex influences a person's likelihood of developing addiction to cocaine (Doyle et al. 2017). Similarly, men expressing a particular allele combination that results in low brain monoamine oxidase A (MAOA) activity and who are mistreated as children are allegedly more likely to be violent as adolescents and/or adults (Bernet et al. 2007). MAOA breaks down monoamine neurotransmitters (e.g. serotonin, adrenaline, noradrenaline) at aminergic synapses. If the rate of breakdown slows down too much, then there is repetitive stimulation of the neurons. Within the brain, this results in excitation or suppression of particular nerve tracts and therefore changes in behaviour. There is a length of DNA alongside the gene that codes for MAOA, which determines how much of the enzyme is produced. One version of this region, called 3R, results in low levels of MAOA. Unfortunately, this research has been exploited by commercial companies offering to identify whether you have the so‐called ‘warrior gene’ – that is, express the 3R variant – and it has even been used as part of the defence evidence in criminal trials in the USA (Buckholtz and Meyer‐Lindenberg 2013). It must be emphasised that it is the combination of genotype and life experience that makes a particular behaviour more likely to occur. It does not mean that the behaviour will occur. The presentation of DNA constitution as a mitigating factor for criminal behaviour (e.g. ‘I was unable to exercise self‐control as a result of my genetic constitution; I therefore cannot be held entirely accountable for my actions’) in a court of law is highly controversial and it is uncertain how juries will react to such evidence (Berryessa 2017; Glenn and Raine 2014). For example, rather than treating it as a mitigating factor, a jury may be more likely to convict if they perceive that a person's genetic constitution means that he/she is a future threat who is likely to re‐offend. What is certain is that this area of research will play an important role in forensic science in the future.

3.7.8 Determination of Age

The determination of age from DNA evidence is not yet possible. Telomeres are DNA sequences found at the tips of chromosomes, where they serve a protective function. Telomere length decreases with age, but the decrease is not consistent enough for it to be used as an age indicator (Márquez‐Ruiz et al. 2017).

Age‐associated changes in DNA methylation occur at sites within certain genes and have potential as indicators of chronological age. Regression analysis demonstrates a good relationship between age and DNA methylation in blood and saliva, but it is not known whether this relationship holds in the same manner in DNA obtained from other body fluids or tissues (Hong et al. 2017; Vidaki et al. 2017). Further studies with larger datasets are required, and the extent to which disease and nutrition affect DNA methylation needs to be determined. Preliminary studies indicate that the levels of DNA methylation in bloodstains remain relatively stable for several years if the blood is stored on tissue paper at room temperature (Zbieć‐Piekarska et al. 2015). However, it is currently uncertain whether DNA methylation status remains stable in bloodstains left under natural conditions or within dead bodies.
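
The regression approach mentioned above can be illustrated with a minimal sketch. The methylation values and ages below are invented for illustration and are not taken from any of the cited studies. A single hypothetical methylation site is assumed, whereas published models typically combine several sites, and the function names are illustrative only.

```python
# Hypothetical illustration: all data values below are invented.

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_age(methylation_fraction, intercept, slope):
    """Predict age (years) from the methylation fraction at one site."""
    return intercept + slope * methylation_fraction


def mean_absolute_error(x, y, intercept, slope):
    """Average absolute difference between predicted and known ages."""
    errors = [abs(predict_age(xi, intercept, slope) - yi) for xi, yi in zip(x, y)]
    return sum(errors) / len(errors)


# Invented training data: methylation fraction (0 to 1) and known age in years.
methylation = [0.20, 0.30, 0.40, 0.50, 0.60, 0.70]
age = [18, 29, 41, 50, 62, 70]

intercept, slope = fit_line(methylation, age)
mae = mean_absolute_error(methylation, age, intercept, slope)

print(f"age = {intercept:.1f} + {slope:.1f} x methylation fraction")
print(f"mean absolute error on the training data: {mae:.2f} years")
```

The mean absolute error gives only a rough sense of how far the predictions stray from the known ages in the data used to fit the line. A realistic assessment would test the model on independent samples and on other tissues.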

3.8 Future Directions

New approaches to DNA analysis are constantly being developed. For example, various methods of DNA amplification, including PCR, will operate in droplet‐based microfluidic technologies (Ding and Choo 2017). Droplet digital PCR (ddPCR) offers several advantages over qPCR for detecting specific mutations in disease diagnosis. Similarly, Drop‐seq (droplet‐based sequence technology) is a fast and efficient way of barcoding RNA and analysing mRNA transcripts from single cells. These technologies are suitable for developing into portable devices and it is therefore likely that they will be adapted for forensic applications (Bruijns et al. 2016).

As our understanding of the human genome evolves and laboratory techniques are refined, we will undoubtedly extract more and more information from increasingly small amounts of DNA. Indeed, it is already possible to obtain information from such small amounts of DNA that contamination and secondary transfer are serious practical problems. In addition, the expansion of DNA databases is raising ethical and legal issues that need addressing if the public is to retain confidence in the authorities responsible for law and order. Collecting DNA samples from individuals should not become an end in itself, taking priority over the collection of samples from crime scenes. The presentation of DNA evidence in court also needs to be improved. Molecular science is a complex subject and overburdened with abbreviations and abstruse terminology. Consequently, it can be hard for a jury to understand the strengths and weaknesses of the ‘evidence’ put before it. The same can also be said of many prosecution and defence lawyers, who become blinded by science and seduced by statements such as there being a ‘one in 16 billion chance of the DNA sample coming from another person’. To judge such a figure, one must consider the nature of the profiles, the collection and processing of the samples, and the method employed to calculate the match probability. There is therefore a need to develop mechanisms that enable the presentation of DNA‐based evidence in court in a standardised and clear way, so that both its strengths and weaknesses are apparent. The public also needs to be educated to accept that the portrayal of forensic science on TV and film is seldom realistic and that it is not always possible to retrieve a perfect DNA profile from a crime scene.
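
To show why the method of calculation matters, the following sketch applies the simple product rule to a hypothetical four‐locus profile. It assumes Hardy–Weinberg equilibrium and independence between loci, and it ignores population substructure, close relatives, and laboratory error. The allele frequencies are invented and the function names are illustrative only.

```python
from math import prod

# Hypothetical illustration: the allele frequencies below are invented.


def genotype_frequency(p, q=None):
    """Expected genotype frequency under Hardy-Weinberg equilibrium.

    One allele frequency means a homozygote (p * p);
    two allele frequencies mean a heterozygote (2 * p * q).
    """
    if q is None:
        return p * p
    return 2 * p * q


def random_match_probability(profile):
    """Multiply genotype frequencies across loci (assumes independent loci)."""
    return prod(genotype_frequency(*alleles) for alleles in profile)


# Each tuple holds the population frequencies of the alleles seen at one locus.
profile = [
    (0.10, 0.20),  # heterozygote
    (0.15,),       # homozygote
    (0.05, 0.25),  # heterozygote
    (0.30, 0.10),  # heterozygote
]

rmp = random_match_probability(profile)
print(f"random match probability: {rmp:.3g} (about 1 in {1 / rmp:,.0f})")
```

Each additional locus multiplies the figure down further, which is how values of one in many billions arise. The result is only as sound as the assumptions and the allele frequency data behind it.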

Whilst DNA databases have led to the arrest of many rapists and murderers, most victims of violence know their assailant, and the crimes are a consequence of sudden, unplanned anger or opportunism, fuelled by drink, drugs, or both. The highly intelligent, charismatic serial killer, who is a staple feature of crime fiction and film plots, is, mercifully, an exceedingly rare individual. Poorly educated people living in the less affluent parts of our society commit most violent crimes and their victims come from the same background. These crimes are usually solved by standard police procedures. Indeed, despite all the technical advances, pitifully few cases of rape and sexual assault result in successful prosecutions. The solving of many crimes does not therefore depend on ever more sophisticated technologies, but on the appropriate use of existing ones coupled with effective police work.