© Springer Nature Singapore Pte Ltd. 2020
P. Shrivastava et al. (eds.), Forensic DNA Typing: Principles, Applications and Advancements, https://doi.org/10.1007/978-981-15-6655-4_3

3. Sequential Advancements of DNA Profiling: An Overview of Complete Arena

Kriti Nigam (1), Ankit Srivastava (1), Subhasish Sahoo (2), I. P. Dubey (2, 3), I. P. Tripathi (3) and Pankaj Shrivastava (4)
(1) Dr. APJ Abdul Kalam Institute of Forensic Science and Criminology, Bundelkhand University, Jhansi, UP, India
(2) DNA Profiling Unit, State Forensic Science Laboratory, Bhubaneswar, Odisha, India
(3) Faculty of Science and Environment, MGCGV, Satna, Madhya Pradesh, India
(4) DNA Fingerprinting Unit, State Forensic Science Laboratory, Department of Home (Police), Government of Madhya Pradesh, Sagar, Madhya Pradesh, India
 

Abstract

The discovery of DNA profiling raised forensic investigation to a new level of confidence. The technique is among the foremost discoveries of the twentieth century and has revolutionized the criminal justice system. This chapter briefly recapitulates the sequential progress made in forensic DNA fingerprinting, which aids the justice system in multiple ways and is far more efficient than the conventional techniques that preceded it. It covers the discovery of the technique, the current capillary electrophoresis based methods that use autosomal STRs together with lineage markers (Y-STRs, X-STRs, mtDNA), and the latest advancements, including next generation sequencing (NGS).

Keywords
DNA profiling, STRs, DNA markers, NGS

3.1 Introduction

DNA profiling is one of the most significant breakthroughs of the late twentieth century: it revolutionized the scientific community and gave exceptional vigor to the criminal justice system. As the name suggests, a DNA profile is a profile of a person generated from his or her DNA. It is much like the barcode printed on a product in a supermarket, which a barcode reader scans to retrieve the details of that particular product. The terms DNA typing and DNA fingerprinting are used synonymously for this technique. The credit for the discovery goes to Sir Alec Jeffreys, a geneticist at the University of Leicester, UK. In September 1984, while studying ways to resolve immigration and paternity disputes by demonstrating genetic relationships between persons, Jeffreys employed restriction fragment length polymorphism (RFLP) to analyze DNA. He noticed that DNA contains certain repetitive sequences, known as minisatellites or variable number of tandem repeats (VNTRs), that are present in all human beings but vary in length among individuals (Jeffreys et al. 1985a). He soon realized that these variations are unique to each individual, except for identical twins, and could therefore be used to establish the identity and individuality of a person (Jeffreys et al. 1985b). This groundbreaking discovery eventually emerged as a powerful tool for exonerating the innocent and convicting the guilty. The present chapter gives a chronological overview of the different DNA profiling systems.

3.2 Restriction Fragment Length Polymorphism (RFLP)

RFLP is the pioneering technique of DNA fingerprinting. It is a molecular method that identifies individuals on the basis of the distinctive patterns generated when a restriction enzyme cuts DNA at specific sites, known as restriction endonuclease recognition sites. The polymorphic nature of the genome is the key factor behind the technique. The digested DNA is separated by gel electrophoresis according to the relative sizes of the fragments. Because DNA carries an overall negative charge, the fragments move toward the positive pole, and smaller fragments move faster. The gel with these bands is then placed in sodium hydroxide (NaOH) solution to denature the DNA. The single-stranded DNA so obtained is transferred onto a nylon or nitrocellulose sheet by capillary blotting, a procedure named Southern blotting after its inventor (Southern 1975). The DNA is then fixed to the sheet, commonly by baking or UV cross-linking. The single-stranded, fragmented DNA on the sheet is allowed to base pair with labeled RFLP probes, which may carry a radioactive or chemiluminescent tag (Klevan et al. 1995). This process is referred to as hybridization. When the hybridized sheet is exposed to X-ray film, the labeled probes produce distinct bands on the film. The autoradiogram so generated is the unique genetic signature of an individual, with the exception of monozygotic twins, who share the same pattern. In 1986, shortly after its development, DNA profiling was used for the first time in a criminal case. Colin Pitchfork, a U.K. resident, was convicted of a double rape and murder because his DNA profile matched the DNA found at both crime scenes (Jobling and Gill 2004). With RFLP, paternity can be excluded with a cumulative probability greater than 99.9% using as few as four probes (Jeffreys et al. 1987); likewise, 3–5 probes can provide an individualized fingerprint in forensic testing (Alghanim and Almirall 2003). VNTR sequences serve as the basis for this fingerprinting. For many years RFLP served the criminal justice system and other fields by resolving paternity disputes, ascertaining criminal identity, and studying the evolution, migration, and breeding patterns of animal populations, among other uses. However, the technique requires a large quantity of good-quality sample and is expensive, tedious, and time consuming. These limitations created the need for more efficient and advanced techniques of DNA profiling.
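The way a length polymorphism turns into a band pattern can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not a laboratory protocol: EcoRI (recognition sequence GAATTC) is chosen only as a familiar example enzyme, and the repeat unit, flanking bases, and function names are hypothetical. Two alleles that differ only in the number of tandem repeats lying between two restriction sites yield one fragment of different length, which would appear as a shifted band on the gel.

```python
import re

ECORI_SITE = "GAATTC"    # EcoRI recognition sequence (example enzyme, an assumption)
ECORI_CUT_OFFSET = 1     # EcoRI cuts after the first G of the site on the top strand
REPEAT_UNIT = "AGGTCCTGCA"  # hypothetical 10-bp VNTR repeat unit


def digest(sequence, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Return fragment lengths after cutting a linear sequence at every site."""
    cuts = [m.start() + cut_offset for m in re.finditer(site, sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def band_pattern(sequence):
    """Fragment lengths sorted smallest first (smallest migrates farthest)."""
    return sorted(digest(sequence))


def make_allele(n_repeats):
    """Hypothetical allele: n tandem repeats flanked by two restriction sites."""
    return "TTTT" + ECORI_SITE + REPEAT_UNIT * n_repeats + ECORI_SITE + "CCCC"


if __name__ == "__main__":
    for n in (3, 5):
        print(n, "repeats ->", band_pattern(make_allele(n)))
```

Only the middle fragment changes between the two alleles (by 10 bp per repeat), while the two flanking fragments stay the same size.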

3.3 Polymerase Chain Reaction (PCR)

Before discussing the various PCR-based DNA profiling techniques, it is useful to review the basic concept of PCR. The technique answered several limitations of RFLP: even small and degraded DNA samples that were deemed unfit for RFLP could be analyzed with it. PCR replicates DNA from a single-stranded template with the help of synthetic primers and a polymerase enzyme (Kleppe et al. 1971; Panet and Khorana 1974). Credit for the invention goes to Kary Mullis who, in 1983 while working at Cetus Corporation, proposed a simple way of making millions of copies of a desired DNA fragment with high fidelity (Mullis and Faloona 1987; Mullis et al. 1986). In 1993, Mullis was awarded the Nobel Prize in Chemistry for this invention. PCR amplification is a chain reaction consisting of three steps repeated in sequence.

3.3.1 Denaturation

This is the first step of PCR amplification. The temperature is raised to as much as 98 °C so that the DNA helix unwinds and its two strands separate. Each strand then acts as a template for the synthesis of a new strand. The reaction mixture contains primers (analogous to the probes of RFLP), free nucleotides, and a heat-stable DNA polymerase, usually Taq polymerase.

3.3.2 Annealing

During this step, the temperature of the reaction mixture is lowered to 50–65 °C, which allows the primers to anneal to the single-stranded DNA templates. Primers are so named because they mark the starting point for the synthesis of the new DNA strand.

3.3.3 Elongation

In this step, DNA polymerase comes into play. The temperature of the reaction mixture is raised again, typically to about 72 °C (Taq polymerase is most active at 75–80 °C), and Taq polymerase adds nucleotides complementary to the parent strand to the 3′ end of the primer.

To check whether the desired DNA fragment (the amplicon, or amplimer) has been amplified, agarose gel electrophoresis may be performed. The results are compared with molecular weight markers, which contain DNA fragments of known size.
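The phrase "millions of copies" follows from simple arithmetic: if every cycle of denaturation, annealing, and elongation duplicates each template, the copy number doubles per cycle. The Python sketch below illustrates this idealized model. The function name and the `efficiency` parameter (the fraction of templates copied in each cycle) are assumptions introduced for illustration; real reactions plateau and do not follow this model indefinitely.

```python
def copies_after(cycles, initial_copies=1, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency).

    efficiency = 1.0 means perfect doubling; lower values model incomplete copying.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(f"{n} cycles: {copies_after(n):,.0f} copies from one template")
```

Under perfect doubling, 20 cycles already give about one million copies of a single starting molecule, and 30 cycles about one billion.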

3.4 PCR-Based Techniques

3.4.1 HLA DQA1

HLA stands for human leukocyte antigens, a group of proteins located on the outer membrane of all nucleated cells of the body. The genes for these antigens are located on chromosome 6. The major histocompatibility complex (MHC) is the generic name of the genetic region to which the HLA loci belong (Hugh et al. 1984). The MHC contains genes (including HLA) that are responsible for the body's immune responses. HLA antigens play a vital role in self-recognition, that is, in discriminating self from non-self, and are therefore central to the defense against foreign substances. Every individual possesses a unique set of HLA proteins inherited from his or her parents. The first typing method used to detect HLA alleles was hybridization of sequence-specific oligonucleotide (SSO) probes with PCR-amplified DNA. Researchers have suggested various alternative SSO typing methods, differing chiefly in the length and sequence of the probes and in the method of detection. Initially, 32P-labeled SSO probes were hybridized with the desired region of the HLA-DQA gene (Saiki et al. 1986a), but shortly afterwards biotin emerged as an alternative to the conventional 32P label (Saiki et al. 1987). The PCR-SSO technique was later applied to other loci as well, such as DP (Bugawan et al. 1988), DQ (Horn et al. 1988; Morel et al. 1990), and DR (Erlich et al. 1989; Baxter-Lowe et al. 1989).

Although the SSO technique is appropriate for large sample sizes, it is not the method of choice for a small number of samples. Reverse dot blotting emerged as a solution to this problem. It employs the same PCR amplification procedure and SSO probes, but the probes are bound to a solid support. The amplified sample is labeled and hybridized to that panel of probes. A single hybridization and stringency wash permits determination of the polymorphic sequences of the sample. A streptavidin–horseradish peroxidase conjugate with a chromogenic substrate is used to detect positive reactions (Buga et al. 1990). Alternatively, chemiluminescence can be used for visualization (Buyse et al. 1993).

HLA-DQA was the first commercially available PCR-based kit for DNA fingerprinting. The kit could differentiate six alleles: 1.1, 1.2, 1.3, 2, 3, and 4. The initial HLA-DQA strips contained nine SSO probes. Perkin-Elmer later introduced a more sophisticated kit, DQA1, with 11 SSO probes.

3.4.2 Amplified Fragment Length Polymorphism (AFLP)

AFLP is a PCR-based fingerprinting technique first described by Vos et al. (1995). Since then, many protocols have been published with various modifications, all of which include the following three basic steps:

(1) genomic DNA is digested with restriction endonucleases, and the resulting fragments are ligated to double-stranded adaptors of known sequence; (2) the fragments are selectively amplified using primers complementary to the adaptor sequences plus a few selective nucleotides that extend into the unknown genomic DNA; and (3) the labeled fragments are separated by electrophoresis followed by silver staining (Fry et al. 2009). The separated fragments are then subjected to automated analysis. Locus D1S80, with a repeat size of 16 bp, is a popular choice for DNA fingerprinting (Thymann et al. 1993); other choices include YNZ22, Apolipoprotein-B, and Collagen 2A1 (De Guglielmo et al. 1994). AFLP profiling can discriminate between illicitly grown marijuana and hemp (Coyle et al. 2003; Datwyler and Weiblen 2006; Hakki et al. 2003), identify illegal hallucinogenic fungi (Coyle et al. 2001; Lee et al. 2000; Linacre et al. 2002), and identify legally protected owl species and their hybrids (Haig et al. 2004). Automation is the biggest advantage of the technique, allowing trouble-free and economical comparison of DNA samples.

3.4.3 Short Tandem Repeats (STRs)

Despite the success of PCR-based AFLPs, the 1990s saw a switch to PCR of short tandem repeats (STRs), which have very small repeat units of just 2–7 bp. STRs are also called microsatellites or simple sequence repeats, and they are often referred to as the ‘gold standard’ for personal identification in the forensic context. They are short DNA sequences with repeat units ranging from 1 to 6 bp (Chambers and MacAvoy 2000; Tautz 1993) and an overall length of up to about 100 nucleotides. STRs are often located near the centromere of the chromosome. Minisatellites, the other category of tandem repeats, have longer repeat units (typically 10–60 bp). The short repeat unit of STRs makes them easier to amplify, and they are also relatively more resistant to the problems associated with degraded and contaminated DNA. Mutation of the nucleotide sequence is the probable cause of the variation in these repeats. STRs are present in both prokaryotes and eukaryotes, including human beings. They are scattered throughout the human genome and make up about 3% of it, but within chromosomes they are not uniformly distributed (Koreth 1996). The majority of STRs lie in noncoding regions; only about 8% are located in coding regions (Ellegren 2000). The length of the repeated region varies among individuals, which is its most significant aspect from the forensic point of view. STR density also varies among chromosomes; in humans, chromosome 19 has the highest density of STRs (Subramanian 2003). At a given locus an individual can be either homozygous, i.e., carrying the same number of repeat units on both chromosomes, or heterozygous, i.e., carrying different numbers of repeat units. 
Tetranucleotide repeats rich in adenine (A) are used in forensic genotyping (Goodwin et al. 2011; Nadir 1996). In 1993 the DNA Commission of the International Society of Forensic Genetics (ISFG) assigned a nomenclature to STR loci and their allelic variants. Loci are named in a specific manner. For example, in D1S80, D refers to DNA, 1 is the number of the chromosome on which the locus lies, S indicates a single-copy sequence, and 80 is a unique identifier giving the order in which the locus was described on that chromosome (here the 80th on chromosome 1). An individual STR is written by giving the base composition of the repeat unit in parentheses followed by the number of repetitions as a subscript; for example, (GTA)4 is a trinucleotide STR in which the sequence GTA is repeated four times. The first STR was discovered in 1991. In 1994 only four loci, namely F13A1, TH01, vWA, and FES/FPS, were available (Kimpton et al. 1994; Lygo et al. 1994), but within a year this number grew to seven (Gill et al. 1997; Kimpton et al. 1996; Sparkes et al. 1996a, b). By the end of 2000, the number of loci available in an STR multiplex had increased to 16, including amelogenin, a sex-determining marker (Krenke et al. 2002). Today more than 20 STR multiplex systems are commercially available for DNA typing. STR typing is more sensitive than conventional single-locus RFLP methods and more discriminating than other PCR-based profiling techniques such as HLA-DQA1 (Saiki et al. 1986b). Since then, researchers have studied different populations to develop standardized protocols across the globe (Butler 2005). The main advantage of STR markers is that, by multiplexing, more than ten STR loci can be tested simultaneously in a fast and simple way (Morretti et al. 2001). STRs are classified by the length of their repeat units as mononucleotide, dinucleotide, trinucleotide repeats, and so on. 
Tetranucleotide repeats are used most because they are less prone to stutter products, i.e., amplicons one repeat unit shorter than the true allele (Romeika and Yan 2013). Tetra- and pentanucleotide systems are incorporated in multiplex analysis kits because they offer a higher exclusion index. Multiplexing allows simultaneous analysis of several loci, which not only saves time but also conserves evidence by consuming less sample. Current DNA profiling uses multi-allelic STR markers that are structurally comparable to the original minisatellites but have much shorter repeats, which makes them easier to amplify by PCR. In 1997 the Federal Bureau of Investigation (FBI) established a database named CODIS (Combined DNA Index System) that incorporated the amelogenin sex locus and 13 autosomal loci in the course of developing its STR typing system (Sullivan et al. 1993). This set of 13 STR loci is highly discriminating, with an average random match probability of about one in a quadrillion (1 × 10^-15); it was later extended to 20 markers (Budowle et al. 1998), while the European standard set consisted of 12 STR markers (Gill et al. 2006). These loci are highly polymorphic noncoding regions located on different chromosomes. In the past few years new multiplexes have been introduced that amplify more than 16 loci, together with amelogenin, in a single reaction (Dixon et al. 2005). STR profiling has numerous applications, including parentage testing, disaster victim identification, and identification of perpetrators of rape, among many others (Yoshida et al. 2011). Butler and Hill (2012) summarized 24 forensic autosomal STR loci, including their classification, repeat unit length, chromosomal location, and repeat region.
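Two ideas from this section lend themselves to a small worked example: reading an allele as a repeat count, as in (GTA)4, and combining several loci into a random match probability. The Python sketch below is illustrative only. The function names and the allele frequency used in the example are hypothetical, and the genotype frequencies assume Hardy-Weinberg proportions and independence between loci (the product rule), which is a common textbook simplification rather than a procedure stated in this chapter.

```python
import re


def count_repeats(sequence, unit):
    """Number of repeat units in the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


def is_homozygous(allele_a, allele_b):
    """Same repeat count on both chromosomes means homozygous at this locus."""
    return allele_a == allele_b


def genotype_frequency(p, q=None):
    """Hardy-Weinberg expectation: p*p for a homozygote, 2*p*q for a heterozygote."""
    return p * p if q is None else 2 * p * q


def random_match_probability(genotype_frequencies):
    """Product rule: multiply genotype frequencies across independent loci."""
    result = 1.0
    for freq in genotype_frequencies:
        result *= freq
    return result


if __name__ == "__main__":
    print(count_repeats("AAGTAGTAGTAGTACC", "GTA"))   # the (GTA)4 example
    # Hypothetical genotype frequency of 0.07 at each of 13 loci
    print(random_match_probability([0.07] * 13))
```

With a hypothetical genotype frequency of 0.07 at each of 13 independent loci, the product is on the order of 10^-15, which shows how a modest number of polymorphic loci can reach the discriminating power quoted above.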

The application of STR multiplexing to biological evidence has changed how the justice system deals with heinous crimes. Multiplex PCR with fluorescently labeled primers has become a vital method for amplifying short tandem repeats in tests of individual identity. STRs have become the preferred markers for human identification (HID) applications because of their polymorphic nature (Butler 2006). Measurement of the lengths of the variable alleles is the foundation of individualization by STR profiling (Giardina et al. 2011). Owing to its high accuracy, STR marker technology is now used in routine criminal investigations and is one of the most sensitive and widely accepted techniques in the scientific community. STR profiling is frequently used today in the investigation of offences such as manslaughter, assassination, rape, and kidnapping. However, capillary electrophoresis (CE) based detection of fragment size cannot distinguish alleles that have the same length but different sequences, which matters, for example, in complex paternity cases involving STR mutation. Emerging NGS technology can discriminate such alleles and also helps in resolving mixed samples (Berglund et al. 2011).

3.5 Lineage Markers

Autosomal STRs are widely used by forensic science laboratories for human identification and kinship determination. However, in some complex kinship cases the results of autosomal STRs may be inconclusive. In such cases, analysis of STRs located on the sex chromosomes comes into play (Roewer 2003).

3.5.1 Autosomal Single Nucleotide Polymorphisms (SNP) Typing

SNPs have lower heterozygosity than STRs. Whereas STR typing needs a template of about 300 bp, SNP typing needs only about 50 bp. SNPs have therefore become significant tools for the analysis of degraded samples. SNP typing played a crucial role in identifying the victims of the World Trade Center disaster of 2001 (Brenner and Weir 2003; Marchi 2004). The European Network of Forensic Science Institutes (ENFSI) and the US FBI Scientific Working Group on DNA Methods (SWGDAM) are working on recommendations for the standardization of SNPs and their use with degraded biological samples (Gill et al. 2004).

3.5.2 Y-STR Typing

Y-STRs are short tandem repeats located on the Y chromosome and are therefore male specific. The polymorphic nature of the Y chromosome plays a crucial role in differentiating unrelated males and in establishing paternal lineage across generations (Coble et al. 2009). The forensic significance of Y-STRs lies in the fact that, besides being male specific, they have a low mutation rate and population-specific allele distributions (Kayser et al. 2001, 2005, 2007; Kayser 2003). Y-STR typing is an excellent way to detect male DNA that is mixed with female DNA, as often happens in sexual assault cases (Kayser 2017; Roewer 2009; Prinz et al. 1997). In such cases autosomal STR typing is often complicated and sometimes cannot give conclusive results at all. Y-STR typing can detect male DNA in mixtures with female DNA at ratios as extreme as 1:2000 (Ballantyne and Kayser 2013). Currently available commercial Y-STR multiplexes are robust and can successfully analyze mixtures from several males (as in gang rape cases), samples contaminated with different biological fluids, and aged or degraded samples (Brenner 2010). Besides identifying the guilty, Y-STR typing is also important in exonerating the innocent in cases of false accusation. Despite these advantages, Y-STR typing has limitations: the Y-STR haplotype behaves as a single locus, and large databases are required for its interpretation (Willuweit et al. 2011; Ballantyne et al. 2010). Its inability to exclude paternally related suspects makes it less powerful than autosomal STRs. A possible answer to this limitation has emerged in the form of rapidly mutating Y-STRs (RM Y-STRs), which have high mutation rates and can therefore potentially differentiate paternally related suspects (Ballantyne et al. 2012; Roewer et al. 1992). Historically, Y-STRs were first described by Roewer and Epplen in 1992 (Davis et al. 2013), and several advancements have taken place since then. 
Various Y-STR kits are now commercially available. The two most commonly used are PowerPlex® Y23 (Promega Corporation) and the Yfiler® Plus PCR Amplification Kit (Thermo Fisher Scientific, USA). PowerPlex® Y23 combines 17 traditional Y-STR loci with six newer loci (DYS481, DYS533, DYS549, DYS570, DYS576, and DYS643), which enhances its discriminating power (Jain et al. 2016; Thompson et al. 2013). The Yfiler® Plus PCR Amplification Kit allows multiplex amplification of 27 Y-STRs, including seven RM Y-STRs (Fig. 3.1).
Fig. 3.1 Steps involved in Y screening for determining the male fractions

3.5.3 Mitochondrial DNA Typing

The cytoplasm of almost all eukaryotic cells contains mitochondrial DNA (mtDNA), a genome separate from the nuclear genome (Hameed et al. 2015). mtDNA is a circular molecule of about 16 kb that is present in multiple copies per cell. It is maternally inherited and can therefore readily establish links between maternally related people. It encodes 13 proteins, 2 rRNAs, and 22 tRNAs required for the proper functioning of mitochondria. The forensic significance of the mitochondrial genome derives from its stability and high copy number. Mammalian mtDNA has two origins of replication. The origin of the heavy strand, which is guanine rich, is located within the D-loop (displacement loop), whereas the origin of the light strand, which is cytosine rich, is located nearly opposite the D-loop. The region of current forensic interest is the D-loop (Pai et al. 1997). Approximately 1100 bp make up its noncoding hypervariable regions, so named because of their higher mutation rate compared with other regions of mtDNA (Quintans et al. 2004). The region is subdivided into HV1, HV2, and HV3, spanning positions 16,024–16,365, 73–340, and 438–574, respectively. HV3 is rarely targeted in forensic work; HV1 and HV2 are commonly examined. mtDNA obtained from calcified tissues and hair can be a powerful tool for personal identification, especially in mass disasters where the evidence recovered is often environmentally challenged (Kavlick et al. 2011). Despite these advantages, the biggest challenge in mtDNA analysis is the interpretation of heteroplasmy, i.e., the existence of different mtDNA haplotypes within one person (Bar et al. 2000), although many researchers consider this phenomenon a rare event (Payne et al. 2013). 
Mitochondrial DNA analysis can be applied to degraded samples that cannot be processed by nuclear DNA analysis, such as hairs without roots, teeth, skeletonized remains, and other environmentally degraded biological material. DNA can be detected from very small quantities of sample, and the maternal lineage can thus be established.
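The hypervariable region boundaries given above can be expressed as a small lookup, which is convenient when sorting observed variant positions by region. The following Python sketch uses only the position ranges stated in the text; the function names and the example positions are hypothetical and serve purely as an illustration.

```python
# Position ranges of the hypervariable regions as given in the text (inclusive).
HV_REGIONS = {
    "HV1": (16024, 16365),
    "HV2": (73, 340),
    "HV3": (438, 574),
}


def hv_region(position):
    """Return the hypervariable region containing `position`, or None if outside all three."""
    for name, (start, end) in HV_REGIONS.items():
        if start <= position <= end:
            return name
    return None


def group_by_region(positions):
    """Group variant positions by region; positions outside HV1-HV3 go under 'other'."""
    groups = {}
    for pos in positions:
        groups.setdefault(hv_region(pos) or "other", []).append(pos)
    return groups


if __name__ == "__main__":
    # Hypothetical variant positions, for illustration only
    print(group_by_region([16100, 150, 500, 9000]))
```

Because HV1 and HV2 are the regions commonly examined, a grouping like this makes it easy to see which observed differences fall inside the routinely sequenced segments.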

3.5.3.1 Preparation of the Sample

The first step in sample preparation is to clean the sample thoroughly in order to remove contaminants. Bone and tooth samples need to be preprocessed by sanding or scraping to remove dirt and other debris. Similarly, hair samples need to be cleaned by sonication to remove dust particles or microbial growth. The hair, or its shaft region, is used for mitochondrial analysis.

3.5.3.2 Extraction of Sample

The samples are extracted using phenol/chloroform (C6H5OH/CHCl3) or other chemicals in order to separate the DNA from cofactors, proteins, and ions. The extract is then purified to obtain a pure DNA sample. Various commercial kits are also available that extract DNA in much less time. The samples are then quantified and amplified by polymerase chain reaction using HVR-I and HVR-II primers.

3.5.3.3 Sequencing of mtDNA

PCR-based mtDNA typing through automated sequencing has proven to be a legitimate, robust, and dependable means of forensic analysis. Applied Biosystems' SOLiD sequencing by ligation is well known and practiced by various laboratories (Sheshanna et al. 2014). The method is used for sequencing the entire mtDNA coupled with targeted re-sequencing. It comprises sample preparation, emulsion PCR and substrate preparation, the ligation chemistry, imaging, and data analysis. SOLiD sequencing uses a two-base encoding system with an accuracy of 99.94% (Kircher and Kelso 2010), which is adequate for identifying SNPs in abundant samples. Over the last decade, next generation sequencing (NGS) by Illumina has become widely accepted. NGS provides a unified platform for handling the wide range of complicated samples encountered. Genetic analysis of evidence from mass fatalities with NGS lets scientists extract vital information even from compromised samples. Where several suspects and victims are involved, whole genome sequencing can provide information on mutations and SNPs. Sequencing can be performed on compromised samples such as preserved tissues, blood exposed to different temperatures, and hair. In comparison with other techniques tested in laboratories, multiplex minisequencing of mtDNA provides a higher success rate (Kinra 2006).

3.5.4 X-STR Typing

Autosomal STRs were used in forensic casework long before Y-STR and X-STR markers. However, X- and Y-STR markers can provide additional information that autosomal STRs cannot. The first major breakthrough in the field of X-chromosomal markers was made by Race and Sanger, when they detected the Xga blood group (Tippett and Ellis 1998). Females have a homologous pair of X chromosomes. X-STRs are highly effective for kinship determination, especially between father and daughter, because they increase the power of discrimination. In contrast, in cases of questioned mother–daughter relationships, X-chromosome markers provide no more specific information than autosomal STRs. In mixed stains, especially in rape cases, X-STRs can positively identify female DNA (Shin et al. 2005; Szibor et al. 2000). In the first X-STR multiplex study, nine loci were analyzed in three multiplexes: a duplex PCR (DXS6789 and DXS6795), a triplex PCR (DXS7133, DXS9895, and DXS9898), and a quadruplex PCR (GATA164A09, DXS6803, DXS8378, and DXS7132) (Son et al. 2002). The greater the number of markers, the higher the degree of discrimination. More than 40 X-STR markers have now been identified for forensic purposes (Jedrzejczyk et al. 2010; Machado and Medina-Acosta 2009; Szibor 2007; Szibor et al. 2006).

3.6 Next Generation Sequencing

Since 2005, next generation sequencing (NGS), also referred to as massively parallel sequencing (MPS), has substantially changed the horizon of genomic research. NGS is a ‘non-Sanger’ sequencing method that can sequence billions of DNA molecules simultaneously and minimizes the need for fragment cloning (Yang et al. 2014; Aly and Sabri 2015; Mostafa et al. 2015). The technique works on loop array sequencing, which can analyze huge amounts of data simultaneously. Several commercial systems based on this process have been introduced. The technology has developed through the following three sequencing generations:
  1. First generation sequencing: The first pyrosequencing-based system was the ‘454 Genome Sequencing System’, introduced by Roche in 2005. It detects the pyrophosphate released after incorporation of each nucleotide into the newly synthesized DNA strand. This system is now obsolete.

  2. Second generation sequencing: In 2006, Solexa, later acquired by Illumina, released a system that used sequencing by synthesis. In 2007, ‘SOLiD’, a second generation system based on oligonucleotide ligation and two-base encoding, was introduced. In 2010, the Ion Personal Genome Machine (PGM) and MiSeq were released by Ion Torrent and Illumina, respectively. The PGM uses semiconductor sequencing technology, while the MiSeq is a versatile instrument that performs all steps from cluster generation, amplification, and sequencing through data analysis.

  3. Third generation sequencing: This technology is based on the Single Molecule Real Time (SMRT) DNA sequencing system, which determines the base composition of a single DNA molecule. SMRT sequencing relies on a sequencing-by-synthesis approach and uses an SMRT chip containing numerous zero-mode waveguides. DNA polymerase molecules are attached to the SMRT chip to synthesize the desired DNA fragments. Incorporation of fluorescently labeled nucleotides generates a light signal that is captured by a sensor. SMRT technology can attain average read lengths of 5500–8500 bp and can also directly detect certain epigenetic modifications such as 4-mC (methylcytosine), 5-mC, and 6-mA (methyladenine) (Meldrum et al. 2011; Schadt et al. 2010; Murray et al. 2012).

STR analysis is mainly carried out by separating DNA fragments by size using capillary electrophoresis (CE) (Butler 2012). However, CE cannot detect internal sequence variation in STR alleles. Such variation is now considered crucial in cases involving trace or compromised DNA samples. Techniques such as mass spectrometry and next generation sequencing (NGS) are now used in forensics to detect and characterize variation in the internal sequence of STR alleles (Pitterl et al. 2010; Planz et al. 2012; Rockenbauer et al. 2014a). Analysis of STR sequence variation is becoming even more important because it enhances the discrimination of individuals in mixed DNA samples.

NGS is the key to revealing and precisely identifying STR allele variation. A CE-based STR assay identifies alleles by their size relative to a sequenced allelic ladder, even though variation in internal sequence may exist (Gettings et al. 2015). An in-depth evaluation of the sequence variation of STR alleles can be made using NGS technology. However, such results should be compared with the PCR-CE length-based genotype. This comparison also assists in studying STR mutation rates and improves the understanding of the mutation rate for a specific allele. A further drawback of CE-based STR analysis is that an allele peak can shift relative to the allelic ladder when insertions or deletions (indels) occur in the flanking regions of that allele, which is infrequent but possible. For a particular marker, a PCR-CE based assay can be performed within a day. Even so, the biggest benefit of NGS is its combined analysis of many STRs in a single assay, despite the relatively longer time that it takes (Børsting and Morling 2015). NGS can also analyze DNA samples that are difficult to type with CE-based analysis. Of all available NGS methods, sequencing by ligation has the lowest error rate and is regarded as the foremost platform for various genetic applications in the context of the criminal justice system (Børsting and Morling 2015). Integration of NGS with STR typing is thus the need of the hour, since STRs alone constitute about 3% of the human genome (Gettings et al. 2015; Ayres et al. 2002). Of 24 autosomal STR loci, nine showed an increase of more than 30% in the number of alleles when sequenced on an NGS platform, compared with the PCR-CE length-based genotype (Gettings et al. 2015). The remaining 15 STR loci, for instance D5S818, D7S820, and D13S317, showed less repeat region variation.
Nevertheless, this limited variation in the flanking region also helps in understanding the role of different mutational events in the evolutionary process. NGS reveals the true variation at STR loci: sequencing even simple STRs has uncovered new alleles, which is vital for enhancing the statistical strength of the analysis.
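The difference between length-based and sequence-based allele calls can be illustrated with a minimal Python sketch. The repeat sequences, the motif length, and the function names below are hypothetical and chosen only for illustration; they do not represent any real locus or published tool.

```python
from collections import defaultdict


def length_based_allele(repeat_region: str, motif_len: int = 4) -> str:
    """Return a CE-style designation: full repeats, plus leftover bases
    for a partial repeat (for example '9.3')."""
    full, partial = divmod(len(repeat_region), motif_len)
    return f"{full}.{partial}" if partial else str(full)


def group_isoalleles(sequences):
    """Group distinct repeat-region sequences by their length-based allele."""
    groups = defaultdict(set)
    for seq in sequences:
        groups[length_based_allele(seq)].add(seq)
    return dict(groups)


# Hypothetical repeat regions, invented for illustration only.
allele_a = "TCTA" * 10                        # simple repeat
allele_b = "TCTA" * 4 + "TCTG" + "TCTA" * 5   # same length, one internal substitution
allele_c = "TCTA" * 11

groups = group_isoalleles([allele_a, allele_b, allele_c])
for allele, seqs in sorted(groups.items(), key=lambda kv: float(kv[0])):
    print(f"CE allele {allele}: {len(seqs)} sequence variant(s)")
```

In this sketch, `allele_a` and `allele_b` receive the same length-based designation and would co-migrate in CE, while a sequence-level comparison keeps them apart.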

3.6.1 Advantages of NGS Technology

Several limitations of current CE-based analysis have prompted the forensic community to explore the utility of NGS technology in the criminal justice system. These limitations include loss of crucial genomic information from compromised DNA samples, low-resolution genotyping of current markers, low-resolution mixture and mtDNA analysis, and the inability to analyze multiple types of genetic polymorphism in a single workflow and a single reaction. NGS has revolutionized genomic research by greatly improving time, speed, cost, accuracy, and sequence length (Berglund et al. 2011; Dalsgaard et al. 2013; Fordyce et al. 2011, 2015; Gelardi et al. 2014; Phillips et al. 2014; Rockenbauer et al. 2014b; Scheible et al. 2014). NGS technology is applicable to different genetic targets, such as autosomes, sex chromosomes, and the mitochondrial genome. NGS has potential applications in numerous aspects of research, including forensic microbiological, plant, and animal analyses, construction of DNA databases, phenotypic and ancestry inference, studies of monozygotic twins, and species and body fluid identification. DNA sequencing is no longer a tedious task.

3.6.2 Application in Forensic Science

Owing to its utility in forensic investigations, DNA analysis has become an important tool in the criminal justice system. Forensic DNA analysis often has to deal with limited, contaminated, and highly degraded samples, and it demands reproducibility and high precision while keeping time and cost in check.

3.6.2.1 STR Sequence Variation Detection

STR analysis is the most essential and commonly employed technique in the criminal justice system. It has several advantages, including a low DNA template requirement, rapid and accurate allele determination, fluorescence-based detection, multiplex amplification, use of an abundant genomic element, and digitized results. At present, STR-based databases have been established by more than 60 countries, and these databases continue to grow quickly. For instance, the forensic database of China holds over 27 million entries (Ministry of Public Security-China 2012). Statistically, the probability of a coincidental match between unrelated individuals is higher if the analysis is based solely on the 13 CODIS STR markers in routine use, i.e., FGA, CSF1PO, TPOX, TH01, VWA, D5S818, D3S1358, D8S1179, D7S820, D16S539, D13S317, D21S11, and D18S51, or on 15 markers (the 13 CODIS loci plus D19S433 and D2S1338). Inclusion of more STR markers is therefore recommended to avoid this situation. However, owing to the technical limitations of the fluorescence-based CE instruments currently in use, detecting more STR markers simultaneously is not an easy task. Conventional CE-based STR typing cannot distinguish alleles of identical length that differ in sequence. As a result, this technique fails to resolve cases of STR mutation in complex paternity disputes. Mixed DNA samples pose an additional challenge in forensic DNA investigations. Such samples have low rates of detection and are therefore of limited help in forensic investigations. Initially, NGS technology was not considered fit for STR analysis because of its much shorter read lengths. However, with technological advancements, continual efforts have been made to increase the average read length. 
Many researchers have now employed NGS technology for STR analysis because it readily distinguishes alleles of the same length, and because digital read counts can considerably ease complex paternity analysis and the analysis of compromised DNA samples. For instance, pioneering research was conducted by Zajac and colleagues, who used a trinucleotide threading (TnT) approach on the 454 Genome Sequencing System to analyze three CODIS STR loci, D18S51, TPOX, and CSF1PO (Zajac et al. 2009). Subsequently, Irwin et al. used the 454 GS Junior system coupled with multiplex identifier technology to analyze 13 CODIS STR loci in single source samples (Irwin et al. 2011). Bornman et al. went a step further and demonstrated that the AMEL gene and 13 CODIS STR loci can be accurately identified using high-throughput sequencing technology even in mixed samples (Bornman et al. 2012). Warshauer et al. developed software named STRait Razor that can analyze NGS data for 44 STRs, including 21 Y-chromosomal and 23 autosomal STRs (Warshauer et al. 2013). Van Neste et al. established a reference allele database using Illumina’s MiSeq system to analyze mixed and single source DNA samples; they found that the genotyping results for most loci were reliable and stable (Van Neste et al. 2014). NGS technology can thus considerably assist the analysis of complex and compromised DNA samples and, consequently, can significantly enhance the cost-efficiency and effectiveness of forensic casework.
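How digital read counts support allele calling can be sketched as follows. This is a simplified illustration, not a reproduction of STRait Razor or any other published pipeline: the flanking sequences, the motif, the reads, and the 20% read-share threshold are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical flanking sequences and motif for an illustrative locus.
LEFT_FLANK = "GGTCAC"
RIGHT_FLANK = "CCAGTT"
MOTIF = "AGAT"


def extract_repeat_region(read):
    """Return the sequence between the two flanks, or None if a flank is absent."""
    match = re.search(f"{LEFT_FLANK}([ACGT]+?){RIGHT_FLANK}", read)
    return match.group(1) if match else None


def call_alleles(reads, min_fraction=0.2):
    """Count repeat-region sequences across reads and keep those whose
    share of the informative reads is at least min_fraction (assumed cutoff)."""
    regions = [r for r in map(extract_repeat_region, reads) if r]
    counts = Counter(regions)
    total = sum(counts.values())
    return {seq: n for seq, n in counts.items() if n / total >= min_fraction}


def make_read(region):
    """Build a toy read: short padding, flanks, and the repeat region."""
    return "TT" + LEFT_FLANK + region + RIGHT_FLANK + "AA"


reads = (
    [make_read(MOTIF * 8)] * 6                          # allele 8, simple repeat
    + [make_read(MOTIF * 3 + "AGAC" + MOTIF * 4)] * 5   # allele 8, sequence variant
    + [make_read(MOTIF * 7)] * 1                        # low-level, stutter-like product
    + ["ACGTACGT"]                                      # read without flanks
)

alleles = call_alleles(reads)
for seq, n in alleles.items():
    print(len(seq) // len(MOTIF), seq, n)
```

Both retained sequences have the same length, so a size-based method would report a homozygote, whereas the read counts indicate two distinct alleles. Real casework pipelines additionally handle sequencing errors, both strands, and locus-specific stutter models, which this sketch omits.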

3.6.2.2 Single Nucleotide Polymorphisms (SNP)

NGS-based whole-genome sequencing enables large-scale exploitation of SNPs for criminal investigation with higher accuracy. Several important morphological characteristics, for instance skin, hair, and eye color, have been predicted with 80–90% accuracy by this method. SNP typing also avoids stutter artifacts, and the comparatively low mutation rate of SNPs makes them useful for establishing kinship (Berglund et al. 2011).

3.6.2.3 Uniparental Markers (Lineage-Based Genetic Markers)

Mitochondrial DNA (mtDNA)

mtDNA has proved its utility as a forensic tool owing to characteristics such as small size, maternal inheritance, multiple copies per cell, lack of recombination, and high mutation rate. It is valuable in a variety of cases in which only low amounts of DNA are recovered or where maternal lineage is under investigation. Current forensic mtDNA analysis usually detects only polymorphisms within the hypervariable regions. However, additional polymorphic loci are required to use mtDNA as a genetic haplotype marker and to enhance its discriminating capacity. Consequently, NGS technology has great potential for the analysis of whole mitochondrial sequences. Binladen et al. used a primer-coding technique and created 256 tagged primers for parallel sequencing, which allowed 256 samples to be sequenced in one run (Binladen et al. 2007). Gunnarsdottir et al. simultaneously sequenced the complete mitochondrial genomes of 109 Filipino individuals using NGS technology (Gunnarsdottir et al. 2011). Heteroplasmy of human mtDNA is a very common event and is often encountered in different cells of an individual (Cao et al. 2006). Forensic mitochondrial analysis is often affected by mtDNA heteroplasmy. The advantages of detecting heteroplasmy at the whole mitochondrial genome level using NGS (Li et al. 2010) include high sensitivity and accuracy, low cost, simple operation, and high throughput (Tang and Huang 2010). Another study used the 454 GS Junior system to simultaneously examine a Y chromosome STR locus (DYS389I/II), an autosomal STR locus (D18S51), and multiple mitochondrial hypervariable regions; the results confirmed that a mixing ratio as low as 1:250 between two DNA sources is detectable. The authors further asserted that a mixing ratio as low as 1:1000 might be detectable with increased sequencing coverage (Holland et al. 2011). 
Sixty-four complete mitochondrial genome sequences were examined in order to compare the haplotypes defined by NGS technology at the complete mitochondrial genome level with those obtained by conventional Sanger sequencing. The results showed differences in <0.02% of nucleotides between the two methods. The discrepancies were observed in or around homopolymeric stretches, as these areas are more susceptible to sequencing errors (Parson et al. 2013). Mikkelsen et al. reported that, with careful visual inspection of the results, homopolymers of up to six bases could be sequenced correctly in 95% of the reads using the 454 NGS method (Mikkelsen et al. 2014). Previously unreported heteroplasmy was also revealed in the GM9947A component of the National Institute of Standards and Technology human mtDNA standard reference material SRM-2392. Recent developments in complete mtDNA sequencing based on NGS technology overcome the limitation of sequencing only the noncoding region. By virtue of NGS, new mtDNA phylogeny and haplogroups have been discovered, and important ancestry-based biogeographical markers have also been found. NGS has also helped to resolve mtDNA heteroplasmy arising in mixed samples from the same person (Melton et al. 2012).
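The link between mixing ratio and sequencing coverage mentioned above can be made concrete with a simple binomial model. The sketch below is an illustration under stated assumptions, not the method used in the cited studies: it treats a 1:250 mixture as a minor-contributor read fraction of 1/251, ignores sequencing error, and uses an arbitrary requirement of at least 10 minor-contributor reads with 95% probability.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def min_coverage(minor_fraction, min_reads=10, confidence=0.95, step=100):
    """Smallest coverage (in multiples of step) at which at least min_reads
    minor-contributor reads are expected with the given probability.
    min_reads and confidence are assumed thresholds, not published values."""
    n = step
    while prob_at_least(min_reads, n, minor_fraction) < confidence:
        n += step
    return n


for label, fraction in [("1:250", 1 / 251), ("1:1000", 1 / 1001)]:
    print(label, "needs roughly", min_coverage(fraction), "reads at the site")
```

Under this model the required coverage scales approximately in proportion to the dilution, which is consistent with the expectation that a 1:1000 mixture becomes detectable only at substantially deeper coverage than a 1:250 mixture.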

Y-STR

Y-STRs are used to resolve questions of paternal relationship among male individuals and to detect male DNA mixed with a high female background. NGS technology was used to compare more than 10,000,000 nucleotides of the Y chromosome of two related males separated by 13 generations (Xue et al. 2009). Four genetic differences were revealed, which suggested that Y chromosome sequencing could be used to discriminate between male contributors of the same paternal lineage in mixed samples. Van Geystelen et al. used Y chromosome SNPs to develop AMY-tree and successfully confirmed the variation among 118 unrelated males belonging to 109 diverse geographical locations (Van Geystelen et al. 2013a, b). From this research, AMY-tree emerged as a potential tool for determining Y chromosome pedigrees and identifying unknown Y-SNPs from diverse geographical locations. NGS-based Y chromosome sequencing is therefore very useful for individually distinguishing male perpetrators of the same paternal lineage in mixed DNA samples.
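The figures quoted for the two related males allow a rough per-generation estimate. The following sketch is only a back-of-the-envelope calculation: it assumes that 13 is the total number of generations separating the two males and rounds the compared region to exactly 10,000,000 nucleotides, so the result is indicative rather than exact.

```python
def mutation_rate(differences, sites, generations):
    """Observed differences per site per generation."""
    return differences / (sites * generations)


# Figures taken from the text; treating 13 as the total number of generations
# separating the two males is an assumption of this sketch.
SITES = 10_000_000
rate = mutation_rate(differences=4, sites=SITES, generations=13)

# Expected number of differences between a father and son (one generation)
# over the same number of sites.
expected_father_son = rate * SITES * 1

print(f"rate = {rate:.2e} per nucleotide per generation")
print(f"expected father-son differences = {expected_father_son:.2f}")
```

Under these assumptions, roughly one new difference is expected every three generations over this region, which suggests that very close paternal relatives may still share an identical sequence and that discrimination improves as more of the Y chromosome is sequenced.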

3.6.2.4 Epigenetic Markers

Several recent studies suggest various applications of epigenetic markers in forensic science. Epigenetic markers can be used to predict tissue type (Frumkin et al. 2011), to distinguish monozygotic (MZ) twins (Li et al. 2013), and to determine the age of a DNA donor with precision (Bocklandt et al. 2011a). Epigenetic markers are capable of identifying the specific body fluid source of DNA, the age of the source person, and fabricated DNA (Lee et al. 2012; Bocklandt et al. 2011b; Courts and Madea 2011). NGS technology-based epigenetic approaches comprise genome-wide methylation BeadChips, bisulfite sequencing (Grunau et al. 2001), methylated DNA immunoprecipitation sequencing (Weber et al. 2005), and reduced representation bisulfite sequencing (Meissner et al. 2005). Because these sequencing methods require large amounts of DNA, their ability to work with minute DNA samples will be critical to the success of this approach. Nevertheless, trace amounts of DNA (about 100 pg) have been effectively analyzed through genome-wide amplification of a bisulfite-modified DNA template followed by quantitative methylation detection using pyrosequencing (Paliwal et al. 2010). Another encouraging study was performed on trace blood spot samples using bisulfite genomic DNA sequencing (Xu et al. 2012).
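The principle behind bisulfite sequencing is that unmethylated cytosines are read as thymine after conversion, while methylated cytosines remain cytosine. A minimal Python sketch of this logic is given below; the reference sequence, the methylated positions, and the function names are hypothetical, and the reads are assumed to be already aligned to the reference on the same strand and of equal length.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment followed by sequencing:
    unmethylated C reads as T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


def methylation_levels(reference, reads):
    """For each C in the reference, return the fraction of reads showing C."""
    levels = {}
    for i, base in enumerate(reference):
        if base != "C":
            continue
        calls = [read[i] for read in reads if read[i] in "CT"]
        if calls:
            levels[i] = calls.count("C") / len(calls)
    return levels


# Hypothetical reference with cytosines at positions 1, 4, 8 and 9.
reference = "ACGTCGATCCGA"

# Three molecules methylated at positions 1 and 9, one methylated at 1 only.
reads = [bisulfite_convert(reference, {1, 9})] * 3
reads += [bisulfite_convert(reference, {1})]

for position, level in methylation_levels(reference, reads).items():
    print(f"C at position {position}: {level:.0%} methylated")
```

Tissue, age, or twin comparisons would then rest on differences in such per-site methylation fractions between samples. Real data also require handling of incomplete conversion and both strands, which are not modeled here.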

3.6.2.5 MicroRNA (miRNA)

MicroRNAs are a set of small endogenous RNA molecules with lengths of 18–24 nucleotides. Owing to their relative resistance to degradation, small size, and highly tissue-specific expression, they are considered suitable for forensic identification of body fluids, postmortem interval (PMI) analysis, and species identification (Courts and Madea 2010). Their tissue-specific expression makes them ideal biomarkers for body fluid identification. The introduction of miRNAs into forensic science is a recent event. RT-PCR and biochip technology are mostly used for miRNA analysis. Hanson et al. initiated the use of miRNA profiling in forensic science in 2009 and demonstrated that 452 miRNAs could be profiled from forensic samples by means of a quantitative PCR method (Hanson et al. 2009). In a different study, the expression levels of 718 miRNAs in menstrual blood, saliva, venous blood, semen, and vaginal secretions were profiled on a microarray (Zubakov et al. 2010). Among these, 14 differentially expressed miRNAs were recognized as probable candidates for body fluid identification. NGS technology can analyze millions of miRNA sequences quickly and identify miRNA expression in diverse disease states as well as organ- and developmental stage-specific expression, making it a promising instrument for forensic analysis. NGS is highly sensitive for miRNA profiling (Courts and Madea 2010; Wang et al. 2012).
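A first step in NGS-based miRNA profiling is to keep reads in the expected size range and to express their counts on a common scale so that samples can be compared. The sketch below assumes adapter-trimmed reads and uses invented sequences that do not correspond to any real miRNA; the 18–24 nucleotide window follows the text, and normalization to reads per million is a common convention assumed here.

```python
from collections import Counter


def mirna_profile(reads, min_len=18, max_len=24):
    """Keep reads within the miRNA size window, count identical sequences,
    and normalize the counts to reads per million (RPM)."""
    kept = [r for r in reads if min_len <= len(r) <= max_len]
    total = len(kept)
    if total == 0:
        return {}
    counts = Counter(kept)
    return {seq: n * 1_000_000 / total for seq, n in counts.items()}


# Invented sequences for illustration only.
reads = (
    ["ACGUACGUACGUACGUACGUAC"] * 3   # 22 nt, retained
    + ["GGGAAACCCUUUGGGAAACC"] * 1   # 20 nt, retained
    + ["ACGUACGU"] * 5               # 8 nt, too short
    + ["A" * 30] * 2                 # 30 nt, too long
)

profile = mirna_profile(reads)
for seq, rpm in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(seq, f"{rpm:.0f} RPM")
```

Comparing such normalized profiles across body fluids is one way candidate fluid-specific miRNAs could be shortlisted, although validated forensic workflows involve additional quality control and statistical testing.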

3.7 Conclusion

DNA profiling has brought a boom in forensic science, but it was not an overnight miracle. It is the result of the tireless efforts of researchers in this field over several decades that DNA profiling has emerged and established itself as an infallible technique for personal identification. This technique has changed the lives of many people, not only those who have been honored from time to time for significant contributions to the field, but also those who were waiting for justice. Advancements in this field are being made at an ever faster pace. NGS has undoubtedly opened up the opportunity for extremely sensitive and high-throughput analysis. However, more work is required to fulfill this objective, especially to overcome problems with error rates, low-template library preparation, NGS data processing and mining, and type estimation. Guidelines need to be developed for the application of NGS in the criminal justice system. With the unremitting translational efforts of forensic scientists and continued technical advancement of NGS technology, it is hoped that the day is not far off when NGS becomes a readily available, routine practice in forensic science.