William A. Gahl, David R. Adams, Thomas C. Markello, Camilo Toro, Cynthia J. Tifft
Rare and novel disorders often present in childhood and represent a diagnostic challenge that can be addressed using advanced genetic techniques. In the United States, rare disorders are defined as those affecting <200,000 people (about 1 in 1,500 persons), but no single definition has been agreed on internationally.
An estimated 8000 rare disorders are recognized, and the existence of approximately 23,000 human genes suggests that many more genetic diseases will be discovered in the future. Potential reasons patients may remain undiagnosed despite extensive prior investigation include:
One approach toward investigating undiagnosed diseases was taken by the National Institutes of Health (NIH) Undiagnosed Diseases Program (UDP), which was expanded to a nationwide Undiagnosed Diseases Network (UDN). For the >4,000 patient applications to the UDP, prior investigations are recounted in a summary letter from the referring clinician and documented with medical records that include photos, videos, imaging, and histologic slides of biopsy material. Specialty consultants review the records, and the UDP directors determine the next steps. Accepted patients come to the NIH Clinical Center for a week-long inpatient admission. Approximately half the patients with undiagnosed diseases have neurologic disease; cardiovascular, rheumatology, immunology, and pulmonary problems are also common. Approximately 40% of accepted patients are children, who often have congenital anomalies and neurologic disorders.
Patients remain without a definitive diagnosis after an extensive workup in part because every individual has a unique genetic and environmental background, and diseases have variable expression. Undiagnosed conditions include those never before seen, unusual presentations of otherwise recognizable conditions, and combinations of conditions that obfuscate each other's identities. A thorough clinical investigation allows the clinician to broaden the differential diagnosis through research, consultation, and clinical testing. Extensive phenotyping, imaging, and other tests provide better documentation of the presentation and allow for association with diseases not yet discovered, genetic variants, and patient cohorts.
A complete history anchors the data and includes prenatal and neonatal findings, developmental milestones, growth pattern, onset and progression of symptoms and signs, precipitating influences, response to medications, and a pedigree to determine which family members are possibly affected. Pertinent physical findings include dysmorphisms, organomegaly, neurologic impairment, bone involvement, and dermatologic findings. Because many rare and novel disorders are multisystemic, consultants play a critical role in every diagnostic evaluation. Typical studies performed to address possible diagnoses are listed in Table 101.1; neurodevelopmental or neurodegenerative phenotypes require even more extensive studies (Table 101.2).
Table 101.1
An inpatient admission allows for close interaction among experts in different fields, informs the evaluation of complex cases, and often leads to new disease discovery. In the last situation, other family members require evaluation to ascertain whether they are affected with the disorder.
Once phenotyping is complete, a list of candidate genetic disorders can be compiled. Laboratory testing is available for an increasingly large number of molecular disorders. Examples of genetic panels include those for X-linked cognitive impairment, hereditary spastic paraplegia, spastic paraplegia and gait disorders, spinocerebellar ataxias, dystonias, and mitochondrial disorders. Some of these panels are expensive and may exceed the cost of exome sequencing. On the other hand, exome and genome sequencing are not useful for detecting diseases caused by several types of genetic change, including DNA repeats. In addition, exome sequencing may provide less certainty for excluding genetic diseases than a disease-specific test panel.
Single nucleotide polymorphism (SNP) arrays and next-generation sequencing (NGS) provide valuable genome-wide structural information. The human genome's 3.2 billion bases include many that are polymorphic, customarily defined as differing between any 2 people >1% of the time. In most human populations, about 4 million differences exist between any 2 unrelated individuals (about 1 polymorphism for every 1,000 bases in the genome on average). Within a single ethnic population, about 1 common SNP occurs per 3,000-7,000 bases, where common means a >10% chance that the base will differ between 2 unrelated people. Approximately 1 million of these common SNPs can be included on a DNA hybridization array and examined simultaneously, revealing copy number variants, mosaicism, and regions of identity by descent. These results complement NGS results; one example is the pairing of sequence variants detected by exome or genome sequencing with trans-oriented deletions detected by SNP assay.
Technical advances have allowed for massive, inexpensive DNA sequencing, making it feasible to determine the sequence of the coding regions of almost all the human genes. Because this involves 1.9% of the 3.2 billion bases in the human genome, exome sequences comprise approximately 60,000,000 bases. Using current technology, clinical exome sequencing adequately sequences >80% of known genes and >90% of genes that have been associated with human disease. The average exome sequencing produces about 35,000 bases (0.06%) that differ from the “reference” sequence and from any other unrelated human sequence of the same ethnic group. These variants include some laboratory and computational errors. In practice, most variants are inconsequential polymorphisms and minor polynucleotide repeats that occur near intron/exon boundaries. However, each of the 35,000 variants of unknown significance is a potential disease-causing variant, yet only 1 (or 2 for compound heterozygous recessive cases) is the disease-causing mutation for a monogenic disorder (with perhaps 2 or 3 additional loci modifying severity). The clinician and bioinformatician must reduce the number of candidate variants to a tractable number, which is challenging. For instance, a variant causing an adult-onset disease may look just as damaging as a different variant causing congenital-onset disease. However, the likelihood of the presence of the associated diseases is much different in an adult vs a child.
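The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above; the constant and function names are illustrative and do not come from any sequencing toolkit.

```python
# Figures quoted in the text; the names below are illustrative only.
GENOME_BASES = 3_200_000_000    # bases in the human genome
CODING_FRACTION = 0.019         # share of the genome in coding regions (1.9%)
VARIANTS_PER_EXOME = 35_000     # variant bases reported per average exome


def exome_size(genome_bases: int = GENOME_BASES,
               coding_fraction: float = CODING_FRACTION) -> float:
    """Approximate number of bases covered by an exome."""
    return genome_bases * coding_fraction


def variant_percentage(variants: int, exome_bases: float) -> float:
    """Variant bases expressed as a percentage of the exome."""
    return 100.0 * variants / exome_bases


if __name__ == "__main__":
    size = exome_size()
    print(f"Exome size: about {size:,.0f} bases")
    share = variant_percentage(VARIANTS_PER_EXOME, size)
    print(f"Variant share: about {share:.2f}% of the exome")
```

The product is about 60.8 million bases, which the text rounds to 60,000,000; with either figure the 35,000 variants amount to roughly 0.06% of the exome.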
Certain rules are used to separate likely-interesting variants from likely-uninteresting variants. For example, variants that segregate in a family consistent with a given inheritance model (e.g., dominant or recessive) are retained, while those that segregate in an inconsistent manner are set aside. This segregation filter requires careful clinical data collection and experimental design, since it depends on correct assignment of affected vs unaffected statuses in the family and collection of sequencing data for family members besides the proband.
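A minimal sketch of such a segregation filter follows. It assumes full penetrance, a single variant per gene, and genotypes coded as the number of variant alleles (0, 1, or 2); the function name, the family members, and the genotype values are all hypothetical. Clinical pipelines also handle compound heterozygosity, de novo variants, X-linked inheritance, and reduced penetrance, none of which is modeled here.

```python
from typing import Dict


def segregates(genotypes: Dict[str, int],
               affected: Dict[str, bool],
               model: str) -> bool:
    """Return True if a variant's genotypes fit the inheritance model.

    Simplifying assumptions (not from the source text): full penetrance,
    "dominant" means one or more variant alleles cause disease, and
    "recessive" means two copies of the same variant are required.
    """
    if model not in ("dominant", "recessive"):
        raise ValueError(f"unknown inheritance model: {model}")
    for person, alt_alleles in genotypes.items():
        if model == "dominant":
            predicted_affected = alt_alleles >= 1
        else:
            predicted_affected = alt_alleles == 2
        # Any mismatch between predicted and observed status rejects the variant.
        if predicted_affected != affected[person]:
            return False
    return True


# Hypothetical family: two affected siblings, unaffected parents.
family_status = {"proband": True, "sibling": True, "mother": False, "father": False}
variant_a = {"proband": 2, "sibling": 2, "mother": 1, "father": 1}
variant_b = {"proband": 1, "sibling": 1, "mother": 1, "father": 0}

print(segregates(variant_a, family_status, "recessive"))  # retained
print(segregates(variant_b, family_status, "dominant"))   # set aside
```

Marking the affected sibling as unaffected in `family_status` would cause `variant_a` to be rejected under the recessive model, which illustrates why accurate phenotyping of relatives matters for this filter.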
A 2nd technique used to evaluate sequence variants is pathogenicity assessment. Bioinformaticians estimate the likelihood that a given DNA sequence variant will have biologic consequences (e.g., change protein function or gene expression). Factors such as nucleotide conservation and differences in coded amino acids are used to create a pathogenicity estimate, or score. Various software programs take different, often overlapping, approaches. PolyPhen-2, SIFT, and MutationTaster rate the pathogenicity of amino acid changes. Computer modeling programs such as CADD, Eigen, and M-CAP, trained on model genetic changes that are already validated, predict effects on gene expression of noncoding variants. These filters are very powerful because of large population datasets that are publicly available, including the 1000 Genomes Project, ExAC, and the UK's 10K genome project. In the next 1 or 2 yr, datasets with genome populations in the 100,000 to 1 million range (e.g., gnomAD database) will further improve these filters and provide better subpopulation frequencies. Ultimately, a multiethnic, graph theory-based alignment should allow successful filtering of variants in currently incomplete genomic regions such as the HLA region. Overall, computational pathogenicity assessment has false-positive and false-negative rates of 10–20%.
Some filters compare variants to databases that contain previously measured or asserted properties of variants found in human populations, such as population frequency information (e.g., ExAC), or curated evidence for association with human disease (e.g., ClinVar). The latter, while potentially useful, is quite incomplete for many genes, but this is improving. One common pitfall of database-derived filters is an inaccurate designation of certain variants as rare. This typically happens when the database is missing information from human populations in whom the variant is seen more often than in the included populations.
Several points need to be considered when employing genome-scale sequencing for clinical diagnostics. Positive predictive value gives the likelihood that a positive test is a true positive. This is higher in a population in whom a disease is common and lower in a population in whom the disease is rare. A person being tested with exome sequencing will show no clinical signs or symptoms of most of the genetic diseases for which the exome sequencing tests. Therefore, many apparently positive findings will be false positives, variants associated with phenotypes that do not match the person being tested.
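The dependence of positive predictive value on disease prevalence follows from Bayes' rule, and a brief sketch makes the effect concrete. The sensitivity, specificity, and prevalence values are hypothetical and chosen only for illustration; they do not describe any real sequencing assay.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    total_positives = true_positives + false_positives
    if total_positives == 0:
        raise ValueError("no positive results are possible with these inputs")
    return true_positives / total_positives


# Hypothetical test that is 99% sensitive and 99% specific.
common = positive_predictive_value(0.99, 0.99, prevalence=0.10)
rare = positive_predictive_value(0.99, 0.99, prevalence=0.0001)

print(f"Disease affecting 1 in 10:     PPV = {common:.3f}")
print(f"Disease affecting 1 in 10,000: PPV = {rare:.4f}")
```

With these assumed values, the same test gives a positive predictive value of about 0.92 when the disease is common but only about 0.01 when it is rare, so nearly all positive findings in the second setting are false positives.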
Individual vs family studies are relevant because family data allow for the proband's variants to be substantially filtered. This advantage must be weighed against the financial costs of studying families vs individuals. Furthermore, family studies are useless if an affected person is called unaffected, or vice versa. Therefore, phenotyping family members is critical. For later-onset conditions, younger siblings may not be suitable for inclusion in an exome sequencing study unless their affected status can be determined unambiguously. Datasets with large numbers of young individuals may have many pathologic variants that cause disease in elderly persons and are inappropriate for filtering variants in late-onset adult diseases or for prenatal counseling about late-onset disease inheritance risks.
Data revisiting policies must be addressed. Genome-scale sequencing generates data for many genes besides those involved in the current diagnostic effort; these data may be useful in the future care of the patient. Some unreported mutated genes, not currently associated with disease, may be implicated in the future as disease risk factors or even as protective factors. In the current testing environment, time-limited data reuse policies and storage and reuse fees are increasingly common. In fact, the storage of data is now becoming more expensive than the cost of re-generating the data.
Early discussion with a genetic specialist is critical. Genetic counseling should be sought before an exome sequencing study is sent. Proper consent for exome sequencing studies is an involved process, including discussions of disease risk factors, unrelated medical conditions, carrier states, and cancer susceptibility. Consented individuals should be asked which types of results they would like to receive.
Anticipating findings that are difficult to use clinically is an important part of counseling. Variants of unknown significance (VUS) are problematic, and genome-scale sequencing amplifies the problem by including variable numbers of results that are difficult to use for medical decision-making. Discussing such variants with families can be challenging; counseling families about the likelihood of receiving this type of result before testing is performed can help the family to cope when the report is returned (see Chapter 94).
When used as a gene panel, exome sequencing rules in but does not rule out. An exome study is a cost-effective way to test many genes simultaneously, but coverage of any given exon varies. Therefore, exome studies cannot always exclude variants in a panel of genes. With careful analysis involving laboratory validation performed on many similarly processed individuals, the exome coverage of any given gene can be assessed. However, commercial/clinical testing facilities may be unwilling to perform such an analysis when a large set of genes needs to be considered. Therefore, a gene panel can be useful when the index of suspicion is high for a disorder caused by a large group of genes. Cerebellar ataxia and hereditary spastic paraplegia are examples (see Chapters 615.1 and 631).
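One way to picture a per-gene coverage assessment is sketched below. The gene names, read depths, the 20-read minimum, and the 95% cutoff are hypothetical placeholders, not values used by any testing facility, and the sketch examines a single sample rather than the many similarly processed individuals described above.

```python
from typing import Dict, List


def covered_fraction(depths: List[int], min_depth: int = 20) -> float:
    """Fraction of a gene's coding bases sequenced to at least min_depth reads."""
    if not depths:
        return 0.0
    return sum(depth >= min_depth for depth in depths) / len(depths)


def genes_not_excluded(panel_depths: Dict[str, List[int]],
                       min_depth: int = 20,
                       min_fraction: float = 0.95) -> List[str]:
    """Genes whose coverage is too low for a negative result to rule them out."""
    return sorted(
        gene for gene, depths in panel_depths.items()
        if covered_fraction(depths, min_depth) < min_fraction
    )


# Hypothetical per-base read depths for two genes in a virtual panel.
panel = {
    "GENE_A": [30] * 100,             # every base well covered
    "GENE_B": [30] * 80 + [5] * 20,   # 20% of bases poorly covered
}

print(genes_not_excluded(panel))  # lists GENE_B only
```

A variant found in either gene can still rule in a diagnosis, but under these assumed thresholds a negative result would leave the poorly covered gene unresolved.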
Providing information to the testing facility improves the chance of diagnosis. Exome sequencing interpretation benefits substantially from the incorporation of an accurate and detailed phenotype. The more clinical information provided to the testing lab, the more specific and useful will be the clinical report.
The role of whole genome sequencing (WGS) is not yet defined in clinical practice but remains a consideration when exome sequencing yields no diagnosis. The fundamental issue is whether the VUS findings in an exome will be more meaningful than any additional variants discovered by WGS, rather than a clinical conclusion that there is no germline genetic/molecular cause for the undiagnosed patient. WGS results carry less confidence because of lower net coverage, take more time to process, and include variants in noncoding regions of the genome that are much more difficult to filter and interpret.
Despite filtering for frequency and predicted deleteriousness, a variant identified by exome or genome sequencing cannot be interpreted as the cause of an individual's disease unless it has been previously demonstrated to cause a disease with a similar phenotype. To prove causality, medical genetics relies on association (the recurrence of mutations within a gene among individuals with a similar phenotype). For rare diseases, there may be too few affected patients to demonstrate a statistically significant association, and other evidence from phenotype ontologies, metabolomics, glycomics, proteomics, and lipidomics may be required. In addition, models (e.g., mice, zebrafish, fruit flies, yeast, cultured cells) can be developed to recapitulate the disease. The variant in question can also be linked to a biologic process or pathway that is known to cause a similar phenotype when disturbed. Finally, standardized and correlated phenotypic and genomic data are deposited into a database to identify other individuals with a similar phenotype and mutations in the same gene.
Physicians may apply their past biases to a group of variants that could be disease causing, but this is often misleading. A standardized computational approach is preferable. For example, the Human Phenotype Ontology standardizes the description of a disease and, because the descriptors have been mapped to other human diseases and to mutant model organisms, identifies possible candidate genes and genetic networks for causing the disease. Similarly, untargeted laboratory screening tests provide an unbiased survey of patient cellular biology and physiology and a more informed prioritization of candidate variants.
The ultimate proof of causality is to ameliorate the disease process by correcting the genetic defect; this might be demonstrable in a model system that recapitulates the human disease. Alternatively, a search for other patients with a similar phenotype and mutations in the same gene can be performed using public databases established using strict statistical and biologic standards.
Of the UDP's 1st 500 pediatric applications, >10% had more than 1 family member (usually a sibling) similarly affected. The age distribution had peaks at 4-5 yr (reflecting patients with congenital disorders) and at 16-18 yr (representing disorders with symptom onset at early school age). Most applicants had been on a diagnostic odyssey for >5 yr. Of the 200 pediatric cases accepted, 25% received a diagnosis; half were obtained using conventional diagnostic methods, including clinical suspicion, biochemical testing with molecular confirmation, or radiographic interpretation. Otherwise, SNP analysis and NGS yielded the diagnosis; all involved rare diseases.
Pediatric medical records require attention to what has and what has not been completed previously. The electronic medical record is an important tool, but “copy forward” functions can perpetuate errors, such as reports of normal testing when in fact the test was recommended or ordered but not performed. Repetitive copying also fosters sloppiness in critical thinking, failure to take an adequate history, and missing the nuances of symptom progression. A history and physical examination should be performed anew and all prior testing results confirmed through copies of original laboratory reports.
Prolonged and painful procedures should be performed under sedation, but the risks associated with sedation must be weighed against the value of information and samples obtained.
When a child comes to a genetics clinic for evaluation, the parents ask these questions:
The answers all require an accurate diagnosis. The lack of a diagnosis makes both the family and the physician uncomfortable, raises suspicion among relatives and acquaintances, and creates feelings of guilt about not having worked hard enough to obtain a diagnosis. Families often consult more and more specialists, becoming frustrated with the lack of coordination among providers. Families should save copies of every test and every visit from each institution in a binder for travel among institutions. A 2- to 3-page narrative summarizing the child's history, medications, list of healthcare providers with contact information, main medical issues, level of functioning on well days and sick days, and interventions that worked in the past can be invaluable in an emergency room setting. An electronic copy is easily updated. Parents can always be the best advocates for their child, particularly an undiagnosed child.
Recommendations to parents of an undiagnosed child are similar to those that apply to any child with chronic illness:
Rare and new genetic disorders can present at any age; a gene's “severe” mutations may manifest early in life while “mild” mutations present later. Diagnoses of known disorders can have very different bases, such as the extent of recognition of a clinical entity, a molecular confirmation, or biochemical evidence. Some variants identified by SNP and exome sequencing analyses may represent new diseases.
One example of the use of these technologies to discover a new diagnosis involves 2 brothers whose parents were first cousins. The brothers had an early-onset spastic ataxia-neuropathy syndrome, with lower-extremity spasticity, peripheral neuropathy, ptosis, oculomotor apraxia, dystonia, cerebellar atrophy, and progressive myoclonic epilepsy. A homozygous missense mutation (c.1847G>A; p.Y616C) in AFG3L2, which encodes a subunit of a mitochondrial protease, was identified by exome sequencing. The AFG3L2 protein can bind to another AFG3L2 molecule or to paraplegin. UDP collaborators in Germany used a yeast model system to demonstrate that the patients' mutation affects the specific amino acid involved in the formation of both these complexes. As a result, the brothers exhibited the signs and symptoms of a known AFG3L2 defect, autosomal dominant spinocerebellar ataxia type 28 (SCA28), and also deficits attributable to a paraplegin defect, hereditary spastic paraplegia type 7 (SPG7). Other features of a mitochondrial disorder (oculomotor apraxia, extrapyramidal dysfunction, myoclonic epilepsy) were also present. The 2 brothers represent the 1st such cases in the world and expand the phenotype of AFG3L2 disease.
A 2nd example involves 2 siblings ages 5 and 10 yr with hypotonia, developmental delays, facial dysmorphisms, hearing loss, nystagmus, seizures, and atrophy on brain MRI. In this case the leading clue was biochemical in nature, and genetic analysis confirmed the diagnosis. Urine thin-layer chromatography for oligosaccharides identified a strong band determined by mass spectrometry to consist of a tetrasaccharide containing 3 glucoses and 1 mannose. This suggested a defect of glucosidase I, the 1st enzyme involved in endoplasmic reticulum trimming of N-linked glycoproteins from a high-mannose to a complex form. Mutation analysis confirmed compound heterozygous variants in the glucosidase I gene, establishing the diagnosis of congenital disorder of glycosylation IIb. The 2 siblings were the 2nd and 3rd patients in the world with this disorder.
Occasionally an autosomal dominant disorder, typically presenting in adulthood, can manifest as a completely different and more severe disorder when pathologic variants in the same gene are inherited from each parent; the child is a compound heterozygote. This was the case in a 3 yr old child who inherited 2 variants in GARS, the gene causing autosomal dominant Charcot-Marie-Tooth disease (CMT) 2D. The child had severe intrauterine and postnatal growth retardation, microcephaly, developmental delay, optic nerve atrophy and retinal pigment changes, as well as an atrial septal defect. Neither parent was symptomatic at the time the child was evaluated; the parents had normal electromyography and nerve conduction studies. This case emphasizes the need to consent families before any genetic testing as to the possibility of receiving unexpected results in additional family members. In this case, genetic counseling was expanded to include possible CMT2D in the parents.