© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020
S. Asamitsu, Development of Selective DNA-Interacting Ligands, Springer Theses (Recognizing Outstanding Ph.D. Research), https://doi.org/10.1007/978-981-15-7716-1_1

1. Introduction

Sefan Asamitsu (1)
(1) Kumamoto University, Kumamoto, Japan

Abstract

Deoxyribonucleic acid (DNA) is a biomacromolecule that carries the genetic information of living organisms. DNA contains four bases, adenine (A), thymine (T), cytosine (C), and guanine (G), and exists stably in the nucleus as a double helix formed through Watson–Crick base pairs. Over the past two decades, the secondary structures of DNA have been shown to have profound implications for various biological, neurological, and pharmacological events. In addition, synthetic ligands that interact with the secondary structures of DNA and affect a specific transcriptional process have been studied intensively with the aim of developing molecular probes and therapeutic agents for human diseases. This chapter summarizes the biological significance of DNA structures beyond the Watson–Crick structure and their interacting ligands.

Keywords
Deoxyribonucleic acid (DNA); Non-canonical DNA; DNA-binding ligands; Human diseases

1.1 DNA-Binding Molecules

1.1.1 Naturally Occurring Molecules

Deoxyribonucleic acid (DNA) is a biomacromolecule that carries the genetic information of living organisms. DNA contains four bases, adenine (A), thymine (T), cytosine (C), and guanine (G), and exists stably in the nucleus as a double helix formed through Watson–Crick base pairs (Fig. 1.1). While the primary genetic information, the DNA base sequence, is virtually identical in all cells of an individual organism, the expression pattern of genes is diverse and depends on the organ, tissue type, cell lineage, and even the individual cell. Thanks to the extensive research that has continued since researchers first asked “What is DNA all about?”, we now know that the diversity of gene expression is precisely governed by specific protein–DNA interactions, protein–protein interactions, and their cooperative combinations, which are primarily based on the DNA base sequence, structure, and modification pattern.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig1_HTML.png
Fig. 1.1

B-form double helical DNA and Watson–Crick base pairs

DNA-binding molecules can influence such tightly governed gene expression systems through direct interaction with their target DNAs. In the early studies of DNA-binding molecules, biophysical chemists found that some antibiotics, such as chromomycin, actinomycin D, netropsin, distamycin A, and the calicheamicin oligosaccharide, have sequence-specific DNA-binding properties, and the DNA–drug complexes were elucidated at the atomic level using NMR and X-ray crystallography (Fig. 1.2). Chromomycin was shown to bind preferentially to GC-rich sequences of duplex DNA and to inhibit RNA synthesis. The binding mode of the drug complexed with a 5′-AAGGCCTT-3′ duplex DNA was solved at the atomic level by NMR analysis (Fig. 1.2a) [1]. Two chromomycin molecules face each other side by side in an antiparallel orientation, and the dimer, centered on one magnesium ion, molds adaptively into the minor groove to generate a symmetrical complex structure. Interestingly, a later study revealed that chromomycin can adaptively bind to an unusual DNA form adopted by CCG trinucleotide repeat sequences, with the consecutive cytosine residues ejected [2]. Similarly, actinomycin D shows a binding preference for GC-rich sequences and antitumor activity arising from interference with DNA replication and RNA transcription (Fig. 1.2b) [3]. Actinomycin D also has a high binding affinity for G•G or T•T mismatch-containing hairpin structures formed by CGG or CTG trinucleotide-repeat DNA, respectively [4, 5]. In contrast, netropsin, distamycin A, and N-methylpyrrole-based oligopeptides are classical crescent-shaped minor-groove binders that predominantly recognize AT-rich sequences and enhance the thermal stability of the duplex while minimally affecting the DNA microstructure (Fig. 1.2c, d) [6–11]. Initially, solution and crystal structures of the 1:1 binding complexes were elucidated by NMR and X-ray crystallography [6, 7]. Subsequent atomic-level structural studies of complexes of distamycin A with duplex DNAs (5′-CGCAAATTGGC-3′, 5′-CGCAAATTTGCG-3′, and 5′-GTATATAC-3′) showed that distamycin A binds to the duplex in a 2:1 binding mode, in which two distamycin molecules lie side by side in an antiparallel orientation in the minor groove [8–10]. The sequence-preferential affinity for the duplex DNA relies on the formation of hydrogen bonds at specific sites of the thymine or adenine bases; that is, the amide protons of distamycin A lie in close proximity to the N3 position of adenine or the O2 position of thymine [8–10].
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig2_HTML.png
Fig. 1.2

Naturally occurring products that bind to DNAs in a sequence-dependent manner

1.1.2 Pyrrole–Imidazole Polyamides

The unique binding preferences of the naturally occurring molecules introduced above, as evidenced by NMR and X-ray diffraction structural analyses, suggested that finely designed synthetic molecules should be able to target a wider repertoire of DNA sequences. Notably, Dervan and colleagues developed distamycin analogs as programmable DNA-binding molecules, named pyrrole–imidazole polyamides (PIPs) [11–13]. PIPs are composed of N-methylpyrrole (P) and N-methylimidazole (I) units linked through amide bonds and can distinguish A-T or T-A base pairs from G-C base pairs with specificity and affinity comparable to those of natural transcription factors (Fig. 1.3) [8, 9, 11, 14]. The mechanism of base-pair discrimination by PIPs was established through a series of atomic-level structural analyses performed by the groups of Wemmer and Dervan. In the antiparallel binding mode, in which two molecules of an I-P-P oligoamide are inserted side by side into the minor groove of DNA, each oligoamide is oriented N → C with respect to the 5′ → 3′ direction of the proximal DNA strand [11]. Moreover, the electron-donating nitrogen at the N3 position of the imidazole ring was shown to form a hydrogen bond with a hydrogen of the exocyclic amine of a guanine residue [11, 14]. The participation of the imidazole moiety in this hydrogen bond confers selectivity for G-C pairs over A-T or T-A pairs. Later structural studies confirmed that other forms of PIPs, including hairpin [12] and cyclic types [13], can also discriminate A-T or T-A from G-C base pairs, relying on the preferred hydrogen bond formed when an imidazole moiety is paired with a pyrrole on the opposite strand (Fig. 1.3) [15–17].
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig3_HTML.png
Fig. 1.3

Chemical structure of pyrrole–imidazole polyamide (PIP) and the crystal structure of the complex of a PIP and a duplex DNA (PDB code: 3omj)

Here, I summarize the rule of base-pair discrimination by PIPs: PIPs recognize a C-G base pair through the antiparallel pairing of pyrrole (P)/imidazole (I), and an A-T or T-A base pair through the antiparallel P/P pairing (Fig. 1.4). β-Alanine (β) [18] and γ-aminobutyric acid (GABA), which are alternative constituents of PIPs that relax the overall structure or promote adaptive folding into the hairpin form, respectively, also prefer an A-T or T-A base pair.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig4_HTML.png
Fig. 1.4

Base-pair discrimination by a P-P or I-P pair of PIPs
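The pairing rule summarized above can be written down as a small lookup table. The following Python sketch is only an illustration: the function names are hypothetical, the mirrored I/P assignment for a G-C pair is taken from the imidazole–guanine hydrogen bond described earlier, and β-alanine, GABA, and all affinity or context effects are ignored.

```python
# Minimal sketch of the PIP pairing rule (illustrative names, not a design tool).
# Key: base on the target strand read 5'->3'.
# Value: (ring facing the target strand, ring on the opposite polyamide strand).
PAIRING_RULE = {
    "G": ("I", "P"),  # I/P reads a G-C pair (imidazole faces the guanine)
    "C": ("P", "I"),  # P/I reads a C-G pair
    "A": ("P", "P"),  # P/P reads an A-T pair
    "T": ("P", "P"),  # P/P reads a T-A pair (not distinguished from A-T)
}


def design_ring_pairs(target):
    """Return one ring pair per base of `target` (5'->3')."""
    target = target.upper()
    unknown = set(target) - set(PAIRING_RULE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [PAIRING_RULE[base] for base in target]


def recognized_pattern(pairs):
    """Read ring pairs back as a degenerate sequence; W stands for A or T."""
    readout = {("I", "P"): "G", ("P", "I"): "C", ("P", "P"): "W"}
    return "".join(readout[pair] for pair in pairs)


if __name__ == "__main__":
    pairs = design_ring_pairs("TGAC")
    print(pairs)
    print(recognized_pattern(pairs))
```

Because P/P does not distinguish A-T from T-A, reading the ring pairs back gives a degenerate pattern (W) rather than the exact input sequence.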

A crucial advance in PIP synthesis was reported in 1996: a method for the solid-phase synthesis (SPS) of PIPs using Boc chemistry, together with the gram-scale synthesis of monomer units without chromatographic purification [19]. As a result, the time required to synthesize one polyamide was shortened from months to days. Subsequent progress in the solid-phase approach using Fmoc chemistry has been reported, with optimized yields and purities [20]. Today, the Fmoc SPS approach is widely adopted for rapid and facile PIP synthesis.

The sequence-specific binding of PIPs to predetermined sequences permits transcriptional regulation of intended genes, and numerous successful examples have been reported to date [21]. Gottesfeld, Dervan, and colleagues reported the first prominent work, in which a PIP designed to target the TFIIIA binding site (5′-ATGACT-3′) interfered with 5S RNA gene expression in kidney cells [22]. Since then, the Dervan group has led the field of targeted gene regulation by PIPs and has continued to offer invaluable insight into PIP design for enhanced specificity and affinity, methodologies for targeting a specific gene of interest, the molecular mechanism of action of PIPs, and in vivo applications (see the Dervan group website, http://dervan.caltech.edu/).

A conjugation strategy offers promise for more diverse, site-specific control of gene transcription, in which a functional domain connected to a PIP domain exerts its function at a specific locus in a tunable manner. For instance, a PIP connected to an alkylating agent allows sequence-specific alkylation [23]. Notably, occupation and alkylation of the template strand in the coding region of a gene can arrest the progression of RNA polymerase [24, 25]. Our group reported a PIP-seco-CBI conjugate that directly targets mutant DNA in the KRAS driver oncogene. This molecule selectively alkylated the oncogenic codon 12 mutant DNA in the coding region, repressed KRAS expression, and blocked the downstream RAS signaling pathway (Fig. 1.5) [26]. Consequently, it caused strand cleavage and suppressed tumor growth in a xenograft mouse model [26]. Another important oncogenic mutation in the KRAS gene, the codon 13 mutation, was also found to be targetable by a finely designed PIP-seco-CBI conjugate [27].
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig5_HTML.png
Fig. 1.5

A PIP-indole-seco-CBI conjugate to target the oncogenic mutant codon 12 in KRAS gene

A PIP domain connected to a chromatin modulator redirects the action of the modulator from broad regions of the chromosomes to the vicinity of the binding sites of the PIP domain, resulting in site-specific epigenetic alteration of gene expression. In previous studies by our group, suberoylanilide hydroxamic acid (SAHA), an inhibitor of histone deacetylases (HDACs), was covalently connected to PIPs that target different sequences to construct an in-house SAHA-PIP conjugate library (Fig. 1.6a) [28, 29]. The SAHA module tethered to a PIP is expected to loosen the nearby chromatin by site-specifically inhibiting deacetylation of the histone proteins, thereby triggering transcriptional activation of specific genes. A small screen of thirty-eight compounds, in which global gene expression profiles were examined by DNA microarray, demonstrated that the individual SAHA-PIP conjugates displayed distinct transcriptional activation patterns in mouse and human fibroblasts, primarily reflecting the different DNA recognition by the PIP domains [29]. Functional analysis of the DNA microarray data [29] and subsequent studies [30–33] revealed that each SAHA-PIP conjugate activated a distinct gene network: SAHA-PIP K, A, G, I, X, and L were found to transcriptionally activate different panels of genes important for gametogenesis [30], pancreatic cells [29], cardiac cells [29], pluripotency [31], retinal cells [32], and neural development [33], respectively (Fig. 1.6b).
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig6_HTML.png
Fig. 1.6

SAHA-PIP conjugates that activate a particular gene network

Analogously, N-(4-chloro-3-(trifluoromethyl)phenyl)-2-ethoxybenzamide (CTB), which is known as a histone acetyltransferase (HAT) activator, exhibits a similar effect when conjugated to a PIP. PIP I connected to CTB, named CTB I, induced a panel of genes similar to that induced by SAHA I (Fig. 1.7) [34]. These results indicate that the sequence-specific DNA recognition of the PIP domain governs the site of action of the chromatin-modulating agents.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig7_HTML.png
Fig. 1.7

CTB-PIP conjugate (I) that exhibits a similar gene regulation profile compared to the SAHA-PIP conjugate (I) as indicated by a heatmap obtained from DNA microarray analysis

The use of a bromodomain inhibitor as the conjugation partner of PIPs was also found to be suitable for targeted transcriptional activation (Fig. 1.8a). Ansari and colleagues reported that frataxin (FXN) expression, which is silenced by repressive expanded GAA microsatellite repeats (>120 repeats) in Friedreich’s ataxia (FRDA), was selectively activated by treatment with a GAA-repeat-targeting PIP tethered to a bromodomain inhibitor (JQ1), named Syn-TEF1 (Fig. 1.8b). Targeted recruitment of an elongation factor (P-TEFb), concomitant with the assembly of BRD4 across the GAA repeats, restores FXN expression in FRDA patient-derived cells harboring a broad range of repeat expansions [35]. Similarly, a derivative of CBP30, which belongs to another class of bromodomain inhibitors, was successfully used to design a sequence-specific transcriptional activation tool, named Bi-PIP. The Bi-PIP targets the P300/CBP family of coactivator proteins and causes P300-dependent histone acetylation at specific loci (Fig. 1.8c) [36]. A well-designed in vitro ChIP assay demonstrated that the Bi-PIP efficiently and selectively acetylated histone H3 in a dose- and sequence-dependent manner. The mechanism of action demonstrated by the in vitro assays was validated biologically: the most upregulated protein-coding gene contained multiple putative binding sites of the PIP domain of the Bi-PIP in its gene body and promoter [36].
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig8_HTML.png
Fig. 1.8

Bromodomain inhibitor-PIP conjugates; a Bromodomain inhibitors, b Syn-TEF1, c Bi-PIP

1.1.3 Trinucleotide Repeat-Targeting Molecules

Microsatellite repeat sequences are prevalent in the human genome, and some of them have profound implications in biological and neuropathological contexts [37, 38]. In this section, we highlight trinucleotide microsatellite sequences and their implications in hereditary diseases from the viewpoint of DNA conformations and of the molecules that target them as potential therapeutic agents and diagnostic tools.

Expansion mutations are often seen in the human genome, and elongation beyond a defined threshold causes hereditary disorders, including trinucleotide repeat diseases (Fig. 1.9) [39, 40]. Simple trinucleotide repeats, exemplified by (CNG)n (where N represents any nucleotide) and (GAA)n, tend to expand during DNA replication, repair, and recombination, an expansion that is likely mediated by the peculiar DNA conformations formed by the repeat sequences. This tendency to expand increases with the length of the repeats and thus leads to a decreased age of onset and increased severity in subsequent generations.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig9_HTML.png
Fig. 1.9

Genomic locations and expansion thresholds of triplet repeats associated with trinucleotide repeat diseases. Repeat expansion beyond the defined thresholds is the origin of pathogenesis. UTR, untranslated region; SCA, spinocerebellar ataxia

Mutant RNAs and proteins derived from the expanded repeat DNA regions are directly associated with the pathogenesis of each hereditary disorder [41]. For example, the expansion of repetitive CAG trinucleotides (>36 repeats) within the first exon of the Huntingtin (HTT) gene causes Huntington’s disease [42]. The polyglutamine (PolyQ) tracts derived from the expanded repeat within the full-length HTT protein, together with the toxic long hairpin RNA structures formed by the repetitive region of HTT transcripts, are the primary pathogenic agents [43–45]. In myotonic dystrophy type 1 (DM1), transcripts harboring excessive CUG repeats, which are transcribed from CTG repeat sequences in the 3′-UTR of the DMPK gene, form aggregated RNA foci within the nucleus by assembling multiple RNA-binding proteins [46, 47]. The recruited proteins, such as splicing regulators, can no longer function properly, ultimately leading to the pathogenesis of DM1. The CGG repeats located in the 5′-UTR of the FMR1 gene are associated with fragile X syndrome (FXS, >200 repeats) and fragile X-associated tremor/ataxia syndrome (FXTAS, 55–200 repeats). While the pathogenesis of FXTAS is thought to be attributable to the ability of the extended CGG-repeat RNAs to form multiple higher-order structures [48], the longer CGG repeats in FXS cause epigenetic repression of gene transcription through DNA methylation within the CGG repeat and the 5′-UTR region [49]. The repetitive GAA runs observed in Friedreich’s ataxia (FRDA) act similarly: repressive chromatin across these repetitive genomic regions hinders RNA polymerase initiation and elongation, thus silencing gene expression [50, 51]. Recently, repeat-associated non-ATG translation (RAN translation) has also been found to be a pathogenic source in (CAG)n, (CGG)n, and other repeat-associated disorders [52, 53].
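The repeat-number thresholds quoted above lend themselves to a short worked example. The Python sketch below counts the longest uninterrupted run of a repeat unit in a sequence and compares it with the HTT and FMR1 thresholds given in the text. The function names are hypothetical, the treatment of the 55 and 200 boundaries as inclusive is an assumption, and the sketch is an illustration of the arithmetic only, not a diagnostic procedure.

```python
# Illustrative sketch: count tandem repeats and compare with the thresholds
# quoted in the text (HTT: >36 CAG; FMR1: 55-200 CGG for FXTAS, >200 for FXS).

def longest_repeat_run(seq, unit):
    """Length (in repeat units) of the longest uninterrupted tandem run of `unit`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    seq, unit = seq.upper(), unit.upper()
    step = len(unit)
    best = 0
    for start in range(len(seq)):
        count = 0
        # Extend the run one unit at a time from this start position.
        while seq.startswith(unit, start + count * step):
            count += 1
        best = max(best, count)
    return best


def classify(gene, n_repeats):
    """Map a repeat count to the range named in the text (boundaries assumed inclusive)."""
    if gene == "HTT":
        return "Huntington's disease range" if n_repeats > 36 else "below threshold"
    if gene == "FMR1":
        if n_repeats > 200:
            return "FXS range"
        if n_repeats >= 55:
            return "FXTAS range"
        return "below threshold"
    raise ValueError(f"no threshold recorded for {gene}")


if __name__ == "__main__":
    toy_allele = "AT" + "CAG" * 40 + "TT"  # made-up sequence, for illustration only
    n = longest_repeat_run(toy_allele, "CAG")
    print(n, classify("HTT", n))
```

Scanning every start position keeps the count correct even for repeat units that can overlap themselves, at the cost of quadratic time on very long runs.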

The common principle underlying these trinucleotide repeat diseases is the unusual structural features of the nucleic acids (Fig. 1.10). In particular, expanded repeat sequences are known to adopt stable hairpin structures [37, 41]. CAG, CTG, and CGG repeat DNAs all adopt a stem-loop structure composed of 5′-CXG-3′/3′-GXC-5′ motifs, each containing a single X•X mismatch (Fig. 1.10a, c). As discussed earlier, actinomycin D has considerable affinity for the 5′-CTG-3′/3′-GTC-5′ and 5′-CGG-3′/3′-GGC-5′ mismatched motifs, as elucidated by X-ray diffraction structural analysis (Fig. 1.11) [4, 5]. Notably, administration of actinomycin D to DM1 (CTG repeat) patient-derived cells and a mouse model repressed transcription, reduced the toxic RNA foci, and rescued the pathogenic splicing defects [54]. Nakatani and colleagues reported a rationally designed synthetic small molecule, in which naphthyridine and azaquinolone moieties are connected through an appropriate hinge (named NA), that can target the 5′-CAG-3′/3′-GAC-5′ motif [55]. NMR structural analysis elegantly revealed a specific hydrogen bond-driven interaction between the ligand and the motif, with the mismatched adenine residues flipped out (Fig. 1.12) [55]. Based on this molecular scaffold, ligands targeting T•T and G•G mismatched motifs were also developed, and some of them were shown to arrest the progression of DNA polymerase through ligand-mediated stabilization of the repetitive motifs [56]. Zimmerman and colleagues created a series of designer molecules that dually target d(CTG•GTC)n and r(CUG•GUC)n and are capable of selectively inhibiting transcription, cleaving the transcripts, and reducing the toxic RNA foci within nuclei (Fig. 1.13) [57]. Chenoweth and colleagues also reported a new class of synthetic molecules that target the three-way junctions formed by CAG•CTG repeats (Fig. 1.14) [58]. These spatially defined molecules, built around a triptycene core, likely mold into the cavity of the junction, and the three cationic moieties (represented as “R” in Fig. 1.14) would contribute to increased binding affinity by interacting with the anionic DNA backbone. Interestingly, these molecules can modulate the microstructure of the DNA, a property that could make them chemical tools for influencing repeat expansion/contraction and transcription.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig10_HTML.png
Fig. 1.10

Non-canonical DNA structures formed by expandable repeats; a hairpin formed by CNG repeats, b G-quadruplex (G4) formed by GGGGCC or CGG repeats, c slipped hairpin formed by CTG/CAG repeats, d triple helix formed by GAA/CTT repeats
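The 5′-CXG-3′/3′-GXC-5′ motif of the hairpin stem can be reproduced with a few lines of code. The Python sketch below aligns one repeat unit with the same unit on the opposing, antiparallel arm and labels each position as a Watson–Crick pair or a mismatch. The function name is hypothetical, and the sketch ignores loop formation, slippage, and any energetic considerations.

```python
# Illustrative sketch: which positions of a CNG repeat unit pair in a hairpin stem?
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def stem_motif(unit):
    """Pair a repeat unit (5'->3') with the same unit on the antiparallel arm.

    The opposing arm carries the same repeat, so reading it 3'->5' is simply
    the reversed unit; position i of the unit then faces position i of it.
    """
    unit = unit.upper()
    opposing = unit[::-1]
    return [
        (top, bottom, "WC" if (top, bottom) in WATSON_CRICK else "mismatch")
        for top, bottom in zip(unit, opposing)
    ]


if __name__ == "__main__":
    for repeat in ("CAG", "CTG", "CGG"):
        print(repeat, stem_motif(repeat))
```

For CAG, CTG, and CGG the outer C-G and G-C positions pair normally and the central base faces itself, which gives the A•A, T•T, and G•G mismatches targeted by the ligands described in the text.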

../images/500906_1_En_1_Chapter/500906_1_En_1_Fig11_HTML.png
Fig. 1.11

(Left) crystal structure of a 2:1 actinomycin-d(ATGCTGCAT)2 complex (PDB code: 1mnv); (Right) crystal structure of a 2:1 actinomycin-d(ATGCGGCAT)2 complex (PDB code: 4hiv)

../images/500906_1_En_1_Chapter/500906_1_En_1_Fig12_HTML.png
Fig. 1.12

a Solution structure of a 2:1 NA-d(CTAACAGAATG•CATTCAGTTAG)2 complex (PDB code: 1x26). b Intermolecular hydrogen bonds between NA and guanine or adenine bases

../images/500906_1_En_1_Chapter/500906_1_En_1_Fig13_HTML.png
Fig. 1.13

Designer molecules that dually target d(CTG/GTC)n and the derived r(CUG/GUC)n transcripts; (left) compound 5; (center) compound 6; (right) compound 9 in Ref. [57]

../images/500906_1_En_1_Chapter/500906_1_En_1_Fig14_HTML.png
Fig. 1.14

Three-way junction formed by CAG and CTG repeat DNAs and triptycene molecules that bind to the cavity of the junction

1.2 G-Quadruplex

1.2.1 Structures and Biological Significance of G-Quadruplexes

As discussed in the preceding section, it is now widely accepted that DNA adopts alternative conformations other than the B-form during cellular processes [37, 59, 60]. Besides the repeat-associated unusual structures, the G-quadruplex (G4) is considered another important form of nucleic acids [61, 62]. It consists of several stacked G-tetrad layers, each comprising four coplanar guanines linked through Hoogsteen hydrogen bonding, and folds stably under physiological conditions in the presence of monovalent metal cations (such as Na+ and K+) (Fig. 1.15). G4s can be formed by guanine-rich sequences, and the motif 5′-G≥3 N1–7 G≥3 N1–7 G≥3 N1–7 G≥3-3′ (four tracts of at least three guanines separated by loops of one to seven nucleotides) has been proposed as a consensus sequence capable of forming intramolecular G4s [63, 64], although several exceptions have been reported [65–67]. Extensive physical characterization of the G4 structure by UV and CD spectroscopy revealed that it has extremely high thermal stability (Tm of approximately 70–90 °C) when the G-tracts are separated by only one or two nucleotides [68].
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig15_HTML.png
Fig. 1.15

a Structure and schematic illustration of a G-tetrad. b Schematic illustrations of typical intramolecular G-quadruplex (G4) structures: (left) crystal structure of parallel-type telomere G4 (PDB code: 1kf1); (center) solution structure of antiparallel-type telomere G4 (PDB code: 143d); (right) solution structure of hybrid-type telomere G4 (PDB code: 2gku)
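The consensus motif for intramolecular G4s given above translates directly into a regular expression. The Python sketch below scans one strand for putative G4-forming sequences; the function name is hypothetical, only the given strand is searched (the complementary C-rich strand is not), and a match indicates sequence potential only, not that a G4 actually folds.

```python
import re

# Consensus from the text: four tracts of >=3 guanines separated by loops of
# 1-7 nucleotides of any kind (loops may themselves contain guanines).
G4_MOTIF = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(seq):
    """Return (start, end, matched sequence) for non-overlapping motif matches."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(seq)]


if __name__ == "__main__":
    telomeric = "TTAGGGTTAGGGTTAGGGTTAGGG"  # four human telomeric repeats
    print(find_g4_motifs(telomeric))
```

Because `finditer` reports non-overlapping matches from left to right, a long G-rich region is returned as consecutive candidates rather than as every possible overlapping fold.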

The formation of such G4s in the genome is involved in biological events related to human diseases. For example, G4 formation in gene promoters, most notably those of genes involved in cellular proliferation and related to cancer, controls the expression of downstream genes by interfering with transcription factor binding (Fig. 1.16a) [69]. Similarly, G4 formation blocks the progression of DNA polymerase and sometimes causes severe DNA damage, such as strand breakage (Fig. 1.16b) [70–73]. Alternatively, G4s can function positively as markers for chromatin remodeling in G-rich regions, where they recruit histone H3.3 variants, and as origins of replication at certain loci (Fig. 1.16c) [74]. These mechanisms rely on the involvement of G4-binding proteins. Recently, G4s were reported to affect epigenetic alterations. The formation of a single G4 at some loci stalls replication temporarily, and this time lag causes an irreversible change in the histone pattern at each round of duplication (Fig. 1.16d) [75, 76]. Furthermore, the excessive formation of repeated G4 structures accounts for some hereditary disorders [74]. These vital roles of G4s in transcription and replication are supported biophysically by the observation that G4 structures are markedly stable in spatially confined environments, such as the cavities inside actively processing DNA/RNA polymerases [77]. The various biological functions of G4s may derive from their sequence, stability, location in the genome, local environment, and combinations of these factors; however, this issue has not been fully elucidated.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig16_HTML.png
Fig. 1.16

Large variety of biological functions of G4 DNA. a Interference with transcription factor binding or nucleosome occupancy by G4s can regulate downstream gene expression. b The arrest of DNA polymerase progression occasionally induces strand breakage. c A series of actions of G4-binding proteins leads to the recruitment of histone H3.3 variants to correctly reconstitute the structure of the nucleosome. G4s also play an important role in initiating replication. d Temporarily stalling the duplication process at G4s results in irreversible changes in histone modification patterns. Recycled and newly recruited histones are colored in green and red, respectively

Historically, the G4 structures observed in the telomere tandem repeat (GGGTTA)n region at the ends of chromosomes initially attracted a great deal of attention, as the single-stranded context offers a greater likelihood of G4 formation. The groups of Neidle and Hurley first identified a telomere G4 DNA-interactive molecule as a telomerase inhibitor, using a series of 2,6-diamidoanthraquinone derivatives [78]. The discovery of telomestatin, a naturally occurring macrocyclic compound that exhibits telomerase-inhibiting activity by binding to telomeric G4 structures, suggested the existence of G4s in vivo [79, 80]. The creation of antibodies with extremely high affinity and specificity for G4s has enabled the visualization of G4s by immunofluorescence, which demonstrated the existence of G4s not only in the single-stranded telomere region but also in duplex regions in nuclei [81, 82]. Regarding G4s in duplex regions, structural analysis initially focused on the one observed in the nuclease hypersensitive element (NHE) III1 located in the promoter region of the c-myc oncogene [83], whose biological significance Hurley demonstrated by showing that it was associated with the control of gene expression [69]. At a relatively early stage of G4 studies, other biologically important genes, such as hTERT [84], c-kit [85], KRAS [86, 87], BCL2 [88, 89], and VEGF [90], were also identified as genes in which G4 formation is involved in transcriptional regulation. In parallel, many high-resolution G4 structures were elucidated at the atomic level using nuclear magnetic resonance (NMR) and X-ray crystallography, opening a new avenue for the rational design of G4 ligands [91–96]. Recently, G4 ChIP-seq analysis revealed that approximately 10,000 G4 structures actually form in the human genome [97]. In addition, the growing number of reports on G4-interacting proteins and their functions supports the biological roles of G4s [98–102]. For instance, some RNA helicases recognize G4 structures and unfold them so that DNA is correctly replicated by DNA polymerases [101]. When such helicases were mutated so as to lose their unfolding ability, replication forks stalled at folded G4s in the genome, which resulted in genome instability. These mechanisms have been shown in some cases to be associated with genetic diseases [101, 102].

1.2.2 G-Quadruplexes and Cancers

As mentioned earlier, the formation of G4 structures in human telomeric DNA was first proposed on the basis of the characteristic guanine-rich sequence (TTAGGG)n and the single-stranded context of the human telomere. Abundant evidence has accumulated during the past two decades that G4 structures truly form in the telomere region and have an important role in telomere-end processing in cells [81, 103]. More importantly, stabilization of telomere G4s and blockage of telomerase activity by small molecules, exemplified by telomestatin, is a new strategy for antitumor therapy [79, 80].

G4-forming sequences observed in the promoters of cancer-related genes have also received a great deal of attention as potential biomedical targets for antitumor therapy [104]. Generally, targeting the promoter region rather than the expressed protein has several advantages, including a lower likelihood of point mutations and of the development of drug resistance. Quarfloxin, a G4-interacting ligand, completed Phase II trials as a candidate therapeutic agent against several tumors, including neuroendocrine tumors, carcinoid tumors, and lymphomas [105]. Quarfloxin disrupts the G4–nucleolin complexes of ribosomal DNA in the nucleolus, which in turn redistributes nucleolin into the nucleoplasm, where it binds specifically to a G4 in the promoter region of the c-myc proto-oncogene and inhibits its expression. Although Phase III trials of Quarfloxin are currently not proceeding owing to high albumin binding, several tumor-related genes have been identified as genes in which G4 formation is involved in transcriptional regulation, showing that G4s are potential molecular targets for cancer therapy. In this section, we discuss the telomere and G4-driven oncogenes in detail in terms of their direct targetability by synthetic ligands.

1.2.2.1 Telomere

A telomere is a structure at the ends of a chromosome, in which a repeated microsatellite sequence and its specifically interacting components (called the shelterin complex) protect the DNA from DNA repair mechanisms [106]. Human telomeric DNA comprises a single microsatellite repeat sequence, (GGGTTA)n, with a 3′ overhang (200 ± 75 nucleotides) at its terminus. In normal somatic cells, the expression of telomerase is almost entirely silenced, so the telomere sequence gradually shortens with DNA replication, which limits cell growth and proliferation. Telomerase is a reverse transcriptase that adds the repeated DNA sequence to the 3′ end of telomeres. It consists of a catalytic subunit, hTERT, and TR/TERC, an RNA template used by hTERT during telomere elongation. Although TR/TERC is expressed globally (regardless of cell type), hTERT is silenced in somatic cells and is reactivated in nearly 90% of human cancers. Aberrant telomerase activity disturbs the balance of the normal telomere maintenance mechanisms, contributing to the acquisition of immortality. Hence, inhibition of telomerase has long been considered a potential therapeutic strategy for human cancers, and several telomerase inhibitors have entered preclinical or clinical trials. However, no clinically important benefits of these drugs have been reported to date. Recently, quadruplex-binding telomerase inhibitors have been considered as an alternative strategy for treating telomerase-positive cancers, as they exhibit high antitumor activity while minimally affecting normal somatic cells in vivo.
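As a rough worked example of the overhang length quoted above, the Python sketch below converts an overhang of 200 ± 75 nucleotides into a number of complete GGGTTA repeats and into an upper bound on the number of tandem G4 units it could hold. The packing model is an assumption made only for this illustration: each G4 is taken as the minimal 21-nucleotide unit GGG(TTAGGG)3, and consecutive units are separated by a single TTA linker. The function names are hypothetical, and the result says nothing about how many G4s actually fold in cells.

```python
# Back-of-the-envelope sketch; the packing model below is an assumption.
REPEAT = "GGGTTA"
MIN_G4_LEN = 21  # GGG(TTAGGG)3: four G-tracts and three TTA loops
LINKER_LEN = 3   # one TTA between consecutive G4 units (assumed tight packing)


def full_repeats(overhang_nt):
    """Number of complete GGGTTA repeats that fit in the overhang."""
    return overhang_nt // len(REPEAT)


def max_tandem_g4(overhang_nt):
    """Upper bound on tandem G4 units: n units need 21*n + 3*(n - 1) nucleotides."""
    return (overhang_nt + LINKER_LEN) // (MIN_G4_LEN + LINKER_LEN)


if __name__ == "__main__":
    for length in (125, 200, 275):  # 200 - 75, 200, and 200 + 75 nucleotides
        print(length, full_repeats(length), max_tandem_g4(length))
```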

As mentioned before, 2,6-diamidoanthraquinone derivatives and telomestatin were the first compounds found to inhibit telomerase through binding to telomere G4s. Similarly, RHPS4 was shown to induce telomere dysfunction by disturbing the integrity of the shelterin complex in mammalian cancer cells [107]. Later studies suggested that a large repertoire of alternative higher-order structures derived from the canonical telomere G4 can be adopted in the 3′ overhang region [65, 108–110]. These structures and their specific motifs may be exploited to gain specificity for telomere G4s.

1.2.2.2 c-myc

c-myc encodes a multifunctional transcription factor that can act as a transcriptional activator of some genes involved in cell proliferation, while acting as a transcriptional repressor of other genes involved in growth arrest [111, 112]. A broad variety of c-myc-responsive genes act in concert in important cellular functions, such as cell proliferation, metabolic transformation, and metastatic capacity [113]. In tumor cells, MYC protein function is almost always activated, primarily through upstream oncogenic pathways. As overexpression of the MYC protein is observed in various human malignancies (particularly in 80% of solid tumors), downregulation of the gene may be an effective route toward cancer therapy. However, MYC is generally considered to be an undruggable target at the protein level because of its short half-life and unstructured nature [104].

The c-myc promoter region contains the nuclease hypersensitive element (NHE) III1, which is located −142 to −115 base pairs upstream of the P1 promoter (Fig. 1.17a). This element contains one putative G4-forming sequence (PQS), which is capable of forming a nonduplex species, possibly accompanied by local unwinding or melting of the duplex structure under the influence of negative supercoiling stress (Fig. 1.17a) [114–116]. Structural dynamics in this region have also been considered a possible key mechanism that largely governs c-myc transcription in certain carcinomas, and the formation of a G4 is likely to act as a downregulator (Fig. 1.17b). Hence, G4-interacting ligands may suppress downstream c-myc gene expression through ligand-mediated G4 stabilization [117, 118]. In this context, c-myc-targeting G4-interacting ligands have been studied during the past two decades with the aim of drug applications in antitumor therapy.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig17_HTML.png
Fig. 1.17

a c-myc promoter has one putative G4-forming sequence (PQS). b Solution structure of G4 from a NHE III1 region in the vicinity of the P1 promoter (PDB code: 1xav)
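The notion of a putative G4-forming sequence (PQS) used above can be made concrete with a small sequence scan. The following is a minimal sketch only: the folding rule it encodes (four runs of at least three guanines separated by loops of 1–7 nucleotides) is a commonly used heuristic that is assumed here and is not taken from this chapter, and the names `PQS_PATTERN`, `find_pqs`, and `reverse_complement` are hypothetical. The test sequence is built from the telomeric repeat (GGGTTA)n mentioned in the text.

```python
import re

# Assumed heuristic (not from the chapter): four G-runs of >= 3 guanines
# separated by three loops of 1-7 nucleotides each.
PQS_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")


def find_pqs(sequence):
    """Return (start, end, motif) for each non-overlapping PQS on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in PQS_PATTERN.finditer(seq)]


def reverse_complement(sequence):
    """Reverse complement, so that the opposite strand can be scanned as well."""
    table = str.maketrans("ACGT", "TGCA")
    return sequence.upper().translate(table)[::-1]


# Four copies of the human telomeric repeat quoted in the text.
telomere = "GGGTTA" * 4
hits = find_pqs(telomere)
for start, end, motif in hits:
    print(start, end, motif)
```

A match from such a scan only indicates that a sequence could fold into a G4; whether it does so in a promoter context depends on factors discussed in the text, such as negative supercoiling and competition with the duplex form.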

1.2.2.3 VEGF

Tumor progression and metastasis render tumors more mature and malignant than undeveloped neoplasms, eventually resulting in deterioration and immortality. Overexpressed vascular endothelial growth factor (VEGF) proteins, including VEGFA, VEGFB, VEGFC, VEGFD, VEGFE, and PlGF, in tumor cells are responsible for induced neovascularization. The expression of human VEGF, which is frequently elevated in many types of cancer, is regulated mainly at the transcriptional level [119, 120]. In a reporter assay system using several cancer cell lines, VEGF expression was regulated mainly by a sequence from −85 to −50 relative to the transcription initiation site, which contains five runs of three or more consecutive guanines and is likely to adopt the G4 form of DNA [121, 122]. VEGF is an attractive target molecule for malignant tumor therapy, and antibody drugs targeting it have been approved for solid tumor treatment [123, 124]. Notably, the G4-forming sequences are located in the promoter region of the VEGF gene (Fig. 1.18). These sequences are also consensus sequences for transcription factors such as Egr-1 and Sp1, suggesting that the dynamic equilibrium of DNA forms in this region also affects gene regulation [93, 121].
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig18_HTML.png
Fig. 1.18

a VEGF promoter has one PQS located close to the transcription start site (TSS) and hormone response element (HRE) that regulate the transcription. b Solution structure of G4 from the vicinity of the promoter (PDB code: 2m27)

Initially, the interaction of TMPyP4 and telomestatin with G4 oligonucleotides was shown to unwind the duplex DNA oligomer into single-stranded DNA and to stabilize the G4 structure [90], and Se2SAP, a global G4-interacting ligand, efficiently suppressed VEGF expression in two adenocarcinoma cell lines (HEC1A and MDA-MB-231) [125]. These data raise the possibility that the transcriptional regulation of VEGF can be controlled by ligand-mediated G4 stabilization, which may lead to the application of G4-interacting ligands in cancer therapy. Similarly, a perylene monoimide derivative, PM2, was found to be a VEGF downregulator, likely through direct interaction with the G4 structure [126]. A quindoline derivative, SYUIQ-FM05, also demonstrated strong interactions with a VEGF G4 and exhibited potential antiangiogenic and antitumor activities [127]. On the basis of these successful reports, several VEGF G4-preferred ligands have been developed through small-scale screening using docking and/or spectroscopic approaches [128, 129]. The biological activities of these ligands have not been examined thus far, and further study is awaited.

1.2.2.4 BCL2

BCL2 (B-cell lymphoma 2) is recognized as an apoptosis-related gene whose translated product resides on the cytoplasmic face of the mitochondrial outer membrane and suppresses the mobility of apoptosis-inducing proteins by controlling mitochondrial membrane permeability [130]. Overexpression of the BCL2 protein is associated with aberrant carcinoma growth in various human diseases, particularly solid tumors such as lymphomas, non-small cell lung cancer, myeloma, and melanoma, and BCL2 has been recognized as a target for cancer therapy over the past three decades [131]. Several approaches have been taken to downregulate BCL2 expression in cancer cells, including small molecules that disrupt protein–protein interactions [132], antisense oligonucleotides [133], and peptidomimetics [134]. Overexpression of BCL2 has also been indicated as a principal element of chemoresistance, particularly in lymphocytic cancers [135, 136]. For instance, transfection of BCL2 into A549 cells induced resistance to the apoptotic effect triggered by triazine derivative 12459, a G4-interacting ligand that inhibits telomerase activity [137]. As another approach, the guanine-rich AS1411 aptamer, which folds stably into a G4 structure, destabilizes BCL2 mRNA and promotes its degradation by RNase by interfering with the binding of nucleolin to the AU-rich element of BCL2 mRNA, eventually inducing apoptosis [138]. This approach is reminiscent of the involvement of G4 formation in gene expression.

Amplification and translocation of BCL2 have been shown to be equally common mechanisms that cause its overexpression in human cancer cells [139]. The human BCL2 gene includes P1 and P2 promoters and has multiple transcription start sites. Transcription is driven to a lesser extent by a TATA box in the P2 promoter, whereas the P1 promoter, which is situated 1386–1423 nucleotides upstream of the translation start site, has been largely implicated in the control of BCL2 transcription (Fig. 1.19a) [140]. A GC-rich element lies 1490–1451 nucleotides upstream of the P1 promoter, where multiple transcription factors have been implicated in BCL2 gene expression, including Sp1 [140], WT1 [141], E2F [142], and NGF [143]. A regulatory effect of G4 formation in this region was suggested by luciferase reporter assays, in which mutation or deletion in this region increased promoter activity in B lymphocytes (DHL-4) [141] or human promyelocytic leukemia (HL-60) cells [144]. More recently, Onel, Yang, and coworkers demonstrated, using a luciferase reporter assay with the BCL2 promoter and mutated sequences, that the formation of another G4 situated in the region immediately upstream of the P1 promoter attenuated the promoter activity (Fig. 1.19) [89]. Based on these reports, the use of ligands to stabilize the G4s formed in the regulatory element and thereby attenuate the promoter activity has also been studied for cancer therapy, similar to the small-molecule targeting of the c-myc G4.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig19_HTML.png
Fig. 1.19

a BCL2 promoter has two G4-forming elements that were shown to attenuate the BCL2 promoter activity. b Solution structure of G4 from the vicinity of the P1 promoter (PDB code: 2f8u)

In addition to G4s, the i-motif, another form of DNA that forms in cytosine-rich sequences, is involved in transcriptional regulation; the binding of hnRNP LL to the i-motif structure likely activates BCL2 gene expression [145]. Moreover, an i-motif-interacting molecule, IM-48, was found to modulate BCL2 gene expression by affecting the dynamic equilibrium between the i-motif and the flexible hairpin form [145], opening a new avenue to modulate the gene expression of BCL2 more precisely. Targeting such non-canonical DNAs formed in the regulatory element of the promoter may be an effective way to act specifically on a particular gene to combat the tumor.

1.2.2.5 c-kit

The c-kit proto-oncogene encodes a receptor tyrosine kinase that is bridged and activated by the binding of dimerized stem cell factor (SCF), and in turn stimulates proliferation, differentiation, and survival in hemopoietic precursor cells [146–148]. Malfunctions of the KIT protein acquired by overexpression or mutations have been associated with several diseases, including gastrointestinal stromal tumors (GIST), mastocytosis, and acute myelogenous leukemia (AML). Although the kinase inhibitor imatinib (Glivec) has been successfully developed as an FDA-approved drug for GIST, long-term exposure often causes secondary mutations in exons 13, 14, or 17, which encode tyrosine kinase domains [149]. Notably, drug resistance derived from mutations in exon 17 severely attenuates the therapeutic effect of imatinib [150]. An approach that fundamentally suppresses c-kit expression would therefore be highly desirable.

The human c-kit promoter is devoid of both a TATA box and CCAAT boxes [151, 152]. Instead, the region within 200 nucleotides upstream of the TSS is highly GC-rich, and several transcription factors are implicated there (Fig. 1.20a). Two well-defined G4 structures have been resolved, and their three-dimensional structural dynamics have been shown to be involved in the regulation of c-kit gene transcription, accelerating the development of c-kit G4-preferred ligands (Fig. 1.20b) [153–156]. Modulating such structural dynamics with small molecules is effective for suppressing gene expression and inducing an apoptotic effect.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig20_HTML.png
Fig. 1.20

a c-kit promoter has two PQSs, where several transcription factors are likely involved. b Solution structure of G4 from the proximal PQS in the vicinity of the promoter (PDB code: 2o3m)

1.2.2.6 hTERT

hTERT (human telomerase reverse transcriptase, TERT), which encodes the catalytic subunit of telomerase, has attracted considerable attention as a compelling biomedical target, particularly for cancers, since elevated TERT expression is observed in ~90% of human cancer cells, whereas it is silenced in most normal cells [157, 158]. Aberrantly expressed TERT accelerates telomerase activity to irregularly maintain the telomere length [159]. Beyond its canonical role in maintaining telomere length, TERT has been considered to suppress BCL2-dependent apoptosis [160], to regulate chromatin state [161, 162] and DNA damage responses [163], and to promote MYC- and Wnt-driven cellular proliferation [163, 164].

Mutations that were identified in >70% of melanomas partially account for the elevated level of TERT expression [165]. Recent studies demonstrated that C to T mutations in the sense strand (G to A mutations in the antisense strand) of the TERT promoter strongly activate transcription by creating a new consensus sequence for the binding of ETS/TCF (E-twenty six/ternary complex factor) [166]. Patients whose tumors express elevated levels of TERT show worse overall survival than those whose tumors express relatively lower levels [167]. These observations indicate that targeting the TERT promoter on the basis of these mutations might have a great impact on therapeutics covering a wide range of tumors.

1.2.2.7 KRAS

The RAS gene family, including HRAS, NRAS, and KRAS, was first discovered in human tumors as driver oncogenes, and its members have long been recognized as important therapeutic targets. Mutation of the KRAS gene is one of the most common oncogenic driver mutations in pancreatic, colorectal, and lung cancers and plays a role in acquiring and increasing drug resistance [168, 169]. Hence, direct targeting of active KRAS by small molecules has been considered a compelling strategy to combat KRAS-mutant tumors, yet it has so far met with little success. Recently, our group developed a novel approach that directly targets the mutant DNA using an alkylating pyrrole–imidazole polyamide (PIP) molecule, which is capable of selectively alkylating oncogenic codon 12 mutant DNA, causing strand cleavage and consequent tumor growth suppression in a mouse tumor xenograft model [26].

G4-mediated promoter targeting has also been reported. The NHE in the KRAS proximal promoter is highly abundant in G-rich sequences, and several transcription factors interact with a G4 structure formed in this region [86, 87, 170–172]. A polypurine G-rich element located approximately −300 to −100 nucleotides upstream of the exon 0/intron 1 boundary in the murine or human genome was likely to be a component of the promoter activity and the PQS [86, 87, 170–174]. Importantly, pyrene-modified oligonucleotides that were devised to adopt a more stable form of the KRAS G4 were able to attract the transcription factors essential for transcription and to exhibit strong antiproliferative activity through a G4-decoy effect in pancreatic cancer cells [175].

1.2.2.8 c-myb

c-myb is expressed mainly at an early stage of hematopoietic cell differentiation, and its expression gradually decreases toward the end of differentiation [176]. It encodes a transcription factor that plays a critical role in the proliferation, differentiation, and survival of hematopoietic progenitor cells. c-myb was identified following the discovery of the v-myb oncogene in avian myeloblastosis virus and E26 [177]. It is also recognized as a proto-oncogene whose high expression promotes the development of hematologic cancers and adenocarcinomas through a mechanism based on its canonical proliferative property [178–182].

The regulation of c-myb expression at the transcriptional level relies on multiple activating and repressing transcription factors in a cell-type-dependent fashion [183–188]. Notably, a region in the promoter with three (GGA)4 triplet repeats, beginning 17 nucleotides downstream of the transcription initiation site on the antisense strand, was implicated in promoter activity through the formation of thermally very stable higher-order parallel G4 structures [189–191]. Partial deletion of the (GGA)4 triplet repeats, which prevents formation of the dimerized G4, enhances the promoter activity, suggesting that the G4 structures formed jointly by the three (GGA)4 triplet repeats function as a negative regulator of c-myb promoter activity [189]. Additionally, the MAZ protein may bind to the c-myb G4 structure and negatively regulate promoter activity.

Recently, the group led by Yuan performed a reporter assay to examine extensively how folded and unfolded G4s in the c-myb promoter affect gene expression [192]. In this system, four PQSs in the c-myb promoter were selected as potential G4-forming elements, and the involvement of the respective G4s in promoter activity was measured using a set of promoter-containing plasmids in which mutations were introduced to prevent G4 formation in the respective PQSs (Fig. 1.21a). The promoter activity of the PQS1-mutated plasmid was markedly reduced, whereas the PQS1, 2-, and 3-mutated plasmid exhibited no significant changes in promoter activity. These data strongly implied that transcriptional regulation at those G-rich sequences was mediated to a considerable extent by the formation of G4 structures on the PQS1 element, where the binding of a transcriptional suppressor was likely impeded. The newly discovered c-myb G4-interacting ligand topotecan specifically reduced the transcription level in the wild-type plasmid without affecting that of the PQS1-mutated plasmid (Fig. 1.21b). This downregulating effect was confirmed for endogenous c-myb gene expression at the protein level.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig21_HTML.png
Fig. 1.21

a c-myb promoter has multiple PQSs. b Chemical structure of topotecan that efficiently represses the MYB protein expression

Turning more specifically to human diseases, the c-myb proto-oncogene has been identified as a target in glioma stem cells (GSCs) for glioblastoma multiforme (GBM) therapy, as its expression was considerably elevated in GBM tissues relative to normal tissues [193]. Interestingly, telomestatin, a global G4-interacting ligand, impairs the maintenance of the GSC stem cell state through an apoptotic pathway, largely by reducing c-myb expression in vitro and in vivo. Although the direct interplay between telomestatin and c-myb G4s in the promoter has not been examined, these observations raise the possibility that direct targeting of c-myb G4 DNA might be a compelling therapeutic approach to GBM treatment.

1.2.2.9 Others (PDGFR-β, PDGF-A, STAT3, FGFR2)

Other G4s formed in putative regulatory elements in the promoters of cancer-related genes have been reported and proposed as targets for G4-interacting ligands (in the promoters of the genes for PDGFR-β [194], PDGF-A [195], STAT3 [196], and FGFR2 [197]). For instance, GSA11129, which interacts with a G4 in the PDGFR-β gene promoter and shifts the equilibrium toward the G4 species, was shown to reduce the transcription level and to inhibit PDGF-β-driven cell proliferation and migration [194]. The G-rich element of the proximal promoter of the PDGF-A gene also forms a stable G4 structure even in the duplex context, and TMPyP4 reduced the basal promoter activity of PDGF-A, suggesting that targeting the PDGF-A G4 with a ligand specific for this G4 may be feasible as cancer therapy for gliomas, sarcomas, and astrocytomas [195, 198–202].

1.2.3 G-Quadruplex-Interacting Ligands

Extensive studies of G4s from many perspectives have led researchers to believe that G4s can form in guanine-rich regions of the human genome and are biologically and pharmaceutically important. In this context, numerous researchers have made tremendous efforts to obtain highly active G4 ligands, and some of them have achieved great success in developing drugs that are active in vivo [203]. However, these drugs are still only midway toward approval for clinical use.

One conceivable obstacle to the clinical application of G4-interacting molecules is selectivity, although global or multiple G4 targeting approaches may in some cases be effective [204–207]. As mentioned earlier, approximately 10,000 G4 structures exist in human chromatin [97]. A growing number of G4-driven genes have also been reported, underscoring the importance of an expanded variety of G4-interacting ligands with differential binding profiles [208, 209]. However, poor ligand designability, originating from the topological similarity of the skeletons of diverse G4s, has remained a bottleneck for gaining specificity toward individual G4s. Very recently, researchers have entered a new phase in the development of next-generation G4-interacting ligands, in which selectivity for a particular target G4 is taken into account; this should lead not only to highly bioactive antitumor molecules with minimized side effects, but also to chemical biology tools for the detailed investigation of the functions of individual G4s in the genome [209]. In the next section, we address the recent progress on G4-interacting molecules that can discriminate particular G4 structures from the others.

1.2.4 Addressing the Specificity of Ligands to Particular G-Quadruplexes

1.2.4.1 Global G-Quadruplex-Selective Ligands

Since G4-interacting molecules were developed on the basis of duplex DNA-binding molecules, researchers initially sought to develop G4 ligands with clear selectivity for G4 structures over duplex DNA [210, 211]. 2,6-Diamidoanthraquinone derivatives, which interact with telomere G4s, were first found to act as telomerase inhibitors by the group of Neidle and Hurley, as discussed before. The cationic porphyrin TMPyP4 was also identified as a G4 binder, whose planar skeleton and cationic character are favorable for G4 binding [212]. Moreover, several commercially available G4 ligands, such as BRACO19 [213], Pyridostatin [214], Phen-DC3 [215], L2H2-6OTD [216], and L1H1-7OTD [217], which have negligible binding affinities for duplex DNA, are indispensable for biochemical, biophysical, and chemical biology studies on G4s.

1.2.4.2 Flat-Shaped Compounds that Were Originally Developed in Different Fields

Flat-shaped compounds that were originally developed in different fields are often rediscovered as G4 ligands because of their planar geometry and availability. Some of these compounds possess an inherent preference for certain G4 topologies. For example, NMM IX prefers to bind to a hybrid or parallel topology [218–220], whereas crystal violet can discern an antiparallel topology (Fig. 1.22a, b) [221].
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig22_HTML.png
Fig. 1.22

DNA G4 ligands with a preference toward particular topologies or G4s. a, b Studies in the field of G4s shed light on NMM IX and crystal violet as topology-preferred ligands. c–m Synthetic ligands likely to interact with loops and grooves that offer distinct environments as scaffolds for specific molecular recognition

1.2.4.3 Loops and Grooves that Offer Distinct Environments for Specific Molecular Recognition

In the past three decades, a series of intensive studies using NMR techniques and X-ray crystallography for the atomic-level elucidation of a library of G4 structures has facilitated and rationalized the design of G4 ligands that exhibit specificity between different G4s. One approach to gaining specificity among many types of G4s without reducing the binding affinity is to exploit loops and grooves, which offer distinct environments for specific molecular recognition. For instance, the three types of telomere G4 structures share core G-tetrad layers but differ in the positions of their loops (Fig. 1.15b). Based on this principle, several successful attempts were made to achieve preference toward a particular G4 over other quadruplexes. CPT2 can visually discern antiparallel G4s from parallel ones by fitting into the shape of the groove (Fig. 1.22c) [222]. ThT-HE, a thioflavin T analogue modified by the addition of a hydroxyethyl group at the N3 position of the benzothiazole ring, exhibits a clear preference toward a c-myc parallel G4 relative to other parallel structures (c-kit DNA, c-src DNA, and NRAS RNA) in a sodium-dominant buffer using fluorescence detection (Fig. 1.22d) [223]. Acridine–peptide conjugates were developed to discriminate between distinct G4s. Two peptide sequences (with substituents at different sites to make distinct contacts with the loops and grooves) were attached to an acridine core moiety that targets the planar surface of a G-tetrad. SPR-binding assays showed that compounds 10, 14, 19, and 21 could distinguish specific G4s (Fig. 1.22e) with very high binding affinities (KD = 4–25 × 10⁻⁹ M) [224]. Molecular modeling suggested that the spatial allowance of the rectangular acridine moiety upon occupying the wider square shape of a G-tetrad would facilitate the correct positioning of substituents and their distinct interaction with the loops and grooves.
GQR was identified among several BODIPY derivatives as a light-up probe with a preference for a particular parallel G4 (93del: 5′-GGGGTGGGAGGAGGGT-3′ over c-myc and c-kit parallel G4s) (Fig. 1.22f) [225]. NDI 3 was developed as a ligand with specificity for a c-kit G4, in which a planar naphthalenediimide core was functionalized with two lysines with Boc-protected side chains (Fig. 1.22g) [226]. The preference in this interaction possibly relies on specific contacts with the loops or grooves. Phen-Et, a phenanthroline-bisbenzimidazole carboxyamide molecule, shows a preference for c-myc and c-kit parallel quadruplexes over any topology of telomere G4s (parallel, antiparallel, hybrid, or higher-ordered topologies), albeit with a moderate binding affinity (KD ~ 1.6 × 10⁻⁵ M) (Fig. 1.22h) [153]. Computer-aided modeling studies underscored the significance of the optimal projection of N,N-dimethylaminoethyl side chains at the N position of the benzimidazole moiety for recognizing the propeller loops of promoter G4s. A guanosine moiety can be used for specific recognition when attached to a dansyl moiety to yield DDG, in which two azide-labeled dinucleosides are linked across a dansyl dialkyneamide through click chemistry (Fig. 1.22i) [227]. This conjugate is capable of specifically recognizing a c-myc parallel G4 over a c-kit parallel one. TOxaPy, a crescent-shaped molecule made up of alternating pyrimidine and oxazole rings, binds preferentially to a telomere G4 with antiparallel topology over one with parallel topology, with a high binding affinity (KD = 2 × 10⁻⁷ M) (Fig. 1.22j) [228]. Specific groove binding of TOxaPy to the antiparallel topology has been predicted by a docking analysis, but this has not been confirmed. The last three molecules, BTC-f [229], TH3 [230], and IZCZ-3 [231], are worth describing because they have been shown to reduce off-target effects in biological experiments (Fig. 1.22k–m).
Therefore, their in vitro preferences are supported by biological evidence.
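The dissociation constants quoted above span about four orders of magnitude, and it can help to place them on a common free-energy scale. The sketch below applies the standard thermodynamic relation ΔG° = RT ln(KD), with KD expressed relative to a 1 M standard state. The temperature of 298.15 K is an assumption (the measurement temperatures are not given in the text), and the function name `binding_free_energy` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M.

    Uses dG = R * T * ln(KD / 1 M); the default temperature is an assumption.
    """
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


# KD values quoted in the text (in M).
quoted_kd = {
    "acridine-peptide conjugates (lower bound)": 4e-9,
    "acridine-peptide conjugates (upper bound)": 25e-9,
    "TOxaPy": 2e-7,
    "Phen-Et": 1.6e-5,
}

for name, kd in quoted_kd.items():
    print(f"{name}: {binding_free_energy(kd):.1f} kcal/mol")
```

Under these assumptions the nanomolar acridine–peptide conjugates correspond to roughly −10 to −11.5 kcal/mol and Phen-Et to about −6.5 kcal/mol, which makes the description of the latter as a moderate binder more tangible. The comparison is indicative only, because the values come from different assays and buffer conditions.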

1.2.4.4 Template-Guided Component Assembly and Linkage Through Click Chemistry

Template-guided component assembly and linkage through click chemistry give highly effective ligands, usually resulting in a gain of specificity. The ligand is constructed in situ on the basis of the close proximity and appropriate direction of the functionalized components upon specific binding to a certain target. Using a telomere G4 as a template, pyridostatin (PDS)-based adduct 10 was found to be a more potent TRF1 competitor in cellular experiments than PDS, which is a general G4 binder; however, the binding affinity was slightly reduced (Fig. 1.23) [232]. Using the same strategy, an RNA G4-targeting ligand, carboxyPDS, was identified (to be discussed in a later section). This system is clearly applicable to other types of G4 targets.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig23_HTML.png
Fig. 1.23

Template-guided component assembly and linkage through click chemistry gave the highly effective ligand, pyridostatin-based adduct 10, which is a more potent TRF1 competitor than PDS

1.2.4.5 Targeting of Non-canonical Higher-Order G-Quadruplex Structures

Recently, non-canonical higher-order G4 structures have been highlighted because they allow the specific and precisely controlled targeting of designated G4s. The specific motifs in those structures can be exploited to gain specificity. Hurley's group developed a small molecule that is specific to the higher-order G4 structure observed in the hTERT promoter via dual-motif targeting of a mismatched duplex stem loop and its proximal G4 (GTC365, Fig. 1.24a) [233]. In parallel, Phan’s group extensively studied duplex stem-loop-containing G4 motifs using both bioinformatics and biophysical approaches [67, 234–236]. These G4/duplex motifs serve as a dual binding site that has been shown by NMR spectroscopy to be simultaneously targetable with two distinct molecules (Netropsin/Phen-DC3 or other G4 ligands) (Fig. 1.24b) [237]. Although this non-linked dimeric system is at a primitive stage for the specific targeting of duplex-containing G4 motifs, careful design and linkage of readout molecules such as PIP may allow the creation of highly specific hybrid molecules in the future.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig24_HTML.png
Fig. 1.24

a GTC365 is highly specific to the hTERT G4 containing a mismatched hairpin stem loop, which is recognized by a guanidine moiety. b Coaddition of netropsin and Phen-DC3 allows simultaneous recognition of the duplex and G4 segments, respectively, of duplex stem-loop-containing G4 motifs. N-methylpyrrole is highlighted in blue

1.2.4.6 Cell-Based Screening of G-Quadruplex Ligands

Cell-based screening of G4 ligands overcomes the discrepancy between in vitro and cellular outcomes that often arises in in vitro-based ligand discovery. Moreover, it can be applied to the discovery of potential highly specific ligands. Luciferase reporter assays performed on a 96-well plate using a human gastric carcinoma cell line (HGC-27) led to the discovery of two benzo[a]phenoxazine (BPO) derivatives as potent c-kit G4 ligands (Fig. 1.25a) [238]. Subsequent RT–qPCR and SPR-binding analyses confirmed that these two molecules acted as endogenous c-kit gene suppressors in the HGC-27 cell line, probably through binding to c-kit promoter G4s. Similarly, two quinazolone derivatives were identified that could downregulate c-kit expression at the protein level (Fig. 1.25b) [239]. Most recently, a striking PDS analog named PDC12 was discovered using a unique cell-based screening approach. It induces G4-dependent transcriptional reprogramming by stabilizing a single G4 located at the BU-1 locus (Fig. 1.25c) [240]. Interestingly, the local transcriptional reprogramming by PDC12 occurs in two stages, i.e., the loss of H3K4me3 and DNA cytosine methylation. It is noteworthy that the changes in the histone pattern were irreversible, even after the compound was removed. This was the first example of a small molecule that induces epigenetically heritable effects by targeting DNA secondary structures; thus, it represents a potentially new approach to epigenetic therapy.
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig25_HTML.png
Fig. 1.25

Hit compounds by cell-based screenings. a, b The c-kit-targeting ligands that were identified. The different parts of chemical structures are highlighted in purple, for clarity. c PDC12 was identified as the most potent candidate with the ability to induce G4-dependent transcriptional reprogramming by stabilizing a single G4 located at the BU-1 locus

1.2.4.7 Specific Targeting of Telomere G-Quadruplexes

As mentioned above, the human telomere region comprises a single microsatellite repeat sequence, (GGGTTA)n, with a 3′ overhang at its terminus (200 ± 75 nucleotides). A large variety of alternative higher-order structures derived from the canonical telomere G4 are thought to be adopted there, and the specific motifs in those structures offer a route to gaining specificity for telomere G4s using unique methodologies (Fig. 1.26a) [86–88]. Dimeric G4 ligands target dimeric G4s. Tandemly aligned G4 ligands permit the favorable discrimination of a dimeric G4 from a monomeric one. The dinickel salophen dimer [241], berberine dimer [242], and telomestatin derivative tetramer [243] are successful examples of such ligands (Fig. 1.26b). In contrast, the chiral helical supramolecule Ni-M exhibits a binding preference for dimers over monomers, with a 200-fold selectivity, probably because two consecutive G4s offer a preferred binding site (Fig. 1.26b) [244]. Conversely, the other enantiomer, Ni-P, is capable of specifically converting a monomeric antiparallel form to a monomeric hybrid form [245]. It is also interesting that, more recently, Ni-M was shown to exhibit binding affinity to a left-handed Z-G4 in an enantioselective manner [246]. The junction pocket between two G4 units also serves for specific recognition. It is possible that Helicene M1 enantioselectively recognizes the helicity of the junction cavity to some extent (Fig. 1.26b) [247]. IZNP1 was shown to be correctly positioned into that junction by molecular modeling and to exhibit a reduced binding affinity to TERRA multimeric RNA G4s [248]. Notably, this molecule caused telomeric DNA damage and telomere dysfunction, without affecting several well-studied oncogenes that have monomeric G4s in their promoters. DATPE specifically detects a dimeric G4 after insertion into the junction pocket (Fig. 1.26b) [249].
Furthermore, binding of m-TMPipEOPP to the side faces of dimeric G4s conferred a preference for multimeric G4s over the monomeric form under molecularly crowded conditions (Fig. 1.26b) [250]. TzPyBDo selectively discerns dimeric G4s, with a negligible binding affinity for monomers (Fig. 1.26b) [251]. The junction cavity located between a duplex and a G4 is also an attractive target with respect to specificity, because it provides an exceptionally targetable pocket. For example, the potential G4–duplex interface formed by telomeric repeats may be a unique target for molecular binding and specific interference with telomere-related functions. A docking analysis suggested that BSU6039 can be accommodated in the cavity by forming several hydrogen bonds (Fig. 1.26b) [110]. A dihydropyrimidin-4-one derivative was identified as a G-triplex ligand by virtual screening, but it also exhibited binding affinity for G4 structures (this will be discussed in a later section) [252]. A long loop presented by a monomeric G4 was also amenable to molecular recognition through hybridization of the complementary strand, as demonstrated by an atomic-level NMR analysis [65].
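To give a rough sense of why multimeric G4s are a plausible target at telomeres, the overhang length quoted above (200 ± 75 nucleotides) can be converted into an upper bound on the number of consecutive G4 units it could accommodate. The following is a minimal Python sketch and not a structural model. It assumes, for illustration only, that each intramolecular G4 unit uses four GGG tracts, that is, GGG(TTAGGG)3 = 21 nucleotides, and that neighboring units are joined by a single TTA linker; the function name and constants are hypothetical.

```python
# Illustrative assumptions (not taken from the cited studies):
# one G4 unit = GGG(TTAGGG)3 = 21 nt; adjacent units are joined by one TTA linker.
G4_UNIT_NT = 21
LINKER_NT = 3


def max_tandem_g4_units(overhang_nt: int) -> int:
    """Upper bound on the number of consecutive G4 units that fit in an overhang."""
    if overhang_nt < G4_UNIT_NT:
        return 0
    # n units occupy n * G4_UNIT_NT + (n - 1) * LINKER_NT nucleotides,
    # so n <= (overhang_nt + LINKER_NT) / (G4_UNIT_NT + LINKER_NT).
    return (overhang_nt + LINKER_NT) // (G4_UNIT_NT + LINKER_NT)


if __name__ == "__main__":
    # 200 +/- 75 nt: lower bound, central value, and upper bound of the overhang.
    for length in (125, 200, 275):
        print(length, max_tandem_g4_units(length))
```

Under these assumptions, an overhang in the quoted range could in principle host roughly 5 to 11 consecutive G4 units, which is consistent with dimeric and higher-order G4s being relevant binding sites. The actual folding state in cells will depend on factors that this arithmetic ignores.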
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig26_HTML.png
Fig. 1.26

Specific telomere G4 targeting by ligands. a Telomere G-stretch sequences potentially adopt non-orthodox G4s that offer specific binding motifs. b Several telomere G4-preferred binders based on the specific-motif recognition

1.2.4.8 RNA G-Quadruplex-Interactive Molecules

RNA appears to be more likely to form a G4 structure because of its single-stranded state, flexibility, susceptibility to modifications, and individual distinct movements inside cells. In fact, several RNA G4s that exhibit specialized functions have been reported [253–256]. In terms of RNA G4 ligand design, achieving specificity for RNA G4s over DNA G4s is generally difficult because of their structural similarity and the unavailability of defined structures at the atomic level. However, some methodologies have been developed recently. CarboxyPDS was successfully identified in a small screening based on template-guided component assembly and linkage through click chemistry, as mentioned before (Fig. 1.27a) [232]. This ligand strongly stabilized a TERRA RNA G4 (ΔTm = 20.7 °C), and the stabilization was not affected by the addition of up to 100 equivalents of a telomere DNA G4 competitor. It is worth mentioning that carboxyPDS has been successfully used for the selective stabilization of endogenous RNA G4s in cells [257]. RGB-1 was identified as a highly specific RNA G4 ligand in a chemical screening that relied on high-throughput assessments of the reverse-transcribed products of a G4-containing RNA template in the presence of chemicals (Fig. 1.27b) [258]. RGB-1 was demonstrated to cause RNA G4-mediated suppression of NRAS mRNA translation in breast cancer cells. Cell-based screening of G4 ligands, as mentioned previously, was also effective in this case. QUMA 1, which selectively stained RNA in HeLa cells, was identified as a hit in an enzyme-digestion-based screening of an in-house compound library (Fig. 1.27c) [259]. The hit compound exhibited the desired properties, and more, allowing the visualization of RNA G4 dynamics in live cells.
ISCH-nras was uniquely developed for the detection of a particular RNA G4 (NRAS), based on the hybridization of a tail RNA sequence adjacent to a G-rich sequence by a DNA molecule connected to a quadruplex-triggered fluorescent probe (Fig. 1.27d) [260].
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig27_HTML.png
Fig. 1.27

a–d RNA G4-targeting ligands. d It is noteworthy that ISCH-nras1 was able to selectively target and detect an NRAS RNA G4 in a cellular context

1.2.4.9 Specific Localization for the Selective Targeting of Particular G-Quadruplexes; the Mitochondrial G-Quadruplex

Very recent studies showed that putative G4-forming sequences are present in mitochondrial DNA, and the formation of G4s there is thought to have biological functions [261]. Furthermore, the finding that mitochondrial transcription factor A (TFAM) displays a binding affinity for G4 structures has attracted great interest [262]. Hence, the mitochondrial G4 is of increasing importance, and the identification of ligands that specifically target it is crucial for understanding its detailed biological functions. ZnPc1 was shown to localize in mitochondria, and photodynamic treatment (PDT) using this molecule induced the production of reactive oxygen species, a collapse of the mitochondrial membrane potential, and chromatin condensation, eventually leading to apoptosis (Fig. 1.28a) [263]. TP2Py was also shown to colocalize strongly with mitochondria and to be a potent agent for cancer chemotherapy/radiotherapy (Fig. 1.28b) [264]. The planar geometry of the core of these two ligands suggests that they may bind to G4s, although the original reports did not mention this. Other mitochondria-targeting planar compounds may also be involved in such G4 recognition [265, 266]. Their detailed mechanisms of action inside mitochondria await analysis.
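Putative G4-forming sequences of the kind mentioned above are commonly located by a simple pattern search. The following is a minimal Python sketch and not the method used in the cited studies. It assumes the widely used consensus motif of four runs of at least three guanines separated by loops of 1 to 7 nucleotides, and it scans both strands by also searching the reverse complement. The function names are hypothetical, and the input shown is a toy telomere-like sequence rather than real mitochondrial DNA.

```python
import re

# Assumed consensus motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (loop limits are a convention).
G4_MOTIF = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_putative_g4(seq: str) -> list:
    """Return (strand, start, match) for each non-overlapping motif hit.

    For the "-" strand, start is given in reverse-complement coordinates.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in G4_MOTIF.finditer(s):
            hits.append((strand, m.start(), m.group()))
    return hits


if __name__ == "__main__":
    toy = "ATGGGTTAGGGTTAGGGTTAGGGAT"  # toy input, not a mitochondrial sequence
    for hit in find_putative_g4(toy):
        print(hit)
```

A match only indicates that a sequence is compatible with G4 folding; whether a G4 actually forms in mitochondria has to be established experimentally.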
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig28_HTML.png
Fig. 1.28

Representative mitochondria-localized compounds that are likely to bind to G4 structures; a ZnPc1, b TP2Py

1.2.4.10 Alternative Nucleic Acid Form as a Biomedical Target, G-Triplex

The G-triplex was initially regarded as a transient DNA form and a possible intermediate in the G4 folding process. However, a growing body of literature suggests that such a structure forms stably under physiological conditions [109, 267–269]. Given its potential biological significance, small molecules targeting G-triplexes are attracting increasing attention. Acridone–PNA conjugates highlight dual-site targeting by a planar acridone moiety appended with a Gly-GGG-Lys PNA sequence (Fig. 1.29a) [270]. The PNA moiety associated with one guanine of three G-tetrads to form a hybrid PNA + DNA G4. This ligand is thought to prefer a G-rich sequence in a single-stranded context over a prefolded G4; thus, it might be especially useful for targeting G-triplex structures that arise during such cellular dynamics. A dihydropyrimidin-4-one derivative was identified from the Mcule chemical database by simple docking programs as a ligand for both G-triplex and G4 structures (Fig. 1.29b) [252].
../images/500906_1_En_1_Chapter/500906_1_En_1_Fig29_HTML.png
Fig. 1.29

a–c G-triplex-targeting ligands. c A platform for their evaluation constructed by DNA origami

Our group has devised a nanoplatform constructed by DNA origami for studying G4 folding intermediates such as the G-triplex and G-hairpin, and found that PDC, a well-known G4-interacting ligand, unexpectedly recognized the G-triplex and G-hairpin structures (Fig. 1.29c) [271]. Considering this, the ability to recognize G4 intermediates might be an essential component of a ligand's high binding affinity, its selectivity, or its ability to induce G4 structures from stable duplex or single-stranded DNA. The platform thus provides a means of assessing G4-binding properties of a ligand that could not previously be examined.

1.3 Conclusion and Future Prospects

I have discussed expandable trinucleotide repeats and G-quadruplexes (G4s) as molecular targets of synthetic ligands, with a view to creating potential drugs or chemical tools. From a therapeutic standpoint, the G4 has only relatively recently come to be considered a potential biomedical target, particularly for the therapy of tumors and neurological diseases, and a considerable body of evidence has accumulated showing that G4-interacting drugs exhibit good antitumor activities. However, the practical outcomes remain limited. As is also the case for protein-targeting drugs, G4-interacting drugs have displayed low selectivity for the targeted G4 structure, mainly because the different G4 forms prevalent in the genome share a similar skeleton. In this chapter, I have introduced G4-interacting ligands that were devised to gain selectivity for a particular G4 structure. The selectivity issue remains incompletely solved but, if it were solved, this would substantially impact cancer therapy. In addition, the G4-driven oncogenes introduced here are known to be generally well correlated and to concertedly influence tumorigenesis, tumor growth, and malignant transition [160, 163, 183, 272, 273]. Although this relationship is not fully elucidated, combinatorial approaches may be a good option for further therapeutic advancements [273].

Collectively, the abovementioned non-canonical DNA conformations have profound implications for various biological, neurological, and pharmacological events, primarily in relation to human diseases. The subsequent chapters present my Ph.D. study on the development of DNA-sequence-selective and DNA-form-selective ligands, aimed at elucidating the function of non-canonical DNA structures relevant to human diseases.