© Sammie Bae 2019
Sammie BaeJavaScript Data Structures and Algorithmshttps://doi.org/10.1007/978-1-4842-3988-9_4

4. JavaScript Strings

Sammie Bae1 
(1)
Hamilton, ON, Canada
 

This chapter will focus on working with strings, the JavaScript String object, and the String object’s built-in functions. You will learn how to access, compare, decompose, and search strings for commonly used real-life purposes. In addition, the chapter will explore string encoding, decoding, encryption, and decryption. By the end of this chapter, you will understand how to effectively work with JavaScript strings and have a fundamental understanding of string encoding and encryption.

JavaScript String Primitive

JavaScript’s native String primitive comes with various common string functions.

String Access

For accessing characters, you use .chartAt().
1  'dog'.charAt(1); // returns "o"

.charAt(index) takes an index (which starts at 0) and returns the character at that index location in the string.

For string (multiple-character) access, you can use .substring(startIndex, endIndex), which will return the characters between the specified indices.
1  'YouTube'.substring(1,2); // returns 'o'
2  YouTube'.substring(3,7); // returns 'tube'
If you do not pass a second parameter (endIndex), it will return all the character values from the specified start position until the end.
1  return 'YouTube'.substring(1); // returns 'outube'

String Comparison

Most programming languages have a function that allows you to compare strings. In JavaScript, this can be done simply by using less-than and greater-than operators.
1  var a = 'a';
2  var b = 'b';
3  console.log(a < b); // prints 'true'

This can be really useful for comparing strings when sorting algorithms, which is covered later in the book.

However, if you are comparing two strings of different lengths, it starts comparing from the start of the string until the length of the smaller string.
1  var a = 'add';
2  var b = 'b';
3
4  console.log(a < b); // prints 'true'
In this example, a and b are compared. Since a is smaller than b, a < b evaluates to true.
1  var a = 'add';
2  var b = 'ab';
3  console.log(a < b); // prints 'false'
In this example, after 'a' and 'b' are compared, 'd' and 'b' are compared. Processing cannot continue because everything in 'ab' has been looked at. This is the same as comparing 'ad' with 'ab'.
1  console.log('add'<'ab' == 'ad'<'ab'); // prints 'true'

String Search

To find a specific string within a string, you can use .indexOf(searchValue[, fromIndex]). This takes a parameter that is the string to be searched as well as an optional parameter for the starting index for the search. It returns the position of the matching string, but if nothing is found, then -1 is returned. Note that this function is case sensitive.
1  'Red Dragon'.indexOf('Red');    // returns 0
2  'Red Dragon'.indexOf('RedScale'); // returns -1
3  'Red Dragon'.indexOf('Dragon', 0); // returns 4
4  'Red Dragon'.indexOf('Dragon', 4); // returns 4
5  'Red Dragon'.indexOf(", 9);    // returns 9
To check for the occurrence of a search string inside a larger string, simply check whether -1 was returned from .indexOf.
1  function existsInString (stringValue, search) {
2          return stringValue.indexOf(search) !== -1;
3  }
4  console.log(existsInString('red','r')); // prints 'true';
5  console.log(existsInString('red','b')); // prints 'false';
You can use an additional parameter to search after a certain index in a string. An example is counting occurrences of certain letters . In the following example, the occurrences of the character 'a' will be counted:
1  var str         = "He's my king from this day until his last day";
2  var count       = 0;
3  var pos         = str.indexOf('a');
4  while (pos !== -1) {
5    count++;
6    pos = str.indexOf('a', pos + 1);
7  }
8  console.log(count); // prints '3'
Finally, startsWith returns true (boolean) if the string starts with the specified input, and endsWith checks whether the string ends with the specified input.
1  'Red Dragon'.startsWith('Red'); // returns true
2  'Red Dragon'.endsWith('Dragon'); // returns true
3  'Red Dragon'.startsWith('Dragon'); // returns false
4  'Red Dragon'.endsWith('Red'); // returns false

String Decomposition

For decomposing a string into parts, you can use .split(separator), which is a great utility function. It takes one parameter (the separator) and creates an array of substrings.
1  var test1 = 'chicken,noodle,soup,broth';
2  test1.split(","); // ["chicken", "noodle", "soup", "broth"]
Passing an empty separator will create an array of all the characters.
1  var test1 = 'chicken';
2  test1.split(""); // ["c", "h", "i", "c", "k", "e", "n"]

This is useful for when there are items listed in a string. The string can be turned into an array to easily iterate through them.

String Replace

.replace(string, replaceString) replaces a specified string within a string variable with another string.
1  "Wizard of Oz".replace("Wizard","Witch"); // "Witch of Oz"

Regular Expressions

Regular expressions (regexes) are a set of characters that define a search pattern. Learning how to use regexes is a massive task of its own, but as a JavaScript developer, it is important you know the basics of regexes.

JavaScript also comes with the native object RegExp, which is used for regular expressions.

The constructor for the RegExp object takes two parameters: the regular expression and the optional match settings, as shown here:
i      Perform case-insensitive matching
g      Perform a global match (find all matches rather than stopping after first match)
m      Perform multiline matching
RegExp has the following two functions:
  • search(): Tests for matches in a string. This returns the index of the match.

  • match(): Tests for matches. This returns all the matches.

The JavaScript String object also has the following two regex-related functions that accept the RegExp object as an argument:
  • exec(): Tests for matches in a string. This returns the first match.

  • test(): Tests for matches in a string. This returns true or false.

Basic Regex

Here are the basic regex rules:

^: Indicates the start of a string/line

\d: Finds any digit

[abc]: Finds any character between the brackets

[^abc]: Finds any character not between the brackets

[0-9]: Finds any digit between the brackets

[^0-9]: Finds any digit not between the brackets

(x|y): Finds any of the alternatives specified

The following returns index 11, which is the index of the character D, which is the first character of the matched regex:
1  var str = "JavaScript DataStructures";
2  var n = str.search(/DataStructures/);
3  console.log(n); // prints '11'

Commonly Used Regexes

Regexes are immensely helpful for checking the validity of user input in JavaScript. One common type of input check is to validate whether it has any numeric characters.

The following are five regexes that developers often use.

Any Numeric Characters

   /\d+/
1  var reg = /\d+/;
2  reg.test("123"); // true
3  reg.test("33asd"); // true
4  reg.test("5asdasd"); // true
5  reg.test("asdasd"); // false

Only Numeric Characters

  /^\d+$/
1  var reg = /^\d+$/;
2  reg.test("123"); // true
3  reg.test("123a"); // false
4  reg.test("a"); // false

Floating Numeric Characters

   /^[0-9]*.[0-9]*[1-9]+$/
1  var reg = /^[0-9]*.[0-9]*[1-9]+$/;
2  reg.test("12"); // false
3  reg.test("12.2"); // true

Only Alphanumeric Characters

   /[a-zA-Z0-9]​/
1  var reg = /[a-zA-Z0-9]/;
2  reg.test("somethingELSE"); // true
3  reg.test("hello"); // true
4  reg.test("112a"); // true
5  reg.test("112"); // true
6  reg.test("^"); // false

Query String

/([^?=&]+)(=([^&]*))/

In web applications, web URLs often pass in parameters in the URL for routing or database query purposes.

For example, for the URL http://your.domain/product.aspx?category=4&product_id=2140&query=lcd+tv , the URL might respond to a back-end SQL query like the following:
1  SELECT LCD, TV FROM database WHERE Category = 4 AND Product_id=2140;
To parse these parameters, regexes can be useful.
1 var  uri = 'http://your.domain/product.aspx?category=4&product_id=2140&query=lcd+tv' ;
2 var  queryString =  {};
3  uri.replace(
4                  new RegExp ("([^?=&]+)(=([^&]*))?" , "g" ),
5                  function ($0, $1, $2, $3) { queryString[$1] =  $3; }
6  );
7  console.log('ID: ' +  queryString['product_id' ]); // ID: 2140
8  console.log('Name: ' +  queryString['product_name' ]); // Name: undefined
9  console.log('Category: ' +  queryString['category' ]); // Category: 4

Encoding

Encoding is a general concept in computer science that represents characters in a specialized format for efficient transmission or storage.

All computer file types are encoded in specific structures.

For example, when you upload a PDF, the encoding may look like this:
 1  JVBERi0xLjMKMSAwIG9iago8PCAvVHlwZSAvQ2F0YWxvZwovT3V0bGluZXMgMiAwIFIKL1BhZ2VzIDMgMCBS\
 2  ID4+CmVuZG9iagoyIDAgb2JqCjw8IC9UeXBlIC9PdXRsaW5lcyAvQ291bnQgMCA+PgplbmRvYmoKMyAwIG9i\
 3  ago8PCAvVHlwZSAvUGFnZXMKL0tpZHMgWzYgMCBSCl0KL0NvdW50IDEKL1Jlc291cmNlcyA8PAovUHJvY1Nl\
 4  dCA0IDAgUgovRm9udCA8PCAKL0YxIDggMCBSCj4+Cj4+Ci9NZWRpYUJveCBbMC4wMDAgMC4wMDAgNjEyLjAw\
 5  MCA3OTIuMDAwXQogPj4KZW5kb2JqCjQgMCBvYmoKWy9QREYgL1RleHQgXQplbmRvYmoKNSAwIG9iago8PAov\
 6  Q3JlYXRvciAoRE9NUERGKQovQ3JlYXRpb25EYXRlIChEOjIwMTUwNzIwMTMzMzIzKzAyJzAwJykKL01vZERh\
 7  dGUgKEQ6MjAxNTA3MjAxMzMzMjMrMDInMDAnKQo+PgplbmRvYmoKNiAwIG9iago8PCAvVHlwZSAvUGFnZQov\
 8  UGFyZW50IDMgMCBSCi9Db250ZW50cyA3IDAgUgo+PgplbmRvYmoKNyAwIG9iago8PCAvRmlsdGVyIC9GbGF0\
 9  ZURlY29kZQovTGVuZ3RoIDY2ID4+CnN0cmVhbQp4nOMy0DMwMFBAJovSuZxCFIxN9AwMzRTMDS31DCxNFUJS\
10  FPTdDBWMgKIKIWkKCtEaIanFJZqxCiFeCq4hAO4PD0MKZW5kc3RyZWFtCmVuZG9iago4IDAgb2JqCjw8IC9U\
11  eXBlIC9Gb250Ci9TdWJ0eXBlIC9UeXBlMQovTmFtZSAvRjEKL0Jhc2VGb250IC9UaW1lcy1Cb2xkCi9FbmNv\
12  ZGluZyAvV2luQW5zaUVuY29kaW5nCj4+CmVuZG9iagp4cmVmCjAgOQowMDAwMDAwMDAwIDY1NTM1IGYgCjAw\
13  MDAwMDAwMDggMDAwMDAgbiAKMDAwMDAwMDA3MyAwMDAwMCBuIAowMDAwMDAwMTE5IDAwMDAwIG4gCjAwMDAw\
14  MDAyNzMgMDAwMDAgbiAKMDAwMDAwMDMwMiAwMDAwMCBuIAowMDAwMDAwNDE2IDAwMDAwIG4gCjAwMDAwMDA0\
15  NzkgMDAwMDAgbiAKMDAwMDAwMDYxNiAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDkKL1Jvb3QgMSAwIFIK\
16  L0luZm8gNSAwIFIKPj4Kc3RhcnR4cmVmCjcyNQolJUVPRgo=.....

This is a Base64-encoded PDF string. Data like this is often passed to the server when a PDF file is uploaded.

Base64 Encoding

The btoa() function creates a Base64-encoded ASCII string from a string. Each character in the string is treated as a byte (8 bits: eight 0 and 1s).

The .atob() function decodes a string of data that has been encoded using Base64 encoding. For example, the string “hello I love learning to computer program” in a Base64-encoded string looks like this: aGVsbG8gSSBsb3ZlIGxlYXJuaW5nIHRvIGNvbXB1dGVyIHByb2dyYW0.
1  btoa('hello I love learning to computer program');
2  //  aGVsbG8gSSBsb3ZlIGxlYXJuaW5nIHRvIGNvbXB1dGVyIHByb2dyYW0
1  atob('aGVsbG8gSSBsb3ZlIGxlYXJuaW5nIHRvIGNvbXB1dGVyIHByb2dyYW0');
2  // hello I love learning to computer program

Learn more about Base64 at https://en.wikipedia.org/wiki/Base64 .

String Shortening

Have you ever wondered how URL-shortening sites such as Bit.​ly work? A simplified URL compression algorithm follows a certain structure, as shown here for www.google.com :
  1. 1.

    The database creates a unique integer-based ID for the URL. In Figure 4-1, www.google.com has the entry 11231230 in the database.

     
../images/465726_1_En_4_Chapter/465726_1_En_4_Fig1_HTML.png
Figure 4-1

Database entries

  1. 2.

    The integer ID is shortened into a string. When shortened with Base62 encoding, 11231230 will be VhU2.

     
../images/465726_1_En_4_Chapter/465726_1_En_4_Fig2_HTML.png
Figure 4-2

Database entries after shortening

For the shortening part, the following algorithm can be used. There are 62 possible letters and numbers, consisting of 26 lowercase letters, 26 uppercase letters, and 10 numbers (0 to 9).
1 var  DICTIONARY = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" .split(");
2
3 function  encodeId(num) {
4               var  base =  DICTIONARY.length;
5               var  encoded = " ;
6
7               if  (num === 0 ) {
8                    return  DICTIONARY[0 ];
9               }
10
11                     while  (num > 0 ) {
12                        encoded +=  DICTIONARY[(num %  base)];
13                        num = Math .floor(num /  base);
14          }
15
16              return  reverseWord(encoded);
17  }
18
19 function  reverseWord(str) {
20              var  reversed = "" ;
21              for  (var  i =  str.length - 1 ; i >= 0 ; i-- ) {
22                                reversed +=  str.charAt(i);
23          }
24          return  reversed;
25  }
26
27 function  decodeId(id) {
28              var  base =  DICTIONARY.length;
29              var  decoded = 0 ;
30
31              for  (var  index = 0 ; index <  id.split("" ).length; index++ ) {
32                          decoded =  decoded *  base + DICTIONARY.indexOf(id.charAt(index));
33              }
34
35              return  decoded;
36  }
37
38  console.log(encodeId(11231230 )); // prints 'VhU2'
39  console.log(decodeId('VhU2' )); // prints '11231230'

Encryption

Encryption is extremely important when protecting people’s information online. Have you ever seen the warning in Figure 4-3 in the Google Chrome browser?
../images/465726_1_En_4_Chapter/465726_1_En_4_Fig3_HTML.jpg
Figure 4-3

SSL warning

This likely means the web site you are trying to access does not have the proper Secure Sockets Layer (SSL) certificate.
../images/465726_1_En_4_Chapter/465726_1_En_4_Fig4_HTML.png
Figure 4-4

TSL process

TSL is a standard security technology for establishing an encrypted link between the server and the client (browser). The following are the simplified steps of the TSL process. In this process, asymmetric encryption is used for different keys for encryption and decryption by the server. The browser only uses symmetric encryption, which uses a single key to both encrypt and decrypt the data.
  1. 1.

    The server sends its asymmetric public key to the browser.

     
  2. 2.

    The browser creates a symmetric key for the current session, which is encrypted by the server’s asymmetric public key.

     
  3. 3.

    The server decrypts the browser’s session via its private key and retrieves the session key.

     
  4. 4.

    Both systems have the session key and will use this to transmit data securely.

     

This is secure because only the browser and the server know the session key. If the browser was to connect to the same server the next day, a new session key would be created.

The SSL warning message is a sign that the browser and server may not be encrypting the data on that connection.

The most commonly used public-key encryption algorithm is the RSA algorithm.

RSA Encryption

RSA is an encryption algorithm based on the difficulty of factoring large integers. In RSA, two large prime numbers and a supplementary value are generated as the public key. Anyone can use the public key to encrypt a message, but only those with the prime factors can decode the message.

There are three phases in the process: key generation, encryption, and decryption.
  • Key generation: The public key (shared) and private key (kept secret) are generated. The construction method of the keys generated should be also secret.

  • Encryption: The secret message can be encrypted via the public key.

  • Decryption: Only the private key can be used to decrypt the message.

Here’s an overview of the algorithm:
  1. 1.
    Select two (usually large) prime numbers, p and q.
    1. a.

      The product of p and q is denoted as n.

       
    2. b.

      The product of (p-1) and (q-1) is denoted as phi.

       
     
  2. 2.
    Choose two exponents, e and d.
    1. a.

      e is typically 3. Other values greater than 2 can be used.

       
    2. b.

      d is a value such that (e × d) % phi = 1.

       
     
Encryption process is as shown:
        m - message:
        m^e % n = c
        c - encrypted message
Decryption process is as shown:
        c^d % n = m
This is the implementation of calculating d:
 1  function modInverse(e, phi) {
 2      var m0 = phi, t, q;
 3      var x0 = 0, x1 = 1;
 4
 5      if (phi == 1)
 6        return 0;
 7
 8      while (e > 1) {
 9          // q is quotient
10          q = Math.floor(e / phi);
11
12          t = phi;
13
14          // phi is remainder now, process same as
15          // Euclid's algo
16          phi = e % phi, e = t;
17
18          t = x0;
19
20          x0 = x1 - q * x0;
21
22          x1 = t;
23      }
24
25      // Make x1 positive
26      if (x1 < 0)
27         x1 += m0;
28
29      return x1;
30  }
31  modInverse(7,40); // 23
Key pairs of a public key and a private key also need to be generated.
 1  function RSAKeyPair(p, q) {
 2          // Need to check that they are primes
 3          if (! (isPrime(p) && isPrime(q)))
 4                  return;
 5
 6          // Need to check that they're not the same
 7          if (p==q)
 8                  return;
 9
10          var n = p * q,
11                  phi = (p-1)*(q-1),
12                  e = 3,
13                  d = modInverse(e,phi);
14
15          // Public key: [e,n], Private key: [d,n]
16          return [[e,n], [d,n]]
17  }
Let’s pick 5 and 11 as the primes and see an example where message is 50.
1  RSAKeyPair(5,11); //Public key: [3,55], Private key: [27,55]
   p = 5, 11
   n = p x q = 55
   phi = (5-1) x (11-1) = 4 x 10 = 40
   e = 3
   (e x d) % phi = 1 (3 x d) % 40 = 1
   (81) % 40 = 1. 81 = 3 x d = 3 x 27
   d = 27
   Encryption:
           m - message: 50
           m^e % n = c
           50^3 % 55 = 40
   Encrypted message.,c:
           40
Decryption:
c^d % n = m
40^27 % 55 = 50
This fully encrypts 50, and the receiver can decrypt that back to 50. Typically the prime numbers chosen are very large for the RSA algorithm. This is because the prime factorization of large numbers takes a long time to compute. Today’s standard is to use a 4,096-bit prime product. Computing its prime factors would take years even for advanced computers to compute. Figure 4-5 shows the largest possible value for a 4,096-bit number.
../images/465726_1_En_4_Chapter/465726_1_En_4_Fig5_HTML.jpg
Figure 4-5

24096

Summary

Various natively implemented string functions were covered in this chapter and are summarized in Table 4-1.
Table 4-1

String Function Summary

Function

Usage

charAt(index)

Accesses a single character at index

substring(startIndex, endIndex)

Accesses part of string from startIndex to endIndex

str1 > str2

Returns true if str1 is lexicographically bigger than str2

indexOf(str, startIndex)

Index of the desired str starting at startIndex

str.split(delimiter)

Breaks a string into an array with the specified delimiter

str.replace(original,new)

Replaces original with new

In addition, a JavaScript native Regex object can be used for commonly used string validation. Table 4-2 provides a summary.
Table 4-2

Regex Summary

Regex Pattern

Usage

 /\d+/

Any numeric characters

/^\d+$/

Only numeric characters

/^[0-9]*.[0-9]*[1-9]+$/

Float numeric characters

/[a-zA-Z0-9]​/

Only alphanumeric characters