Combining and separating strings

Strings can be combined (joined or concatenated) and separated (split) quite easily. Tokenization is the process of splitting something up into individual tokens; in this case, a sentence is split into individual words. When a web page is parsed by a browser, the HTML, JavaScript, and any other code in the page are tokenized, and each token is identified as a keyword, operator, variable, and so on. The browser then uses this information to display the web page correctly, or at least as well as it can.

Python does much the same thing. The Python interpreter tokenizes the source code and distinguishes the parts that belong to the programming language itself from the parts that are data. The individual tokens are separated by delimiters, which are characters that mark where one token ends and the next begins. If you import data into Excel or another spreadsheet program, you will be asked what it should use as a delimiter: a comma, tab, space, and so on. Python does the same thing when it reads the source code.
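As a rough illustration of what tokens look like, the standard library's tokenize module can list the tokens in a line of Python source. The sample line and the variable names below are made up for this example, and the sketch is only meant to show the idea of tokens, not how the interpreter is implemented internally:

```python
import io
import tokenize

# A made-up line of source code to tokenize.
source = "total = price + 2\n"

# generate_tokens() expects a readline-style callable that returns str lines.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# Print each token's kind (NAME, OP, NUMBER, ...) next to its text.
for kind, text in tokens:
    print(kind, repr(text))
```

Here the spaces between the names and operators act as delimiters: they separate the tokens but do not appear as tokens themselves.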

In strings, the main delimiter is a whitespace character, such as a tab, a newline, or an actual space. These delimiters mark off individual words, sentences, and paragraphs. When special formatting is needed, other delimiters can be specified by the programmer.

String concatenation was demonstrated in Basic string operations. An alternative way to combine strings is by joining them. Joining combines separate strings into one string. The catch is that join() does not simply append one string to another; instead, it creates a string in which the elements of a sequence of strings are joined by a given separator, and the separator is the string on which join() is called. The following screenshot demonstrates this action. Line 29 is a normal concatenation; the results are printed in line 31. Line 30 joins string 1 with string 2, with the results in line 32:

Joining strings
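A minimal sketch of the same comparison follows. The two string values are assumptions chosen for illustration, not the ones used in the screenshot:

```python
# Hypothetical values; the originals from the screenshot are not reproduced here.
string1 = "abc"
string2 = "123"

# Normal concatenation: string2 is appended to string1.
concatenated = string1 + string2

# join(): string1 is used as the separator between the characters of string2.
joined = string1.join(string2)

print(concatenated)  # abc123
print(joined)        # 1abc2abc3
```

Because a string is itself a sequence of characters, join() treats string2 as the sequence and inserts string1 between each of its characters.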

As you can see, the results may not be what you expect. The join() method is actually designed to create a string in which the elements of a sequence of strings are separated by a given separator string. The following screenshot demonstrates this more common use of join():

Common string join

After a sequence of strings (known as a tuple, and explained further in Tuples) is created in line 35, the join() method is called in two different ways. Line 36 is a simple call of the method itself; the interactive interpreter echoes the resulting string with its quotation marks shown. Line 37 passes the result of join() to the print() function; the resultant string is printed normally, without the quote marks.
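The following sketch approximates that session in script form. The tuple contents and the ", " separator are assumptions for illustration; repr() stands in for the way the interactive interpreter echoes a bare expression:

```python
# A tuple of strings (hypothetical contents).
words = ("Spam", "eggs", "ham")

# The separator is the string on which join() is called.
result = ", ".join(words)

# In the interactive interpreter, a bare call shows the quoted form;
# repr() reproduces that form in a script.
print(repr(result))  # 'Spam, eggs, ham'

# print() displays the string itself, without quote marks.
print(result)        # Spam, eggs, ham
```

Note that every element of the sequence must already be a string; join() raises a TypeError if it encounters, say, an integer.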

Finally, splitting strings separates them into their component parts. The result is a list containing the individual pieces, typically words. The following screenshot shows two ways to split a string:

Splitting strings

In line 39, the default split is performed, resulting in the string being split at the whitespace between words. Line 41 performs the string split on the commas, though essentially any string can be used as the separator.
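A short sketch of both kinds of split is shown here. The sample strings are assumptions, not the ones from the screenshot:

```python
# Default split: any run of whitespace (spaces, tabs, newlines) is a delimiter.
sentence = "Split this\tsentence into\nwords"
words = sentence.split()
print(words)   # ['Split', 'this', 'sentence', 'into', 'words']

# Split on an explicit separator, here a comma.
csv_line = "red,green,blue"
fields = csv_line.split(",")
print(fields)  # ['red', 'green', 'blue']

# An explicit separator is matched literally, so repeated separators
# produce empty strings, unlike the default whitespace split.
spaced = "one  two"
print(spaced.split())     # ['one', 'two']
print(spaced.split(" "))  # ['one', '', 'two']
```

The last two lines show why the default form is usually the better choice for ordinary text: it collapses consecutive whitespace, whereas split(" ") treats each space as its own delimiter.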