Understanding the ROT-13 substitution cipher – rot13.py

ROT-13 is a simple substitution cipher that replaces each letter with the letter thirteen positions after it in the alphabet, wrapping around at the end. For example, the letter a is substituted with the letter n and vice versa. Numbers and special characters are unaffected by the cipher, and a letter's case is preserved. While Python does offer a built-in way of decoding ROT-13, we are going to pretend that it doesn't exist and manually decode ROT-13 data. We will use the built-in ROT-13 decoding method in our script.

Before we pretend that this functionality doesn't exist, let's quickly use it to illustrate how we could encode and decode ROT-13 data with Python 2:

>>> original_data = 'Why, ROT-13?' 
>>> encoded_data = original_data.encode('rot-13') 
>>> print encoded_data 
Jul, EBG-13? 
>>> print encoded_data.decode('rot-13') 
Why, ROT-13? 

Decoding or encoding with ROT-13 in Python 3 requires a slightly different approach with the native codecs library:

>>> import codecs
>>> enc = codecs.getencoder('rot-13')
>>> enc('Why, ROT-13?')
('Jul, EBG-13?', 12)
>>> enc('Why, ROT-13?')[0]
'Jul, EBG-13?'
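
As a brief aside, the same codecs module also exposes module-level encode() and decode() functions, which avoid unpacking the tuple returned by the encoder. The following is a minimal sketch for Python 3; the variable names simply mirror the Python 2 example above:

```python
import codecs

original_data = 'Why, ROT-13?'

# rot_13 is a text-to-text codec, so both calls take and return str
encoded_data = codecs.encode(original_data, 'rot_13')
decoded_data = codecs.decode(encoded_data, 'rot_13')

print(encoded_data)
print(decoded_data)
```

Because ROT-13 is its own inverse, encoding and decoding perform the same substitution.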

Now, let's look at how you might approach this if it weren't already built in. While you should never reinvent the wheel, we want to take this opportunity to practice list operations and introduce a tool to audit code. The code from the rot13.py script in the code bundle for this chapter is shown next.

The rot_code() function defined at line 32 accepts a ROT-13-encoded or ROT-13-decoded string. On line 39, we have rot_chars, a list of the lowercase characters in the alphabet. As we iterate through each character in the supplied input, we will use this list to substitute the character with its counterpart 13 elements away. As we execute each substitution, we will store the result in the substitutions list instantiated on line 43:

032 def rot_code(data):
033     """
034     The rot_code function encodes/decodes data using string
035     indexing
036     :param data: A string
037     :return: The rot-13 encoded/decoded string
038     """
039     rot_chars = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
040         'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
041         'u', 'v', 'w', 'x', 'y', 'z']
042
043     substitutions = []

On line 46, we begin to walk through each character, c, in the data string. On line 49, we use a conditional statement to determine if the character is uppercase or lowercase. We do this to preserve the case of the character as we process it:

045     # Walk through each individual character
046     for c in data:
047
048         # Check the character's case so it can be preserved
049         if c.isupper():

On line 54, we attempt to identify the index of the character in our list. If the character is a non-alphabetical character, we will receive a ValueError exception. Non-alphabetical characters, such as numbers or special characters, are appended to the substitutions list unmodified as these types of values are not encoded by ROT-13:

051             try:
052                 # Find the position of the character in
053                 # rot_chars list
054                 index = rot_chars.index(c.lower())
055             except ValueError:
056                 substitutions.append(c)
057                 continue

Once we have found the index of the character, we can calculate the corresponding index 13 characters away by subtracting 13. For indexes less than 13, this will be a negative number. Fortunately, list indexing supports negative numbers, which count back from the end of the list, and works splendidly here: the letter a at index 0 becomes index -13, which is the letter n. Before appending the corresponding character to our substitutions list, we use the string upper() method to return the character to its original case:

059             # Calculate the relative index that is 13
060             # characters away from the index
061             substitutions.append(
062                 (rot_chars[(index-13)]).upper())
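
To see why subtracting 13 is enough, it can help to try the negative indexing on its own. The short sketch below rebuilds the same 26-letter list from string.ascii_lowercase (a convenience used here, not what rot13.py does) and checks that index - 13 always lands on the same letter as the more explicit (index + 13) % 26:

```python
import string

# Same contents as the rot_chars list in rot13.py, built programmatically
rot_chars = list(string.ascii_lowercase)

# 'a' sits at index 0; 0 - 13 = -13, which counts back from the end of the list
a_substitute = rot_chars[rot_chars.index('a') - 13]

# 'n' sits at index 13; 13 - 13 = 0, which is the start of the list
n_substitute = rot_chars[rot_chars.index('n') - 13]

# For a 26-element list, index - 13 and (index + 13) % 26 select the same letter
same_everywhere = all(
    rot_chars[index - 13] == rot_chars[(index + 13) % 26]
    for index in range(len(rot_chars))
)

print(a_substitute, n_substitute, same_everywhere)
```

This only works because the alphabet has exactly 26 letters, so moving 13 places backward or forward reaches the same position.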

The else statement of the conditional block handles lowercase characters, along with anything that is not a letter. The following code block is substantially the same as what we just covered. The difference is that we never call lower() or upper() because the character is already in the proper case to be processed:

064         else:
065
066             try:
067                 # Find the position of the character in
068                 # rot_chars list
069                 index = rot_chars.index(c)
070             except ValueError:
071                 substitutions.append(c)
072                 continue
073
074             substitutions.append(rot_chars[((index-13))])

Finally, on line 76, we collapse the substitutions list to a string using the join() method. We join on an empty string so that each element of the list is appended without any separating characters. If this script is invoked from the command line, it will process the string Jul, EBG-13? and print the result, which we know corresponds to Why, ROT-13?. We have the following code:

   
076     return ''.join(substitutions)
077
078 if __name__ == '__main__':
079     print(rot_code('Jul, EBG-13?'))

The following screenshot illustrates how we can import our rot13 module and call the rot_code() function to either decode or encode a string:

Make sure that the Python interactive prompt is opened in the same directory as the rot13.py script. Otherwise, an ImportError will be generated.
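
One way to gain confidence in a hand-written implementation like this is to compare it against the built-in codec. The sketch below is for Python 3 and is self-contained, so it uses a condensed stand-in for rot_code() rather than importing rot13.py; the stand-in follows the same list-index approach but is not the chapter's exact code, and the sample strings are arbitrary:

```python
import codecs
import string


def rot_code(data):
    """Condensed stand-in for the chapter's rot_code() function."""
    rot_chars = list(string.ascii_lowercase)
    substitutions = []
    for c in data:
        try:
            index = rot_chars.index(c.lower())
        except ValueError:
            # Not one of the letters a-z: keep the character unmodified
            substitutions.append(c)
            continue
        shifted = rot_chars[index - 13]
        # Restore the original case before storing the substitution
        substitutions.append(shifted.upper() if c.isupper() else shifted)
    return ''.join(substitutions)


samples = ['Why, ROT-13?', 'Jul, EBG-13?', '', '12345', 'abcxyz ABCXYZ']
for sample in samples:
    # The manual version should agree with the built-in codec
    assert rot_code(sample) == codecs.encode(sample, 'rot_13')
    # Applying ROT-13 twice should return the original string
    assert rot_code(rot_code(sample)) == sample

decoded = rot_code('Jul, EBG-13?')
print(decoded)
```

If either assertion fails, the manual implementation and the codec disagree on that sample, which is a useful signal to inspect the substitution logic.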