Strings Are Immutable

Once a string has been created, it is immutable. You can’t slice it up into substrings, trim characters off it, add characters to it, or replace one character or substring with another.

“What?” I hear you ask. “Then how are we supposed to do our string processing?” Don’t worry, you can still do all of those things, but they don’t affect the original string—copies (of the relevant pieces) are made instead.
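A minimal sketch of what that means in practice: each operation below hands back a new string and leaves the original alone. The class and method names (ImmutabilityDemo, Transform) are our own, chosen purely for illustration.

```csharp
using System;

static class ImmutabilityDemo
{
    // Applies a few common operations and returns the resulting strings.
    // None of these calls modifies 'original'; each returns a new string.
    public static string[] Transform(string original)
    {
        return new[]
        {
            original.Substring(0, 5),            // slice
            original.TrimEnd('d'),               // trim
            original + "!",                      // add characters
            original.Replace("world", "there")   // replace a substring
        };
    }

    static void Main()
    {
        string original = "hello world";
        foreach (string result in Transform(original))
        {
            Console.WriteLine(result);
        }
        // Still the string we started with.
        Console.WriteLine(original);
    }
}
```

If you want to keep a result, you have to store the returned string somewhere; calling original.Trim() and ignoring the return value achieves nothing.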

Why did the designers of the .NET Framework make strings immutable? All that copying is surely going to be an overhead. Well, yes, it is, and sometimes you need to be aware of it.

That being said, there are compensating performance benefits when dealing with unchanging strings. The framework can store a single instance of a string (a technique known as interning) and then any variables that reference that particular sequence of characters can reference the same instance. This can actually save on allocations and reduce your working set. And in multithreaded scenarios, the fact that strings never change means it’s safe to use them without the cross-thread coordination that is required when accessing modifiable data. As usual, “performance” considerations are largely a compromise between the competing needs of various possible scenarios.
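The sharing described above can be observed with object.ReferenceEquals. The sketch below assumes the usual JIT-compiled runtime behavior; the class and method names are ours. The C# language guarantees that equal literals in the same assembly refer to one instance, whereas whether string.Intern hands back that same instance for a string built at runtime is typical behavior rather than something to depend on.

```csharp
using System;

static class InterningDemo
{
    // Two identical literals in the same assembly refer to one string object.
    public static bool LiteralsShareInstance()
    {
        string a = "TheUniqueKey";
        string b = "TheUniqueKey";
        return object.ReferenceEquals(a, b);
    }

    // A string built at runtime is a separate object, even though it
    // compares equal by value.
    public static bool RuntimeStringIsSeparateButEqual()
    {
        string literal = "TheUniqueKey";
        string built = new string(literal.ToCharArray());
        return !object.ReferenceEquals(literal, built) && literal == built;
    }

    // string.Intern looks up (or adds) the shared instance for a value.
    public static bool InternReturnsSharedInstance()
    {
        string literal = "TheUniqueKey";
        string built = new string(literal.ToCharArray());
        return object.ReferenceEquals(literal, string.Intern(built));
    }

    static void Main()
    {
        Console.WriteLine(LiteralsShareInstance());
        Console.WriteLine(RuntimeStringIsSeparateButEqual());
        Console.WriteLine(InternReturnsSharedInstance());
    }
}
```

None of this sharing would be safe if one holder of the reference could change the characters underneath everyone else.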

In our view, an overridingly persuasive argument for immutability relates to the safe use of strings as keys. Consider the code in Example 10-51.

Example 10-51. Using strings as keys in a dictionary

string myKey = "TheUniqueKey";
Dictionary<string, object> myDictionary = new Dictionary<string, object>();

myDictionary.Add(myKey, new object());

// Imagine you could do this (in reality it won't compile,
// because the string indexer is read-only)...
myKey[2] = 'o';

Remember, a string is a reference type, so the myKey variable references a string object which is initialized to "TheUniqueKey". When we add our object to the dictionary, we pass a reference to that same string object, which the dictionary will use as a key. If you cast your mind back to Chapter 9, you’ll remember that the dictionary relies on the hash code for the key object when storing dictionary entries, which can then be disambiguated (if necessary) by the actual value of the key itself.

Now, imagine that we could modify the original string object, using the reference we hold in that myKey variable. One characteristic of a (useful!) hash algorithm is that its output almost always changes when the original data changes. So all of a sudden our key’s hash code has, in all likelihood, changed: the hash for "TheUniqueKey" will almost certainly differ from the one for "ThoUniqueKey". Sadly, the dictionary has no way of knowing that the hash for that key has changed; it filed the entry under the old hash code. So, when we come to look up the value using our original reference to our key, it will no longer find a match.

This can (and does!) cause all sorts of subtle bugs in applications built on runtimes that allow mutable strings. But since .NET strings are immutable, this problem cannot occur if you use strings as keys.
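To see the failure for real, we can stand in for a mutable string with a small class of our own. MutableKey below is purely hypothetical (it is not a framework type), and its simple polynomial hash is an assumption chosen so that the result is deterministic; it is not how System.String computes its hash code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable stand-in for a string; not a real framework type.
sealed class MutableKey
{
    private readonly char[] chars;

    public MutableKey(string text) { chars = text.ToCharArray(); }

    // Unlike string, this indexer has a setter.
    public char this[int index]
    {
        get { return chars[index]; }
        set { chars[index] = value; }
    }

    // Value equality: two keys are equal if they hold the same characters.
    public override bool Equals(object obj)
    {
        MutableKey other = obj as MutableKey;
        return other != null && new string(chars) == new string(other.chars);
    }

    // Simple illustrative hash over the current characters.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            foreach (char c in chars) { hash = hash * 31 + c; }
            return hash;
        }
    }
}

static class LostKeyDemo
{
    // Returns true if the entry can still be found after its key is mutated.
    public static bool CanFindAfterMutation()
    {
        MutableKey myKey = new MutableKey("TheUniqueKey");
        Dictionary<MutableKey, object> myDictionary =
            new Dictionary<MutableKey, object>();
        myDictionary.Add(myKey, new object());

        myKey[2] = 'o'; // the key now reads "ThoUniqueKey"

        // Try the mutated key, then a fresh key with the original text.
        return myDictionary.ContainsKey(myKey)
            || myDictionary.ContainsKey(new MutableKey("TheUniqueKey"));
    }

    static void Main()
    {
        Console.WriteLine(CanFindAfterMutation());
    }
}
```

With this hash function both lookups fail. The mutated key now hashes differently from the code the dictionary recorded when the entry was added, and a fresh "TheUniqueKey" hashes to the recorded code but no longer equals the stored key. The entry is still in the dictionary, yet it cannot be reached by key.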

Another, related, benefit is that you avoid the buffer-overrun issues so prevalent on other runtimes. Because you can’t modify an existing string, you can’t accidentally run over the end of your allocation and start stamping on other memory, causing crashes at best and security holes at worst. Of course, immutable strings are not the only way the .NET designers could have addressed this problem, but they do offer a very simple solution that helps the developer fall naturally into doing the right thing, without having to think about it. We think that this is a very neat piece of design.

So, we can obtain (i.e., read) a character at a particular index in the string, using the square-bracket indexer syntax. What about slicing and dicing the string in other ways?
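As a quick reminder of that read-only indexer, here is a small sketch; the CharAt helper is our own wrapper, added only so that the behavior is easy to call and check.

```csharp
using System;

static class IndexerDemo
{
    // Reads the character at a zero-based index using the string indexer.
    public static char CharAt(string text, int index)
    {
        return text[index];
    }

    static void Main()
    {
        string myKey = "TheUniqueKey";

        // Index 2 is the third character, 'e'.
        Console.WriteLine(CharAt(myKey, 2));

        // myKey[2] = 'o'; // does not compile: the indexer has no setter
    }
}
```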