Before we get into how to use strings, we will cover why they are the way they are. For developers coming from other languages, this is a very reasonable question to ask.
We won't go into the details of Unicode, but Swift lets us view the same piece of Unicode text in several ways, each through a different collection:
let string = "The ☀️ and 🌙"
string.utf8.count           // 19
string.utf16.count          // 13
string.unicodeScalars.count // 12
Each view reports a different number of symbols in the string, and you may also have noticed that they are all wrong. String itself, however, has the right answer:
string.count // 11
This is because String is an ordered collection of Character. A Character represents what we humans would consider one symbol (what Unicode calls an extended grapheme cluster), regardless of how many bytes it consists of.
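To see that a Character really is one user-perceived symbol, here is a small sketch using a flag emoji, which Unicode builds from two "regional indicator" scalars. The Norwegian flag is an arbitrary choice for illustration:

```swift
// The Norwegian flag: regional indicator symbols N (U+1F1F3) and O (U+1F1F4).
let flag = "\u{1F1F3}\u{1F1F4}"

print(flag.count)                // 1 Character
print(flag.unicodeScalars.count) // 2 Unicode scalars
print(flag.utf16.count)          // 4 UTF-16 code units (two surrogate pairs)
print(flag.utf8.count)           // 8 UTF-8 bytes
```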
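The reason for the discrepancies is, of course, the two emojis. Note that the sun is made of two Unicode scalars: U+2600 (BLACK SUN WITH RAYS) followed by U+FE0F (VARIATION SELECTOR-16), which requests emoji-style presentation: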
let moon = Character("🌙")
String(moon).utf8.count   // 4
String(moon).utf16.count  // 2
moon.unicodeScalars.count // 1

let sun: Character = "☀️"
String(sun).utf8.count    // 6
String(sun).utf16.count   // 2
sun.unicodeScalars.count  // 2
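To check where the sun's extra scalar comes from, this sketch prints the code point of every Unicode scalar in the sun and the moon. Both are written with explicit escapes so the invisible variation selector can't get lost when the text is copied; the helper name codePoints(of:) is just for illustration:

```swift
// Returns the code points of a string's Unicode scalars as "U+XXXX" strings.
func codePoints(of text: String) -> [String] {
    return text.unicodeScalars.map { (scalar: Unicode.Scalar) -> String in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        // Pad to at least four hex digits, as Unicode notation usually does.
        return "U+" + String(repeating: "0", count: max(0, 4 - hex.count)) + hex
    }
}

let sunText = "\u{2600}\u{FE0F}" // BLACK SUN WITH RAYS + VARIATION SELECTOR-16
let moonText = "\u{1F319}"       // CRESCENT MOON

print(codePoints(of: sunText))  // ["U+2600", "U+FE0F"]
print(codePoints(of: moonText)) // ["U+1F319"]
```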
Even a simple letter such as é may surprise you:
let accented_e: Character = "é"
String(accented_e).utf8.count   // 2
String(accented_e).utf16.count  // 1
accented_e.unicodeScalars.count // 1
There may be several ways of representing the same symbol in Unicode, but Character still considers them equal, because Swift compares characters by canonical equivalence:
let another_accented_e: Character = "e\u{0301}" // "e" + combining acute accent
String(another_accented_e).utf8.count   // 3
String(another_accented_e).utf16.count  // 2
another_accented_e.unicodeScalars.count // 2
accented_e == another_accented_e        // true
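The same canonical equivalence applies to whole strings. In this sketch, "café" is written once with the precomposed é and once with a combining accent; the two strings compare equal and have the same character count, while their scalar and byte counts differ:

```swift
let precomposed = "caf\u{E9}"  // "café" with é as one scalar (U+00E9)
let decomposed = "cafe\u{301}" // "café" as "e" + combining acute accent

print(precomposed == decomposed)           // true: compared by canonical equivalence
print(precomposed.count, decomposed.count) // 4 4
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count) // 4 5
print(precomposed.utf8.count, decomposed.utf8.count)                     // 5 6
```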
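Let's see what kind of a collection String is. Among the standard collection protocols, it conforms to Collection (and, through it, Sequence), BidirectionalCollection, and RangeReplaceableCollection.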
Comparing this diagram with the one for Array in the previous lesson, we see that both MutableCollection and RandomAccessCollection are missing.
This is because, as we have seen, characters may take up varying amounts of space. In a MutableCollection, we can replace one element with another in place, through a subscript. But what if we replace one character with one that takes more space? Then we would have to move all the following characters to make room, and MutableCollection does not allow an assignment to shift the other elements around like that. It is the same with RandomAccessCollection: it requires moving an index by any distance, and measuring the distance between two indices, in constant time, so that retrieving the 20,000th element takes about as long as retrieving the 5th. We can't do that when the elements are not all the same size.
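What String does offer, through RangeReplaceableCollection, is replacing a whole range with new content of any length. That protocol allows the collection's size to change, and replaceSubrange(_:with:) may invalidate existing indices, so nothing promises that other characters stay where they were. A small sketch with an arbitrary sentence:

```swift
var sentence = "The sun"
// Find the character at offset 4 ("s") by walking from the start.
let sunStart = sentence.index(sentence.startIndex, offsetBy: 4)
// Replace "sun" (three 1-byte characters) with a single 4-byte emoji.
sentence.replaceSubrange(sunStart..<sentence.endIndex, with: "\u{1F319}")
print(sentence) // The 🌙
```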
So, why not add some padding and make all characters in a string take up the same amount of memory? Well, we already had an array of characters in the previous lesson, which does just that. Let's bring it back and compare its memory usage with that of the corresponding string.
An instance of Character takes up eight bytes in an array (16 bytes on 64-bit platforms in Swift 5 and later, where Character wraps a String). The most common characters usually take up two bytes or fewer in a string, and since strings are often the largest collections in an application, wasting all that space is not really an option.
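The comparison can be sketched with MemoryLayout. The exact stride of Character depends on the Swift version and platform, so this example prints it rather than assuming a value. The sample text is arbitrary ASCII, and the array's own header and the string's storage overhead are not counted:

```swift
let sampleText = "Hello, strings!"
let characters = Array(sampleText) // [Character]

// Bytes the array's elements occupy: one stride per Character.
let arrayBytes = characters.count * MemoryLayout<Character>.stride
// Bytes in the string's UTF-8 view (one byte per ASCII character).
let stringBytes = sampleText.utf8.count

print("Character stride:", MemoryLayout<Character>.stride)
print("Array of characters:", arrayBytes, "bytes")
print("String contents (UTF-8):", stringBytes, "bytes")
```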
Just like arrays, strings have indices, which refer to the position of every single character. But before we get into what type a string's index is, we should cover what it is not: an integer.
The index type of an array is an integer. Because every element takes up the same amount of space, when you ask for the element at index 500, the array multiplies 500 by the byte size of an element, adds the memory address of the first element, and finds the element at the resulting address.
If we ask a string for the 500th character, it has to start with the first character, see how much space it takes, move past it, see how much space the next character takes, and so on, and repeat this 500 times.
On Stack Overflow and elsewhere, you will often find code examples that add a new subscript to String with an integer parameter, allowing us to do something like this:
for i in 0..<string.count {
    let character = string[i]
    // ...
}
This is extremely inefficient. Consider what is actually happening here: the string has to process the first character, then the first and second characters, then the first, second, and third characters, and so on. For a string of merely n = 500 characters, it will have processed the first character 500 times, the second one 499 times, and so on, for a total of n(n+1)/2 = 125,250 characters processed, plus 500 more to find the count.
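The quadratic cost can be checked with a small sketch that walks the string the way an integer subscript has to, counting every character it steps over or reads (not counting the 500 steps needed for count). The function name is hypothetical, and the string of 500 a's is an arbitrary test input:

```swift
// Counts characters touched by a loop that, for every integer offset,
// walks from startIndex to that offset and then reads the character there.
func charactersTouchedByNaiveLoop(over string: String) -> Int {
    var touched = 0
    for offset in 0..<string.count {
        var i = string.startIndex
        for _ in 0..<offset {
            i = string.index(after: i) // step over one character
            touched += 1
        }
        _ = string[i] // read the character at this offset
        touched += 1
    }
    return touched
}

let fiveHundredAs = String(repeating: "a", count: 500)
print(charactersTouchedByNaiveLoop(over: fiveHundredAs)) // 125250
```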
The following, however, will visit each character exactly once, and is much simpler:
for character in string {
    // ...
}
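When the loop body needs an index or a position number as well as the character, there is still no need for integer subscripts. Here is a sketch of two standard options, indices and enumerated(), both of which visit each character once (the sample word is arbitrary):

```swift
let word = "naïve"

// Iterate over String.Index values when you need the index itself,
// for example to slice the string later or to remember positions.
var positions: [String.Index] = []
for index in word.indices where word[index] == "v" {
    positions.append(index)
}

// Use enumerated() when you only need a running integer count.
for (offset, character) in word.enumerated() {
    print(offset, character)
}
```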
The actual index type of String is String.Index. It's a custom type whose inner workings we are blissfully unaware of. All operations on it are performed using the standard Collection and BidirectionalCollection methods on String:
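let alphabet = "abcdefghijklmnopqrstuvwxyz"
let b_index = alphabet.index(after: alphabet.startIndex)
let a_index = alphabet.index(before: b_index)
let g_index = alphabet.index(a_index, offsetBy: 6)
let e_index = alphabet.index(g_index, offsetBy: -2)

index(_:offsetBy:limitedBy:) works like index(_:offsetBy:), but returns nil if the result goes beyond this limit:
let no_index = alphabet.index(e_index, offsetBy: 30, limitedBy: alphabet.endIndex) // nil

index(of:) returns the index of the first occurrence of a character, or nil if it is not found (since Swift 4.2, this method is named firstIndex(of:)):
let i = alphabet.index(of: "z")

distance(from:to:) returns the number of steps between two indices:
let a_e_distance = alphabet.distance(from: a_index, to: e_index) // 4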
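Once you have indices, you can read single characters and slice the string with ranges of them. This self-contained sketch sets up its own alphabet under a new name, and uses firstIndex(of:), the Swift 4.2+ name for index(of:):

```swift
let letters = "abcdefghijklmnopqrstuvwxyz"
let aIndex = letters.startIndex
let eIndex = letters.index(aIndex, offsetBy: 4)

// Subscripting with a single index yields a Character.
let e: Character = letters[eIndex]

// Subscripting with a range of indices yields a Substring.
let aToE: Substring = letters[aIndex...eIndex] // "abcde"

// Optional indices must be unwrapped before use.
if let zIndex = letters.firstIndex(of: "z") {
    print(letters[zIndex...]) // z
}
```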
Perhaps the biggest drawback of using this custom type instead of an integer comes up during debugging, when we would like to see what it contains. If we just print an index to the console, we get something like this:
Swift.String.Index(_compoundOffset: 100, _cache: Swift.String.Index._Cache.character(1))
This contains exactly nothing of interest. If we add this extension in a unit test module, we get something more useful:
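// Use in unit tests only.
extension String.Index: CustomDebugStringConvertible {
    // The offset into a string's UTF-16 encoding for this index.
    // encodedOffset is deprecated in Swift 5, where it no longer reliably
    // counts UTF-16 code units; utf16Offset(in:) replaces it.
    public var debugDescription: String {
        return "\(encodedOffset)"
    }
}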
Now, when we print an index, we get its offset in UTF-16 code units. That equals the zero-based character position only if every character before the index fits in a single UTF-16 code unit, so it's not always correct, but it's better than nothing.
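If you have the string at hand while debugging, a more reliable option is to measure the distance from startIndex in characters. The helper characterOffset(of:in:) below is hypothetical; utf16Offset(in:) is the Swift 5 replacement for encodedOffset and is shown for comparison:

```swift
// Hypothetical debugging helper: the zero-based Character position of `index`.
// Runs in O(n), since it walks the string from the start.
func characterOffset(of index: String.Index, in string: String) -> Int {
    return string.distance(from: string.startIndex, to: index)
}

let sample = "\u{1F319}abc" // "🌙abc": the moon needs two UTF-16 code units
if let bIndex = sample.firstIndex(of: "b") {
    print(bIndex.utf16Offset(in: sample))          // 3
    print(characterOffset(of: bIndex, in: sample)) // 2
}
```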
This topic is a primer on the wide world of strings. In this section, we have covered concepts such as collections, indices, and debugging. We'll continue our journey with strings in the next section.
The String.index(of:) method finds the index of the first occurrence of a character in a string. Create a method that finds all the indices of a character.
Goal: use Xcode to find all the indices of a character in a string.
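Open the StringsExtra Xcode project, and go to the StringsExtra.swift file. Add an extension:
extension String {
Add a method with a signature similar to index(of:). It collects the matching indices in an array, starting at startIndex:
    public func indices(of character: Character) -> [Index] {
        var result = [Index]()
        var i = startIndex
Loop until the index reaches endIndex. Never subscript the string with endIndex, as it will crash. This check also takes care of empty strings, where startIndex and endIndex are equal:
        while i < endIndex {
            if self[i] == character {
                result.append(i)
            }
            i = index(after: i)
        }
        return result
    }
}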
This is the traditional way of implementing it, which shows how to work directly with indices. Later, we will learn a simpler, more concise way to do this.
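As a taste of a shorter approach (not necessarily the one the later lesson uses), the same search can be written by filtering the string's indices. The method is given the hypothetical name allIndices(of:) here so it doesn't clash with the exercise's indices(of:):

```swift
extension String {
    // Returns every index whose character equals `character`, in order.
    func allIndices(of character: Character) -> [Index] {
        return indices.filter { self[$0] == character }
    }
}

let fruit = "banana"
let aPositions = fruit.allIndices(of: "a").map { fruit.distance(from: fruit.startIndex, to: $0) }
print(aPositions) // [1, 3, 5]
```

It still visits each character exactly once; the loop and the result array are simply handled by filter.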
The test for this method is in StringsExtraTests.swift:
func testIndices()