5.6. Using Arrays and Collections

Computer programs have always been great at dealing with multiple items of the same kind of thing. For example, we wouldn't write a program to generate payroll for one employee. We want to do it for a thousand. We don't have just one song in our media library. We have hundreds of them. We don't have one enemy spaceship on screen in our game. We have many of them.

Repetition is everywhere. We've already talked about writing code that repeats, in the form of loops, but what we're focused on here is defining data that repeats. We're covering a lot of ways to group data quickly, so in case you're a little hazy on how this idea differs from the composite data type in the previous chapter, here is the difference.

Difference Between Composite Data Types and Arrays

Something like this would involve a composite data type:

“In my program, I want an album to be made of a title, an artist, a year and a genre”.

But something like this would involve arrays and collections:

“Now I want a thousand albums, or a thousand strings, or a thousand integers, or 12 months or 31 days”.
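To make the distinction concrete, here is a minimal Python sketch. The `Album` class and the sample values are hypothetical, chosen only for illustration: the class plays the role of the composite data type (one album made of four pieces of data), while the lists hold many items of the same kind.

```python
from dataclasses import dataclass

# Composite data type (hypothetical): one album is made of several named pieces of data.
@dataclass
class Album:
    title: str
    artist: str
    year: int
    genre: str

# Collections: many values of the same kind held in one container.
albums = [
    Album("Kind of Blue", "Miles Davis", 1959, "Jazz"),
    Album("Abbey Road", "The Beatles", 1969, "Rock"),
]

# 12 months of days, ignoring leap years.
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

print(albums[1].title)      # Abbey Road
print(len(days_in_month))   # 12
```

The list of `Album` objects combines both ideas: each element is a composite value, and the list holds as many of them as we need.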

Now, sure, there's nothing actually stopping you from manually typing out a thousand separately named integer variables in your source code, except perhaps the pain in your hands.

But importantly, when you're writing source code, you often won't know exactly how many of something you're going to have when your program runs.

A media player program doesn't know in advance how many albums will be on someone's device. A payroll program still needs to work if the number of employees changes from week to week. So, we need a way in our code to say…

“I have many of something, even if I don't know exactly how many that will be each time the program runs”.

The most fundamental and still incredibly useful way to do this is called the array.

Instead of creating a single integer variable, we can create an array of integers, a collection of them, several independent values inside one container. See Fig 5.6.1.

Image

Fig 5.6.1: A sample array

As you already know, syntax is not the focus of this course, but I want to demonstrate that while the syntax to create an array differs across languages, one thing that's very common is square brackets being used somewhere in the syntax. See Fig 5.6.2.

Image

Fig 5.6.2: Square brackets are used somewhere in the syntax of various languages

In Fig 5.6.2, the brackets are all doing the same thing. They create an array of five integers and provide the values 1, 3, 5, 7, 9 for those five numbers. Of course, it doesn't have to be an array of integers: it could be an array of strings, an array of Booleans or an array of whatever you want. You could have five of them, 10 of them or 10,000.
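For comparison, here is how the same five integers might look in Python, as a minimal sketch. Python's built-in list is the closest everyday equivalent of an array and uses square brackets; the standard-library `array` module offers a stricter, single-type array. The variable names are illustrative only.

```python
from array import array

# A Python list literal: square brackets around five integers.
my_array = [1, 3, 5, 7, 9]

# The standard-library array module stores values of one C type;
# the type code "i" means signed int.
typed_array = array("i", [1, 3, 5, 7, 9])

# The same bracket syntax works for other element types.
names = ["Ada", "Grace", "Linus"]
flags = [True, False, True]

print(my_array)            # [1, 3, 5, 7, 9]
print(list(typed_array))   # [1, 3, 5, 7, 9]
```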

Each array you make has a name, but each individual element inside that array does not have a name. Let me say that again. The array, the container that holds the values, has a name, but the individual elements inside it do not. This is because we don't want to have to name everything, but we still need to be able to get to each item, whether to look at it or change it.

We can do that because an array is always an ordered collection of items. What that means is that each item, or what we usually call an element, inside the array won't have a name, but it will have an index. It will have a number. That number isn't up to us. The indexes of an array are always in order. They form a strict sequence. There's nothing random about this.

If I have an array of five integers, the indexes will be [0], [1], [2], [3], [4]. See Fig 5.6.3.

Image

Fig 5.6.3: An array called “myArray” containing 5 elements
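Here is a minimal Python sketch of that numbering, assuming the same five values as before. The elements have no names of their own; Python's built-in `enumerate` simply pairs each one with its index.

```python
my_array = [1, 3, 5, 7, 9]

# Each element has no name of its own, only a position: its index.
for index, value in enumerate(my_array):
    print(f"my_array[{index}] = {value}")

# The indexes form a strict sequence from 0 up to len - 1.
indexes = list(range(len(my_array)))
print(indexes)  # [0, 1, 2, 3, 4]
```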

We can get or set any particular element in an array by knowing the name of the array and providing a number: the index. Once again, it's most common to see square brackets being used to get to the element at a specific index in the array.

As you can see in Fig 5.6.4, I'm retrieving the middle element, at index [2], and printing its value, which is 5. I can also use myArray[4] to change an element: reaching into position 4 and setting the last element to 99.

Image

Fig 5.6.4: Retrieving elements in the middle and last positions (5 and 99)
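The same get and set operations in Python might look like this. It's a sketch that assumes the array starts with the values 1, 3, 5, 7, 9 from Fig 5.6.2; the name `my_array` is just Python's naming style for `myArray`.

```python
my_array = [1, 3, 5, 7, 9]

# Get: read the middle element, at index 2.
print(my_array[2])   # 5

# Set: reach into position 4 and replace the last element.
my_array[4] = 99
print(my_array[4])   # 99
print(my_array)      # [1, 3, 5, 7, 99]
```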

Basic Array Characteristics

Let's cover a few things that are usually true about classic (standard) arrays.

Image

Fig 5.6.5: Summary of basic characteristics of arrays

  1. They are zero-based. Typically, an array index starts at zero (0), not at 1.
  2. The index goes up one by one, so with an array of five elements, the last element has index 4. A few languages, such as COBOL and Smalltalk, start arrays at index 1, but it's far more common for the index to start at zero.
  3. It's common for any standard array you make to be limited to holding one specific data type. You can make an array of integers, an array of strings or an array of Booleans, as many as you like, but you can't mix types inside any one of them. A few languages are more flexible and allow different types of data at different positions. And if you define your own composite data type, like the game data type from the previous chapter, you could then make an array of games, with each element in the array itself containing multiple pieces of data.
  4. Also, in many languages, when a basic array is created, it's created at a specific size. It can be any size, for example 5, 50 or 5,000, but once it's created, elements can't be added to it or removed from it. The term for this is that the array is fixed-size (or fixed-length). You'll sometimes hear it loosely called immutable, meaning unchangeable, but strictly that word means the values themselves can't change, and you can still change the value stored at any index, as we did with myArray[4]. Bear in mind, too, that you can create an array using a variable as its size, and that variable's value can depend on earlier logic in your program (see the last line of code in Fig 5.6.5). So, you don't have to know the exact size of every array when you're writing your source code. You just have to know that each time the program runs and the array gets created, it will be created at a specific size.
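The four characteristics can be sketched in Python with the standard-library `array` module, which enforces a single element type. One caveat: Python does not enforce point 4, since both `array` and `list` can still grow with `append()`; a language such as C or Java would reject that. The helper `make_scores` is hypothetical, included only to show a size chosen at runtime.

```python
from array import array

numbers = array("i", [1, 3, 5, 7, 9])

# Points 1 and 2: zero-based indexes, so the last element is at len - 1.
first = numbers[0]
last = numbers[len(numbers) - 1]
print(first, last)  # 1 9

# Reading past the end is an error, not a silently created slot.
try:
    numbers[5]
except IndexError as err:
    print("IndexError:", err)

# Point 3: an array with type code "i" only accepts integers.
try:
    numbers[0] = "hello"
except TypeError as err:
    print("TypeError:", err)

# Point 4: the size can come from a variable computed while the program runs.
def make_scores(player_count: int) -> array:
    # Hypothetical helper: one zeroed integer slot per player.
    return array("i", [0]) * player_count

scores = make_scores(4)
print(len(scores))  # 4
```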

Collections

Arrays are the classic way to hold and organize multiple data items in your program, but they aren't the only way. As programming languages evolved over the years, this idea has been extended. Many languages now offer additional ways to do this, including dynamic arrays that can have new elements added and removed as necessary.
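Python's built-in list is a dynamic array, so it makes a convenient sketch of growing and shrinking a collection. The `playlist` contents are hypothetical; the methods `append`, `insert`, `remove` and `pop` are standard list methods.

```python
# A Python list is a dynamic array: it can grow and shrink as needed.
playlist = ["Intro", "Verse"]

playlist.append("Chorus")        # add to the end
playlist.insert(0, "Count-in")   # add at index 0, shifting the rest along
playlist.remove("Verse")         # remove the first matching value
last = playlist.pop()            # remove and return the last element

print(playlist)  # ['Count-in', 'Intro']
print(last)      # Chorus
```

Notice that every element's index can change when you insert or remove at the front, which is one reason dynamic arrays do a little more work behind the scenes than fixed-size ones.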

Image

Fig 5.6.6: Collections of arrays and their characteristics

Some arrays can hold any type of data (type-flexible arrays), so a single array isn't limited to just integers or just strings. See Fig 5.6.6. This is actually the default in several languages, including Python, Ruby and JavaScript. There are also associative arrays, which use a different kind of index: instead of a zero-based number, you choose what to use as the index, or key, for every single element.
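Both ideas are built into Python, as this minimal sketch shows. A list happily mixes types, and Python's associative array is the dictionary (`dict`), where each element is reached through a key you choose. The sample data is illustrative only.

```python
# Type-flexible: one Python list can mix integers, strings, Booleans and floats.
mixed = [42, "forty-two", True, 4.2]
print(type(mixed[1]).__name__)  # str

# Associative array: a dict, where you choose the key for each element.
capitals = {"France": "Paris", "Japan": "Tokyo", "Kenya": "Nairobi"}
print(capitals["Japan"])  # Tokyo

capitals["Peru"] = "Lima"  # add an element under a new key
print(len(capitals))       # 4
```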

As they get more advanced, we stop calling them arrays. We have other specialized names like dictionaries, queues, lists, stacks and hash tables. While they're each good at different things, they are similar in that they're all designed to hold multiple items. Because of that, the general term for all of these in programming is collections.
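As a brief taste of how a few of these collections behave, here is a Python sketch using only the standard library: a list used as a stack, `collections.deque` used as a queue, and a `dict` (which CPython implements as a hash table). Other languages offer the same ideas under different names, so treat this as one illustration rather than the general form.

```python
from collections import deque

# Stack: last in, first out. A list's append/pop work on the end.
stack = []
stack.append("page 1")
stack.append("page 2")
stack.append("page 3")
print(stack.pop())  # page 3

# Queue: first in, first out. deque removes from the front efficiently.
queue = deque()
queue.append("customer A")
queue.append("customer B")
queue.append("customer C")
print(queue.popleft())  # customer A

# Dictionary (a hash table in CPython): look items up by key.
stock = {"apples": 12, "pears": 7}
print(stock["pears"])  # 7
```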

Getting deeper into collections is a topic for another course, particularly because different programming languages have very different options for this. Instead, there's one last thing to cover in this chapter: a set of ideas shared by many of the most popular programming languages, the ideas of object orientation.