Chapter 16. Collections and Generics: Data structures

image with no caption

Sorting is a snap in Java. You have all the tools for collecting and manipulating your data without having to write your own sort algorithms (unless you’re reading this right now sitting in your Computer Science 101 class, in which case, trust us—you are SO going to be writing sort code while the rest of us just call a method in the Java API). The Java Collections Framework has a data structure that should work for virtually anything you’ll ever need to do. Want to keep a list that you can easily keep adding to? Want to find something by name? Want to create a list that automatically takes out all the duplicates? Sort your co-workers by the number of times they’ve stabbed you in the back? Sort your pets by number of tricks learned? It’s all here...

Congratulations on your new job—managing the automated jukebox system at Lou’s Diner. There’s no Java inside the jukebox itself, but each time someone plays a song, the song data is appended to a simple text file.

image with no caption

Your job is to manage the data to track song popularity, generate reports, and manipulate the playlists. You’re not writing the entire app—some of the other software developer/ waiters are involved as well, but you’re responsible for managing and sorting the data inside the Java app. And since Lou has a thing against databases, this is strictly an in-memory data collection. All you get is the file the jukebox keeps adding to. Your job is to take it from there.

You’ve already figured out how to read and parse the file, and so far you’ve been storing the data in an ArrayList.

Challenge #1 Sort the songs in alphabetical order

You have a list of songs in a file, where each line represents one song, and the title and artist are separated with a forward slash. So it should be simple to parse the line, and put all the songs in an ArrayList.

Your boss cares only about the song titles, so for now you can simply make a list that just has the song titles.

But you can see that the list is not in alphabetical order... what can you do?

You know that with an ArrayList, the elements are kept in the order in which they were inserted into the list, so putting them in an ArrayList won’t take care of alphabetizing them, unless... maybe there’s a sort() method in the ArrayList class?

image with no caption
image with no caption

When you look in ArrayList, there doesn’t seem to be any method related to sorting. Walking up the inheritance hierarchy didn’t help either—it’s clear that you can’t call a sort method on the ArrayList.

image with no caption
image with no caption

Although ArrayList is the one you’ll use most often, there are others for special occasions. Some of the key collection classes include:

If you put all the Strings (the song titles) into a TreeSet instead of an ArrayList, the Strings would automatically land in the right place, alphabetically sorted. Whenever you printed the list, the elements would always come out in alphabetical order.

And that’s great when you need a set (we’ll talk about sets in a few minutes) or when you know that the list must always stay sorted alphabetically.

On the other hand, if you don’t need the list to stay sorted, TreeSet might be more expensive than you need—every time you insert into a TreeSet, the TreeSet has to take the time to figure out where in the tree the new element must go. With ArrayList, inserts can be blindingly fast because the new element just goes in at the end.

image with no caption

Q:

Q: But you CAN add something to an ArrayList at a specific index instead of just at the end—there’s an overloaded add() method that takes an int along with the element to add. So wouldn’t it be slower than inserting at the end?

A:

A: Yes, it’s slower to insert something in an ArrayList somewhere other than at the end. So using the overloaded add(index, element) method doesn’t work as quickly as calling the add(element)—which puts the added element at the end. But most of the time you use ArrayLists, you won’t need to put something at a specific index.

Q:

Q: I see there’s a LinkedList class, so wouldn’t that be better for doing inserts somewhere in the middle? At least if I remember my Data Structures class from college...

A:

A: Yes, good spot. The LinkedList can be quicker when you insert or remove something from the middle, but for most applications, the difference between middle inserts into a LinkedList and ArrayList is usually not enough to care about unless you’re dealing with a huge number of elements. We’ll look more at LinkedList in a few minutes.

The Collections.sort() method sorts a list of Strings alphabetically.

image with no caption
image with no caption

Now your boss wants actual Song class instances in the list, not just Strings, so that each Song can have more data. The new jukebox device outputs more information, so this time the file will have four pieces (tokens) instead of just two.

The Song class is really simple, with only one interesting feature—the overridden toString() method. Remember, the toString() method is defined in class Object, so every class in Java inherits the method. And since the toString() method is called on an object when it’s printed (System.out.println(anObject)), you should override it to print something more readable than the default unique identifier code. When you print a list, the toString() method will be called on each object.

image with no caption

Your code changes only a little—the file I/O code is the same, and the parsing is the same (String.split()), except this time there will be four tokens for each song/line, and all four will be used to create a new Song object. And of course the ArrayList will be of type <Song> instead of <String>.

image with no caption

Something’s wrong... the Collections class clearly shows there’s a sort() method, that takes a List.

ArrayList is-a List, because ArrayList implements the List interface, so... it should work.

But it doesn’t!

The compiler says it can’t find a sort method that takes an ArrayList<Song>, so maybe it doesn’t like an ArrayList of Song objects? It didn’t mind an ArrayList<String>, so what’s the important difference between Song and String? What’s the difference that’s making the compiler fail?

image with no caption

And of course you probably already asked yourself, “What would it be sorting on?” How would the sort method even know what made one Song greater or less than another Song? Obviously if you want the song’s title to be the value that determines how the songs are sorted, you’ll need some way to tell the sort method that it needs to use the title and not, say, the beats per minute.

We’ll get into all that a few pages from now, but first, let’s find out why the compiler won’t even let us pass a Song ArrayList to the sort() method.

We’ll just say it right here—virtually all of the code you write that deals with generics will be collection-related code. Although generics can be used in other ways, the main point of generics is to let you write type-safe collections. In other words, code that makes the compiler stop you from putting a Dog into a list of Ducks.

Before generics (which means before Java 5.0), the compiler could not care less what you put into a collection, because all collection implementations were declared to hold type Object. You could put anything in any ArrayList; it was like all ArrayLists were declared as ArrayList<Object>.

WITHOUT generics

Objects go IN as a reference to SoccerBall, Fish, Guitar, and Car objects

And come OUT as a reference of type Object

WITH generics

Objects go IN as a reference to only Fish objects

And come out as a reference of type Fish

Of the dozens of things you could learn about generics, there are really only three that matter to most programmers:

Q:

Q: But don’t I also need to learn how to create my OWN generic classes? What if I want to make a class type that lets people instantiating the class decide the type of things that class will use?

A:

A: You probably won’t do much of that. Think about it—the API designers made an entire library of collections classes covering most of the data structures you’d need, and virtually the only type of classes that really need to be generic are collection classes. In other words, classes designed to hold other elements, and you want programmers using it to specify what type those elements are when they declare and instantiate the collection class.

Yes, it is possible that you might want to create generic classes, but that’s the exception, so we won’t cover it here. (But you’ll figure it out from the things we do cover, anyway.)

Since ArrayList is our most-used generified type, we’ll start by looking at its documentation. The two key areas to look at in a generified class are:

1) The class declaration

2) The method declarations that let you add elements

Understanding ArrayList documentation (Or, what’s the true meaning of “E”?)

image with no caption

The “E” represents the type used to create an instance of ArrayList. When you see an “E” in the ArrayList documentation, you can do a mental find/replace to exchange it for whatever <type> you use to instantiate ArrayList.

So, new ArrayList<Song> means that “E” becomes “Song”, in any method or variable declaration that uses “E”.

image with no caption

In other words, the “E” is replaced by the real type (also called the type parameter) that you use when you create the ArrayList. And that’s why the add() method for ArrayList won’t let you add anything except objects of a reference type that’s compatible with the type of “E”. So if you make an ArrayList<String>, the add() method suddenly becomes add(String o). If you make the ArrayList of type Dog, suddenly the add() method becomes add(Dog o).

Q:

Q: Is “E” the only thing you can put there? Because the docs for sort used “T”....

A:

A: You can use anything that’s a legal Java identifier. That means anything that you could use for a method or variable name will work as a type parameter. But the convention is to use a single letter (so that’s what you should use), and a further convention is to use “T” unless you’re specifically writing a collection class, where you’d use “E” to represent the “type of the Element the collection will hold”.

A generic class means that the class declaration includes a type parameter. A generic method means that the method declaration uses a type parameter in its signature.

You can use type parameters in a method in several different ways:

image with no caption

This:

public <T extends Animal> void takeThing(ArrayList<T> list)

Is NOT the same as this:

public void takeThing(ArrayList<Animal> list)

Both are legal, but they’re different!

The first one, where <T extends Animal> is part of the method declaration, means that any ArrayList declared of a type that is Animal, or one of Animal’s subtypes (like Dog or Cat), is legal. So you could invoke the top method using an ArrayList<Dog>, ArrayList<Cat>, or ArrayList<Animal>.

But... the one on the bottom, where the method argument is (ArrayList<Animal> list) means that only an ArrayList<Animal> is legal. In other words, while the first version takes an ArrayList of any type that is a type of Animal (Animal, Dog, Cat, etc.), the second version takes only an ArrayList of type Animal. Not ArrayList<Dog>, or ArrayList<Cat> but only ArrayList<Animal>.

And yes, it does appear to violate the point of polymorphism. but it will become clear when we revisit this in detail at the end of the chapter. For now, remember that we’re only looking at this because we’re still trying to figure out how to sort() that SongList, and that led us into looking at the API for the sort() method, which had this strange generic type declaration.

For now, all you need to know is that the syntax of the top version is legal, and that it means you can pass in a ArrayList object instantiated as Animal or any Animal subtype.

And now back to our sort() method...

Remember where we were...

image with no caption
image with no caption
image with no caption

So here we are, trying to read the sort() method docs to find out why it was OK to sort a list of Strings, but not a list of Song objects. And it looks like the answer is...

image with no caption

The sort() method can take only lists of Comparable objects.

Song is NOT a subtype of Comparable, so you cannot sort() the list of Songs.

At least not yet...

image with no caption
image with no caption
public final class String extends Object implements Serializable,
                                              Comparable<String>, CharSequence

The Java engineers had to give you a way to put a constraint on a parameterized type, so that you can restrict it to, say, only subclasses of Animal. But you also need to constrain a type to allow only classes that implement a particular interface. So here’s a situation where we need one kind of syntax to work for both situations—inheritance and implementation. In other words, that works for both extends and implements.

And the winning word was... extends. But it really means “is-a”, and works regardless of whether the type on the right is an interface or a class.

image with no caption

Q:

Q: Why didn’t they just make a new keyword,“is”?

A:

A: Adding a new keyword to the language is a REALLY big deal because it risks breaking Java code you wrote in an earlier version. Think about it—you might be using a variable “is” (which we do use in this book to represent input streams). And since you’re not allowed to use keywords as identifiers in your code, that means any earlier code that used the keyword before it was a reserved word, would break. So whenever there’s a chance for the Sun engineers to reuse an existing keyword, as they did here with “extends”, they’ll usually choose that. But sometimes they don’t have a choice...

A few (very few) new keywords have been added to the language, such as assert in Java 1.4 and enum in Java 5.0 (we look at enum in the appendix). And this does break people’s code, however you sometimes have the option of compiling and running a newer version of Java so that it behaves as though it were an older one. You do this by passing a special flag to the compiler or JVM at the command-line, that says, “Yeah, yeah, I KNOW this is Java 1.4, but please pretend it’s really 1.3, because I’m using a variable in my code named assert that I wrote back when you guys said it would OK!#$%”.

(To see if you have a flag available, type javac (for the compiler) or java (for the JVM) at the command-line, without anything else after it, and you should see a list of available options. You’ll learn more about these flags in the chapter on deployment.)

We can pass the ArrayList<Song> to the sort() method only if the Song class implements Comparable, since that’s the way the sort() method was declared. A quick check of the API docs shows the Comparable interface is really simple, with only one method to implement:

The big question is: what makes one song less than, equal to, or greater than another song?

You can’t implement the Comparable interface until you make that decision.

image with no caption

And the method documentation for compareTo() says

image with no caption

It looks like the compareTo() method will be called on one Song object, passing that Song a reference to a different Song. The Song running the compareTo() method has to figure out if the Song it was passed should be sorted higher, lower, or the same in the list.

Your big job now is to decide what makes one song greater than another, and then implement the compareTo() method to reflect that. A negative number (any negative number) means the Song you were passed is greater than the Song running the method. Returning a positive number says that the Song running the method is greater than the Song passed to the compareTo() method. Returning zero means the Songs are equal (at least for the purpose of sorting... it doesn’t necessarily mean they’re the same object). You might, for example, have two Songs with the same title.

(Which brings up a whole different can of worms we’ll look at later...)

We decided we want to sort by title, so we implement the compareTo() method to compare the title of the Song passed to the method against the title of the song on which the compareTo() method was invoked. In other words, the song running the method has to decide how its title compares to the title of the method parameter.

Hmmm... we know that the String class must know about alphabetical order, because the sort() method worked on a list of Strings. We know String has a compareTo() method, so why not just call it? That way, we can simply let one title String compare itself to another, and we don’t have to write the comparing/alphabetizing algorithm!

image with no caption

There’s a new problem—Lou wants two different views of the song list, one by song title and one by artist!

But when you make a collection element comparable (by having it implement Comparable), you get only one chance to implement the compareTo() method. So what can you do?

The horrible way would be to use a flag variable in the Song class, and then do an if test in compareTo() and give a different result depending on whether the flag is set to use title or artist for the comparison.

But that’s an awful and brittle solution, and there’s something much better. Something built into the API for just this purpose—when you want to sort the same thing in more than one way.

image with no caption

Look at the Collections class API again. There’s a second sort() method—and it takes a Comparator.

image with no caption

An element in a list can compare itself to another of its own type in only one way, using its compareTo() method. But a Comparator is external to the element type you’re comparing—it’s a separate class. So you can make as many of these as you like! Want to compare songs by artist? Make an ArtistComparator. Sort by beats per minute? Make a BPMComparator.

image with no caption

Then all you need to do is call the overloaded sort() method that takes the List and the Comparator that will help the sort() method put things in order.

The sort() method that takes a Comparator will use the Comparator instead of the element’s own compareTo() method, when it puts the elements in order. In other words, if your sort() method gets a Comparator, it won’t even call the compareTo() method of the elements in the list. The sort() method will instead invoke the compare() method on the Comparator.

So, the rules are:

Q:

Q: So does this mean that if you have a class that doesn’t implement Comparable, and you don’t have the source code, you could still put the things in order by creating a Comparator?

A:

A: That’s right. The other option (if it’s possible) would be to subclass the element and make the subclass implement Comparable.

Q:

Q: But why doesn’t every class implement Comparable?

A:

A: Do you really believe that everything can be ordered? If you have element types that just don’t lend themselves to any kind of natural ordering, then you’d be misleading other programmers if you implement Comparable. And you aren’t taking a huge risk by not implementing Comparable, since a programmer can compare anything in any way that he chooses using his own custom Comparator.

We did three new things in this code:

1) Created an inner class that implements Comparator (and thus the compare() method that does the work previously done by compareTo()).

2) Made an instance of the Comparator inner class.

3) Called the overloaded sort() method, giving it both the song list and the instance of the Comparator inner class.

Note: we also updated the Song class toString() method to print both the song title and the artist. (It prints title: artist regardless of how the list is sorted.)

image with no caption

The sorting works great, now we know how to sort on both title (using the Song object’s compareTo() method) and artist (using the Comparator’s compare() method). But there’s a new problem we didn’t notice with a test sample of the jukebox text file—the sorted list contains duplicates.

It appears that the diner jukebox just keeps writing to the file regardless of whether the same song has already been played (and thus written) to the text file. The SongListMore.txt jukebox text file is a complete record of every song that was played, and might contain the same song multiple times.

image with no caption

From the Collection API, we find three main interfaces, List, Set, and Map. ArrayList is a List, but it looks like Set is exactly what we need.

Notice that the Map interface doesn’t actually extend the Collection interface, but Map is still considered part of the “Collection Framework” (also known as the “Collection API”). So Maps are still collections, even though they don’t include java.util.Collection in their inheritance tree.

(Note: this is not the complete collection API; there are other classes and interfaces, but these are the ones we care most about.)

image with no caption
image with no caption

We added on to the Jukebox to put the songs in a HashSet. (Note: we left out some of the Jukebox code, but you can copy it from earlier versions. And to make it easier to read the output, we went back to the earlier version of the Song’s toString() method, so that it prints only the title instead of title and artist.)

image with no caption
image with no caption

First, we have to ask—what makes two Song references duplicates? They must be considered equal. Is it simply two references to the very same object, or is it two separate objects that both have the same title?

This brings up a key issue: reference equality vs. object equality.

When you put an object into a Hashset, it uses the object’s hashcode value to determine where to put the object in the Set. But it also compares the object’s hashcode to the hashcode of all the other objects in the HashSet, and if there’s no matching hashcode, the HashSet assumes that this new object is not a duplicate.

In other words, if the hashcodes are different, the HashSet assumes there’s no way the objects can be equal!

So you must override hashCode() to make sure the objects have the same value.

But two objects with the same hashCode() might not be equal (more on this on the next page), so if the HashSet finds a matching hashcode for two objects—one you’re inserting and one already in the set—the HashSet will then call one of the object’s equals() methods to see if these hashcode-matched objects really are equal.

And if they’re equal, the HashSet knows that the object you’re attempting to add is a duplicate of something in the Set, so the add doesn’t happen.

You don’t get an exception, but the HashSet’s add() method returns a boolean to tell you (if you care) whether the new object was added. So if the add() method returns false, you know the new object was a duplicate of something already in the set.

image with no caption
image with no caption

TreeSet is similar to HashSet in that it prevents duplicates. But it also keeps the list sorted. It works just like the sort() method in that if you make a TreeSet using the set’s no-arg constructor, the TreeSet uses each object’s compareTo() method for the sort. But you have the option of passing a Comparator to the TreeSet constructor, to have the TreeSet use that instead. The downside to TreeSet is that if you don’t need sorting, you’re still paying for it with a small performance hit. But you’ll probably find that the hit is almost impossible to notice for most apps.

image with no caption

TreeSet looks easy, but make sure you really understand what you need to do to use it. We thought it was so important that we made it an exercise so you’d have to think about it. Do NOT turn the page until you’ve done this. We mean it.

TreeSet can’t read the programmer’s mind to figure out how the objects should be sorted. You have to tell the TreeSet how.

To use a TreeSet, one of these things must be true:

Lists and Sets are great, but sometimes a Map is the best collection (not Collection with a capital “C”—remember that Maps are part of Java collections but they don’t implement the Collection interface).

Imagine you want a collection that acts like a property list, where you give it a name and it gives you back the value associated with that name. Although keys will often be Strings, they can be any Java object (or, through autoboxing, a primitive).

Map example

image with no caption
image with no caption

Remember earlier in the chapter we talked about how methods that take arguments with generic types can be... weird. And we mean weird in the polymorphic sense. If things start to feel strange here, just keep going—it takes a few pages to really tell the whole story.

We’ll start with a reminder on how array arguments work, polymorphically, and then look at doing the same thing with generic lists. The code below compiles and runs without errors:

Here’s how it works with regular arrays:

image with no caption
abstract class Animal {
   void eat() {
     System.out.println("animal eating");
   }
}
class Dog extends Animal {
   void bark() { }
}
class Cat extends Animal {
   void meow() { }
}

So we saw how the whole thing worked with arrays, but will it work the same way when we switch from an array to an ArrayList? Sounds reasonable, doesn’t it?

First, let’s try it with only the Animal ArrayList. We made just a few changes to the go() method:

Passing in just ArrayList<Animal>

image with no caption

Compiles and runs just fine

image with no caption

Because of polymorphism, the compiler let us pass a Dog array to a method with an Animal array argument. No problem. And an ArrayList<Animal> can be passed to a method with an ArrayList<Animal> argument. So the big question is, will the ArrayList<Animal> argument accept an ArrayList<Dog>? If it works with arrays, shouldn’t it work here too?

Passing in just ArrayList<Dog>

image with no caption

When we compile it:

image with no caption

Imagine the compiler let you get away with that. It let you pass an ArrayList<Dog> to a method declared as:

public void takeAnimals(ArrayList<Animal> animals) {
   for(Animal a: animals) {
      a.eat();
   }
}

There’s nothing in that method that looks harmful, right? After all, the whole point of polymorphism is that anything an Animal can do (in this case, the eat() method), a Dog can do as well. So what’s the problem with having the method call eat() on each of the Dog references?

Nothing. Nothing at all.

There’s nothing wrong with that code. But imagine this code instead:

image with no caption

So that’s the problem. There’s certainly nothing wrong with adding a Cat to an ArrayList<Animal>, and that’s the whole point of having an ArrayList of a supertype like Animal—so that you can put all types of animals in a single Animal ArrayList.

But if you passed a Dog ArrayList—one meant to hold ONLY Dogs—to this method that takes an Animal ArrayList, then suddenly you’d end up with a Cat in the Dog list. The compiler knows that if it lets you pass a Dog ArrayList into the method like that, someone could, at runtime, add a Cat to your Dog list. So instead, the compiler just won’t let you take the risk.

If you declare a method to take ArrayList<Animal> it can take ONLY an ArrayList<Animal>, not ArrayList<Dog> or ArrayList<Cat>.

image with no caption

Array types are checked again at runtime, but collection type checks happen only when you compile

Let’s say you do add a Cat to an array declared as Dog[] (an array that was passed into a method argument declared as Animal[], which is a perfectly legal assignment for arrays).

image with no caption

It compiles, but when we run it:

image with no caption
image with no caption

It looks unusual, but there is a way to create a method argument that can accept an ArrayList of any Animal subtype. The simplest way is to use a wildcard—added to the Java language explicitly for this reason.

image with no caption

So now you’re wondering, “What’s the difference? Don’t you have the same problem as before? The method above isn’t doing anything dangerous—calling a method any Animal subtype is guaranteed to have—but can’t someone still change this to add a Cat to the animals list, even though it’s really an ArrayList<Dog>? And since it’s not checked again at runtime, how is this any different from declaring it without the wildcard?”

And you’d be right for wondering. The answer is NO. When you use the wildcard <?> in your declaration, the compiler won’t let you do anything that adds to the list!

You probably remember that when we looked at the sort() method, it used a generic type, but with an unusual format where the type parameter was declared before the return type. It’s just a different way of declaring the type parameter, but the results are the same:

This:

public <T extends Animal> void takeThing(ArrayList<T> list)

Does the same thing as this:

public void takeThing(ArrayList<? extends Animal> list)