Chapter 6
IN THIS CHAPTER
Understanding basic stream operations
Examining the stream interface
Filtering and sorting streams
Computing sums, averages, and other values
One of the most common things to do with a collection is to iterate over it, performing some type of operation on all of its elements. For example, you might use a for each loop to print all of the elements. The body of the for each loop might contain an if statement to select which elements to print. Or it might perform a calculation such as accumulating a grand total or counting the number of elements that meet a given condition.
In Java, for each loops are easy to create and can be very powerful. However, they have one significant drawback: They iterate over the collection's elements one at a time, beginning with the first element and proceeding sequentially to the last element. As a result, a for each loop must be executed sequentially within a single thread.
That’s a shame, since modern computers have multicore processors that are capable of doing several things at once. Wouldn’t it be great if you could divide a for each loop into several parts, each of which can be run independently on one of the processor cores? For a small collection, the user probably wouldn’t notice the difference. But if the collection is extremely large (say, a few million elements), unleashing the power of a multicore processor could make the program run much faster.
While you can do that with earlier versions of Java, the programming is tricky. You have to master one of the more difficult aspects of programming in Java: working with threads, which are like separate sections of your program that can be executed simultaneously. You’ll learn the basics of working with threads in Book 5, Chapter 1. For now, take my word for it: Writing programs that work with large collections of data and take advantage of multiple threads was a difficult undertaking. At least until Java 8.
With Java 8 or later, you can use a feature called bulk data operations that’s designed specifically to attack this very problem. When you use bulk data operations, you do not directly iterate over the collection data using a for each loop. Instead, you simply provide the operations that will be done on each of the collection's elements and let Java take care of the messy details required to spread the work over multiple threads.
At the heart of the bulk data operations feature is a new type of object called a stream, defined by the Stream interface. A stream is simply a sequence of elements of any data type that can be processed sequentially or in parallel. The Stream interface provides methods that let you perform various operations such as filtering the elements or performing an operation on each of the elements.
Streams rely on the use of lambda expressions to pass the operations that are performed on stream elements. In fact, the primary reason Java’s developers introduced lambda expressions into the Java language was to facilitate streams. If you haven’t yet read Book 3, Chapter 7, I suggest you do so now, before reading further into this chapter. Otherwise you’ll find yourself hopelessly confused by the peculiar syntax of the lambda expressions used throughout this chapter.
In this chapter, you learn the basics of using streams to perform simple bulk data operations.
Suppose you have a list of spells used by a certain wizard who, for copyright purposes, we’ll refer to simply as HP. The spells are represented by a class named Spell, which is defined as follows:
public class Spell
{
    public String name;
    public SpellType type;
    public String description;

    public enum SpellType {SPELL, CHARM, CURSE}

    public Spell(String spellName, SpellType spellType,
        String spellDescription)
    {
        name = spellName;
        type = spellType;
        description = spellDescription;
    }

    @Override
    public String toString()
    {
        return name;
    }
}
As you can see, the Spell class has three public fields that represent the spell’s name, type (SPELL, CHARM, or CURSE), and description, as well as a constructor that lets you specify the name, type, and description for the spell. Also, the toString() method is overridden to return simply the spell name.
Let’s load a few of HP’s spells into an ArrayList:
ArrayList<Spell> spells = new ArrayList<>();
spells.add(new Spell("Aparecium", Spell.SpellType.SPELL,
    "Makes invisible ink appear."));
spells.add(new Spell("Avis", Spell.SpellType.SPELL,
    "Launches birds from your wand."));
spells.add(new Spell("Engorgio", Spell.SpellType.CHARM,
    "Enlarges something."));
spells.add(new Spell("Fidelius", Spell.SpellType.CHARM,
    "Hides a secret within someone."));
spells.add(new Spell("Finite Incatatum", Spell.SpellType.SPELL,
    "Stops all current spells."));
spells.add(new Spell("Locomotor Mortis", Spell.SpellType.CURSE,
    "Locks an opponent's legs."));
Now, suppose you want to list the name of each spell on the console. You could do that using a for each loop like this:
for (Spell spell : spells)
    System.out.println(spell.name);
Written with streams, the code would look like this:
spells.stream().forEach(s -> System.out.println(s));
Here, I first use the stream method of the ArrayList class to convert the ArrayList to a stream. All of the classes that implement the java.util.Collection interface have a stream method that returns a Stream object. That includes not only ArrayList, but also LinkedList and Stack.
Next, I use the stream's forEach method to iterate over the stream, passing a lambda expression that calls System.out.println for each item in the stream. The forEach method processes the entire stream, writing each element to the console.
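To see that the same call works on other collections, here is a minimal sketch that streams a LinkedList and a Stack. The class name CollectionStreams and the helper method namesOf are made up for this illustration, and plain strings stand in for Spell objects so that the listing is self-contained.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.Stack;

public class CollectionStreams
{
    // Hypothetical helper: visits every element of any Collection through
    // its stream and gathers the string form of each element.
    static List<String> namesOf(Collection<?> items)
    {
        List<String> names = new ArrayList<>();
        items.stream().forEach(item -> names.add(item.toString()));
        return names;
    }

    public static void main(String[] args)
    {
        LinkedList<String> linked = new LinkedList<>();
        linked.add("Aparecium");
        linked.add("Avis");

        Stack<String> stack = new Stack<>();
        stack.push("Engorgio");
        stack.push("Fidelius");

        // The same stream().forEach(...) call works on both collections.
        linked.stream().forEach(s -> System.out.println(s));
        stack.stream().forEach(s -> System.out.println(s));

        System.out.println(namesOf(stack));
    }
}
```

Because the stream method is declared by the Collection interface itself, the helper can accept any collection type without knowing which one it was given.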
Suppose you want to list just the spells, not the charms or curses. Using a traditional for each loop, you'd do it like this:
for (Spell spell : spells)
{
    if (spell.type == Spell.SpellType.SPELL)
        System.out.println(spell.name);
}
Here an if statement selects just the spells so that the charms and curses aren’t listed.
Here’s the same thing using streams:
spells.stream()
    .filter(s -> s.type == Spell.SpellType.SPELL)
    .forEach(s -> System.out.println(s));
In this example, the stream method converts the ArrayList to a stream. Then the stream’s filter method is used to select just the SPELL items. Finally, the forEach method sends the selected items to the console. Notice that lambda expressions are used in both the forEach method and the filter method.
The filter method of the Stream interface returns a Stream object. Thus, it is possible to apply a second filter to the result of the first filter, like this:
spells.stream()
    .filter(s -> s.type == Spell.SpellType.SPELL)
    .filter(s -> s.name.toLowerCase().startsWith("a"))
    .forEach(s -> System.out.println(s));
In this example, just the spells that start with the letter A are listed.
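The following sketch checks that two filters applied in a row select the same elements as a single filter whose predicate joins both conditions with &&. The class FilterChain, its nested Spell stand-in (trimmed to the two fields the filters need), and the helper methods are all assumptions made for this illustration, not part of the chapter's code.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterChain
{
    enum SpellType { SPELL, CHARM, CURSE }

    // Trimmed-down stand-in for the chapter's Spell class.
    static class Spell
    {
        final String name;
        final SpellType type;

        Spell(String name, SpellType type)
        {
            this.name = name;
            this.type = type;
        }
    }

    static List<Spell> sampleSpells()
    {
        List<Spell> spells = new ArrayList<>();
        spells.add(new Spell("Aparecium", SpellType.SPELL));
        spells.add(new Spell("Avis", SpellType.SPELL));
        spells.add(new Spell("Engorgio", SpellType.CHARM));
        spells.add(new Spell("Locomotor Mortis", SpellType.CURSE));
        return spells;
    }

    // Two filters in a row: the second sees only what the first passed on.
    static long countChained(List<Spell> spells)
    {
        return spells.stream()
            .filter(s -> s.type == SpellType.SPELL)
            .filter(s -> s.name.toLowerCase().startsWith("a"))
            .count();
    }

    // One filter whose predicate combines both conditions.
    static long countCombined(List<Spell> spells)
    {
        return spells.stream()
            .filter(s -> s.type == SpellType.SPELL
                && s.name.toLowerCase().startsWith("a"))
            .count();
    }

    public static void main(String[] args)
    {
        List<Spell> spells = sampleSpells();
        System.out.println(countChained(spells));
        System.out.println(countCombined(spells));
    }
}
```

Both methods count the same two spells (Aparecium and Avis). Which form to use is mostly a matter of readability: separate filters keep each condition on its own line.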
The Stream interface defines about 40 methods. In addition, three related interfaces (DoubleStream, IntStream, and LongStream) define operations that are specific to a single primitive data type: double, int, and long, respectively. Table 6-1 lists the most commonly used methods of these interfaces.
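TABLE 6-1 The Stream and Related Interfaces
Methods that Return Streams | Explanation
distinct() | Returns a stream consisting of distinct elements of the input stream. In other words, duplicates are removed.
limit(long maxSize) | Returns a stream having no more than maxSize elements of the input stream.
filter(predicate) | Returns a stream consisting of those elements in the input stream that match the conditions of the predicate.
sorted() | Returns the stream elements in sorted order using the natural sorting method for the stream's data type.
sorted(comparator) | Returns the stream elements in sorted order using the specified comparator.
Mapping Methods | Explanation
map(mapper) | Returns a stream created by applying the mapper function to each element of the input stream.
mapToDouble(mapper) | Returns a DoubleStream created by applying the mapper function to each element of the input stream.
mapToInt(mapper) | Returns an IntStream created by applying the mapper function to each element of the input stream.
mapToLong(mapper) | Returns a LongStream created by applying the mapper function to each element of the input stream.
Terminal and Aggregate Methods | Explanation
forEach(action) | Executes the action against each element of the input stream.
forEachOrdered(action) | Executes the action against each element of the input stream, ensuring that the elements of the input stream are processed in order.
count() | Returns the number of elements in the stream.
max(comparator), or max() for DoubleStream, IntStream, and LongStream | Returns the largest element in the stream.
min(comparator), or min() for DoubleStream, IntStream, and LongStream | Returns the smallest element in the stream.
average() | Returns the average value of the elements in the stream. Valid only for DoubleStream, IntStream, and LongStream.
sum() | Returns the sum of the elements in the stream. Result type is double for DoubleStream, int for IntStream, and long for LongStream.
summaryStatistics() | Returns a summary statistics object that includes property methods named getAverage, getCount, getMax, getMin, and getSum.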
The first group of methods in Table 6-1 consists of methods that return other Stream objects. Each of these methods manipulates the stream in some way, then passes the altered stream down the pipeline to be processed by another operation.
The filter method is one of the most commonly used stream methods. Its argument, called a predicate, is a function that returns a boolean value. The function is called once for every element in the stream and is passed a single argument that contains the element in question. If the function returns true, the element is passed on to the result stream. If it returns false, the element is not passed on.
The easiest way to implement a filter predicate is to use a lambda expression that specifies a conditional expression. For example, the following lambda expression inspects the name field of the stream element and returns true if it begins with the letter a (upper- or lowercase):
s -> s.name.toLowerCase().startsWith("a")
The other methods in the first group let you limit the number of elements in a stream or sort the elements of the stream. To sort a stream, you can either use the element’s natural sorting order or supply your own comparator, either as a function or as an object that implements the Comparator interface.
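Here is a short sketch of distinct, sorted, and limit working together on a list of spell names. The class SortAndLimit and its two helper methods are invented for this example. To capture the results in a list, the sketch also uses collect(Collectors.toList()), a terminal operation from the standard library that isn't covered in this section.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SortAndLimit
{
    // Removes duplicates, then sorts using the natural (alphabetical) order.
    static List<String> distinctSorted(List<String> names)
    {
        return names.stream()
            .distinct()
            .sorted()
            .collect(Collectors.toList());
    }

    // Sorts with a comparator supplied as a lambda (shortest names first),
    // then keeps no more than max elements.
    static List<String> shortest(List<String> names, int max)
    {
        return names.stream()
            .sorted((a, b) -> Integer.compare(a.length(), b.length()))
            .limit(max)
            .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<String> names = Arrays.asList(
            "Fidelius", "Avis", "Engorgio", "Avis", "Aparecium");

        System.out.println(distinctSorted(names));
        // [Aparecium, Avis, Engorgio, Fidelius]

        System.out.println(shortest(names, 2));
        // [Avis, Avis]
    }
}
```

Note that the second pipeline doesn't call distinct, so both copies of Avis survive the sort and the limit.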
The second group of methods in Table 6-1 are called mapping methods because they convert a stream whose elements are of one type to a stream whose elements are of another type. The mapping function, which you must pass as a parameter, is responsible for converting the data from the first type to the second.
One common use for mapping methods is to convert a stream of complex types to a stream of simple numeric values of type double, int, or long, which you can then use to perform an aggregate calculation such as sum or average. For example, suppose HP's spells were for sale and the Spell class included a public field named price. To calculate the average price of all the spells, you would first have to convert the stream of Spell objects to a stream of doubles. To do that, you use the mapToDouble method. The mapping function would simply return the price field:
.mapToDouble(s -> s.price)
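To put that fragment in context, here is a sketch of two complete mapping pipelines. It assumes a stripped-down Spell class that has only a name and the hypothetical price field. The class name MappingDemo, the helper methods, and the prices themselves are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class MappingDemo
{
    // Stand-in for the chapter's Spell class with the hypothetical price field.
    static class Spell
    {
        final String name;
        final double price;

        Spell(String name, double price)
        {
            this.name = name;
            this.price = price;
        }
    }

    static List<Spell> sampleSpells()
    {
        List<Spell> spells = new ArrayList<>();
        spells.add(new Spell("Aparecium", 2.5));   // made-up prices
        spells.add(new Spell("Avis", 4.0));
        spells.add(new Spell("Engorgio", 1.5));
        return spells;
    }

    // mapToDouble turns the Stream of Spell objects into a DoubleStream,
    // which is what makes the sum method available.
    static double totalPrice(List<Spell> spells)
    {
        return spells.stream()
            .mapToDouble(s -> s.price)
            .sum();
    }

    // map turns the Stream of Spell objects into a Stream of String names.
    static long countLongNames(List<Spell> spells, int minLength)
    {
        return spells.stream()
            .map(s -> s.name)
            .filter(n -> n.length() >= minLength)
            .count();
    }

    public static void main(String[] args)
    {
        List<Spell> spells = sampleSpells();
        System.out.println("Total price = " + totalPrice(spells));
        System.out.println("Long names = " + countLongNames(spells, 8));
    }
}
```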
Methods in the last group in Table 6-1 are called terminal methods because they do not return another stream. As a result, they are always the last methods called in stream pipelines. Note that if you don’t call a terminal method, no data from the stream will be processed — the terminal method is what gets the ball rolling.
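You can watch this behavior with a small experiment. The sketch below counts how many times a filter predicate runs before and after a terminal method is called. The class LazyPipeline and its trace method are made up for the demonstration, and an AtomicInteger serves as the counter because a lambda expression can't modify an ordinary local variable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyPipeline
{
    // Returns { predicate calls before count(), calls after count(), matches }.
    static long[] trace(List<String> names)
    {
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline only records the filter. Nothing runs yet.
        Stream<String> pipeline = names.stream()
            .filter(n -> {
                calls.incrementAndGet();
                return n.startsWith("A");
            });
        int before = calls.get();

        // The terminal method pulls every element through the filter.
        long matches = pipeline.count();
        int after = calls.get();

        return new long[] { before, after, matches };
    }

    public static void main(String[] args)
    {
        long[] result = trace(
            Arrays.asList("Aparecium", "Avis", "Engorgio"));
        System.out.println("Calls before terminal method: " + result[0]);
        System.out.println("Calls after terminal method: " + result[1]);
        System.out.println("Matches: " + result[2]);
    }
}
```

With the three names shown, the predicate has run zero times before count is called and three times afterward.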
You have already seen the forEach method in action; it provides a function that is called once for each element in the stream. Note that in the examples so far, the function to be executed on each element has consisted of just a single method call, so I’ve included it directly in the lambda expression. If the function is more complicated, you can isolate it in its own method. Then the lambda expression should call the method that defines the function.
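As a sketch of that approach, the following program moves the per-element work into a separate method and calls it from the lambda expression. The class SpellPrinter, the describe method, and the formatting it performs are invented purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class SpellPrinter
{
    // Hypothetical helper that holds the multi-step work done for
    // each element, keeping the lambda expression short.
    static String describe(String name)
    {
        String label = name.length() > 6 ? "long" : "short";
        return name.toUpperCase() + " (" + label + " name)";
    }

    public static void main(String[] args)
    {
        List<String> names = Arrays.asList("Avis", "Engorgio");

        // The lambda expression does nothing but call the helper.
        names.stream()
            .forEach(n -> System.out.println(describe(n)));
    }
}
```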
Aggregate methods perform a calculation on all of the elements in the stream, then return the result. Of the aggregate methods, count is straightforward: It simply returns the number of elements in the stream. Some of the other aggregate methods, including min, max, and average, need a little explanation because they return an optional data type. An optional data type is an object that might contain a value, or it might not.
For example, the average method calculates the average value of a stream of integers, longs, or doubles and returns the result as an OptionalDouble. If the stream is empty, the average is undefined, so the OptionalDouble contains no value. You can determine if the OptionalDouble contains a value by calling its isPresent method, which returns true if there is a value present. If there is a value, you can get it by calling the getAsDouble method.
Here’s an example that calculates the average price of spells:
OptionalDouble avg = spells.stream()
    .mapToDouble(s -> s.price)
    .average();
Here is how you would write the average price to the console:
if (avg.isPresent())
{
    System.out.println("Average = "
        + avg.getAsDouble());
}
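The next sketch shows why the isPresent check matters by averaging both a populated list and an empty one. To stay self-contained, it works on a plain list of Double prices rather than on Spell objects. The class AveragePrices, its helper methods, and the prices are assumptions made for the example. It also calls summaryStatistics, one of the methods listed in Table 6-1, to fetch several aggregates in a single pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.OptionalDouble;

public class AveragePrices
{
    // Averages the prices. The result is empty if the list is empty.
    static OptionalDouble averageOf(List<Double> prices)
    {
        return prices.stream()
            .mapToDouble(p -> p)
            .average();
    }

    // Gathers count, sum, min, max, and average in one pass.
    static DoubleSummaryStatistics statsOf(List<Double> prices)
    {
        return prices.stream()
            .mapToDouble(p -> p)
            .summaryStatistics();
    }

    public static void main(String[] args)
    {
        List<Double> prices = Arrays.asList(2.0, 4.0, 6.0);
        List<Double> nothing = new ArrayList<>();

        OptionalDouble some = averageOf(prices);
        if (some.isPresent())
            System.out.println("Average = " + some.getAsDouble());

        OptionalDouble none = averageOf(nothing);
        if (!none.isPresent())
            System.out.println("No average for an empty list");

        DoubleSummaryStatistics stats = statsOf(prices);
        System.out.println("Count = " + stats.getCount());
        System.out.println("Max = " + stats.getMax());
    }
}
```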
Streams come in two basic flavors: sequential and parallel. Sequential streams are produced by the stream method and are processed one element after the next. Parallel streams, in contrast, can take full advantage of multicore processors by breaking their elements into two or more smaller streams, performing operations on them, and then recombining the separate streams to create the final result stream. Each of the intermediate streams can be processed by a separate thread, which can improve performance for large streams.
By default, streams are sequential. But creating a parallel stream is easy: Just use the parallelStream method instead of the stream method at the beginning of the pipeline.
For example, to print all of HP’s spells using a parallel stream, use this code:
spells.parallelStream()
    .forEach(s -> System.out.println(s));
Note that when you use a parallel stream, you can’t predict the order in which each element of the stream is processed. That’s because when the stream is split and run on two or more threads, the order in which the processor executes the threads is not predictable.
To demonstrate this point, consider this simple example:
System.out.println("First parallel stream: ");
spells.parallelStream()
    .forEach(s -> System.out.println(s));
System.out.println("\nSecond parallel stream: ");
spells.parallelStream()
    .forEach(s -> System.out.println(s));
When you execute this code, the results will look something like this:
First parallel stream:
Fidelius
Finite Incatatum
Engorgio
Locomotor Mortis
Aparecium
Avis
Second parallel stream:
Fidelius
Engorgio
Finite Incatatum
Locomotor Mortis
Avis
Aparecium
Notice that although the same spells are printed for each of the streams, they are printed in a different order.
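If the order matters, the forEachOrdered method listed in Table 6-1 processes the elements in the stream's original order even when the stream is parallel. Here is a minimal sketch. The class OrderedParallel and the helper visitInOrder are invented for the example, and plain strings stand in for Spell objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrderedParallel
{
    // Visits the names on a parallel stream, but in the list's own order.
    static List<String> visitInOrder(List<String> names)
    {
        // A synchronized list is used because the actions may run on
        // different threads.
        List<String> visited =
            Collections.synchronizedList(new ArrayList<>());
        names.parallelStream()
            .forEachOrdered(n -> visited.add(n));
        return visited;
    }

    public static void main(String[] args)
    {
        List<String> names = Arrays.asList(
            "Aparecium", "Avis", "Engorgio", "Fidelius");

        // Prints the names in list order on every run.
        names.parallelStream()
            .forEachOrdered(s -> System.out.println(s));

        System.out.println(visitInOrder(names).equals(names));
    }
}
```

The trade-off is that forcing the order can give back some of the speed that the parallel stream was meant to provide, so use forEachOrdered only when the sequence really matters.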