The collect operation

Some uses of the collect() operation are simple and suitable for any beginner, while others can be complex and hard to follow even for a seasoned programmer. Together with the operations discussed already, the most popular cases of collect() presented in this section are more than enough for a beginner's needs. Add the numeric stream operations we are going to present in the Numeric stream interfaces section, and the covered material may well be all a mainstream programmer needs for the foreseeable future.

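As we have mentioned already, the collect operation is very flexible and allows us to customize the stream processing. It has two forms (shown here in simplified form, without wildcards and type-parameter declarations):

R collect(Collector<T, A, R> collector): processes the stream elements of type T using the provided Collector and produces a result of type R via an intermediate container of type A
R collect(Supplier<R> supplier, BiConsumer<R, T> accumulator, BiConsumer<R, R> combiner): processes the stream elements of type T using the provided functions: the supplier creates a new result container, the accumulator adds an element to a container, and the combiner merges two partial-result containers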

Let's look at the second form of the collect() operation first. It is very similar to the three-parameter reduce() operation we have just demonstrated. The biggest difference is that the first parameter of collect() is not an identity or initial value but a supplier of the container: an object that is passed between the functions and maintains the state of the processing. For the following example, we are going to use the Person1 class as the container:

class Person1 {
    private String name;
    private int age;
    public Person1() {}
    public String getName() { return this.name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return this.age; }
    public void setAge(int age) { this.age = age; }
    @Override
    public String toString() {
        return "Person{name:" + this.name + ",age:" + this.age + "}";
    }
}

As you can see, the container needs a constructor without parameters, and it needs setters, because it has to receive and keep the partial results: the name and age of the oldest person found so far. The collect() operation uses this container while processing each element; after the last element is processed, the container holds the name and age of the oldest person. Here is the list of people, which should be familiar to you:

List<Person> list = List.of(new Person(23, "Bob"),
                            new Person(33, "Jim"),
                            new Person(28, "Jill"),
                            new Person(27, "Bill"));

And here is the collect() operation that finds the oldest person in the list:

Person1 theOldest = list.stream().collect(Person1::new,
    (p1, p2) -> {
        if (p1.getAge() < p2.getAge()) {
            p1.setAge(p2.getAge());
            p1.setName(p2.getName());
        }
    },
    (p1, p2) -> { System.out.println("Combiner is called!"); });

We tried to inline the functions in the operation call, but the result is a bit difficult to read, so here is a more readable version of the same code:

BiConsumer<Person1, Person> accumulator = (p1, p2) -> {
    if (p1.getAge() < p2.getAge()) {
        p1.setAge(p2.getAge());
        p1.setName(p2.getName());
    }
};
BiConsumer<Person1, Person1> combiner = (p1, p2) -> {
    System.out.println("Combiner is called!");   //prints nothing
};
theOldest = list.stream().collect(Person1::new, accumulator, combiner);
System.out.println(theOldest);        //prints: Person{name:Jim,age:33}

For a sequential stream, the Person1 container object is created only once, before the first element is processed (in this sense, it is similar to the initial value of the reduce() operation). It is then passed to the accumulator, which compares it with the first element. The age field of the container is initialized to the default value of zero, so the age and name of the first element are set in the container as those of the oldest person so far.

When the second element (a Person object) of the stream is emitted, its age is compared with the age currently stored in the container (the Person1 object), and so on, until all elements of the stream are processed. The result is shown in the preceding comments.
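
To make the walk-through above easier to follow, here is a minimal, self-contained sketch that prints the state of the container after each element. The class name CollectTrace, the sample() helper, and the nested Person class are assumptions made for illustration; the Person class shown here is only a stand-in for the one used in the text.

```java
import java.util.List;
import java.util.function.BiConsumer;

public class CollectTrace {
    // Hypothetical stand-in for the Person class used in the text.
    public static class Person {
        private final int age;
        private final String name;
        public Person(int age, String name) { this.age = age; this.name = name; }
        public int getAge() { return age; }
        public String getName() { return name; }
    }

    // Mutable container with the same shape as Person1 in the text.
    public static class Person1 {
        private String name;
        private int age;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
        @Override
        public String toString() {
            return "Person{name:" + name + ",age:" + age + "}";
        }
    }

    // Hypothetical helper: the same four people as in the text.
    public static List<Person> sample() {
        return List.of(new Person(23, "Bob"), new Person(33, "Jim"),
                       new Person(28, "Jill"), new Person(27, "Bill"));
    }

    public static Person1 findOldest(List<Person> people) {
        BiConsumer<Person1, Person> accumulator = (container, p) -> {
            if (container.getAge() < p.getAge()) {
                container.setAge(p.getAge());
                container.setName(p.getName());
            }
            // Show the state the container carries on to the next element.
            System.out.println("after " + p.getName() + ": " + container);
        };
        // Not invoked for a sequential stream, so it is left empty here.
        BiConsumer<Person1, Person1> combiner = (c1, c2) -> { };
        return people.stream().collect(Person1::new, accumulator, combiner);
    }

    public static void main(String[] args) {
        System.out.println(findOldest(sample()));
    }
}
```

Running main() should show the container holding Bob (23) after the first element and Jim (33) after each of the remaining three, followed by the final Person{name:Jim,age:33}.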

The combiner is never called because the stream is not parallel. When we make the stream parallel, we have to implement the combiner as follows:

BiConsumer<Person1, Person1> combiner = (p1, p2) -> {
    System.out.println("Combiner is called!");   //printed 3 times in our run
    if (p1.getAge() < p2.getAge()) {
        p1.setAge(p2.getAge());
        p1.setName(p2.getName());
    }
};
theOldest = list.parallelStream()
                .collect(Person1::new, accumulator, combiner);
System.out.println(theOldest);        //prints: Person{name:Jim,age:33}

The combiner compares the partial results (one for each stream subsequence) and produces the final result. Now the Combiner is called! message is printed three times. But, as in the case of the reduce() operation, the number of partial results (and thus of combiner calls) may vary.
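
The following sketch, under stated assumptions, runs the same search sequentially and in parallel and counts the combiner calls instead of printing a message. The names ParallelCollectSketch, Oldest, takeIfOlder, sample, and combinerCalls are hypothetical, and the compact Oldest container with package-visible fields is used only to keep the example short; it plays the role of Person1.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

public class ParallelCollectSketch {
    // Hypothetical stand-in for the Person class used in the text.
    public static class Person {
        public final int age;
        public final String name;
        public Person(int age, String name) { this.age = age; this.name = name; }
    }

    // Hypothetical compact container playing the role of Person1.
    public static class Oldest {
        public String name;
        public int age;
        void takeIfOlder(String otherName, int otherAge) {
            if (age < otherAge) {
                age = otherAge;
                name = otherName;
            }
        }
    }

    // Counts combiner invocations across all calls of findOldest().
    public static final AtomicInteger combinerCalls = new AtomicInteger();

    public static List<Person> sample() {
        return List.of(new Person(23, "Bob"), new Person(33, "Jim"),
                       new Person(28, "Jill"), new Person(27, "Bill"));
    }

    public static Oldest findOldest(List<Person> people, boolean parallel) {
        BiConsumer<Oldest, Person> accumulator =
                (container, p) -> container.takeIfOlder(p.name, p.age);
        // Merges the partial result c2 into c1; needed only for parallel streams.
        BiConsumer<Oldest, Oldest> combiner = (c1, c2) -> {
            combinerCalls.incrementAndGet();
            c1.takeIfOlder(c2.name, c2.age);
        };
        return (parallel ? people.parallelStream() : people.stream())
                .collect(Oldest::new, accumulator, combiner);
    }

    public static void main(String[] args) {
        Oldest sequential = findOldest(sample(), false);
        System.out.println(sequential.name + " " + sequential.age
                + ", combiner calls so far: " + combinerCalls.get());
        Oldest parallel = findOldest(sample(), true);
        // The number of combiner calls depends on how the stream is split.
        System.out.println(parallel.name + " " + parallel.age
                + ", combiner calls so far: " + combinerCalls.get());
    }
}
```

Both runs should report Jim, 33. The counter stays at zero after the sequential run, while the value after the parallel run depends on how the stream happens to be split.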

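Now let's look at the first form of the collect() operation. It requires an object of a class that implements the java.util.stream.Collector<T,A,R> interface, where T is the type of the stream elements, A is the container type, and R is the result type. One can use the of() methods of the Collector interface to create the necessary Collector object (type-parameter declarations omitted for brevity):

static Collector<T, R, R> of(Supplier<R> supplier, BiConsumer<R, T> accumulator, BinaryOperator<R> combiner, Collector.Characteristics... characteristics)
static Collector<T, A, R> of(Supplier<A> supplier, BiConsumer<A, T> accumulator, BinaryOperator<A> combiner, Function<A, R> finisher, Collector.Characteristics... characteristics)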

The functions one has to pass to the of() methods are similar to those we have demonstrated already. But we are not going to demonstrate them, for two reasons. First, it is somewhat more involved and beyond the scope of this introductory course. Second, before doing that, one should look at the java.util.stream.Collectors class, which provides many ready-to-use collectors. As we have mentioned already, together with the operations discussed in this book and the numeric stream operations we are going to present in the Numeric stream interfaces section, these collectors cover the vast majority of processing needs in mainstream programming, and there is a good chance you will never need to create a custom collector at all.
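
As a small taste of the ready-to-use collectors, the following sketch finds the oldest person with Collectors.maxBy() and joins all the names with Collectors.joining(). The class name ReadyMadeCollectors, the helper methods, and the nested Person class are assumptions made for illustration only.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class ReadyMadeCollectors {
    // Hypothetical stand-in for the Person class used in the text.
    public static class Person {
        private final int age;
        private final String name;
        public Person(int age, String name) { this.age = age; this.name = name; }
        public int getAge() { return age; }
        public String getName() { return name; }
    }

    public static List<Person> sample() {
        return List.of(new Person(23, "Bob"), new Person(33, "Jim"),
                       new Person(28, "Jill"), new Person(27, "Bill"));
    }

    // maxBy() returns an Optional because the stream may be empty.
    public static Optional<Person> oldest(List<Person> people) {
        return people.stream()
                .collect(Collectors.maxBy(Comparator.comparingInt(Person::getAge)));
    }

    // joining() concatenates the strings in encounter order.
    public static String names(List<Person> people) {
        return people.stream()
                .map(Person::getName)
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(oldest(sample()).map(Person::getName).orElse("nobody")); //prints: Jim
        System.out.println(names(sample()));      //prints: Bob, Jim, Jill, Bill
    }
}
```

Compared with the three-parameter form, no container class, accumulator, or combiner has to be written by hand; the collector supplies all of them.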