Some uses of the collect() operation are simple enough to be mastered by any beginner, while others can be hard to follow even for a seasoned programmer. Together with the operations discussed already, the most popular cases of collect() usage presented in this section are more than enough for a beginner and cover most needs of a more experienced professional. Add the operations of numeric streams (see the next section, Numeric stream interfaces), and they cover the vast majority of what a mainstream programmer is likely to need.
As we have mentioned already, the collect() operation is very flexible and allows us to customize stream processing. It has two forms:
- R collect(Collector<T, A, R> collector): Processes the stream elements of type T using the provided Collector and produces the result of type R via an intermediate accumulation of type A
- R collect(Supplier<R> supplier, BiConsumer<R, T> accumulator, BiConsumer<R, R> combiner): Processes the stream elements of type T using the provided functions:
  - Supplier<R> supplier: Creates a new result container
  - BiConsumer<R, T> accumulator: A stateless function that adds an element to the result container
  - BiConsumer<R, R> combiner: A stateless function that merges two partial result containers by adding the elements of the second container to the first
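The three functions of the second form can be sketched with a container that is simpler than a custom class. The following is a minimal illustration, not code from this book: the class name ThreeArgCollectSketch and the method toList() are hypothetical, and an ArrayList plays the role of the result container.

```java
import java.util.ArrayList;
import java.util.List;

class ThreeArgCollectSketch {

    // Copies the stream elements into a new mutable list.
    static List<String> toList(List<String> names) {
        List<String> result = names.stream()
            .collect(ArrayList::new,      // supplier: creates an empty container
                     ArrayList::add,      // accumulator: adds one element to the container
                     ArrayList::addAll);  // combiner: adds the second container's elements to the first
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toList(List.of("Bob", "Jim", "Jill", "Bill")));
        // prints: [Bob, Jim, Jill, Bill]
    }
}
```

For a sequential stream only the supplier and the accumulator do any work; the combiner matters when the stream is processed in parallel.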
Let's look at the second form of the collect() operation first. It is very similar to the three-parameter reduce() operation we have just demonstrated, which takes an identity, an accumulator, and a combiner. The biggest difference is that the first parameter of the collect() operation is not an identity or initial value but a supplier of the container: a mutable object that is passed between the functions and maintains the state of the processing.
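To make the difference concrete, here is a small side-by-side sketch that concatenates strings both ways. The class and method names are hypothetical; the only assumption is that a StringBuilder is an acceptable stand-in for the mutable container.

```java
import java.util.List;

class ReduceVersusCollectSketch {

    // reduce(): the first argument is an identity value,
    // and every step returns a new String.
    static String joinWithReduce(List<String> words) {
        return words.stream()
            .reduce("",                      // identity value
                    (acc, w) -> acc + w,     // accumulator returns a new value
                    (a, b) -> a + b);        // combiner returns a new value
    }

    // collect(): the first argument supplies a mutable container
    // that the other two functions update in place.
    static String joinWithCollect(List<String> words) {
        return words.stream()
            .collect(StringBuilder::new,     // supplier creates the container
                     StringBuilder::append,  // accumulator mutates the container
                     StringBuilder::append)  // combiner appends one container to another
            .toString();
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "c");
        System.out.println(joinWithReduce(words));   // prints: abc
        System.out.println(joinWithCollect(words));  // prints: abc
    }
}
```

Both calls produce the same result; the collect() version avoids creating a new String at every step because the container is modified rather than replaced.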
Let's demonstrate how it works by selecting the oldest person from the list of Person objects. For the following example, we are going to use the familiar Person class as the container but add to it a constructor without parameters and two setters:
public Person() {}
public void setAge(int age) { this.age = age; }
public void setName(String name) { this.name = name; }
Adding a constructor without parameters and setters is necessary because the Person object, as a container, must be creatable at any moment without any parameters and must be able to receive and keep the partial results: the name and age of the oldest person found so far. The collect() operation uses this container while processing each element and, after the last element is processed, the container holds the name and the age of the oldest person.
We will again use the same list of persons:
List<Person> list = List.of(new Person(23, "Bob"),
                            new Person(33, "Jim"),
                            new Person(28, "Jill"),
                            new Person(27, "Bill"));
And here is the collect() operation that finds the oldest person in the list:
BiConsumer<Person, Person> accumulator = (p1, p2) -> {
    if (p1.getAge() < p2.getAge()) {
        p1.setAge(p2.getAge());
        p1.setName(p2.getName());
    }
};
BiConsumer<Person, Person> combiner = (p1, p2) -> {
    System.out.println("Combiner is called!");
    if (p1.getAge() < p2.getAge()) {
        p1.setAge(p2.getAge());
        p1.setName(p2.getName());
    }
};
Person theOldest = list.stream()
    .collect(Person::new, accumulator, combiner);
System.out.println(theOldest); // prints: Person{name='Jim', age=33}
We tried to inline the functions in the operation call, but it looked a bit difficult to read, so we decided to create the functions first and then use them in the collect() operation. For a sequential stream, the container, a Person object, is created only once, before the first element is processed. In this sense, it is similar to the initial value of the reduce() operation. Then it is passed to the accumulator, which compares it to the first element. The age field in the container is initialized to the default value of zero, so the age and name of the first element are set in the container as those of the oldest person so far. When the second stream element (a Person object) is emitted, its age value is compared to the age value currently stored in the container, and so on, until all elements of the stream are processed. The result is shown in the comment on the last line of the example.
When the stream is sequential, the combiner is never called. But when we made it parallel (list.parallelStream()), the message Combiner is called! was printed three times. In the parallel case, the supplier creates a separate container for each partial result, and the combiner merges these containers. As in the case of the reduce() operation, the number of partial results may vary, depending on the number of CPUs and the internal logic of the collect() operation implementation. So, the number of times the message Combiner is called! is printed may differ from one machine or run to another.
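The whole example can be assembled into one runnable sketch. The nested Person class below is a hypothetical minimal stand-in for the class used in this book, and the combinerCalls counter is added here only to replace the printed message with a number that can be inspected.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

class OldestPersonSketch {

    // Hypothetical minimal Person; the class used in the book may differ in details.
    static class Person {
        private int age;
        private String name;
        Person() {}
        Person(int age, String name) { this.age = age; this.name = name; }
        int getAge() { return age; }
        String getName() { return name; }
        void setAge(int age) { this.age = age; }
        void setName(String name) { this.name = name; }
        @Override public String toString() {
            return "Person{name='" + name + "', age=" + age + "}";
        }
    }

    // Counts combiner invocations instead of printing a message.
    static final AtomicInteger combinerCalls = new AtomicInteger();

    // Copies p2 into the container p1 if p2 is older.
    static final BiConsumer<Person, Person> KEEP_OLDER = (p1, p2) -> {
        if (p1.getAge() < p2.getAge()) {
            p1.setAge(p2.getAge());
            p1.setName(p2.getName());
        }
    };

    static Person oldest(List<Person> list, boolean parallel) {
        BiConsumer<Person, Person> combiner = (p1, p2) -> {
            combinerCalls.incrementAndGet();
            KEEP_OLDER.accept(p1, p2);
        };
        return (parallel ? list.parallelStream() : list.stream())
            .collect(Person::new, KEEP_OLDER, combiner);
    }

    public static void main(String[] args) {
        List<Person> list = List.of(new Person(23, "Bob"), new Person(33, "Jim"),
                                    new Person(28, "Jill"), new Person(27, "Bill"));
        System.out.println(oldest(list, false)); // prints: Person{name='Jim', age=33}
        System.out.println(oldest(list, true));  // same person; combiner count varies
        System.out.println("Combiner calls: " + combinerCalls.get());
    }
}
```

The parallel run returns the same person, but the value of combinerCalls is deliberately not shown as an expected output because it depends on how the stream is split.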
Now let's look at the first form of the collect() operation. It requires an object of a class that implements the java.util.stream.Collector<T,A,R> interface, where T is the type of the stream elements, A is the container type, and R is the result type. You can use one of the following of() methods of the Collector interface to create the necessary Collector object:
static Collector<T,R,R> of(Supplier<R> supplier,
                           BiConsumer<R,T> accumulator,
                           BinaryOperator<R> combiner,
                           Collector.Characteristics... characteristics)
Or
static Collector<T,A,R> of(Supplier<A> supplier,
                           BiConsumer<A,T> accumulator,
                           BinaryOperator<A> combiner,
                           Function<A,R> finisher,
                           Collector.Characteristics... characteristics)
The functions you have to pass to the preceding methods are similar to those we have demonstrated already. But we are not going to do this, for two reasons. First, it is more involved and would take us beyond the scope of this book. Second, before creating your own collector, you should look at the java.util.stream.Collectors class, which provides many ready-to-use collectors.
As we have mentioned already, together with the operations discussed so far and the numeric stream operations we are going to present in the next section, the ready-to-use collectors cover the vast majority of the processing needs in mainstream programming, and there is a good chance you will never need to create a custom collector.
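As a quick taste of the first form, the following sketch passes a few ready-to-use collectors from the Collectors class to collect(). It is only an illustration under stated assumptions: the nested Person class is a hypothetical minimal stand-in, and the helper method names are made up for this example. Collectors.maxBy(), Collectors.toList(), and Collectors.joining() are standard JDK methods.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class ReadyMadeCollectorsSketch {

    // Hypothetical minimal Person; no setters are needed because
    // the collector, not the Person, holds the intermediate state.
    static class Person {
        private final int age;
        private final String name;
        Person(int age, String name) { this.age = age; this.name = name; }
        int getAge() { return age; }
        String getName() { return name; }
    }

    // Finds the oldest person; the Optional is empty for an empty list.
    static Optional<Person> oldest(List<Person> list) {
        return list.stream()
            .collect(Collectors.maxBy(Comparator.comparingInt(Person::getAge)));
    }

    // Collects all names into a List.
    static List<String> names(List<Person> list) {
        return list.stream()
            .map(Person::getName)
            .collect(Collectors.toList());
    }

    // Joins all names into one comma-separated String.
    static String joinedNames(List<Person> list) {
        return list.stream()
            .map(Person::getName)
            .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<Person> list = List.of(new Person(23, "Bob"), new Person(33, "Jim"),
                                    new Person(28, "Jill"), new Person(27, "Bill"));
        System.out.println(oldest(list).map(Person::getName).orElse("none")); // prints: Jim
        System.out.println(names(list));        // prints: [Bob, Jim, Jill, Bill]
        System.out.println(joinedNames(list));  // prints: Bob, Jim, Jill, Bill
    }
}
```

Compared with the three-function form, no mutable Person container, parameterless constructor, or setters are required here.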