Collectors play an important role in Java 8 stream processing. They ‘collect’ the processed elements of a stream into a final representation. Invoking the collect() method on a Stream, with a Collector instance passed as a parameter, ends that Stream’s processing and returns the final result. Stream.collect() is thus a terminal operation.
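As a minimal sketch (the word list is hypothetical sample data), filter() and map() below are lazy intermediate operations, while collect(), given the Collector returned by Collectors.toList(), is the terminal operation that triggers the processing and hands back the final List:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CollectTerminalDemo {

    // Keeps words of at most 4 characters and upper-cases them.
    static List<String> shortWordsUpperCase(List<String> words) {
        return words.stream()
                .filter(w -> w.length() <= 4)     // intermediate operation (lazy)
                .map(String::toUpperCase)          // intermediate operation (lazy)
                .collect(Collectors.toList());     // terminal operation: ends the stream
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("java", "stream", "map", "collector");
        System.out.println(shortWordsUpperCase(words)); // prints [JAVA, MAP]
    }
}
```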
But what exactly does a Collector do apart from ending a Stream and handing back the processed data? Which internal components of a Collector work in tandem to produce the resulting Collection? Which collect operations are predefined as part of the Java 8 API?
In this tutorial I will attempt to answer the above fundamental questions about Collectors. We will start by formally defining a Collector and its capabilities. Next, we will see the various components of a Collector and understand how they work together. We will then look at the java.util.stream.Collector interface and understand how the components of a Collector use the interface members. Next, we will have a quick overview of the commonly used predefined collectors defined in the java.util.stream.Collectors class. We will finish this tutorial with a Java code example showing a predefined Collector in action.
What is java.util.stream.Collector – formal definition
The official Javadoc for java.util.stream.Collector defines it as follows (1):
A Collector is a mutable reduction operation that accumulates input elements into a mutable result container, optionally transforming the accumulated result into a final representation after all input elements have been processed.
The above definition is a bit heavy on terminology, at least for someone new to functional programming! Let me make it more comprehensible by restating it as below:
A Collector describes how the terminal collect() operation reduces the stream being processed to a single result, typically a Collection. For this reduction it uses a modifiable result container into which it adds the processed stream elements as it encounters them. Once all the elements have been added, a Collector can optionally transform the accumulated container into a different final form.
I hope the above definition makes it clear what you can do with a Collector at least at a high level. We will of course take a deep dive into understanding Collectors, but before that I need to explain an important concept mentioned in the formal Collector definition above – that of a mutable reduction operation.
A mutable reduction operation, such as Stream.collect(), collects the stream elements in a mutable result container (a collection) as it processes them. When the result is a container such as a List, mutable reduction performs much better than an immutable reduction operation such as Stream.reduce(). This is because the container holding the result is updated in place at each step of the reduction, so the same container is reused in the next step.
Stream.reduce(), on the other hand, treats the partial results as immutable values. Reducing into a container with it therefore means creating a new instance of the container at every intermediate step of the reduction, which degrades performance.
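A small sketch of the contrast, using a hypothetical list of integers: both methods return the same List, but the reduce() version copies the partial list into a brand-new ArrayList at every accumulation step (quadratic copying overall), while the collect() version adds each element into a container in place. The three-argument collect(supplier, accumulator, combiner) call is the form shown in the Stream.collect() Javadoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MutableVsImmutableReduction {

    // Immutable-style reduction: partial results are never modified; each step builds a new list.
    static List<Integer> viaReduce(List<Integer> input) {
        return input.stream().<List<Integer>>reduce(
                new ArrayList<>(),                 // identity: empty list (never mutated)
                (partial, element) -> {            // accumulator: copy, then add one element
                    List<Integer> next = new ArrayList<>(partial);
                    next.add(element);
                    return next;
                },
                (left, right) -> {                 // combiner: copy, then append the other part
                    List<Integer> merged = new ArrayList<>(left);
                    merged.addAll(right);
                    return merged;
                });
    }

    // Mutable reduction: a container is created once and updated in place.
    static List<Integer> viaCollect(List<Integer> input) {
        return input.stream().collect(
                ArrayList::new,      // supplier: new empty container
                ArrayList::add,      // accumulator: add element in place
                ArrayList::addAll);  // combiner: merge two partial containers
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(viaReduce(numbers));  // [1, 2, 3, 4, 5]
        System.out.println(viaCollect(numbers)); // [1, 2, 3, 4, 5]
    }
}
```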
What (all) does a Collector do internally when it collects?
A collector collects the elements of a stream into a mutable container as we understood above. But internally it does a lot more than simply ‘collect’. Drawn next is a diagram showing a Collector in action as it collects –
As you can see in the above diagram, there are broadly 5 steps (marked by 5 orange markers) which a typical Collector goes through while processing a Stream of elements. Let us quickly understand each of these steps:
- Step 1 – Supplier provides the mutable empty result container: Supplier is an instance of the Supplier functional interface, which provides an empty mutable container, such as a Collection (or Map), to hold the collected elements.
- Step 2 – Accumulator adds individual elements into the result container: Accumulator is an instance of the BiConsumer functional interface. It adds each stream element it encounters into the result container. The accumulation in this step is known as a fold in functional programming parlance.
- Step 3 – Combiner combines two partial results: Combiner is an instance of the BinaryOperator functional interface, which combines two partial results produced by two separate groups of accumulations done in parallel.
- Step 4 – Optional Finisher to put the processed elements in a desired form: Finisher is an instance of the Function functional interface, and its use is optional. If required, a Finisher can be used to transform the result container into a different, required final form.
- Step 5 – Final Result: The result container, or the Finisher's output when a Finisher is used, is returned as the final result of the collect operation.
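The five steps can be sketched with the JDK factory method Collector.of(supplier, accumulator, combiner, finisher), which assembles a Collector from its components. The comma-joining collector, the field names and the fruit list below are hypothetical and chosen only so that each step is visible; the container type (StringBuilder) deliberately differs from the result type (String) so that the Finisher does real work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

class CollectorComponentsDemo {

    // Step 1: Supplier provides a new, empty mutable result container.
    static final Supplier<StringBuilder> SUPPLIER = StringBuilder::new;

    // Step 2: Accumulator folds one element into the container, in place.
    static final BiConsumer<StringBuilder, String> ACCUMULATOR = (sb, s) -> {
        if (sb.length() > 0) {
            sb.append(", ");
        }
        sb.append(s);
    };

    // Step 3: Combiner merges two partial containers (needed for parallel streams).
    static final BinaryOperator<StringBuilder> COMBINER = (left, right) -> {
        if (left.length() > 0 && right.length() > 0) {
            left.append(", ");
        }
        return left.append(right);
    };

    // Step 4: Finisher turns the container (StringBuilder) into the final form (String).
    static final Function<StringBuilder, String> FINISHER = StringBuilder::toString;

    static final Collector<String, StringBuilder, String> COMMA_JOINER =
            Collector.of(SUPPLIER, ACCUMULATOR, COMBINER, FINISHER);

    // Step 5: collect() returns the Finisher's output as the final result.
    static String join(List<String> items) {
        return items.stream().collect(COMMA_JOINER);
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("apple", "banana", "cherry");
        System.out.println(join(fruits));                                  // apple, banana, cherry
        System.out.println(fruits.parallelStream().collect(COMMA_JOINER)); // apple, banana, cherry
    }
}
```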
Having seen the four important components of a Collector, viz. Supplier, Accumulator, Combiner and Finisher, it is now time to look at the java.util.stream.Collector interface itself. It is declared as follows:
public interface Collector<T, A, R>
- T is the type of the elements being processed by the Stream, which are to be ‘collected’.
- A is the type of the mutable accumulation container, which keeps getting elements of type T added to it throughout the ‘collecting’ process.
- R is the type of the ‘final’ result returned by the collector, for example a collection. It can differ from A when a Finisher is used.
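To see T, A and R in real declarations, the sketch below assigns a few predefined collectors to typed variables. The static methods of java.util.stream.Collectors declare A as the wildcard ?, which hides the internal container type from callers; the letters are hypothetical sample data. Note how R can be a List, a String or even a Long.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CollectorTypeParamsDemo {

    // T = String, A = ? (hidden container), R = List<String>
    static final Collector<String, ?, List<String>> TO_LIST = Collectors.toList();

    // T = CharSequence, A = ?, R = String: the result type differs from the element type
    static final Collector<CharSequence, ?, String> JOINING = Collectors.joining("-");

    // T = String, A = ?, R = Long: the result need not be a collection at all
    static final Collector<String, ?, Long> COUNTING = Collectors.counting();

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c");
        List<String> copy = letters.stream().collect(TO_LIST);
        String joined = letters.stream().collect(JOINING);
        Long count = letters.stream().collect(COUNTING);
        System.out.println(copy + " " + joined + " " + count); // [a, b, c] a-b-c 3
    }
}
```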
How Collector interface members are used by the 4 components of a Collector
- Supplier provides an empty instance (or several instances, for parallel collection) of type A to begin the accumulation of elements.
- Accumulator adds each element of type T into an instance of A.
- Combiner combines two partial accumulated results of type A to produce a combined instance of A.
- Finisher maps A to R using a mapping function.
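As a sketch of how these roles map onto the interface methods, the hypothetical SortedListCollector below implements java.util.stream.Collector directly, with A = List<T> and R = a sorted, unmodifiable List<T>. The real interface also has a fifth method, characteristics(), which returns a set of Collector.Characteristics hints (CONCURRENT, UNORDERED, IDENTITY_FINISH); here it returns an empty set because the Finisher does real work and the container is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical collector: T = element type, A = List<T>, R = sorted read-only List<T>.
class SortedListCollector<T extends Comparable<? super T>>
        implements Collector<T, List<T>, List<T>> {

    @Override
    public Supplier<List<T>> supplier() {
        return ArrayList::new;                    // Supplier: empty instance of A
    }

    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;                         // Accumulator: adds each T into A
    }

    @Override
    public BinaryOperator<List<T>> combiner() {
        return (left, right) -> {                 // Combiner: merges two partial A instances
            left.addAll(right);
            return left;
        };
    }

    @Override
    public Function<List<T>, List<T>> finisher() {
        return list -> {                          // Finisher: maps A to R (sort, then wrap read-only)
            Collections.sort(list);
            return Collections.unmodifiableList(list);
        };
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet();            // no IDENTITY_FINISH, CONCURRENT or UNORDERED
    }
}

class SortedListCollectorDemo {
    static List<Integer> sorted(List<Integer> input) {
        return input.stream().collect(new SortedListCollector<Integer>());
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList(5, 3, 9, 1))); // [1, 3, 5, 9]
    }
}
```

For one-off collectors, the static factory Collector.of(...) builds an equivalent Collector from lambdas without a named class.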
So far, we have understood the components of a Collector and how these components work together to produce the final result. At this point you may be wondering whether, for simple tasks such as collecting the processed elements of a Stream into a List, you will need to implement all these types and components. The good news is: you need not! This is where the predefined collectors come in handy.
Predefined collectors overview
The Java 8 designers anticipated the most common mutable reduction operations that application developers are likely to need, and provided implementations for them as individual static methods in the java.util.stream.Collectors class. Let us now take a look at the important reduction operations already implemented in the Collectors class and their purpose.
| Reduction operation | Collectors method(s) | Purpose |
|---|---|---|
| averaging | averagingDouble(), averagingLong(), averagingInt() | Averages the double/long/int values extracted from the elements by the mapping function provided (the result is a Double) |
| counting | counting() | Counts the number of stream elements |
| grouping | groupingBy() | Produces a Map of elements grouped by the grouping criteria provided |
| String concatenation | joining() | Concatenates stream elements into a single String |
| mapping | mapping() | Applies a mapping function to each stream element before passing it on to a downstream Collector |
| minimum and maximum determination | minBy()/maxBy() | Finds the minimum/maximum of all stream elements based on the Comparator provided (the result is wrapped in an Optional) |
| partitioning | partitioningBy() | Partitions stream elements into a Map with Boolean keys, based on the Predicate provided |
| reduction | reducing() | Reduces stream elements using the BinaryOperator provided |
| summarization | summarizingDouble(), summarizingLong(), summarizingInt() | Computes summary statistics (count, sum, min, average, max) of the double/long/int values extracted from the elements |
| summation | summingDouble(), summingLong(), summingInt() | Sums the double/long/int values extracted from the elements |
| collect into a Collection | toCollection() | Collects stream elements into a Collection created by the Supplier provided |
| collect into a Map/ConcurrentMap | toMap()/toConcurrentMap() | Collects stream elements into a Map/ConcurrentMap after applying the key and value mapping Functions provided |
| collect into a List | toList() | Collects stream elements into a List |
| collect into a Set | toSet() | Collects stream elements into a Set |
| collect and transform | collectingAndThen() | Collects stream elements with a downstream Collector and then transforms the result using a Function |
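A sketch exercising several rows of the table on a hypothetical list of words (lengths 6, 3, 4, 3 and 9). Each method's comment names the Collectors method it demonstrates; groupingBy() is given TreeMap::new as its map factory only so that the printed keys come out sorted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PredefinedCollectorsDemo {

    // counting(): number of stream elements
    static long count(List<String> words) {
        return words.stream().collect(Collectors.counting());
    }

    // averagingInt(): average of an int value extracted from each element
    static double averageLength(List<String> words) {
        return words.stream().collect(Collectors.averagingInt(String::length));
    }

    // joining(): concatenation with a delimiter, prefix and suffix
    static String joined(List<String> words) {
        return words.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    // groupingBy(): Map keyed by word length; TreeMap keeps the keys sorted
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }

    // maxBy(): longest word, wrapped in an Optional because the stream may be empty
    static Optional<String> longest(List<String> words) {
        return words.stream().collect(Collectors.maxBy(Comparator.comparingInt(String::length)));
    }

    // summarizingInt(): count, sum, min, average and max in a single pass
    static IntSummaryStatistics lengthStats(List<String> words) {
        return words.stream().collect(Collectors.summarizingInt(String::length));
    }

    // collectingAndThen(): collect to a List, then wrap it as read-only
    static List<String> readOnlyCopy(List<String> words) {
        return words.stream()
                .collect(Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("stream", "map", "list", "set", "collector");
        System.out.println(count(words));                 // 5
        System.out.println(averageLength(words));         // 5.0
        System.out.println(joined(words));                // [stream, map, list, set, collector]
        System.out.println(byLength(words));              // {3=[map, set], 4=[list], 6=[stream], 9=[collector]}
        System.out.println(longest(words).get());         // collector
        System.out.println(lengthStats(words).getSum());  // 25
        System.out.println(readOnlyCopy(words));          // [stream, map, list, set, collector]
    }
}
```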
As you can see, the above list of predefined collectors is quite exhaustive and covers a wide range of collector usages. As a result, most of the time your collecting needs will be fulfilled by a Collector from the list above. In the rare cases when you need to collect in a way different from those listed, you will have to implement your own custom Collector.
Let us now see how to use the predefined collector returned by the Collectors.partitioningBy() method. Using this predefined Collector, one can easily partition the elements of a stream into a Map with true and false entries, based on an input Predicate passed to it: the elements satisfying the Predicate go under the true key and the rest under the false key.
(Note: if interested, you can read the full tutorial on the partitioningBy Collector.)
Java 8 Code example showing pre-defined Collector in action
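A minimal sketch of such a program, assuming Employee has a name and an age attribute (the name attribute is an assumption; age is what the predicate tests). The five employees are hypothetical sample data, and partitionByAge() is a helper introduced here for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical POJO: 'name' is assumed; 'age' is the attribute used by the predicate.
class Employee {
    private final String name;
    private final int age;

    Employee(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() { return name; }
    int getAge() { return age; }

    @Override
    public String toString() {
        return "Employee{name='" + name + "', age=" + age + "}";
    }
}

class BasicCollector {

    // Partitions employees into age > 30 (key true) and age <= 30 (key false).
    static Map<Boolean, List<Employee>> partitionByAge(List<Employee> employees) {
        return employees.stream()
                .collect(Collectors.partitioningBy(e -> e.getAge() > 30));
    }

    public static void main(String[] args) {
        // Hypothetical sample data: 5 employees.
        List<Employee> employeeList = Arrays.asList(
                new Employee("Anna", 45),
                new Employee("Ben", 25),
                new Employee("Chloe", 65),
                new Employee("Dev", 22),
                new Employee("Eli", 29));

        Map<Boolean, List<Employee>> employeeMap = partitionByAge(employeeList);

        // Prints one line per key (true / false) with the matching employees.
        employeeMap.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```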
Employee is the POJO class for the above example; it contains 2 attributes, one of which is the employee's age.
- In the BasicCollector class, we first create a List of 5 employees, named employeeList, using the Arrays.asList() method.
- Next, we create a stream of the elements of employeeList.
- We then collect this stream of employee objects using the pipelined collect() method, to which we pass the Collector returned by the Collectors.partitioningBy() method. We partition the employees based on whether their age > 30 or not. Accordingly, we pass a lambda expression for this predicate as input to the partitioningBy() method.
- We then print the entries of employeeMap, the Map returned by collect(), using the Map.forEach() method. The output is as expected: a Map containing two entries, for the keys true and false, whose values are the lists of Employee objects satisfying and not satisfying the predicate respectively.
Summary and further articles in the Java 8 Collector Series
This tutorial was an introduction to the basics of the Collector interface, the components of a Collector, and how these components work together to collect the stream elements into a final result. We also had an overview of the predefined collectors defined in the java.util.stream.Collectors class and saw a code example showing a predefined collector in action.
The stage is now set for us to delve deeper into the concepts of collectors. To begin with, in the next article of the Collectors series, we will explore how a Collector can perform its duty of collecting stream elements in parallel to improve performance. This will be followed by detailed individual tutorials dedicated to each of the predefined Collectors we saw briefly above. The Collectors series will finally culminate with a tutorial explaining how to create a custom collector of your own.
1. Official Java 8 Collector API