Java 8 Grouping with Collectors | groupingBy method tutorial with examples

Introduction – Java 8 Grouping with Collectors tutorial explains how to use the predefined Collector returned by groupingBy() method of java.util.stream.Collectors class with examples.

The tutorial begins with explaining how grouping of stream elements works using a Grouping Collector. The concept of grouping is visually illustrated with a diagram. Next, the three overloaded groupingBy() methods in Collectors class are explained using their method definitions, Java code examples showing the 3 methods in action and explanations for the code examples. Lastly, a brief overview of the concurrent versions of the three groupingBy() methods is provided.
(Note – This tutorial assumes that its readers are familiar with the basics of Java 8 Collectors.)

Understanding the concept of ‘grouping’ using Collectors
Given a stream of objects, there are scenarios where these objects need to be grouped based on a certain distinguishing characteristic they possess. This concept of grouping is the same as the ‘group by’ clause in SQL, which takes an attribute, or a calculated value derived from attribute(s), to divide the retrieved records into distinct groups. Generally, in the imperative style of programming, such grouping of records (objects in OOP) involves iterating over each object, checking which group the object being examined falls in, and then adding that object to its correct group. The group itself is held together using a Collection instance. Java 8’s functional features allow us to do the same grouping of objects in a declarative way, which is typical of the functional rather than the imperative style of programming, using Java 8’s Grouping Collector.

Grouping collectors use a classification function, which is an instance of the Function<T,R> functional interface and which, for every object of type T in a stream, returns a classifier object of type R. The various values of R, finite in number, are the ‘group names’ or ‘group keys’. As the grouping collector works through the stream of objects it is collecting from, it creates collections of stream objects corresponding to each of the ‘group keys’, i.e. for every value of R there is a collection of objects all of which return that value of R when subjected to the classification function.

All these R-values and corresponding Collection of stream objects are stored by the grouping collector in a Map<R, Collection<T>>, i.e. each ‘key,value’ entry in the map consists of ‘R,Collection<T>’.

The process of grouping, starting from the application of classification function on the stream elements, till the creation of Map containing the grouped elements, is as shown in the diagram below –

[Diagram: Java 8 Grouping with Collectors – Collectors.groupingBy() method]
In the above diagram, the elements of Stream<T> are grouped using a classification function returning 4 values of R: r1, r2, r3 and r4. The grouped elements are stored in a Map<R,Collection<T>>, with the 4 values of R being used as 4 keys pointing to 4 corresponding collections stored in the Map. These Collection instances hold the individual grouped elements, which is the required output from the grouping collector.
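The grouping process described above can be sketched in code. The example below is only an illustration, not part of the original tutorial: the class name GroupingConceptSketch, its two method names and the sample words are made up. It groups a few strings by their length, first in the imperative style and then with a grouping collector, so that the two approaches can be compared side by side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class used only for this illustration.
class GroupingConceptSketch {

  // Imperative style: iterate, compute the group key, add the element to that group's list.
  static Map<Integer, List<String>> groupImperatively(List<String> words) {
    Map<Integer, List<String>> groups = new HashMap<>();
    for (String word : words) {
      Integer key = word.length(); // plays the role of the classification function
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
    }
    return groups;
  }

  // Declarative style: the grouping collector does the same bookkeeping internally.
  static Map<Integer, List<String>> groupDeclaratively(List<String> words) {
    return words.stream().collect(Collectors.groupingBy(String::length));
  }

  public static void main(String[] args) {
    List<String> words = Arrays.asList("ant", "bee", "crow", "deer", "eagle");
    System.out.println(groupImperatively(words));
    System.out.println(groupDeclaratively(words));
  }
}
```

Both methods return equal maps: the keys are the distinct lengths (the values of R) and each value is the list of words having that length.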

Now that we understand the concept of grouping with collectors, let us see how to implement grouping collectors in code using the 3 overloaded groupingBy() method variants provided in Collectors class, starting from the simplest variant which creates a List of the grouped elements.

Variant #1 of Collectors.groupingBy() method – stores grouped elements in a List
The simplest of Collectors.groupingBy() method variants is defined with the following signature –

public static <T, K> Collector<T, ?, Map<K, List<T>>> groupingBy(Function<? super T, ? extends K> classifier)

Where,
     – input is classifier which is an instance of a Function functional interface which converts from type T to type K.
     – output is a Collector with finisher (return type) as a Map with entries having ‘key,value’ pairs as ‘K, List<T>’

The simplest variant of groupingBy() method applies the classifier Function<T,K> to each individual element of type T collected from Stream<T>. It then groups elements into individual lists based on the value of K they return on application of the classifier function, and stores them in a Map<K,List<T>>, using the process described in the previous section explaining how a grouping collector operates.

Variant #1 of grouping collector – Java Example
Let's say we have a stream of Employee objects, belonging to a company, who need to be grouped by their departments, with their Department present as an attribute in the Employee object. As the end result of applying the grouping collector we want a Map with departments as keys and, as the corresponding values, the List of employees in each department. Diagrammatically, such an implementation would be represented as shown below –

[Diagram: Java 8 Example for Collectors.groupingBy() method]
In the above diagram, employees are grouped into 4 departments – HR, OPERATIONS, LEGAL and MARKETING. Let us now see the Java code for implementing the above ‘Department – Employees’ use case, followed by its explanation.
Java 8 code example for Variant #1 of Collectors.groupingBy()
package com.javabrahman.java8.collector;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingWithCollectors {
  static List<Employee> employeeList = Arrays.asList(
      new Employee("Tom Jones", 45, 12000.00,Department.MARKETING),
      new Employee("Harry Major", 26, 20000.00, Department.LEGAL),
      new Employee("Ethan Hardy", 65, 30000.00, Department.LEGAL),
      new Employee("Nancy Smith", 22, 15000.00, Department.MARKETING),
      new Employee("Catherine Jones", 21, 18000.00, Department.HR),
      new Employee("James Elliot", 58, 24000.00, Department.OPERATIONS),
      new Employee("Frank Anthony", 55, 32000.00, Department.MARKETING),
      new Employee("Michael Reeves", 40, 45000.00, Department.OPERATIONS));
  public static void main(String args[]){
    Map<Department,List<Employee>> employeeMap
        = employeeList.stream().collect(Collectors.groupingBy(Employee::getDepartment));
    System.out.println("Employees grouped by department");
    employeeMap.forEach((Department key, List<Employee> empList) -> System.out.println(key +" -> "+empList));
    }
}
//Employee.java - POJO Class
package com.javabrahman.java8.collector;
public class Employee {
  private String name;
  private Integer age;
  private Double salary;
  private Department department;

  public Employee(String name, Integer age, Double salary, Department department) {
    this.name = name;
    this.age = age;
    this.salary = salary;
    this.department = department;
  }

  // Setters/Getters for name,age,salary,department go here

  public String toString(){
    return "Employee Name:"+this.name;
  }

  //Standard equals and hashcode implementations go here

}
//Enum Department.java
package com.javabrahman.java8.collector;
public enum Department {
  HR, OPERATIONS, LEGAL, MARKETING
}
 OUTPUT of the above code
Employees grouped by department
HR -> [Employee Name:Catherine Jones]
LEGAL -> [Employee Name:Harry Major, Employee Name:Ethan Hardy]
OPERATIONS -> [Employee Name:James Elliot, Employee Name:Michael Reeves]
MARKETING -> [Employee Name:Tom Jones, Employee Name:Nancy Smith, Employee Name:Frank Anthony]
Explanation of the code

  • Employee is the POJO class in the above example of which we create a Stream. It has four attributes – name, age, department and salary.
  • Department is an Enum with the following values – HR, OPERATIONS, LEGAL, MARKETING.
  • employeeList is a static list of 8 Employees.
  • In the main() method of GroupingWithCollectors class we create a Stream of Employees using the stream() method of List interface.
  • On the stream of Employees we call the collect() method with predefined Collector returned by Collectors.groupingBy() method as the parameter.
  • The classification function passed to groupingBy() method is the method reference to Employee.getDepartment() method specified as "Employee::getDepartment".
  • Lastly, the Map of employees grouped by department is printed using Map.forEach() method. The output is as expected – the map contains ‘key,value’ entries in the form of ‘Department, List<Employee>’, with one entry per Department as the key and the List of Employees of that Department stored as the value.

Variant #2 of Collectors.groupingBy() – uses a user specified Collector to collect grouped elements
Whereas the 1st variant always returned a List containing the elements of a group, the 2nd variant of grouping collector provides the flexibility to specify how the grouped elements need to be collected using a second parameter which is a Collector. So, instead of just storing the groups in the resultant Map as Lists, we can store them in, say, Sets, or find the maximum value in each group and store only that rather than all the elements of the group, or apply any other collector operation which is applicable to the stream elements.

The 2nd variant of grouping collector is defined with the following signature –

public static <T, K, A, D> Collector<T, ?, Map<K, D>> groupingBy(Function<? super T, ? extends K> classifier,
Collector<? super T, A, D> downstream)

Where,
     – 1st input parameter is classifier which is an instance of a Function functional interface which converts from type T to type K.
     – 2nd input parameter is downstream collector which collects the grouped elements into type D, where D is the result type produced by the downstream collector's finisher.
     – output is a Collector with finisher (return type) as a Map with entries having ‘key,value’ pairs as ‘K, D’
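To make the role of the downstream collector concrete, here is a small sketch using two downstream collectors other than toSet(): Collectors.counting() and Collectors.maxBy(). The Emp class below is a cut-down, hypothetical stand-in for the tutorial's Employee (with the department held as a plain String), and the class and method names are illustrative assumptions, not part of the original example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical class used only for this illustration.
class DownstreamCollectorSketch {

  // Minimal stand-in for the tutorial's Employee; not the tutorial's actual class.
  static final class Emp {
    final String name;
    final String dept;
    final double salary;

    Emp(String name, String dept, double salary) {
      this.name = name;
      this.dept = dept;
      this.salary = salary;
    }
  }

  static final List<Emp> EMPS = Arrays.asList(
      new Emp("Tom Jones", "MARKETING", 12000.00),
      new Emp("Harry Major", "LEGAL", 20000.00),
      new Emp("Ethan Hardy", "LEGAL", 30000.00),
      new Emp("Nancy Smith", "MARKETING", 15000.00));

  // Downstream counting(): each group is reduced to the number of its elements (D = Long).
  static Map<String, Long> headcountByDept() {
    return EMPS.stream()
        .collect(Collectors.groupingBy(e -> e.dept, Collectors.counting()));
  }

  // Downstream maxBy(): each group is reduced to its highest-paid member (D = Optional<Emp>).
  static Map<String, Optional<Emp>> topEarnerByDept() {
    return EMPS.stream()
        .collect(Collectors.groupingBy(e -> e.dept,
            Collectors.maxBy(Comparator.comparingDouble((Emp e) -> e.salary))));
  }

  public static void main(String[] args) {
    System.out.println(headcountByDept());
    topEarnerByDept().forEach((dept, emp) ->
        System.out.println(dept + " -> " + emp.map(e -> e.name).orElse("none")));
  }
}
```

In both methods the Map's value type is whatever the downstream collector produces, which is the type D in the signature above.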

How variant#1 and variant#2 of grouping collector are closely related
In the Collectors class’ code, the first variant of grouping Collector which accepts just the classification function as input does not itself return the Collector which processes the Stream elements. Instead, internally it delegates the call forward to the second variant with the call – groupingBy(classifier, toList()). So, the first variant of grouping collector is just a convenient way of invoking the second variant with the downstream collector ‘hardcoded’ as Collectors.toList().

Let us now see the 2nd variant of grouping collector in action with a Java code example.

Variant #2 of grouping collector – Java Example
This example for variant#2 uses the same use case of employees being grouped as per their department but this time instead of storing the grouped elements in a List, we will instead store them inside a Set in the resultant Map.
(Note – The Employee class and employeeList objects with their values remain the same as the previous code usage example and hence are not shown below for brevity.)

Java 8 code example for VARIANT #2 of Collectors.groupingBy()
  // Additionally requires: import java.util.Set;
  public static void main(String args[]){
    Map<Department,Set<Employee>> employeeMap
      = employeeList.stream()
        .collect(Collectors.groupingBy(Employee::getDepartment, Collectors.toSet()));
    System.out.println("Employees grouped by department");
    employeeMap.forEach((Department key, Set<Employee> empSet) -> System.out.println(key +" -> "+empSet));
    }
 OUTPUT of the above code
Employees grouped by department
HR -> [Employee Name:Catherine Jones]
LEGAL -> [Employee Name:Harry Major, Employee Name:Ethan Hardy]
OPERATIONS -> [Employee Name:James Elliot, Employee Name:Michael Reeves]
MARKETING -> [Employee Name:Tom Jones, Employee Name:Nancy Smith, Employee Name:Frank Anthony]
Explanation of the code

  • The code above is ‘nearly’ the same as the code for 1st variant of grouping collector. The main difference is that Collectors.groupingBy() method is now passed a second parameter – Collectors.toSet() – which tells the grouping collector to collect the grouped values in individual Sets.
  • The output with employees grouped in Sets looks the same as 1st variant’s output as individual set elements are enclosed between square brackets – ‘[]’ – just like they were for Lists. But, if you look closely at the code then you will find that the employeeMap.forEach() method call now has a Set<Employee> specified as the type of value rather than a List which was the case in the 1st variant. Note that Collectors.toSet() makes no guarantee about the order of elements, so the order of employees within a group may differ from the one shown.

Variant #3 of Collectors.groupingBy() – with user specified Supplier function for Map creation and Collector to collect grouped elements
The 1st variant always returned a List containing the elements of a group, and the 2nd variant added the flexibility to specify how the grouped elements need to be collected. The 3rd variant adds the capability to specify how the Map which holds the result is created. So, using the 3rd variant of grouping Collector it can be specified whether the resultant Map containing the grouped values is a HashMap or a TreeMap, or some user specified type of Map.

The 3rd variant of grouping collector is defined with the following signature –

public static <T, K, D, A, M extends Map<K, D>> Collector<T, ?, M> groupingBy(Function<? super T, ? extends K> classifier, Supplier<M> mapFactory, Collector<? super T, A, D> downstream)

Where,
     – 1st input parameter is classifier which is an instance of a Function functional interface which converts from type T to type K.
     – 2nd input parameter is Supplier<M> which is a factory supplying Maps of type M.
     – 3rd input parameter is downstream collector which collects the grouped elements into type D, where D is the result type produced by the downstream collector's finisher.
     – output is a Collector with finisher (return type) as M, a Map with entries having ‘key,value’ pairs as ‘K, D’

How variant#2 and variant#3 of grouping collector are closely related
In the Collectors class’ code, the second variant of grouping Collector which accepts the classification function along with downstream collector as input does not itself return the collector which processes the stream elements. Instead, internally it delegates the call forward to the third variant with the call – groupingBy(classifier, HashMap::new, downstream);. So, second variant of grouping collector is thus just a convenient way of invoking the third variant with the Map factory Supplier ‘hardcoded’ as HashMap::new.

Going back a bit, we said something similar about the first and second groupingBy() variants as well. Thus, we actually have a transitive kind of relationship between the three variants. Variant #1 calls variant #2 with downstream collector hardcoded, and variant #2 calls variant #3 with Map Supplier factory hardcoded. Inferring transitively, we can now say that variant #1 actually calls variant #3 with both the downstream collector and Map Supplier factory hardcoded.

Fortunately, the transitive offloading/delegation between variants ends at variant #3 which actually contains the entire collector logic for a grouping collector.
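The delegation chain described above can be observed from the outside with a short sketch. The class name GroupingDelegationSketch, its method names and the sample words are made up for illustration. The three methods call the three groupingBy() variants with the defaults the text says are hardcoded (toList() and HashMap::new), and the results are expected to be equal maps.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class used only for this illustration.
class GroupingDelegationSketch {

  static final List<String> WORDS = Arrays.asList("ant", "bee", "crow", "deer", "eagle");

  // Variant #1: classifier only.
  static Map<Integer, List<String>> variant1() {
    return WORDS.stream().collect(Collectors.groupingBy(String::length));
  }

  // Variant #2: classifier plus the downstream collector that variant #1 hardcodes.
  static Map<Integer, List<String>> variant2() {
    return WORDS.stream()
        .collect(Collectors.groupingBy(String::length, Collectors.toList()));
  }

  // Variant #3: classifier, the map factory that variant #2 hardcodes, and the downstream collector.
  static Map<Integer, List<String>> variant3() {
    return WORDS.stream()
        .collect(Collectors.groupingBy(String::length, HashMap::new, Collectors.toList()));
  }

  public static void main(String[] args) {
    System.out.println(variant1().equals(variant2())); // prints "true"
    System.out.println(variant2().equals(variant3())); // prints "true"
  }
}
```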

Let us now see a Java code example showing how to use the 3rd variant of grouping collector.

Variant #3 of grouping collector – Java Example
This example for variant #3 uses the same use case of employees being grouped as per their department. However, this time we will store the grouped elements in a Set and tell the grouping collector to store the grouped employees in a TreeMap instance instead of the default HashMap instance that was internally hardcoded in variant #2.
(Note – The Employee class and employeeList objects with their values remain the same as the previous code usage example and hence are not shown below for brevity.)

Java 8 code example for VARIANT #3 of Collectors.groupingBy()
  // Additionally requires: import java.util.Set; import java.util.TreeMap;
  public static void main(String args[]){
    Map<Department,Set<Employee>> employeeMap
      = employeeList.stream()
        .collect(Collectors.groupingBy(Employee::getDepartment, TreeMap::new, Collectors.toSet()));
    System.out.println("Employees grouped by department");
    employeeMap.forEach((Department key, Set<Employee> empSet) -> System.out.println(key +" -> "+empSet));
    }
 OUTPUT of the above code
Employees grouped by department
HR -> [Employee Name:Catherine Jones]
OPERATIONS -> [Employee Name:James Elliot, Employee Name:Michael Reeves]
LEGAL -> [Employee Name:Harry Major, Employee Name:Ethan Hardy]
MARKETING -> [Employee Name:Tom Jones, Employee Name:Nancy Smith, Employee Name:Frank Anthony]
Explanation of the code

  • The code above is ‘nearly’ the same as the code for 2nd variant of grouping collector. The main difference is that Collectors.groupingBy() method is now passed an additional parameter – TreeMap::new – which tells the grouping collector to collect the grouped values in an instance of a TreeMap.
  • The output with employees grouped in Sets corresponding to their departments is similar to what we saw in the Java examples for 1st and 2nd variants. However, this time the departments, which are the keys of the result Map, are guaranteed to appear in a fixed, sorted order. This is because a TreeMap sorts its entries based on the natural ordering of its keys, and the natural ordering of an enum is the order in which its constants are declared (HR, OPERATIONS, LEGAL, MARKETING), not the alphabetical order of their names.
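How a TreeMap orders the group keys depends on the key type. The sketch below is an illustration only: the class TreeMapKeyOrderSketch and its methods are made up, and the Department enum is redeclared locally with the same constants as in the tutorial. It groups the same values once by the enum constant and once by the constant's name as a String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical class used only for this illustration.
class TreeMapKeyOrderSketch {

  // Same constants, in the same declaration order, as the tutorial's Department enum.
  enum Department { HR, OPERATIONS, LEGAL, MARKETING }

  static final List<Department> DEPTS = Arrays.asList(
      Department.MARKETING, Department.LEGAL, Department.OPERATIONS, Department.HR);

  // Enum keys: TreeMap compares with Enum.compareTo, i.e. by declaration order.
  static List<Department> enumKeyOrder() {
    Map<Department, Long> byEnum = DEPTS.stream()
        .collect(Collectors.groupingBy(d -> d, TreeMap::new, Collectors.counting()));
    return new ArrayList<>(byEnum.keySet());
  }

  // String keys: TreeMap compares with String.compareTo, i.e. lexicographically.
  static List<String> nameKeyOrder() {
    Map<String, Long> byName = DEPTS.stream()
        .collect(Collectors.groupingBy(Department::name, TreeMap::new, Collectors.counting()));
    return new ArrayList<>(byName.keySet());
  }

  public static void main(String[] args) {
    System.out.println(enumKeyOrder()); // prints [HR, OPERATIONS, LEGAL, MARKETING]
    System.out.println(nameKeyOrder()); // prints [HR, LEGAL, MARKETING, OPERATIONS]
  }
}
```

If alphabetical ordering of departments is what is needed, grouping by the name, or supplying a TreeMap constructed with a suitable Comparator, are two possible options.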

Concurrent versions of grouping collector
We saw three groupingBy() method variants above which are good but not optimized for concurrent execution. In case you want to execute grouping collectors in a concurrent manner in a multi-threaded execution environment, then you can utilize the three overloaded methods in java.util.stream.Collectors class, all of which are named groupingByConcurrent(). These three concurrent methods take the same input parameters as their non-concurrent counterparts, except that they produce a ConcurrentMap rather than a plain Map (and the map factory passed to the three-argument variant must accordingly supply a ConcurrentMap). Their usage, apart from being used in concurrent contexts, is otherwise the same as described above.
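A minimal sketch of the single-argument groupingByConcurrent() on a parallel stream follows. The class name ConcurrentGroupingSketch, the method name and the sample words are illustrative assumptions. The result is declared as a ConcurrentMap, which is the map type this collector produces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

// Hypothetical class used only for this illustration.
class ConcurrentGroupingSketch {

  // Groups words by length; multiple threads may insert into the same ConcurrentMap.
  static ConcurrentMap<Integer, List<String>> groupByLength(List<String> words) {
    return words.parallelStream()
        .collect(Collectors.groupingByConcurrent(String::length));
  }

  public static void main(String[] args) {
    List<String> words = Arrays.asList("ant", "bee", "crow", "deer", "eagle");
    // The order of elements inside each list is not guaranteed with this collector.
    System.out.println(groupByLength(words));
  }
}
```

Because the concurrent grouping collector is unordered, the elements within each group may not appear in encounter order, so code that depends on the order within a group should not rely on it.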

Conclusion
In the above tutorial we understood what the concept of grouping in the context of collectors entails, looked at the three grouping collector variants, understood their definition and working in depth using diagrams, code examples, and then saw how the three variants of groupingBy() methods are closely interlinked. Lastly, we touched upon the concurrent grouping by collectors as well.

 

Copyright © 2014-2017 JavaBrahman.com, all rights reserved.