Understanding equals and hashcode contract when using collections in Java

This tutorial explains the contract defined for overriding java.lang.Object’s equals() and hashcode() methods and its importance when working with objects in a Java collection. It is this contract which lays down exactly how the two methods need to work together to allow objects to be correctly added and removed in Java collections.

Tutorial starts off with creating a HashSet instance and stores POJOs of Employee class in it. It first shows how, in the absence of overridden in-agreement hashCode() and equals() methods, it’s simply not possible to accurately remove an Object from the HashSet based on the uniqueness of the attribute values of the object.

The tutorial then implements equals() and hashCode() methods for Employee, along with explaining the mechanism of hashing based on which hashed collections store objects in memory, to show a correctly working HashSet instance. Lastly, it goes through the official contract for equals() and hashCode() methods defined in Object class.

Example showing HashSet being used without equals() and hashCode() overridden
Let us first take a look at the code for Employee POJO with 2 attributes name & age, and another set of code which uses the Employee as the type for a HashSet.

Code showing HashSet usage without overriding equals() & hashCode()
//Employee.java POJO class
package com.javabrahman.corejava;
public class Employee {
  private String name;
  private Integer age;

  public Employee(String name, Integer age) {
    this.name = name;
    this.age = age;
  }

  //Getters and Setters for name and age go here

  public String toString() {
    return "Employee Name:" + this.name
        + "  Age:" + this.age;
  }
}

//Class using Employee POJO in a HashSet
package com.javabrahman.corejava;
import java.util.HashSet;
public class EqualsNHashCode {
  public static void main(String args[]){
    HashSet<Employee> empHashSet=new HashSet<>();
    empHashSet.add(new Employee("David", 32));
    empHashSet.add(new Employee("John", 45));
    empHashSet.forEach(System.out::println);
    boolean isRemoved=empHashSet.remove(new Employee("David", 32));
    System.out.println("David is removed: "+isRemoved);
    empHashSet.forEach(System.out::println);
  }
}
 OUTPUT of the above code
Employee Name:David Age:32
Employee Name:John Age:45
David is removed: false
Employee Name:David Age:32
Employee Name:John Age:45
In the above code, we create 2 Employee instances named David and John. We then print the 2 employees using a forEach() method. Next, we remove David from the HashSet using a newly instantiated Employee object which has the same name and age – ‘David’, 32.

We then check whether David was removed. false is returned by the HashSet.remove() method indicating that David wasn’t removed from the HashSet. This is confirmed when we print the employees again using forEach() method, resulting in David and John both getting printed. So – What went wrong? Why was david not removed from the hashset?

What went wrong?
We tried to remove David by passing another instance of Employee with same name and age. HashSet internally uses Object.equals() to find the matching object it is trying to remove. Since equals() is not overridden in Employee class, equals() implementation being used by the HashSet is the default one defined in Object class which compares the memory references of the two objects its equating. As the Employee object holding David in the HashSet and the Employee object passed to the HashSet.remove() method are different, their memory adddresses will be different, and hence they are bound to fail the equals() test. Let us confirm this understanding by modifying our earlier code and adding some debugging code to it –

HashSet is unable to equate Employee objects for removal
package com.javabrahman.corejava;
import java.util.HashSet;
public class EqualsNHashCode {
  public static void main(String args[]){
    HashSet<Employee> empHashSet=new HashSet<>();
    Employee David = new Employee("David", 32);
    Employee John = new Employee("John", 45);
    empHashSet.add(David);
    empHashSet.add(John);
    empHashSet.forEach(System.out::println);
    Employee DavidForRemoval=new Employee("David", 32);
    System.out.println("----David's Objects are equal? "+David.equals(DavidForRemoval));
    boolean isRemoved=empHashSet.remove(DavidForRemoval);
    System.out.println("David is removed: "+isRemoved);
    empHashSet.forEach(System.out::println);
  }
}
 OUTPUT of the above code
Employee Name:David Age:32
Employee Name:John Age:45
—-David’s Objects are equal? false
David is removed: false
Employee Name:David Age:32
Employee Name:John Age:45
In the above output, you can see that the two objects we created – David(which we stored in HashSet) and DavidForRemoval which we passed to the HashSet.remove() method are indeed not equal. To solve this problem let us now go ahead and override equals() method in Employee class.
equals() code added to Employee.java
  @Override
  public boolean equals(Object obj) {
    if (obj == this) {
      return true;
    }
    if (!(obj instanceof Employee)) {
      return false;
    }
    Employee empObj = (Employee) obj;
    return this.age == empObj.age
        && this.name.equalsIgnoreCase(empObj.name);
  }
Let us now run the earlier code again and the output this time, with hashCode() implemented, is as follows –
Employee Name:David Age:32
Employee Name:John Age:45
—-David’s Objects are equal? true
David is removed: false
Employee Name:David Age:32
Employee Name:John Age:45
In the above output, the 2 Employee instances for David named David and DavidForRemoval are now found ‘equal’ using the equals() method. However, still the remove call for David didn’t work. As you might have guessed by now, the reason for non-removal of David lies in the fact that their hashcodes don’t match. To verify this understanding let me quickly show you what is printed when we print the hashcodes of David and DavidForRemoval.
Checking the default hashcode values for Employee objects
//rest of the code is same as earlier
System.out.println("----David's hashcode: "+David.hashCode());
System.out.println("----John's hashcode: "+John.hashCode());
System.out.println("----DavidForRemoval's hashcode: "+DavidForRemoval.hashCode());
 OUTPUT of the above code
—-David’s hashcode: 685325104
—-John’s hashcode: 460141958
—-DavidForRemoval’s hashcode: 1078694789
The above output confirms what we had deduced – hashcodes for variable David(685325104) and variable DavidForRemoval(1078694789) are different. So, the problem at hand now can be summarized like this – HashSet ends up searching for the stored David object in the wrong bucket of the hashtable and so is unable to find and remove it, despite the two objects being equal (using equals() i.e.).

To understand the problem with hashcodes, we will first need to understand a little bit about hashing, and the role played by hashcodes in hashing.

Understanding Hashing

How hashing stores objects in Hashed Collections in Java

Hashing, as you can see in the diagram above, is a mechanism used to distribute data into multiple buckets inside a hashed collection, with each bucket identified by a hashcode. So, the hashcode returned by an object determines which bucket the object is stored in.

When searching for an object to be removed using an ‘equal but different object’, such as David and DavidForRemoval in our case, HashSet needs to search for an object which is equal to DavidForRemoval in the same bucket in which David was stored. Otherwise, as in our case, with the hashcodes of the 2 objects being different, HashSet ends up searching in the wrong bucket and is unable to find the object. So in our case, HashSet was searching in the bucket identified by DavidForRemoval’s hashcode(1078694789) while David was stored in a bucket identified by a different hashcode(685325104). The search was destined to fail.

So, let us now override hashCode() method as well in Employee in order to get the same hashcode value for equal objects i.e. objects with same value of attributes.

Overriden hashCode() in Employee.java
  @Override
  public int hashCode() {
    int hash = 1;
    hash = hash * 17 + this.name.hashCode();
    hash = hash * 31 + this.age;
    return hash;
  }
Let us now run the earlier code again, the one with all the debugging code. The output comes out to be as follows –
—-David’s hashcode: 2039983707
—-John’s hashcode: 71751281
Employee Name:John Age:45
Employee Name:David Age:32
—-DavidForRemoval’s hashcode: 2039983707
—-David’s Objects are equal? true
David is removed: true
Employee Name:John Age:45
This time David was removed from the HashSet succesfully. The hashcode for David and DavidForRemoval was same this time – 2039983707. As a result, HashSet was able to search in the correct bucket for David’s Employee instance. And, given that the two objects could be equated correctly using equals(), HashSet was able to find David’s originally stored object, which was equal to DavidForRemoval, and remove it.

Having seen hashCode() and equals() methods in action, and how their correct implementation together is a basic requirement when working with hashed collections, let us now take a look at the ‘official’ contract defined in Object class for these methods.

Contracts of equals() and hashCode() as defined in Object API
Contract of equals() method
(Note – The rules below are applicable for non-null reference values x and y.)

  • It is reflexive. x.equals(x) should return true.
  • It is symmetric.x.equals(y) should return true if and only if y.equals(x) returns true.
  • It is transitive.If x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
  • It is consistent. Multiple invocations of x.equals(y) consistently return true or consistently return false.
  • For any non-null reference x, x.equals(null) should return false.

Contract of hashCode() method

  • On every invocation, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. The hashcode value need not be same for different execution environments.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.

(Note – The contracts above are simplified versions of the ones specified in official Object API1).

Key takeaways from the contracts of equals() and hashCode() methods

  1. Hashcodes for equal objects(which return true when compared using equals()) should be same.
  2. Hashcodes for unequal objects can be same. I.e. there is a one-to-many relationship between hashcodes and objects. (This is in line with the concept of hashing we saw above- as a particular hashcode can have multiple objects in its bucket).

Summary
In the above tutorial we understood the key concepts driving equals() and hashCode() usage in hashed collections in Java. We took an example in which equals() and hashCode() weren’t agreeing to see how object access and removal wouldn’t work properly in such cases. We then implemented the two methods in the POJO class we were storing in the HashSet as per their defined contract, and saw that object access and removal did work fine with the change.


1. Official Java API for Object class

 

Digiprove sealCopyright © 2014-2017 JavaBrahman.com, all rights reserved.