This tutorial explains the contract defined for overriding
hashcode() methods and its importance when working with objects in a Java collection. It is this contract which lays down exactly how the two methods need to work together to allow objects to be correctly added and removed in Java collections.
Tutorial starts off with creating a
HashSet instance and stores POJOs of
Employee class in it. It first shows how, in the absence of overridden in-agreement
equals() methods, it’s simply not possible to accurately remove an
Object from the
HashSet based on the uniqueness of the attribute values of the object.
The tutorial then implements
hashCode() methods for
Employee, along with explaining the mechanism of hashing based on which hashed collections store objects in memory, to show a correctly working
HashSet instance. Lastly, it goes through the official contract for
hashCode() methods defined in
HashSet being used without
Let us first take a look at the code for
Employee POJO with 2 attributes
age, and another set of code which uses the
Employee as the type for a
John. We then print the 2 employees using a
forEach()method. Next, we remove
HashSetusing a newly instantiated
Employeeobject which has the same name and age – ‘
We then check whether
David was removed.
false is returned by the
HashSet.remove() method indicating that
David wasn’t removed from the
HashSet. This is confirmed when we print the employees again using
forEach() method, resulting in David and John both getting printed. So – What went wrong? Why was david not removed from the hashset?
What went wrong?
We tried to remove
David by passing another instance of
Employee with same name and age.
HashSet internally uses
Object.equals() to find the matching object it is trying to remove. Since
equals() is not overridden in
equals() implementation being used by the
HashSet is the default one defined in
Object class which compares the memory references of the two objects its equating. As the
Employee object holding David in the
HashSet and the
Employee object passed to the
HashSet.remove() method are different, their memory adddresses will be different, and hence they are bound to fail the
equals() test. Let us confirm this understanding by modifying our earlier code and adding some debugging code to it –
David(which we stored in
DavidForRemovalwhich we passed to the
HashSet.remove()method are indeed not equal. To solve this problem let us now go ahead and override
Let us now run the earlier code again and the output this time, with
hashCode()implemented, is as follows –
In the above output, the 2
Employeeinstances for David named
DavidForRemovalare now found ‘equal’ using the
equals()method. However, still the remove call for David didn’t work. As you might have guessed by now, the reason for non-removal of David lies in the fact that their hashcodes don’t match. To verify this understanding let me quickly show you what is printed when we print the hashcodes of
OUTPUT of the above code The above output confirms what we had deduced – hashcodes for variable
685325104) and variable
1078694789) are different. So, the problem at hand now can be summarized like this –
HashSetends up searching for the stored
Davidobject in the wrong bucket of the hashtable and so is unable to find and remove it, despite the two objects being equal (using
To understand the problem with hashcodes, we will first need to understand a little bit about hashing, and the role played by hashcodes in hashing.
Hashing, as you can see in the diagram above, is a mechanism used to distribute data into multiple buckets inside a hashed collection, with each bucket identified by a hashcode. So, the hashcode returned by an object determines which bucket the object is stored in.
When searching for an object to be removed using an ‘equal but different object’, such as
DavidForRemoval in our case,
HashSet needs to search for an object which is equal to
DavidForRemoval in the same bucket in which
David was stored. Otherwise, as in our case, with the hashcodes of the 2 objects being different,
HashSet ends up searching in the wrong bucket and is unable to find the object. So in our case, HashSet was searching in the bucket identified by
David was stored in a bucket identified by a different hashcode(
685325104). The search was destined to fail.
So, let us now override
hashCode() method as well in
Employee in order to get the same hashcode value for equal objects i.e. objects with same value of attributes.
Davidwas removed from the
HashSetsuccesfully. The hashcode for
DavidForRemovalwas same this time –
2039983707. As a result,
HashSetwas able to search in the correct bucket for David’s
Employeeinstance. And, given that the two objects could be equated correctly using
HashSetwas able to find David’s originally stored object, which was equal to
DavidForRemoval, and remove it.
equals() methods in action, and how their correct implementation together is a basic requirement when working with hashed collections, let us now take a look at the ‘official’ contract defined in
Object class for these methods.
hashCode() as defined in Object API
(Note – The rules below are applicable for non-null reference values x and y.)
- It is reflexive. x.equals(x) should return true.
- It is symmetric.x.equals(y) should return true if and only if y.equals(x) returns true.
- It is transitive.If x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
- It is consistent. Multiple invocations of x.equals(y) consistently return true or consistently return false.
- For any non-null reference x, x.equals(null) should return false.
- On every invocation, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. The hashcode value need not be same for different execution environments.
- If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
- It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.
(Note – The contracts above are simplified versions of the ones specified in official Object API1).
Key takeaways from the contracts of
- Hashcodes for equal objects(which return true when compared using equals()) should be same.
- Hashcodes for unequal objects can be same. I.e. there is a one-to-many relationship between hashcodes and objects. (This is in line with the concept of hashing we saw above- as a particular hashcode can have multiple objects in its bucket).
In the above tutorial we understood the key concepts driving
hashCode() usage in hashed collections in Java. We took an example in which
hashCode() weren’t agreeing to see how object access and removal wouldn’t work properly in such cases. We then implemented the two methods in the POJO class we were storing in the
HashSet as per their defined contract, and saw that object access and removal did work fine with the change.