Some background on equality and hash code overrides in Java

With just a couple of clicks, your favorite Java integrated development environment (IDE) can generate equals() and hashCode() overrides for your Java class. But do you understand what is generated, line by line, and why?

I understood most of what NetBeans generates for equals(), but there was one detail that had me puzzled. I was going to ask a question about it on one of those question forums.

But, as I typed my question, I figured out the answer, and decided to write it up as an article here instead. I could also have asked the question on the NetBeans mailing list, but it’s also applicable to the other major Java IDEs.

I thought I should go over some very basic background information that would have been considered unnecessary on the question forum. But then the article turned out to be so long, I broke it up into three articles.

In this first article, I’m just going to give background information. In the next article, I will look at generated hashCode() overrides, line by line. And in another article, I will look at generated equals() overrides, also line by line.

Some very basic background information

In Java, Object is the top-level superclass of all other classes, whether those classes come from the Java Development Kit (JDK), from you, or from a third party developer.

Thus there is never any need to write “extends Object” in a class declaration, but you can if you really want to (your IDE might issue a warning, though).

The Object class includes equals() and hashCode(), and other functions that we’re not interested in at the moment. Therefore, all classes in Java are guaranteed to have equals() and hashCode().

With equals(), we can see if one object is equal to another. And hashCode() just gives a number that is hopefully unique, or at least unique enough to distinguish one object from other objects in the currently running program.

In particular, hash codes are useful to hash table data structures like HashMap. Such data structures generally provide fast access to elements, at a small cost in overhead.

It is my understanding that hash table data structures use numbered buckets to store their elements. Then, to see if a particular element is contained, or to retrieve it, the program needs to look in only one bucket instead of potentially having to look in every bucket.

But how does the hash table program know what bucket to put an element in? One way out of a few different possibilities would be to use the element’s hash code modulo the number of buckets.

Suppose for example we want to add two objects to a hash table with 128 buckets. One element has hash code 148, the other 1212436. Divided by 128, both of those numbers leave a remainder of 20. We could write 148 ≡ 20 mod 128, and 1212436 ≡ 20 mod 128.

Then both elements get put in bucket 20. Now, what bucket should an element with hash code −20 go into? I think it should go into bucket 108, not bucket 20. But such implementation details should be encapsulated so that we needn’t be too concerned by them.
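The arithmetic above can be checked with a few lines of Java. This is only a minimal sketch with a hypothetical helper, bucketFor(), that places an element by its hash code modulo the number of buckets; it is not a claim about how HashMap actually chooses buckets. Java’s % operator keeps the sign of the dividend, so the sketch uses Math.floorMod() to get bucket 108 for hash code −20.

```java
// Hypothetical helper for illustration; not how java.util.HashMap picks buckets.
class BucketIndexDemo {

    // Math.floorMod() returns a result from 0 to bucketCount - 1
    // when bucketCount is positive, even for negative hash codes.
    static int bucketFor(int hashCode, int bucketCount) {
        return Math.floorMod(hashCode, bucketCount);
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(148, 128));     // 20
        System.out.println(bucketFor(1212436, 128)); // 20
        System.out.println(bucketFor(-20, 128));     // 108
        System.out.println(-20 % 128);               // -20, not a usable bucket number
    }
}
```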

It is important to understand that in most cases the hash code is not a two-way function, meaning that even if you have a hash code and know the type of the object that generated the hash code, it’s usually not possible to reconstruct the original object from just the hash code.

In Java, hashCode() returns an int, a 32-bit signed integer primitive. This means there are 2³² possible hash codes, but for some types there are more than 2³² possible objects.

For such types, unique hash codes are impossible. We have to settle for hash codes that are unique enough for our actual use cases.

A simple example: String. If we iterate Integer.toString(n) for n from Integer.MIN_VALUE to Integer.MAX_VALUE, we would have created 2³² distinct instances of String, none of which contain any letters or other non-digit characters apart from the dash as a minus sign. That’s already as many strings as there are possible hash codes, before we count even a single string containing a letter.

Not that we’d have any practical need to do that. The important thing is that the String hash code function should give us hash codes that are likely to be unique for the instances of String we need to store in a data structure.

There are some types for which it is possible to guarantee hash code uniqueness, like for example the object wrapper Integer for the primitive int: the hash code is just the unboxed value.

Because hash code uniqueness can’t be guaranteed for most types, it may be necessary to do an equality check upon finding an object with a matching hash code, to be certain the object has been identified correctly.
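Both situations are easy to see with classes from the JDK. The sketch below uses a hypothetical helper, collide(), to check for two objects that share a hash code but are not equal. The strings "Aa" and "BB" are a well-known example of such a pair.

```java
class HashCodeUniquenessDemo {

    // True if the two objects share a hash code yet are not equal.
    static boolean collide(Object a, Object b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        // The hash code of an Integer is the wrapped int value itself.
        System.out.println(Integer.valueOf(-7).hashCode()); // -7

        // "Aa" and "BB" are different strings with the same hash code.
        System.out.println("Aa".hashCode());     // 2112
        System.out.println("BB".hashCode());     // 2112
        System.out.println(collide("Aa", "BB")); // true
    }
}
```

This is why a matching hash code alone is not enough to say that two objects are the same.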

The performance cost of an equality check on a single element should be negligible compared to the performance cost of equality checks on many or all elements of a given collection.

To determine if one object is equal to another, it may be necessary to compare them field by field. Determining the hash code of an object might also involve doing something with each field an object has.

However, the equals() and hashCode() in Object can’t really do anything with the properties that are specific to any of its subclasses. They should not be expected to; they were never meant to.

Because the equals() in Object might not suit the needs of a subclass, you can override that equals() by defining your own equals() in your class. Your equals() override can use the pertinent fields of the subclass to determine the correct result. Much the same goes for hashCode().

Here’s a toy example of a hash code override for a class called SomeSubClass, with just two primitive numeric fields:

    @Override
    public int hashCode() {
        return this.fieldA * this.fieldB;
    }

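The override annotation is not strictly required, and the compiler won’t complain if you leave it out, though your IDE might give a warning. It’s still worth including: with the annotation in place, the compiler reports an error if the function doesn’t actually override anything, such as when the name is misspelled.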

Maybe the designers of Java wish they had made “override” a reserved word. In both Scala and Kotlin, “override” is a keyword of the language, which the compiler requires when you override a concrete function.

At runtime, when the Java Virtual Machine (JVM) needs the hash code for an object of class SomeSubClass and that class has hashCode() overridden, that’s the hashCode() function the JVM will use.

Let’s say instead SomeSubClass doesn’t override hashCode(), but it extends SomeClass, which does have a hashCode() override. Then the JVM will use hashCode() from SomeClass.

But if SomeClass, a direct descendant of Object (explicitly or implicitly), doesn’t have hashCode() either, then the JVM will use the hashCode() function from Object.
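The lookup described in the last few paragraphs can be sketched as follows. The classes here are deliberately trivial, NoOverrideClass is a name made up for this sketch, and the constant hash code 42 is for illustration only. The last line relies on the documented fact that System.identityHashCode() gives the same number as the hashCode() in Object would.

```java
class SomeClass {
    @Override
    public int hashCode() {
        return 42; // deliberately trivial, for illustration only
    }
}

class SomeSubClass extends SomeClass {
    // No hashCode() here, so the override in SomeClass is used.
}

class NoOverrideClass {
    // No hashCode() here or in any superclass other than Object.
}

class HashCodeLookupDemo {
    public static void main(String[] args) {
        System.out.println(new SomeSubClass().hashCode()); // 42

        NoOverrideClass plain = new NoOverrideClass();
        // Without any override, hashCode() from Object is used.
        System.out.println(plain.hashCode() == System.identityHashCode(plain)); // true
    }
}
```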

For a somewhat more realistic example, consider an Account class to represent bank accounts. This example Account class is extended by a CheckingAccount class and a SavingsAccount class.

You probably want to know if two accounts are the same account in order to avoid pointless transactions, like transferring money from one account to the very same exact account.

Maybe hashCode() doesn’t seem all that useful in the Account example, since presumably each account has a unique account number. But a hash code can be useful for storing an object in a data structure. So it might not be too much trouble to simply use the account number as a hash code.

The expectations for hashCode() are closely related to the expectations for equals(). For one thing, objects said to be equal should have the same hash code. And objects of the same class that are said to be different should, as far as is practical, have different hash codes, even though that can’t always be guaranteed.

If you see a need for equals() but not hashCode(), you may think it efficient to only override equals() and rely on a superclass hashCode(). But that would probably frustrate the expectations users of your class have for the equality and hash code correlation.

A set of expectations is often termed a “contract.” That, as a matter of fact, is the terminology used in the Javadoc for Object.

In order to uphold the contract for equals() and hashCode(), if you override one, you have to override the other one also.
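To see what can go wrong, here is a small sketch with two hypothetical classes, EqualsOnlyTicket and Ticket, which are not part of the bank accounts example. The first overrides only equals(), the second overrides both. A HashSet is used because it depends on the hash code to decide where to look.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class that overrides equals() but not hashCode().
class EqualsOnlyTicket {
    final int number;

    EqualsOnlyTicket(int number) { this.number = number; }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof EqualsOnlyTicket
                && ((EqualsOnlyTicket) obj).number == this.number;
    }
}

// Same idea, with hashCode() overridden to match equals().
class Ticket {
    final int number;

    Ticket(int number) { this.number = number; }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof Ticket && ((Ticket) obj).number == this.number;
    }

    @Override
    public int hashCode() {
        return this.number;
    }
}

class ContractDemo {
    public static void main(String[] args) {
        Set<EqualsOnlyTicket> broken = new HashSet<>();
        broken.add(new EqualsOnlyTicket(7));
        // Almost certainly prints false: the equal ticket gets a different
        // hash code from Object, so the set looks in the wrong place.
        System.out.println(broken.contains(new EqualsOnlyTicket(7)));

        Set<Ticket> working = new HashSet<>();
        working.add(new Ticket(7));
        System.out.println(working.contains(new Ticket(7))); // true
    }
}
```

The first result is not strictly guaranteed to be false, since two hash codes from Object could coincide by chance, but a class that behaves this way can’t be relied on in hash-based collections.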

However, it may happen that an object of one class gets the same hash code as an object of a different class, even though they are said to be unequal. Most likely that won’t be a problem.

Another expectation is consistency. If we get the hash code of an object at one point during execution, and then get the hash code of the object again later on during the same run of the program, but the object has not changed in any way, the hash code should be the same as before.

Whether you want to override equals() and hashCode() in Account or in CheckingAccount or SavingsAccount is your call. Your decision will depend in great part on how you design these classes.

Even if Account is an abstract class, it might make sense to override equals() and hashCode() in it rather than in CheckingAccount or SavingsAccount. This is a point I’ll come back to later.

Having the IDE write the overrides for you

If you’re working on the source for the class in a plaintext editor like Notepad or Vim rather than in an IDE, you’ll have to figure out what the steps are and what order to do them in to properly override hashCode() and equals().

But in an IDE, you can invoke a dialog box, select the fields you want to use for equals() and hashCode() and then let the IDE write the whole thing for you. The procedure is slightly different in the three major IDEs:

  • In Eclipse 2019–09 R, use the menu Source > Generate hashCode() and equals()… to invoke the dialog box (there might be other ways to get to the dialog box). Then, in the dialog box, choose the fields you want. Eclipse gives some options for source placement and also a couple of very consequential details I’ll cover in Part 2 and Part 3 of this article.
  • In NetBeans 11.1, use the menu Source > Insert Code… to bring up the Generate dialog box (there’s one or two other ways to get to this one). From the Generate dialog box, choose equals() and hashCode()… to bring up a dialog box in which to choose the fields to use.
  • In IntelliJ 2019.1.1, use the menu Code > Generate… to bring up the Generate dialog box (there’s one or two other ways to get to this one). From the Generate dialog box, choose equals() and hashCode() to bring up a wizard that asks a couple of things before asking you to choose the fields to use.

Caveat: I don’t expect the steps above to change in any significant way, but I can’t guarantee they’ll stay exactly the same either.

Even with the chosen fields named the same, these three IDEs will generate equals() and hashCode() differently. But most of the differences are likely to be cosmetic and inconsequential.

Choosing the relevant fields

In the bank accounts example, it might be enough to use just the account number field, since presumably there’s some mechanism to ensure each account gets a unique account number.

If each customer can have more than one account, looking at the customer name field would not tell us much, in which case we should leave it out of equals() and hashCode().

It would also be a good idea to leave out the balance field, since that’s probably going to be changing all the time, and will occasionally coincide with the balance of other accounts.
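A hand-written sketch of that choice of fields might look like the following. The field names and types are assumptions made for this sketch, and an IDE would generate something different in the details. The point is only that the account number takes part in equals() and hashCode() while the customer name and the balance do not.

```java
import java.util.Objects;

// Hypothetical Account; field names and types are assumptions.
class Account {
    private final String accountNumber; // assumed unique per account
    private final String customerName;  // left out of equals() and hashCode()
    private long balanceInCents;        // left out: changes all the time

    Account(String accountNumber, String customerName, long balanceInCents) {
        this.accountNumber = Objects.requireNonNull(accountNumber);
        this.customerName = customerName;
        this.balanceInCents = balanceInCents;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Account)) {
            return false;
        }
        return this.accountNumber.equals(((Account) obj).accountNumber);
    }

    @Override
    public int hashCode() {
        return this.accountNumber.hashCode();
    }
}

class AccountDemo {
    public static void main(String[] args) {
        Account before = new Account("12345", "Alice", 100_00L);
        Account after = new Account("12345", "Alice", 5_00L);
        System.out.println(before.equals(after));                  // true
        System.out.println(before.hashCode() == after.hashCode()); // true
    }
}
```

The two objects in the demo stand for the same account at different balances, and they are still equal with the same hash code.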

Once you have chosen the fields to base equals() and hashCode() on in the dialog box of an IDE wizard (and other options as applicable), the IDE writes the whole thing for you.

It’s a good idea to review the generated functions. Most people, though, probably just accept what was generated without fully understanding each line, or even glancing over it.

Under the pressure of a deadline, one doesn’t really have time to wonder why an IDE automatically generates equals() and hashCode() the way it does, and not some other way.

But, as part of professional development as a programmer, one should take a moment to learn about what was generated and understand why it was generated the way it was.

An example class: money amounts

I could continue using the bank account example for this article, but there’s the problem that Java doesn’t yet have a standard way for representing money. There actually are competing proposals.

Money involves numbers, specifically real, rational numbers, which offer quite natural examples of equality and inequality.

Also, if the Account hash code is just the account number, it’s not such a good illustration of hash codes for classes needing two or more fields to come up with the hash codes.

An Account object should probably have a transaction history object. Or maybe it should have two transaction history objects, one for pending transactions and the other for completed transactions.

Then maybe a transaction history object would be a data structure that holds Transaction objects. I’m thinking Transaction would be an abstract class that essentially has two fields: transaction time and transaction amount.

It would make sense for the hash code of a Transaction object to be based on the hash codes for the transaction time object and the transaction amount object. I’ll show this “basing” of hash codes later on.

Maybe deposits (represented by the Deposit class that extends Transaction) would be transactions with positive amounts. And maybe withdrawals (represented by the Withdrawal class, which of course also extends Transaction) would be transactions with negative amounts.

This is so that the account balance can be updated simply by using the plus() function in CurrencyAmount. Deposits increase the balance, withdrawals decrease the balance and comment transactions (described below) have no effect.

These requirements would be enforced by the constructors, but they require the money amount class to have some way of comparing money amounts as greater than or less than rather than just equal or not equal.

The standard way would be to have the money amount class T implement the interface Comparable<T>, which means providing the compareTo(T other) function. Maybe I’ll write a separate article about that.
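Just to give the general shape, here is a minimal sketch with a hypothetical, stripped-down class called PlainAmount that ignores currency altogether and stores everything as a count of cents. A real money amount class would need more care than this.

```java
// Hypothetical, stripped-down money amount that ignores currency.
class PlainAmount implements Comparable<PlainAmount> {
    private final long totalCents;

    PlainAmount(long totalCents) {
        this.totalCents = totalCents;
    }

    @Override
    public int compareTo(PlainAmount other) {
        // Negative, zero or positive, as Comparable requires.
        return Long.compare(this.totalCents, other.totalCents);
    }

    // The kind of check a Withdrawal constructor might want to make.
    boolean isNegative() {
        return this.compareTo(new PlainAmount(0)) < 0;
    }
}

class PlainAmountDemo {
    public static void main(String[] args) {
        PlainAmount withdrawal = new PlainAmount(-20_00);
        PlainAmount deposit = new PlainAmount(50_00);
        System.out.println(withdrawal.isNegative());           // true
        System.out.println(deposit.compareTo(withdrawal) > 0); // true
    }
}
```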

The money amount class will also need arithmetic functions, so that our program can add and subtract money amounts, and calculate percentages and permillages. That’s also outside the scope of this article.

However, the money amount class should be immutable, so that a money calculation always results in a new money amount object, even if the amount is the same (e.g., add zero currency units, multiply amount by 1).

In some banking systems there are also comment transactions, which don’t affect any account balances but provide information about transactions that do affect account balances.

In our object-oriented model, comment transactions would probably be implemented as a Comment class that extends Transaction.

An instance of Comment would then always have a transaction amount of zero currency units, so the constructor would only need a comment message (probably a String) and a comment time.

Then perhaps Comment should override the Transaction hash code and equality to be based on the comment message object and the comment time object, and not the transaction amount, which is of course always 0.00.

I’m thinking it might actually be a good idea to use money for this example. What we come up with here won’t be a serious contender for becoming the official Java money API, but it will be of great pedagogical value.

Java does define java.util.Currency, which provides us with a standard way to distinguish the various currencies, e.g., United States dollars, Japanese yen, euros, Swiss francs, etc. So we’ll want to import that into our money class.

We’ll call our money amount class CurrencyAmount and place it in a package called currency. It will import Currency.

    package currency;

    import java.util.Currency;

    public class CurrencyAmount {

        private final long units;
        private final short cents;
        private final Currency currencyID;

        public CurrencyAmount(long singles, short cents,
                Currency currency) {
            this.units = singles;
            this.cents = cents;
            this.currencyID = currency;
        }

    }

You might also want to write a chained constructor that omits the cents parameter and fills it in as 0 for the primary constructor.

Also, you might want to design the primary constructor so that it quietly changes cent overflows to whole units, e.g., 10 dollars and 250 cents gets quietly changed to 12 dollars and 50 cents.
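Those two suggestions could be sketched like this. The sketch leaves out the package declaration and the public modifiers to stay self-contained, the getter names are made up for the example, and the carrying logic assumes non-negative cents and a currency with 100 cents to the unit, which is not true of every currency.

```java
import java.util.Currency;

class CurrencyAmount {
    private final long units;
    private final short cents;
    private final Currency currencyID;

    // Primary constructor. Assumes cents is not negative and that
    // the currency has 100 cents to the unit.
    CurrencyAmount(long singles, short cents, Currency currency) {
        this.units = singles + cents / 100;  // carry whole units out of the cents
        this.cents = (short) (cents % 100);  // keep only the leftover cents
        this.currencyID = currency;
    }

    // Chained constructor: no cents given, so fill in 0.
    CurrencyAmount(long singles, Currency currency) {
        this(singles, (short) 0, currency);
    }

    // Hypothetical getters, named here only for the demo.
    long getUnits() { return this.units; }
    short getCents() { return this.cents; }
    Currency getCurrency() { return this.currencyID; }
}

class CurrencyAmountDemo {
    public static void main(String[] args) {
        Currency dollars = Currency.getInstance("USD");
        CurrencyAmount amount = new CurrencyAmount(10, (short) 250, dollars);
        System.out.println(amount.getUnits()); // 12
        System.out.println(amount.getCents()); // 50
    }
}
```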

Have your IDE generate a JUnit 4 or JUnit 5 test class. I’ll do most of the testing with United States dollars and some other country’s currency. You can choose to use the same currencies or any other currencies.

    package currency;

    import java.util.Currency;
    import static org.junit.Assert.*;

    public class CurrencyAmountTest {
        private static final Currency DOLLARS =
                Currency.getInstance("USD");
        // private static final Currency YEN? EUROS? FRANCS? ETC.
    }

Also create a Transaction class and a couple of associated subclasses, like Deposit and Withdrawal, but place them in a package called transactions or bankaccounts.transactions.

But don’t spend too much time on these, just create the necessary constructors and import the necessary classes (our CurrencyAmount class and a standard Java class for date and time).

The Java date and time API has its share of shortcomings, but for our purposes here it will do just fine.

And back in the currency package, create the runtime (not checked) exception CurrencyConversionNeededException, to throw in case someone tries to do something like deposit yen to an account drawn in francs.

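Exceptions don’t generally need custom equality or hash codes. But, to properly use CurrencyConversionNeededException, we’re relying on equals() in Currency to tell United States dollars apart from Canadian dollars, for example. Currency doesn’t actually override equals(), and it doesn’t need to: Currency.getInstance() never gives out more than one instance for any given currency, so the equals() inherited from Object is enough.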
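Here is a minimal sketch of how the exception and the currency comparison might fit together. The constructor of the exception and the helper requireSameCurrency() are assumptions made for this sketch; only the Currency calls come from the JDK.

```java
import java.util.Currency;

// Unchecked exception, as described above. The constructor is an assumption.
class CurrencyConversionNeededException extends RuntimeException {
    CurrencyConversionNeededException(String message) {
        super(message);
    }
}

class CurrencyCheckDemo {

    // Hypothetical guard for operations that combine two money amounts.
    static void requireSameCurrency(Currency a, Currency b) {
        if (!a.equals(b)) {
            throw new CurrencyConversionNeededException(
                    "Can't combine " + a.getCurrencyCode()
                    + " with " + b.getCurrencyCode());
        }
    }

    public static void main(String[] args) {
        Currency usd = Currency.getInstance("USD");
        Currency cad = Currency.getInstance("CAD");
        System.out.println(usd.equals(cad));                         // false
        System.out.println(usd.equals(Currency.getInstance("USD"))); // true

        // Yen and francs don't match, so this throws the exception.
        requireSameCurrency(Currency.getInstance("JPY"),
                Currency.getInstance("CHF"));
    }
}
```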

Also, we’re going to need CurrencyAmount to override toString().

    @Override
    public String toString() {
        // Cents are not zero-padded, so 5 cents shows as ".5"
        return this.currencyID.getSymbol() + this.units + "."
                + this.cents;
    }

This leaves a lot to be desired, but it’ll be good enough for what we need it to do. Transaction also needs a toString() override.

    @Override
    public String toString() {
        return "Transaction " + this.amount.toString() + " on "
                + this.dateTime.toString();
    }

Or maybe each of the concrete subclasses should individually override toString(), which might then perhaps render toString() in Transaction unnecessary. That’s not needed for this series of articles, though.

Another example class: complex numbers

Money is so mundane. I would rather use for an example something a little loftier, like complex numbers, which figure in astronomy.

But more people know about money than know about complex numbers. It’s therefore necessary to explain things about complex numbers that go without saying for money amounts.

Despite the name, complex numbers are, in some ways, quite simple. Complex numbers don’t have an obvious natural ordering, but two complex number variables can still be said to be equal or not.

Therefore the ComplexNumber class should definitely not implement Comparable<ComplexNumber>, but it should definitely override equals(). And of course also hashCode().

The ComplexNumber class will use floating point rational numbers to represent a small subset of all possible complex numbers. Specifically, two 64-bit double primitives. Not that different from the Complex struct in C#.

A complex number has a “real” part and an “imaginary” part. Two examples:

  • −1.2 + 0.8i is a complex number with a real part −1.2 and imaginary part 0.8i; and i is the imaginary unit, one of two numbers such that its square is −1 (the other is −i).
  • 1.2 − 0.8j is a complex number with real part 1.2 and imaginary part −0.8j, and j is just a different symbol for the imaginary unit which electrical engineers seem to prefer.

By the way, neither of these numbers is in the Mandelbrot set. I’m not going to go too in depth on the math of complex numbers here, other than what pertains to equality testing.

I refer you to a couple of Medium articles for more background on complex numbers: Brett Berry’s basic tutorial, and my own article on the subject (though in that one I’m more concerned with algebraic rather than numeric representations of complex numbers).

Complex numbers have many applications in physics and engineering. Both Fortran and C# come with a complex number data type right out of the box. But Java doesn’t. At least not the JDK, so maybe you have to write one yourself.

The ComplexNumber class should probably be immutable, and it should have these two fields:

    private final double real;
    private final double imag;

And also the appropriate constructor and getters.

I will place ComplexNumber in a package called numerics, and I suggest you do the same. Then the file path for the source file will be something along the lines of src/numerics/ComplexNumber.java. This will be significant later on.

The expectation for equals() is that it will return true for two instances of ComplexNumber that represent the same number, e.g., after these lines,

    ComplexNumber someNumber = new ComplexNumber(1.0, 1.0);
    ComplexNumber sameNumber = new ComplexNumber(1.0, 1.0);
    boolean flag = someNumber.equals(sameNumber);

flag should be true. But that’s not going to be the case at this point if all we have is a constructor and a couple of getters.
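You can confirm this with a bare-bones version of the class. The sketch below leaves out the package declaration to stay self-contained, and the getter names are my own choice. With no equals() override, the equals() from Object only recognizes the very same instance.

```java
class ComplexNumber {
    private final double real;
    private final double imag;

    ComplexNumber(double real, double imag) {
        this.real = real;
        this.imag = imag;
    }

    double getReal() { return this.real; }
    double getImag() { return this.imag; }
}

class ComplexEqualityDemo {
    public static void main(String[] args) {
        ComplexNumber someNumber = new ComplexNumber(1.0, 1.0);
        ComplexNumber sameNumber = new ComplexNumber(1.0, 1.0);
        // equals() from Object checks for the very same instance only.
        System.out.println(someNumber.equals(sameNumber)); // false
        System.out.println(someNumber.equals(someNumber)); // true
    }
}
```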

In the next part

I will look at generated hashCode() for CurrencyAmount and Transaction in the next part. And in the part after that, I will look at the generated equals() for ComplexNumber, but the one for CurrencyAmount will also figure in there.

is a composer and photographer from Detroit, Michigan. He has been working on a Java program to display certain mathematical diagrams.