Maybe true and false are 1 and 0, but in Java they hash to 1231 and 1237

Photo by Chris Ried on Unsplash

Ask any Jeopardy! contestant what the numerical values of true and false are, and they’ll probably say: “0 and 1. I mean, what is 0 and 1? What are 1 and 0?”

It’s common knowledge that almost all computers operate in binary. This readily maps dichotomies to 0s and 1s. And so it makes perfect sense that the Boolean value true would be 1 and the Boolean value false would be 0.

But this seemingly simple question turns out to involve a lot of subtleties we take for granted every time we write an if statement or any other kind of conditional in a modern computer programming language.

The first of these subtleties is “data width.” Nowadays 64-bit processors seem to be fairly standard. Whether we’re talking about signed or unsigned integers, 64 bits seem enough to represent a lot more integers than we might need on any given day.

Even in the days of 8-bit processors it might still have seemed wasteful to spend an entire byte (8 bits) on a single Boolean value (please don’t read too much into my capitalization choice at this point).

If I understand correctly, programmers would cram as many as eight independent Boolean values into a single byte. The Boolean values would then be accessed by bitwise operations like XOR and NOT.

For example, suppose that you need to access the Boolean value of bit 5 in a flags byte. That would be the bit four bits from bit 0, the least significant bit. Since 2⁴ = 16, you could do flags bitwise AND 16. If the result is 16, the Boolean is true, but if it’s 0 it’s false.

Now suppose instead that you need your program to do something if any of the bits in a flags byte is a 1. You could do flags bitwise AND unsigned 255 (signed −1). Then this is true if the result is any nonzero number, false if it’s 0.

In fact, the American National Standards Institute (ANSI) standard for C++ specifies that false is 0 and true is any nonzero value.

In her book Portable C++, Patricia Giencke says that it’s safer to test that a Boolean is not false than to test that it is true.

Given condition declared as a Boolean, Giencke says condition == true might not always work across all C++ compilers and platforms, so it’s better to do condition != false. That seems kinda circuitous to me, pun intended.

With Java and C#, we don’t worry about something being true on one machine and false on another.

Thus most Java and C# programmers can be blissfully oblivious to the actual numerical values the Java Virtual Machine or the Common Language Runtime use to represent true and false, to say nothing of the actual underlying machine.

However, there is another context in which true and false get mapped to numerical values, and that is the context of hash codes.

When a boolean primitive is wrapped in a Boolean object, it has to have some kind of hash code (from this point forward please do read into my capitalization choices).

Specifically, a Boolean object wrapping a boolean primitive with a value of true has a hash code of 1231, while a value of 1237 corresponds to false.

I first discovered these values in the Scala REPL (I will explain Scala a bit later on). These values are no mystery, though, I could have just as easily have gotten them from the Oracle Java documentation for Boolean:

A hash code is a 32-bit signed integer that can potentially be used in a hash table (like java.util.IdentityHashMap) to hopefully uniquely identify the different objects in the hash table.

For example, a String containing “Hello, world!” hashes as −1880044555 and a String containing “Hello, World!” hashes as 1498789909 (I got these values from the Scala REPL and verified them in Scastie).

In Java, all objects inherit equals() and hashCode() from Object. They can both be overridden when necessary, and your IDE thinks overriding one requires overriding the other (unless you reconfigure the pertinent hint, of course).

Probably most Java programmers first come across hash codes when they need to implement equals() on a custom object, either because the program they’re writing needs to compare two objects, or maybe because a unit test is failing that should be passing.

So they start writing

    @Override
public boolean equals(Object obj)

and almost immediately the IDE is giving a warning that hashCode() is missing. NetBeans and IntelliJ will offer to write hashCode() for you, Eclipse probably does, too.

Some programmers probably just accept whatever the IDE writes for hashCode(). If you’re not on a tight deadline, however, it might be a good idea to scrutinize the generated function.

Better yet, write a stub yourself that will fail the first test and go from there. I think hash code functions make for an excellent exercise in test-driven development.

The most important thing about a hash code is that two objects that are said to be equal should have the same hash code. A unit test might go something like this:

    if (objectA.equals(objectB)) {
assertEquals(objectA.hashCode(), objectB.hashCode());
}

If it is practical and reasonable, two unequal objects should have unequal hash codes. The test above would be amended to:

    if (objectA.equals(objectB)) {
assertEquals(objectA.hashCode(), objectB.hashCode());
} else {
assertNotEquals
(objectA.hashCode(), objectB.hashCode());
}

Obviously the best way to insure that two equal objects have the same hash code is to use the same fields that determine equality to also determine the hash code.

Often the Fraction class is a handy, easy-to-understand example, one which I am thankful to Cay Horstmann for; he uses it quite a bit in his book Scala for the Impatient.

Two fractions are equal if their numerators are equal and their denominators are also equal.

If they’re not already in the lowest terms, we should put them in lowest terms before performing the comparison, so that, for example, the program recognizes that −7/14 = −1/2.

Also, to not over-complicate things, we should require that the denominator must always be a positive integer. To reuse the earlier example, 7/−14 would get converted first to −7/14 and then to −1/2.

Actually, the order of those steps doesn’t matter as long as our Fraction constructor takes care of them so that equals() can safely rely on the fractions already being in lowest terms and the denominator being a positive integer.

For my own project, I wrote Fraction to use 64-bit signed integers for the numerator and denominator. My implementation also includes a double to hold a numerical approximation of the fraction, which I consider to be unsuitable for use in equals().

But even if we used 32-bit signed integers for the numerator and denominator, we’d still have somewhat of a problem writing Fraction.hashCode(): the int primitive has only 2³² distinct values, but Fraction can represent at least 3 × 2³¹ − 1 different numbers.

And that’s counting only fractions with a denominator of 1, which are integers, and unit fractions (that is, fractions with a numerator of 1). Clearly Fraction can represent a lot more numbers than that.

So we need a compromise: equal fractions must still be guaranteed to get equal hash codes, and for unequal fractions we make an effort to produce distinct hash codes for the most likely scenarios but we don’t guarantee they’ll always be distinct for any pair of distinct fractions.

Then FractionTest should test several equal fractions regardless of likelihood in an actual use case, but for distinct fractions it should be limited to fractions that are likely to come up in actual use.

Our hashCode() stub to fail the first test can probably just return 0 for every fraction. Well, that might actually pass the equal fractions test, but it’s sure to fail the different fractions test.

We might then amend hashCode() to multiply the denominator by 2¹⁶ without regard for overflow, and then add the numerator modulo 2¹⁶.

Then −1/2 might hash as 131071 or 196607, depending on how we come by the numerator modulo 2¹⁶. However, −1/65538 would also hash as 131071 or 196607.

You might decide that this is just fine for what you need Fraction to do, as in, for example, if you don’t intend to put Fraction objects into a hash table, or you do intend to do that but don’t mind a small performance degradation in the unlikely event that you’ll be putting in fractions with denominators 65536 apart into the hash table.

Now let’s say that we implement GaussianFraction, which represents a number of the form a + bi, where a and b are just fractions of the kind we’re used to, and i is the principal square root of −1 (the other square root is −i).

We’re talking about numbers like −1/2 + i/2 (here b = 1/2), which is a solution to the equation 2x² + 2x + 1. GaussianFraction probably holds two Fraction objects, one for a and one for b.

Obviously we’d want the ability to check if two “Gaussian fractions” are equal, which means we’ll need hash codes as well.

It’s an even tighter squeeze into a 32-bit signed integer, but we can probably just do some arithmetic on the hash codes of a and b to get the hash code for a + bi, such as multiplying the hash code for a by the hash code for b.

None of this is an issue when implementing Boolean.hashCode(), since boolean can only have two possible values: true or false. So why not just hash them as 1 and 0?

Because some other object with a Boolean field might need to use Boolean.hashCode() for its own hash code, just as GaussianFraction.hashCode() might depend on Fraction.hashCode().

If false hashes as 0 and true hashes as 1, multiplying the hash codes of other fields by Boolean.hashCode() could be worthless. Adding Boolean.hashCode() might not be much better.

But with true hashing as 1231 and false as 1237, addition and multiplication might work a lot better towards generating unique hash codes than it would with 1 and 0.

I can’t think of a real world example in which you would actually need to hash with a Boolean, but that doesn’t mean such an example doesn’t exist.

I think that it would probably be acceptable to use a boolean primitive and then use an if statement to hash true and false to whatever numerical values make sense for your particular use case. Something like this:

    @Override
public int hashCode() {
int hash = 1;
// ...do some stuff with the hash codes of object fields...
if (this.someFlag) {
hash *= 3;
} else {
hash *= 17;
}
return hash;
}

Of course I doubt there would be any significant difference in performance between doing it this way and using Boolean.hashCode().

Rather than delay publication while hunting for a real world example, I’ll just use a toy example: Jeopardy! clues.

I’m guessing the writers probably just write the clues in Microsoft Word, though they might have a custom template. Microsoft Excel would also make sense.

As for the mechanism to get the clues from the writers onto the clue screens, the video editing software and the official clue archive, I’m guessing a C or C++ program, though JavaScript is also plausible (insert here the standard explanation of why JavaScript is very different from Java).

As you know, each Jeopardy! clue has a dollar amount an answering player can win or lose; but if it’s a Daily Double, the player may wager a different amount.

Our JeopardyClue object might have the following fields with the appropriate getters and setters: String category, short dollarAmount (assuming players must always wager whole dollar amounts), Boolean dailyDoubleFlag, String clue, String correctResponse, Date intendedAirDate.

Presumably the hash codes need only be distinct among the clues for a particular intended air date. Multiplying the hash code for the category by the hash code for the dollar amount might give suitable results.

I don’t see why the Daily Double flag hash code would need to figure into the clue hash code calculation, but since this is a toy example, let’s just say that it has to.

Here is the $400 clue in the People of the World category on the Jeopardy! game first aired June 14, 2018: “Javanese are the largest ethnic group of this nation.” “What is Indonesia?” Diana Hsu answered correctly.

This one might hash as −2128310992 without multiplying by dailyDoubleFlag.hashCode(), 94255344 with.

Note that since 2128310992 is a 30-bit number, multiplying −2128310992 by a 10-bit number causes an overflow which results in a positive integer, rather than −2632720697104, the absolute value of which is a 41-bit number.

The $600 clue in the same category was a Daily Double, Hsu wagered $1,200: “Afrikaners were once called by this name meaning ‘farmer’.” “Who are the Boer?” Or maybe she answered “What is Boer?” Either way, that’s correct.

This one might hash as 1102500808 without multiplying by dailyDoubleFlag.hashCode(), −31170888 with.

That’s probably enough for you to get the idea. If you care to do this exercise for the whole June 14, 2018 game, an unofficial game transcript is available from J-Archive.com.

Side bar: hash codes in Scala

Everything is an object in Scala: objects are objects, primitives are objects and functions are objects. And since Scala is built on top of Java, objects have hash codes, and primitives and functions also have hash codes. For example:

scala> (Math.abs(_: Double)).hashCode
res15: Int = 2099559808
scala> (Math.abs(_: Int)).hashCode
res16: Int = 284101091

You will most likely get different results within even a single Scala REPL session… I haven’t been able to replicate these results in Scastie, but I’ve been able to replicate results for some “primitives.”

Everything being an object in Scala means that the Scala REPL is a great way to play around with the hash codes and other standard properties of standard Java classes as well as standard properties of Java and/or Scala classes that you create.

In general I find it preferable to use the REPL for these sorts of explorations. To have to start a project in an IDE just to write some println statements, or even writing a source file in Notepad and compiling it on the command line feels like overkill for one-off things like this.

For the hash codes of Java primitives in the Scala REPL, it’s probably best to get their hash codes “as primitives” rather than trying to instantiate a wrapper.

scala> 7.hashCode
res16: Int = 7
scala> -7.hashCode
res17: Int = -7
scala> Math.PI.hashCode
res18: Int = 340593891
scala> true.hashCode
res19: Int = 1231
scala> false.hashCode
res20: Int = 1237

These results should be consistent across all Scala REPL sessions as long as Oracle doesn’t change the relevant implementations of hashCode().

If you’re really curious about what an object’s hash code would be if the class didn’t override Object.hashCode(), you can always do System.identityHashCode().

scala> System.identityHashCode(res16)
res21: Int = 224237360
scala> System.identityHashCode(7)
res22: Int = 224237360
scala> System.identityHashCode(res17)
res23: Int = 1098912468
scala> System.identityHashCode(-7)
res24: Int = 1098912468
scala> System.identityHashCode(res18)
res25: Int = 773074354
scala> System.identityHashCode(Math.PI)
res26: Int = 949389896
scala> System.identityHashCode(res19)
res27: Int = 1469015613
scala> System.identityHashCode(1231)
res28: Int = 827221909
scala> System.identityHashCode(res20)
res29: Int = 1249746110
scala> System.identityHashCode(1237)
res30: Int = 1576743136
scala> System.identityHashCode(7)
res31: Int = 224237360
scala> System.identityHashCode(-7)
res32: Int = 1098912468

I suppose these won’t be consistent across REPL instances. Nevertheless, I was surprised by how consistent the system hash codes for integers are.

Scala has some classes, like RichInt, that expand the capabilities of certain Java standard types. I wonder if those override the Java hash codes? That might be a discussion for another day.

But by now you might be feeling hungry for some reason.

I think there are hash browns in the background. Photo by Gabriel Gurrola on Unsplash

is a composer and photographer from Detroit, Michigan. He has been working on a Java program to display certain mathematical diagrams.