# Maybe true and false are 1 and 0, but in Java they hash to 1231 and 1237

Ask any *Jeopardy!* contestant what the numerical values of `true`

and `false`

are, and they’ll probably say: “0 and 1. I mean, what is 0 and 1? What are 1 and 0?”

It’s common knowledge that almost all computers operate in binary. This readily maps dichotomies to 0s and 1s. And so it makes perfect sense that the Boolean value `true`

would be 1 and the Boolean value `false`

would be 0.

But this seemingly simple question turns out to involve a lot of subtleties we take for granted every time we write an if statement or any other kind of conditional in a modern computer programming language.

The first of these subtleties is “data width.” Nowadays 64-bit processors seem to be fairly standard. Whether we’re talking about signed or unsigned integers, 64 bits seem enough to represent a lot more integers than we might need on any given day.

Even in the days of 8-bit processors it might still have seemed wasteful to spend an entire byte (8 bits) on a single Boolean value (please don’t read too much into my capitalization choice at this point).

If I understand correctly, programmers would cram as many as eight independent Boolean values into a single byte. The Boolean values would then be accessed by bitwise operations like XOR and NOT.

For example, suppose that you need to access the Boolean value of bit 5 in a flags byte. That would be the bit four bits from bit 0, the least significant bit. Since 2⁴ = 16, you could do flags bitwise AND 16. If the result is 16, the Boolean is `true`

, but if it’s 0 it’s `false`

.

Now suppose instead that you need your program to do something if *any* of the bits in a flags byte is a 1. You could do flags bitwise AND unsigned 255 (signed −1). Then this is `true`

if the result is any nonzero number, `false`

if it’s 0.

In fact, the American National Standards Institute (ANSI) standard for C++ specifies that `false`

is 0 and `true`

is any nonzero value.

In her book *Portable C++*, Patricia Giencke says that it’s safer to test that a Boolean is not `false`

than to test that it is `true`

.

Given `condition`

declared as a Boolean, Giencke says `condition == true`

might not always work across all C++ compilers and platforms, so it’s better to do `condition != false`

. That seems kinda circuitous to me, pun intended.

With Java and C#, we don’t worry about something being `true`

on one machine and `false`

on another.

Thus most Java and C# programmers can be blissfully oblivious to the actual numerical values the Java Virtual Machine or the Common Language Runtime use to represent `true`

and `false`

, to say nothing of the actual underlying machine.

However, there is another context in which `true`

and `false`

get mapped to numerical values, and that is the context of hash codes.

When a `boolean`

primitive is wrapped in a `Boolean`

object, it has to have some kind of hash code (from this point forward please do read into my capitalization choices).

Specifically, a `Boolean`

object wrapping a `boolean`

primitive with a value of `true`

has a hash code of 1231, while a value of 1237 corresponds to `false`

.

I first discovered these values in the Scala REPL (I will explain Scala a bit later on). These values are no mystery, though, I could have just as easily have gotten them from the Oracle Java documentation for `Boolean`

:

hashCode

`public int hashCode()`

Returns a hash code for this

`Boolean`

object.Overrides:

`hashCode`

in class`Object`

Returns: the integer 1231 if this object represents`true`

; returns the integer 1237 if this object represents`false`

.

See Also:`Object.equals(java.lang.Object)`

,`System.identityHashCode(java.lang.Object)`

A hash code is a 32-bit signed integer that can potentially be used in a hash table (like `java.util.IdentityHashMap`

) to hopefully uniquely identify the different objects in the hash table.

For example, a `String`

containing “Hello, world!” hashes as −1880044555 and a `String`

containing “Hello, World!” hashes as 1498789909 (I got these values from the Scala REPL and verified them in Scastie).

In Java, all objects inherit `equals()`

and `hashCode()`

from `Object`

. They can both be overridden when necessary, and your IDE thinks overriding one requires overriding the other (unless you reconfigure the pertinent hint, of course).

Probably most Java programmers first come across hash codes when they need to implement `equals()`

on a custom object, either because the program they’re writing needs to compare two objects, or maybe because a unit test is failing that should be passing.

So they start writing

` @Override`

public boolean equals(Object obj)

and almost immediately the IDE is giving a warning that `hashCode()`

is missing. NetBeans and IntelliJ will offer to write `hashCode()`

for you, Eclipse probably does, too.

Some programmers probably just accept whatever the IDE writes for `hashCode()`

. If you’re not on a tight deadline, however, it might be a good idea to scrutinize the generated function.

Better yet, write a stub yourself that will fail the first test and go from there. I think hash code functions make for an excellent exercise in test-driven development.

The most important thing about a hash code is that two objects that are said to be equal should have the same hash code. A unit test might go something like this:

` if (`**objectA.equals(objectB)**) {

**assertEquals(objectA.hashCode(), objectB.hashCode());**

}

If it is practical and reasonable, two unequal objects should have unequal hash codes. The test above would be amended to:

` if (objectA.equals(objectB)) {`

assertEquals(objectA.hashCode(), objectB.hashCode());

} **else **{

assertNotEquals(objectA.hashCode(), objectB.hashCode())**;**

}

Obviously the best way to insure that two equal objects have the same hash code is to use the same fields that determine equality to also determine the hash code.

Often the `Fraction`

class is a handy, easy-to-understand example, one which I am thankful to Cay Horstmann for; he uses it quite a bit in his book *Scala for the Impatient*.

Two fractions are equal if their numerators are equal and their denominators are also equal.

If they’re not already in the lowest terms, we should put them in lowest terms before performing the comparison, so that, for example, the program recognizes that −7/14 = −1/2.

Also, to not over-complicate things, we should require that the denominator must always be a positive integer. To reuse the earlier example, 7/−14 would get converted first to −7/14 and then to −1/2.

Actually, the order of those steps doesn’t matter as long as our `Fraction`

constructor takes care of them so that `equals()`

can safely rely on the fractions already being in lowest terms and the denominator being a positive integer.

For my own project, I wrote `Fraction`

to use 64-bit signed integers for the numerator and denominator. My implementation also includes a `double`

to hold a numerical approximation of the fraction, which I consider to be unsuitable for use in `equals()`

.

But even if we used 32-bit signed integers for the numerator and denominator, we’d still have somewhat of a problem writing `Fraction.hashCode()`

: the `int`

primitive has only 2³² distinct values, but `Fraction`

can represent at least 3 × 2³¹ − 1 different numbers.

And that’s counting only fractions with a denominator of 1, which are integers, and unit fractions (that is, fractions with a numerator of 1). Clearly `Fraction`

can represent a lot more numbers than that.

So we need a compromise: equal fractions must still be guaranteed to get equal hash codes, and for unequal fractions we make an effort to produce distinct hash codes for the most likely scenarios but we don’t guarantee they’ll always be distinct for any pair of distinct fractions.

Then `FractionTest`

should test several equal fractions regardless of likelihood in an actual use case, but for distinct fractions it should be limited to fractions that are likely to come up in actual use.

Our `hashCode()`

stub to fail the first test can probably just return 0 for every fraction. Well, that might actually pass the equal fractions test, but it’s sure to fail the different fractions test.

We might then amend `hashCode()`

to multiply the denominator by 2¹⁶ without regard for overflow, and then add the numerator modulo 2¹⁶.

Then −1/2 might hash as 131071 or 196607, depending on how we come by the numerator modulo 2¹⁶. However, −1/65538 would also hash as 131071 or 196607.

You might decide that this is just fine for what you need `Fraction`

to do, as in, for example, if you don’t intend to put `Fraction`

objects into a hash table, or you do intend to do that but don’t mind a small performance degradation in the unlikely event that you’ll be putting in fractions with denominators 65536 apart into the hash table.

Now let’s say that we implement `GaussianFraction`

, which represents a number of the form *a* + *bi*, where *a* and *b* are just fractions of the kind we’re used to, and *i* is the principal square root of −1 (the other square root is −*i*).

We’re talking about numbers like −1/2 + *i*/2 (here *b* = 1/2), which is a solution to the equation 2*x*² + 2*x* + 1. `GaussianFraction`

probably holds two `Fraction`

objects, one for *a* and one for *b*.

Obviously we’d want the ability to check if two “Gaussian fractions” are equal, which means we’ll need hash codes as well.

It’s an even tighter squeeze into a 32-bit signed integer, but we can probably just do some arithmetic on the hash codes of *a* and *b* to get the hash code for *a* + *bi*, such as multiplying the hash code for *a* by the hash code for *b*.

None of this is an issue when implementing `Boolean.hashCode()`

, since `boolean`

can only have two possible values: `true`

or `false`

. So why not just hash them as 1 and 0?

Because some other object with a `Boolean`

field might need to use `Boolean.hashCode()`

for its own hash code, just as `GaussianFraction.hashCode()`

might depend on `Fraction.hashCode()`

.

If `false`

hashes as 0 and `true`

hashes as 1, multiplying the hash codes of other fields by `Boolean.hashCode()`

could be worthless. Adding `Boolean.hashCode()`

might not be much better.

But with `true`

hashing as 1231 and `false`

as 1237, addition and multiplication might work a lot better towards generating unique hash codes than it would with 1 and 0.

I can’t think of a real world example in which you would actually need to hash with a `Boolean`

, but that doesn’t mean such an example doesn’t exist.

I think that it would probably be acceptable to use a `boolean`

primitive and then use an if statement to hash `true`

and `false`

to whatever numerical values make sense for your particular use case. Something like this:

` @Override`

public int hashCode() {

int hash = 1;

// ...do some stuff with the hash codes of object fields...

if (this.someFlag) {

hash *= 3;

} else {

hash *= 17;

}

return hash;

}

Of course I doubt there would be any significant difference in performance between doing it this way and using `Boolean.hashCode()`

.

Rather than delay publication while hunting for a real world example, I’ll just use a toy example: *Jeopardy!* clues.

I’m guessing the writers probably just write the clues in Microsoft Word, though they might have a custom template. Microsoft Excel would also make sense.

As for the mechanism to get the clues from the writers onto the clue screens, the video editing software and the official clue archive, I’m guessing a C or C++ program, though JavaScript is also plausible (insert here the standard explanation of why JavaScript is very different from Java).

As you know, each *Jeopardy!* clue has a dollar amount an answering player can win or lose; but if it’s a Daily Double, the player may wager a different amount.

Our `JeopardyClue`

object might have the following fields with the appropriate getters and setters: `String category`

, `short dollarAmount`

(assuming players must always wager whole dollar amounts), `Boolean dailyDoubleFlag`

, `String clue`

, `String correctResponse`

, `Date intendedAirDate`

.

Presumably the hash codes need only be distinct among the clues for a particular intended air date. Multiplying the hash code for the category by the hash code for the dollar amount might give suitable results.

I don’t see why the Daily Double flag hash code would need to figure into the clue hash code calculation, but since this is a toy example, let’s just say that it has to.

Here is the $400 clue in the People of the World category on the *Jeopardy!* game first aired June 14, 2018: “Javanese are the largest ethnic group of this nation.” “What is Indonesia?” Diana Hsu answered correctly.

This one might hash as −2128310992 without multiplying by `dailyDoubleFlag.hashCode()`

, 94255344 with.

Note that since 2128310992 is a 30-bit number, multiplying −2128310992 by a 10-bit number causes an overflow which results in a positive integer, rather than −2632720697104, the absolute value of which is a 41-bit number.

The $600 clue in the same category was a Daily Double, Hsu wagered $1,200: “Afrikaners were once called by this name meaning ‘farmer’.” “Who are the Boer?” Or maybe she answered “What is Boer?” Either way, that’s correct.

This one might hash as 1102500808 without multiplying by `dailyDoubleFlag.hashCode()`

, −31170888 with.

That’s probably enough for you to get the idea. If you care to do this exercise for the whole June 14, 2018 game, an unofficial game transcript is available from J-Archive.com.

# Side bar: hash codes in Scala

Everything is an object in Scala: objects are objects, primitives are objects and functions are objects. And since Scala is built on top of Java, objects have hash codes, and primitives and functions also have hash codes. For example:

scala> (Math.abs(_: Double)).hashCode

res15: Int = 2099559808scala> (Math.abs(_: Int)).hashCode

res16: Int = 284101091

You will most likely get different results within even a single Scala REPL session… I haven’t been able to replicate these results in Scastie, but I’ve been able to replicate results for some “primitives.”

Everything being an object in Scala means that the Scala REPL is a great way to play around with the hash codes and other standard properties of standard Java classes as well as standard properties of Java and/or Scala classes that you create.

In general I find it preferable to use the REPL for these sorts of explorations. To have to start a project in an IDE just to write some `println`

statements, or even writing a source file in Notepad and compiling it on the command line feels like overkill for one-off things like this.

For the hash codes of Java primitives in the Scala REPL, it’s probably best to get their hash codes “as primitives” rather than trying to instantiate a wrapper.

scala> 7.hashCode

res16: Int = 7scala> -7.hashCode

res17: Int = -7scala> Math.PI.hashCode

res18: Int = 340593891scala> true.hashCode

res19: Int = 1231scala> false.hashCode

res20: Int = 1237

These results should be consistent across all Scala REPL sessions as long as Oracle doesn’t change the relevant implementations of `hashCode()`

.

If you’re really curious about what an object’s hash code would be if the class didn’t override `Object.hashCode()`

, you can always do `System.identityHashCode()`

.

scala> System.identityHashCode(res16)

res21: Int = 224237360scala> System.identityHashCode(7)

res22: Int = 224237360scala> System.identityHashCode(res17)

res23: Int = 1098912468scala> System.identityHashCode(-7)

res24: Int = 1098912468scala> System.identityHashCode(res18)

res25: Int = 773074354scala> System.identityHashCode(Math.PI)

res26: Int = 949389896scala> System.identityHashCode(res19)

res27: Int = 1469015613scala> System.identityHashCode(1231)

res28: Int = 827221909scala> System.identityHashCode(res20)

res29: Int = 1249746110scala> System.identityHashCode(1237)

res30: Int = 1576743136scala> System.identityHashCode(7)

res31: Int = 224237360scala> System.identityHashCode(-7)

res32: Int = 1098912468

I suppose these won’t be consistent across REPL instances. Nevertheless, I was surprised by how consistent the system hash codes for integers are.

Scala has some classes, like `RichInt`

, that expand the capabilities of certain Java standard types. I wonder if those override the Java hash codes? That might be a discussion for another day.

But by now you might be feeling hungry for some reason.