Naming is Hard

Before we dive into today’s topic, I would like to make it clear that what follows is specific to how Android, and more precisely the Android RunTime (ART), works. Some of what follows applies to other environments as well, but the main twist is about Android.

If you have read my previous articles, you should know by now that seemingly small or irrelevant changes can have a large impact on performance. So let’s look at another one! We’ll start our journey with a class containing a bunch of fields. What they are and what they mean isn’t really relevant for now, let’s just imagine we are dealing with a fairly large object¹:

 1class Element {
 2    @JvmField var start = 0
 3    @JvmField var end = 0
 4
 5    @JvmField var groupId = 0
 6    @JvmField var id = 0
 7    @JvmField var localId = 0
 8    @JvmField var parent = 0
 9
10    @JvmField var flags = 0
11    @JvmField var localFlags = 0
12
13    @JvmField var localState = 0
14
15    @JvmField var readCount = 0
16    @JvmField var writeCount = 0
17    @JvmField var saveCount = 0
18
19    @JvmField var priority = 0
20    @JvmField var level = 0
21    @JvmField var next = 0
22    @JvmField var previous = 0
23
24    @JvmField var localVisibility = 0
25    @JvmField var visibility = 0
26    @JvmField var visibilityFlags = 0
27
28    @JvmField var quota = 0
29    @JvmField var quotaFlags = 0
30    @JvmField var quotaFailures = 0
31
32    @JvmField var totalAccessCount = 0
33
34    @JvmField var wideFlags = 0
35
36    @JvmField var unsafeAccessPermission = 0
37    @JvmField var unsafeAccessCount = 0
38
39    @JvmField var variableCount = 0
40
41    @JvmField var height = 0
42    @JvmField var width = 0
43
44    @JvmField var x = 0
45    @JvmField var y = 0
46    @JvmField var z = 0
47}

To eliminate any unnecessary overhead or potential skew in benchmarks, all the properties are marked as @JvmField so we can read and write the fields directly without going through getters and setters.

Now let’s assume we have an array of Element instances:

1private const val Iterations = 2_048
2
3private val elements = Array(Iterations) { index ->
4    Element().apply {
5        start = Random.nextInt(Iterations)
6        end = start + Random.nextInt(16)
7    }
8}

We only set the value of a couple of fields to write a simple benchmark that iterates over the array and reads the values:

 1@Test
 2fun sequentialReads() {
 3    var sum = 0
 4    benchmarkRule.measureRepeated {
 5        sum = 0
 6        for (element in elements) {
 7            val start = element.start
 8            val end = element.end
 9            sum += end - start
10        }
11    }
12    BlackHole.consume(sum)
13}

Since we allocated all the elements sequentially and stored them in an array, accessing every element and reading start and end is basically the best case scenario for memory access.

On a Pixel 2, running this benchmark yields the following result:

timeNs            min 12,975.8,   median 12,992.5,   max 13,268.9

But what if I decided to rename the start field to begin?

1class Element {
2    @JvmField var begin = 0
3    @JvmField var end = 0
4
5    // ...
6}

There is no structural change to the class, and the benchmark code remains the same. Let’s run the benchmark again just to be safe:

timeNs            min  7,694.3,   median  7,791.1,   max  8,052.3

Our code just got 40% faster!

Wait, what? Link to heading

Unfortunately, the name of your fields does have an impact on performance. To understand what’s going on, we can look at the compiled aarch64 machine code. The instructions that read start and end before doing the subtraction look like this:

1stp x0, lr, [sp, #-16]!
2ldr w0, [x1, #84]
3ldr w1, [x1, #8]
4sub w0, w1, w0

And if we rename start to begin:

1stp x0, lr, [sp, #-16]!
2ldr w0, [x1, #8]
3ldr w1, [x1, #12]
4sub w0, w1, w0

The highlighted lines show at which offsets the two fields are read. In the first case, the fields are stored at offset 8 and 84 (in bytes), and in the second case at offsets 8 and 12.

This implies that when the fields are named start and end, they are more than one L1 cache line away from each other in memory (a cache line is 64 bytes), causing an L1 cache miss every time we read the fields.

The next logical question is why does this happen? We can figure it out by doing one of two things:

Read the source code of ART (which I did a while back to answer this question)
Print out the memory offset of all the fields to see if there’s a pattern

Let’s do the latter, it’s a bit easier.

Listing field offsets Link to heading

With Unsafe, we can query the offset the each field:

1private val TheUnsafe = Unsafe::class.java.getDeclaredField("theUnsafe").let {
2    it.isAccessible = true
3    it.get(null) as Unsafe
4}
5
6for (field in Element::class.java.declaredFields) {
7    println("${field.name} = ${TheUnsafe.objectFieldOffset(field)}")
8}

This code will produce the following list for the begin/end case:

 1begin = 8
 2end = 12
 3flags = 16
 4groupId = 20
 5height = 24
 6id = 28
 7level = 32
 8localFlags = 36
 9localId = 40
10localState = 44
11localVisibility = 48
12next = 52
13parent = 56
14previous = 60
15priority = 64
16quota = 68
17quotaFailures = 72
18quotaFlags = 76
19readCount = 80
20saveCount = 84
21totalAccessCount = 88
22unsafeAccessCount = 92
23unsafeAccessPermission = 96
24variableCount = 100
25visibility = 104
26visibilityFlags = 108
27wideFlags = 112
28width = 116
29writeCount = 120
30x = 124
31y = 128
32z = 132

If you look closely, you’ll quickly realize that ART organizes all the fields alphabetically in memory. I of course picked the names to trigger the behavior we just observed. When using the names begin and end, both fields end up together and fit in a cache line. When using the names start and end, start goes somewhere close to the middle of the list, more than a cache line away.

This means it can sometimes be beneficial to name your fields based on their expected memory access patterns.

It’s not just names… Link to heading

So I showed you can speed up the benchmark by renaming start. You can also do it without renaming anything, but instead by turning all the other fields into Long types:

 1class Element {
 2    @JvmField var start = 0
 3    @JvmField var end = 0
 4
 5    @JvmField var groupId = 0L
 6    @JvmField var id = 0L
 7    @JvmField var localId = 0L
 8    @JvmField var parent = 0L
 9
10    @JvmField var flags = 0L
11    @JvmField var localFlags = 0L
12
13    @JvmField var localState = 0L
14
15    @JvmField var readCount = 0L
16    @JvmField var writeCount = 0L
17    @JvmField var saveCount = 0L
18
19    @JvmField var priority = 0L
20    @JvmField var level = 0L
21    @JvmField var next = 0L
22    @JvmField var previous = 0L
23
24    @JvmField var localVisibility = 0L
25    @JvmField var visibility = 0L
26    @JvmField var visibilityFlags = 0L
27
28    @JvmField var quota = 0L
29    @JvmField var quotaFlags = 0L
30    @JvmField var quotaFailures = 0L
31
32    @JvmField var totalAccessCount = 0L
33
34    @JvmField var wideFlags = 0L
35
36    @JvmField var unsafeAccessPermission = 0L
37    @JvmField var unsafeAccessCount = 0L
38
39    @JvmField var variableCount = 0L
40
41    @JvmField var height = 0L
42    @JvmField var width = 0L
43
44    @JvmField var x = 0L
45    @JvmField var y = 0L
46    @JvmField var z = 0L
47}

If we run both benchmarks (start and begin), we get:

// start
timeNs            min  6,850.3,   median  6,879.4,   max  7,275.2
// begin
timeNs            min  6,842.6,   median  6,878.1,   max  7,433.9

Wtf? Link to heading

The way ART orders fields is a bit more complicated than just sorting them alphabetically:

Fields are grouped by type (Long, Float, etc.)
Larger types come first (Double/Long then Float/Int etc.)
Fields are sorted alphabetically within a group
Some fields might break the above ordering for padding/alignment reasons

We all know that naming things in programming is hard. If you’ve read this entire post it just got a little bit harder.

Of course, we may not have the same definition of what a large object is. Let’s say it’s bigger than a Pair but smaller than a View ↩︎