Before we dive into today’s topic, I would like to make it clear that what follows is specific to how Android, and more precisely the Android RunTime (ART), works. Some of what follows applies to other environments as well, but the main twist is about Android.
If you have read my previous articles, you should know by now that seemingly small or irrelevant changes can have a large impact on performance. So let’s look at another one! We’ll start our journey with a class containing a bunch of fields. What they are and what they mean isn’t really relevant for now, let’s just imagine we are dealing with a fairly large object1:
1class Element {
2 @JvmField var start = 0
3 @JvmField var end = 0
4
5 @JvmField var groupId = 0
6 @JvmField var id = 0
7 @JvmField var localId = 0
8 @JvmField var parent = 0
9
10 @JvmField var flags = 0
11 @JvmField var localFlags = 0
12
13 @JvmField var localState = 0
14
15 @JvmField var readCount = 0
16 @JvmField var writeCount = 0
17 @JvmField var saveCount = 0
18
19 @JvmField var priority = 0
20 @JvmField var level = 0
21 @JvmField var next = 0
22 @JvmField var previous = 0
23
24 @JvmField var localVisibility = 0
25 @JvmField var visibility = 0
26 @JvmField var visibilityFlags = 0
27
28 @JvmField var quota = 0
29 @JvmField var quotaFlags = 0
30 @JvmField var quotaFailures = 0
31
32 @JvmField var totalAccessCount = 0
33
34 @JvmField var wideFlags = 0
35
36 @JvmField var unsafeAccessPermission = 0
37 @JvmField var unsafeAccessCount = 0
38
39 @JvmField var variableCount = 0
40
41 @JvmField var height = 0
42 @JvmField var width = 0
43
44 @JvmField var x = 0
45 @JvmField var y = 0
46 @JvmField var z = 0
47}
To eliminate any unnecessary overhead or potential skew in benchmarks, all the properties are
marked as @JvmField
so we can read and write the fields directly without going through getters
and setters.
Now let’s assume we have an array of Element
instances:
1private const val Iterations = 2_048
2
3private val elements = Array(Iterations) { index ->
4 Element().apply {
5 start = Random.nextInt(Iterations)
6 end = start + Random.nextInt(16)
7 }
8}
We only set the value of a couple of fields to write a simple benchmark that iterates over the array and reads the values:
1@Test
2fun sequentialReads() {
3 var sum = 0
4 benchmarkRule.measureRepeated {
5 sum = 0
6 for (element in elements) {
7 val start = element.start
8 val end = element.end
9 sum += end - start
10 }
11 }
12 BlackHole.consume(sum)
13}
Since we allocated all the elements sequentially and stored them in an array, accessing every
element and reading start
and end
is basically the best case scenario for memory access.
On a Pixel 2, running this benchmark yields the following result:
timeNs min 12,975.8, median 12,992.5, max 13,268.9
But what if I decided to rename the start
field to begin
?
There is no structural change to the class, and the benchmark code remains the same. Let’s run the benchmark again just to be safe:
timeNs min 7,694.3, median 7,791.1, max 8,052.3
Our code just got 40% faster!
Wait, what? Link to heading
Unfortunately, the name of your fields does have an impact on performance. To understand what’s going
on, we can look at the compiled aarch64 machine code. The instructions that read start
and end
before doing the subtraction look like this:
And if we rename start
to begin
:
The highlighted lines show at which offsets the two fields are read. In the first case, the fields are stored at offset 8 and 84 (in bytes), and in the second case at offsets 8 and 12.
This implies that when the fields are named start
and end
, they are more than one L1 cache line
away from each other in memory (a cache line is 64 bytes), causing an L1 cache miss every time we
read the fields.
The next logical question is why does this happen? We can figure it out by doing one of two things:
- Read the source code of ART (which I did a while back to answer this question)
- Print out the memory offset of all the fields to see if there’s a pattern
Let’s do the latter, it’s a bit easier.
Listing field offsets Link to heading
With Unsafe
, we can query the offset the each field:
1private val TheUnsafe = Unsafe::class.java.getDeclaredField("theUnsafe").let {
2 it.isAccessible = true
3 it.get(null) as Unsafe
4}
5
6for (field in Element::class.java.declaredFields) {
7 println("${field.name} = ${TheUnsafe.objectFieldOffset(field)}")
8}
This code will produce the following list for the begin
/end
case:
1begin = 8
2end = 12
3flags = 16
4groupId = 20
5height = 24
6id = 28
7level = 32
8localFlags = 36
9localId = 40
10localState = 44
11localVisibility = 48
12next = 52
13parent = 56
14previous = 60
15priority = 64
16quota = 68
17quotaFailures = 72
18quotaFlags = 76
19readCount = 80
20saveCount = 84
21totalAccessCount = 88
22unsafeAccessCount = 92
23unsafeAccessPermission = 96
24variableCount = 100
25visibility = 104
26visibilityFlags = 108
27wideFlags = 112
28width = 116
29writeCount = 120
30x = 124
31y = 128
32z = 132
If you look closely, you’ll quickly realize that ART organizes all the fields alphabetically in
memory. I of course picked the names to trigger the behavior we just observed. When using the names
begin
and end
, both fields end up together and fit in a cache line. When using the names start
and end
, start
goes somewhere close to the middle of the list, more than a cache line away.
This means it can sometimes be beneficial to name your fields based on their expected memory access patterns.
It’s not just names… Link to heading
So I showed you can speed up the benchmark by renaming start
. You can also do it without renaming
anything, but instead by turning all the other fields into Long
types:
1class Element {
2 @JvmField var start = 0
3 @JvmField var end = 0
4
5 @JvmField var groupId = 0L
6 @JvmField var id = 0L
7 @JvmField var localId = 0L
8 @JvmField var parent = 0L
9
10 @JvmField var flags = 0L
11 @JvmField var localFlags = 0L
12
13 @JvmField var localState = 0L
14
15 @JvmField var readCount = 0L
16 @JvmField var writeCount = 0L
17 @JvmField var saveCount = 0L
18
19 @JvmField var priority = 0L
20 @JvmField var level = 0L
21 @JvmField var next = 0L
22 @JvmField var previous = 0L
23
24 @JvmField var localVisibility = 0L
25 @JvmField var visibility = 0L
26 @JvmField var visibilityFlags = 0L
27
28 @JvmField var quota = 0L
29 @JvmField var quotaFlags = 0L
30 @JvmField var quotaFailures = 0L
31
32 @JvmField var totalAccessCount = 0L
33
34 @JvmField var wideFlags = 0L
35
36 @JvmField var unsafeAccessPermission = 0L
37 @JvmField var unsafeAccessCount = 0L
38
39 @JvmField var variableCount = 0L
40
41 @JvmField var height = 0L
42 @JvmField var width = 0L
43
44 @JvmField var x = 0L
45 @JvmField var y = 0L
46 @JvmField var z = 0L
47}
If we run both benchmarks (start
and begin
), we get:
// start
timeNs min 6,850.3, median 6,879.4, max 7,275.2
// begin
timeNs min 6,842.6, median 6,878.1, max 7,433.9
Wtf? Link to heading
The way ART orders fields is a bit more complicated than just sorting them alphabetically:
- Fields are grouped by type (
Long
,Float
, etc.) - Larger types come first (
Double/Long
thenFloat/Int
etc.) - Fields are sorted alphabetically within a group
- Some fields might break the above ordering for padding/alignment reasons
We all know that naming things in programming is hard. If you’ve read this entire post it just got a little bit harder.