While my work responsibilities do not leave me much time to write code nowadays, I have managed to make a few small contributions to Jetpack Compose in the last few months, mostly focusing on performance.
If you are an Android app developer, your performance concerns probably start and stop at a fairly high level [1]. I find working on large-scale libraries like Compose fascinating because you need to worry about performance not only at a macro level, but also at a micro level. Since parts of the libraries can be invoked frequently (many times per frame, for instance), even micro-optimizations can make a difference [2].
Jetpack Compose obviously benefits greatly from the amazing work done by kotlinc, R8, and ART to automagically optimize both your apps and our libraries, but these automatic optimizations have their own limits. Importantly, some of those optimizations (the ones performed by R8) will not apply in debug mode. This means that there are optimizations that will matter to developers if they allow us to improve their debugging workflow.

With that in mind, I have spent a lot of time looking at the code in Jetpack Compose to find optimization opportunities at all levels of the stack. To help me do this, I have built and published kotlin-explorer, a desktop app that makes it easier to visualize Kotlin code as both dex bytecode and ARM 64-bit assembly. Using this tool revealed a few fascinating low-level optimization opportunities I would like to share with you, starting with Int.sign. Next time, we'll look at Float.sign.
Int.sign is a simple API that returns the sign of an integer as an integer:

-1 if the value is negative
0 if the value is zero
1 if the value is positive
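For instance, a quick check in plain Kotlin:

import kotlin.math.sign

fun main() {
    println((-42).sign) // prints -1
    println(0.sign)     // prints 0
    println(7.sign)     // prints 1
}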
In Kotlin, Int.sign is implemented as follows in the standard library:
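Roughly speaking (this is a sketch rather than the verbatim stdlib source, but it matches the dex bytecode shown below):

// Sketch of the standard library implementation
public actual val Int.sign: Int get() = when {
    this < 0 -> -1 // negative values map to -1
    this > 0 -> 1  // positive values map to 1
    else -> 0      // zero maps to 0
}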
The implementation is clean, concise and does exactly what it should, so what could we possibly improve here? To figure this out, let’s look at the generated dex bytecode:
if-gez v0, 0004 // +0004
const/4 v0, #int -1 // #ff
goto 0009 // +0006
if-lez v0, 0008 // +0004
const/4 v0, #int 1 // #1
goto 0009 // +0002
const/4 v0, #int 0 // #0
return v0
This bytecode is a direct translation of the original Kotlin code into dex instructions, so let's go a level deeper and look at the aarch64 assembly that will run on your Android device:
cmp w1, #0x0 (0)      // compare the input to 0
cset w0, ge           // w0 = 1 if the input is >= 0, 0 otherwise
cmp w1, #0x0 (0)      // compare the input to 0 a second time
cset w1, gt           // w1 = 1 if the input is > 0, 0 otherwise
cmp w0, #0x0 (0)      // test the ">= 0" flag
csinv w0, w1, wzr, ne // if the flag is set, w0 = w1, otherwise ~wzr, i.e. -1
ret
This version is a little better because it removes the branches found in the original code. It relies instead on aarch64's conditional but branchless instructions cset and csinv. Even if you don't fully understand aarch64 assembly, the fact that the comparison instruction cmp is used twice to compare the w1 register to 0 should raise questions. And it is indeed possible to write a more optimized version of Int.sign in aarch64:
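Something like the following, a hand-written sketch rather than captured compiler output:

cmp w1, #0x0          // a single comparison of the input against 0
cset w0, gt           // w0 = 1 if the input is positive, 0 otherwise
csinv w0, w0, wzr, ge // keep w0 if the input is >= 0, otherwise ~wzr, i.e. -1
ret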
This new version uses half of the instructions (excluding ret) compared to the previous solution. Thankfully, it is easy to get Int.sign to produce this code by forcing it to use java.lang.Integer.signum(), for which ART provides an optimized intrinsic:
public actual val Int.sign: Int get() = Integer.signum(this)
So what should you do about this? You could create your own version of Int.sign (see below as well), or you could wait for Kotlin 2.0, which will include a fix. JetBrains measured the impact of this change on JDK 21 on Linux, and the improvements are significant.
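For instance, a minimal sketch of a hand-rolled replacement (fastSign is a hypothetical name, chosen here to avoid clashing with the stdlib extension):

// Hypothetical extension that forwards to the ART-intrinsified Integer.signum()
val Int.fastSign: Int get() = Integer.signum(this)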
There is another way to implement Int.sign without branches that doesn't rely on runtime intrinsics to produce good aarch64 assembly:
val Int.sign: Int get() = (this shr 31) or (-this ushr 31)
With some bit-manipulation trickery [3] we end up with the following aarch64 code:
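Something along these lines (again a hand-written sketch rather than captured compiler output):

asr w0, w1, #31         // arithmetic shift: -1 for negative inputs, 0 otherwise
neg w2, w1              // negate the input
orr w0, w0, w2, lsr #31 // or in the top bit of -input, i.e. 1 for positive values
ret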
Does this version matter? No idea, I have not benchmarked it [4]. But it's neat.
[1] As they should.
[2] Especially when you add up the effects of many such micro-optimizations.
[3] I love Kotlin but I strongly dislike its bitwise operators, especially to shift bits around.
[4] But JetBrains did and it looks to be as fast as the signum() version.