Daneel: Type inference for Dalvik bytecode
In the last blog post about Daneel I mentioned one particular caveat of Dalvik bytecode, namely the existence of untyped instructions, which has a huge impact on how we transform bytecode. I want to take the same approach as last time and look at one specific example to illustrate those implications. So let us take a look at the following Java method.
    public float untyped(float[] array, boolean flag) {
        if (flag) {
            float delta = 0.5f;
            return array[7] + delta;
        } else {
            return 0.2f;
        }
    }

The above is a straightforward snippet and most of you probably know what the generated Java bytecode will look like. So let’s jump right to the Dalvik bytecode and discuss that in detail.
    UntypedSample.untyped:([FZ)F:
    [regs=5, ins=3, outs=0]
    0000: if-eqz v4, 0009
    0002: const/high16 v0, #0x3f000000
    0004: const/4 v1, #0x7
    0005: aget v1, v3, v1
    0007: add-float/2addr v0, v1
    0008: return v0
    0009: const v0, #0x3e4ccccd
    000c: goto 0008
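To see why the two `const` instructions are ambiguous, it helps to look at their payloads as raw bits. The following is only a small illustration (the class name `UntypedConstants` is made up for this post and is not part of Daneel): the same 32-bit pattern is a perfectly valid `int` and a perfectly valid `float`, and nothing in the instruction itself says which one was meant.

```java
final class UntypedConstants {
    // Reinterpret the raw 32-bit payload of a const instruction as a float.
    static float asFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        int atLabel2 = 0x3f000000; // payload of const/high16 at 0002
        int atLabel9 = 0x3e4ccccd; // payload of const at 0009

        // Both readings are legal; only later uses of the register decide.
        System.out.println("as int: " + atLabel2 + ", as float: " + asFloat(atLabel2));
        System.out.println("as int: " + atLabel9 + ", as float: " + asFloat(atLabel9));
    }
}
```

Read as floats, the two payloads are exactly the `0.5f` and `0.2f` from the Java source above.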
Keep in mind that Daneel doesn’t like to remember things, so he wants to look through the code just once from top to bottom and emit Java bytecode while doing so. He gets really puzzled at certain points in the code.
- Label 2: What is the type of register v0?
- Label 4: What is the type of register v1?
- Label 9: Register v0 again? What’s the type at this point?

You, as a reader, do have the answer because you know and understand the semantics of the underlying Java code, but Daneel doesn’t, so he tries to infer the types. Let’s look through the code in the same way Daneel does.
At method entry he knows the types of the method parameters. Dalvik passes parameters in the last registers (in this case v3 and v4). In addition, one register (in this case v2) holds the this reference. So we start out with the following register types at method entry.
    UntypedSample.untyped:([FZ)F:
    [regs=5, ins=3, outs=0]               uninit uninit  object  [float  bool
The array to the right represents the inferred register types at each point in the instruction stream as determined by the abstract interpreter. Note that we also have to keep track of the dimension count and the element type for array references. Now let’s look at the first block of instructions.
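Before doing so, here is a minimal sketch of how such an entry state could be computed from the method descriptor and the register counts. This is not Daneel’s actual code: the class `EntryTypes`, the method `atEntry` and the type names as plain strings are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EntryTypes {
    // Compute register types at method entry: locals are uninit,
    // incoming values ("this" first, then parameters) fill the last registers.
    static String[] atEntry(String descriptor, int regs, boolean isStatic) {
        List<String> ins = new ArrayList<>();
        if (!isStatic) {
            ins.add("object"); // the this reference
        }
        int i = 1; // skip the opening '('
        while (descriptor.charAt(i) != ')') {
            int start = i;
            while (descriptor.charAt(i) == '[') i++;            // array dimensions
            if (descriptor.charAt(i) == 'L') i = descriptor.indexOf(';', i);
            i++;
            String type = name(descriptor.substring(start, i));
            ins.add(type);
            if (type.equals("long") || type.equals("double")) {
                ins.add(type + "-hi"); // wide values occupy two registers
            }
        }
        String[] types = new String[regs];
        Arrays.fill(types, "uninit");
        for (int k = 0; k < ins.size(); k++) {
            types[regs - ins.size() + k] = ins.get(k);
        }
        return types;
    }

    // Map a single type descriptor to a readable name, keeping array dimensions.
    static String name(String d) {
        if (d.startsWith("[")) return "[" + name(d.substring(1));
        switch (d.charAt(0)) {
            case 'Z': return "bool";
            case 'B': return "byte";
            case 'C': return "char";
            case 'S': return "short";
            case 'I': return "int";
            case 'F': return "float";
            case 'J': return "long";
            case 'D': return "double";
            default:  return "object";
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(atEntry("([FZ)F", 5, false)));
    }
}
```

Starting from this entry state, the first block looks as follows.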
    0002: const/high16 v0, #0x3f000000    u32    uninit  object  [float  bool
    0004: const/4 v1, #0x7                u32    u32     object  [float  bool
    0005: aget v1, v3, v1                 u32    float   object  [float  bool
    0007: add-float/2addr v0, v1          float  float   object  [float  bool

Each line shows the register types after the instruction has been processed. At each line Daneel learns something new about the register types.
- Label 2: I don’t know the type of v0, only that it holds an untyped 32-bit value.
- Label 4: The same applies to v1 here, it’s an untyped 32-bit value as well.
- Label 5: Now I know v1 is used as an array index, so it must have been an integer value. Also the array reference in register v3 is accessed, so I know the result is a float value. The result is stored in v1, overwriting its previous content.
- Label 7: Now I know v0 is used in a floating-point addition, so it must have been a float value.
Keep in mind that at each line, Daneel emits appropriate Java bytecode. So whenever he learns the concrete type of a register, he might need to retroactively patch previously emitted instructions, because some of his assumptions about the type were broken.
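The bookkeeping behind this can be sketched in a few lines. Again, this is a simplified model under my own assumptions, not the real interpreter: the names `RefineSketch`, `constant`, `use` and `define` are hypothetical, and a "patch" is just recorded as a string instead of rewriting emitted Java bytecode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RefineSketch {
    enum Type { UNINIT, U32, INT, FLOAT, OBJECT, FLOAT_ARRAY, BOOL }

    final Type[] regs;
    final int[] definedAt;                           // offset of the const that last wrote a register
    final List<String> patches = new ArrayList<>();  // retroactive fixes, in discovery order

    RefineSketch(Type... entry) {
        regs = entry.clone();
        definedAt = new int[entry.length];
    }

    // An untyped const: we only know it is some 32-bit value.
    void constant(int offset, int reg) {
        regs[reg] = Type.U32;
        definedAt[reg] = offset;
    }

    // A typed use: refine an untyped register and remember which const to patch.
    void use(int reg, Type expected) {
        if (regs[reg] == Type.U32) {
            regs[reg] = expected;
            patches.add(String.format("%04x -> %s", definedAt[reg], expected));
        } else if (regs[reg] != expected) {
            throw new IllegalStateException("v" + reg + " is " + regs[reg] + ", expected " + expected);
        }
    }

    // A typed definition simply overwrites the register type.
    void define(int reg, Type type) {
        regs[reg] = type;
    }

    // Replays labels 2 to 7 of the example method.
    static RefineSketch firstBlock() {
        RefineSketch s = new RefineSketch(
                Type.UNINIT, Type.UNINIT, Type.OBJECT, Type.FLOAT_ARRAY, Type.BOOL);
        s.constant(0x2, 0);                                   // const/high16 v0
        s.constant(0x4, 1);                                   // const/4 v1
        s.use(3, Type.FLOAT_ARRAY); s.use(1, Type.INT);       // aget v1, v3, v1
        s.define(1, Type.FLOAT);
        s.use(0, Type.FLOAT); s.use(1, Type.FLOAT);           // add-float/2addr v0, v1
        s.define(0, Type.FLOAT);
        return s;
    }

    public static void main(String[] args) {
        RefineSketch s = firstBlock();
        System.out.println(Arrays.toString(s.regs));
        System.out.println(s.patches);
    }
}
```

In this model the constant at label 4 gets patched to an integer when the `aget` is seen, and the constant at label 2 gets patched to a float when the addition is seen, which mirrors the order in which Daneel learns the types above.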
Finally we look at the second block of instructions, reached through the conditional branch as part of the if-statement.
    0009: const v0, #0x3e4ccccd           u32    uninit  object  [float  bool
    000c: goto 0008                       float  uninit  object  [float  bool
When reaching this block we basically have the same information as at method entry. Again Daneel learns in the process.
- Label 9: I don’t know the type of v0, only that it holds an untyped 32-bit value.
- Label 12 (offset 000c): Now I know that v0 has to be a float value because the unconditional branch targets the join-point at label 8. And I already looked at that code and know that we expect a float value in that register at that point.
This illustrates why our abstract interpreter also has to remember and merge register type information at each join-point. It’s important to keep in mind that Daneel follows the instruction stream from top to bottom, as opposed to the control-flow of the code.
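A merge at a join-point could look roughly like the following. The rules here are a deliberately tiny, hypothetical lattice chosen for this example (class `JoinSketch`, methods `mergeType` and `mergeState`); the real interpreter has to deal with more types and with arrays and object hierarchies.

```java
import java.util.Arrays;

final class JoinSketch {
    enum Type { UNINIT, U32, INT, FLOAT, OBJECT, FLOAT_ARRAY, BOOL }

    // Merge the types of one register arriving from two predecessors.
    static Type mergeType(Type a, Type b) {
        if (a == b) return a;
        // An untyped 32-bit value adopts the concrete type of the other side.
        if (a == Type.U32 && (b == Type.INT || b == Type.FLOAT)) return b;
        if (b == Type.U32 && (a == Type.INT || a == Type.FLOAT)) return a;
        // Anything else is treated as unusable after the join (assumption).
        return Type.UNINIT;
    }

    // Merge two complete register states of the same size.
    static Type[] mergeState(Type[] a, Type[] b) {
        Type[] merged = new Type[a.length];
        for (int i = 0; i < a.length; i++) {
            merged[i] = mergeType(a[i], b[i]);
        }
        return merged;
    }

    public static void main(String[] args) {
        // State falling through from label 7 into label 8.
        Type[] fallThrough = {
            Type.FLOAT, Type.FLOAT, Type.OBJECT, Type.FLOAT_ARRAY, Type.BOOL };
        // State at the goto at label 12, before v0 has been refined.
        Type[] viaGoto = {
            Type.U32, Type.UNINIT, Type.OBJECT, Type.FLOAT_ARRAY, Type.BOOL };
        System.out.println(Arrays.toString(mergeState(fallThrough, viaGoto)));
    }
}
```

With these rules, v0 comes out as a float at label 8, and v1 ends up unusable because it was never written on the path through label 9.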
Now imagine scrambling the code so that instruction stream and control flow are vastly different from each other, and adding a few exception handlers and optimal register reuse as produced by some SSA representation. That’s where Daneel still keeps choking at the moment. But we can already handle most of the code produced by the dx tool, and we will hunt down all those nasty bugs triggered by obfuscated code as well.
Disclaimer: The abstract interpreter and the method rewriter were mostly written by Rémi Forax. With this post I take no credit for its implementation whatsoever; I just want to explain how it works.