Daneel: Type inference for Dalvik bytecode

In the last blog post about Daneel I mentioned one particular caveat of Dalvik bytecode, namely the existence of untyped instructions, which has a huge impact on how we transform bytecode. I want to take the same approach as last time and look at one specific example to illustrate those implications. So let us take a look at the following Java method.

public float untyped(float[] array, boolean flag) {
   if (flag) {
      float delta = 0.5f;
      return array[7] + delta;
   } else {
      return 0.2f;
   }
}

The above is a straightforward snippet and most of you probably know what the generated Java bytecode will look like. So let’s jump right to the Dalvik bytecode and discuss that in detail.

UntypedSample.untyped:([FZ)F:
  [regs=5, ins=3, outs=0]
   0000: if-eqz v4, 0009
   0002: const/high16 v0, #0x3f000000
   0004: const/4 v1, #0x7
   0005: aget v1, v3, v1
   0007: add-float/2addr v0, v1
   0008: return v0
   0009: const v0, #0x3e4ccccd
   000c: goto 0008

Keep in mind that Daneel doesn’t like to remember things, so he wants to look through the code just once from top to bottom and emit Java bytecode while doing so. He gets really puzzled at certain points in the code.

  • Label 2: What is the type of register v0?
  • Label 4: What is the type of register v1?
  • Label 9: Register v0 again? What’s the type at this point?

You, as a reader, do have the answer because you know and understand the semantics of the underlying Java code, but Daneel doesn’t, so he tries to infer the types. Let’s look through the code in the same way Daneel does.

At method entry he knows the types of the method parameters. Dalvik passes parameters in the last registers (in this case v3 and v4). Because the method is not static, the register just before them (in this case v2) holds the this reference. So we start out with the following register types at method entry.

UntypedSample.untyped:([FZ)F:
  [regs=5, ins=3, outs=0]               uninit uninit object [float bool
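
The entry state shown above can be derived mechanically from the method descriptor and the register count. The following is only a minimal sketch of that idea, not Daneel's actual code: the class name EntryStateSketch, the method entryTypes and the string type names are hypothetical, and wide types (long, double) are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of Daneel: computes register types at method entry. */
final class EntryStateSketch {

    /** Returns one type name per register, using the names from the listing above. */
    static List<String> entryTypes(String descriptor, int regs, boolean isStatic) {
        List<String> ins = new ArrayList<>();
        if (!isStatic) {
            ins.add("object"); // the implicit this reference comes first
        }
        String params = descriptor.substring(1, descriptor.indexOf(')'));
        for (int i = 0; i < params.length(); i++) {
            StringBuilder type = new StringBuilder();
            while (params.charAt(i) == '[') { // one '[' per array dimension
                type.append('[');
                i++;
            }
            switch (params.charAt(i)) {
                case 'F': type.append("float"); break;
                case 'Z': type.append("bool"); break;
                case 'I': type.append("int"); break;
                case 'L': type.append("object"); i = params.indexOf(';', i); break;
                default:
                    throw new IllegalArgumentException(
                        "type not covered by this sketch: " + params.charAt(i));
            }
            ins.add(type.toString());
        }
        // Parameters occupy the last registers; everything before them is uninitialized.
        List<String> state = new ArrayList<>(Collections.nCopies(regs - ins.size(), "uninit"));
        state.addAll(ins);
        return state;
    }

    public static void main(String[] args) {
        System.out.println(entryTypes("([FZ)F", 5, false));
    }
}
```

For the descriptor ([FZ)F with five registers this yields uninit, uninit, object, [float, bool, matching the row above.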

The array to the right of each listing line represents the inferred register types (v0 through v4) at that point in the instruction stream, as determined by the abstract interpreter. Note that we also have to keep track of the dimension count and the element type for array references. Now let’s look at the first block of instructions.

   0002: const/high16 v0, #0x3f000000   u32    uninit object [float bool
   0004: const/4 v1, #0x7               u32    u32    object [float bool
   0005: aget v1, v3, v1                u32    float  object [float bool
   0007: add-float/2addr v0, v1         float  float  object [float bool

Each line shows the register types after the instruction has been processed. At each line Daneel learns something new about the register types.

  • Label 2: I don’t know the type of v0, only that it holds an untyped 32-bit value.
  • Label 4: Same applies for v1 here, it’s an untyped 32-bit value as well.
  • Label 5: Now I know v1 is used as an array index, so it must have been an integer value. Also the array reference in register v3 is accessed, so I know the result is a float value. The result is stored in v1, overwriting its previous content.
  • Label 7: Now I know v0 is used in a floating-point addition, so it must have been a float value.
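
The four steps above can be replayed with a tiny abstract interpreter. This is a simplified sketch under my own assumptions, not the interpreter written for Daneel: the class RegisterTypeSketch, its Type enum and its method names are hypothetical, and only the three instructions from this block are modelled.

```java
import java.util.Arrays;

/** Hypothetical sketch of register type tracking; not Daneel's actual classes. */
final class RegisterTypeSketch {

    /** Simplified abstract types, named after the columns in the listing. */
    enum Type { UNINIT, U32, INT, FLOAT, BOOL, OBJECT, FLOAT_ARRAY }

    final Type[] regs;

    RegisterTypeSketch(Type... entry) {
        regs = entry.clone();
    }

    /** const: the destination now holds an untyped 32-bit value. */
    void constant(int dst) {
        regs[dst] = Type.U32;
    }

    /** A typed use refines an untyped value; any other type must already match. */
    void use(int reg, Type expected) {
        if (regs[reg] == Type.U32) {
            regs[reg] = expected;
        } else if (regs[reg] != expected) {
            throw new IllegalStateException("v" + reg + " is " + regs[reg] + ", expected " + expected);
        }
    }

    /** aget dst, array, index: the index is an int, the result is the element type. */
    void aget(int dst, int array, int index) {
        use(index, Type.INT);
        use(array, Type.FLOAT_ARRAY); // this sketch only knows float arrays
        regs[dst] = Type.FLOAT;
    }

    /** add-float/2addr a, b: both operands are floats, the result goes to a. */
    void addFloat2addr(int a, int b) {
        use(a, Type.FLOAT);
        use(b, Type.FLOAT);
        regs[a] = Type.FLOAT;
    }

    /** Replays labels 2 to 7 of the example, starting from the entry state. */
    static String demo() {
        RegisterTypeSketch s = new RegisterTypeSketch(
            Type.UNINIT, Type.UNINIT, Type.OBJECT, Type.FLOAT_ARRAY, Type.BOOL);
        s.constant(0);         // 0002: const/high16 v0
        s.constant(1);         // 0004: const/4 v1
        s.aget(1, 3, 1);       // 0005: aget v1, v3, v1
        s.addFloat2addr(0, 1); // 0007: add-float/2addr v0, v1
        return Arrays.toString(s.regs);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that the refinement of v1 to an integer at label 5 is only visible for a moment, because the same instruction immediately overwrites v1 with the float result.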

Keep in mind that at each line, Daneel emits appropriate Java bytecode. So whenever he learns the concrete type of a register, he might need to retroactively patch previously emitted instructions, because some of his assumptions about the type were broken.
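
To make the patching idea concrete, here is a rough sketch of one way it could work for constants. It is an illustration only and not how Daneel's method rewriter is actually implemented: the class ConstPatchSketch is hypothetical, and the emitted "instructions" are plain strings standing in for real Java bytecode. The only library call relied on is Float.intBitsToFloat, which reinterprets the raw 32 bits as a float.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of retroactive patching; strings stand in for real bytecode. */
final class ConstPatchSketch {

    /** Instructions emitted so far, in stream order. */
    final List<String> emitted = new ArrayList<>();

    /** Register -> {index into emitted, raw bits} for constants of unknown type. */
    private final Map<Integer, int[]> pending = new HashMap<>();

    /** Emits a tentative load and remembers where it is so it can be patched. */
    void emitConst(int reg, int bits) {
        pending.put(reg, new int[] { emitted.size(), bits });
        emitted.add("ldc 0x" + Integer.toHexString(bits) + " (untyped)");
    }

    /** The register turned out to hold a float: rewrite the earlier load. */
    void resolveAsFloat(int reg) {
        int[] p = pending.remove(reg);
        if (p != null) {
            emitted.set(p[0], "ldc " + Float.intBitsToFloat(p[1]) + "f");
        }
    }

    /** The register turned out to hold an int: rewrite the earlier load. */
    void resolveAsInt(int reg) {
        int[] p = pending.remove(reg);
        if (p != null) {
            emitted.set(p[0], "ldc " + p[1]);
        }
    }

    public static void main(String[] args) {
        ConstPatchSketch out = new ConstPatchSketch();
        out.emitConst(0, 0x3f000000); // 0002: const/high16 v0
        out.emitConst(1, 0x7);        // 0004: const/4 v1
        out.resolveAsInt(1);          // 0005: v1 is used as an array index
        out.resolveAsFloat(0);        // 0007: v0 is used in a float addition
        System.out.println(out.emitted);
    }
}
```

With the constants from the example, the bit pattern 0x3f000000 resolves to 0.5f (the delta from the Java source) and 0x3e4ccccd would resolve to 0.2f.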

Finally we look at the second block of instructions reached through the conditional branch as part of the if-statement.

   0009: const v0, #0x3e4ccccd          u32    uninit object [float bool
   000c: goto 0008                      float  uninit object [float bool

When reaching this block we basically have the same information as at method entry. Again Daneel learns in the process.

  • Label 9: I don’t know the type of v0, only that it holds an untyped 32-bit value.
  • Label 12 (000c in the listing): Now I know that v0 has to be a float value because the unconditional branch targets the join-point at label 8. And I already looked at that code and know that we expect a float value in that register at that point.

This illustrates why our abstract interpreter also has to remember and merge register type information at each join-point. It’s important to keep in mind that Daneel follows the instruction stream from top to bottom, as opposed to the control-flow of the code.
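
A merge at a join-point could look roughly like the following. This is a minimal sketch, not the actual merge logic of the abstract interpreter: the class JoinMergeSketch and its Type enum are hypothetical, and treating incompatible types as uninitialized is my own simplifying assumption.

```java
import java.util.Arrays;

/** Hypothetical sketch of merging register states at a join-point. */
final class JoinMergeSketch {

    /** Simplified abstract types, named after the columns in the listings. */
    enum Type { UNINIT, U32, INT, FLOAT, BOOL, OBJECT, FLOAT_ARRAY }

    /** Merges the types of one register coming from two different paths. */
    static Type merge(Type a, Type b) {
        if (a == b) {
            return a;
        }
        // An untyped 32-bit value adopts the concrete type known on the other path.
        if (a == Type.U32 && (b == Type.INT || b == Type.FLOAT)) {
            return b;
        }
        if (b == Type.U32 && (a == Type.INT || a == Type.FLOAT)) {
            return a;
        }
        // Assumption of this sketch: anything else is unusable after the join.
        return Type.UNINIT;
    }

    /** Merges two complete register states element by element. */
    static Type[] mergeStates(Type[] recorded, Type[] incoming) {
        Type[] merged = new Type[recorded.length];
        for (int i = 0; i < recorded.length; i++) {
            merged[i] = merge(recorded[i], incoming[i]);
        }
        return merged;
    }

    /** Merges the state recorded for label 8 with the state arriving from the goto. */
    static String demo() {
        Type[] atLabel8 = { Type.FLOAT, Type.FLOAT, Type.OBJECT, Type.FLOAT_ARRAY, Type.BOOL };
        Type[] fromGoto = { Type.U32, Type.UNINIT, Type.OBJECT, Type.FLOAT_ARRAY, Type.BOOL };
        return Arrays.toString(mergeStates(atLabel8, fromGoto));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Applied to the example, v0 is refined from u32 to float, while v1 stays uninitialized because it was only written on one of the two paths. That matches the last row of the listing above.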

Now imagine scrambling up the code so that instruction stream and control-flow are vastly different from each other, together with a few exception handlers and an optimal register re-usage as produced by some SSA representation. That’s where Daneel still keeps choking at the moment. But we can handle most of the code produced by the dx tool already and will hunt down all those nasty bugs triggered by obfuscated code as well.

Disclaimer: The abstract interpreter and the method rewriter were mostly written by Rémi Forax. With this post I take no credit for their implementation whatsoever; I just want to explain how they work.
