C# Tutorial Lesson 8: Hacker's introduction to MSIL.

MSIL is the lowest-level .NET language. All languages targeting the .NET Framework compile to MSIL. As a C# developer, you will probably never write MSIL code directly, but you will often look at the MSIL disassembly of your application for answers to dependency, versioning and optimization questions.

To start disassembling a C# application (or any other .NET application), run ildasm.exe from either the VS.NET command prompt or the .NET Framework command prompt. On my computer, ildasm.exe is at C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin.

To get you excited about MSIL, let me ask you a question: Where does the term boxing come from? Is there such a keyword in C#? What about VB.NET? Why do we call value-to-object conversion boxing and the opposite conversion unboxing? Because MSIL uses the box and unbox instructions to perform these conversions.

So, let's study MSIL. The simplest MSIL program is the one which does not do anything and has no data:

.assembly hello{}
.class hello
{
  .method static public void main() il managed
  {
    .entrypoint
     ret
  }
}


MSIL is an object-oriented assembly language. As such, it retains the object-oriented constructs of the source languages, e.g. private and public methods. Every MSIL application needs to have an entry point. Any method (not just Main) can serve as the entry point as long as it is marked with the .entrypoint directive. MSIL programs are assembled with the intermediate language assembler, ilasm.exe, which is available from the same command prompts as the disassembler. Here is a more complicated program that, once again, does not do anything but has some data.

.assembly hello{}
.class hello{
    .method static public void main() il managed{
        .entrypoint
        .locals( string V_0)
         ldstr "hi there"
        stloc.0
        ret
        }
}

This program has a directive, .locals(string V_0), which declares a single local variable of type string. The ldstr instruction pushes a reference to "hi there" onto the evaluation stack, and stloc.0 pops it from the stack and stores it in local variable 0, the slot reserved by the declaration. Since you are working in a managed environment, you cannot leave data on the stack when the method returns: a method that returns void must reach ret with an empty evaluation stack, so every value you push has to be popped or consumed by another instruction. Every program also needs to start with a declaration of the assembly it belongs to. In our case, we chose the assembly name to be the same as the class name.

The intermediate language assembler is very forgiving, so you can easily crash an MSIL application by inserting invalid instructions into the code. For example, try adding ldstr "hi there" after the ret instruction above.

Let's take a look at a slightly more complicated example, which still doesn't do anything useful.

//pushing several values on the stack and popping them into a local variable
.assembly hello{}
.class hello
{
  .method static public void main() il managed
  {
    .maxstack 2
    .entrypoint
    .locals( string V_0, string V_1) //we have two local variables now
    ldstr "hi there" //push this string on the stack
    ldstr "bye here" //push the second string on the stack; it is now on top
    stloc.0 //pop the top of the stack ("bye here") and store it in local variable 0
    //you do not need to worry about deallocating local variables - it is done by the runtime.
    stloc.0 //pop the remaining string ("hi there") and store it in the same local variable ("bye here" is overwritten)
    ret
  }
}

There is a new element in this program: the .maxstack directive. We use .maxstack to declare the maximum number of values we plan to have on the evaluation stack at any given time. If the directive is omitted, ilasm assumes a default of 8, so we can leave it out of small methods like the ones in this lesson.

Here is a hello world program written in MSIL:

//compile with ilasm
.assembly hello {}
.method static public void main() il managed
{
  .entrypoint
  ldstr "Hello MS IL!"
  call void [mscorlib]System.Console::WriteLine(string)
  ret
}

All MSIL directives start with a period. Any MSIL component (except a module) is an assembly. Ilasm allows classless assemblies (see the code above), in which methods are declared outside any class. However, classless assemblies are not compatible with assemblies generated from higher-level .NET languages (e.g. C# and VB.NET).

Together, .entrypoint and ret play the role of Main(){ ... } in C#: .entrypoint marks the method where execution starts, and ret returns from it.

ldstr pushes a reference to the string onto the evaluation stack, and the call to WriteLine picks it up from there. Because the call consumes its argument before it displays "Hello MS IL!", we do not need to pop anything from the stack. We will get a runtime error if we do.
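
The reverse situation is worth seeing too: when a call leaves a value on the stack that the program does not use, that value has to be discarded explicitly. The following is a minimal sketch that is not part of the original lesson. It assumes the pop instruction, which simply discards the top of the stack, and Console::ReadLine, which returns a string.

```il
//wait for Enter, then discard the string that ReadLine returns
.assembly extern mscorlib {}
.assembly hello {}
.method static public void main() cil managed
{
  .entrypoint
  ldstr "Press Enter to quit"
  call void [mscorlib]System.Console::WriteLine(string) //consumes the string; the stack is empty
  call string [mscorlib]System.Console::ReadLine() //pushes the line that was typed
  pop //discard it so that the stack is empty again before ret
  ret
}
```

Without the pop, the method would reach ret with a leftover value on the stack, which is exactly the kind of invalid code described earlier.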

Here is a program which illustrates how to store data into local variables and how to overwrite them:

.assembly hello{}
.assembly extern mscorlib {}
.class hello
{
  .method static public void main() il managed{
    .maxstack 2
    .entrypoint
    .locals(string V_0, string V_1)
    //we have two local variables now
    ldstr "hi there" //push this string on the stack
    ldstr "bye here" //push the second string on the stack; it is now on top
    stloc.0 //pop the top of the stack ("bye here") and store it in local variable 0
    //you do not need to worry about deallocating local variables - it is done by the runtime.
    stloc.0 //pop the remaining string ("hi there") and store it in the same local variable ("bye here" is overwritten)
    ldloc.0 //push local variable 0, which now contains "hi there", back on the stack
    call void [mscorlib]System.Console::WriteLine(string) //prints "hi there"
    ret
  }
}

It is always a lot of fun to manipulate integers in assembly language.

 

//print number 2
.assembly hello {}
.method public static void Main() il managed
{
  .entrypoint
  .locals(int32 V_0)
  ldc.i4.2
  stloc.0
  ldloc.0
  call void [mscorlib]System.Console::WriteLine(int32)
  ret
}

The next program adds two integers:

//add two numbers 1 and 3
.assembly hello {}
.assembly extern mscorlib {}
.class public hello
{
  .method static public void main()
  {
    .entrypoint
    .maxstack 2
    .locals(int32 V_0, int32 V_1) //declare two local variables
    ldc.i4.1 //put number 1 on the stack
    ldc.i4.3 //put number 3 on the stack
    stloc.0 //pop 3 (the top of the stack) and store it in local variable 0
    ldloc.0 //push the value of local variable 0 (3) back on the stack
    add //pop the two values on the stack (1 and 3) and push their sum, 4
    //you should not try to pop the operands yourself; add consumes them
    //WriteLine then pops the sum as its argument, which leaves the stack empty
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
  }
}
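
To see why .maxstack matters, it can help to trace the depth of the evaluation stack instruction by instruction. The following is a minimal sketch rather than part of the original lesson. It assumes the mul instruction, which has not been introduced above and works like add except that it multiplies. The program computes 1 + 2 * 3 and needs three stack slots at its deepest point.

```il
//compute 1 + 2 * 3 and print the result
.assembly extern mscorlib {}
.assembly hello {}
.class public hello
{
  .method static public void main() cil managed
  {
    .entrypoint
    .maxstack 3
    ldc.i4.1 //stack: 1          (depth 1)
    ldc.i4.2 //stack: 1 2        (depth 2)
    ldc.i4.3 //stack: 1 2 3      (depth 3, the maximum)
    mul      //pops 2 and 3, pushes 6; stack: 1 6 (depth 2)
    add      //pops 1 and 6, pushes 7; stack: 7   (depth 1)
    call void [mscorlib]System.Console::WriteLine(int32) //pops 7 and prints it (depth 0)
    ret
  }
}
```

The value given to .maxstack is the largest depth that appears in the comments, not the total number of values the method ever pushes.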

It is sometimes very useful to have an explicit conversion between a value and an object. This is done with the box instruction. The example below calls the WriteLine overload that takes an object, so we need to explicitly convert the value on the stack to a boxed object first.

.assembly hello{}
.method public static void Main() il managed
{
   .entrypoint
  ldc.i4.s 100 //put 100 on stack
  box [mscorlib]System.Int32 //convert it to an object in place
  call void [mscorlib]System.Console::WriteLine(object) //print the value of the object
  ret
}

The example above was a bit contrived to keep things simple. Here is a more realistic example:

.assembly hello{}
.method public static void Main() il managed
{
  .entrypoint
  .maxstack 2
  .locals (int32 V_0)
  ldstr "Please enter your age:"
  call void [mscorlib]System.Console::WriteLine(string)
  call string [mscorlib]System.Console::ReadLine()
  call int32 [mscorlib]System.Int32::Parse(string)
  stloc.0
  ldstr "You are {0} years old "
  ldloc.0
  box [mscorlib]System.Int32 //convert int32 to an object on the stack
  call void [mscorlib]System.Console::WriteLine(string, object)
  ret
}

Note that the class library does not have a System.Console::WriteLine(string, int32) method, therefore the int32 needs to be boxed into an object before it can be passed to WriteLine(string, object).
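
The opposite conversion, unboxing, was mentioned at the start of this lesson but is not used in the examples. The following is a minimal sketch of a round trip. It assumes the unbox.any instruction (available since .NET 2.0), which pops an object reference and pushes the value stored inside it.

```il
//box an integer, store it as an object, then unbox it again
.assembly extern mscorlib {}
.assembly hello {}
.method public static void Main() cil managed
{
  .entrypoint
  .locals (object V_0)
  ldc.i4.s 100 //put 100 on the stack
  box [mscorlib]System.Int32 //replace it with a reference to a boxed Int32
  stloc.0 //store the reference in the object-typed local variable
  ldloc.0 //push the reference back on the stack
  unbox.any [mscorlib]System.Int32 //replace the reference with the int32 value it holds
  call void [mscorlib]System.Console::WriteLine(int32) //prints 100
  ret
}
```

If the object on the stack is not a boxed Int32, unbox.any throws an InvalidCastException at run time.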

Exercises:

  1. Write a program that subtracts two integers.
  2. Read an MSIL article by John Robbins at MSDN magazine.
  3. Use ildasm to disassemble your .NET programs. Does compiling with the /o+ optimization option change the MSIL code? Why?