The Machine Language of the JVM


I have moved on. I do have notes, but not in tutorial form, on objects, exceptions, etc.  Take a look at ALACA Notes Parts 1 - 4.  (August, 2004)

Welcome to a small tutorial on the machine language of the Java Virtual Machine (JVM).

It's not deep.  It's not complicated.  It's what you would want to know if you knew what you wanted.

Its purpose is to show how the machine instructions support the Java language.

It's important to do the exercises, since concepts are exercised and expanded there.
 
 

(Last revised September, 2000).


Contents

    How to look at assembly code
    Simple machine architecture (at first glance).
    Arithmetic expressions
        Putting integer constants onto the stack
        Calculations
    Calling methods and passing parameters
    Jasmin programs
    Flow of Control
    Arrays
    Exercises

Resources
     Jasmin
    JVM instruction reference


How to look at assembly code

  1. If you know how to write a Java program, then you are halfway there.  Compile the program into a class file.  For example, I compiled Demo1.java on Pegasus by % javac Demo1.java. If all goes well, an object file Demo1.class is created.  I can then execute this program by just naming the class file  % java Demo1.

  3. For the rest of you, don't worry.  If you can read C++ and are familiar with the concepts of classes and objects of a class, you can read a lot of Java, or at least enough Java for this tutorial.  While looking at Demo1.java, do note:

  5. Don't try too hard to read the class file itself; it is a binary file (you would view it in hexadecimal) with a pretty complicated format.  Instead, we'll use Java's disassembler, javap.  In this example, I viewed the class file by % javap -c -verbose Demo1.  The -c option shows the bytecodes for the instructions, and the -verbose option shows more information about the methods.  The listing shows assembler instructions for the bytecodes of the methods.

  7. Look at the exercises.

Simple machine architecture

  1. For each invoked method, the JVM creates an isolated environment in memory called a stack frame.  It consists of

    1. Local variables.  They are numbered 0, 1, 2, ... .  (For methods of an object, local variable 0 contains a reference to the object itself (this).  Right now we are implementing class methods, so we don't have to worry about this.) Local variables usually contain an integer or a reference to an object.  We access a local variable by its number.

    3. A stack for manipulating data.  All (well, almost all) arithmetic is performed on the stack.  When the method begins, the stack is empty.  When the method ends, the stack had better be empty.  For most statements, the compiler will generate code that starts and ends with an empty stack.  Unlike most machine architectures, there are no registers to run out of.

    Java compilation creates a lot of constants such as large numbers, strings, and names of classes and methods.  Constants are stored in a constant pool that is accessible to all the methods of the class.

    Constants are stored in an array-like structure;  the JVM accesses a constant by naming the index (starting from 1) for the constant.  It's direct addressing for constants!

    There are many types of constants, and each one has its own data structure.  It can be quite complicated.  For example, a constant for a method contains fields that contain indices to other constants.  One of them names the class of the method; the other names the method and the types of its parameters.

    The good news is that the compiler (and Java assembler) knows how to create the constant pool.  All you have to know is that constants exist in a pool and that they are accessible by naming the index of the constant.
     

  3. Machine instructions can refer to (1) immediate data (part of the instruction), (2) local variables, (3) the stack, (4) the constant pool, and (5) a branch address.

  4. Most instructions consist of a one byte operation code (opcode) followed by a byte containing an operand.  The operand can be an integer constant,  an index to local storage, or an index to the constant pool.  Most of the time this is just fine; most constants are small and methods usually need just a few local variables.  In the rare case that a larger constant is needed or that there are many, many local variables or many, many constants in the constant pool, the instruction will contain a two byte index or constant.

    Why use just one byte for the operand when two bytes work just as well?  To keep the program small, an important consideration since most programs are transmitted over the Internet.

    To make instructions even smaller some instructions imply a constant.  You'll see this below.
     

  5.  Exercises


Arithmetic expressions

Putting integer constants onto the stack

  1. Arithmetic is performed on the stack.  Data for the calculations can come from (1) the instruction (immediate addressing or even implied by the instruction), (2) local variables (direct addressing), and (3) the constant pool (direct addressing).  The JVM has many, many instructions for loading data onto the stack.  It's best to see this in context.

  3. There are many types of arithmetic data, such as integers, floats, doubles, and longs.  To keep this tutorial short, we'll just consider integers.  Integers are 32 bits, and multi-byte values are stored big-endian in the class file.

  4. Let's start with assigning constants to local variables.  Look at the method stackManip() in Demo1.java and, at the same time, the  assembler code.   The first statement is x = 2;  x is assigned as local variable 0.
     
  5. The instruction iconst_2 at location 0 pushes a constant 2 onto the stack.

  6. How long is the instruction?  (This is a tutorial, so try to answer the question now.)
    The constant is implied by the instruction!
    There are separate machine instructions for pushing 0 through 5 and -1 onto the stack.
       
  7. Visit the Jasmin site to see the description of the iconst_<n> machine instructions.
  8. We save the data at the top of the stack (pop the stack) into local variable 0 by the instruction istore_0 at location 1.

  9. How long is the instruction?
    What are the other istore_<n> instructions?
       
  10. Let's see how y = 6; is implemented:

        bipush 6
        istore_1
    How long is the bipush instruction?
    The second byte contains the constant 6.  Find the description of this instruction.  What are the largest and smallest integers it can push?
    Do you see how y is the second local variable (local variable 1)?
     
  11. On to bigger constants:  To implement z = 200; we now need to push a 2-byte constant.  The instruction sipush 200 does just that -- s(hort) i(nteger) push.

  12. How long is the instruction?
    How big can the constants be?
       
  13. Even bigger constants:  To implement u = 2000000; the designers could have used a 5-byte instruction.  They thought better of it.  Instead, large constants are put into the constant pool rather than within the instruction.  Thus we see: ldc #1 <Integer 2000000>, which copies (loads) the constant 2000000 found at index 1 of the constant pool onto the stack.

  14. How long is the instruction?
    What other constants can be loaded from the constant pool?
     
  15.  Exercises
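If you want to check your answers to the questions above, here is a minimal sketch of the choice between the four ways of pushing an integer constant.  The class and method names (PushChooser, mnemonic, length) are made up for this illustration; the ranges and lengths are those of the iconst_<n>, bipush, sipush, and ldc instructions, and the order of the tests is an assumption about what a compiler would reasonably do, not a copy of javac.

```java
final class PushChooser {
    /** Returns the shortest instruction that pushes the given int constant. */
    static String mnemonic(int value) {
        if (value >= -1 && value <= 5) {
            // The constant is implied by the opcode itself.
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;   // one signed operand byte
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;   // two signed operand bytes
        }
        return "ldc " + value;          // the constant lives in the constant pool
    }

    /** Length in bytes of the instruction chosen by mnemonic(). */
    static int length(int value) {
        if (value >= -1 && value <= 5) return 1;
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) return 2;
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) return 3;
        return 2;   // ldc: opcode plus a one-byte constant pool index
    }

    public static void main(String[] args) {
        // The four constants assigned in stackManip().
        for (int v : new int[] {2, 6, 200, 2000000}) {
            System.out.println(v + " -> " + mnemonic(v) + " (" + length(v) + " bytes)");
        }
    }
}
```

Note that ldc is shorter than sipush.  The price is an entry in the constant pool, and if the pool index does not fit in one byte the wider form ldc_w is needed instead.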

Calculations

  1. The strategy is to (1) push the operands onto the stack, (2) perform the operation on the two operands (the operation removes the operands from the stack and puts the result on the top of the stack), and (3) finally pop the answer off the stack.  Let's start with u = x + u;  The compiler generated

      12 iload_0
      13 iload_3
      14 iadd
      15 istore_3
    First (1) x is pushed, then u.  (2) Now we perform the addition.  What is on the top of the stack?  x + u.  How long is the iadd instruction?  (3) We now save it in u.
     
  3. What arithmetic operations can you do on the integers?  The usual ones: iadd, isub, ineg, imul, idiv, and irem (taking the remainder).

  5. You should be able to read a Java expression from left to right and perform the calculation on the stack.  For v = 3*(x/y - 2/(u+y)) we get the compiled code (I added comments starting with a semicolon):

      16 iconst_3
      17 iload_0   ; x
      18 iload_1   ; y
      19 idiv
      20 iconst_2
      21 iload_3   ; u
      22 iload_1   ; y
      23 iadd
      24 idiv
      25 isub
      26 imul
      27 istore 4  ; v
    What gets pushed first?
    We don't have the second operand for the multiplication.  So let's keep pushing.  What two variables get pushed next?
    We can now divide.  How long is the instruction? What is now on the stack after the division?  top ->  x/y, 3.
    We still can't multiply.  And we can't yet subtract.  What is now pushed?
    We still can't multiply, subtract, or divide.  What variables are now pushed?  What does the stack now look like?
    We now can add.  What does the stack now look like?
    We now can divide. What does the stack now look like?
    We now can subtract. What does the stack now look like?
    We can finally perform the multiplication.  The stack just contains the value of the expression.
    We now can pop the stack. How long is the istore instruction?  What does 4 represent?  How many variables can we save into using istore?
     
  7. For every rule there is an exception.  How would you code x++; using a stack? How many instructions does it take?  How many bytes are used?

  8. Since this is a common operation, the JVM designers created  iinc 0 1.  How long is the instruction?  What do the 0 and 1 signify?  What is the range of local variables?  What is the range of constants (since this instruction can be used for x--)?
     
  9. The JVM also supports the logical instructions ishl, ishr (the arithmetic kind), iushr (the logical kind; Java uses the operator >>> for this), iand, ior, and ixor.  Do you see how y <<= 3; is compiled?
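To see the difference between the two right shifts from the Java side, here is a small sketch.  The class name ShiftDemo and its helper methods are invented for this example; the comments name the instruction that each int operator corresponds to.

```java
final class ShiftDemo {
    static int shl(int y, int n)  { return y << n; }    // ishl
    static int shr(int y, int n)  { return y >> n; }    // ishr: copies of the sign bit are shifted in
    static int ushr(int y, int n) { return y >>> n; }   // iushr: zeros are shifted in

    public static void main(String[] args) {
        int y = 6;
        y <<= 3;
        System.out.println(y);             // 48
        System.out.println(shr(-16, 2));   // -4
        System.out.println(ushr(-16, 2));  // 1073741820
    }
}
```

The two right shifts agree for non-negative numbers and differ only when the sign bit is set.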

  11. Verification.  The examples you have seen have been generated by the Java compiler; we trust that its code is correct and will run on any JVM.  However, when the program is loaded, no such trust is assumed.  The program is examined instruction by instruction before execution to make sure that the code will be able to execute.  For example,  the iadd instruction must have two integers on the stack.  A program that has only one integer on the stack will get a message such as

        Exception in thread "main" java.lang.VerifyError: (class: Boom, method: dumb signature: ()V) Unable to pop operand off an empty stack

    The program is then terminated.

    Be aware that all code will be examined by the Java verifier.  This will help you when you write your own assembly code since the verifier is checking the structure of your program.  It's not as powerful as the compiler, but it's much better than watching your program just blow up.
     

  13. Exercises.
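The walk through v = 3*(x/y - 2/(u+y)) earlier in this section can be replayed in ordinary Java.  This is only a sketch that mimics the operand stack with a java.util.ArrayDeque; the class StackWalk and its helper methods are invented for the illustration and say nothing about how a real JVM stores its stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StackWalk {
    /** Replays the compiled code for v = 3*(x/y - 2/(u+y)), one instruction per line. */
    static int evaluate(int x, int y, int u) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(3);        // iconst_3
        stack.push(x);        // iload_0
        stack.push(y);        // iload_1
        div(stack);           // idiv     top -> x/y, 3
        stack.push(2);        // iconst_2
        stack.push(u);        // iload_3
        stack.push(y);        // iload_1
        add(stack);           // iadd     top -> u+y, 2, x/y, 3
        div(stack);           // idiv     top -> 2/(u+y), x/y, 3
        sub(stack);           // isub     top -> x/y - 2/(u+y), 3
        mul(stack);           // imul     top -> value of the expression
        int v = stack.pop();  // istore 4
        if (!stack.isEmpty()) {
            throw new IllegalStateException("the statement should end with an empty stack");
        }
        return v;
    }

    // Each binary operation pops the right operand first, then the left one.
    private static void add(Deque<Integer> s) { int b = s.pop(), a = s.pop(); s.push(a + b); }
    private static void sub(Deque<Integer> s) { int b = s.pop(), a = s.pop(); s.push(a - b); }
    private static void mul(Deque<Integer> s) { int b = s.pop(), a = s.pop(); s.push(a * b); }
    private static void div(Deque<Integer> s) { int b = s.pop(), a = s.pop(); s.push(a / b); }

    public static void main(String[] args) {
        System.out.println(evaluate(7, 2, 1));   // 3*(7/2 - 2/(1+2)) = 9
    }
}
```

The order of the two pops matters for isub and idiv: the operand pushed last is the right-hand one.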


Calling methods and passing parameters

  1. To simplify things, we'll just call static methods (they don't belong to any object) with integer parameters.  For the expression v = add(x,y,-1); in Demo1.java, we see a call to the method add which requires 3 parameters.  As in C,  Java passes parameters by value.  When add begins execution, it will be able to access 3 integers; it has no idea where they came from. add will do its calculation and return an integer which will be saved in v.

  3. So how is it done? stackManip pushes the values of the parameters onto the stack.  Thus we have the assembler code

      43 iload_0    ; push x
      44 iload_1    ; push y
      45 iconst_m1  ; push -1
    Note that the last parameter is on the top of the stack and the first parameter is the furthest down.
       
  5. We need a call instruction:

      46 invokestatic #8 <Method int add(int, int, int)>

    How long is this instruction?  The #8 tells us to go to the constant pool.  What do we find there?  The name of the method, the types of the parameters, and the type of the return value.  (Note that when the program is loaded, the JVM verifier will check that 3 integers really are on the stack.)  The JVM now knows what code to execute and where to find the parameters.  The JVM creates an environment for add; it removes the three integers from the top of the stack and places them in the first 3 local variables of add.  From the caller's point of view, the parameters have been popped.  The JVM now executes the function add, which will return an integer (this is discussed below).  Upon completion, control is passed back to the caller with the return value on the caller's stack.  stackManip pops that value into v:
      49 istore 4
       
    There are many call instructions; the one that we use here is for static methods; there are other instructions for calling methods of an object.
     
  7. Let's see how the method public static int add(int first, int second, int third) is implemented.  The first three local variables, 0 through 2, will contain the values of the parameters when add begins execution.  For the code above, local variable 0 will contain x, 1 will contain y, and 2 will contain -1.  add may use more local storage for intermediate results; we see that sum is local variable 3.

    At this point you can do the calculation:
       0 iload_0   ; first
       1 iload_1   ; second
       2 iadd      ; stack now has first + second
       3 iload_2   ; third
       4 iadd      ; stack has first + second + third
       5 istore_3  ; sum = first + second + third (stack is empty)

    To return a value we must put it on the stack (this is add's stack;  add knows nothing about the caller).  After it does this, it issues the return integer instruction.  So we have
       6 iload_3  ; sum
       7 ireturn

    The method is complete;  the JVM takes the integer value on add's stack and copies it over to the caller's stack.  add's environment can now be recycled.
     

  8. Note that stackManip() has no parameters so there is no local storage for that.  Since it does not return a value, it used the instruction return.  How many return instructions are there and why?

  10. At the beginning of the assembler code we see that the class file has a summary of the methods:

        public static void stackManip();
            /* Stack=5, Locals=6, Args_size=0 */
        public static int add(int, int, int);
            /* Stack=2, Locals=4, Args_size=3 */

    We and the JVM see the method names, parameters, and return type; but we also see how big the stack can get and how many local variables there are.  This is for verification and run time checking;  if an instruction pushes too much onto the stack or tries to access a local variable beyond the number stated, the program will complain!
     

  12. You'll also see code that looks like

       Method Demo1()
       0 aload_0
       1 invokespecial #7 <Method java.lang.Object()>
       4 return
    This code is used for initializing an object.  Let's worry about what it means later when we discuss objects.
     
  14. Exercises -- to be done
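The hand-off of parameters described in this section can be modelled in a few lines of Java.  This is a sketch only: the Frame class, invokeAdd, and callAdd are names invented here, and a real JVM is free to lay out its frames quite differently.  The sizes 4 and 6 are the Locals values from the javap summary above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class FrameSketch {
    /** One stack frame: numbered local variables plus an operand stack. */
    static final class Frame {
        final int[] locals;
        final Deque<Integer> stack = new ArrayDeque<>();
        Frame(int maxLocals) { locals = new int[maxLocals]; }
    }

    /** Models invokestatic for add(int, int, int), seen from the caller's frame. */
    static void invokeAdd(Frame caller) {
        Frame callee = new Frame(4);
        // The last parameter is on top of the caller's stack, so fill locals 2, 1, 0.
        for (int i = 2; i >= 0; i--) {
            callee.locals[i] = caller.stack.pop();
        }
        callee.stack.push(callee.locals[0]);                         // iload_0
        callee.stack.push(callee.locals[1]);                         // iload_1
        callee.stack.push(callee.stack.pop() + callee.stack.pop());  // iadd
        callee.stack.push(callee.locals[2]);                         // iload_2
        callee.stack.push(callee.stack.pop() + callee.stack.pop());  // iadd
        callee.locals[3] = callee.stack.pop();                       // istore_3
        callee.stack.push(callee.locals[3]);                         // iload_3
        caller.stack.push(callee.stack.pop());                       // ireturn
    }

    /** Models v = add(x, y, -1) in the caller. */
    static int callAdd(int x, int y) {
        Frame caller = new Frame(6);
        caller.locals[0] = x;
        caller.locals[1] = y;
        caller.stack.push(caller.locals[0]);    // iload_0
        caller.stack.push(caller.locals[1]);    // iload_1
        caller.stack.push(-1);                  // iconst_m1
        invokeAdd(caller);                      // invokestatic
        caller.locals[4] = caller.stack.pop();  // istore 4
        return caller.locals[4];
    }

    public static void main(String[] args) {
        System.out.println(callAdd(10, 20));    // 29
    }
}
```

Notice that the callee never touches the caller's local variables; the only things that cross between the two frames are the parameters going in and the return value coming out.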

Jasmin programs
 

  1. By now you have been looking at what the compiler does to your code.  Let's take a small break and show you how you can write your own bytecodes.  This will show you how smart and stupid the compiler is.  It will also show you how you  have to worry about details.

  3. We'll use the Jasmin assembler.  The format of Jasmin source code is similar to Java, with some additions so that there is enough information for the creation of the class file.  Make sure you have first set up the environment.

  5. Look at  SomeFunctions.j which defines the class file SomeFunctions.  This class contains static methods such as int distSq(int x, int y), int distSq(int x, int y, int z), int dist(int x, int y), int millionX(int x).
    1. The semicolon ; begins a comment and is similar to // in Java.
    2. We declare the class by

          .class public SomeFunctions
          .super java/lang/Object
      This is the same as Java's public class SomeFunctions extends Object.  Note the path to the class Object.

      At this point we don't have to worry about class and object attributes.
       

    4. When an object is created, it is initialized.  If the class contains no constructors, an implicit one is defined which contains a call to super().  This is echoed by the code

          .method public <init>()V
             .limit stack 1
             aload_0
             invokespecial java/lang/Object/<init>()V
             return
          .end method
      Let's defer  the details until I explain objects.  (We really don't need this, since we don't create any objects.  But it doesn't hurt to copy it in.  So we do.)
       
    6. We can now start to write our methods.  We tell the assembler that we are doing this by beginning with the assembler directive .method  methodname(parms)returnValue.  For example, to define int distSq(int x, int y), the directive looks like .method public static distSq(II)I.  We see that it has two integer parameters and returns an integer.  We use I for integers.  (We can also have B for byte, C for char, D for double, F for float, J for long, S for short, and Z for boolean.)  If a function does not return a value, we end the declaration with a V.  This directive will create a description of this method in the constant pool.
    7. Next we have to tell the class file how much local storage and stack space the method will need by

              .limit locals howManyLocals
              .limit stack  howBigTheStackWillGet
      You will have to calculate this yourself; if you miscalculate, most likely you will get a runtime error.  (Jasmin has a default size of 1 for each limit, but it's wiser to be explicit.)  After writing the code, I calculated that distSq should have
              .limit locals 2
              .limit stack  3
       
    9. You can now write the code for the function.  For readability, I recommend
    10. End the method with an .end method directive.
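If you are unsure what the (parms)returnValue part of a .method directive should look like, you can let Java spell it out.  The sketch below assumes a Java version that has java.lang.invoke.MethodType (Java 7 or later, so much newer than this tutorial); the class Descriptors and its method descriptor are names made up for the example.

```java
import java.lang.invoke.MethodType;

final class Descriptors {
    /** Returns the JVM descriptor, such as (II)I, for the given return and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor(int.class, int.class, int.class)); // (II)I  for int distSq(int, int)
        System.out.println(descriptor(void.class, int.class));           // (I)V   for a void method taking an int
        System.out.println(descriptor(long.class, boolean.class));       // (Z)J   J is long, Z is boolean
    }
}
```

The string that comes back is exactly what follows the method name in the directive, for instance .method public static distSq(II)I.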

  6. Loading large constants.  Just do ldc A_LARGE_CONSTANT.  This tells the assembler to do two things:  (1) create a constant in the pool that contains this number; (2) generate the ldc instruction with the index to this constant.  The assembler really does all the dirty work for you.  In the method millionX(int x), which returns 1000000*x, I ldc 1000000 to push 1000000 onto the stack.
  7. Calling a static method.  Use invokestatic path/Class/method(args)return_value.  In int distSq(int x, int y, int z), there is a call to distSq(x, y).  After pushing x and y onto the stack, we  invokestatic SomeFunctions/distSq(II)I.  Note that the assembler needs the name of the class for the method; it's not smart like the Java compiler.

  8. Most likely, you'll also need to include a path to the method.  The method int dist(int x, int y) calls abs(n).  This function is defined in the class java.lang.Math.  Thus, we invokestatic java/lang/Math/abs(I)I.  The syntax of Jasmin requires that we replace the periods of Java by slashes.

    After the name of the method, we must give the types of the parameters between the parentheses, followed by the return type.  This is the same as we did when we defined a method.

    Can you guess how this instruction is generated?  Remember, that it is the assembler that is creating those constants to name the function, the parameters, and the return type.
     

  9. Assembling and testing the program.  After you have set up the environment, you assemble the code into a class file with the command jasmin ClassName.j.  For the example, I did

        D:\JAVA\JASMIN-->jasmin SomeFunctions.j
        Generated: SomeFunctions.class

    The first time, most likely you'll get cryptic error messages.  Look closely at the line causing the error.  Did you misspell something?  Did you leave out a semicolon or a period?

    I recommend that you javap -c -verbose ClassName so that you are convinced that you have correctly created a class file.

    You'll notice that my example has no main() method.  Instead I write a driver program which will call the static methods and then display the results.  (When we get to objects, you'll be able to do i/o easily in the assembler code.)  For example, I can
        System.out.println(SomeFunctions.distSq(3,4,10));
    Remember that you have to explicitly name the class that contains the static function.

    Compile the driver program and execute its class.  If all goes well, you'll get what you are looking for.  But, at first, you may get exceptions.  Reasons can be

    1. Your limits for the stack and local variables are not enough.  You may get something like Exception in thread "main" java.lang.VerifyError: (class: SomeFunctions, method: distSq signature: (II)I) Stack size too large or Exception in thread "main" java.lang.ClassFormatError: SomeFunctions (Arguments can't fit into locals) (Do you see that the first is a verify error?  Your program was checked for how much stack space it needed and was found lacking.  In the second, the JVM realized when it tried to call your method that it did not have enough local variables.)
    2. You have a typo in the name of a class or a method that you call.  Remember that the assembler is not smart; it does not check that methods exist in a class; it just assembles what it sees.  For example, in int distSq(int x, int y, int z), there is a call to distSq(x, y).  If I misspell the call as invokestatic SomeFunctions/disstSq(II)I, the code will assemble.  But when the Java program is run, you'll see Exception in thread "main" java.lang.NoSuchMethodError at SomeFunctions.distSq(SomeFunctions.j).  Unfortunately, the JVM does not tell you what method it was looking for.  Good hunting.
    3. You just get the wrong answer.  This is where, in real architectures, you use a debugger so that you can execute one machine instruction at a time and examine the results.  Although the SDK does include a debugger, it  is targeted at Java code, not assembler code. It won't help you much.

    4.  You need to walk through the calculation and examine the results.  To help you, I have included in SomeFunctions.j the method void show(int x) which will print out the value of x.  How can this be useful?  As you go through a calculation you can dup the top of the stack and then invokestatic SomeFunctions/show(I)V.  Do this in every step to verify that what you think is on the top of the stack is so.  I do this in int distSq(int x, int y, int z).  When you are satisfied, comment out the code.
  11. Exercises -- to be done
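Working out the number for .limit stack by hand is easy to get wrong, so here is a sketch of the bookkeeping in Java.  It only understands straight-line code (no branches) and only the handful of instructions listed in the switch; the class StackLimit and its methods are invented for this illustration and are not part of Jasmin or the SDK.

```java
final class StackLimit {
    /** Returns {pops, pushes} for the few instructions modelled here. */
    static int[] popsAndPushes(String instruction) {
        // "iload_0", "istore 4", and "bipush 6" all reduce to their base name.
        String base = instruction.trim().split("[_ ]")[0];
        switch (base) {
            case "iconst": case "bipush": case "sipush": case "ldc": case "iload":
                return new int[] {0, 1};
            case "dup":
                return new int[] {1, 2};
            case "istore": case "pop": case "ireturn":
                return new int[] {1, 0};
            case "iadd": case "isub": case "imul": case "idiv": case "irem":
                return new int[] {2, 1};
            case "ineg":
                return new int[] {1, 1};
            case "iinc": case "return":
                return new int[] {0, 0};
            default:
                throw new IllegalArgumentException("not modelled: " + instruction);
        }
    }

    /** Largest number of values on the stack while running the instructions in order. */
    static int maxDepth(String... instructions) {
        int depth = 0;
        int max = 0;
        for (String instruction : instructions) {
            int[] effect = popsAndPushes(instruction);
            if (depth < effect[0]) {
                throw new IllegalStateException("stack underflow at " + instruction);
            }
            depth = depth - effect[0] + effect[1];
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        // The body of add from the previous section.
        System.out.println(maxDepth("iload_0", "iload_1", "iadd", "iload_2",
                                    "iadd", "istore_3", "iload_3", "ireturn"));   // 2
    }
}
```

The answer 2 for the body of add agrees with the Stack=2 that javap reported for that method.  The underflow check is a very small cousin of what the verifier does when it refuses an iadd with only one integer on the stack.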

Flow of Control

  1. Now that you can assemble simple statements, let's put some logic into our programs so we can branch around them.  Java has a separate type boolean; a boolean variable can have a value of either true or false.  In Java, the integer 0 is not the same as false.  But in the JVM, booleans are represented by integers, with 0 and 1 corresponding to false and true, respectively.  Looking at ifManip() in Demo2.java and its disassembly, we see that

        boolean  flag = false;
                 flag = true;
    compiles into
        0 iconst_0    ; false
        1 istore_3
        2 iconst_1    ; true
        3 istore_3
     
  3. How does the Java compiler implement an if statement such as

        if (x < 2) x = 0;
    1. Push the two integers that you want to compare onto the stack:

        10 iload_0    ; x
        11 iconst_2   ; 2 x
    3. Compare them with one of the if_icmpXX branching instructions, where XX can be eq, ne, lt, gt, le, or ge:

        12 if_icmpge 17
      The JVM pops the numbers off the stack and compares them.  If x is greater than or equal to 2, then execution will continue at the instruction at location 17; otherwise execution continues at the next instruction, at location 15.

      Why is the test for "greater than or equal" rather than "less than"?  The compiler wants to preserve the program structure so that the code for x = 0; immediately follows the test.  Thus, we want to branch around the code when x >= 2.

      The coding for if (x < 2) x = 0; then looks like:

        10 iload_0        ; x
        11 iconst_2       ; 2 x
        12 if_icmpge 17   ;      if x ge 2, continue execution at 17
        15 iconst_0       ; 0
        16 istore_0       ;      x = 0
        17 ...
       

    5. If you look at the hexadecimal machine instruction for if_icmpge 17, you'll see a20005.  We know that a2 is the opcode for if_icmpge; but why 0005 instead of 0011 (which is 17 in hexadecimal)?  Recall that to access a constant, the instruction contained the index of the constant in the constant pool.  A branch instruction, however, does not contain the location of its target.  (If it did, this would be called "direct addressing".)

      Then what does it contain?  At what location is if_icmpge 17?  How far is that from 17?  0005 is the offset to the target instruction.

      Many CPUs have a special register called the PC (Program Counter), which contains the address of the currently executing instruction.  (In other CPUs, the PC contains the address of the next instruction to execute.)  The JVM's PC contains the location of the current instruction.  Program execution can be thought of as

        loop forever {
        1. fetch the instruction at the location in the PC
        2. decode the instruction
        3. execute the instruction
        4. increment the PC by the length of the instruction or by the offset if it is a branching instruction and the test is satisfied
        }
      Thus, after the if_icmpge at 12 is executed, if x >= 2 is true, the PC will contain 12 + 5 = 17.  The instruction at 17 will be executed next.  If the test is false the PC will contain 12 + 3 = 15, which is the next sequential instruction.

      Summarizing,      offset = (address of the instruction to execute, called the target address) - PC.
      Or                target address = PC + offset.

      The offset is a 16 bit integer.  What is the largest positive offset?
      What does a negative offset mean?  What is the largest negative offset?  What implications does this have for the size of a method?
      What happens if a branching instruction contained an offset of 0?
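The fetch, decode, execute loop and the rule target address = PC + offset can be tried out with a toy interpreter.  This is a sketch under several assumptions: the class TinyJvm is invented for the example, it knows only the six opcodes needed for if (x < 2) x = 0;, a return instruction is added at the end so that the loop can stop, and the code is loaded at location 0 rather than 10 (so the branch sits at 2 and its target is 2 + 5 = 7).

```java
final class TinyJvm {
    /** Runs the code for "if (x < 2) x = 0;" and returns the final value of local variable 0. */
    static int run(int x) {
        int[] code = {
            0x1a,              // 0 iload_0
            0x05,              // 1 iconst_2
            0xa2, 0x00, 0x05,  // 2 if_icmpge with offset 0005
            0x03,              // 5 iconst_0
            0x3b,              // 6 istore_0
            0xb1               // 7 return
        };
        int[] locals = {x};
        int[] stack = new int[2];
        int sp = 0;            // number of values currently on the stack
        int pc = 0;            // location of the current instruction
        while (true) {
            int opcode = code[pc];                     // fetch
            switch (opcode) {                          // decode and execute
                case 0x1a: stack[sp++] = locals[0]; pc += 1; break;
                case 0x05: stack[sp++] = 2;         pc += 1; break;
                case 0x03: stack[sp++] = 0;         pc += 1; break;
                case 0x3b: locals[0] = stack[--sp]; pc += 1; break;
                case 0xa2: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    // The two operand bytes form a signed 16-bit offset, high byte first.
                    short offset = (short) ((code[pc + 1] << 8) | code[pc + 2]);
                    // Taken: PC + offset.  Not taken: PC + length of the instruction.
                    pc += (left >= right) ? offset : 3;
                    break;
                }
                case 0xb1:
                    return locals[0];
                default:
                    throw new IllegalStateException("unknown opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(1));   // 0
        System.out.println(run(5));   // 5
    }
}
```

Because the offset is read as a signed 16-bit value, the same arithmetic handles backward branches.  Try changing the offset bytes to 0x00, 0x00 to see the answer to the last question above.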
       
       

  1. It seems that the above has avoided any discussion of booleans.  Let's see how the compiler implements

        flag = (x < y);

      17 iload_0        ; x
      18 iload_1        ; y x
      19 if_icmpge 26   ; if x ge y continue at 26
      22 iconst_1       ; else push true
      23 goto 27        ; continue at 27 -- this is the famous "go to"
      26 iconst_0       ; push false if x ge y.
                        ; In either case, the stack contains a boolean
      27 istore_3       ; flag = (x < y)

    We need the infamous goto instruction to skip around the code that pushes false onto the stack.  Remember that the operand of the goto instruction will contain the offset 4 (27 - 23).  You'll see that most branches go downward.  The exception is a branch that returns to the beginning of a loop.
     
     

  3. Note that there are two paths of execution to the instruction at 27.  In either case the stack contains an  integer (really, a boolean) at instruction 27.  When the program is loaded,  the verifier will check to see that, for every path to a particular instruction, the stack and local variables are equivalent.   That is, if the stack contains 3 integers along one path to an instruction, it must also contain 3 integers along the other path.

       Method void dumb()
       0 iconst_0
       1 iconst_1
       2 if_icmpge 6
       5 iconst_1
       6 return
    In this example, if the branch is taken, the stack will be empty at location 6.  If it isn't, the stack will contain the constant.  When the class is loaded, the message
    Exception in thread "main" java.lang.VerifyError: (class: Boom, method: dumb signature: ()V) Inconsistent stack height 1 != 0 is displayed.
     
  5. Let's look at the bytecodes for

        if (!flag)
           y++;
        else
           z--;

      28 iload_3      ; flag
      29 ifne 38      ; if (flag) continue at 38
      32 iinc 1 1     ; y++
      35 goto 41
      38 iinc 2 -1    ; z--
      41 ...
    Note that the instruction ifne needs only one operand on the stack; the other operand is always 0.  In this case, if the data on the stack is not 0 (i.e., true), then we continue execution at 38, where we decrement z.  There are 6 ifXX instructions that compare the integer on the top of the stack with 0.

    Look at the other examples.
     

  7. Coding in Jasmin.  Let's look at the sgn function, which computes the sign of an integer, in SomeFunctions.j.  Note that a label (or symbolic address) follows ifle rather than a number.  A label defines a location for an instruction.  We specify a label with a name followed by a colon, and nothing else.  The assembler keeps track of labels; for example, pos corresponds with location 2, and zero corresponds with 10.  When the ifle is assembled, the assembler calculates the offset to be 10 - 2 = 8.  If you didn't have labels you would have to calculate the offsets yourself.  That can be very painful, especially if you add or delete lines.

    ;   public static int sgn(int x);
    ;   returns 1, if x > 0
    ;           0, if x = 0
    ;          -1, if x < 0
    .method public static sgn(I)I
           .limit locals 1
           .limit stack  3

           ; Parameters:
           ;    0 - x

           iload_0   ; x
           dup       ; x x  keep x around for 2nd test
    ;  Labels must be on separate lines
    pos:
           ifle  zero ; x    is x > 0?
           pop        ;      yes. remove x
           iconst_1   ; 1
           goto  endif
    zero:
           ifne  neg  ;      no.  is x = 0?
           iconst_0   ; 0        yes
           goto  endif
    neg:
           iconst_m1  ; -1       no
    endif:
    ;  All paths here will have an integer on the stack
           ireturn
    .end method

    Note that the structure of the code closely follows the definition of the function; destroying structure destroys readability.  Comments describe why an instruction is coded; they should not merely echo code.  The comment at endif acknowledges that you are aware that all paths must lead to an equivalent structure of the stack (and perhaps the local variables).
     

  9. Coding of Loops -- to be done.
  10. Exercises -- to be done.
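For comparison with the Jasmin listing of sgn earlier in this section, here is how the same function might look in Java.  It is a sketch written for this comparison (the class name Sgn is made up), not the source that the Jasmin code was derived from.  The comments point at the instruction in the listing that plays the same role.

```java
final class Sgn {
    /** Returns 1 if x > 0, 0 if x == 0, and -1 if x < 0. */
    static int sgn(int x) {
        int result;
        if (x > 0) {             // ifle zero: the branch tests the opposite condition
            result = 1;          // iconst_1
        } else if (x == 0) {     // ifne neg
            result = 0;          // iconst_0
        } else {
            result = -1;         // iconst_m1
        }
        return result;           // ireturn: every path arrives with one integer
    }

    public static void main(String[] args) {
        System.out.println(sgn(5) + " " + sgn(0) + " " + sgn(-7));   // 1 0 -1
    }
}
```

As with if (x < 2) x = 0;, each Java test turns into a branch on the opposite condition, so that the code for the "yes" case can immediately follow the test.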

Arrays
  1. Up to this point, the JVM architecture shown is pretty standard.  You could think of the local variables as registers inside the CPU or as local memory.  The invokestatic instruction is similar to a typical call instruction, but with a reference to the name of the function as the operand instead of the address of the function (we use the name itself in assembler code).  Most architectures also have some stack structure for manipulating or saving data.  The passing of parameters on the stack is quite common in other architectures.  Most subroutines would copy the parameters to registers or local storage.  The comparison of data followed by a branch is also quite common.
  3. Java arrays are objects; they are not just consecutive storage locations in memory.  The JVM implements objects in a heap, which is memory that the JVM, not the programmer, manages.  When the JVM creates an object with new, a reference to that object (you could think of it as an address) in the heap is returned.  References are a datatype in the JVM; as such, they are treated quite differently from integers.  You'll see this easily, since there are separate instructions for arrays and objects.  There is not much you can do with references; you can access the data associated with the object; you can push/pop them; you can compare references with other references or with null.  That's about it.  Note that you can't do arithmetic on them.  (Well, you could try.  But the verifier won't let you!)  This is one of the reasons that Java is a "safe" language; it's very much unlike C++, where you can do all kinds of bad things with pointers.
  5. While the JVM has special instructions to access arrays, the concepts involved are similar to other architectures.  In traditional architectures, a register would contain a reference (address) to the array.  Another register would contain an offset to the value in the array.  A load instruction would specify both registers as well as the receiving register, as in ld [r2,r3],r4.  The element to be retrieved is at the address which is the sum of the reference and the offset.  In the JVM, we push a reference to the array onto the stack, and then the index of the element.  The iaload instruction pops off the index and the reference, fetches the integer from the array, and puts it on the stack.
  6. In Demo3.java and its disassembly, we see how this is done.  To create an integer array with 3 integers, we push 3 onto the stack and then newarray int.  The second byte of the instruction describes the type of array to be created; you can create all the usual types of numeric arrays.  The JVM pops the 3 and creates an integer array in the heap with three elements initialized to 0.  A reference to the array is pushed onto the stack.  Then we astore_0 it.  Note that this instruction expects a reference to an object on the stack.

      ; int x[];
      ;  x = new int[3];
       0 iconst_3     ; 3
       1 newarray int ; x
       3 astore_0     ; stack now empty
     
  8. To save an integer in the array, we push the reference to the array by the aload_0 instruction, the index of the array, then the value we want stored.  We can then iastore it:

      ;  x[2] = 4;
       4 aload_0     ; x
       5 iconst_2    ; 2 x
       6 iconst_4    ; 4 2 x
       7 iastore     ;
     
  10. Accessing an element is similar.  The instruction iaload needs the reference and the index on the stack.  But notice how clever the compiler is:

      ; x[0] = x[2];
       8 aload_0     ; x
       9 iconst_0    ; 0  x  for x[0]
      10 aload_0     ; x  0  x
      11 iconst_2    ; 2  x  0 x  for x[2]
      12 iaload      ; x[2] 0 x
      13 iastore     ;
     
  12. To obtain x.length, there is the special instruction arraylength:

      ; len = x.length;
      14 aload_0     ; x
      15 arraylength ; x.length
      16 istore_2
     
  14. Recall that the statement y = x; means that y references the same array as x.  Assignment of object references uses the aload and astore instructions:

      17 aload_0     ; x
      18 astore_1    ;  y = x
     
  16. When testing for equality of objects, we must use an instruction such as if_acmpne, which compares references.  Note that null has its own constant:

         if (y == x)
            y = null;
            y = null;
      19 aload_1         ; y
      20 aload_0         ; x y
      21 if_acmpne 26    ;    same object?
      24 aconst_null     ; null    This is not 0!
      25 astore_1        ;    yes.  Then y = null
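Putting the Java statements from the comments above back together gives something like the following.  This is a reconstruction for illustration only, not the actual Demo3.java; the class name ArrayDemo and the method run are invented, and the bytecode comments simply repeat the instructions discussed in this section.

```java
final class ArrayDemo {
    /** Runs the array statements and returns {x[0], x[2], len, 1 if y ended up null else 0}. */
    static int[] run() {
        int[] x;
        int[] y;
        int len;
        x = new int[3];     // iconst_3, newarray int, astore_0
        x[2] = 4;           // aload_0, iconst_2, iconst_4, iastore
        x[0] = x[2];        // aload_0, iconst_0, aload_0, iconst_2, iaload, iastore
        len = x.length;     // aload_0, arraylength, istore_2
        y = x;              // aload_0, astore_1: y and x now name the same array
        if (y == x) {       // if_acmpne branches around the body when the references differ
            y = null;       // aconst_null, astore_1
        }
        return new int[] {x[0], x[2], len, (y == null) ? 1 : 0};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));   // [4, 4, 3, 1]
    }
}
```

Setting y to null does not affect x; only the reference held in y changes, and the array in the heap stays where it is.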
  1. Coding in Jasmin.  You can model what the compiler does or you can roll your own.  There is no additional syntax that you have to worry about.  In MoreFunctions.j, there are several examples.  You should compare the code with the code generated by TestFunctions.java.  The execution timings comparing the implementations are quite interesting.
  2. Exercises -- to be done.


Odds and Ends
 

  1. Cute instructions designed especially for Java:
  2. Other numeric types; their arithmetic
  3. Working with objects
  4. Implementing
       


© 2000 Carl. E. Bredlau

bredlauc@mail.montclair.edu