The Machine Language of the JVM
I have moved on, but I do have notes (not in tutorial form) on objects,
exceptions, etc. Take a look at ALACA Notes, Parts 1 - 4. (August, 2004)
Welcome to a small tutorial on the machine language of the Java
Virtual
Machine (JVM).
It's not deep. It's not complicated. It's what you would
want to know if you knew what you wanted.
Its purpose is to show how the machine instructions support the
Java Language.
It's important to do the exercises, since concepts are exercised
and expanded there.
(Last revised September, 2000).
Contents
How to look at assembly code
Simple machine
architecture
(at first glance).
Arithmetic expressions
Putting
integer constants onto the stack
Calculations
Calling methods and passing
parameters
Jasmin programs
Flow of Control
Arrays
Exercises
Resources
Jasmin
JVM
instruction reference
How to look at assembly code
- If you know how to write a Java program, then you are halfway
there.
Compile the program into a class file. For example, I
compiled Demo1.java
on Pegasus by % javac Demo1.java. If all goes well, an
object
file Demo1.class
is created. I can then execute this program by just naming the
class
file % java Demo1.
- For the rest of you, don't worry. If you can read C++
and
are familiar
with the concepts of classes and objects of a class, you can read a lot
of Java, or at least enough Java for this tutorial. While looking
at Demo1.java
, do note:
- Java programs reside in a class. The name of the class is
the
prefix
of the filename; the suffix is (almost) always .java.
- A simple class will usually contain a main
function
(also called
a method). This is the function that the JVM will run
first. Static
means that the method belongs to the class rather than to an object of
the class. It's the same as C++'s main() method.
Don't
worry about String[] args; that's for passing in parameters
from
the command line.
- This program consists of class methods that don't do much
other
than demonstrate
some feature of the machine code. The only object mentioned is System.out
which is used for writing to the console.
- A Java program is compiled into an object file containing the
machine instructions, or bytecodes, along with other data. The file
name is the name of the class (not necessarily the name of the source
file) and the suffix .class.
- To compile a Java program on Pegasus, you need to set
up an environment.
- Don't try too hard to view the class file directly; it's binary
and has a pretty complicated format. Instead, we'll use Java's
disassembler javap.
In this example, I viewed the class file by %
javap -c -verbose Demo1. The -c option shows the bytecodes for the
instructions and the -verbose option shows more information about the
methods. The listing will show assembler instructions for the
bytecodes of the methods.
- Look at the exercises.
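If you don't have Demo1.java at hand, here is a minimal stand-in to practice on. It is only a sketch pieced together from the statements quoted later in this tutorial; the class name Demo1Sketch, the order of the statements, and the println at the end are my assumptions, not the author's file.

```java
// Hypothetical stand-in for Demo1.java, pieced together from the statements
// quoted in this tutorial. The author's real file may differ.
class Demo1Sketch {
    public static void stackManip() {
        int x, y, z, u, v;      // javac typically numbers these locals 0..4
        x = 2;
        y = 6;
        z = 200;
        u = 2000000;
        u = x + u;
        v = 3 * (x / y - 2 / (u + y));
        x++;
        y <<= 3;
        v = add(x, y, -1);
        System.out.println(v + " " + z);
    }

    public static int add(int first, int second, int third) {
        int sum = first + second + third;
        return sum;
    }

    public static void main(String[] args) {
        stackManip();
    }
}
```

Save it as Demo1Sketch.java, compile it with javac Demo1Sketch.java, and view it with javap -c -verbose Demo1Sketch. The instruction locations and constant pool indices you see will not necessarily match the listings in this tutorial.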
Simple machine
architecture
- For each invoked method, the JVM creates an isolated environment
in
memory
called a stack frame. It consists of
- Local variables. They are numbered 0, 1, 2, ... .
(For
methods
of an object, local variable 0 contains a reference to the object
itself
(this). Right now we are implementing class methods, so
we don't have to worry about this.) Local variables usually contain an
integer or a reference to an object. We access a local variable
by
its number.
- A stack for manipulating data. All (well, almost all) arithmetic
is performed on the stack. When the method begins, the stack is
empty.
When the method ends, the stack had better be empty. For most
statements, the compiler will generate code that starts and ends with
an empty stack.
Unlike most machine architectures, there are no registers to
run
out of.
Java compilation creates a lot of constants such as large
numbers,
strings, and names of classes and methods. Constants are stored in a constant
pool that is accessible to all the methods of the class.
Constants are stored in an array-like structure; the JVM
accesses
a constant by naming the index (starting from 1) for the
constant.
It's direct addressing for constants!
There are many types of constants; and each one has its own data
structure.
It can be quite complicated. For example, a constant for a method
contains fields that contain indices to other constants. One of
them
names the class of the method; the other names the method and the type
of parameters.
The good news is that the compiler (and Java assembler) knows how
to
create the constant pool. All you have to know is that constants
exist in a pool and that they are accessible by naming the index of the
constant.
- Machine instructions can refer to (1) immediate data (part of the
instruction)
(2) local variables, (3) the stack, (4) the constant pool,
(5) a branch address.
Most instructions consist of a one byte operation code (opcode)
followed by a byte containing an operand. The operand can be an
integer
constant, an index to local storage, or an index to the constant
pool. Most of the time this is just fine; most constants are
small
and
methods usually need just a few local variables. In the rare case
that a larger constant is needed or that there are many, many local
variables
or many, many constants in the constant pool, the instruction will
contain
a two byte index or constant.
Why just use one byte for the operand when two bytes work just as
well?
To keep the program small, an important consideration since most
programs
are transmitted over the Internet.
To make instructions even smaller some instructions imply a
constant.
You'll see this below.
- Exercises
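To make the stack frame described above concrete, here is a toy model in Java: numbered local variables plus a stack for manipulating data. The class FrameSketch and its method names are invented for illustration; a real JVM frame is far more elaborate.

```java
// A toy model of a stack frame. All names here are hypothetical.
final class FrameSketch {
    private final int[] locals;   // local variables 0, 1, 2, ...
    private final int[] stack;    // the stack for manipulating data
    private int top = 0;          // number of values currently on the stack

    FrameSketch(int maxLocals, int maxStack) {
        locals = new int[maxLocals];
        stack = new int[maxStack];
    }

    void push(int value) { stack[top++] = value; }   // like iconst_2
    int pop()            { return stack[--top]; }
    void load(int n)     { push(locals[n]); }        // like iload_0
    void store(int n)    { locals[n] = pop(); }      // like istore_0
    int local(int n)     { return locals[n]; }
    int depth()          { return top; }

    public static void main(String[] args) {
        FrameSketch frame = new FrameSketch(2, 1);
        frame.push(2);     // x = 2;
        frame.store(0);
        frame.load(0);     // y = x;
        frame.store(1);
        System.out.println(frame.local(1) + " " + frame.depth());   // prints "2 0"
    }
}
```

Note that each statement starts and ends with an empty stack, and that pushing more than maxStack values makes the toy blow up with an array index error.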
Arithmetic expressions
Putting Integer Constants
onto
the stack
- Arithmetic is performed on the stack. Data for the
calculations
can
come from (1) the instruction (immediate addressing or even implied by
the instruction), (2) local variables (direct addressing), and (3) the
constant pool (direct addressing). The JVM has many, many
instructions
for loading data onto the stack. It's best to see this in context.
- There are many types of arithmetic data, such as
integers,
floats,
doubles, and longs. To keep this tutorial short, we'll just consider
integers.
Integers are 32 bits and are stored big-endian in memory.
Let's start with assigning constants to local variables. Look
at the method stackManip() in Demo1.java
and, at the same time, the assembler
code. The first statement is x = 2; x
is assigned as local variable 0.
- The instruction iconst_2 at location 0 pushes a
constant 2 onto
the stack.
How long is the instruction? (This is a tutorial, so try to
answer
the question now.)
The constant is implied by the instruction!
There are separate machine instructions for pushing 0 through 5 and
-1 onto the stack.
- Visit the Jasmin
site to see a description of the iconst_<n>
machine
instructions.
- We save the data at the top of the stack (pop the stack) into
local
variable
0 by the instruction istore_0 at location 1.
How long is the instruction?
What are the other istore_<n> instructions?
- Let's see how y = 6; is implemented:
bipush 6
istore_1
How long is the instruction?
The second byte contains the constant 6. Find the description
of this instruction. What are the largest and smallest integers it
can push?
Do you see how y is the second local variable?
- Onto bigger constants: To implement z = 200; we now
need to push a 2 byte constant. The instruction sipush 200
does just that -- s(hort) i(nteger) push.
How long is the instruction?
How big can the constants be?
- Even bigger constants: To implement u = 2000000; the
designers
could have used a 5 byte instruction. They thought better.
Instead, put large constants into the constant pool rather than within
the instruction. Thus we see: ldc #1 <Integer 2000000>
which copies (loads the constant) 2000000 found at the first index of
the
constant pool onto the stack.
How long is the instruction?
What other constants can be loaded from the constant pool?
- Exercises
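Once you have tried the questions above, the selection rule can be sketched in Java. This is my own summary of the ranges in the JVM instruction set, not code taken from any compiler; the class and method names are hypothetical.

```java
// Sketch: which instruction pushes a given int constant onto the stack?
final class PushChooser {
    static String pushInstruction(int value) {
        if (value >= -1 && value <= 5) {
            // the constant is implied by the instruction itself
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;    // constant held in one signed byte
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;    // constant held in two signed bytes
        }
        return "ldc " + value;           // constant lives in the constant pool
    }

    public static void main(String[] args) {
        for (int value : new int[] { 2, 6, 200, 2000000 }) {
            System.out.println(pushInstruction(value));
        }
    }
}
```

For the four assignments in stackManip() this prints iconst_2, bipush 6, sipush 200 and ldc 2000000, matching the listing. (javap shows the constant pool index after ldc; the sketch shows the value itself.)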
Calculations
- The strategy is to (1) push the operands on the stack, (2)
perform the
operation on the two operands -- the operation removes the operands
from
the stack and puts the result on the top of the stack, (3) and then
finally
pop the stack with the answer. Let's start with u
= x + u; The compiler generated
12 iload_0
13 iload_3
14 iadd
15 istore_3
First (1) x is pushed, then u. (2) Now we perform the
addition.
What is on the top of the stack? x + u. How long is the
instruction?
(3) We now save it.
- What arithmetic operations can you do on the
integers? The
usual
ones: iadd isub ineg imul idiv irem (taking the remainder)
- You should be able to read a Java expression from left to right
and perform the calculation on the stack. For v = 3*(x/y - 2/(u+y))
we get the compiled code (I added comments starting with the
semicolon):
16 iconst_3
17 iload_0 ; x
18 iload_1 ; y
19 idiv
20 iconst_2
21 iload_3 ;u
22 iload_1 ;y
23 iadd
24 idiv
25 isub
26 imul
27 istore 4 ; v
What gets pushed first?
We don't have the second operand for the multiplication. So let's
keep pushing. What two variables get pushed next?
We can now divide. How long is the instruction? What is now on
the stack after the division? top -> x/y, 3.
We still can't multiply. And we can't yet subtract. What
is now pushed?
We still can't multiply, subtract, or divide. What variables
are now pushed? What does the stack now look like?
We now can add. What does the stack now look like?
We now can divide. What does the stack now look like?
We now can subtract. What does the stack now look like?
We can finally perform the multiplication. The stack just
contains
the value of the expression.
We now can pop the stack. How long is the istore
instruction?
What does 4 represent? How many variables can we save into using istore?
- For every rule there is an exception. How would you
code x++;
using a stack? How many instructions does it take? How many bytes
are used?
Since this is a common operation, the JVM designers created iinc
0 1. How long is the instruction? What do the 0 and 1
signify? What is the range of local variables? What is the
range of constants (since this instruction can be used for x--)?
- The JVM also supports the logical instructions ishl
ishr (the
arithmetic kind) iushr (the logical kind; Java uses the
operator >>> for
this) iand ior ixor. Do you see how y
<<=
3; is compiled?
- Verification. The examples you have seen have
been
generated
by the Java compiler; we trust that its code is correct and will run on
any JVM. However, when the program is loaded, no such
trust
is assumed. The program is examined instruction by instruction before
execution to make sure that the code will be able to execute. For
example, the iadd instruction must have two integers on
the stack. A program that has only one integer on the stack will
get a message such as
Exception in thread "main" java.lang.VerifyError: (class: Boom, method: dumb signature: ()V) Unable to pop operand off an empty stack.
The program is then terminated.
Be aware that all code will be examined by the Java
verifier.
This will help you when you write your own assembly code since the
verifier
is checking the structure of your program. It's not as powerful
as
the compiler, but it's much better than watching your program just blow
up.
- Exercises.
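If you want to check your answers for v = 3*(x/y - 2/(u+y)), the sketch below replays the compiled instructions on a Java Deque used as a stack. The class StackCalc and its helper are hypothetical names of mine; only the order of the instructions comes from the listing above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntBinaryOperator;

// Replays the instruction sequence for v = 3*(x/y - 2/(u+y)) on a stack.
final class StackCalc {
    static int evaluate(int x, int y, int u) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(3);                        // iconst_3
        stack.push(x);                        // iload_0
        stack.push(y);                        // iload_1
        binary(stack, (a, b) -> a / b);       // idiv
        stack.push(2);                        // iconst_2
        stack.push(u);                        // iload_3
        stack.push(y);                        // iload_1
        binary(stack, (a, b) -> a + b);       // iadd
        binary(stack, (a, b) -> a / b);       // idiv
        binary(stack, (a, b) -> a - b);       // isub
        binary(stack, (a, b) -> a * b);       // imul
        return stack.pop();                   // istore 4
    }

    // Pops two operands and pushes the result, as every binary operation does.
    private static void binary(Deque<Integer> stack, IntBinaryOperator op) {
        int right = stack.pop();   // the top of the stack is the second operand
        int left = stack.pop();
        stack.push(op.applyAsInt(left, right));
    }

    public static void main(String[] args) {
        System.out.println(evaluate(12, 4, 1));   // prints 9
    }
}
```

Add a System.out.println(stack) after any step to see what the stack looks like at that point.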
Calling methods and
passing
parameters
- To simplify things, we'll just call static methods (they don't
belong
to
any object) with integer parameters. For the expression v =
add(x,y,-1); in Demo1.java,
we see a call to the method add which requires 3
parameters.
As in C, Java passes parameters by value. When add
begins execution, it will be able to access 3 integers; it has no idea
where they came from. add will do its calculation and return
an
integer which will be saved in v.
- So how is it done? stackManip pushes the values
of the
parameters
onto the stack. Thus we have the assembler
code
43 iload_0 ; push x
44 iload_1 ; push y
45 iconst_m1 ; push -1
Note that the last parameter is on the top of the stack and the first
parameter is the furthest down.
- We need a call instruction:
46 invokestatic #8 <Method int add(int, int, int)>
How long is this instruction? The #8 tells us to go to the constant
pool.
What do we find there? The name of the method, the type of
parameters
and the type of the return value. (Note that when the program is
loaded,
the JVM verifier will then check to make sure that 3 integers really
are
on the stack.) The JVM now knows what code to execute and where to find
the parameters. The JVM creates an environment for add;
it removes the three integers on the top of the stack and places them
in
the first 3 local variables of add. From the caller's
point
of view, the parameters have been popped. The JVM now executes the
function add
which will return an integer (this is discussed below). Upon
completion control is passed back to the caller with the return value
on
the caller's stack. stackManip pops that value into v:
49 istore 4
There are many call instructions; the one that we use here is for
static
methods; there are other instructions for calling methods of an object.
- Let's see how the method public static int add(int first, int second,
int third) is implemented. The first three local
variables 0 through 2 will contain the value of the parameters when add
begins
execution. For the code above, local variable 0 will contain x, 1
will contain y, and 2 will contain -1. add may use more
local storage for intermediate results; we see that sum is
local
variable 3.
At this point you can do the calculation:
0 iload_0 ; first
1 iload_1 ; second
2 iadd ; stack now has first + second
3 iload_2 ; third
4 iadd ; stack has first + second + third
5 istore_3 ; sum = first + second + third (stack is empty)
To return a value we must put it on the stack (this is add's
stack; add knows nothing about the caller).
After
it does this, it issues the return integer instruction. So we
have
6 iload_3 ; sum
7 ireturn
The method is complete; the JVM takes the integer value on add's
stack and copies it over to the caller's stack. add's
environment
can now be recycled.
- Note that stackManip() has no parameters so there is no
local
storage for that. Since it does not return a value, it used the
instruction return.
How many return instructions are there and why?
- At the beginning of assembler
code we see that the class file has a summary of the methods:
public static void stackManip();
/* Stack=5, Locals=6, Args_size=0 */
public static int add(int, int, int);
/* Stack=2, Locals=4, Args_size=3 */
We and the JVM see the method names, parameters, and return type;
but
we also see how big the stack can get and how many local variables
there
are. This is for verification and run time checking; if an
instruction pushes too much onto the stack or tries to access a local
variable
beyond the number stated, the program will complain!
- You'll also see code that looks like
Method Demo1()
0 aload_0
1 invokespecial #7 <Method java.lang.Object()>
4 return
This code is used for initializing an object. Let's worry about
what it means later when we discuss objects.
- Exercises -- to be done
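The hand-off between stackManip and add can be imitated in Java. The sketch below is a simplification under my own assumptions (the class CallSketch and the method invokeAdd are invented names): it moves the three arguments from the caller's stack into add's local variables, replays add's instructions on a separate stack, and leaves the result on the caller's stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Imitates "invokestatic add(int, int, int)" followed by add's own code.
final class CallSketch {
    static void invokeAdd(Deque<Integer> callerStack) {
        int[] locals = new int[4];          // Locals=4: first, second, third, sum
        locals[2] = callerStack.pop();      // the last parameter is on top
        locals[1] = callerStack.pop();
        locals[0] = callerStack.pop();

        Deque<Integer> stack = new ArrayDeque<>();   // add's own stack starts empty
        stack.push(locals[0]);                       // iload_0
        stack.push(locals[1]);                       // iload_1
        stack.push(stack.pop() + stack.pop());       // iadd
        stack.push(locals[2]);                       // iload_2
        stack.push(stack.pop() + stack.pop());       // iadd
        locals[3] = stack.pop();                     // istore_3
        stack.push(locals[3]);                       // iload_3
        callerStack.push(stack.pop());               // ireturn
    }

    public static void main(String[] args) {
        Deque<Integer> callerStack = new ArrayDeque<>();
        callerStack.push(2);     // iload_0    ; push x
        callerStack.push(6);     // iload_1    ; push y
        callerStack.push(-1);    // iconst_m1  ; push -1
        invokeAdd(callerStack);
        System.out.println(callerStack.pop());   // prints 7
    }
}
```

From the caller's point of view the three parameters have simply been replaced by the return value.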
Jasmin programs
- By now you have been looking at what the compiler does to your
code.
Let's take a small break and show you how you can write your own
bytecodes.
This will show you how smart and stupid the compiler is. It will
also show you how you have to worry about details.
- We'll use the Jasmin assembler. The format of
jasmin
source
code is similar to Java, with some additions so that there is enough
information
for the creation of the class file. Make sure you have first set
up the environment.
- Look at SomeFunctions.j
which defines the class file SomeFunctions. This class contains
static
methods such as int distSq(int x, int y), int distSq(int
x,
int y, int z), int dist(int x, int y), int
millionX(int
x).
- The semicolon ; begins a comment and is similar to //
in Java.
- We declare the class by
.class public SomeFunctions
.super java/lang/Object
This is the same as Java's public class SomeFunctions extends
Object.
Note the path to the class Object.
At this point we don't have to worry about class and object
attributes.
- When an object is created, it is initialized. If the class
contains no constructors, an implicit one is defined which contains
a call to super().
This is echoed by the code
.method public <init>()V
.limit stack 1
aload_0
invokespecial java/lang/Object/<init>()V
return
.end method
Let's defer the details until I explain objects. (We really
don't need this, since we don't create any objects. But it
doesn't
hurt to copy it in. So we do.)
- We can now start to write our methods. We tell the assembler
that we are doing this by beginning with the assembler directive
.method methodname(parms)returnValue. For example, to define int
distSq(int x, int y), the directive looks like .method public
static distSq(II)I. We see that it has two integer parameters
and returns an integer. We use I for integers.
(We can also have B for byte, C for character, D for double,
F for float, J for long (I have no idea), S for short, and
Z for boolean (?).)
If a function does not return a value we would end the declaration with
a V. This directive will create a description of this method
in the constant pool.
- Next we have to tell the class file how much local storage and
stack
space
the method will need by
.limit locals howManyLocals
.limit stack howBigTheStackWillGet
You will have to calculate this yourself; if you miscalculate, most
likely you will get a runtime error. (Jasmin has a default size of 1
for
each limit, but it's wiser to be explicit.) After writing the
code,
I calculated that distSq should have
.limit locals 2
.limit stack 3
- You can now write the code for the function. For
readability, I recommend
- Clearly document the definition of the function as if you
were writing
it in Java. It's much harder figuring out what is happening if
you
don't.
- Describe the parameters (the first local variables)
with the
names
of the parameters. No one is going to remember that local
variable
0 is x.
- Define meaningful names for your local variables.
- When using local variables, don't change the context.
If a local
variable is used for x, don't decide that it can then be used for
i.
Define another local variable for i. Storage is pretty cheap
these
days.
- When manipulating the stack, describe what's happening.
Your code
should look something like
; Calculate x*x + y*y
;             Stack
iload_0     ; x
dup         ; x x
imul        ; x*x
iload_1     ; y x*x
dup         ; y y x*x
imul        ; y*y x*x
iadd        ; y*y + x*x
ireturn
Most people will read the comments rather than the code!
- End the method with an .end method directive.
- Loading large constants. Just do ldc A_LARGE_CONSTANT. This
tells the assembler to do two things: (1) create a constant in
the pool that contains this number; (2) generate the ldc instruction
with the index to this constant. The assembler really does all the
dirty work for you. In the method millionX(int x), which returns
1000000*x, I ldc 1000000 to push 1000000 onto the stack.
- Calling a static method. Use invokestatic
path/Class/method(args)return_value.
In int distSq(int x, int y, int z), there is a call to distSq(x,
y). After pushing x and y onto the stack, we invokestatic
SomeFunctions/distSq(II)I. Note that the assembler needs the
name of the class for the method; it's not smart like the Java compiler.
Most likely, you'll also need to include a path to the method.
The method int dist(int x, int y) calls abs(n).
This function is defined in the class java.lang.Math. Thus, we invokestatic
java/lang/Math/abs(I)I. The syntax of Jasmin requires that we
replace the periods of Java by slashes.
After the name of the method, we must give the types of the
parameters between the parentheses followed by the return type. This
is the same as we did when we defined a method.
Can you guess how this instruction is generated? Remember,
that
it is the assembler that is creating those constants to name the
function,
the parameters, and the return type.
- Assembling and testing the program. After you have set
up the environment, you assemble the code into a class file by
the command jasmin ClassName.j. For the example, I did
D:\JAVA\JASMIN-->jasmin SomeFunctions.j
Generated: SomeFunctions.class
The first time, most likely you'll get cryptic error
messages.
Look closely at the line causing the error. Did you
misspell
something? Did you leave out a semicolon or a period?
I recommend that you javap -c -verbose ClassName so that
you
are convinced that you have correctly created a class file.
You'll notice that my example has no main()
method. Instead
I write a driver
program which will call the static methods and then display the
results.
(When we get to objects, you'll be able to do i/o easily in the
assembler
code.) For example, I can
System.out.println(SomeFunctions.distSq(3,4,10));
Remember that you have to explicitly name the class that contains
the static function.
Compile the driver program and execute its class. If all
goes
well, you'll get what you are looking for. But, at first, you may
get exceptions. Reasons can be
- Your limits for the stack and local variables are not
enough. You
may get something like Exception in thread "main"
java.lang.VerifyError:
(class: SomeFunctions, method: distSq signature: (II)I) Stack size too
large or Exception in thread "main"
java.lang.ClassFormatError:
SomeFunctions (Arguments can't fit into locals) (Do you see that
the
first is a verify error? Your program was checked for how much
stack
space it needed and was found lacking. In the second, the JVM
realized
when it tried to call your method that it did not have enough local
variables.)
- You have a typo in naming a class or a method that you call.
Remember that the assembler is not smart; it does not check that
methods exist in a class; it just assembles what it sees. For
example, in int distSq(int x, int y, int z), there is a call to
distSq(x, y).
If I misspell the call as invokestatic SomeFunctions/disstSq(II)I,
the code will assemble. But when the Java program is run, you'll
see Exception in thread "main" java.lang.NoSuchMethodError at
SomeFunctions.distSq(SomeFunctions.j).
Unfortunately, the JVM does not tell you what method it was looking
for.
Good hunting.
- You just get the wrong answer. This is where, in real
architectures,
you use a debugger so that you can execute one machine instruction at a
time and examine the results. Although the SDK does include a
debugger,
it is targeted at Java code, not assembler code. It won't help
you
much.
You need to walk through the calculation and examine the
results.
To help you, I have included in SomeFunctions.j
the method void show(int x) which will print out the value of
x. How can this be useful? As you go through a calculation
you can dup the top of the stack and then invokestatic
SomeFunctions/show(I)V.
Do this in every step to verify that what you think is on the top of
the
stack is so. I do this in int distSq(int x, int y, int z).
When you are satisfied, comment out the code.
- Exercises -- to be done
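If you are unsure what to write between the parentheses of a .method or invokestatic line, you can ask Java itself. The sketch below uses the standard java.lang.invoke.MethodType class to print descriptors and then looks up Math.abs by name and type, roughly what invokestatic java/lang/Math/abs(I)I asks the JVM to do. The class DescriptorSketch and its helper method are hypothetical names of mine.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Prints method descriptors in the form Jasmin expects.
final class DescriptorSketch {
    // Note the order: return type first, then the parameter types.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(descriptor(int.class, int.class, int.class));   // (II)I  distSq(int, int)
        System.out.println(descriptor(void.class, int.class));             // (I)V   show(int)
        System.out.println(descriptor(boolean.class, long.class));         // (J)Z   long in, boolean out

        // Find the static method abs(int) in java.lang.Math and call it.
        MethodHandle abs = MethodHandles.lookup().findStatic(
                Math.class, "abs", MethodType.methodType(int.class, int.class));
        int result = (int) abs.invokeExact(-7);
        System.out.println(result);                                         // 7
    }
}
```

If the name or the descriptor is wrong, findStatic throws a NoSuchMethodException, much like the NoSuchMethodError you get from a misspelled invokestatic.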
Flow of Control
- Now that you can assemble simple statements, let's put some logic
into
our programs so we can branch around them. Java has a separate
type boolean;
a boolean variable can have a value of either true
or false.
Note that the integer 0 is not the same as false.
But
in the JVM, booleans are represented by integers with 0 and 1
corresponding
with false and true, respectively. Looking at ifManip() in
Demo2.java and its disassembly, we see that
boolean flag = false;
flag = true;
compiles into
0 iconst_0 ; false
1 istore_3
2 iconst_1 ; true
3 istore_3
- How does the Java compiler implement an if statement such as
if (x < 2) x = 0;
- Push the two integers onto the stack that you want to compare:
10 iload_0 ; x
11 iconst_2 ; 2 x
- Compare them with one of the if_icmpXX branching instructions
where XX can be eq, ne, lt, gt, le, ge:
12 if_icmpge 17
The JVM pops the numbers off the stack and compares them. If x is
greater than or equal to 2, then execution will continue at the
instruction at location 17; otherwise execution continues at the
next instruction at location 15.
Why is the test for "greater than or equal" rather than "less
than"?
The compiler wants to preserve the program structure so that the code x
= 0; should immediately follow the test. Thus, we
want
to branch around the code when x >= 2.
The coding for if (x < 2) x = 0; then looks like:
10 iload_0 ; x
11 iconst_2 ; 2 x
12 if_icmpge 17 ; if x ge 2, continue execution at 17
15 iconst_0 ; 0
16 istore_0 ; x = 0
17 ...
- If you look at the hexadecimal machine instruction for if_icmpge
17, you'll see a20005. We know that a2 will be the opcode
for if_icmpge; but why 0005 instead of 0011 (which
is 17 in hexadecimal)? Recall that to access constants, the
instruction contained the index of the constant in the constant pool.
Here, however, the instruction does not contain the location of the
target. (If it did, this would be called "direct addressing".)
Then what does it contain? What is the location of if_icmpge 17?
How far is that away from 17? 0005 is the offset from the branch
instruction to the target instruction.
Many CPUs have a special register called the PC (Program Counter)
which contains the address of the currently executing instruction.
(In other CPUs, the PC contains the address of the next instruction
to execute.)
The JVM's PC contains the location of the current instruction.
Program execution can be thought of as
loop forever {
- fetch the instruction at the location in the PC
- decode the instruction
- execute the instruction
- increment the PC by the length of the instruction or by the
offset if
it
is a branching instruction and the test is satisfied
}
Thus, after the if_icmpge at 12 is executed, if x >= 2
is true, the PC will contain 12 + 5 = 17. The instruction at 17
will
be executed next. If the test is false the PC will contain 12 + 3
= 15, which is the next sequential instruction.
Summarizing, offset = (address of instruction to execute, called
the target address) - PC.
Or
target address = PC + offset.
The offset is a 16 bit integer. What is the largest
positive offset?
What does a negative offset mean? What is the largest negative
offset? What implications does this have for the size of a method?
What happens if a branching instruction contained an offset of 0?
- It seems that the above has avoided any discussion of
booleans.
Let's
see how the compiler implements
flag = (x < y);
17 iload_0 ; x
18 iload_1 ; y x
19 if_icmpge 26 ; if x ge y continue at 26
22 iconst_1 ; else push true
23 goto 27 ; continue at 27 -- this is the famous "go to"
26 iconst_0 ; push false if x ge y.
; In either case, the stack contains a boolean
27 istore_3 ; flag = (x < y)
We need the infamous goto instruction to skip around the code
that pushes false onto the stack. Remember that the operand of
the goto instruction will contain the offset 4. You'll see that,
in most cases, branches will be downwards. The exception will be to
return to the beginning of a loop.
- Note that there are two paths of execution to the instruction at
27.
In either case the stack contains an integer (really, a boolean)
at instruction 27. When the program is loaded, the verifier
will check to see that, for every path to a particular instruction, the
stack and local variables are equivalent. That is, if the
stack
contains 3 integers along one path to an instruction, it must also
contain
3 integers along the other path.
Method void dumb()
0 iconst_0
1 iconst_1
2 if_icmpge 6
5 iconst_1
6 return
In this example, if the branch is taken, the stack will be empty at
location 6. If it isn't, the stack will contain the
constant.
When the class is loaded, the message
Exception in thread "main" java.lang.VerifyError: (class: Boom,
method: dumb signature: ()V) Inconsistent stack height 1 != 0 is
displayed.
- Let's look at the bytecodes for
if (!flag)
y++;
else
z--;
28 iload_3 ; flag
29 ifne 38 ; if (flag) continue at 38
32 iinc 1 1 ; y++
35 goto 41
38 iinc 2 -1 ; z--
41 ...
Note the instruction ifne needs only one operand on the stack;
the other operand is always 0. In this case, if the data on the
stack is not 0 (i.e., true) then we continue execution at 38 where
we decrement z. There are 6 ifXX instructions that compare the
integer on the top of the stack with 0.
Look at the other examples.
- Coding in Jasmin. Let's look at the sgn function, which computes
the sign of an integer, in SomeFunctions.j.
Note that a label (or symbolic address) follows ifle
rather than a number. A label defines a location for an
instruction.
We specify a label with a name followed by a colon, and nothing
else.
The assembler keeps track of labels; for example pos corresponds
with location 2, and zero corresponds with 10. When the ifle is
assembled, the assembler calculates the offset to be 10 - 2 = 8.
If you didn't have labels you would have to calculate the offsets
yourself.
That can be very painful, especially if you add or delete lines.
; public static int sgn(int x);
; returns 1, if x > 0
; 0, if x = 0
; -1, if x < 0
.method public static sgn(I)I
.limit locals 1
.limit stack 3
; Parameters:
; 0 - x
iload_0 ; x
dup ; x x keep x around for 2nd test
; Labels must be on separate lines
pos:
ifle zero ; x is x > 0?
pop ; yes. remove x
iconst_1 ; 1
goto endif
zero:
ifne neg ; no. is x = 0?
iconst_0 ; 0 yes
goto endif
neg:
iconst_m1 ; -1 no
endif:
; All paths here will have an integer on the stack
ireturn
.end method
Note that the structure of the code closely follows the definition
of
the function; destroying structure destroys readability. Comments
describe why an instruction is coded; they should not merely
echo
code. The comment at endif acknowledges that you
are
aware that all paths must lead to an equivalent structure of the stack
(and
perhaps the local variables).
- Coding of Loops -- to be done.
- Exercises -- to be done.
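The fetch, decode, execute loop and the PC-relative offset can be tried out in a few lines of Java. The sketch below is a toy interpreter of my own, not how a real JVM is written. It runs a hypothetical method equivalent to if (x < 2) x = 0; return x; laid out from location 0, so the branch sits at location 2 and its offset of 5 leads to location 7. The opcode values are the ones from the JVM instruction set.

```java
// Toy interpreter for six instructions. All names are hypothetical.
final class TinyInterpreter {
    static final int ICONST_0 = 0x03, ICONST_2 = 0x05, ILOAD_0 = 0x1a,
                     ISTORE_0 = 0x3b, IF_ICMPGE = 0xa2, IRETURN = 0xac;

    static final byte[] CODE = {
        (byte) ILOAD_0,                 // 0: x
        (byte) ICONST_2,                // 1: 2 x
        (byte) IF_ICMPGE, 0x00, 0x05,   // 2: if x ge 2, continue at 2 + 5 = 7
        (byte) ICONST_0,                // 5: 0
        (byte) ISTORE_0,                // 6: x = 0
        (byte) ILOAD_0,                 // 7: x
        (byte) IRETURN                  // 8: return x
    };

    // Runs code with x in local variable 0; returns the value given to ireturn.
    static int run(byte[] code, int x) {
        int[] locals = { x };
        int[] stack = new int[2];
        int top = 0;
        int pc = 0;
        while (true) {
            int opcode = code[pc] & 0xff;                 // fetch
            switch (opcode) {                             // decode and execute
                case ICONST_0: stack[top++] = 0;         pc += 1; break;
                case ICONST_2: stack[top++] = 2;         pc += 1; break;
                case ILOAD_0:  stack[top++] = locals[0]; pc += 1; break;
                case ISTORE_0: locals[0] = stack[--top]; pc += 1; break;
                case IF_ICMPGE: {
                    int right = stack[--top];
                    int left = stack[--top];
                    // signed 16 bit offset, big-endian, relative to this instruction
                    int offset = (short) (((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff));
                    pc += (left >= right) ? offset : 3;
                    break;
                }
                case IRETURN:
                    return stack[--top];
                default:
                    throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(CODE, 1) + " " + run(CODE, 5));   // prints "0 5"
    }
}
```

Change the offset bytes to 0x00, 0x00 and run it again if you want to see what an offset of 0 does. (Keep a finger near Ctrl-C.)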
Arrays
- Up to this point, the JVM architecture shown is pretty
standard.
You could think of the local variables as registers inside the CPU or
as
local memory. The invokestatic instruction is similar to a
typical
call but with the name of the function used as the operand instead of
the
address of function (though we use the name in assembler code).
Most
architectures also have some stack structure for manipulating or saving
data. The passing of parameters on the stack is quite common in
other
architectures. Most subroutines would copy the parameters to the
registers or local storage. The comparison of data with a branch
is also quite common.
- Java arrays are objects; they are not consecutive storage
locations in
memory. The JVM implements objects in a heap, which is
memory
that the JVM, not the programmer, manages. When the JVM creates
an
object with new, a reference to that object
(you
could think of it as an address) in the heap is returned.
References
are a datatype in the JVM; as such, they are treated much differently
than
integers. You'll see this easily, since there are separate
instructions for arrays and objects. There is not much you can do
with references; you can access the data associated with the object; you
can push/pop them; you can compare references with other references or
with null. That's about it. Note that you can't do
arithmetic
on them. (Well, you could try. But the verifier won't let
you!)
This is one of the reasons that Java is a "safe" language; it's very
much
unlike C++ where you can do all kinds of bad things with pointers.
- While the JVM has special instructions to access arrays,
the
concepts involved
are similar to other architectures. In traditional
architectures,
a register would contain a reference (address) to the array.
Another
register would contain an offset to the value in the array. A
load
instruction would specify both registers as well as the receiving
register,
as in ld [r2,r3],r4. The element to be retrieved is at
the
address which is the sum of the reference and the offset. In the
JVM, we push a reference to the array onto the stack, and then the index
to
the element. The iaload instruction pops off the index
and the reference, fetches the integer from the array and puts it on
the
stack.
- In Demo3.java
and its disassembly,
we see how this is done. To create an integer array with 3
integers,
we push 3 onto the stack and then newarray int. The second
byte of the instruction describes the type of array to be created; you
can create all the usual types of numeric arrays. The JVM pops the 3
and creates an integer array in the heap with three elements
initialized to 0. A reference to the array is pushed onto the stack.
Then we astore_0 it. Note that this instruction expects a reference
to an object on the stack.
; int x[];
; x = new int[3];
0 iconst_3 ; 3
1 newarray int ; x
3 astore_0 ; stack now empty
- To save an integer in the array, we push the reference to
the
array by
the aload_0 instruction, the index of the array, then the
value
we want stored. We can then iastore it:
; x[2] = 4;
4 aload_0 ; x
5 iconst_2 ; 2 x
6 iconst_4 ; 4 2 x
7 iastore ;
- Accessing an element is similar. The instruction iaload needs
the reference and the index on the stack. But notice how clever that
compiler is:
; x[0] = x[2];
8 aload_0 ; x
9 iconst_0 ; 0 x for x[0]
10 aload_0 ; x 0 x
11 iconst_2 ; 2 x 0 x for x[2]
12 iaload ; x[2] 0 x
13 iastore ;
- To obtain x.length, there is the special instruction arraylength:
; len = x.length;
14 aload_0 ; x
15 arraylength ; x.length
16 istore_2
- Recall that the statement y = x; means that y references the same
array as x. Assignment of object references uses the aload and astore
instructions:
17 aload_0 ;x
18 astore_1 ; y = x
- When testing for equality of objects, we must use the
instruction if_acmpne, which compares object references. Note that
null is its own constant:
if (y == x)
y = null;
19 aload_1 ; y
20 aload_0 ; x y
21 if_acmpne 26 ; same object?
24 aconst_null ; null This is not 0!
25 astore_1 ; yes. Then y = null
- Coding in Jasmin. You can model what the compiler does
or
you can
roll your own. There is no additional syntax that you have to
worry
about. In MoreFunctions.j,
there are several examples. You should compare the code with the code
generated by TestFunctions.java.
The execution
timings comparing the implementations are quite interesting.
- Exercises -- to be done.
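For reference, here are the Java statements behind the array listings above, collected into one small class. The class ArraySketch and its two helper methods are my own packaging, not Demo3.java. The last lines show why if_acmpne is not a comparison of array contents.

```java
import java.util.Arrays;

// The array statements from this section, gathered in one place.
final class ArraySketch {
    static int[] fill() {
        int[] x = new int[3];   // newarray int, then astore
        x[2] = 4;               // iastore
        x[0] = x[2];            // iaload, then iastore
        return x;
    }

    // Compiles to a reference comparison (if_acmpne or if_acmpeq).
    static boolean sameObject(int[] a, int[] b) {
        return a == b;
    }

    public static void main(String[] args) {
        int[] x = fill();
        int len = x.length;       // arraylength
        int[] y = x;              // aload, astore: only the reference is copied
        int[] copy = x.clone();   // a different object with equal contents
        System.out.println(Arrays.toString(x) + " " + len);    // [4, 0, 4] 3
        System.out.println(sameObject(y, x));                  // true
        System.out.println(sameObject(copy, x));               // false
        System.out.println(Arrays.equals(copy, x));            // true
    }
}
```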
Odds and Ends
- Cute instructions designed especially for Java:
- jsr and ret
- dup_x, swap
- ifnull
- switch instructions
- Other numeric types; their arithmetic
- short and byte become int
- long
- double
- conversions x2y, where x,y = b,c,d,f,i,l,s
- arrays
- Working with objects
- accessing attributes
- calling methods
- arrays of objects
- Implementing
- interface
- super()
- exceptions
© 2000 Carl. E. Bredlau
bredlauc@mail.montclair.edu