CHAPTER EIGHT: MASM: DIRECTIVES & PSEUDO-OPCODES (Part 4)
8.9 - The END Directive
8.10 - Variables
8.11 - Label Types
8.11.1 - How to Give a Symbol a Particular Type
8.11.2 - Label Values
8.11.3 - Type Conflicts
8.12 - Address Expressions
8.12.1 - Symbol Types and Addressing Modes
8.12.2 - Arithmetic and Logical Operators
8.12.3 - Coercion
8.9 The END Directive
The end directive terminates an assembly language source file. In addition to telling MASM that it has reached the end of an assembly language source file, the end directive's optional operand tells MS-DOS where to transfer control when the program begins execution; that is, you specify the name of the main procedure as an operand to the end directive. If the end directive's operand is not present, MS-DOS will begin execution starting at the first byte in the .exe file. Since it is often inconvenient to guarantee that your main program begins with the first byte of object code in the .exe file, most programs specify a starting location as the operand to the end directive. If you are using the SHELL.ASM file as a skeleton for your assembly language programs, you will notice that the end directive already specifies the procedure main as the starting point for the program.
If you are using separate assembly and you're linking together several different object code files (see "Managing Large Programs"), only one module can have a main program. Likewise, only one module should specify the starting location of the program. If you specify more than one starting location, you will confuse the linker and it will generate an error.
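The effect of the end operand is easiest to see in a program whose main procedure is not the first thing in the code segment. The following is only a sketch: the Helper procedure and the stack size are made up for illustration, and the segment names simply follow the CSEG/SSEG convention this text uses.

```asm
; Sketch only: END's operand, not the layout of the file, picks the entry point.
CSEG            segment para public 'CODE'
                assume  cs:CSEG

Helper          proc    near            ; first bytes of object code, but NOT the entry point
                ret
Helper          endp

Main            proc    near
                call    Helper
                mov     ax, 4C00h       ; DOS function 4Ch: terminate, return code 0
                int     21h
Main            endp
CSEG            ends

SSEG            segment para stack 'STACK'
                word    128 dup (?)     ; 256 bytes of stack space (arbitrary size)
SSEG            ends

                end     Main            ; execution begins at Main, not at Helper
```

Had the last line been a plain end with no operand, execution would have started at Helper, the first byte of the code, which is rarely what you want.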
8.10 Variables
Global variable declarations use the byte/sbyte/db, word/sword/dw, dword/sdword/dd, qword/dq, and tbyte/dt pseudo-opcodes. Although you can place your variables in any segment (including the code segment), most beginning assembly language programmers place all their global variables in a single data segment.
A typical variable declaration takes the form:
varname byte initial_value
Varname is the name of the variable you're declaring and initial_value is the initial value you want that variable to have when the program begins execution. "?" is a special initial value. It means that you don't want to give a variable an initial value. When DOS loads a program containing such a variable into memory, it does not initialize this variable to any particular value.
The declaration above reserves storage for a single byte.
This could be changed to any other variable type by simply changing the byte mnemonic
to some other appropriate pseudo-opcode.
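As a rough illustration, a data segment that uses several of these pseudo-opcodes might look like the following. All of the variable names here are hypothetical; only the pseudo-opcodes themselves come from the discussion above.

```asm
; Sketch only: one declaration for each of several pseudo-opcodes.
DSEG            segment para public 'DATA'
Counter         byte    0               ; 8-bit variable, initialized to zero
Total           word    ?               ; 16-bit variable, no initial value
Delta           sdword  -1              ; signed 32-bit variable, initialized to -1
BigValue        qword   ?               ; 64-bit variable, no initial value
BCDValue        tbyte   ?               ; 80-bit variable, no initial value
OldStyle        db      'A'             ; db is the older spelling of byte
DSEG            ends
```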
For the most part, this text will assume that you declare all variables in a data segment, that is, a segment that the 80x86's ds register will point at. In particular, most of the programs herein will place all variables in the DSEG segment (CSEG is for code, DSEG is for data, and SSEG is for the stack). See the SHELL.ASM program in Chapter Four for more details on these segments.
Since Chapter Five covers the declaration of variables, data types, structures, arrays, and pointers in depth, this chapter will not waste any more time discussing this subject. Refer to Chapter Five for more details.
8.11 Label Types
One unusual feature of Intel syntax assemblers (like MASM) is that they are strongly typed. A strongly typed assembler associates a certain type with each symbol declared in the source file and will generate a warning or an error message if you attempt to use that symbol in a context that doesn't allow its particular type. Although unusual in an assembler, most high level languages apply certain typing rules to symbols declared in the source file. Pascal, of course, is famous for being a strongly typed language. You cannot, in Pascal, assign a string to a numeric variable or attempt to assign an integer value to a procedure label. Intel, in designing the syntax for 8086 assembly language, decided that all the reasons for using a strongly typed language apply to assembly language as well as Pascal. Therefore, standard Intel syntax 80x86 assemblers, like MASM, impose certain type restrictions on the use of symbols within your assembly language programs.
8.11.1 How to Give a Symbol a Particular Type
Symbols, in an 80x86 assembly language program, may be one of eight different primitive types: byte, word, dword, qword, tbyte, near, far, and abs (constant). Anytime you define a label with the byte, word, dword, qword, or tbyte pseudo-opcodes, MASM associates the type of that pseudo-opcode with the label. For example, the following variable declaration will create a symbol of type byte:
BVar byte ?
Likewise, the following defines a dword symbol:
DWVar dword ?
Variable types are not limited to the primitive types built into MASM. If you create your own types using the typedef or struct directives, MASM will associate those types with any associated variable declarations.
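For instance, the declarations below attach two user-defined types to variables. This is just a sketch; the type names integer and Point, their fields, and the variable names are all invented for the example.

```asm
; Sketch only: variables that take their type from typedef and struct.
integer         typedef sword           ; hypothetical user-defined scalar type

Point           struct                  ; hypothetical user-defined structure type
XCoord          word    ?
YCoord          word    ?
Point           ends

DSEG            segment para public 'DATA'
Index           integer ?               ; Index has the type "integer"
Origin          Point   <0, 0>          ; Origin has the type "Point"
DSEG            ends
```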
You can define near symbols (also known as statement labels) in a couple of different ways. First, all procedure symbols declared with the proc directive (with either a blank operand field or near in the operand field) are near symbols. Statement labels are also near symbols. A statement label takes the following form:
label: instr
Instr represents an 80x86 instruction. Note that a colon must follow the symbol. It is not part of the symbol; the colon informs the assembler that this symbol is a statement label and should be treated as a near typed symbol.
Statement labels are often the targets of jump and loop instructions. For example, consider the following code sequence:
                mov     cx, 25
Loop1:          mov     ax, cx
                call    PrintInteger
                loop    Loop1
The loop instruction decrements the cx register
and transfers control to the instruction labelled by Loop1 until cx becomes
zero.
Inside a procedure, statement labels are local. That is, statement labels inside a procedure are visible only to code inside that procedure. If you want to make a symbol global to a procedure, place two colons after the symbol name. In the example above, if you needed to refer to Loop1 outside of the enclosing procedure, you would use the code:
                mov     cx, 25
Loop1::         mov     ax, cx
                call    PrintInteger
                loop    Loop1
Generally, far symbols are the targets of jump and call
instructions. The most common method programmers use to create a far label is to place far
in the operand field of a proc directive. Symbols that are simply constants
are normally defined with the equ directive. You can also declare symbols
with different types using the equ and extrn/extern/externdef directives.
An explanation of the extrn directives appears in the section "Managing
Large Programs".
If you declare a numeric constant using an equate, MASM assigns the type abs (absolute, or constant) to the symbol. Text and string equates are given the type text. You can also assign an arbitrary type to a symbol using the equ directive; see "Type Operators" for more details.
8.11.2 Label Values
Whenever you define a label using a directive or pseudo-opcode, MASM gives it a type and a value. The value MASM gives the label is usually the current location counter value. If you define the symbol with an equate, the equate's operand usually specifies the symbol's value. When encountering the label in an operand field, as with the loop instruction above, MASM substitutes the label's value for the label.
8.11.3 Type Conflicts
Since the 80x86 supports strongly typed symbols, the next question to ask is "What are they used for?" In a nutshell, strongly typed symbols can help verify proper operation of your assembly language programs. Consider the following code sections:
DSEG            segment public 'DATA'
                 .
                 .
                 .
I               byte    ?
                 .
                 .
                 .
DSEG            ends
CSEG            segment public 'CODE'
                 .
                 .
                 .
                mov     ax, I
                 .
                 .
                 .
CSEG            ends
                end
The mov instruction in this example is attempting to load the ax register (16 bits) from a byte sized variable. Now, the 80x86 microprocessor is perfectly capable of this operation. It would load the al register from the memory location associated with I and load the ah register from the next successive memory location (which is probably the L.O. byte of some other variable). However, this probably wasn't the original intent. The person who wrote this code probably forgot that I is a byte sized variable and assumed that it was a word variable, which is definitely an error in the logic of the program.
MASM would never allow an instruction like the one above to be assembled without generating a diagnostic message. This can help you find errors in your programs, particularly difficult-to-find errors. On occasion, advanced assembly language programmers may want to execute a statement like the one above. MASM provides certain coercion operators that bypass MASM's safety mechanisms and allow illegal operations (see "Coercion").
8.12 Address Expressions
An address expression is an algebraic expression that produces a numeric result that MASM merges into the displacement field of an instruction. An integer constant is probably the simplest example of an address expression. The assembler simply substitutes the value of the numeric constant for the specified operand. For example, the following instruction fills the immediate data fields of the mov instruction with zeros:
                mov     ax, 0
Another simple form of an address expression is a symbol. Upon encountering a symbol, MASM substitutes the value of that symbol. For example, the following two statements emit the same object code as the instruction above:
Value           equ     0
                mov     ax, Value
An address expression, however, can be much more complex than this. You can use various arithmetic and logical operators to modify the basic value of some symbols or constants.
Keep in mind that MASM computes address expressions during assembly, not at run time. For example, the following instruction does not load ax from location Var1 and add one to it:
                mov     ax, Var1+1
Instead, this instruction loads the al register with the byte stored at the address of Var1 plus one and then loads the ah register with the byte stored at the address of Var1 plus two.
Beginning assembly language programmers often confuse computations done at assembly time with those done at run time. Take extra care to remember that MASM computes all address expressions at assembly time!
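To make the difference concrete, here is a small sketch. It assumes Var1 is a word variable, that a second word (called Var2 here purely for illustration) immediately follows it, and that ds already points at DSEG; none of those details come from the example above.

```asm
; Sketch only: an assembly-time sum versus a run-time sum.
DSEG            segment para public 'DATA'
Var1            word    1234h           ; bytes in memory: 34h, 12h
Var2            word    5678h           ; bytes in memory: 78h, 56h
DSEG            ends

CSEG            segment para public 'CODE'
                assume  cs:CSEG, ds:DSEG
AddrDemo        proc    near
                mov     ax, Var1+1      ; assembly time: address is Var1's offset plus one
                                        ;  al = 12h, ah = 78h, so ax = 7812h
                mov     ax, Var1        ; run time: load the value of Var1 (1234h)...
                inc     ax              ;  ...and then add one to it, so ax = 1235h
                ret
AddrDemo        endp
CSEG            ends
```

The first mov straddles two variables because the "+1" adjusted the address, not the data. Only the mov/inc pair performs arithmetic on the value while the program runs.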
8.12.1 Symbol Types and Addressing Modes
Consider the following instruction:
jmp Location
Depending on how the label Location is defined, this jmp instruction will perform one of several different operations. If you'll look back at the chapter on the 80x86 instruction set, you'll notice that the jmp instruction takes several forms. As a recap, they are
                jmp     label           (short)
                jmp     label           (near)
                jmp     label           (far)
                jmp     reg             (indirect near, through register)
                jmp     mem/reg         (indirect near, through memory)
                jmp     mem/reg         (indirect far, through memory)
Notice that MASM uses the same mnemonic (jmp) for each of these instructions; how does it tell them apart? The secret lies with the operand. If the operand is a statement label within the current segment, the assembler selects one of the first two forms depending on the distance to the target instruction. If the operand is a statement label within a different segment, then the assembler selects jmp (far) label. If the operand following the jmp instruction is a register, then MASM uses the indirect near jmp and the program jumps to the address in the register. If a memory location is selected, the assembler uses one of the following jumps:
near, if the variable was declared with word/sword/dw
far, if the variable was declared with dword/sdword/dd
An error results if you've used byte/sbyte/db, qword/dq, or tbyte/dt or some other type.
If you've specified an indirect address, e.g., jmp [bx], the assembler will generate an error because it cannot determine if bx is pointing at a word or a dword variable. For details on how you specify the size, see the section on coercion in this chapter.
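The following sketch collects the cases just described so you can compare the operands side by side. The names NearPtr, FarPtr, JmpDemo, and Target are hypothetical, and the fragment is meant to be assembled and inspected in a listing rather than executed (every jump after the first is unreachable).

```asm
; Sketch only: the operand's type selects the form of jmp that MASM emits.
DSEG            segment para public 'DATA'
NearPtr         word    ?               ; holds a 16-bit offset
FarPtr          dword   ?               ; holds a segment:offset pair
DSEG            ends

CSEG            segment para public 'CODE'
                assume  cs:CSEG, ds:DSEG
JmpDemo         proc    near
                jmp     Target          ; label in this segment: short or near
                jmp     bx              ; register: indirect near
                jmp     NearPtr         ; word variable: indirect near through memory
                jmp     FarPtr          ; dword variable: indirect far through memory
                jmp     word ptr [bx]   ; size supplied explicitly: indirect near
                jmp     dword ptr [bx]  ; size supplied explicitly: indirect far
Target:         ret
JmpDemo         endp
CSEG            ends
```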
8.12.2 Arithmetic and Logical Operators
MASM recognizes several arithmetic and logical operators. The following tables provide a list of such operators:
| Operator | Syntax | Description |
|---|---|---|
| + | +expr | Positive (unary) |
| - | -expr | Negation (unary) |
| + | expr + expr | Addition |
| - | expr - expr | Subtraction |
| * | expr * expr | Multiplication |
| / | expr / expr | Division |
| MOD | expr MOD expr | Modulo (remainder) |
| [ ] | expr [ expr ] | Addition (index operator) |

| Operator | Syntax | Description |
|---|---|---|
| SHR | expr SHR expr | Shift right |
| SHL | expr SHL expr | Shift left |
| NOT | NOT expr | Logical (bit by bit) NOT |
| AND | expr AND expr | Logical AND |
| OR | expr OR expr | Logical OR |
| XOR | expr XOR expr | Logical XOR |

| Operator | Syntax | Description |
|---|---|---|
| EQ | expr EQ expr | True (-1, all bits set) if equal, false (0) otherwise |
| NE | expr NE expr | True (-1, all bits set) if not equal, false (0) otherwise |
| LT | expr LT expr | True (-1, all bits set) if less, false (0) otherwise |
| LE | expr LE expr | True (-1, all bits set) if less or equal, false (0) otherwise |
| GT | expr GT expr | True (-1, all bits set) if greater, false (0) otherwise |
| GE | expr GE expr | True (-1, all bits set) if greater or equal, false (0) otherwise |
You must not confuse these operators with 80x86 instructions! The addition operator adds two values together; their sum becomes an operand to an instruction. This addition is performed when assembling the program, not at run time. If you need to perform an addition at execution time, use the add or adc instructions.
You're probably wondering "What are these operators used for?" The truth is, not much. The addition operator gets used quite a bit, the subtraction somewhat, the comparisons once in a while, and the rest even less. Since addition and subtraction are the only operators beginning assembly language programmers regularly employ, this discussion considers only those two operators and brings up the others as required throughout this text.
The addition operator takes two forms: expr+expr or expr[expr]. For example, the following instruction loads the accumulator, not from memory location COUNT, but from the very next location in memory:
                mov     al, COUNT+1
The assembler, upon encountering this statement, will compute the sum of COUNT's address plus one. The resulting value is the memory address for this instruction. As you may recall, the mov al, memory instruction is three bytes long and takes the form:
Opcode | L. O. Displacement Byte | H. O. Displacement Byte
The two displacement bytes of this instruction contain the
sum COUNT+1.
The expr[expr] form of the addition operation is for accessing elements of arrays. If AryData is a symbol that represents the address of the first element of an array, AryData[5] represents the address five bytes past the start of AryData. The expression AryData+5 produces the same result, and the two can be used interchangeably; however, for arrays, the expr[expr] form is a little more self documenting. One trap to avoid: expr1[expr2][expr3] does not automatically index (properly) into a two dimensional array for you. This simply computes the sum expr1+expr2+expr3.
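A short sketch shows all three spellings producing the same address. The array contents are made up so the result is easy to check, and the code assumes ds already points at DSEG.

```asm
; Sketch only: three ways of writing the address AryData plus five.
DSEG            segment para public 'DATA'
AryData         byte    10h, 11h, 12h, 13h, 14h, 15h, 16h, 17h
DSEG            ends

CSEG            segment para public 'CODE'
                assume  cs:CSEG, ds:DSEG
IndexDemo       proc    near
                mov     al, AryData[5]      ; byte at offset 5: al = 15h
                mov     al, AryData+5       ; same address, same result
                mov     al, AryData[2][3]   ; just AryData+2+3, so al = 15h again
                ret
IndexDemo       endp
CSEG            ends
```

Note that the last form did not treat AryData as a two dimensional array; the assembler simply added the two constants to the base address.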
The subtraction operator works just like the addition operator, except it computes the difference rather than the sum. This operator will become very important when we deal with local variables in Chapter 11.
Take care when using multiple symbols in an address expression. MASM restricts the operations you can perform on symbols to addition and subtraction and only allows the following forms:
| Expression | Resulting type |
|---|---|
| reloc + const | Reloc, at address specified. |
| reloc - const | Reloc, at address specified. |
| reloc - reloc | Constant whose value is the number of bytes between the first and second operands. Both variables must physically appear in the same segment in the current source file. |

Reloc stands for relocatable symbol or expression. This can be a variable name, a statement label, a procedure name, or any other symbol associated with a memory location in the program. It could also be an expression that produces a relocatable result. MASM does not allow any operations other than addition and subtraction on expressions whose resulting type is relocatable. You cannot, for example, compute the product of two relocatable symbols.
The first two forms above are very common in assembly language programs. Such an address expression will often consist of a single relocatable symbol and a single constant (e.g., "var + 1"). You won't use the third form very often, but it is very useful once in a while. You can use this form of an address expression to compute the distance, in bytes, between two points in your program. The procsize symbol in the following code, for example, computes the size of Proc1:
Proc1           proc    near
                push    ax
                push    bx
                push    cx
                mov     cx, 10
                lea     bx, SomeArray
                mov     ax, 0
ClrArray:       mov     [bx], ax
                add     bx, 2
                loop    ClrArray
                pop     cx
                pop     bx
                pop     ax
                ret
Proc1           endp
procsize        =       $ - Proc1
"$" is a special symbol MASM uses to denote the current offset within the segment (i.e., the location counter). It is a relocatable symbol, as is Proc1, so the equate above computes the difference between the offset at the start of Proc1 and the end of Proc1. This is the length of the Proc1 procedure, in bytes.
The operands to the operators other than addition and subtraction must be constants or an expression yielding a constant (e.g., "$-Proc1", above, produces a constant value). You'll mainly use these operators in macros and with the conditional assembly directives.
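The same reloc - reloc trick works on data. In the sketch below (Msg, MsgLen, MsgWords, and Buffer are hypothetical names), the difference yields a constant, and that constant is then a legal operand for the division operator and for dup.

```asm
; Sketch only: a constant built from two relocatable values, then reused.
DSEG            segment para public 'DATA'
Msg             byte    "Hello, world"
MsgLen          =       $ - Msg             ; reloc - reloc: the constant 12
MsgWords        =       (MsgLen + 1) / 2    ; constant operands only: 6
Buffer          byte    MsgLen dup (?)      ; reserves 12 bytes
DSEG            ends
```

Because MsgLen is computed by the assembler, changing the text of Msg automatically changes the size of Buffer the next time you assemble the program.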
8.12.3 Coercion
Consider the following program segment:
DSEG            segment public 'DATA'
I               byte    ?
J               byte    ?
DSEG            ends
CSEG            segment
                 .
                 .
                 .
                mov     al, I
                mov     ah, J
                 .
                 .
                 .
CSEG            ends
Since I and J are adjacent, there is no need to use two mov instructions to load al and ah; a simple mov ax, I instruction would do the same thing. Unfortunately, the assembler will balk at mov ax, I since I is a byte. The assembler will complain if you attempt to treat it as a word. As you can see, however, there are times when you'd probably like to treat a byte variable as a word (or treat a word as a byte or double word, or treat a double word as something else).
Temporarily changing the type of a label for some
particular occurrence is coercion. Expressions can be coerced to a different type using
the MASM ptr operator. You use the ptr operator as follows:
type PTR expression
Type is any of byte, word, dword, tbyte, near, far, or other type and expression is any general expression that is the address of some object. The coercion operator returns an expression with the same value as expression, but with the type specified by type. To handle the above problem you'd use the assembly language instruction:
                mov     ax, word ptr I
This instructs the assembler to emit the code that will load the ax register with the word at address I. This will, of course, load al with I and ah with J.
Code that uses double word values often makes extensive use of the coercion operator. Since lds and les are the only 32-bit instructions on pre-80386 processors, you cannot (without coercion) store an integer value into a 32-bit variable using the mov instruction on those earlier CPUs. If you've declared DBL using the dword pseudo-opcode, then an instruction of the form mov DBL, ax will generate an error because it's attempting to move a 16 bit quantity into a 32 bit variable. Storing values into a double word variable requires the use of the ptr operator. The following code demonstrates how to store the ds and bx registers into the double word variable DBL:
                mov     word ptr DBL, bx
                mov     word ptr DBL+2, ds
You will use this technique often, as various UCR Standard Library and MS-DOS calls return a double word value in a pair of registers.
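As a sketch of the full round trip, the fragment below stores a register pair into DBL and then gets it back two different ways. The procedure name SavePtr and the choice of es:di, ax, and dx as destinations are arbitrary, and the code assumes ds already points at DSEG.

```asm
; Sketch only: storing and reloading a double word on a 16-bit CPU.
DSEG            segment para public 'DATA'
DBL             dword   ?
DSEG            ends

CSEG            segment para public 'CODE'
                assume  cs:CSEG, ds:DSEG
SavePtr         proc    near
                mov     word ptr DBL, bx    ; L.O. word gets the offset in bx
                mov     word ptr DBL+2, ds  ; H.O. word gets the segment in ds
                les     di, DBL             ; les reads all 32 bits into es:di,
                                            ;  so no coercion is needed here
                mov     ax, word ptr DBL    ; or fetch the halves separately:
                mov     dx, word ptr DBL+2  ;  ax = offset, dx = segment
                ret
SavePtr         endp
CSEG            ends
```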
Warning: If you coerce a jmp instruction to perform a far jump to a near label, other than performance degradation (the far jmp takes longer to execute), your program will work fine. If you coerce a call to perform a far call to a near subroutine, you're headed for trouble. Remember, far calls push the cs register onto the stack (with the return address). When executing a near ret instruction, the old cs value will not be popped off the stack, leaving junk on the stack. The very next pop or ret instruction will not operate properly since it will pop the cs value off the stack rather than the original value pushed onto the stack.
Expression coercion can come in handy at times. Other times, it is essential. However, you shouldn't get carried away with coercion since data type checking is a powerful debugging tool built in to MASM. By using coercion, you override this protection provided by the assembler. Therefore, always take care when overriding symbol types with the ptr operator.
One place where you'll need coercion is with the mov memory, immediate instruction. Consider the following instruction:
                mov     [bx], 5
Unfortunately, the assembler has no way of telling whether bx points at a byte, word, or double word item in memory. The value of the immediate operand isn't of any use. Even though five is a byte quantity, this instruction might be storing the value 0005h into a word variable, or 00000005h into a double word variable. If you attempt to assemble this statement, the assembler will generate an error to the effect that you must specify the size of the memory operand. You can easily accomplish this using the byte ptr, word ptr, and dword ptr operators as follows:
                mov     byte ptr [bx], 5        ;For a byte variable
                mov     word ptr [bx], 5        ;For a word variable
                mov     dword ptr [bx], 5       ;For a dword variable
Lazy programmers might complain that typing strings like "word ptr" or "far ptr" is too much work. Wouldn't it have been nice had Intel chosen a single character symbol rather than these long phrases? Well, quit complaining and remember the textequ directive. With the textequ directive you can substitute a short symbol for a long string like "word ptr". You'll find equates like the following in many programs, including several in this text:
byp             textequ <byte ptr>      ;Remember, "bp" is a reserved symbol!
wp              textequ <word ptr>
dp              textequ <dword ptr>
np              textequ <near ptr>
fp              textequ <far ptr>
With equates like the above you can use statements like the following:
                mov     byp [bx], 5
                mov     ax, wp I
                mov     wp DBL, bx
                mov     wp DBL+2, ds
26 SEP 1996