CHAPTER FIFTEEN: STRINGS AND CHARACTER SETS (Part 3)
15.2 - Character Strings
15.2.1 - Types of Strings
15.2.2 - String Assignment
15.2.3 - String Comparison
15.2 Character Strings
Since you'll encounter character strings more often than other types of strings, they deserve special attention. The following sections describe character strings and various types of string operations.
15.2.1 Types of Strings
At the most basic level, the 80x86's string instructions operate only upon arrays of characters. However, since most string data types contain an array of characters as a component, the 80x86's string instructions are handy for manipulating that portion of the string.
Probably the biggest difference between a character string and an array of characters is the length attribute. An array of characters contains a fixed number of characters. Never any more, never any less. A character string, however, has a dynamic run-time length, that is, the number of characters contained in the string at some point in the program. Character strings, unlike arrays of characters, have the ability to change their size during execution (within certain limits, of course).
To complicate things even more, there are two generic types of strings: statically allocated strings and dynamically allocated strings. Statically allocated strings are given a fixed maximum length at program creation time. The length of the string may vary at run-time, but only between zero and this maximum length. Most systems allocate and deallocate dynamically allocated strings in a memory pool when using strings. Such strings may be any length (up to some reasonable maximum value). Accessing such strings is less efficient than accessing statically allocated strings. Furthermore, garbage collection[5] may take additional time. Nevertheless, dynamically allocated strings are much more space efficient than statically allocated strings and, in some instances, accessing dynamically allocated strings is faster as well. Most of the examples in this chapter will use statically allocated strings.
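To make the "fixed maximum, variable current length" idea concrete, here is a minimal C sketch of a statically allocated string. C is used only as a compact model of the memory layout; the names `StatStr`, `STAT_MAX`, and `statstr_set` are hypothetical and do not come from this chapter or from the UCR Standard Library.

```c
#include <stddef.h>
#include <string.h>

#define STAT_MAX 15              /* hypothetical maximum, fixed when the program is built */

/* A statically allocated string: storage for STAT_MAX characters is always
   reserved, but only the first 'len' of them are meaningful. */
typedef struct {
    unsigned char len;           /* current run-time length, 0..STAT_MAX */
    char chars[STAT_MAX];        /* fixed-size character storage */
} StatStr;

/* Store 'len' characters of 'text' in 's'.
   Returns 1 on success, or 0 (leaving 's' unchanged) if the text
   would exceed the fixed maximum. */
int statstr_set(StatStr *s, const char *text, size_t len)
{
    if (len > STAT_MAX)
        return 0;
    s->len = (unsigned char)len;
    memcpy(s->chars, text, len);
    return 1;
}
```

The length can move freely between zero and `STAT_MAX`, but a longer value is simply rejected. A dynamically allocated string would instead obtain its storage from a memory pool at run time.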
A string with a dynamic length needs some way of keeping track of this length. While there are several possible ways to represent string lengths, the two most popular are length-prefixed strings and zero-terminated strings. A length-prefixed string begins with a single byte or word that contains the length of that string. Immediately following this length value are the characters that make up the string. Assuming the use of byte prefix lengths, you could define the string "HELLO" as follows:
    HelloStr        byte    5,"HELLO"
Length-prefixed strings are often called Pascal strings since this is the type of string variable supported by most versions of Pascal[6].
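For readers who think in a high-level language, the same layout can be sketched in C. This is only an illustration of the byte-prefixed format described above; the helper name `pstr_build` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build a byte-prefixed (Pascal-style) string in dst.
   dst must have room for len + 1 bytes. Returns the number of bytes written. */
size_t pstr_build(unsigned char *dst, const char *text, size_t len)
{
    assert(len <= 255);            /* a single length byte holds at most 255 */
    dst[0] = (unsigned char)len;   /* the length prefix comes first */
    memcpy(dst + 1, text, len);    /* the characters follow; no terminator is stored */
    return len + 1;
}
```

Calling `pstr_build(buf, "HELLO", 5)` fills `buf` with the six bytes 5, 'H', 'E', 'L', 'L', 'O', which matches the `HelloStr` definition above.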
Another popular way to specify string lengths is to use zero-terminated strings. A zero-terminated string consists of a string of characters terminated with a zero byte. These types of strings are often called C-strings since they are the type used by the C/C++ programming language. The UCR Standard Library, since it mimics the C standard library, also uses zero-terminated strings.
Pascal strings are much better than C/C++ strings for several reasons. First, computing the length of a Pascal string is trivial. You need only fetch the first byte (or word) of the string and you've got the length of the string. Computing the length of a C/C++ string is considerably less efficient. You must scan the entire string (e.g., using the scasb instruction) for a zero byte. If the C/C++ string is long, this can take a long time. Furthermore, C/C++ strings cannot contain the NUL (zero) character. On the other hand, C/C++ strings can be any length, yet require only a single extra byte of overhead. Pascal strings, however, can be no longer than 255 characters when using only a single length byte. For strings longer than 255 bytes, you'll need two bytes to hold the length for a Pascal string. Since most strings are less than 256 characters in length, this isn't much of a disadvantage.
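The difference in cost between the two length computations can be sketched in C as follows. The function names `pstr_len` and `cstr_len` are hypothetical; the loop in `cstr_len` plays roughly the role that a scasb scan plays in assembly.

```c
#include <stddef.h>

/* Length of a byte-prefixed string: a single fetch, however long the string is. */
size_t pstr_len(const unsigned char *p)
{
    return p[0];
}

/* Length of a zero-terminated string: every character must be examined
   until the zero byte turns up, so the cost grows with the string. */
size_t cstr_len(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```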
An advantage of zero-terminated strings is that they are easy to use in an assembly language program. This is particularly true of strings that are so long they require multiple source code lines in your assembly language programs. Counting up every character in a string is so tedious that it's not even worth considering. However, you can write a macro which will easily build Pascal strings for you:
    PString         macro   String
                    local   StringLength, StringStart
                    byte    StringLength
    StringStart     byte    String
    StringLength    =       $-StringStart
                    endm
                     .
                     .
                     .
                    PString "This string has a length prefix"
As long as the string fits entirely on one source line, you can use this macro to generate Pascal style strings.
Common string functions like concatenation, length, substring, index, and others are much easier to write when using length-prefixed strings. So we'll use Pascal strings unless otherwise noted. Furthermore, the UCR Standard Library provides a large number of C/C++ string functions, so there is no need to replicate those functions here.
15.2.2 String Assignment
You can easily assign one string to another using the movsb instruction. For example, if you want to assign the length-prefixed string String1 to String2, use the following:
    ; Presumably, ES and DS are set up already

                    lea     si, String1
                    lea     di, String2
                    mov     ch, 0           ;Extend len to 16 bits.
                    mov     cl, String1     ;Get string length.
                    inc     cx              ;Include length byte.
                    cld                     ;Copy in the forward direction.
            rep     movsb
This code increments cx by one before executing movsb because the length byte contains the length of the string exclusive of the length byte itself.
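The "length plus one" byte count is easy to see in a C model of the same assignment. This is a sketch only; `pstr_assign` is a hypothetical name, and the destination is assumed to be large enough and not to overlap the source.

```c
#include <stddef.h>
#include <string.h>

/* Copy the byte-prefixed string src to dst.
   The count is the stored length plus one, so the length byte
   travels along with the characters it describes. */
void pstr_assign(unsigned char *dst, const unsigned char *src)
{
    size_t count = (size_t)src[0] + 1;   /* characters + the length byte itself */
    memcpy(dst, src, count);
}
```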
Generally, string variables can be initialized to constants by using the PString macro described earlier. However, if you need to set a string variable to some constant value, you can write a StrAssign subroutine which assigns the string immediately following the call. The following procedure does exactly that:
                    include         stdlib.a
                    includelib      stdlib.lib

    cseg            segment para public 'code'
                    assume  cs:cseg, ds:dseg, es:dseg, ss:sseg

    ; String assignment procedure

    MainPgm         proc    far
                    mov     ax, seg dseg
                    mov     ds, ax
                    mov     es, ax

                    lea     di, ToString
                    call    StrAssign
                    byte    "This is an example of how the "
                    byte    "StrAssign routine is used",0
                    nop
                    ExitPgm
    MainPgm         endp

    StrAssign       proc    near
                    push    bp
                    mov     bp, sp
                    pushf
                    push    ds
                    push    si
                    push    di
                    push    cx
                    push    ax
                    push    di              ;Save again for use later.
                    push    es
                    cld

    ; Get the address of the source string

                    mov     ax, cs
                    mov     es, ax
                    mov     di, 2[bp]       ;Get return address.
                    mov     cx, 0ffffh      ;Scan for as long as it takes.
                    mov     al, 0           ;Scan for a zero.
            repne   scasb                   ;Compute the length of string.
                    neg     cx              ;Convert length to a positive #.
                    dec     cx              ;Because we started with -1, not 0.
                    dec     cx              ;Skip zero terminating byte.

    ; Now copy the strings

                    pop     es              ;Get destination segment.
                    pop     di              ;Get destination address.
                    mov     al, cl          ;Store length byte.
                    stosb

    ; Now copy the source string.

                    mov     ax, cs
                    mov     ds, ax
                    mov     si, 2[bp]
            rep     movsb

    ; Update the return address and leave:

                    inc     si              ;Skip over zero byte.
                    mov     2[bp], si

                    pop     ax
                    pop     cx
                    pop     di
                    pop     si
                    pop     ds
                    popf
                    pop     bp
                    ret
    StrAssign       endp

    cseg            ends

    dseg            segment para public 'data'
    ToString        byte    255 dup (0)
    dseg            ends

    sseg            segment para stack 'stack'
                    word    256 dup (?)
    sseg            ends
                    end     MainPgm
This code uses the scas instruction to determine the length of the string immediately following the call instruction. Once the code determines the length, it stores this length into the first byte of the destination string and then copies the text following the call to the string variable. After copying the string, this code adjusts the return address so that it points just beyond the zero terminating byte. Then the procedure returns control to the caller.
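The three steps just described (scan for the zero byte, store the length, copy the characters) can be modeled in C. This is a sketch rather than a translation of StrAssign: the name `str_assign` is hypothetical, and the returned pointer merely stands in for the adjusted return address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert the zero-terminated text at src into a byte-prefixed string at dst.
   Returns a pointer just past the zero byte, the position at which the
   assembly version would resume execution after the call. */
const char *str_assign(unsigned char *dst, const char *src)
{
    size_t len = 0;
    while (src[len] != '\0')           /* scan for the terminator (the scas step) */
        len++;
    assert(len <= 255);                /* must fit in a single length byte */

    dst[0] = (unsigned char)len;       /* store the length byte (the stosb step) */
    memcpy(dst + 1, src, len);         /* copy the characters (the rep movsb step) */
    return src + len + 1;              /* skip the characters and the zero byte */
}
```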
Of course, this string assignment procedure isn't very efficient, but it's very easy to use. Setting up es:di is all that you need to do to use this procedure. If you need fast string assignment, simply use the movs instruction as follows:
    ; Presumably, DS and ES have already been set up.

                    lea     si, SourceString
                    lea     di, DestString
                    mov     cx, LengthSource
                    cld                     ;Copy in the forward direction.
            rep     movsb
                     .
                     .
                     .
    SourceString    byte    LengthSource-1
                    byte    "This is an example of how the "
                    byte    "StrAssign routine is used"
    LengthSource    =       $-SourceString
    DestString      byte    256 dup (?)
Using in-line instructions requires considerably more setup (and typing!), but it is much faster than the StrAssign procedure. If you don't like the typing, you can always write a macro to do the string assignment for you.
15.2.3 String Comparison
Comparing two character strings was already beaten to death in the section on the cmps instruction. Other than providing some concrete examples, there is no reason to consider this subject any further.
Note: all the following examples assume that es and ds are pointing at the proper segments containing the destination and source strings.
Comparing Str1 to Str2:
    ; Point SI and DI at the first character of each string,
    ; skipping over the length bytes.

                    lea     si, Str1+1
                    lea     di, Str2+1

    ; Get the minimum length of the two strings.

                    mov     al, Str1
                    mov     cl, al
                    cmp     al, Str2
                    jb      CmpStrs
                    mov     cl, Str2

    ; Compare the two strings.

    CmpStrs:        mov     ch, 0
                    cld
            repe    cmpsb
                    jne     StrsNotEqual

    ; If CMPS thinks they're equal, compare their lengths
    ; just to be sure.

                    cmp     al, Str2
    StrsNotEqual:
At label StrsNotEqual, the flags will contain all the pertinent information about the ranking of these two strings. You can use the conditional jump instructions to test the result of this comparison.
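The same strategy (compare the characters over the shorter length, then let the lengths break a tie) can be sketched in C. The name `pstr_compare` is hypothetical, and the sketch assumes characters rank as unsigned bytes, which is how `memcmp` orders them.

```c
#include <stddef.h>
#include <string.h>

/* Compare two byte-prefixed strings.
   Returns a negative value if a < b, zero if they are equal,
   and a positive value if a > b. */
int pstr_compare(const unsigned char *a, const unsigned char *b)
{
    /* Only the characters both strings actually have can be compared. */
    size_t min_len = a[0] < b[0] ? a[0] : b[0];

    /* Compare the character data, starting just past the length bytes. */
    int diff = memcmp(a + 1, b + 1, min_len);
    if (diff != 0)
        return diff < 0 ? -1 : 1;

    /* The common prefix matches, so the shorter string ranks first. */
    if (a[0] == b[0])
        return 0;
    return a[0] < b[0] ? -1 : 1;
}
```

The comparison has to start at the first character rather than at the length byte. Otherwise a one-character string such as "B" would rank below the two-character string "AA" simply because its length is smaller.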
[5] Reclaiming unused storage.
[6] At least those versions of Pascal which support strings.
28 SEP 1996