|
Table of Content | Chapter Sixteen (Part 9) |
| CHAPTER SIXTEEN: PATTERN MATCHING (Part 8) |
| 16.8 -
Some Sample Pattern Matching Applications 16.8.1 - Converting Written Numbers to Integers |
| 16.8 Some Sample Pattern Matching Applications |
The best way to learn how to convert a pattern matching problem to the respective pattern matching algorithms is by example. The following sections provide several examples of some small pattern matching problems and their solutions.
16.8.1 Converting Written Numbers to Integers
One interesting pattern matching problem is to convert written (English) numbers to their integer equivalents. For example take the string "one hundred ninety-two" and convert it to the integer 192. Although written numbers represent a pattern quite a bit more complex than the ones we've seen thus far a little study will show that it is easy to decompose such strings.
The first thing we will need to do is enumerate the English words we will need to process written numbers. This includes the following words:
zero one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty thirty forty fifty sixty seventy eighty ninety hundred and thousand.
With this set of words we can build all the values between zero and 65 535 (the values we can represent in a 16 bit integer.
Next we've got to decide how to put these words together to form all the values between zero and 65 535. The first thing to note is that zero only occurs by itself it is never part of another number. So our first production takes the form:
Numberzero | NonZero
The next thing to note is that certain values may occur in pairs denoting addition. For example eighty-five denotes the sum of eighty plus five. Also note that certain other pairs denote multiplication. If you have a statement like "two hundred" or "fifteen hundred" the "hundred" word says multiply the preceding value by 100. The multiplicative words "hundred" and "thousand" are also additive. Any value following these terms is added in to the total; e.g. "one hundred five" means 1*100+5. By combining the appropriate rules we obtain the following grammar
| NonZero | Thousands Maybe100s | Hundreds | |
| Thousands | Under100 thousand | |
| Maybe100s | Hundreds | |
|
| Hundreds | Under100 hundred After100 | Under100 | |
| After100 | Under100 | |
|
| Under100 | Tens Maybe1s| Teens | ones | |
| Maybe1s | Ones | |
|
| ones | one | two | three | four | five | six | seven | eight | nine | |
| teens | ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen | |
| tens | twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety |
The final step is to add semantic actions to actually convert the strings matched by this grammar to integer values. The basic idea is to initialize an accumulator value to zero. Whenever you encounter one of the strings that ones teens or tens matches you add the corresponding value to the accumulator. If you encounter the hundred or thousand strings you multiply the accumulator by the appropriate factor. The complete program to do the conversion follows:
; Numbers.asm
;
; This program converts written English numbers in the range "zero"
; to "sixty five thousand five hundred thirty five" to the corresponding
; integer value.
.xlist
include stdlib.a
includelib stdlib.lib
matchfuncs
.list
dseg segment para public 'data'
Value word 0 ;Store results here.
HundredsVal word 0
ThousandsVal word 0
Str0 byte "twenty one"
0
Str1 byte "nineteen hundred thirty-five"
0
Str2 byte "thirty three thousand two hundred nineteen"
0
Str3 byte "three"
0
Str4 byte "fourteen"
0
Str5 byte "fifty two"
0
Str6 byte "seven hundred"
0
Str7 byte "two thousand seven"
0
Str8 byte "four thousand ninety six"
0
Str9 byte "five hundred twelve"
0
Str10 byte "twenty three thousand two hundred ninety-five"
0
Str11 byte "seventy-five hundred"
0
Str12 byte "sixty-five thousand"
0
Str13 byte "one thousand"
0
; The following grammar is what we use to process the numbers.
; Semantic actions appear in the braces.
;
; Note: begin by initializing Value
HundredsVal
and ThousandsVal to zero.
;
; N -> separators zero
; | N4
;
; N4 -> do1000s maybe100s
; | do100s
;
; Maybe100s -> do100s
; | <empty string>
;
; do1000s -> Under100 "THOUSAND" separators
; {ThousandsVal := Value*1000}
;
; do100s -> Under100 "HUNDRED"
; {HundredsVal := Value*100} After100
; | Under100
;
; After100 -> {Value := 0} Under100
; | {Value := 0} <empty string>
;
; Under100 -> {Value := 0} try20 try1s
; | {Value := 0} doTeens
; | {Value := 0} do1s
;
; try1s -> do1s | <empty string>
;
; try20 -> "TWENTY" {Value := Value + 20}
; | "THIRTY" {Value := Value + 30}
; | ...
; | "NINETY" {Value := Value + 90}
;
; doTeens -> "TEN" {Value := Value + 10}
; | "ELEVEN" {Value := Value + 11}
; | ...
; | "NINETEEN" {Value := Value + 19}
;
; do1s -> "ONE" {Value := Value + 1}
; | "TWO" {Value := Value + 2}
; | ...
; | "NINE" {Value := Value + 9}
separators pattern {anycset
delimiters
0
delim2}
delim2 pattern {spancset
delimiters}
doSuccess pattern {succeed}
AtLast pattern {sl_match2
separators
AtEOS
AtEOS}
AtEOS pattern {EOS}
N pattern {sl_match2
separators
N2
N2}
N2 pattern {matchistr
zero
N3
AtLast}
zero byte "ZERO"
0
N3 pattern {sl_match2
N4
0
AtLast}
N4 pattern {sl_match2
do1000s
do100s
Maybe100s}
Maybe100s pattern {sl_match2
do100s
AtLast
AtLast}
do1000s pattern {sl_match2
Under100
0
do1000s2}
do1000s2 pattern {matchistr
str1000
0
do1000s3}
do1000s3 pattern {sl_match2
separators
do1000s4
do1000s5}
do1000s4 pattern {EOS
0
0
do1000s5}
do1000s5 pattern {Get1000s}
str1000 byte "THOUSAND"
0
do100s pattern {sl_match2
do100s1
Under100
After100}
do100s1 pattern {sl_match2
Under100
0
do100s2}
do100s2 pattern {matchistr
str100
0
do100s3}
do100s3 pattern {sl_match2
separators
do100s4
do100s5}
do100s4 pattern {EOS
0
0
do100s5}
do100s5 pattern {Get100s}
str100 byte "HUNDRED"
0
After100 pattern {SetVal
0
0
After100a}
After100a pattern {sl_match2
Under100
doSuccess}
Under100 pattern {SetVal
0
0
Under100a}
Under100a pattern {sl_match2
try20
Under100b
Do1orE}
Under100b pattern {sl_match2
doTeens
do1s}
Do1orE pattern {sl_match2
do1s
doSuccess
0}
NumPat macro lbl
next
Constant
string
local try
SkipSpcs
val
str
tryEOS
lbl pattern {sl_match2
try
next}
try pattern {matchistr
str
0
SkipSpcs}
SkipSpcs pattern {sl_match2
separators
tryEOS
val}
tryEOS pattern {EOS
0
0
val}
val pattern {AddVal
Constant}
str byte string
byte 0
endm
NumPat doTeens
try11
10
"TEN"
NumPat try11
try12
11
"ELEVEN"
NumPat try12
try13
12
"TWELVE"
NumPat try13
try14
13
"THIRTEEN"
NumPat try14
try15
14
"FOURTEEN"
NumPat try15
try16
15
"FIFTEEN"
NumPat try16
try17
16
"SIXTEEN"
NumPat try17
try18
17
"SEVENTEEN"
NumPat try18
try19
18
"EIGHTEEN"
NumPat try19
0
19
"NINETEEN"
NumPat do1s
try2
1
"ONE"
NumPat try2
try3
2
"TWO"
NumPat try3
try4
3
"THREE"
NumPat try4
try5
4
"FOUR"
NumPat try5
try6
5
"FIVE"
NumPat try6
try7
6
"SIX"
NumPat try7
try8
7
"SEVEN"
NumPat try8
try9
8
"EIGHT"
NumPat try9
0
9
"NINE"
NumPat try20
try30
20
"TWENTY"
NumPat try30
try40
30
"THIRTY"
NumPat try40
try50
40
"FORTY"
NumPat try50
try60
50
"FIFTY"
NumPat try60
try70
60
"SIXTY"
NumPat try70
try80
70
"SEVENTY"
NumPat try80
try90
80
"EIGHTY"
NumPat try90
0
90
"NINETY"
include stdsets.a
dseg ends
cseg segment para public 'code'
assume cs:cseg
ds:dseg
; Semantic actions for our grammar:
;
;
;
; Get1000s- We've just processed the value one..nine
grab it from
; the value variable
multiply it by 1000
and store it
; into thousandsval.
Get1000s proc far
push ds
push dx
mov ax
dseg
mov ds
ax
mov ax
1000
mul Value
mov ThousandsVal
ax
mov Value
0
pop dx
mov ax
di ;Required by sl_match.
pop ds
stc ;Always return success.
ret
Get1000s endp
; Get100s- We've just processed the value one..nine
grab it from
; the value variable
multiply it by 100
and store it
; into hundredsval.
Get100s proc far
push ds
push dx
mov ax
dseg
mov ds
ax
mov ax
100
mul Value
mov HundredsVal
ax
mov Value
0
pop dx
mov ax
di ;Required by sl_match.
pop ds
stc ;Always return success.
ret
Get100s endp
; SetVal- This routine sets Value to whatever is in si
SetVal proc far
push ds
mov ax
dseg
mov ds
ax
mov Value
si
mov ax
di
pop ds
stc
ret
SetVal endp
; AddVal- This routine sets adds whatever is in si to Value
AddVal proc far
push ds
mov ax
dseg
mov ds
ax
add Value
si
mov ax
di
pop ds
stc
ret
AddVal endp
; Succeed matches the empty string. In other words
it matches anything
; and always returns success without eating any characters from the input
; string.
Succeed proc far
mov ax
di
stc
ret
Succeed endp
; This subroutine expects a pointer to a string containing the English
; version of an integer number. It converts this to an integer and
; prints the result.
ConvertNumber proc near
mov value
0
mov HundredsVal
0
mov ThousandsVal
0
ldxi N
xor cx
cx
match
jnc NoMatch
mov al
"'"
putc
puts
print
byte "' = "
0
mov ax
ThousandsVal
add ax
HundredsVal
add ax
Value
putu
putcr
jmp Done
NoMatch: print
byte "Illegal number"
cr
lf
0
Done: ret
ConvertNumber endp
Main proc
mov ax
dseg
mov ds
ax
mov es
ax
meminit ;Init memory manager.
; Union in a "-" to the delimiters set because numbers can have
; dashes in them.
lesi delimiters
mov al
'-'
addchar
; Some calls to test the ConvertNumber routine and the conversion process.
lesi Str0
call ConvertNumber
lesi Str1
call ConvertNumber
lesi Str2
call ConvertNumber
lesi Str3
call ConvertNumber
lesi Str4
call ConvertNumber
lesi Str5
call ConvertNumber
lesi Str6
call ConvertNumber
lesi Str7
call ConvertNumber
lesi Str8
call ConvertNumber
lesi Str9
call ConvertNumber
lesi Str10
call ConvertNumber
lesi Str11
call ConvertNumber
lesi Str12
call ConvertNumber
lesi Str13
call ConvertNumber
Quit: ExitPgm
Main endp
cseg ends
sseg segment para stack 'stack'
stk db 1024 dup ("stack ")
sseg ends
zzzzzzseg segment para public 'zzzzzz'
LastBytes db 16 dup (?)
zzzzzzseg ends
end Main
Sample output:
'twenty one' = 21 'nineteen hundred thirty-five' = 1935 'thirty three thousand two hundred nineteen' = 33219 'three' = 3 'fourteen' = 14 'fifty two' = 52 'seven hundred' = 700 'two thousand seven' = 2007 'four thousand ninety six' = 4096 'five hundred twelve' = 512 'twenty three thousand two hundred ninety-five' = 23295 'seventy-five hundred' = 7500 'sixty-five thousand' = 65000 'one thousand' = 1000
|
Table of Content | Chapter Sixteen (Part 9) |
Chapter Sixteen: Pattern Matching
(Part 8)
29 SEP 1996