Objects First


Readable code

The importance of readable, intelligible code has been emphasised many times already. In this section, let's look at two apparently unrelated topics, enumerated data types and the switch statement, and see how they can be used to produce more readable code.

Assigning codes or The Magic Number Problem

Computer programs abound with codes which classify objects, actions to be performed, sources of information, destinations for information, etc. Let's extend our Rectangle class to include coloured rectangles, eg ones which can be drawn on a display screen. The key problem is: How do we represent colour in the computer? The standard solution is to assign a numeric code to each colour:
Colour    Code
black     0
blue      1
red       2
yellow    3
.....
Now we can add SetColour and Colour methods to our class to change and extract the colour of an object. If we want to process objects of different colours in different ways, then we can write:
int colour;

colour = Colour( r );
if ( colour == 0 ) {
  /* Process a black rectangle */
  ...
  }
else if ( colour == 1 ) {
  /* Process a blue rectangle */
  ...
  }
All very straightforward, until we receive the latest direction from our colour consultant, "at least 50% of rectangles should be puce in colour", and we have to change the program (after we've found out what is actually meant by puce!). Or we discover that tangerine rectangles are being processed with the rules for orange ones. Needless to say, this leads to enormous potential for errors. For example, someone not familiar with whatever logic there might have been in the original encoding adds new codes and forgets to change all the statements that dealt correctly with the old codes.

A much better solution is to have a class of colours, each represented by a symbolic name, eg black, blue, red, ..etc, and to be able to use those names in our program:

Colour colour;

colour = Colour( r );
if ( colour == black ) {
  /* Process a black rectangle */
  ...
  }
else if ( colour == blue ) {
  /* Process a blue rectangle */
  ...
  }
Not only is this much easier to read, it presents far fewer maintenance headaches. Adding a new colour involves simply deciding on a new, meaningful name, eg puce can be called puce (how original!), and then adding some statements to our program. No existing program statements should need to change, thus we can be quite confident that, after we have added the statements to handle the new colour, processing of all the original colours will still be correct.

Enumerated types

C provides an enumerated type which can be used to define variables whose values are not numbers or characters, but names, such as red, black, etc. An enumerated type is defined as a set of labels which are the values which an object of that type can have. Thus, we might define a Colour class as:
typedef enum { black, blue, red, yellow } Colour;
We've seen typedef before - you can view it as a declaration that defines a new type (or class in our OO terminology). The name of the new type (or class) is Colour. The middle part is the original C way of declaring an enumerated type - the keyword enum followed by a list of symbols or labels in braces. The symbols can be any legal C identifier and thus follow the same rules as the names of objects, functions, etc. The list of symbols can be as long as you like - although some C implementations impose limitations for efficiency reasons.

What's happening here?
In order to produce a program that runs efficiently, the C compiler will assign codes to the various symbols that we have placed in the list of possible values of objects of this class. In the Colour example, the compiler starts at zero and assigns codes 0, 1, 2, ... etc to the symbols in the order in which it encounters them; the C standard requires this numbering when we give no explicit values. This of course results in exactly the same table as we had originally.

So what have we gained?

Easier maintenance and readability!
  • Our code is free of magic numbers - numeric codes which have no intrinsic meaning and are just there to confuse a reader of our program.
  • Since the compiler assigns the codes, we can simply add new symbols as we please: the compiler will take care of reassigning new values to all the codes.

Operations on enumerated types

Using typedef enum { ... }, we can declare a class of objects which have a defined set of symbolic values. (I say symbolic values here, because, as far as the logic of the program is concerned, any actual values which might be associated with the various symbolic values are only there for the computer's convenience. Computers which can think in terms of abstract values only exist, to date at least, in the Gamma quadrant!)

They also have a limited set of sensible operations: basically, ==, != and assignment.

Unfortunately, being a language designed for hackers, C permits all the operations defined for integers on enum types. Thus:

#include <stdio.h>

int main( void ) {
  typedef enum { black, blue, red, yellow, green, purple } Colour;

  Colour a, b, c;

  a = black; b = blue; c = red;

  /* Ordering comparisons on enum values: legal, but best avoided */
  if ( a > c ) printf("a>c\n"); else printf("a<=c\n");
  if ( b >= c ) printf("b>=c\n"); else printf("b<c\n");
  printf("a = %d, b = %d, c = %d\n", a, b, c );
  return 0;
  }
is perfectly legal C and executes without any problems. My Unix system prints:
a<=c
b<c
a = 0, b = 1, c = 2
for the printf statements in this example.

In fact, every conforming compiler will print the same values: when no explicit values are given, the C standard requires the first symbol to have the value 0 and each subsequent symbol to be one more than its predecessor. The codes do depend on the order of the list, however, so inserting a new colour in the middle, or re-ordering the list, silently changes the outcome of comparisons such as a > c.

The operations ==, != and assignment, by contrast, depend neither on the order of the list nor on the codes behind the symbols. Thus it is best to regard an enumerated type as possessing only these three operations or methods. (Some more strongly typed languages, such as Pascal, Modula and Ada, will not let enumerated values be mixed with integers. C is often referred to as weakly typed because of the relaxations in operation usage rules that it permits.)

If you are using an enumerated type in a single program, then the separation of the logical operation of the program from issues of representation (what code is used for "blue"?, etc) presents no problems. Although there is one caveat:

If you alter the definition of an enumerated type, eg to add an extra value, then you must re-compile all the program modules that use that type. In a well-constructed C program, the definition of the class (the typedef enum { ... } ClassName; statement) will be in a header file (.h extension) which is included in all modules which use objects of the class. If a change is made to the definition, then all the modules which import the definition must be re-compiled.

If an identical definition of an enumerated type is imported into two different programs, the C standard fixes the values of the symbols, so they will agree. What is not guaranteed is the representation: the size of the integer type the compiler chooses to hold an enumerated object, and the byte order of the machine, can differ between compilers on different machines (or even different compilers on the same machine!). The values themselves will also differ if the two programs were compiled against different versions of the definition, eg after a colour was inserted in the middle of the list. Thus a problem arises if program A writes objects of an enumerated type into a file (or onto a communications channel) and they are read by program B (on the same or a different machine).

To pin the codes down and to permit wider use of enumerated types (with the consequent gain to legibility and maintainability of programs), C allows us to specify the actual values to be used when defining the class:

typedef enum { black=1, blue=2, red=3, .. } Colour;
Now the codes for objects of the class Colour are fixed by the definition itself, so they can be written into files and read by other programs that share the definition. We have lost the convenience of allowing the compiler to assign codes to the values our objects can take. Thus we may have to make some major re-adjustment if we want to ensure that, when we add puce to the class, the value assigned to it reflects that it lies somewhere in the pale yellow, brown, pink, .. region. Of course, if this is not a consideration, then life is simple: we just add puce = x+1 to the end of the list, where x is the value allocated to the last item currently in the list. However, even a major re-numbering exercise (occasioned by wanting to locate puce next to sickly_yellow) only needs to be done in one statement. All other code using the symbolic values is then simply re-compiled. The only code adjustment is the addition of the special cases for puce to the program.

Thus it is possible to retain the major advantage of enumerated types - legible, understandable code - with only a small increase in the maintenance effort, and still allow objects of enumerated types to be passed between programs via files and communications channels.

Enumerated types are particularly useful in multiple-choice branching statements or switch statements.

An alternative strategy

Some C programmers will use #define directives to establish symbolic constants to make their programs readable rather than use enumerated types. For example, we could write:
/* Colour.h - Class of colour definitions */
#define BLACK    0
#define BLUE     1
#define RED      2
#define YELLOW   3

#include "Colour.h"

int colour;

if ( colour == BLACK ) {
  ...
  }
else if ( colour == BLUE ) {
  ...
  }
else if ( colour == RED ) {
  ...
  }
..
Although I have a slight preference for the enumerated type, there are times when the #define approach makes sense also. These are when the codes which you must use are part of the problem specification and thus have some meaning in themselves. For example, suppose that each colour had to be represented by its number in a fashion designer's catalogue - because that number effectively defined the colour. Then, although I want to program symbolically (and forget the codes), there are points in the program where the actual value of the code has real meaning, for example, when the program tells you to look at the actual colour - giving its code number now, because that's what is printed beside the reference example on page x of the catalogue.

The decision to use an enumerated type rather than a defined constant is easier in strongly typed languages such as Pascal or Ada, because the compiler prevents you from mixing types (it will reject an attempt to compare an object of an enumerated type to an integer, for example). Thus the compiler will give you some help to avoid silly mistakes. For example, in Ada,

TYPE colour IS (black, blue, red, yellow);
....
c: colour;
...
IF c = 23 THEN ...
END IF;
will generate a compiler error, because c = 23 is an invalid expression: you can't compare objects of different types.

Unfortunately, C treats enums as integers in expressions and can't give this assistance!

Key terms

Magic Number
A numeric or character code assigned to an action, object, class of objects, etc which has no significance in itself. The assignment of codes is completely arbitrary and the codes have no relationship with the things they represent, hence the "magic" soubriquet.
enumerated type
A type or class in which variables have values which are designated by symbols such as "red", "blue", "save_command", "save_as_command", etc, rather than numeric values.
strongly typed languages
Languages in which rules about the use of types in expressions and assignments are strongly enforced. These languages don't automatically promote values and variables to other types in expressions. Pascal, Modula and Ada are strongly typed languages. The original C was definitely weakly typed: ANSI standard C is only slightly more strongly typed. Good ANSI C compilers will give many warnings about potentially dangerous code. A software engineer will examine each such warning and make changes that ensure that it doesn't appear again.

Continue on to switch statements.
© John Morris, 1998