Chapter 3. Unscrambling Declarations in C

Syntax Only a Compiler Could Love

As Kernighan and Ritchie acknowledge, "C is sometimes castigated for the syntax of its declarations" (K&R, 2nd Ed., p. 122). C's declaration syntax is trivial for a compiler (or compiler-writer) to process, but hard for the average programmer. It's no exaggeration to say that C is significantly and needlessly complicated because of the awkward manner of combining types.

How a Declaration Is Formed

Let's first take a look at some C terminology, and the individual pieces that can make up a declaration. An important building block is a declarator, the heart of any declaration; roughly, a declarator is the identifier and any pointers, function brackets, or array indications that go along with it, as shown in Figure 3-1. A declaration is made up of the parts shown in Figure 3-2.

There are restrictions on legal declarations. You can't have any of these:

• a function can't return a function, so you'll never see foo()()
• a function can't return an array, so you'll never see foo()[]
• an array can't hold a function, so you'll never see foo[]()

You can have any of these:

• a function returning a pointer to a function is allowed: int (*fun())();
• a function returning a pointer to an array is allowed: int (*foo())[];
• an array holding pointers to functions is allowed: int (*foo[])();
• an array can hold other arrays, so you'll frequently see int foo[][]

A Word About structs

A struct has the general form:

    struct optional_tag {
        type_1 identifier_1;
        type_2 identifier_2;
        ...
        type_N identifier_N;
    } optional_variable_definitions;

If you put an array inside a struct, like this:

    /* array inside a struct */
    struct s_tag { int a[100]; };

you can now treat the array as a first-class type. You can copy the entire array with an assignment statement, pass it to a function by value, and make it the return type of a function.

A Word About unions
A union has the general form:

    union optional_tag {
        type_1 identifier_1;
        type_2 identifier_2;
        ...
        type_N identifier_N;
    } optional_variable_definitions;

Unions are typically used to save space, by not storing all possibilities for certain data items that cannot occur together. Unions can also be used, not for one interpretation of two different pieces of data, but to get two different interpretations of the same data. An example is:

    union bits32_tag {
        int whole;                          /* one 32-bit value */
        struct {char c0,c1,c2,c3;} byte;    /* four 8-bit bytes */
    } value;

This union allows a programmer to extract the full 32-bit value, or the individual byte fields value.byte.c0, and so on. There are other ways to accomplish this, but the union does it without the need for extra assignments or type casting.

A Word About enums

The general form of an enum:

    enum optional_tag {stuff... } optional_variable_definitions;

The stuff… in this case is a list of identifiers, possibly with integer values assigned to them. There is one advantage to enums: unlike #defined names, which are typically discarded during compilation, enum names usually persist through to the debugger, and can be used while debugging your code.

The Precedence Rule
We have now reviewed the building blocks of declarations. This section describes one method for breaking them down into an English explanation.

The Precedence Rule for Understanding C Declarations

A    Declarations are read by starting with the name and then reading in precedence order.
B    The precedence, from high to low, is:
B.1  parentheses grouping together parts of a declaration
B.2  the postfix operators: parentheses () indicating a function, and square brackets [] indicating an array
B.3  the prefix operator: the asterisk denoting "pointer to"
C    If a const and/or volatile keyword is next to a type specifier (e.g. int, long, etc.) it applies to the type specifier. Otherwise the const and/or volatile keyword applies to the pointer asterisk on its immediate left.

An example of solving a declaration using the Precedence Rule:
    char* const *(*next)();

Solving a Declaration Using the Precedence Rule

A    First, go to the variable name, "next", and note that it is directly enclosed by parentheses.
B.1  So we group it with what else is in the parentheses, to get "next is a pointer to...".
B    Then we go outside the parentheses, and have a choice of a prefix asterisk, or a postfix pair of parentheses.
B.2  Rule B.2 tells us the highest precedence thing is the function parentheses at the right, so we have "next is a pointer to a function returning…"
B.3  Then process the prefix "*" to get "pointer to".
C    Finally, take the "char * const", as a constant pointer to a character.

Then put it all together to read:

"next is a pointer to a function returning a pointer to a const pointer-to-char"

Unscrambling C Declarations by Diagram
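The diagram method in this section is worked on the same declaration, char* const *(*next)();, that the Precedence Rule decoded above. As a sanity check on that English reading, the sketch below writes a function whose type matches the reading ("function returning a pointer to a const pointer-to-char") and lets the compiler confirm that next can point at it. The names word, word_ptr, get_word and first_char_via_next are invented for this illustration and do not come from the text.

```c
#include <assert.h>

static char word[] = "hello";
static char *const word_ptr = word;    /* a const pointer-to-char */

/* A function returning a pointer to a const pointer-to-char. */
static char *const *get_word(void)
{
    return &word_ptr;
}

/* The declaration decoded in the text. */
char *const *(*next)();

/* Point next at get_word, call through it, and follow both pointers
 * to reach the first character of word. */
char first_char_via_next(void)
{
    next = get_word;                   /* accepted: the types are compatible */
    assert(next() == &word_ptr);       /* the call yields a pointer to word_ptr */
    return **next();
}
```

If the reading were wrong, for example if next were a function rather than a pointer to one, the assignment to next would be rejected at compile time.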
In this section we present a diagram with numbered steps (see Figure 3-3).

Magic Decoder Ring for C Declarations

Start at the first identifier you find when reading from the left. When we match a token in our declaration against the diagram, we erase it from further consideration. At each point we look first at the token to the right, then to the left. When everything has been erased, the job is done.

Example:

    char* const *(*next)();

Steps in Unscrambling a C Declaration

Declaration Remaining             Next Step    Result
(start at leftmost identifier)    to Apply
char * const *(*next)();          step 1       say "next is a…"
char * const *(*    )();          step 2, 3    doesn't match, go to next step, say "next is a…"
char * const *(*    )();          step 4       doesn't match, go to next step
char * const *(*    )();          step 5       asterisk matches, say "pointer to …", go to step 4
char * const *(     )();          step 4       "(" matches up to ")", go to step 2
char * const *       ();          step 2       doesn't match, go to next step
char * const *       ();          step 3       say "function returning…"
char * const *         ;          step 4       doesn't match, go to next step
char * const *         ;          step 5       say "pointer to…"
char * const           ;          step 5       say "read-only…"
char *                 ;          step 5       say "pointer to…"
char                   ;          step 6       say "char"

Then put it all together to read:

"next is a pointer to a function returning a pointer to a read-only pointer-to-char"

typedef Can Be Your Friend
Typedefs are a funny kind of declaration: they introduce a new name for a type rather than reserving space for a variable. In some ways, a typedef is similar to macro text replacement: it doesn't introduce a new type, just a new name for a type. There is a key difference, however, which is explained later.

Where an ordinary declaration says "this name refers to a variable of the stated type," a declaration with the typedef keyword creates no variable and instead says "this name is a synonym for the stated type."

Typically, this is used for tricky cases involving pointers to stuff. The classic example is the declaration of the signal() prototype. The ANSI Standard shows that signal is declared as:

    void (*signal(int sig, void (*func)(int)))(int);

Here's how it can be simplified by a typedef that "factors out" the common part.

    typedef void (*ptr_to_func)(int);
    /* this says that ptr_to_func is a pointer to a function
     * that takes an int argument, and returns void
     */

    ptr_to_func signal(int, ptr_to_func);
    /* this says that signal is a function that takes
     * two arguments, an int and a ptr_to_func, and
     * returns a ptr_to_func
     */

Typedef is not without its drawbacks, however. It has the same confusing syntax as other declarations, and the same ability to cram several declarators into one declaration. It provides essentially nothing for structs, except the unhelpful ability to omit the struct keyword. And in any typedef, you don't even have to put the typedef at the start of the declaration!

Typedef creates aliases for data types rather than new data types. You can typedef any type.

    typedef int (*array_ptr)[100];

Tips for Working with Declarators
Don't put several declarators together in one typedef, like this:

    typedef int *ptr, (*fun)(), arr[5];
    /* ptr is the type "pointer to int"
     * fun is the type "pointer to a function returning int"
     * arr is the type "array of 5 ints"
     */

And never, ever, bury the typedef in the middle of a declaration, like this:

    unsigned const long typedef int volatile *kumquat;

Difference Between typedef int x[10] and #define x int[10]

As mentioned above, there is a key difference between a typedef and macro text replacement. The right way to think about this is to view a typedef as being a complete "encapsulated" type: you can't add to it after you have declared it. The difference between this and macros shows up in two ways.

First, you can extend a macro typename with other type specifiers, but not a typedef'd typename. That is,

    #define peach int
    unsigned peach i;     /* works fine */

    typedef int banana;
    unsigned banana i;    /* Bzzzt! illegal */

Second, a typedef'd name provides the type for every declarator in a declaration.

    #define int_ptr int *
    int_ptr chalk, cheese;

After macro expansion, the second line effectively becomes:

    int * chalk, cheese;

This makes chalk and cheese as different as chutney and chives: chalk is a pointer-to-an-integer, while cheese is an integer. In contrast, a typedef like this:

    typedef char * char_ptr;
    char_ptr Bentley, Rolls_Royce;

declares both Bentley and Rolls_Royce to be the same. The name on the front is different, but they are both a pointer to a char.

What typedef struct foo { ... foo; } foo; Means
There are multiple namespaces in C:
• label names
• tags (one namespace for all structs, enums and unions)
• member names (each struct or union has its own namespace)
• everything else

Everything within a namespace must be unique, but an identical name can be applied to things in different namespaces. Because it is legal to use the same name in different namespaces, you sometimes see code like this:
    struct foo {int foo;} foo;

This is absolutely guaranteed to confuse and dismay future programmers who have to maintain your code. And what would sizeof( foo ); refer to?

Things get even scarier. Declarations like these are quite legal:

    typedef struct baz {int baz;} baz;
    struct baz variable_1;
    baz variable_2;

That's too many "baz"s! Let's try that again, with more enlightening names, to see what's going on:

    typedef struct my_tag {int i;} my_type;
    struct my_tag variable_1;
    my_type variable_2;

The typedef introduces the name my_type as a shorthand for "struct my_tag {int i;}", but it also introduces the structure tag my_tag that can equally be used with the keyword struct.

Although these two statements look alike:

    typedef struct fruit {int weight, price_per_lb; } fruit;    /* statement 1 */
    struct veg {int weight, price_per_lb; } veg;                /* statement 2 */

very different things are happening. Statement 1 declares a structure tag "fruit" and a structure typedef "fruit" which can be used like this:

    struct fruit mandarin;    /* uses structure tag "fruit" */
    fruit tangerine;          /* uses structure type "fruit" */

Statement 2 declares a structure tag "veg" and a variable veg. Only the structure tag can be used in further declarations, like this:

    struct veg potato;

It would be an error to attempt a declaration of veg cabbage. That would be like writing:

    int i;
    i j;

Tips for Working with Typedefs
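The tips in this section turn on the difference between a structure tag and a typedef name, so a compact recap may help before reading them. The sketch below reuses the fruit and veg statements from the text (with the member list terminated by a semicolon so that it compiles) and also settles the earlier sizeof(foo) question: in an expression, a plain foo names the variable, because a tag is only looked up after the keyword struct. The function names basket_weight and size_of_foo_variable, and the sample values, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct fruit { int weight, price_per_lb; } fruit;  /* statement 1: tag + typedef name */
struct veg { int weight, price_per_lb; } veg;              /* statement 2: tag + variable */

struct foo { int foo; } foo;    /* tag, member and variable all called foo */

int basket_weight(void)
{
    struct fruit mandarin = { 3, 2 };   /* uses structure tag "fruit" */
    fruit tangerine = { 4, 2 };         /* uses typedef name "fruit" */
    struct veg potato = { 5, 1 };       /* uses structure tag "veg" */
    /* veg cabbage;    would not compile: veg is a variable, not a type */

    veg.weight = 6;                     /* veg itself is an ordinary variable */
    return mandarin.weight + tangerine.weight + potato.weight + veg.weight;
}

/* sizeof without parentheses only accepts an expression, so this
 * compiles only because plain foo is the variable, not the tag. */
size_t size_of_foo_variable(void)
{
    assert(sizeof foo == sizeof(struct foo));
    return sizeof foo;
}
```

Uncommenting the veg cabbage line produces a compile error, which is the behavior the text describes for statement 2.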
Don't bother with typedefs for structs. All they do is save you writing the word "struct", which is a clue that you probably shouldn't be hiding anyway.

Use typedefs for:

• types that combine arrays, structs, pointers, or functions.
• portable types. When you need a type that's at least (say) 20 bits, make it a typedef. Then when you port the code to different platforms, select the right type (short, int, or long), making the change in just the typedef, rather than in every declaration.
• casts. A typedef can provide a simple name for a complicated type cast. E.g.

    typedef int (*ptr_to_int_fun)(void);
    int * p;
    ... = (ptr_to_int_fun) p;

Always use a tag in a structure definition, even if it's not needed. It will be later.

Write a Program to Translate C Declarations into English

Here's the design. The main data structure is a stack, on which we store tokens that we have read, while we are reading forward to the identifier. Then we can look at the next token to the right by reading it, and the next token to the left by popping it off the stack. The data structure looks like:

    struct token {
        char type;
        char string[MAXTOKENLEN];
    };

    /* holds tokens we read before reaching first identifier */
    struct token stack[MAXTOKENS];

    /* holds the token just read */
    struct token this;

The pseudo-code is:

    utility routines----------
    classify_string
        look at the current token and
        return a value of "type", "qualifier" or "identifier" in this.type
    gettoken
        read the next token into this.string
        if it is alphanumeric, classify_string
        else it must be a single character token
            this.type = the token itself; terminate this.string with a nul.
    read_to_first_identifier
        gettoken and push it onto the stack until the first identifier is read.
        Print "identifier is", this.string
        gettoken

    parsing routines----------
    deal_with_function_args
        read past closing ')'
        print out "function returning"
    deal_with_arrays
        while you've got "[size]" print it out and read past it
    deal_with_any_pointers
        while you've got "*" on the stack print "pointer to" and pop it
    deal_with_declarator
        if this.type is '[' deal_with_arrays
        if this.type is '(' deal_with_function_args
        deal_with_any_pointers
        while there's stuff on the stack
            if it's a '('
                pop it and gettoken; it should be the closing ')'
                deal_with_declarator
            else pop it and print it

    main routine----------
    main
        read_to_first_identifier
        deal_with_declarator

Make String Comparison Look More Natural

One of the problems with the strcmp() routine to compare two strings is that it returns zero if the strings are identical. This leads to convoluted code when the comparison is part of a conditional statement:

    if (!strcmp(s,"volatile")) return QUALIFIER;

A zero result indicates false, so we have to negate it to get what we want. Here's a better way. Set up the definition:

    #define STRCMP(a,R,b) (strcmp(a,b) R 0)

Now you can write a string comparison in the natural style:

    if ( STRCMP(s, ==, "volatile"))

Using this definition, the code says directly which relation is being tested. Try rewriting the cdecl program to use this style of string comparison, and see if you prefer it.
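One possible way to take up that suggestion is sketched below: a compact C rendering of the pseudo-code, with every string comparison written through STRCMP. It is an illustration under stated assumptions, not the author's program. It parses a string instead of reading input, collects the English in a static buffer, and exposes a single entry point, explain. The names explain, say, is_word, english, src and top are invented here, the enum starts at 1 so that '\0' can signal end of input, and const is printed as "read-only" to match the diagram section. The macro arguments are also parenthesized, a small variation on the definition above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAXTOKENS   100
#define MAXTOKENLEN 64
#define STRCMP(a, R, b) (strcmp((a), (b)) R 0)

enum type_tag { IDENTIFIER = 1, QUALIFIER, TYPE };  /* from 1: '\0' means end of input */
struct token { char type; char string[MAXTOKENLEN]; };

static struct token stack[MAXTOKENS];  /* tokens read before the first identifier */
static struct token this;              /* the token just read */
static int top;                        /* index of the top of the stack, -1 when empty */
static const char *src;                /* next unread input character (assumed name) */
static char english[512];              /* the translation built so far (assumed name) */

static void say(const char *s) { strncat(english, s, sizeof english - strlen(english) - 1); }
static int is_word(char c) { return isalnum((unsigned char)c) || c == '_'; }

static char classify_string(void) {
    static const char *const types[] = { "void", "char", "short", "int", "long", "signed",
                                         "unsigned", "float", "double", "struct", "union", "enum" };
    size_t i;
    if (STRCMP(this.string, ==, "const")) { strcpy(this.string, "read-only"); return QUALIFIER; }
    if (STRCMP(this.string, ==, "volatile")) return QUALIFIER;
    for (i = 0; i < sizeof types / sizeof types[0]; i++)
        if (STRCMP(this.string, ==, types[i])) return TYPE;
    return IDENTIFIER;
}

static void gettoken(void) {
    char *p = this.string;
    while (*src == ' ') src++;                     /* skip blanks */
    if (is_word(*src)) {                           /* a word: copy and classify it */
        while (is_word(*src) && p < this.string + MAXTOKENLEN - 1) *p++ = *src++;
        *p = '\0';
        this.type = classify_string();
        return;
    }
    this.type = *src;                              /* single-character token, '\0' at end */
    this.string[0] = *src; this.string[1] = '\0';
    if (*src != '\0') src++;
    if (this.type == '*') strcpy(this.string, "pointer to");
}

static void read_to_first_identifier(void) {
    for (gettoken(); this.type != IDENTIFIER && this.type != '\0'; gettoken()) {
        assert(top < MAXTOKENS - 1);               /* guard against stack overflow */
        stack[++top] = this;
    }
    say(this.string); say(" is ");
    gettoken();
}

static void deal_with_arrays(void) {
    while (this.type == '[') {
        say("array [");
        gettoken();                                /* a size, or ']' */
        if (this.type != ']') { say(this.string); gettoken(); }
        say("] of ");
        gettoken();                                /* read past the ']' */
    }
}

static void deal_with_function_args(void) {
    while (this.type != ')' && this.type != '\0') gettoken();
    gettoken();                                    /* read past the closing ')' */
    say("function returning ");
}

static void deal_with_pointers(void) {
    for (; top >= 0 && stack[top].type == '*'; top--) say("pointer to ");
}

static void deal_with_declarator(void) {
    if (this.type == '[') deal_with_arrays();
    else if (this.type == '(') deal_with_function_args();
    deal_with_pointers();
    while (top >= 0) {
        if (stack[top].type == '(') {
            top--;
            gettoken();                            /* read past the matching ')' */
            deal_with_declarator();
        } else { say(stack[top--].string); say(" "); }
    }
}

/* Translate one declaration; the result lives in a static buffer. */
const char *explain(const char *decl) {
    size_t n;
    src = decl; top = -1; english[0] = '\0';
    read_to_first_identifier();
    deal_with_declarator();
    n = strlen(english);
    if (n > 0 && english[n - 1] == ' ') english[n - 1] = '\0';   /* drop trailing blank */
    return english;
}
```

With this sketch, explain("char * const *(*next)();") yields "next is pointer to function returning pointer to read-only pointer to char". Like the pseudo-code, it skips over parameter lists rather than translating them, and it makes no attempt to handle nested parentheses inside a parameter list, struct, union or enum tags, or malformed input.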