Computing Memory Alignment Restrictions of Data Types in C and C++
Tom Lokovic
http://www.monkeyspeak.com/tom/
July 2003
ABSTRACT
When implementing a dynamic object system in which structures of heterogenous
data are assembled at runtime, one must respect memory alignment restrictions
of the hardware. Since alignment restrictions are different from one platform
to the next, this imposes a portability problem. This paper describes a
portable method for computing alignment restrictions of data types in C and C++.
BACKGROUND
Modern computer hardware is byte-addressed: the address space is broken up
into bytes, and any pointer may reference any individual byte. A value which
consists of one byte may be placed anywhere in memory. However, most hardware architectures
impose restrictions on larger data types. Specifically, a multi-byte
value may be constrained to appear at only certain memory locations.
This restriction is often called memory alignment: the multi-byte value may
only appear at memory locations which are integer multiples of some number.
In this paper, we call that number the stride of the data type. The stride
may vary between different data types on one platform, and between the same
type on different platforms.
Most programs in C and C++ need not worry about alignment restrictions.
This is because compilers know enough about the platform to arrange values so that they all align properly. As long
as a program restricts itself to simple structs and classes, and as long as it does
not attempt to place values at arbitrary memory locations, the programmer
need not worry about alignment issues.
However, some programs cannot be so restrictive. In particular, if
a program needs to assemble structs dynamically--that is, determine at
runtime which types are needed and then arrange them on the fly--then
native structs and classes are insufficient. The program must perform
its own "packing" of structured data, which also means it must take
care to respect alignment.
While C and C++ compilers always know about the alignment restrictions of
the underlying platform, there is no formal mechanism to ask the compiler
about those restrictions. This paper offers a portable way to "trick" the
compiler into revealing what it knows about alignment.
THE TECHNIQUE
C and C++ provide a built-in operator sizeof(), which returns the size of
a data type in bytes. We seek a similar operator, strideof(), which returns
the type's alignment stride in bytes.
Here is such an operator, implemented with a template and a macro:
template struct Tchar { T t; char c; };
#define strideof(T) \ ((sizeof(Tchar) > sizeof(T)) ? \ sizeof(Tchar)-sizeof(T) : sizeof(T))
It may not be immediately obvious why this works, or even what this code
has to do with alignment. Before we get into the details, though, let's
consider the advantages and disadvantages of this technique.
Advantages:
- It works for any type, including native types (eg, short, double), struct types,
and class types.
- Its computation can be performed at compile-time via constant
folding, and thus (like sizeof()) it incurs no runtime cost at all.
Disadvantages:
- The macro uses C++ templates, which are not available in C.
The basic technique (subtracting T's size from the size
of a special struct) works equally well in C, but without templates
there's no way to provide a general macro like strideof().
- While this technique is always guaranteed to give a result that is
safe to use, and in practice does give the right answer, it's not
strictly guaranteed to give T's true stride. More on that
later.
- Unlike sizeof(),which is a built-in operator, the strideof() macro
can only take a type as its argument--not an instance.
HOW IT WORKS:
Consider a type T for which we wish to compute the stride.
The stride constrains the memory locations at which an instance of T may begin.
(Here, each ^ represents a memory location at which T may appear; the spacing between them is the stride we seek.)
^ ^ ^ ^ ^ ^ ^ ^ ^ ^
Now consider a hypothetical instance of T in memory. It has sizeof(T)
bytes, and must begin at some multiple of T's stride. We also know
(from the proof in Appendix 1 below) that sizeof(T) is an integer multiple
of T's stride. Unfortunately, we don't know which multiple it is, and so we still don't know the stride.
tttttttttttt ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
Now suppose that, instead of a T, we have a struct Tchar which contains
both a T and
a char. There are several ways in which the compiler may choose to arrange
the struct, and this technique works equally well with all of them. For
now, we'll assume it does what most compilers do: place the char immediately
after the T:
ttttttttttttc ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
However, the compiler cannot leave the struct quite like this. For alignment reasons, the
compiler will bump the size of Tchar up to some multiple of T's
stride--probably
the next multiple. It must do this regardless of the order in which it
arranges the elements and the padding. For a proof, See Appendix 2.
ttttttttttttc### ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
Now, if we subtract the size of T from the size of Tchar, we get some integer
multiple of T's stride. In this case (and in the common case), we get T's
stride exactly.
ttttttttttttc### ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ <---
There are two cases in which we might not get T's stride exactly. The first
is if the compiler (for some reason) chooses to pad the struct too much. In
that case, Tchar's size will be bumped up to some higher multiple of T's
stride. The result we get is not optimal, but it's still guaranteed to be
safe to use. Such excessive padding should probably be considered a bug in
the compiler, but our technique deals safely with that case anyway.
The second case is if T already has some padding in it, and the compiler
chooses to reuse some of T's padding to store the extra char. In that
case, Tchar
will be the same size as T, and our subtraction will yield zero. When that
happens, our macro defaults to sizeof(T), which is not optimal but is at
least known to be a multiple of T's stride. (There is some dispute as to
whether this case can arise--some believe that it is illegal for a compiler
to reuse padding like this. Either way, our technique deals with it.)
Note that this technique has more to do with the compiler's
alignment strategy than the hardware's actual restrictions. For
example, some
architectures do not prohibit misaligned values, but instead incur
some performance penalty when operating on misaligned values. If a compiler
is configured to allow misaligned values (to save space at the cost of some
performance), then our technique will reflect that alignment strategy.
APPENDIX 1
Claim: For every type T, sizeof(T) is an integer multiple of T's stride S.
Proof:
- By definition, instances of T may only begin at memory addresses which
are integer multiples of S.
- C and C++ guarantee that every datatype is "tilable". That is, if an
instance of T appears at a legal address a, then it is always legal
(alignment-wise) for another instance to appear at a+sizeof(T). This is
what makes arrays possible.
- Given any two instances of T in memory, the difference between their addresses is an integer multiple of S. (This is clearly true since, from i, both addresses are integer multiples of S.)
- Suppose A is an instance of type T which appears at address a. By ii,
it is legal to place another instance at address a+sizeof(T). And by
iii, we know (a+sizeof(T))-a is an integer multiple of S.
- Thus, sizeof(T) is an integer multiple of S.
APPENDIX 2
Claim: For every struct type T, sizeof(T) is an integer multiple of the stride
of each of T's members.
Proof:
- Suppose A is an instance of type T at address a.
- From ii, it is legal to place an instance B of type T at
address a+sizeof(T).
- Given any element inside struct T, the difference between its address
in B and its address in A is sizeof(T).
- But from iii, that difference is also an integer multiple of the element's stride.
- Thus, sizeof(T) is an integer multiple of the element's stride.
|