Alignment Support

Aligning data improves the performance of intrinsics. When using the Intel® Streaming SIMD Extensions (Intel® SSE) intrinsics, align data to 16 bytes for memory operations. Specifically, __m128 objects, and the addresses passed to the _mm_load and _mm_store intrinsics, must be 16-byte aligned. If you declare arrays of floats and treat them as __m128 objects by casting, you must ensure that the float arrays are properly aligned.
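The following is a minimal sketch of the float-array case described above, not a definitive implementation. It assumes an x86 target with SSE available. ALIGN16, is_aligned16, and add4 are hypothetical names introduced only for illustration, and the #else branch uses the GCC-style aligned attribute as an assumed stand-in for compilers that do not accept __declspec.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h> /* Intel SSE: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* ALIGN16 is a hypothetical convenience macro (an assumption for this sketch). */
#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 __attribute__((aligned(16)))
#endif

/* Each array holds exactly one __m128 worth of floats. */
ALIGN16 float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
ALIGN16 float b[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
ALIGN16 float sum[4];

/* Returns nonzero when p is a multiple of 16. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Adds four floats from x and y into out using aligned loads and stores.
   All three pointers must be 16-byte aligned. */
static void add4(const float *x, const float *y, float *out)
{
    assert(is_aligned16(x) && is_aligned16(y) && is_aligned16(out));
    _mm_store_ps(out, _mm_add_ps(_mm_load_ps(x), _mm_load_ps(y)));
}
```

Calling add4(a, b, sum) fills sum with the element-wise totals. Without the alignment request, the arrays would only be guaranteed the 4-byte alignment of float, and the aligned load and store intrinsics could fault.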

Use __declspec(align) to direct the compiler to align data more strictly than it otherwise would. For example, a data object of type int is allocated at a byte address which is a multiple of 4 by default. However, by using __declspec(align), you can direct the compiler to instead use an address which is a multiple of 8, 16, or 32 with the following restriction on IA-32 architecture:

You can use this data alignment support to optimize cache line usage. By clustering small objects that are commonly used together into a struct, and forcing the struct to be allocated at the beginning of a cache line, you can effectively guarantee that each object is loaded into the cache as soon as any one is accessed, provided the struct fits within a single cache line. This can result in a significant performance benefit.
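As a rough sketch of this clustering technique, the struct below groups four small fields and requests alignment to an assumed 64-byte cache line. The line size, the CACHE_ALIGNED macro, the HotState type, and same_cache_line are all assumptions made for illustration; check the actual cache line size of your target processor.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* CACHE_ALIGNED is a hypothetical macro; the #else branch is an assumed
   fallback for compilers that do not accept __declspec. */
#if defined(_MSC_VER)
#define CACHE_ALIGNED __declspec(align(64))
#else
#define CACHE_ALIGNED __attribute__((aligned(64)))
#endif

/* Hypothetical group of small fields that are usually accessed together. */
typedef struct CACHE_ALIGNED HotState {
    int head;
    int tail;
    int count;
    int capacity;
} HotState;

HotState queue_state = { 0, 0, 0, 16 };

/* Returns nonzero when both addresses fall in the same assumed cache line. */
static int same_cache_line(const void *p, const void *q)
{
    return ((uintptr_t)p / CACHE_LINE) == ((uintptr_t)q / CACHE_LINE);
}
```

Because the struct starts on a 64-byte boundary and occupies only 16 bytes of data, its first and last fields cannot straddle two lines under the stated assumption.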

The syntax of this extended attribute is as follows:

align(n)

where n is an integral power of 2, up to 4096. The value specified is the requested alignment.

Note

If a value is specified that is less than the alignment of the affected data type, it has no effect. In other words, data is aligned to the maximum of its own alignment or the alignment specified with __declspec(align).

You can request alignments for individual variables, whether of static or automatic storage duration. (Global and static variables have static storage duration; local variables have automatic storage duration by default.) You cannot adjust the alignment of a parameter or of a field of a struct or class. You can, however, increase the alignment of a struct (or union or class), in which case every object of that type is affected.

As an example, suppose that a function uses local variables i and j as subscripts into a 2-dimensional array. They might be declared as follows:
int i, j;

These variables are commonly used together, but they can fall in different cache lines, which could be detrimental to performance. You can instead declare them as follows:
__declspec(align(16)) struct { int i, j; } sub;

The compiler now ensures that they are allocated in the same cache line. In C++, you can omit the struct variable name (written as sub in the previous example). In C, however, it is required, and you must write references to i and j as sub.i and sub.j.
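A minimal sketch of the subscript pair in use follows, written in the C form with the named variable sub. The ALIGN16 macro, the ROWS and COLS constants, and the sum_grid function are hypothetical, and the #else branch is an assumed fallback for compilers without __declspec.

```c
#include <assert.h>
#include <stdint.h>

/* ALIGN16 is a hypothetical convenience macro (an assumption for this sketch). */
#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 __attribute__((aligned(16)))
#endif

enum { ROWS = 3, COLS = 4 };

/* Sums every element of grid, using an aligned struct for both subscripts. */
static int sum_grid(int grid[ROWS][COLS])
{
    ALIGN16 struct { int i, j; } sub;
    int total = 0;

    /* The 8-byte struct starts on a 16-byte boundary, so i and j share a line. */
    assert(((uintptr_t)&sub % 16) == 0);

    for (sub.i = 0; sub.i < ROWS; sub.i++) {
        for (sub.j = 0; sub.j < COLS; sub.j++) {
            total += grid[sub.i][sub.j];
        }
    }
    return total;
}
```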

If you use many functions with such subscript pairs, it is more convenient to declare and use a struct type for them, as in the following example:
typedef struct __declspec(align(16)) { int i, j; } Sub;

By placing the __declspec(align) after the keyword struct, you are requesting the appropriate alignment for all objects of that type. Note that allocation of parameters is unaffected by __declspec(align). (If necessary, you can assign the value of a parameter to a local variable with the appropriate alignment.)
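The parameter workaround mentioned above might look like the following sketch. The function receives plain int parameters, whose allocation the alignment request does not control, and copies them into a local object of the aligned type. ALIGN16, ROWS, COLS, and element_at are hypothetical names, and the #else branch is an assumed fallback for compilers without __declspec.

```c
#include <assert.h>
#include <stdint.h>

/* ALIGN16 is a hypothetical convenience macro (an assumption for this sketch). */
#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 __attribute__((aligned(16)))
#endif

/* Every object of type Sub is aligned to 16 bytes. */
typedef struct ALIGN16 { int i, j; } Sub;

enum { ROWS = 3, COLS = 4 };

/* The parameters i and j are allocated wherever the calling convention puts
   them, so their values are copied into an aligned local before use. */
static int element_at(int grid[ROWS][COLS], int i, int j)
{
    Sub sub;
    sub.i = i;
    sub.j = j;
    assert(((uintptr_t)&sub % 16) == 0);
    return grid[sub.i][sub.j];
}
```

Note that the size of Sub is padded up to its alignment, so an array of Sub objects uses 16 bytes per element rather than 8.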

You can also force alignment of global variables, such as arrays:

__declspec(align(16)) float array[1000];


Copyright © 1996-2011, Intel Corporation. All rights reserved.