The vectorization report can provide information about loops that could take advantage of Intel® Streaming SIMD Extensions (Intel® SSE3, SSE2, and SSE) vectorization, and it is available on systems based on IA-32 and Intel® 64 architectures.
See Using Parallelism for information on other vectorization options.
The -vec-report (Linux* and Mac OS* X) or /Qvec-report (Windows*) option directs the compiler to generate the vectorization reports with different levels of information. Specify a value of 3 to generate the maximum diagnostic details.
Operating System | Command |
---|---|
Linux and Mac OS X | icpc -c -xSSSE3 -vec-report3 sample.cpp |
Windows | icl /c /QxSSSE3 /Qvec-report:3 sample.cpp |
where -c (Linux and Mac OS X) or /c (Windows) instructs the compiler to compile the example without generating an executable.
Linux and Mac OS X: The space between the option and the phase is optional.
Windows: The colon between the option and phase is optional.
The following example results illustrate the type of information generated by the vectorization report:
Example results:

```
sample.cpp(10) : (col. 2) remark: loop was not vectorized: not inner loop.
sample.cpp(11) : (col. 6) remark: loop was not vectorized: not inner loop.
sample.cpp(12) : (col. 5) remark: vector dependence: assumed FLOW dependence between c line 13 and b line 13.
sample.cpp(12) : (col. 5) remark: vector dependence: assumed FLOW dependence between c line 13 and a line 13.
sample.cpp(12) : (col. 5) remark: vector dependence: assumed FLOW dependence between c line 13 and c line 13.
sample.cpp(12) : (col. 5) remark: loop was not vectorized: existence of vector dependence.
```
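The source of sample.cpp is not reproduced here. As a rough illustration only, a triple-nested loop of the following shape is the kind of code for which a compiler could emit remarks like these: the two outer loops are not inner loops, and because a, b, and c are plain pointers, the compiler may have to assume that they overlap. The function name matmul, the size N, and the row-major layout are assumptions made for this sketch, and its line numbers do not correspond to those in the report.

```cpp
#include <cstddef>

// Hypothetical stand-in for sample.cpp: c += a * b on N x N row-major matrices.
const std::size_t N = 4;  // assumed size, for illustration only

void matmul(const double *a, const double *b, double *c) {
    for (std::size_t i = 0; i < N; i++) {          // outer loop: not an inner loop
        for (std::size_t j = 0; j < N; j++) {      // middle loop: not an inner loop
            for (std::size_t k = 0; k < N; k++) {  // inner loop: vectorization candidate
                // c is read and written through a pointer that might alias a or b,
                // so a compiler may assume a FLOW dependence on this statement.
                c[i * N + j] = c[i * N + j] + a[i * N + k] * b[k * N + j];
            }
        }
    }
}
```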
If the compiler reports "loop was not vectorized" because of the existence of vector dependence, analyze the loop for vector dependence. If you determine that there is no legitimate vector dependence, the message indicates that the compiler assumed the pointers or arrays in the loop were dependent, which implies that it assumed they were aliased. Use memory disambiguation techniques to resolve these cases.
There are three major types of vector dependence: FLOW, ANTI, and OUTPUT.
See Loop Independence to determine whether there is a valid vector dependence. The compiler report often asserts a vector dependence where none exists, because the compiler must assume that memory may be aliased. In these cases, check the code for dependencies; if there are none, inform the compiler using the methods described in memory aliasing, including restrict or ivdep.
The vectorization report can indicate vector dependencies in a number of situations. The following situations are sometimes reported as vector dependencies: non-unit stride, low trip count, and complex subscript expression.
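When inspection shows that the arrays never overlap, the assumed dependence can be removed by stating that fact in the code. The sketch below uses the `__restrict` qualifier, a compiler-extension spelling of C99 `restrict` that several C++ compilers accept; the exact keyword and any option needed to enable it depend on the compiler, so treat it as an assumption. The function and parameter names are hypothetical. Placing `#pragma ivdep` immediately before the loop is the other approach mentioned here.

```cpp
#include <cstddef>

// Hypothetical example: the caller promises that dst, x, and y never overlap.
// __restrict is a compiler extension in C++, not standard C++ (assumption of
// this sketch: the compiler in use accepts this spelling).
void add_arrays(double *__restrict dst,
                const double *__restrict x,
                const double *__restrict y,
                std::size_t n) {
    for (std::size_t i = 0; i < n; i++) {
        // Without the promise, a store to dst[i] could change a value of
        // x[] or y[] that a later iteration reads.
        dst[i] = x[i] + y[i];
    }
}
```

The qualifier is a promise, not a check: calling such a function with overlapping arrays gives undefined results.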
Non-Unit Stride
The report might indicate that a loop could not be vectorized when memory is accessed in a non-unit stride manner, which means that the loop accesses nonconsecutive memory locations. In such cases, check whether loop interchange helps and is practical. If not, you can sometimes force vectorization with the vector always pragma; however, you should verify that performance actually improves.
See Understanding Runtime Performance for more information about non-unit stride conditions.
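As an illustration of loop interchange, the sketch below scales a two-dimensional array in two ways. The array sizes and function names are assumptions chosen for the example. In the first version the inner loop walks down a column, so successive iterations touch elements COLS doubles apart; in the second, the loops are interchanged so that the inner loop walks along a row with unit stride.

```cpp
#include <cstddef>

const std::size_t ROWS = 8;  // assumed sizes, for illustration only
const std::size_t COLS = 8;

// Non-unit stride: the inner loop varies the row index i, so consecutive
// iterations access memory locations COLS elements apart.
void scale_column_order(double m[ROWS][COLS], double factor) {
    for (std::size_t j = 0; j < COLS; j++) {
        for (std::size_t i = 0; i < ROWS; i++) {
            m[i][j] *= factor;
        }
    }
}

// After loop interchange: the inner loop varies the column index j, so
// consecutive iterations access adjacent memory locations (unit stride).
void scale_row_order(double m[ROWS][COLS], double factor) {
    for (std::size_t i = 0; i < ROWS; i++) {
        for (std::size_t j = 0; j < COLS; j++) {
            m[i][j] *= factor;
        }
    }
}
```

Both functions compute the same result; interchange is valid here because each element is updated independently of the others.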
The vectorization reports are generated during the final compilation phase, which is when the executable is generated; therefore, there are certain option combinations you cannot use if you are attempting to generate a report. If you use the following option combinations, the compiler issues a warning and does not generate a report:
-c or -ipo or -x with -vec-report (Linux* and Mac OS* X) and /c or /Qipo or /Qx with /Qvec-report (Windows*)
-c or -ax with -vec-report (Linux and Mac OS X) and /c or /Qax with /Qvec-report (Windows)
The following examples show commands that generate a vectorization report and commands that do not:
Operating System | Vectorization report | Command |
---|---|---|
Linux and Mac OS X | Generated | icpc -xSSSE3 -vec-report3 sample.cpp |
Linux and Mac OS X | Generated | icpc -xSSSE3 -ipo -vec-report3 sample.cpp |
Linux and Mac OS X | Generated | icpc -c -xSSSE3 -ipo -vec-report3 sample.cpp |
Linux and Mac OS X | Not generated | icpc -c -xSSSE3 -vec-report3 sample.cpp |
Linux and Mac OS X | Not generated | icpc -xSSSE3 -ipo -vec-report3 sample.cpp |
Linux and Mac OS X | Not generated | icpc -c -xSSSE3 -ipo -vec-report3 sample.cpp |
Windows | Generated | icl /QxSSSE3 /Qvec-report:3 sample.cpp |
Windows | Generated | icl /QxSSSE3 /Qipo /Qvec-report:3 sample.cpp |
Windows | Generated | icl /c /QxSSSE3 /Qipo /Qvec-report:3 sample.cpp |
Windows | Not generated | icl /c /QxSSSE3 /Qvec-report:3 sample.cpp |
Windows | Not generated | icl /QxSSSE3 /Qipo /Qvec-report:3 sample.cpp |
Windows | Not generated | icl /c /QxSSSE3 /Qipo /Qvec-report:3 sample.cpp |
You might consider changing existing code to allow vectorization under the following conditions:
The vectorization report indicates that the program "contains unvectorizable statement at line XXX"; eliminate such statements, for example a printf() call or a call to a user-defined function foo(), from the loop.
The vectorization report states there is a "vector dependence: proven FLOW dependence between 'variable' line XXX, and 'variable' line XXX" or "loop was not vectorized: existence of vector dependence." Generally, these messages indicate that true loop dependencies are preventing vectorization. In such cases, consider changing the loop algorithm.
For example, consider the two equivalent algorithms below, which produce identical output. The loop in foo does not vectorize because of the FLOW dependence, but the loop in bar does.
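One way to eliminate such a statement is to split the loop so that the computation and the output happen in separate loops. The sketch below is a hypothetical illustration: the function names and the squaring computation are assumptions, and whether the restructured compute loop actually vectorizes still depends on the compiler and options used.

```cpp
#include <cstddef>
#include <cstdio>

// Original shape: the printf() call inside the loop body is the kind of
// statement a report may flag as unvectorizable.
void square_and_print(const double *in, double *out, std::size_t n) {
    for (std::size_t i = 0; i < n; i++) {
        out[i] = in[i] * in[i];
        std::printf("%f\n", out[i]);
    }
}

// Restructured: the first loop contains only arithmetic, and the output
// is moved to a second loop that is not expected to vectorize.
void square_then_print(const double *in, double *out, std::size_t n) {
    for (std::size_t i = 0; i < n; i++) {
        out[i] = in[i] * in[i];
    }
    for (std::size_t i = 0; i < n; i++) {
        std::printf("%f\n", out[i]);
    }
}
```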
Example:

```cpp
void foo(double *y) {
    for (int i = 1; i < 10; i++) {
        // a loop that puts sequential numbers into array y
        y[i] = y[i-1] + 1;
    }
}

void bar(double *y) {
    for (int i = 1; i < 10; i++) {
        // a loop that puts sequential numbers into array y
        y[i] = y[0] + i;
    }
}
```
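The equivalence of the two functions can be checked with a small helper such as the one sketched below. The helper same_output is hypothetical and is not part of the original example; foo and bar are repeated only so that the sketch is self-contained.

```cpp
// foo and bar as in the example above.
void foo(double *y) {
    for (int i = 1; i < 10; i++) {
        y[i] = y[i-1] + 1;  // each element depends on the previous one
    }
}

void bar(double *y) {
    for (int i = 1; i < 10; i++) {
        y[i] = y[0] + i;    // each element depends only on y[0] and i
    }
}

// Hypothetical helper: runs both functions from the same starting value
// and reports whether all ten elements match exactly.
bool same_output(double start) {
    double a[10] = {start};
    double b[10] = {start};
    foo(a);
    bar(b);
    for (int i = 0; i < 10; i++) {
        if (a[i] != b[i]) {
            return false;
        }
    }
    return true;
}
```

For small whole-number starting values such as 0.0 or 1.0, every intermediate result is exactly representable and the outputs match. For arbitrary starting values, the repeated additions in foo can round differently from the single addition in bar, so the two results might differ in the last bits.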
Unsupported loop structures may prevent vectorization. An example of an unsupported loop structure is a loop whose index or limit requires complex computation. Change the structure to remove function calls and other excessive computation from the loop limits.
Example:

```cpp
int function(int n) {
    return (n*n-1);
}

void unsupported_loop_structure(double *y, int n) {
    // the loop limit is a function call evaluated in the loop condition
    for (int i = 0; i < function(n); i++) {
        *y = *y * 2.0;
    }
}
```
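A possible restructuring, sketched below, evaluates the limit once before the loop so that the loop condition compares against a plain local variable. The name supported_loop_structure is hypothetical. This change addresses only the loop limit; the loop body is kept as in the example and still updates a single location on every iteration.

```cpp
int function(int n) {
    return (n * n - 1);
}

// Hypothetical rewrite: the loop limit is computed once, outside the loop.
void supported_loop_structure(double *y, int n) {
    const int limit = function(n);  // hoisted out of the loop condition
    for (int i = 0; i < limit; i++) {
        *y = *y * 2.0;
    }
}
```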
Non-unit stride access might cause the report to state that "vectorization possible but seems inefficient". Try to restructure the loop to access the data in a unit-stride manner (for example, apply loop interchange), or try #pragma vector always.
Using mixed data types in the body of a loop might prevent vectorization. In the case of mixed data types, the vectorization report might state something similar to "loop was not vectorized: condition too complex".
The following example code demonstrates a loop that cannot vectorize because of mixed data types within the loop: withinborder is an integer, while the other data in the loop is floating-point. Simply changing the data type of withinborder allows this loop to vectorize.
Example:

```cpp
#include <math.h>  // sqrtf

int howmany_close(double *x, double *y) {
    int withinborder = 0;  // integer counter among floating-point data
    double dist;
    for (int i = 0; i < 100; i++) {
        dist = sqrtf(x[i]*x[i] + y[i]*y[i]);
        if (dist < 5) withinborder++;
    }
    return withinborder;
}
```
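A variant with a single data type throughout the loop might look like the sketch below. The function name howmany_close_uniform is hypothetical. Beyond changing the type of withinborder, the sketch also uses the double-precision std::sqrt in place of sqrtf and returns the count; those two changes are assumptions of this sketch rather than part of the guidance above, and whether the loop vectorizes still depends on the compiler and options.

```cpp
#include <cmath>

// Hypothetical variant: every value used inside the loop is a double.
int howmany_close_uniform(const double *x, const double *y) {
    double withinborder = 0.0;  // counter kept in the same type as the data
    for (int i = 0; i < 100; i++) {
        const double dist = std::sqrt(x[i] * x[i] + y[i] * y[i]);
        if (dist < 5.0) {
            withinborder += 1.0;
        }
    }
    // Convert once, after the loop, instead of mixing types inside it.
    return static_cast<int>(withinborder);
}
```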