Loop Independence

Loop independence is important because independent loops can be parallelized. Independent loops can be parallelized in a number of ways, from the coarse-grained parallelism of OpenMP* to the fine-grained Instruction Level Parallelism (ILP) of vectorization and software pipelining.

A loop is considered independent when the computation of iteration Y can be done without any result from iteration X. In other words, if iteration 2 of a loop can be computed at the same time as iteration 1, without using any result from iteration 1, the loop is independent.
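For instance, an independent loop can be marked for coarse-grained parallel execution with a single OpenMP* pragma. The following is a minimal sketch (the function name scale and the computation are illustrative; compile with an OpenMP-enabled flag such as -qopenmp):

#define MAX 1024

void scale(double a[MAX], double b[MAX])
{
    // Each iteration touches only its own elements, so the
    // iterations can be divided across threads in any order.
    #pragma omp parallel for
    for (int j = 0; j < MAX; j++)
        a[j] = 2.0 * b[j];
}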

Occasionally, you can determine whether a loop is independent by comparing the output of the loop with the output of the same loop written with a decrementing index counter.

For example, the loop shown in Example 1 might be independent if the code in Example 2 generates the same result.

Example 1

#define MAX 1024

void loop_indep1(int a[MAX], int b[MAX])
{
    for (int j = 0; j < MAX; j++)
        a[j] = b[j];
}

Example 2

#define MAX 1024

void loop_indep2(int a[MAX], int b[MAX])
{
    for (int j = MAX - 1; j >= 0; j--)
        a[j] = b[j];
}
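A small test harness can apply this comparison directly. The following is a hypothetical sketch (the test data is arbitrary); keep in mind that matching output suggests, but does not prove, that the loop is independent:

#include <stdio.h>
#include <string.h>

#define MAX 1024

void loop_indep1(int a[MAX], int b[MAX]);   // Example 1, above
void loop_indep2(int a[MAX], int b[MAX]);   // Example 2, above

int main(void)
{
    int a1[MAX], a2[MAX], b[MAX];

    for (int j = 0; j < MAX; j++)
        b[j] = 3 * j + 1;                   // arbitrary test data

    memset(a1, 0, sizeof(a1));
    memset(a2, 0, sizeof(a2));

    loop_indep1(a1, b);                     // incrementing index
    loop_indep2(a2, b);                     // decrementing index

    // Identical results are consistent with independence.
    printf(memcmp(a1, a2, sizeof(a1)) == 0 ? "results match\n"
                                           : "results differ\n");
    return 0;
}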

When loops are dependent, improving loop performance becomes much more difficult. Loops can be dependent in several general ways.

The following sections illustrate the different loop dependencies.

Flow Dependency - Read After Write

Cross-iteration flow dependence is created when a variable is written in one iteration and read in a later iteration, as shown in the following example:

Example

void flow_dep(double A[])
{
    for (int j = 1; j < 1024; j++)
        A[j] = A[j-1];
}

The above example is equivalent to the following lines for the first few iterations:

Sample Iterations

A[1] = A[0];
A[2] = A[1];

Recurrence relations feed information forward from one iteration to the next.

Example

#define MAX 1024

void time_stepping_loops(double a[], double b[])
{
    for (int j = 1; j < MAX; j++) {
        a[j] = a[j-1] + b[j];
    }
}

Most recurrences cannot be made fully parallel. Instead, look for a loop further out or further in to parallelize. You might also gain performance through unrolling.
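For instance, when a recurrence is nested inside another loop, the outer loop is often independent even though the inner loop is not. The following sketch assumes exactly that situation (the 2-D arrays and the OpenMP* pragma are illustrative):

#define MAX 1024

void outer_parallel(double a[][MAX], double b[][MAX], int rows)
{
    // Each row carries its own recurrence, but no row depends on
    // another row, so the outer loop can be parallelized.
    #pragma omp parallel for
    for (int i = 0; i < rows; i++)
        for (int j = 1; j < MAX; j++)
            a[i][j] = a[i][j-1] + b[i][j];  // serial within a row
}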

Anti-Dependency - Write After Read

Cross-iteration anti-dependence is created when a variable is read in one iteration and written in a later iteration, as shown in the following example:

Example

void anti_dep1(double A[])
{
    for (int j = 1; j < 1024; j++)
        A[j] = A[j+1];
}

The above example is equivalent to the following lines for the first few iterations:

Sample Iterations

A[1] = A[2];
A[2] = A[3];
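One common way to break such an anti-dependence (a sketch, not part of the original example) is to read from a copy of the data, so that no iteration's read can conflict with another iteration's write:

#include <string.h>

void anti_dep_fixed(double A[])
{
    // The loop above reads A[1]..A[1024], so the caller must supply
    // at least 1025 elements; copy the values that will be read.
    double old[1025];
    memcpy(old, A, sizeof(old));

    // Reads now target old[] and writes target A[], so the
    // iterations no longer conflict and can run in any order.
    for (int j = 1; j < 1024; j++)
        A[j] = old[j+1];
}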

Output Dependency - Write After Write

Cross-iteration output dependence is created when a variable is written in one iteration and rewritten in a different iteration. The following example illustrates this type of dependency:

Example

void output_dep(double A[], double B[], double C[])
{
    for (int j = 1; j < 1024; j++) {
        A[j] = B[j];
        A[j+1] = C[j];
    }
}

The above example is equivalent to the following lines for the first few iterations:

Sample Iterations

A[1] = B[1];
A[2] = C[1];
A[2] = B[2];
A[3] = C[2];
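Note that every A[j+1] = C[j] store except the last is overwritten by the next iteration's A[j] = B[j] store, so only the final write to each element survives. As a sketch, the loop is therefore equivalent to this dependence-free form:

void output_dep_rewritten(double A[], double B[], double C[])
{
    // A[1]..A[1023] end up holding B[1]..B[1023]; the only C store
    // that is never overwritten is A[1024] = C[1023].
    for (int j = 1; j < 1024; j++)
        A[j] = B[j];
    A[1024] = C[1023];
}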

Reductions

The Intel® compiler can successfully vectorize or software pipeline (SWP) most loops containing reductions on simple math operators such as multiplication (*), addition (+), subtraction (-), and division (/). Reductions collapse array data to scalar data by using associative operations:

Example

#define MAX 1024

void reduction(double * sum, double c[])
{
    for (int j = 0; j < MAX; j++) {
        *sum = *sum + c[j];
    }
}

The compiler might occasionally misidentify a reduction and report flow, anti, or output dependencies, or loop-carried memory-dependency edges; in such cases it will not vectorize or SWP the loop. When this happens, recognize that the construct is simply a reduction, and direct the compiler with a pragma (such as #pragma ivdep or #pragma swp) to vectorize or SWP the loop.
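For example, placing #pragma ivdep immediately before the loop tells the compiler to ignore the dependencies it would otherwise assume. The sketch below also accumulates into a local scalar (an illustrative refactoring, not required by the pragma) so that repeated stores through the pointer do not inhibit vectorization:

#define MAX 1024

void reduction_hinted(double * sum, double c[])
{
    double s = *sum;                // local accumulator

    // Only assert ivdep after verifying the construct really is
    // a reduction with no other loop-carried dependencies.
    #pragma ivdep
    for (int j = 0; j < MAX; j++)
        s = s + c[j];

    *sum = s;
}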