Parallel loops

Omega adopts the Kokkos programming model to express on-node parallelism. To simplify the syntax of the most frequently used computational patterns, Omega provides wrapper functions that internally handle creating and setting up Kokkos policies.

Flat multi-dimensional parallelism

parallelFor

To perform parallel iteration over a multi-dimensional index range, Omega provides the parallelFor wrapper. For example, the following code shows how to set every element of a 3D array in parallel.

   Array3DReal A("A", N1, N2, N3);
   parallelFor(
       {N1, N2, N3},
       KOKKOS_LAMBDA(int J1, int J2, int J3) {
          A(J1, J2, J3) = J1 + J2 + J3;
       });

Ranges with up to five dimensions are supported. Optionally, a label can be provided as the first argument of parallelFor.

   parallelFor("Set A",
       {N1, N2, N3},
       KOKKOS_LAMBDA(int J1, int J2, int J3) {
          A(J1, J2, J3) = J1 + J2 + J3;
       });

Adding labels can result in more informative messages when Kokkos debug variables are defined.
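
As a mental model for the examples above, a flat multi-dimensional range can be pictured as a single linear index that is decoded into (J1, J2, J3) for each call of the loop body. The following serial sketch in plain C++ illustrates that mapping. It is not Omega's or Kokkos's implementation: serialFor3D and fillSumOfIndices are hypothetical helpers, the decoding order (last index fastest) is an assumption made for illustration, and a real parallel backend makes no guarantee about the order in which indices are visited.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in for a flat 3D parallel loop (illustration only).
// Every (J1, J2, J3) in [0,N1) x [0,N2) x [0,N3) is visited exactly once.
template <class Functor>
void serialFor3D(const std::array<int, 3> &Bounds, Functor Body) {
   const long Total = static_cast<long>(Bounds[0]) * Bounds[1] * Bounds[2];
   for (long Flat = 0; Flat < Total; ++Flat) {
      // Decode the linear index; the last index varies fastest (assumed order)
      const int J3 = static_cast<int>(Flat % Bounds[2]);
      const int J2 = static_cast<int>((Flat / Bounds[2]) % Bounds[1]);
      const int J1 =
          static_cast<int>(Flat / (static_cast<long>(Bounds[1]) * Bounds[2]));
      Body(J1, J2, J3);
   }
}

// Fills a flattened N1 x N2 x N3 array with the same values as the
// parallelFor example: A(J1, J2, J3) = J1 + J2 + J3
std::vector<double> fillSumOfIndices(int N1, int N2, int N3) {
   std::vector<double> A(static_cast<std::size_t>(N1) * N2 * N3);
   serialFor3D({N1, N2, N3}, [&](int J1, int J2, int J3) {
      A[(static_cast<std::size_t>(J1) * N2 + J2) * N3 + J3] = J1 + J2 + J3;
   });
   return A;
}
```

Because each call of the body writes to a distinct element, the iterations are independent of one another, which is the property that lets a loop of this shape run in parallel.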

parallelReduce

To perform parallel reductions over a multi-dimensional index range, the parallelReduce wrapper is available. The following code sums every element of A.

   Real SumA;
   parallelReduce(
       {N1, N2, N3},
       KOKKOS_LAMBDA(int J1, int J2, int J3, Real &Accum) {
          Accum += A(J1, J2, J3);
       },
       SumA);

Note the presence of an accumulator variable Accum in the KOKKOS_LAMBDA arguments. You can use parallelReduce to perform other types of reductions. As an example, the following snippet finds the maximum of A.

   Real MaxA;
   parallelReduce(
       {N1, N2, N3},
       KOKKOS_LAMBDA(int J1, int J2, int J3, Real &Accum) {
          Accum = Kokkos::max(Accum, A(J1, J2, J3));
       },
       Kokkos::Max<Real>(MaxA));

To perform reductions that are not sums, in addition to modifying the lambda body, the final reduction variable needs to be wrapped in the appropriate Kokkos reducer type. In the above example, MaxA is wrapped in Kokkos::Max<Real> to perform a max reduction. The parallelReduce wrapper also supports performing multiple reductions at the same time. You can compute SumA and MaxA in one pass over the data:

   parallelReduce(
       {N1, N2, N3},
       KOKKOS_LAMBDA(int J1, int J2, int J3, Real &AccumSum, Real &AccumMax) {
          AccumSum += A(J1, J2, J3);
          AccumMax = Kokkos::max(AccumMax, A(J1, J2, J3));
       },
       SumA, Kokkos::Max<Real>(MaxA));

Like parallelFor, parallelReduce supports labels and up to five dimensions.
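
To make the role of the accumulator arguments concrete, the sketch below mimics a combined sum and max reduction in plain serial C++. It is an illustration under stated assumptions, not how Omega or Kokkos implement reductions: SumMax and chunkedSumMax are hypothetical names, and the fixed chunks stand in for the partial results that separate workers would produce in a parallel backend. Each private accumulator starts from the identity of its operation (0 for a sum, the lowest representable value for a max), and the partial results are then merged with the same operation.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical result type holding both reductions (illustration only)
struct SumMax {
   double Sum;
   double Max;
};

// Reduce Data in NChunks independent pieces, then combine the partial results.
// Each chunk plays the role of one worker with its own private accumulators.
SumMax chunkedSumMax(const std::vector<double> &Data, std::size_t NChunks) {
   // Final results start from the identity of each operation
   SumMax Result{0.0, std::numeric_limits<double>::lowest()};
   if (NChunks == 0) NChunks = 1;
   const std::size_t ChunkSize = (Data.size() + NChunks - 1) / NChunks;

   for (std::size_t Chunk = 0; Chunk < NChunks; ++Chunk) {
      const std::size_t Begin = std::min(Chunk * ChunkSize, Data.size());
      const std::size_t End   = std::min(Begin + ChunkSize, Data.size());

      // Private accumulators, analogous to AccumSum and AccumMax
      double AccumSum = 0.0;
      double AccumMax = std::numeric_limits<double>::lowest();
      for (std::size_t I = Begin; I < End; ++I) {
         AccumSum += Data[I];
         AccumMax = std::max(AccumMax, Data[I]);
      }

      // Combine step: the same operation merges the partial results
      Result.Sum += AccumSum;
      Result.Max = std::max(Result.Max, AccumMax);
   }
   return Result;
}
```

In this sketch, starting the max accumulator at 0 instead of the lowest representable value would give a wrong answer for data that is entirely negative, which is why each operation needs its own identity value.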