Vertical Coordinate

Initialization

The default VertCoord instance is created by the init method, which assumes that Decomp has already been initialized.

Decomp::init();
VertCoord::init();

The default instance can be retrieved by:

auto *DefVertCoord = VertCoord::getDefault();

Additional instances can be created by calling the create method.

VertCoord *VertCoord::create(const std::string &Name,  ///< [in] Name for vertical coordinate
                             const Decomp *MeshDecomp, ///< [in] Decomp for mesh
                             Config *Options)          ///< [in] Confiuration options

This calls the constructor and places the instance in a map that can be used to retrieve the instance by name:

auto *DefVertCoord = VertCoord::get('Name');

The constructor stores some information from Decomp, allocates the primary member variables that are computed by methods of the VertCoord class during a model timestep. Host mirror copies are also created, with variables appended with H. It also reads in some additional mesh information and computes the information describing the min/max location of the active vertical layers. Finally, it registers variables with IOStreams so they can be requested as output.

Variables

A list of member variables along with their types and dimension sizes is below:

Variable Name

Type

Dimensions

PressureInterface

Real

NCellsSize, NVertLevelsP1

PressureMid

Real

NCellsSize, NVertLevels

ZInterface

Real

NCellsSize, NVertLevelsP1

ZMid

Real

NCellsSize, NVertLevels

GeopotentialMid

Real

NCellsSize, NVertLevels

LayerThicknessPStar

Real

NCellsSize, NVertLevels

MinLevelCell

Integer

NCellsSize

MaxLevelCell

Integer

NCellsSize

MinLevelEdgeTop

Integer

NEdgesSize

MaxLevelEdgeTop

Integer

NEdgesSize

MinLevelEdgeBot

Integer

NEdgesSize

MaxLevelEdgeBot

Integer

NEdgesSize

MinLevelVertexTop

Integer

NVerticesSize

MaxLevelVertexTop

Integer

NVerticesSize

MinLevelVertexBot

Integer

NVerticesSize

MaxLevelVertexBot

Integer

NVerticesSize

VertCoordMovementWeights

Real

NCellsSize, NVertLevels

RefLayerThickness

Real

NCellsSize, NVertLevels

BottomDepth

Real

NCellsSize

Removal

VertCoord instances can be removed by name:

VertCoord.erase("Name");

or all instances can be destroyed by calling:

VertCoord.clear();

Use of hierarchical parallelism

The methods computePressure and computeZHeight are similar in that they use hierarchical parallelism to split the work for horizontal cells over teams of threads, with a parallel_for. This is done with a TeamPolicy:

const auto Policy = TeamPolicy(NCellsAll, OMEGA_TEAMSIZE, 1);

The parallel_for is then called with this policy:

Kokkos::parallel_for("loopName", Policy, KOKKOS_LAMBDA(const TeamMember &Member) {
       const I4 ICell = Member.league_rank();
     ...
}

The cumulative sum in the vertical is computed among threads (in parallel) within in the parallel_for using a parallel_scan. The parallel_scan is called inside the parallel_for:

Range = KMax - KMin + 1;
Kokkos::parallel_scan(
    TeamThreadRange(Member, Range),
    [&](int K, Real &Accum, bool IsFinal) {
    ...
}

where KMax and KMin are the maximum and minimum active vertical layer indices. In computeTargetThickness an outer loop divides the horizontal cells over teams of threads as above, however, there is a nested parallel_reduce that computes the column sum of the reference layer thicknesses times the vertical coordinate movement weights among threads in a team:

Real SumWh 0= 0;
Kokkos::parallel_reduce(
    Kokkos::TeamThreadRange(Member, KMin, KMax + 1),
    [=](const int K, Real &LocalWh) {
       LocalWh += VertCoordMovementWeights(ICell, K) *
                  RefLayerThickness(ICell, K);
    },
    SumWh);

also, the vertical computation of the target thicknesses are computed in a nested parallel_for:

Kokkos::parallel_for(
    Kokkos::TeamThreadRange(Member, NChunks), [&](const int KChunk) {
   ...
}

This parallel_for iterates over vertical chunks to facilitate vectorization on CPUs within in inner for loop over the vector length. The vector length on GPUs is set to 1 to maximize parallelism. The computeGeopotential method uses hierarchical parallelism in a very similar way to computeTargetThickness, except that it doesn’t require a column sum. It has an outer parallel_for that splits horizontal cells into teams and an inner parallel_for that does vertical computations in chunks.