Vertical Coordinate
Initialization
The default `VertCoord` instance is created by the `init` method, which assumes that `Decomp` has already been initialized.
```cpp
Decomp::init();
VertCoord::init();
```
The default instance can be retrieved by:
```cpp
auto *DefVertCoord = VertCoord::getDefault();
```
Additional instances can be created by calling the `create` method:
```cpp
VertCoord *VertCoord::create(const std::string &Name,  ///< [in] Name for vertical coordinate
                             const Decomp *MeshDecomp, ///< [in] Decomp for mesh
                             Config *Options)          ///< [in] Configuration options
```
This calls the constructor and places the instance in a map that can be used to retrieve the instance by name:
```cpp
auto *NamedVertCoord = VertCoord::get("Name");
```
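The named-instance map can be pictured with the minimal, Kokkos-free sketch below. `ToyVertCoord` is a hypothetical stand-in, not Omega's class. It keeps only a name and a layer count, stores the default instance under the assumed key `"Default"`, and omits the `Decomp` and `Config` arguments. Because the map owns the instances through `std::unique_ptr`, the removal calls described later (`erase` and `clear`) also free the memory.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for VertCoord: the real class also takes
// a Decomp pointer and Config options and allocates device arrays.
class ToyVertCoord {
 public:
   // Create a named instance and store it in the static map.
   // If the name is already in use, the old instance is replaced (and freed).
   static ToyVertCoord *create(const std::string &Name, int NVertLevels) {
      std::unique_ptr<ToyVertCoord> Ptr(new ToyVertCoord(Name, NVertLevels));
      ToyVertCoord *Raw = Ptr.get();
      AllInstances[Name] = std::move(Ptr);
      return Raw;
   }

   // Retrieve an instance by name; returns nullptr if no such instance exists.
   static ToyVertCoord *get(const std::string &Name) {
      auto It = AllInstances.find(Name);
      return It == AllInstances.end() ? nullptr : It->second.get();
   }

   // ASSUMPTION: the default instance is stored under the key "Default".
   static ToyVertCoord *getDefault() { return get("Default"); }

   // Remove one instance by name, or all instances; unique_ptr frees them.
   static void erase(const std::string &Name) { AllInstances.erase(Name); }
   static void clear() { AllInstances.clear(); }

   std::string Name;
   int NVertLevels;

 private:
   ToyVertCoord(const std::string &InName, int InNVertLevels)
       : Name(InName), NVertLevels(InNVertLevels) {}

   static std::map<std::string, std::unique_ptr<ToyVertCoord>> AllInstances;
};

std::map<std::string, std::unique_ptr<ToyVertCoord>> ToyVertCoord::AllInstances;
```

A map keyed by name keeps lookup simple and makes the instance lifetime explicit: an instance lives until it is erased or the map is cleared.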
The constructor stores some information from `Decomp` and allocates the primary member variables, which are computed by methods of the `VertCoord` class during a model timestep.
Host mirror copies of these variables are also created, and their names have `H` appended (e.g. `ZMidH`).
It also reads in some additional mesh information and computes the minimum and maximum indices of the active vertical layers for cells, edges, and vertices.
Finally, it registers variables with `IOStreams` so they can be requested as output.
Variables
The member variables, their types, and their dimensions are listed below:
| Variable Name | Type | Dimensions |
| --- | --- | --- |
| PressureInterface | Real | NCellsSize, NVertLevelsP1 |
| PressureMid | Real | NCellsSize, NVertLevels |
| ZInterface | Real | NCellsSize, NVertLevelsP1 |
| ZMid | Real | NCellsSize, NVertLevels |
| GeopotentialMid | Real | NCellsSize, NVertLevels |
| LayerThicknessPStar | Real | NCellsSize, NVertLevels |
| MinLevelCell | Integer | NCellsSize |
| MaxLevelCell | Integer | NCellsSize |
| MinLevelEdgeTop | Integer | NEdgesSize |
| MaxLevelEdgeTop | Integer | NEdgesSize |
| MinLevelEdgeBot | Integer | NEdgesSize |
| MaxLevelEdgeBot | Integer | NEdgesSize |
| MinLevelVertexTop | Integer | NVerticesSize |
| MaxLevelVertexTop | Integer | NVerticesSize |
| MinLevelVertexBot | Integer | NVerticesSize |
| MaxLevelVertexBot | Integer | NVerticesSize |
| VertCoordMovementWeights | Real | NCellsSize, NVertLevels |
| RefLayerThickness | Real | NCellsSize, NVertLevels |
| BottomDepth | Real | NCellsSize |
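The table lists `Top` and `Bot` bounds for edges without defining them. The sketch below shows one plausible way to derive them from the cell bounds. It assumes, following MPAS-Ocean's `maxLevelEdgeTop`/`maxLevelEdgeBot` convention, that the `Top` pair brackets the layers active in both cells sharing an edge and the `Bot` pair brackets the layers active in either cell. The function name, the `CellsOnEdge` layout, and the omission of boundary edges are hypothetical simplifications. The vertex bounds would follow the same pattern over the cells surrounding each vertex.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical container for per-edge active-layer bounds.
struct EdgeLevels {
   std::vector<int> MinLevelEdgeTop, MaxLevelEdgeTop;
   std::vector<int> MinLevelEdgeBot, MaxLevelEdgeBot;
};

// ASSUMPTION (MPAS-Ocean-style convention): "Top" = layers active in BOTH
// cells adjacent to the edge (intersection of their ranges), "Bot" = layers
// active in EITHER cell (envelope of their ranges). Boundary edges with a
// missing neighbor are not handled in this sketch.
EdgeLevels computeEdgeLevels(const std::vector<std::array<int, 2>> &CellsOnEdge,
                             const std::vector<int> &MinLevelCell,
                             const std::vector<int> &MaxLevelCell) {
   const std::size_t NEdges = CellsOnEdge.size();
   EdgeLevels L;
   L.MinLevelEdgeTop.resize(NEdges);
   L.MaxLevelEdgeTop.resize(NEdges);
   L.MinLevelEdgeBot.resize(NEdges);
   L.MaxLevelEdgeBot.resize(NEdges);
   for (std::size_t IEdge = 0; IEdge < NEdges; ++IEdge) {
      const int C0 = CellsOnEdge[IEdge][0];
      const int C1 = CellsOnEdge[IEdge][1];
      // Intersection of the two active ranges
      L.MinLevelEdgeTop[IEdge] = std::max(MinLevelCell[C0], MinLevelCell[C1]);
      L.MaxLevelEdgeTop[IEdge] = std::min(MaxLevelCell[C0], MaxLevelCell[C1]);
      // Envelope (union) of the two active ranges
      L.MinLevelEdgeBot[IEdge] = std::min(MinLevelCell[C0], MinLevelCell[C1]);
      L.MaxLevelEdgeBot[IEdge] = std::max(MaxLevelCell[C0], MaxLevelCell[C1]);
   }
   return L;
}
```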
Removal
`VertCoord` instances can be removed by name:
```cpp
VertCoord::erase("Name");
```
or all instances can be destroyed by calling:
```cpp
VertCoord::clear();
```
Use of hierarchical parallelism
The methods `computePressure` and `computeZHeight` use hierarchical parallelism in the same way: an outer `parallel_for` splits the work for horizontal cells over teams of threads.
The outer loop uses a `TeamPolicy`:
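```cpp
// League of NCellsAll teams (one per cell), OMEGA_TEAMSIZE threads per team,
// vector length 1
const auto Policy = TeamPolicy(NCellsAll, OMEGA_TEAMSIZE, 1);
```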
The `parallel_for` is then called with this policy:
```cpp
Kokkos::parallel_for("loopName", Policy, KOKKOS_LAMBDA(const TeamMember &Member) {
   const I4 ICell = Member.league_rank(); // each team handles one cell
   ...
});
```
Within each team, the cumulative sum in the vertical is computed in parallel by the team's threads using a `parallel_scan`, which is called inside the `parallel_for`:
```cpp
const I4 Range = KMax - KMin + 1;
Kokkos::parallel_scan(
   Kokkos::TeamThreadRange(Member, Range),
   [&](int K, Real &Accum, bool IsFinal) {
      ...
   });
```
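where `KMax` and `KMin` are the maximum and minimum active vertical layer indices for the cell.
Because `TeamThreadRange(Member, Range)` spans `0` to `Range - 1`, the scan index `K` is an offset from `KMin`.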
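To make the `parallel_scan` contract concrete, the sketch below emulates it serially and applies it to one column. `serialScan` and `columnPressure` are hypothetical names, and `double` stands in for `Real`. The hydrostatic form, in which each layer adds `Gravity * Rho0 * h` to the pressure, is an illustrative assumption rather than Omega's exact `computePressure`. The key pattern is that `Accum` is always updated, while results are written only when `IsFinal` is true. At that point `Accum` holds the sum over all layers above `K`.

```cpp
#include <vector>

// Serial stand-in for Kokkos::parallel_scan over [0, N). A parallel backend
// may call the functor in several passes; in the final pass (IsFinal == true)
// Accum holds the sum of contributions from all indices < K. Running once,
// in order, with IsFinal = true reproduces that final pass.
template <typename Functor>
void serialScan(int N, Functor F) {
   double Accum = 0.0;
   for (int K = 0; K < N; ++K)
      F(K, Accum, true);
}

// Hypothetical column pressure: ASSUMED Boussinesq hydrostatic form where
// each layer contributes Gravity * Rho0 * LayerThickness to the pressure.
// PressureInterface must hold at least KMax + 2 entries.
void columnPressure(const std::vector<double> &LayerThickness, int KMin,
                    int KMax, double SurfacePressure, double Gravity,
                    double Rho0, std::vector<double> &PressureInterface,
                    std::vector<double> &PressureMid) {
   PressureInterface[KMin] = SurfacePressure;
   const int Range = KMax - KMin + 1;
   serialScan(Range, [&](int KOff, double &Accum, bool IsFinal) {
      const int K = KMin + KOff; // scan index is an offset from KMin
      const double DP = Gravity * Rho0 * LayerThickness[K];
      if (IsFinal) {
         // Accum = pressure increment from all active layers above K
         PressureMid[K] = SurfacePressure + Accum + 0.5 * DP;
         PressureInterface[K + 1] = SurfacePressure + Accum + DP;
      }
      Accum += DP; // always update, whether or not this is the final pass
   });
}
```

Because the write happens before `Accum += DP`, `Accum` is an exclusive prefix sum, so the current layer's own increment is added explicitly when computing the interface below it.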
In `computeTargetThickness`, an outer loop divides the horizontal cells over teams of threads as above.
In addition, a nested `parallel_reduce` computes, among the threads of a team, the column sum of the reference layer thicknesses times the vertical coordinate movement weights:
```cpp
Real SumWh = 0;
Kokkos::parallel_reduce(
   // Half-open range [KMin, KMax + 1) covers all active layers
   Kokkos::TeamThreadRange(Member, KMin, KMax + 1),
   [=](const int K, Real &LocalWh) {
      LocalWh += VertCoordMovementWeights(ICell, K) *
                 RefLayerThickness(ICell, K);
   },
   SumWh);
```
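The reduce-then-distribute structure of `computeTargetThickness` can be sketched serially for a single column. The weighted formula below is an assumption modeled on MPAS-Ocean's ALE thickness, in which a change `Delta` in column thickness is spread over the layers in proportion to `W(K) * hRef(K)`. It is not a transcription of Omega's code, and `targetThickness` is a hypothetical name. The first loop plays the role of the nested `parallel_reduce`, with an extra reference-thickness sum that this sketch needs for `Delta`. The second loop plays the role of the chunked vertical `parallel_for`.

```cpp
#include <vector>

// Serial sketch for one cell. ASSUMED formula (MPAS-Ocean-style ALE):
//   Target(K) = hRef(K) + Delta * W(K) * hRef(K) / SumWh,
// with Delta = ColumnThickness - sum of hRef over active layers.
// Requires at least one active layer with a nonzero weight (SumWh != 0).
std::vector<double> targetThickness(const std::vector<double> &RefLayerThickness,
                                    const std::vector<double> &Weights,
                                    int KMin, int KMax,
                                    double ColumnThickness) {
   double SumWh = 0.0;  // column sum of W * hRef (the reduction in the text)
   double SumRef = 0.0; // reference column thickness (sketch-specific)
   for (int K = KMin; K <= KMax; ++K) {
      SumWh += Weights[K] * RefLayerThickness[K];
      SumRef += RefLayerThickness[K];
   }
   const double Delta = ColumnThickness - SumRef;

   // Distribute Delta over active layers in proportion to W * hRef;
   // inactive layers are left at zero.
   std::vector<double> Target(RefLayerThickness.size(), 0.0);
   for (int K = KMin; K <= KMax; ++K)
      Target[K] = RefLayerThickness[K] +
                  Delta * Weights[K] * RefLayerThickness[K] / SumWh;
   return Target;
}
```

With uniform weights, the change is shared in proportion to the reference thicknesses (z-star-like behavior). A zero weight keeps a layer at its reference thickness. In both cases the active layers sum to `ColumnThickness`.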
The target thicknesses themselves are then computed in a nested `parallel_for` over the vertical:
```cpp
Kokkos::parallel_for(
   Kokkos::TeamThreadRange(Member, NChunks), [&](const int KChunk) {
      ...
   });
```
This `parallel_for` iterates over vertical chunks so that, on CPUs, an inner `for` loop over the vector length can be vectorized.
On GPUs, the vector length is set to 1 to maximize parallelism: each chunk then holds a single layer, and the layers are spread across the team's threads.
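The chunking can be checked with a small serial sketch. `chunkedVerticalLoop` is a hypothetical helper. It assumes that `NChunks` is the active-layer count divided by `VecLength`, rounded up, and that the final chunk is truncated at `KMax`. Omega's actual chunk bounds may differ (for example, it may chunk the full `NVertLevels` range).

```cpp
#include <algorithm>

// Serial sketch of the chunked vertical loop. The outer loop stands in for
// the TeamThreadRange over NChunks; the inner loop over at most VecLength
// contiguous layers is the part a CPU compiler can vectorize.
// ASSUMPTIONS: NChunks = ceil(active layers / VecLength), and the last chunk
// is truncated at KMax.
template <typename Body>
void chunkedVerticalLoop(int KMin, int KMax, int VecLength, Body F) {
   const int NLayers = KMax - KMin + 1;
   const int NChunks = (NLayers + VecLength - 1) / VecLength; // ceiling division
   for (int KChunk = 0; KChunk < NChunks; ++KChunk) {
      const int KStart = KMin + KChunk * VecLength;
      const int KEnd = std::min(KStart + VecLength, KMax + 1); // one past last
      for (int K = KStart; K < KEnd; ++K)
         F(K);
   }
}
```

With `VecLength = 1`, as on GPUs, every chunk holds one layer, so the outer range has one entry per layer for the team's threads to share. On CPUs, a larger `VecLength` gives the compiler a short, contiguous inner loop to vectorize.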
The `computeGeopotential` method uses hierarchical parallelism in much the same way as `computeTargetThickness`, except that it does not require a column sum.
It has an outer `parallel_for` that distributes the horizontal cells over teams and an inner `parallel_for` that performs the vertical computations in chunks.