(omega-dev-vert-coord)= ## Vertical Coordinate ### Initialization The default `VertCoord` instance is created by the `init` method, which assumes that `Decomp` has already been initialized. ``` Decomp::init(); VertCoord::init(); ``` The default instance can be retrieved by: ``` auto *DefVertCoord = VertCoord::getDefault(); ``` Additional instances can be created by calling the `create` method. ``` VertCoord *VertCoord::create(const std::string &Name, ///< [in] Name for vertical coordinate const Decomp *MeshDecomp, ///< [in] Decomp for mesh Config *Options) ///< [in] Confiuration options ``` This calls the constructor and places the instance in a map that can be used to retrieve the instance by name: ``` auto *DefVertCoord = VertCoord::get('Name'); ``` The constructor stores some information from `Decomp`, allocates the primary member variables that are computed by methods of the `VertCoord` class during a model timestep. Host mirror copies are also created, with variables appended with `H`. It also reads in some additional mesh information and computes the information describing the min/max location of the active vertical layers. Finally, it registers variables with `IOStreams` so they can be requested as output. ### Variables A list of member variables along with their types and dimension sizes is below: | Variable Name | Type | Dimensions | | ------------- | ---- | ---------- | | PressureInterface | Real | NCellsSize, NVertLevelsP1 | | PressureMid | Real | NCellsSize, NVertLevels | | ZInterface | Real | NCellsSize, NVertLevelsP1| | ZMid | Real | NCellsSize, NVertLevels | | GeopotentialMid | Real | NCellsSize, NVertLevels | | LayerThicknessPStar | Real | NCellsSize, NVertLevels| | MinLevelCell | Integer | NCellsSize | | MaxLevelCell | Integer | NCellsSize | | MinLevelEdgeTop | Integer| NEdgesSize | | MaxLevelEdgeTop | Integer | NEdgesSize | | MinLevelEdgeBot | Integer | NEdgesSize | | MaxLevelEdgeBot | Integer | NEdgesSize | | MinLevelVertexTop | Integer | NVerticesSize | | MaxLevelVertexTop | Integer | NVerticesSize | | MinLevelVertexBot | Integer | NVerticesSize | | MaxLevelVertexBot | Integer | NVerticesSize | | VertCoordMovementWeights | Real | NCellsSize, NVertLevels | | RefLayerThickness | Real | NCellsSize, NVertLevels | | BottomDepth | Real | NCellsSize | ### Removal `VertCoord` instances can be removed by name: ``` VertCoord.erase("Name"); ``` or all instances can be destroyed by calling: ``` VertCoord.clear(); ``` ### Use of hierarchical parallelism The methods `computePressure` and `computeZHeight` are similar in that they use hierarchical parallelism to split the work for horizontal cells over teams of threads, with a `parallel_for`. This is done with a `TeamPolicy`: ```c++ const auto Policy = TeamPolicy(NCellsAll, OMEGA_TEAMSIZE, 1); ``` The `parallel_for` is then called with this policy: ```c++ Kokkos::parallel_for("loopName", Policy, KOKKOS_LAMBDA(const TeamMember &Member) { const I4 ICell = Member.league_rank(); ... } ``` The cumulative sum in the vertical is computed among threads (in parallel) within in the `parallel_for` using a `parallel_scan`. The `parallel_scan` is called inside the `parallel_for`: ```c++ Range = KMax - KMin + 1; Kokkos::parallel_scan( TeamThreadRange(Member, Range), [&](int K, Real &Accum, bool IsFinal) { ... } ``` where `KMax` and `KMin` are the maximum and minimum active vertical layer indices. In `computeTargetThickness` an outer loop divides the horizontal cells over teams of threads as above, however, there is a nested `parallel_reduce` that computes the column sum of the reference layer thicknesses times the vertical coordinate movement weights among threads in a team: ```c++ Real SumWh 0= 0; Kokkos::parallel_reduce( Kokkos::TeamThreadRange(Member, KMin, KMax + 1), [=](const int K, Real &LocalWh) { LocalWh += VertCoordMovementWeights(ICell, K) * RefLayerThickness(ICell, K); }, SumWh); ``` also, the vertical computation of the target thicknesses are computed in a nested `parallel_for`: ```c++ Kokkos::parallel_for( Kokkos::TeamThreadRange(Member, NChunks), [&](const int KChunk) { ... } ``` This `parallel_for` iterates over vertical chunks to facilitate vectorization on CPUs within in inner `for` loop over the vector length. The vector length on GPUs is set to 1 to maximize parallelism. The `computeGeopotential` method uses hierarchical parallelism in a very similar way to `computeTargetThickness`, except that it doesn't require a column sum. It has an outer `parallel_for` that splits horizontal cells into teams and an inner `parallel_for` that does vertical computations in chunks.