GPU code overview
CMSSW modules meant to be run on NVIDIA GPUs.
Functionality covered in this documentation:
Module Dataflow
flowchart TD
 classDef Module fill:#c5c5FF,stroke:#0000ff,stroke-width:4px,color:#0000ff;
 classDef CUDAProduct fill:#ffffff,stroke:#008000,stroke-width:2px,color:#008000; 
 SiPixelRawToClusterCUDA --> SiPixelDigisCUDA([SiPixelDigisCUDA])
 SiPixelRawToClusterCUDA --> SiPixelClustersCUDA([SiPixelClustersCUDA])
 SiPixelClustersCUDA --> SiPixelRecHitCUDA
 SiPixelDigisCUDA --> SiPixelRecHitCUDA 
 SiPixelRecHitCUDA --> TrackingRecHit2DGPU([TrackingRecHit2DGPU])
 TrackingRecHit2DGPU --> CAHitNtupletCUDA
 CAHitNtupletCUDA --> PixelTrackHeterogeneous([PixelTrackHeterogeneous])
 PixelTrackHeterogeneous --> PixelVertexProducerCUDA
 PixelVertexProducerCUDA --> ZVertexHeterogeneous([ZVertexHeterogeneous])
 subgraph Legend
  CUDAProduct([CUDA Product])
  CMSSWModule([CMSSW Module])
 end
 class CMSSWModule,SiPixelRawToClusterCUDA,SiPixelRecHitCUDA,CAHitNtupletCUDA,PixelVertexProducerCUDA Module;
 class CUDAProduct,SiPixelClustersCUDA,SiPixelDigisCUDA,TrackingRecHit2DGPU,PixelTrackHeterogeneous,ZVertexHeterogeneous CUDAProduct;  Data Structure
The SoA approach is used to store pixel data used by the CUDA code.
In short, data from each module is concatenated into multiple 1D arrays.
For example, suppose we have two modules (module0 and module1).
Each one will contain 16 x 80 x 52 = 66560 pixels. Each pixel has
its module coordinates (x and y) and the module index it belongs
to (moduleInd, for this example 0 and 1).
A simplistic visualization of the example above can be seen below; each module is composed of 2 x 8 ROCs, each pixel having unique coordinates relative to the module.

The SoA approach to store the pixel data would look like the image below; data from all modules are concatenated one after another in 1D arrays:

Warning
The data is not stored in a per-module sorted manner,
meaning that module1 data could precede module0's data.
Data from each module is, however, stored consecutively,
meaning that data from one module is not split up into several
blocks.
An actual example of such arrays can be seen in the SiPixelDigisCUDASOAView.
Todo
Where is this data structure created? Unpacking?