SiPixelDigisCUDASOAView

Class which stores Pixel digi data in an SoA approach. Used indirectly by SiPixelDigisCUDA instances, such as the digis_d variable in the SiPixelRawToClusterGPUKernel class.

The data is stored on the GPU.

File on github.

Key attributes of this class:

xx_: x-coordinates of pixels (array)
yy_: y-coordinates of pixels (array)
adc_: ADC values for each pixel (array)
moduleInd_: GPU-specific module identifiers that each pixel belongs to (GPU only)
clus_: The cluster identifier that each pixel is assigned to.
pdigi_: Packed digi format. Contains coordinates and ADC values in one variable. See also PixelDigi for the CPU counterpart.
rawIdArr_: Unique identifier used to identify modules in the whole of CMS (TODO: Double-check that this is the same as DetId).

UML Diagram

classDiagram
    class SiPixelDigisCUDASOAView{
        -uint16_t* xx_
        -uint16_t* yy_      
        -uint16_t* adc_     
        -uint16_t* moduleInd_       
        -int32_t* clus_             
        -uint32_t* pdigi_
        -uint32_t* rawIdArr_        

        +xx() uint16_t*
        +yy() uint16_t*     
        +adc() uint16_t*        
        +moduleInd() uint16_t*      
        +clus() int32_t*        
        +pdigi() uint32_t*              
        +rawIdArr() uint32_t*   
        -roundFor128ByteAlignment(int size) static int
        -getColumnAddress(column, store, size)
}

Todo

Explain how this works

For example, if we want to transfer only 1 uint16_t variable (usually has a size of 2 bytes), one would have to round this up to the nearest 128-byte transaction, meaning that we should request 64 uint16_t variables (because 64 * sizeof(uint16_t) = 128) to be efficient. Similary, if we needed 64 uint16_t, we would still need one 128-byte transaction. However, for 65 uint16_t, we would need 2 x 128-byte transfers.

It's used mainly for pointer arithmetic, for calculating the offset that the pointer will have to move to find the start of a specific SoA variable, e.g. the adc_ array.

Constructor

Todo

TODO

Memory Structure

As mentioned, efficient GPU global memory transactions must be 32-, 64- or 128-byte aligned, therefore all memory storage and accesses are best done in chuncks of the aforementioned sizes.

For this code, 128-byte chunks have been chosen by the developers.

When a SiPixelDigisCUDASOAView instance is created, pointers to the start of each SoA variable (clus_, adc_ etc) are created. Those are calculated using the start of the memory requested (i.e. the pointer stored in m_store), the constant assigned to each variable in StorageLocation and the number of pixels available (maxFedWords).

Note

The values in StorageLocation (0, 2, 4, 6, 7, 8, 9) are not random, but are directly related to the data type of the variables. Note that 32-bit variables (clus_, pdigi_ and rawIdArr_) are given numbers that have double the size of 16-bit variables (adc_, xx_, yy_, moduleInd_). For example, note that the values of StorageLocation::kCLUS and StorageLocation::kPDIGI differ by 2, while the values of StorageLocation::kADC and StorageLocation::kXX differ by 1.

Firstly, 32-bit variables are stored in order, i.e.:

clus_ (int32_t)
pdigi_ (uint32_t)
rawIdArr_ (uint32_t)

Then, 16-bit variables are stored, i.e.:

adc_ (uint16_t)
xx_ (uint16_t)
yy_ (uint16_t)
moduleInd_ (uint16_t)

As an example, let's assume that we want to store information for 8 pixels (numPixels = maxFedWords = 8) on the GPU, using the SiPixelDigisCUDASOAView which makes use of the SoA approach. Upon a digis_d instance creation, an m_store pointer to the GPU memory is created, where the data will be stored. Let's assume that it points to GPU memory location 0.

GPU mem empty — The GPU memory, initially empty

`clus_`

To find out where the first variable will be stored (i.e. the first constant in StorageLocation, kCLUS), we check the assignment done in the class' constructor:

clus_ = getColumnAddress<int32_t>(StorageLocation::kCLUS, store, maxFedWords);

The getColumnAddress method is called, which will calculate the memory offset starting from the store pointer as:

store + kCLUS * roundFor128ByteAlignment(numPixels) = 0 + 0 * 64 = 0

Note

roundFor128ByteAlignment will calculate the minimum number of uint16_t variables that must be requested, given the number of pixels that must be stored.

For 8 pixels, roundFor128ByteAlignment will return 64, since a single block of 64 * uint16_ts (i.e. 1 x 128 bytes, since a uint16_t is 2 bytes) will suffice to fit 8 pixels.

Therefore, since each clus_ element is an int32_t variable, we will be storing 8 * sizeof(int32_t) = 32 bytes starting from memory address 0 (calculated by getColumnAddress above).

GPU mem clus — The GPU memory, with **clus_** stored. The data takes up memory address 0 to 31

`pdigi_`

Moving on, next in StorageLocation is kPDIGI, with value 2. Similarly, we calculate the initial memory offset that the data will be stored at by using the getColumnAddress method:

store + kPDIGI * roundFor128ByteAlignment(numPixels) = 0 + 2 * 64 = 128

This number is the offset counted in uint16_t, meaning twice this number in bytes, meaning 256 bytes offset from the start.

GPU mem pdigi — The GPU memory, with **pdigi_** also stored. The data takes up memory address 256 to 256 + 31

Note

Memory addresses 32 to 256, i.e. memory between clus_ and pdigi_, remain unused.

`rawIdArr_`

Similarly for rawIdArr_ (kRAWIDARR = 4):

store + kRAWIDARR * roundFor128ByteAlignment(numPixels) = 0 + 4 * 64 = 256

GPU mem rawidarr — The GPU memory, with **rawIdArr_** also stored. The data takes up memory address 512 to 512 + 31

`adc_`

Next, adc_ is stored (kADC = 6). These values are of size uint16_t, meaning that, for 8 pixels, this array will take up 8 * sizeof(uint16_t) = 16 bytes of memory.

To get the memory location where the data will be stored at, we calculate the value returned from getColumnAddress:

store + kADC * roundFor128ByteAlignment(numPixels) = 0 + 6 * 64 = 384

We multiply by two, since this is still an offset counted in uint16_ts and we store adc_ in memory address 768.

GPU mem adc — The GPU memory, with **adc_** also stored. The data takes up memory address 768 to 768 + 31

`xx_`

xx_ (kXX = 7) is also an uint16_t variable, and it is going to be stored in:

store + kXX * roundFor128ByteAlignment(numPixels) = 0 + 7 * 64 = 448

Final offset is 896.

GPU mem x — The GPU memory, with x_ also stored. The data takes up memory address 896 to 896 + 16

Final structure

After following the same procedure for yy_ and moduleInd_, we find the final memory structure to look like the one below:

GPU mem all — The GPU memory, with all variables stored. The memory that remains unused is marked with light blue.

Parts of the memory will remain unused, depending on the numPixels but, for large number of pixels, the percentage of wasted memory becomes negligible.

Special case

What if numPixels was 64? By following the same procedure, the resulting memory usage would look something like this:

GPU mem full — The GPU memory, with all variables stored, and **numPixels = 64**. No memory is unused.

Note

Any multiple of 64 would result in no wasted memory.

SiPixelDigisCUDASOAView

UML Diagram

Contents

Enums

`StorageLocation`

Methods

`getColumnAddress`

`roundFor128ByteAlignment`

Constructor

Memory Structure

`clus_`

`pdigi_`

`rawIdArr_`

`adc_`

`xx_`

Final structure

Special case