Available machines
You have three options for running GPU-enabled code:
- The very heavily used lxplus-gpu machines (NOT RECOMMENDED).
- The Point 5 online machines (RECOMMENDED).
- Special nodes.
lxplus-gpu
The lxplus service offers lxplus-gpu.cern.ch for shared GPU instances, with limited isolation and performance.
You can connect to it the same way you would connect to the lxplus.cern.ch host:
ssh <username>@lxplus-gpu.cern.ch [-X]
If you connect directly from your computer, you might need to initialize Kerberos and restart bash from within the machine:
kinit
exec bash
CMS Point 5 Machines
Info
This section is taken from the CMS TWiki TriggerDevelopmentWithGPUs page.
There are 10 machines available for general development and validation of the online reconstruction on GPUs:
gpu-c2a02-35-01.cms
gpu-c2a02-35-02.cms
gpu-c2a02-37-01.cms
gpu-c2a02-37-02.cms
gpu-c2a02-37-03.cms (currently without a GPU)
gpu-c2a02-37-04.cms
gpu-c2a02-39-01.cms
gpu-c2a02-39-02.cms
gpu-c2a02-39-03.cms (Preferred)
gpu-c2a02-39-04.cms
These are dedicated machines for the development of the online reconstruction.
To access them, you will first need a CMS online account. See below for instructions.
Request a CMS Online account
To request access, please subscribe to the cms-hlt-gpu e-group and send an email to andrea.bocci@cern.ch, indicating:
- whether you already have an online account;
- your online or lxplus username;
- your full name and email.
How to connect
Requirements:
- Have a CMS online account.
- Be in the gpudev group.
To connect directly from your computer:
- Create a proxy:
ssh -f -N -D18080 <username>@cmsusr.cern.ch
- Connect via SSH:
ssh -o ProxyCommand='nc --proxy localhost:18080 --proxy-type socks5 %h %p' <username>@gpu-c2a02-39-02.cms
or
ssh -o ProxyCommand='nc -x localhost:18080 -X 5 %h %p' <username>@gpu-c2a02-39-02.cms
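The two SSH invocations above differ only in the netcat flavour used for the ProxyCommand: the long `--proxy`/`--proxy-type` options are understood by Nmap's ncat, while `-x`/`-X 5` is the OpenBSD netcat syntax. As a rough sketch, the Python helper below builds either command line so it can be checked before running it. The function names and the user `jdoe` are hypothetical and only serve as illustration; the proxy address assumes the SOCKS proxy from the first step is listening on port 18080.

```python
import shlex

# Assumption: the SOCKS5 proxy was opened with `ssh -f -N -D18080 <username>@cmsusr.cern.ch`.
SOCKS_PROXY = "localhost:18080"


def proxy_command(proxy=SOCKS_PROXY, bsd_netcat=True):
    """Return the ProxyCommand string for one of the two netcat flavours."""
    if bsd_netcat:
        # OpenBSD netcat: -x gives the proxy address, -X 5 selects SOCKS5.
        return f"nc -x {proxy} -X 5 %h %p"
    # Nmap ncat: long options for the same settings.
    return f"nc --proxy {proxy} --proxy-type socks5 %h %p"


def ssh_argv(username, host, bsd_netcat=True):
    """Build the ssh argument list for a P5 GPU machine (hypothetical helper)."""
    option = f"ProxyCommand={proxy_command(bsd_netcat=bsd_netcat)}"
    return ["ssh", "-o", option, f"{username}@{host}"]


if __name__ == "__main__":
    # Print a copy-pasteable command line; "jdoe" is a placeholder username.
    print(shlex.join(ssh_argv("jdoe", "gpu-c2a02-39-02.cms")))
```

Passing `bsd_netcat=False` yields the `--proxy` variant. If one form fails on your system, the other is worth trying, since which netcat is installed as `nc` varies between platforms.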
Note
More detailed instructions here
Special configuration required
- To make commands like cmsenv and cmsrel available, first run:
source /cvmfs/cms.cern.ch/cmsset_default.sh
- To allow connecting to GitHub via HTTP:
  - Configure the SOCKS proxy.
  - Open the proxy:
ssh -f -N cmsusr.cms
  - Configure git:
git config --global --replace-all http.proxy socks5://localhost:18080
- Set the correct SCRAM_ARCH for these machines:
export SCRAM_ARCH=el8_amd64_gcc10
Notes
- These machines are in a different subnet from the LXPLUS machines.
- As a side effect, these machines do not have access to the Grid.
- The /nfshome0/<username> directory is shared and available from all the machines above, but has limited space.
- The /data/user/<username> directory is not shared across the machines, but has larger capacity.
- CMS Cluster Users Guide
- Use curl with the --socks5 flag like so:
curl --socks5 socks5://localhost:18080 <url>
Useful commands
Transferring files to/from P5 machines
From your own computer:
scp -r -o ProxyCommand='nc -x localhost:18080 -X 5 %h %p' <username>@gpu-c2a02-39-01.cms:/remote/path /local/path
Using the -x and -X options of nc here, rather than --proxy, prevents the nc: invalid option -- '-' error.
Special GPU nodes
Info
This section is largely taken from the Patatrack website's systems subpage.
cmg-gpu1080
System information
Getting access to the machine
To get access to the machine, request a subscription to the CERN e-group cms-gpu-devel.
You should also send an email to Felice Pantaleo explaining why you need access.
Usage Policy
Normally, no more than 1 GPU per user should be used. To limit the visible devices, use
export CUDA_VISIBLE_DEVICES=<list of numbers>
where <list of numbers> can be, for example, 0 or 0,4 or 1,2,3. Use nvidia-smi to check available resources.
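The same restriction can be applied from inside a Python program, as long as the variable is set before any CUDA-using library initializes the GPU. The sketch below is only an illustration: the function name `limit_visible_gpus` is hypothetical, and the index you pass should be a device that nvidia-smi shows as free.

```python
import os


def limit_visible_gpus(indices):
    """Expose only the given GPU indices to CUDA libraries imported afterwards."""
    value = ",".join(str(i) for i in indices)
    # CUDA reads this variable when it initializes, so set it first.
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    # Keep to a single GPU, in line with the usage policy.
    print(limit_visible_gpus([0]))
    # Import GPU frameworks only after this point.
```

Setting the variable in the shell with export has the same effect and also covers non-Python programs.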
Usage for ML studies
If you need to use the machine for training DNNs you could accidentally occupy all the GPUs, making them unavailable for other users.
For this reason you're kindly asked to use
import setGPU
before any import that will use a GPU (e.g. tensorflow). This will assign to you the least loaded GPU on the system.
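To give an idea of what "least loaded" can mean in practice, the sketch below picks the GPU with the least memory in use, based on the output of an nvidia-smi query. This is not necessarily how setGPU makes its choice: the selection criterion and the function names are assumptions made for illustration, and on the machine itself you should keep using import setGPU as requested.

```python
import subprocess

# nvidia-smi query that prints one "index, used memory in MiB" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used",
    "--format=csv,noheader,nounits",
]


def least_loaded_gpu(csv_text):
    """Return the index of the GPU with the least memory in use."""
    usage = []
    for line in csv_text.strip().splitlines():
        index, used_mib = (field.strip() for field in line.split(","))
        usage.append((int(used_mib), int(index)))
    if not usage:
        raise ValueError("no GPUs reported")
    # Tuples compare by used memory first, then by index on ties.
    return min(usage)[1]


def pick_gpu():
    """Run the query on the current machine and return the chosen index."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return least_loaded_gpu(result.stdout)
```

Calling `pick_gpu()` requires nvidia-smi to be installed, whereas `least_loaded_gpu` can be tried on any text in the same two-column format.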
It is strictly forbidden to use GPUs from within your Jupyter notebook. Please export your notebook to a Python program and execute it. Access to the machine will be revoked if you fail to comply with this rule.