In SLURM jobs, use the $TMPDIR variable to set the directory for the scratch files of your programs, for example:
export SCRATCH=$TMPDIR
export GAUSS_SCRDIR=$TMPDIR
The $TMPDIR directories are accessible only within SLURM jobs.
For the purposes of calculations, temporary disk space is provided for storing files that are actively used by user programs. Each job is assigned at least one directory for temporary files, with its own unique path stored in a TMPDIR_XXX variable, where XXX depends on the type of temporary disk space (XXX = LUSTRE, LOCAL or SHM).
The user has access only to the directories of their own jobs.
Current list of partitions and their available TMPDIR
The current list of TMPDIR spaces for each SLURM partition, along with the default types of temporary disk space, can be found in the SLURM partitions tables, in the "Available TMPDIR" column.
TMPDIR directories for finished jobs
The TMPDIR directory is kept for 14 days after the job finishes. Stored TMPDIR directories can be found at /lustre/tmp/slurm/finished_jobs/${SLURM_JOB_ID}.
Backup copies of the temporary directories are not created
There is no possibility to recover data from $TMPDIR. Contents of the TMPDIR are not protected by WCSS and may be deleted or lost without warning. Users should secure their important results on their own.
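A common pattern is to copy important results out of TMPDIR at the end of a batch script. A minimal sketch, assuming the program writes its output to a results/ subdirectory of TMPDIR (the directory name is only an example):

```bash
# End of a batch script: secure results before the job finishes.
# $SLURM_SUBMIT_DIR points to the directory the job was submitted from.
cp -r "$TMPDIR"/results "$SLURM_SUBMIT_DIR"/
```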
| Property | TMPDIR_LUSTRE |
|---|---|
| Available on partitions | bem2-cpu-short, bem2-cpu-normal, bem2-cpu-interactive, lem-cpu-short, lem-cpu-normal, lem-cpu-interactive, lem-gpu-short, lem-gpu-normal, lem-gpu-interactive |
| Path | TMPDIR_LUSTRE=/lustre/tmp/slurm/${SLURM_JOB_ID} |
| GRES option | --gres=storage:lustre:1 |
| Size | … (partitions bem2-cpu-short and bem2-cpu-normal), 360T (partitions lem-cpu-short, lem-cpu-normal, lem-gpu-short, lem-gpu-normal) |
| Description | A shared file system simultaneously available on all compute servers in a given partition. It is primarily used for multi-node compute jobs where different nodes operate on the same set of files. |
Access to TMPDIR_LUSTRE
The /lustre/tmp/slurm/$SLURM_JOB_ID directories are available only within SLURM jobs and are not directly visible after logging in to ui.wcss.pl. To browse files under TMPDIR, first start an interactive session, for example using the sub-interactive command.
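A minimal sketch of such a session (the job ID placeholder must be replaced with the number of your own job):

```bash
# Start an interactive session on a compute node (sub-interactive opens a shell there).
sub-interactive
# Inside that shell, change into the TMPDIR of one of your jobs;
# replace <JOB_ID> with the actual job number.
cd /lustre/tmp/slurm/<JOB_ID>
```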
Separate Lustre TMP filesystems
The Bem2 supercomputer (partitions bem2-cpu-short and bem2-cpu-normal) and the LEM supercomputer (partitions lem-cpu[-short,normal], lem-gpu[-short,normal]) have their own temporary Lustre file systems: directories under the path /lustre/tmp on the Bem2 supercomputer are not available on the LEM supercomputer and vice versa. When using the Lustre file system, please read the terms of use on the Lustre file system page.
| Property | TMPDIR_LOCAL |
|---|---|
| Available on partitions | lem-cpu-short, lem-cpu-normal, lem-cpu-interactive, lem-gpu-short, lem-gpu-normal, lem-gpu-interactive |
| Path | TMPDIR_LOCAL=/mnt/lscratch/slurm/${SLURM_JOB_ID} |
| GRES option | --gres=storage:local:<QUOTA> |
| Size | … (partitions lem-cpu-short and lem-cpu-normal); 7000G (partitions lem-gpu-short and lem-gpu-normal) |
| Description | Local file system created on NVMe disks, accessible within a single compute node. It is primarily used for single-node compute jobs that have large disk space requirements (over 50GB) and perform many IO (Input/Output) operations. |

<QUOTA> values (e.g. 100M, 1G, 100G, 200G ...):

| Partitions | Default | Maximum |
|---|---|---|
| lem-cpu-short, lem-cpu-normal | 200G | 3400G |
| lem-gpu-short, lem-gpu-normal | 200G | 7000G |
| lem-gpu-interactive | 200G | 1500G |
| lem-cpu-interactive | 50G | 400G |
| Property | TMPDIR_SHM |
|---|---|
| Available on partitions | bem2-cpu-short, bem2-cpu-normal, bem2-cpu-interactive, lem-cpu-short, lem-cpu-normal, lem-cpu-interactive, lem-gpu-short, lem-gpu-normal, lem-gpu-interactive |
| Path | TMPDIR_SHM=/dev/shm/slurm/${SLURM_JOB_ID} |
| GRES option | --gres=storage:shm:<QUOTA>, where <QUOTA> is e.g. 100M, 1G, 100G, 200G ... |
| Description | Local file space located in the RAM cache (path /dev/shm), available within a single compute node. Primarily used for single-node compute jobs with low disk space requirements (below 50GB) and many IO (Input/Output) operations. |
Using SHM storage and memory allocation in SLURM jobs
When using the SHM storage (option --gres=storage:shm:<QUOTA>), the amount of <QUOTA> space requested is automatically added to the job's memory requirement (the one declared with the --mem option). For example, if a job requires 5GB of memory for running programs (declared via --mem=5G) and needs an additional 50GB of space in SHM (declared via --gres=storage:shm:50G), then the total RAM requirement for this job will be 50G + 5G = 55G, which will be automatically taken into account when queuing the job (a new value --mem=55G will be set).
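A submission matching this scenario might look as follows (a sketch only; partition, time limit and the {...} program placeholder are illustrative):

```bash
# 5G for the programs plus 50G of SHM scratch: SLURM will account for 55G of RAM.
srun -p bem2-cpu -N 1 -c 1 -t 01:00:00 --mem=5G --gres=storage:shm:50G {...}
```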
The user declares the TMPDIR type when submitting a SLURM job via so-called GRES (Generic Resource Scheduling) resources, using the --gres=storage:<XXX>:<QUANTITY> option, where <XXX> is the type of temporary disk space and <QUANTITY> declares the maximum allowed occupancy of the TMPDIR directory (the so-called quota), for which the prefixes M (MiB) and G (GiB) can be used. When using a batch file for sbatch, provide the #SBATCH --gres=storage:<XXX>:<QUANTITY> option.
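For instance, both forms of requesting 20G of local scratch might look like this (the quota value and partition are only illustrative; detailed examples follow below):

```bash
# srun form
srun -p lem-gpu -N 1 -c 1 -t 01:00:00 --mem=1G --gres=storage:local:20G {...}

# equivalent directive inside an sbatch script
#SBATCH --gres=storage:local:20G
```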
Only those temporary disk space directories are created that were declared during job submission (or the default directories for a given partition). For example, if the user specifies only --gres=storage:lustre, only the directory under the path stored in the $TMPDIR_LUSTRE variable will be created, and the $TMPDIR_SHM and $TMPDIR_LOCAL directories will not be available!
Usage of several types of TMPDIR within jobs
It is possible to specify several types of temporary disk space (depending on their availability on a given SLURM partition), separating them with a comma, e.g. --gres=storage:lustre,storage:local:10G. Additionally, if the user uses other types of GRES, e.g. GPU cards, they should also be specified after a comma, e.g. --gres=storage:lustre,storage:local:10G,gpu:hopper:2; see the sketch below.
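A sketch of an sbatch header combining storage GRES with GPU GRES, assuming the GPU type and counts from the example above (all other values are illustrative):

```bash
#!/bin/bash
#SBATCH -p lem-gpu
#SBATCH -N 1
#SBATCH -t 01:00:00
#SBATCH --mem=8G
# Lustre scratch, 10G of local scratch and 2 Hopper GPUs in a single request
#SBATCH --gres=storage:lustre,storage:local:10G,gpu:hopper:2
{...}
```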
If the user does not specify the type of TMPDIR when submitting a SLURM job, the default TMPDIR_XXX directory type will be assigned to the job.
The default TMPDIR_XXX directory types are assigned depending on the partition and the type of job: single- or multi-node. The default directory types for each partition are marked in the tables on the SLURM partitions page, in the "Available TMPDIR" column, using "*" for single-node jobs and "**" for multi-node jobs.
Examples:
- When submitting a job to the bem2-cpu-short partition without specifying the --gres=storage option, the job will be allocated Lustre temporary storage by default, i.e. TMPDIR_LUSTRE=/lustre/tmp/slurm/${SLURM_JOB_ID}.
- When submitting a single-node job to the lem-gpu-normal partition, the job will be allocated local temporary storage by default, i.e. TMPDIR_LOCAL=/mnt/lscratch/slurm/${SLURM_JOB_ID}.
- When submitting a multi-node job to the lem-gpu-normal partition, temporary Lustre disk space will be allocated for such a job by default, i.e. TMPDIR_LUSTRE=/lustre/tmp/slurm/${SLURM_JOB_ID}.
For the duration of the job, the appropriate TMPDIR directories are created and the corresponding TMPDIR_XXX variables are exported; if several types of TMPDIR are selected, several directories and several TMPDIR_XXX variables are created.
For the duration of the job, a global variable TMPDIR is also created, which is set to one of the available variables TMPDIR_LUSTRE, TMPDIR_SHM or TMPDIR_LOCAL.
TMPDIR preference
When the user has declared the use of more than one TMPDIR space, the $TMPDIR variable is set to the one with the highest preference according to the order TMPDIR_LUSTRE < TMPDIR_SHM < TMPDIR_LOCAL.
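To check which directories were actually created for a running job, it is enough to print the exported variables; a trivial sketch:

```bash
# Inside a job script or an interactive session: list the exported TMPDIR_* variables
# and the directory that $TMPDIR finally points to.
env | grep '^TMPDIR'
echo "Scratch directory for this job: $TMPDIR"
```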
To use the default TMPDIR type for a selected partition, simply do not declare the --gres=storage:<XXX> option. For example:
$ srun -p bem2-cpu -N 1 -c 1 -t 1 --mem=1G {...}
For the duration of such a job, the following variables and directories will be created:
- TMPDIR_LUSTRE = /lustre/tmp/slurm/${SLURM_JOB_ID}, with no limit on TMPDIR directory occupancy
- TMPDIR = ${TMPDIR_LUSTRE}
To explicitly request the Lustre temporary storage:
$ srun -p bem2-cpu -N 1 -c 1 -t 1 --mem=1G --gres=storage:lustre {...}
or in the sbatch script:
#!/bin/bash
#SBATCH -p bem2-cpu
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -t 1
#SBATCH --mem=1G
#SBATCH --gres=storage:lustre
{...}
For the duration of such a job, the following variables and directories will be created:
- TMPDIR_LUSTRE = /lustre/tmp/slurm/${SLURM_JOB_ID}, with no limit on TMPDIR directory occupancy
- TMPDIR = ${TMPDIR_LUSTRE}
To allocate local temporary storage with a maximum $TMPDIR_LOCAL occupancy of 10GB:
$ srun -p lem-gpu -N 1 -c 1 -t 01:00:00 --mem=1G --gres=storage:local:10G {...}
or in the sbatch script:
#!/bin/bash
#SBATCH -p lem-gpu
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -t 01:00:00
#SBATCH --mem=1G
#SBATCH --gres=storage:local:10G
{...}
For the duration of such a job, the following variables and directories will be created:
- TMPDIR_LOCAL = /mnt/lscratch/slurm/${SLURM_JOB_ID}, with a 10GB limit on the TMPDIR occupancy
- TMPDIR = ${TMPDIR_LOCAL}
To allocate SHM temporary storage with a maximum $TMPDIR_SHM occupancy of 100GB:
$ srun -p bem2-cpu -N 1 -c 1 -t 01:00:00 --mem=1G --gres=storage:shm:100G {...}
or in the sbatch script:
#!/bin/bash
#SBATCH -p bem2-cpu
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -t 01:00:00
#SBATCH --mem=1G
#SBATCH --gres=storage:shm:100G
{...}
For the duration of such a job, the following variables and directories will be created:
- TMPDIR_SHM = /dev/shm/slurm/${SLURM_JOB_ID}, with a 100GB limit on the TMPDIR occupancy
- TMPDIR = ${TMPDIR_SHM}
In the case of such a job, the total memory requirement will be 1G + 100G = 101G, which will be taken into account when queuing the job and allocating resources on the compute nodes.
To allocate several TMPDIR types in a multi-node job (Lustre plus 10GB of local storage):
$ srun -p lem-gpu -N 2 -c 1 -t 01:00:00 --mem=1G --gres=storage:local:10G,storage:lustre {...}
or in the sbatch script:
#!/bin/bash
#SBATCH -p lem-gpu
#SBATCH -N 2
#SBATCH -c 1
#SBATCH -t 01:00:00
#SBATCH --mem=1G
#SBATCH --gres=storage:local:10G,storage:lustre
{...}
For the duration of such a job, the following variables and directories will be created:
- TMPDIR_LUSTRE = /lustre/tmp/slurm/${SLURM_JOB_ID}, with no limit on TMPDIR directory occupancy
- TMPDIR_LOCAL = /mnt/lscratch/slurm/${SLURM_JOB_ID}, with a 10GB limit on the TMPDIR occupancy
- TMPDIR = ${TMPDIR_LOCAL}
After the job is finished, the entire contents of the TMPDIR directory are automatically moved to the Lustre TMP file system and stored there for the next 14 days. The saved TMPDIR directories are located under the path /lustre/tmp/slurm/finished_jobs/${SLURM_JOB_ID}, where ${SLURM_JOB_ID} is the number of the finished job.
This allows users to, among other things, restart calculations from checkpoint files saved in TMPDIR or access output data that was not directly moved to the home directory (e.g. due to its very large size). The TMPDIR directory is moved to the Lustre TMP file system regardless of the type of disk storage it was originally located on: local disk (TMPDIR_LOCAL), RAM (TMPDIR_SHM) or the Lustre TMP system (TMPDIR_LUSTRE).
Browsing the contents of old TMPDIR directories is possible, for example, within an appropriate interactive job:
- sub-interactive - for jobs that ran on the Bem2 supercomputer (partitions bem2-cpu-short, bem2-cpu-normal, bem2-cpu-interactive)
- sub-interactive-lem-cpu - for jobs that ran on the LEM supercomputer (partitions lem-cpu-short, lem-cpu-normal, lem-cpu-interactive, lem-gpu-short, lem-gpu-normal, lem-gpu-interactive)

Different Lustre TMP file systems
Due to the separation of the Lustre TMP systems on the Bem2 and LEM supercomputers, the saved TMPDIR directories are only accessible from the compute nodes of the same supercomputer on which the job was calculated: Bem2 or LEM. Moreover, the /lustre/tmp/slurm/finished_jobs/${SLURM_JOB_ID} directories are only accessible from within computational jobs.
Job 1234567 ended due to exceeding the maximum job time (TIMEOUT). In the case of the software used, it is possible to restart the calculations using a checkpoint file, which was continuously saved in the TMPDIR directory during the job.
$ sacct -X -j 1234567
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1234567 bash lem-cpu-s+ kdm-staff 1 TIMEOUT 0:0
The job ran on the lem-cpu-short partition (shortened by the command to lem-cpu-s+), i.e. on the LEM supercomputer. Therefore, an interactive job should be started using the sub-interactive-lem-cpu command (starting the interactive job may take up to several minutes due to booting up a powered-off server):
$ sub-interactive-lem-cpu
(use `sub-interactive -h` for help)
An interactive job is being submitted with the following parameters:
job_name interactive
service (D) kdm-staff
partition lem-cpu-interactive
nodes 1
cores 1
memory 3 GB
time limit 1 hours
reservation
x11 disabled
gpu disabled
$ cd /lustre/tmp/slurm/finished_jobs/1234567
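From here the contents of the finished job's TMPDIR can be inspected and the checkpoint file copied out for the restarted calculation. A sketch only; the actual file name depends on the program used (calc.chk is hypothetical):

```bash
# List what the finished job left behind in its TMPDIR
ls -lh
# Copy the (hypothetical) checkpoint file back to the home directory
# so the restarted job can pick it up
cp calc.chk "$HOME"/
```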