In SLURM jobs, use the $TMPDIR variable to set the directory for the scratch files of your programs, for example:
export SCRATCH=$TMPDIR
export GAUSS_SCRDIR=$TMPDIR

The $TMPDIR directories are accessible only within SLURM jobs.
For computational purposes, temporary disk space is provided for storing files that are actively used by user programs. Each job is assigned at least one directory for temporary files, with its own unique path stored in the TMPDIR_XXX variable, where XXX depends on the type of temporary disk space (XXX = LUSTRE, LOCAL or SHM).
Users have access only to the directories of their own jobs.
Current list of partitions and their available TMPDIR
The current list of TMPDIR spaces for each SLURM partition, along with the default types of temporary disk spaces, can be found in the SLURM partitions tables, in the "Available TMPDIR" columns.
TMPDIR directories for finished jobs
The TMPDIR directory is kept for the next 14 days after the job finishes. Stored TMPDIR directories can be found at /lustre/tmp/slurm/finished_jobs/${SLURM_JOB_ID}.
Backup copies of the temporary directories are not created.
There is no possibility to recover data from $TMPDIR. Contents of the TMPDIR are not protected by WCSS and may be deleted or lost without warning. Users should secure their important results on their own.
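For example, a batch script can copy the important outputs from $TMPDIR back to permanent storage before the job ends. A minimal sketch, assuming a hypothetical output_files directory and results destination (the job parameters are only illustrative):

```bash
#!/bin/bash
#SBATCH -p bem2-cpu-short
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -t 01:00:00
#SBATCH --mem=1G

cd "$TMPDIR"

# ... run the actual computation here, writing scratch files to $TMPDIR ...

# Copy the results worth keeping back to permanent storage before the job ends
mkdir -p "$HOME/results/$SLURM_JOB_ID"
cp -r output_files "$HOME/results/$SLURM_JOB_ID/"
```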
- Partitions: bem2-cpu-short, bem2-cpu-normal, bem2-cpu-interactive, lem-cpu-short, lem-cpu-normal, lem-cpu-interactive, lem-gpu-short, lem-gpu-normal, lem-gpu-interactive
- TMPDIR variable: TMPDIR_LUSTRE=/lustre/tmp/slurm/${SLURM_JOB_ID}
- GRES option: --gres=storage:lustre
- Size: … (partitions bem2-cpu-short and bem2-cpu-normal), 360T (partitions lem-cpu-short, lem-cpu-normal, lem-gpu-short, lem-gpu-normal)

A shared file system simultaneously available on all compute servers in a given partition.
It is primarily used for multi-node compute jobs where different nodes operate on the same set of files.
Access to TMPDIR_LUSTRE
The /lustre/tmp/slurm/${SLURM_JOB_ID} directories are available only within SLURM jobs and are not directly visible after logging in to ui.wcss.pl. To browse files under TMPDIR, start an interactive session first, for example using the sub-interactive command.
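For example (a sketch; the job ID is only illustrative):

```bash
# Start an interactive session on the same supercomputer as the job
$ sub-interactive

# Then browse the job's Lustre TMPDIR, e.g. for a job with ID 1234567:
$ ls -lh /lustre/tmp/slurm/1234567
```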
Separate Lustre TMP filesystems
The Bem2 supercomputer (partitions bem2-cpu-short and bem2-cpu-normal) and the LEM supercomputer (partitions lem-cpu[-short,normal], lem-gpu[-short,normal]) have their own temporary Lustre file systems: directories under the path /lustre/tmp on the Bem2 supercomputer are not available on the LEM supercomputer and vice versa. When using the Lustre file system, please read the terms of use located on the Lustre file system page.
- Partitions: lem-cpu-short, lem-cpu-normal, lem-cpu-interactive, lem-gpu-short, lem-gpu-normal, lem-gpu-interactive
- TMPDIR variable: TMPDIR_LOCAL=/mnt/lscratch/slurm/${SLURM_JOB_ID}
- GRES option: --gres=storage:local:<QUOTA>
- Size: … (partitions lem-cpu-short and lem-cpu-normal); 7000G (partitions lem-gpu-short and lem-gpu-normal)

Default and maximum values of <QUOTA> (e.g. 100M, 1G, 100G, 200G ...) per partition:

| Partitions | Default | Maximum |
|---|---|---|
| lem-cpu-short, lem-cpu-normal | 200G | 3400G | 
| lem-gpu-short, lem-gpu-normal | 200G | 7000G | 
| lem-gpu-interactive | 200G | 1500G | 
| lem-cpu-interactive | 50G | 400G | 
Local file system created on NVMe disks, accessible within a single compute node.
It is primarily used for single-node compute jobs that have large disk space requirements (over 50GB) and perform many IO (Input/Output) operations.
- Partitions: bem2-cpu-short, bem2-cpu-normal, bem2-cpu-interactive, lem-cpu-short, lem-cpu-normal, lem-cpu-interactive, lem-gpu-short, lem-gpu-normal, lem-gpu-interactive
- TMPDIR variable: TMPDIR_SHM=/dev/shm/slurm/${SLURM_JOB_ID}
- GRES option: --gres=storage:shm:<QUOTA>
- Quota: <QUOTA> (e.g. 100M, 1G, 100G, 200G ...)

Local file space located in the RAM cache (path /dev/shm), available within a single compute node.
Primarily used for single-node compute jobs with low disk space requirements (below 50GB) and many IO (Input/Output) operations.
Using SHM storage and memory allocation in SLURM jobs
When using the SHM storage (option --gres=storage:shm:<QUOTA>), the amount of <QUOTA> space needed will be automatically added to the job's memory requirement (the one declared using the --mem option). For example, if a job requires 5GB of memory for running programs (declared via the option --mem=5G) and needs an additional 50GB of space in SHM (declared via the option --gres=storage:shm:50G), then the total RAM requirement for this job will be 50G + 5G = 55G, which will be automatically taken into account when queuing the job (a new value --mem=55G will be set).
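A minimal sketch of such a submission, using the values from the example above (the partition and time limit are only illustrative):

```bash
# 5G of RAM for the programs plus 50G of SHM scratch space;
# SLURM will schedule the job as if --mem=55G had been requested.
$ srun -p bem2-cpu-short -N 1 -c 1 -t 01:00:00 --mem=5G --gres=storage:shm:50G {...}
```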
The user declares the TMPDIR type when submitting a SLURM job via the so-called GRES resources (Generic Resource Scheduling) using the --gres=storage:<XXX>:<QUANTITY> option, where <XXX> is the type of temporary disk space, and <QUANTITY> declares the maximum possible occupancy of the TMPDIR directory (the so-called quota), for which the prefixes M (MiB) and G (GiB) can be used. When using a batch file for sbatch, the #SBATCH --gres=storage:<XXX>:<QUANTITY> option should be provided.
Only the temporary disk space directories that were declared during job submission (or the default directories for a given partition) are created. For example, if the user specifies only --gres=storage:lustre, only the directory under the path stored in the $TMPDIR_LUSTRE variable will be created, and the $TMPDIR_SHM and $TMPDIR_LOCAL directories will not be available!
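Because of this, it can be worth checking inside the job script that the expected variable is actually set, for example (a minimal sketch):

```bash
# Fail early if the job was submitted without --gres=storage:local:<QUOTA>
if [ -z "${TMPDIR_LOCAL:-}" ]; then
    echo "TMPDIR_LOCAL is not set - resubmit with --gres=storage:local:<QUOTA>" >&2
    exit 1
fi
cd "$TMPDIR_LOCAL"
```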
Usage of several types of TMPDIR within jobs
It is possible to specify several types of temporary disk spaces (depending on their availability on a given SLURM partition), separating them with a comma, e.g. --gres=storage:lustre,storage:local:10G. Additionally, if the user uses other types of GRES, e.g. GPU cards, they should also be specified after a comma, e.g. --gres=storage:lustre,storage:local:10G,gpu:hopper:2.
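For example, a GPU job combining these GRES types could be submitted as follows (a sketch; the partition, CPU, memory and time values are only illustrative):

```bash
$ srun -p lem-gpu-normal -N 1 -c 8 -t 01:00:00 --mem=16G \
       --gres=storage:lustre,storage:local:10G,gpu:hopper:2 {...}
```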
If the user does not specify the type of TMPDIR when submitting a SLURM task, the default directory type TMPDIR_XXX will be assigned for the task.
The default directory types TMPDIR_XXX are assigned depending on the partition and the type of job - single-node or multi-node. The default directory types for each partition are marked in the tables in the SLURM partitions section, in the "Available TMPDIR" column, using the characters "*" for single-node jobs and "**" for multi-node jobs.
Examples:
- If a job is submitted to the bem2-cpu-short partition without specifying the --gres=storage option, the job will be allocated Lustre temporary storage by default, i.e. TMPDIR_LUSTRE=/lustre/tmp/slurm/${SLURM_JOB_ID}.
- If a single-node job is submitted to the lem-gpu-normal partition, the job will be allocated local temporary storage by default, i.e. TMPDIR_LOCAL=/mnt/lscratch/slurm/${SLURM_JOB_ID}.
- If a multi-node job is submitted to the lem-gpu-normal partition, then by default temporary Lustre disk space will be allocated for such a job, i.e. TMPDIR_LUSTRE=/lustre/tmp/slurm/${SLURM_JOB_ID}.

For the duration of the job, the appropriate TMPDIR directories are created and the corresponding TMPDIR_XXX variables are exported; if several types of TMPDIR are selected, several directories and several TMPDIR_XXX variables are created.
For the duration of the job, a global variable TMPDIR is also created, which is set to one of the available variables TMPDIR_LUSTRE, TMPDIR_SHM or TMPDIR_LOCAL.
TMPDIR preference
When the user has declared the use of more than one TMPDIR space, the $TMPDIR variable is set to the one with the highest preference according to the ordering TMPDIR_LUSTRE < TMPDIR_SHM < TMPDIR_LOCAL.
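For example, inside a job submitted with --gres=storage:lustre,storage:local:10G, the following check (a minimal sketch) would show that $TMPDIR points at the local space:

```bash
echo "TMPDIR_LUSTRE = $TMPDIR_LUSTRE"
echo "TMPDIR_LOCAL  = $TMPDIR_LOCAL"
echo "TMPDIR        = $TMPDIR"   # expected to equal $TMPDIR_LOCAL (highest preference)
```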
To use the default TMPDIR type for a selected partition, simply do not declare the --gres=storage:<XXX> option. For example:
$ srun -p bem2-cpu -N 1 -c 1 -t 1 --mem=1G {...}
For the duration of such a job, the following variables and directories will be created:
- TMPDIR_LUSTRE = /lustre/tmp/slurm/${SLURM_JOB_ID}, with no limit on TMPDIR directory occupancy
- TMPDIR = ${TMPDIR_LUSTRE}

To request the Lustre temporary storage explicitly:

$ srun -p bem2-cpu -N 1 -c 1 -t 1 --mem=1G --gres=storage:lustre {...}
or in the sbatch script:
#!/bin/bash
#SBATCH -p bem2-cpu
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -t 1
#SBATCH --mem=1G
#SBATCH --gres=storage:lustre
{...}
For the duration of such a job, the following variables and directories will be created:
- TMPDIR_LUSTRE = /lustre/tmp/slurm/${SLURM_JOB_ID}, with no limit on TMPDIR directory occupancy
- TMPDIR = ${TMPDIR_LUSTRE}

To request local temporary storage with a maximum $TMPDIR_LOCAL occupancy of 10GB:

$ srun -p lem-gpu -N 1 -c 1 -t 01:00:00 --mem=1G --gres=storage:local:10G {...}
or in the sbatch script:
#!/bin/bash
#SBATCH -p lem-gpu
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -t 01:00:00
#SBATCH --mem=1G
#SBATCH --gres=storage:local:10G
{...}
For the duration of such a job, the following variables and directories will be created:
- TMPDIR_LOCAL = /mnt/lscratch/slurm/${SLURM_JOB_ID}, with a 10GB limit on the TMPDIR occupancy
- TMPDIR = ${TMPDIR_LOCAL}

To request SHM temporary storage with a maximum occupancy of 100GB:

$ srun -p bem2-cpu -N 1 -c 1 -t 01:00:00 --mem=1G --gres=storage:shm:100G {...}
or in the sbatch script:
#!/bin/bash
#SBATCH -p bem2-cpu
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -t 01:00:00
#SBATCH --mem=1G
#SBATCH --gres=storage:shm:100G
{...}
For the duration of such a job, the following variables and directories will be created:
- TMPDIR_SHM = /dev/shm/slurm/${SLURM_JOB_ID}, with a 100GB limit on the TMPDIR occupancy
- TMPDIR = ${TMPDIR_SHM}

In the case of such a job, the total memory requirement will be 1G + 100G = 101G, which will be taken into account when queuing the job and allocating resources on the compute nodes.

To request both Lustre and local temporary storage, here in a two-node job:
$ srun -p lem-gpu -N 2 -c 1 -t 01:00:00 --mem=1G --gres=storage:local:10G,storage:lustre {...}
or in the sbatch script:
#!/bin/bash
#SBATCH -N 2
#SBATCH -p lem-gpu
#SBATCH -c 1
#SBATCH -t 01:00:00
#SBATCH --mem=1G
#SBATCH --gres=storage:local:10G,storage:lustre
{...}
For the duration of such a job, the following variables and directories will be created:
- TMPDIR_LUSTRE = /lustre/tmp/slurm/${SLURM_JOB_ID}, with no limit on TMPDIR directory occupancy
- TMPDIR_LOCAL = /mnt/lscratch/slurm/${SLURM_JOB_ID}, with a 10GB limit on the TMPDIR occupancy
- TMPDIR = ${TMPDIR_LOCAL}

After the job is finished, the entire contents of the TMPDIR directory are automatically moved to the Lustre TMP file system and stored there for the next 14 days. The saved TMPDIR directories are located under the path /lustre/tmp/slurm/finished_jobs/${SLURM_JOB_ID}, where ${SLURM_JOB_ID} is the number of the finished job.
This allows users to, among other things, restart calculations from checkpoint files saved in TMPDIR or access output data that was not directly moved to the home directory (e.g. due to its very large size). The TMPDIR directory is moved to the Lustre TMP file system regardless of the type of disk storage it was originally located on - local disk (TMPDIR_LOCAL), RAM (TMPDIR_SHM) or Lustre TMP system (TMPDIR_LUSTRE).
Browsing the contents of old TMPDIR directories is possible, for example, within the appropriate interactive job:
- sub-interactive - for jobs that ran on the Bem2 supercomputer (partitions bem2-cpu-short, bem2-cpu-normal, bem2-cpu-interactive)
- sub-interactive-lem-cpu - for jobs that ran on the LEM supercomputer (partitions lem-cpu-short, lem-cpu-normal, lem-cpu-interactive, lem-gpu-short, lem-gpu-normal, lem-gpu-interactive)

Different Lustre TMP file systems
Due to the separation of the Lustre TMP systems on the Bem2 and LEM supercomputers, the saved TMPDIR directories are only accessible from the compute nodes of the same supercomputer on which the job was calculated - Bem2 or LEM. Moreover, the /lustre/tmp/slurm/finished_jobs/${SLURM_JOB_ID} directories are only accessible from computational jobs.
Job 1234567 ended due to exceeding the maximum job time (TIMEOUT). In the case of the software used, it is possible to restart calculations using a checkpoint file, which was continuously saved in the TMPDIR directory during the job.
$ sacct -X -j 1234567
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1234567            bash lem-cpu-s+  kdm-staff          1    TIMEOUT      0:0
The job ran on the lem-cpu-short partition (shortened by the command to lem-cpu-s+), i.e. on the LEM supercomputer.
Therefore, an interactive job should be started using the sub-interactive-lem-cpu command (starting the interactive job may take up to several minutes due to booting up a powered-off server):

$ sub-interactive-lem-cpu
(use `sub-interactive -h` for help)
An interactive job is being submitted with the following parameters:
        job_name     interactive
        service      (D) kdm-staff
        partition    lem-cpu-interactive
        nodes        1
        cores        1
        memory       3 GB
        time limit   1 hours
        reservation
        x11          disabled
        gpu          disabled
$ cd /lustre/tmp/slurm/finished_jobs/1234567
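From there, the checkpoint file can be located and copied back to permanent storage before resubmitting the calculation (a sketch; the file name restart.chk is only a placeholder):

```bash
# List the preserved TMPDIR contents of the finished job
$ ls -lh /lustre/tmp/slurm/finished_jobs/1234567

# Copy the checkpoint file (placeholder name) back to the home directory
$ cp restart.chk ~/restart.chk
```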