To start an interactive session on ui.wcss.pl, use the sub-interactive script:
$ sub-interactive [-n <NODES>] [-c <CORES>] [-m <MEMORY>] [-t <TIME_LIMIT>] [-x] [-h]
where:
-c CORES - number of cores, max. 24 (default: 1)
-n NODES - number of nodes, max. 2 (default: 1)
-m MEMORY - amount of RAM in GB, max. 45 (default: 3)
-t TIME_LIMIT - interactive session duration in hours, max. 6 (default: 1)
-x - enable X11 forwarding (default: disabled)
-h - display help message
A user may run at most two sub-interactive sessions at the same time.
For example, to start an interactive session for 6 hours using 4 CPUs, 12GB of RAM, and with X11 forwarding enabled:
$ sub-interactive -t 6 -c 4 -m 12 -x
Running the sub-interactive script without additional arguments will start a short interactive session with default resources: 1 node, 1 CPU, 1 hour, 3GB RAM, and no X11 forwarding.
Users can still submit interactive jobs to other partitions (e.g., short and normal) using the srun command (instructions in the section below); however, the wait time for such an interactive job to start will be significantly longer.
The sub-interactive command uses dedicated computing resources assigned to the SLURM "interactive" partition. These resources are not shared with other partitions, which significantly increases availability for users. As a result, the wait time for a sub-interactive session is much shorter than on other partitions.
The following resource limits are imposed on the "interactive" partition (and thus sub-interactive sessions) per user:
- session duration: max. 6 hours (-t 6:00:00)
- cores: max. 24 (-c 24)
- nodes: max. 2 (-N 2)
- memory: max. 45 GB (--mem 45GB)
Users are encouraged to allocate the smallest amount of resources possible for interactive tasks (tailored to their actual needs) to avoid blocking access to quick interactive tasks for others. For instance, an interactive task used to prepare a batch file via a GUI program and test it in a calculation program can likely be run on 1 CPU core with a small amount of RAM, e.g., 2GB:
$ sub-interactive -c 1 -m 2
The "interactive" partition is strictly for interactive tasks!
Submitting batch jobs using sbatch to the "interactive" partition is prohibited. Furthermore, CPU and RAM utilization efficiency on the "interactive" partition will be monitored.
To run an interactive task on any partition and in any resource configuration, use the srun command:
srun [-p partition_name] --pty program_name
where:
| Option | Description |
|---|---|
| -p partition | Specifies the partition where the task will be executed. If omitted, the task runs on the default partition. |
| --pty | Enables pseudo-terminal mode to display real-time output of the executed task. |
| program_name | The name of the command or program to execute. |
Example:
abcd@ui: ~ $ srun -p short -t 10 --pty /usr/bin/ls -l /home/abcd
To obtain a fully interactive session from the compute server, run srun with the bash program as shown below:
abcd@ui: ~ $ srun -p short -t 10 --pty bash
[wcss] abcd@r31c02b16 ~ >
A batch job can be submitted using the sbatch command. To send a job to the queue, you must first prepare a script containing resource allocation information and the programs to be executed.
Example script myjob.sh:
#!/bin/bash
#SBATCH -N1
#SBATCH -c10
#SBATCH --mem=10gb
#SBATCH --time=1:00:00
#SBATCH --mail-user=<email_address>
#SBATCH --job-name=<job_name>
source /usr/local/sbin/modules.sh
python3 program.py
Such a script can be launched using the command:
abcd@ui: ~ $ sbatch myjob.sh
Job arrays are useful when you want to execute the same script with different parameters (a so-called parameter sweep). Scripts are executed simultaneously on different nodes, and results are saved in separate files.
Example script arrayjob.sh which prints a randomly generated number between 1 and 50:
#!/bin/bash
#SBATCH -N1
#SBATCH -c5
#SBATCH --mem=250      # 250 MB (megabytes are the default unit)
#SBATCH -t1            # 1 minute
# Generate a random number between 1 and 50
rand=$(shuf -i 1-50 -n 1)
# Print the generated number (captured in the slurm output file)
echo "The generated number is: ${rand}"
Submit the job to the queue with the command:
abcd@ui: ~> sbatch --array=1-3 arrayjob.sh
Executing this script results in 3 files:
abcd@ui: ~> ls
slurm-581784_1.out slurm-581784_2.out slurm-581784_3.out
abcd@ui: ~> more slurm-*
::::::::::::::
slurm-581784_1.out
::::::::::::::
The generated number is: 47
::::::::::::::
slurm-581784_2.out
::::::::::::::
The generated number is: 37
::::::::::::::
slurm-581784_3.out
::::::::::::::
The generated number is: 32
Another example using Python. argument.py:
import sys
print(int(sys.argv[1])**2)
Next, prepare the corresponding job script arrayjob.sh, for example:
#!/bin/bash
#SBATCH -N1
#SBATCH -c5
#SBATCH --mem=250      # 250 MB (megabytes are the default unit)
#SBATCH -t1            # 1 minute
#SBATCH --array=1-5 # list of sub-task IDs
module load Python
python3 argument.py $SLURM_ARRAY_TASK_ID
And submit the job to the queue:
sbatch arrayjob.sh
This job will return 5 files with the default naming format JOBID_TASKID.out, each containing the square of the number corresponding to the sub-task ID. In general, a job array does not have to iterate by one. An iterator can be specified after a : character. For example, an array from 1 to 10 with a step of 2 would be written as --array=1-10:2.
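As a local illustration (no SLURM needed), the sub-task indices selected by --array=1-10:2 are exactly the numbers produced by seq with a step of 2:

```shell
# Sub-task indices a job submitted with --array=1-10:2 would receive
seq 1 2 10
```

This prints 1, 3, 5, 7, 9: five sub-tasks in total, which matches a task count of 5 for that array specification.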
An array job has several additional environment variables.
| Variable | Description |
|---|---|
| SLURM_ARRAY_JOB_ID | Job ID of the first job in the array |
| SLURM_ARRAY_TASK_ID | Index of the current sub-task |
| SLURM_ARRAY_TASK_COUNT | Total number of tasks in the array |
| SLURM_ARRAY_TASK_MAX | Index of the last task in the array |
| SLURM_ARRAY_TASK_MIN | Index of the first task in the array |
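A minimal sketch of a batch script that uses these variables (the ${VAR:-?} fallbacks are only there so the line also prints outside of SLURM; inside a real array job, SLURM sets the variables itself):

```shell
#!/bin/bash
#SBATCH --array=1-5
# SLURM exports the array variables to each sub-task; the :-? fallbacks
# let the script run (with placeholder values) outside of SLURM too.
echo "Task ${SLURM_ARRAY_TASK_ID:-?} of ${SLURM_ARRAY_TASK_COUNT:-?} (array job ${SLURM_ARRAY_JOB_ID:-?})"
```

Each sub-task prints its own index, so the per-task output files can be told apart at a glance.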
In the argument.py example above, if SLURM_ARRAY_JOB_ID equals 177, the first sub-task is also referred to as 177_1. Each subsequent task in the array is assigned its own SLURM_JOB_ID, which does not include the array task index.
For an array submitted with --array=1-10:2, SLURM_ARRAY_TASK_MIN will be 1, SLURM_ARRAY_TASK_MAX will be 9, and SLURM_ARRAY_TASK_COUNT will be 5.
A job consists of two sections: a resource request and an execution sequence. The resource request defines the required number of CPUs/GPUs, the job duration, the required RAM, etc. The execution sequence describes what needs to be done (e.g., computational steps, software selection, parameters).
Typically, a job is created using a submission script (shell script). Lines beginning with #SBATCH at the top of a Bash script are interpreted by Slurm as parameters describing resource requests and other job options. A full list of parameters can be found in the sbatch manual (man sbatch).
The first line of the submission file must be the shebang line (#!/bin/bash). The #SBATCH directives must come next; all other commands follow after the directives.
The script itself constitutes a job step. Other job steps are created using the srun command.
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=test_%j.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=200
srun hostname
srun sleep 60
would request one task (one processor) for 10 minutes and 200 MB of RAM per CPU in the default queue.
Once started, the first job step srun hostname is executed, which runs the UNIX hostname command on the node where the requested processor was allocated. Then, the second job step runs the sleep command. The --job-name parameter allows giving the job a meaningful name, and the --output parameter defines the file where the job result will be sent (%j will be replaced by the Job ID).
If the --output parameter is not specified, the default filename is slurm-%j.out. Including the Job ID in the output filename is useful because it prevents multiple jobs from writing to the same output file simultaneously; letting several jobs share one output file leads to garbled results and creates unnecessary load on the file system. By default, SLURM therefore saves each job's results to a separate file.
Once the submission script is correct, you must submit it to Slurm using the sbatch command, which returns the Job ID assigned to the job upon successful submission.
> sbatch submit.sh
sbatch: Submitted batch job 1234
It is possible to submit a new job to the queue from within an SBATCH script.
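One possible pattern is a sketch like the following, where a follow-up script (stage2.sh, a hypothetical name not taken from this documentation) is queued from inside the running batch script:

```shell
#!/bin/bash
#SBATCH --job-name=stage1
#SBATCH --time=1:00:00

# ... stage 1 computations ...

# When stage 1 has finished its work, queue the next stage
# from within this batch script (stage2.sh is hypothetical).
sbatch stage2.sh
```

This way a long workflow can be split into smaller jobs that each fit within the partition's time limits.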
After submitting a job to the queue via sbatch, it passes through several states (e.g., PENDING while waiting for resources, then RUNNING). If the job finishes correctly, it receives the status COMPLETED; if it does not, it receives the status FAILED.
On the supercomputer, every user is assigned a default Service "Przetwórz na superkomputerze" (Process on supercomputer), under which all computational jobs are run by default. If a user has access to more than one "Przetwórz na superkomputerze" Service, jobs submitted without the -A <service> parameter will be charged to the user's default service.
The supercomputer user independently decides which service's resources to use when submitting a job in the SLURM queuing system. For the srun command, this is done using the -A <service> argument, and in sbatch scripts, using the appropriate option in the script header: #SBATCH -A <service>.
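In a batch script, the same selection is made in the header; a minimal sketch (hpc-XXX-XXXXXXXX is a placeholder service ID, and test_script.sh is the script from the srun example):

```shell
#!/bin/bash
#SBATCH -A hpc-XXX-XXXXXXXX   # charge this job to the given Service
#SBATCH -N 1
#SBATCH -c 2
#SBATCH -t 01:00:00

./test_script.sh
```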
Example:
$ srun -N 1 -c 2 -t 01:00:00 -A hpc-XXX-XXXXXXXX test_script.sh
More information on how the resource consumption registration system works can be found in the documentation Register of resource consumption by "Process on supercomputer" Services.
We recommend paying particular attention to which service is used by default and which one you wish to use for a given computational task.
Users can change the default service for performing computations using the set-default-service command.
Calling the set-default-service command without any parameters displays a list of all available services along with the ID of the service set as the default:
$ set-default-service
-------------------------------------------------------
You are currently a member of the following Services:
hpc-XXX-XXXXXXXXXX hpc-YYY-YYYYYYYYYY hpc-ZZZ-ZZZZZZZZZZ
Your default service is: hpc-YYY-YYYYYYYYYY
-------------------------------------------------------
Calling the command with the -h parameter will show a help message and a list of available options and parameters.
$ set-default-service -h
set-default-service <account> Set specified Service as the default Service
set-default-service -l List all available Services
set-default-service -d Print current default Service
set-default-service -h Show this help screen
To set a default service, provide its ID as the single argument and then confirm the change by typing y:
$ set-default-service hpc-XXX-XXXXXXXXXX
Setting hpc-XXX-XXXXXXXXXX as the default Service
Modified users...
ZZZZZ
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
Default Service set successfully
-------------------------------------------------------
The -d parameter is used to display the name of the currently set default service:
$ set-default-service -d
hpc-XXX-XXXXXXXXXX
The -l parameter displays a list of all services to which the user belongs:
$ set-default-service -l
hpc-XXX-XXXXXXXXXX hpc-YYY-YYYYYYYYYY hpc-ZZZ-ZZZZZZZZZZ
If a user attempts to set a service they do not belong to as the default, the program returns an error message:
$ set-default-service hpc-SSS-SSSSSSSSSS
You don't belong to the specified Service, or this Service does not exist. Aborting