To start an interactive session on ui.wcss.pl, use the sub-interactive script:
$ sub-interactive [-n <NODES>] [-c <CORES>] [-m <MEMORY>] [-t <TIME_LIMIT>] [-x] [-h]
where:
-c CORES - number of cores, max. 24 (default 1);
-n NODES - number of nodes, max. 2 (default 1);
-m MEMORY - amount of RAM in GB, max. 45 (default 3);
-t TIME_LIMIT - duration of the interactive session in hours, max. 6 (default 1);
-x - enable X11 forwarding (blocked by default);
-h - display the help message.
Users can only run one sub-interactive session at a time.
For example, to start a 6-hour interactive session using 4 CPUs, 12GB of RAM, and with X11 forwarding enabled:
$ sub-interactive -t 6 -c 4 -m 12 -x
Running the sub-interactive script without additional arguments starts a short interactive session with the default resources: 1 node, 1 CPU, 3GB of RAM, a 1-hour time limit, and no X11 forwarding.
Users can still submit interactive tasks on other partitions (e.g., short and normal) using the srun command (instructions in the section below); however, the waiting time for starting such an interactive task will be significantly longer.
The sub-interactive command uses the resources assigned to the SLURM "interactive" partition. These resources are not shared with other partitions, which increases their availability to users. As a result, the waiting time for sub-interactive sessions is shorter than for the other partitions.
The following limits on resources per user are imposed for the "interactive" partition (and therefore for sub-interactive sessions):
Users are advised to allocate as few resources as possible to interactive tasks (i.e., tailored to their actual needs), so as not to block access to fast interactive tasks for other users. For example, an interactive task used to create a batch file in a GUI program and test it in your calculation program can most likely run on 1 CPU core with a small amount of RAM, e.g. 2GB:
$ sub-interactive -c 1 -m 2
To launch an interactive task, use the srun command as follows:
srun [-p <partition_name>] -I --pty <program_name>
where:
-p <partition> : specifies the partition on which the task will be executed. If this parameter is omitted, the task runs in the default partition.
-I : short for --immediate; instructs srun to start the task immediately.
--pty : runs the task in a pseudo-terminal so its output is displayed in real time.
<program_name> : the name of the command or program to execute.
Example:
abcd@ui: ~ $ srun -p short -I -t 10 --pty /usr/bin/ls -l /home/abcd
For a fully interactive session on the computing server, execute srun with the bash program as follows:
abcd@ui: ~ $ srun -p short -I -t 10 --pty bash
[wcss] abcd@r31c02b16 ~ >
A batch task can be executed using the sbatch command. To submit a job to the queue, prepare a script containing information about resource allocation and the programs to be executed. Example script myjob.sh:
#!/bin/bash
#SBATCH -N1                          # number of nodes
#SBATCH -c10                         # number of CPU cores
#SBATCH --mem=10gb                   # amount of RAM
#SBATCH --time=1:00:00               # time limit: 1 hour
#SBATCH --mail-user=<email_address>  # address for job notifications
#SBATCH --job-name=<job_name>        # name shown in the queue
source /usr/local/sbin/modules.sh
python3 program.py
To run such a script, use the command:
abcd@ui: ~ $ sbatch myjob.sh
Task arrays are helpful when you want to execute the same script with different parameters (a parameter sweep). The scripts run simultaneously on different nodes, and the results are saved in separate files. Example script arrayjob.sh, which randomly generates a number from 1 to 50:
#!/bin/bash
#SBATCH -N1        # number of nodes
#SBATCH -c5        # number of CPU cores
#SBATCH --mem=250  # amount of RAM in MB
#SBATCH -t1        # time limit: 1 minute
# Generate a random number from 1 to 50
rand=$(shuf -i 1-50 -n 1)
# Print the generated number (captured in the Slurm output file)
echo "Generated number: ${rand}"
Submit the job to the queue using the command:
abcd@ui: ~$ sbatch --array=1-3 arrayjob.sh
As a result of script execution, three files will be created:
abcd@ui: ~$ ls
slurm-581784_1.out slurm-581784_2.out slurm-581784_3.out
Each file contains the generated number. Another example uses Python (argument.py):
import sys
print(int(sys.argv[1])**2)
Create an appropriate array job script arrayjob.sh:
#!/bin/bash
#SBATCH -N1
#SBATCH -c5
#SBATCH --mem=250
#SBATCH -t1
#SBATCH --array=1-5 # List of task IDs
module load Python
python3 argument.py $SLURM_ARRAY_TASK_ID
Submit the job using the command:
sbatch arrayjob.sh
This job will produce 5 files named according to the default pattern slurm-<JOBID>_<TASKID>.out, where each file contains the square of the task ID.
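The output file names can be customized with Slurm's documented filename placeholders: %A expands to the array master job ID and %a to the array task index. A sketch of the same array job with custom output names (the file name pattern itself is illustrative):

```shell
#!/bin/bash
#SBATCH -N1
#SBATCH --mem=250
#SBATCH -t1
#SBATCH --array=1-5
# %A = array master job ID, %a = array task index
#SBATCH --output=squares_%A_%a.out

module load Python
python3 argument.py $SLURM_ARRAY_TASK_ID
```

With this pattern, job 581784 would produce files squares_581784_1.out through squares_581784_5.out instead of the default slurm-* names.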
Array jobs have several additional environment variables:
| Variable | Description |
|---|---|
| SLURM_ARRAY_JOB_ID | The JOBID of the first task in the array. |
| SLURM_ARRAY_TASK_ID | A number equal to the index of the subtask. |
| SLURM_ARRAY_TASK_COUNT | The total number of tasks in the array. |
| SLURM_ARRAY_TASK_MAX | The index of the last task in the array. |
| SLURM_ARRAY_TASK_MIN | The index of the first task in the array. |
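A subtask can simply print these variables to see how they relate. The snippet below is a plain-Bash sketch: inside an array job Slurm sets the variables, and the `:-` defaults are placeholder values so the script can also be inspected outside Slurm:

```shell
#!/bin/bash
# Inside an array job these variables are set by Slurm;
# the :- defaults are placeholders for local inspection.
echo "Array job ID: ${SLURM_ARRAY_JOB_ID:-177}"
echo "Task index:   ${SLURM_ARRAY_TASK_ID:-1}"
echo "Task count:   ${SLURM_ARRAY_TASK_COUNT:-5}"
echo "Index range:  ${SLURM_ARRAY_TASK_MIN:-1}-${SLURM_ARRAY_TASK_MAX:-5}"
```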
In the previous example, if the first subtask appears in the queue as 177_1, then SLURM_ARRAY_JOB_ID equals 177, i.e., the job ID without the array task index. Each subtask in the array is additionally assigned its own SLURM_JOB_ID, which does not include the array task index. The value of SLURM_ARRAY_TASK_MIN will equal 1, SLURM_ARRAY_TASK_MAX will equal 5, and SLURM_ARRAY_TASK_COUNT will equal 5.
A task consists of two parts: a resource request and the job steps to perform. The resource request specifies the number of CPUs/GPUs required, the duration of the task, the amount of RAM needed, etc. The job steps describe what to do (e.g., calculation steps, the software to run, its parameters, etc.).
Typically, a task is created as a shell script. Comments starting with #SBATCH at the beginning of a Bash script are interpreted by Slurm as parameters describing the resource request and other Slurm task options. The full list of parameters can be found in the sbatch manual (man sbatch).
The first line of the script must be the shebang line (#!/bin/bash), followed by the #SBATCH directives. Any other commands are placed after them.
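Putting these rules together, a minimal skeleton looks like this (the resource values are illustrative only):

```shell
#!/bin/bash
# Shebang first, then the #SBATCH directives, then the commands.
#SBATCH --job-name=skeleton
#SBATCH --ntasks=1
#SBATCH --time=5:00
#SBATCH --mem=100

echo "Running on $(hostname)"
```

Because #SBATCH directives are ordinary Bash comments, such a script can also be executed directly with bash for a quick local syntax check before submitting it to the queue.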
The script itself represents one step of the task. Further steps of the task are created using the srun command:
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=test_%j.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=200
srun hostname
srun sleep 60
This script requests one processor for 10 minutes and 200 MB of RAM in the default queue.
When the job starts, the first step runs the hostname command; the second step runs sleep 60. The --job-name parameter gives the task a meaningful name, and the --output parameter defines the file to which the task's output is written (%j will be replaced by the task identifier, i.e., the Job ID).
If this parameter is not specified, the default output file name is slurm-%j.out. Including the task identifier (Job ID) in the name of the output file is useful because it prevents multiple tasks from writing to the same output file simultaneously. Allowing many tasks to write to one file produces corrupted output and creates unnecessary load on the file system; by default, SLURM therefore saves results to separate files.
When the job script is ready, submit it to Slurm with the sbatch command, which on successful submission returns the identifier (JobID) assigned to the task.
> sbatch submit.sh
sbatch: Submitted batch job 1234
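When submitting from another script, it is convenient to capture the returned JobID. This sketch assumes a standard Slurm installation: the documented --parsable flag makes sbatch print only the job ID instead of the full "Submitted batch job" message:

```shell
#!/bin/bash
# Submit and keep only the JobID for later use
jobid=$(sbatch --parsable submit.sh)
echo "Submitted as job ${jobid}"
# The captured ID can then be passed to other Slurm commands, e.g.:
# squeue -j "${jobid}"   # check its position in the queue
# scancel "${jobid}"     # cancel the job
```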
It is also possible to submit a new task to the queue from within an sbatch script.
After a job is submitted to the queue with sbatch, it passes through a sequence of states. If the task executes successfully, it receives the COMPLETED status; if it fails, it receives the FAILED status.
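The current state of a submitted job can be checked with standard Slurm tools; the commands below assume a default Slurm setup with accounting enabled:

```shell
# While the job is queued or running:
squeue -j <JOBID>
# After it finishes, query the accounting database for its final state:
sacct -j <JOBID> --format=JobID,JobName,State,Elapsed
```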
On the supercomputer, each user has a default service under which all submitted jobs run by default. If the user has access to more than one "Process on a supercomputer" service, jobs submitted without the -A <service> parameter will use the default service.
A user may want to assign a computational job to a specific service (to which they have access). For the srun command, this is done with the -A <service> argument; in sbatch scripts, use the #SBATCH -A <service> header option.
For example:
$ srun -N 1 -c 2 -t 01:00:00 -A hpc-XXX-XXXXXXXX test_script.sh
For more information on the resource accounting system, please refer to the Service Resource Usage Accounting.
We recommend paying particular attention to which service is used by default and which service should actually be used for the submitted job.
Users can change the default service using the provided set-default-service command.
When invoked without any parameters, the set-default-service command displays a list of all available services along with the default service identifier:
$ set-default-service
-------------------------------------------------------
You are currently a member of the following Services:
hpc-XXX-XXXXXXXXXX hpc-YYY-YYYYYYYYYY hpc-ZZZ-ZZZZZZZZZZ
Your default service is: hpc-YYY-YYYYYYYYYY
-------------------------------------------------------
To show the help message and available options, use the -h parameter:
$ set-default-service -h
set-default-service <account> Set specified Service as the default Service
set-default-service -l List all available Services
set-default-service -d Print current default Service
set-default-service -h Show this help screen
To set a specific service as the default one, pass its identifier as the only argument to the set-default-service command and confirm the change by typing y:
$ set-default-service hpc-XXX-XXXXXXXXXX
Setting hpc-XXX-XXXXXXXXXX as the default Service
Modified users...
ZZZZZ
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
Default Service set successfully
-------------------------------------------------------
The parameter -d
prints the name of the currently set default service.
$ set-default-service -d
hpc-XXX-XXXXXXXXXX
The parameter -l
displays a list of all services to which the user belongs.
$ set-default-service -l
hpc-XXX-XXXXXXXXXX hpc-YYY-YYYYYYYYYY hpc-ZZZ-ZZZZZZZZZZ
If a user tries to set a service to which they do not belong as the default service, the program returns an error:
$ set-default-service hpc-SSS-SSSSSSSSSS
You don't belong to the specified Service, or this Service does not exit. Aborting