A new Lustre TMP file system has been made available on the LEM supercomputer to provide TMPDIR directories shared between the nodes of multi-node jobs.
More information about selection of the TMPDIR directories can be found in the documentation.
TMPDIR directories in the Lustre TMP system are created at TMPDIR=/lustre/tmp/slurm/$SLURM_JOB_ID.
The new Lustre TMP file system is available on the following SLURM partitions: lem-cpu-short, lem-cpu-normal, lem-cpu-interactive, lem-gpu-short, lem-gpu-normal, and lem-gpu-interactive.
The temporary disk space directories TMPDIR=/lustre/tmp/slurm/$SLURM_JOB_ID are only available within SLURM jobs and are not immediately visible after logging in to the ui.wcss.pl server. To browse files under TMPDIR, you must first start an interactive session using the sub-interactive command (for the Bem2 supercomputer) or the sub-interactive-lem-cpu command (for the LEM supercomputer).
For multi-node jobs, Lustre TMP is the default file system on which the TMPDIR directory is created.
To use the Lustre TMP file system in other jobs (i.e. jobs that are not multi-node), add the --gres=storage:lustre:1 option when submitting the job with the srun or sbatch commands.
Attention! Using the Lustre TMP system on the LEM supercomputer for single-node jobs is not recommended!
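For reference, a minimal sketch of a multi-node batch script using the shared Lustre TMPDIR (the partition, resource values, input file, and program name are illustrative; for multi-node jobs the shared TMPDIR is the default anyway, so the --gres line only makes the choice explicit):
#!/bin/bash
#SBATCH -p lem-cpu
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#SBATCH -t 02:00:00
#SBATCH --gres=storage:lustre:1
# all nodes of the job see the same directory /lustre/tmp/slurm/$SLURM_JOB_ID
cp input.dat $TMPDIR/
cd $TMPDIR
srun my_program input.dat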
The Bem2 supercomputer (partitions bem2-cpu-short and bem2-cpu-normal) and the LEM supercomputer each have their own Lustre TMP file system - directories under the /lustre/tmp path on the Bem2 supercomputer are not available on the LEM supercomputer and vice versa.
It is now possible to keep the TMPDIR directory after job completion, also for jobs that used local TMPDIR directories. The TMPDIR directory is kept for 14 days after the job finishes. The saved TMPDIR directories are located under the path /lustre/tmp/slurm/finished_jobs/${SLURM_JOB_ID} and are only accessible from the compute nodes of the supercomputer on which the job ran (Bem2 or LEM).
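For example, to inspect the saved data of a finished job, start an interactive session on a compute node of the same supercomputer and list the directory (replace <JOB_ID> with the numeric ID of the finished job):
$ sub-interactive-lem-cpu
$ ls /lustre/tmp/slurm/finished_jobs/<JOB_ID>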
A new version of the sub-gaussian command has been made available for submitting Gaussian16 jobs. To get help on how to use the command, run it without additional arguments or with the --help option.
Important changes in the command's functionality:
the default partition is lem-cpu;
additional files needed by the calculation are not copied to $TMPDIR automatically and should be passed using the --copy option;
chk files generated in TMPDIR will not be automatically copied to $SLURM_SUBMIT_DIR;
the default TMPDIR depends on the partition: on the lem-cpu partition a local TMPDIR is used, with a default 200G usage limit (changeable with --gres=storage:local:<QUOTA>); on the bem2-cpu partition a shared TMPDIR on Lustre TMP is used (no usage limit).
More information about the TMPDIR selection system can be found in the documentation.
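A hedged sketch of the new workflow, assuming the Gaussian input file is passed as the last argument (the file names are illustrative; verify the exact syntax with --help):
$ sub-gaussian --help
$ sub-gaussian --copy old.chk input.gjf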
New CPU servers of the LEM supercomputer, equipped with AMD EPYC 9554 processors and NVME disks (intended for local TMPDIR directories), have been made available for all users.
Detailed information about the current partitions can be found in the documentation.
New lem-cpu-short and lem-cpu-normal partitions with the new CPU servers of the LEM supercomputer have been created. To submit a job to these new SLURM partitions, use the -p lem-cpu option.
We encourage you to use these SLURM partitions especially for computational programs such as Gaussian or ORCA.
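For example, a single-node job could be submitted to the new servers as follows (the script name and resource values are illustrative):
$ sbatch -p lem-cpu -N 1 -c 48 -t 12:00:00 job.sh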
Attention! Currently, servers in the lem-cpu-short and lem-cpu-normal partitions do not have access to the shared Lustre TMP system for TMPDIR directories. This file system is planned to be made available in the near future.
Currently, on Lem CPU partitions it is only possible to use local TMPDIR directories created on NVME disks. There are two configurations of Lem CPU servers: with a maximum NVME capacity of 1700GB or 3400GB.
More information about the new TMPDIR selection system can be found in the documentation.
A new sub-interactive-lem-cpu command has been provided for submitting interactive jobs to the Lem CPU nodes.
The software provided in the form of modules differs between the Lem CPU and Bem2 CPU partitions (the software available on each partition can be displayed using the module avail command). If a given module is missing on the Lem CPU partition, you can contact the administrators by writing to helpdesk@e-science.pl.
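For example, to start an interactive session on a Lem CPU node and list the modules available there:
$ sub-interactive-lem-cpu
$ module avail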
The new LEM supercomputer has been made available; it ranks among the hundred fastest supercomputers in the world in the TOP500 list (https://top500.org/system/180272/).
Detailed information about the current partitions can be found in the documentation.
New lem-gpu-short and lem-gpu-normal partitions have been created, with servers equipped with NVIDIA H100 GPUs. Jobs assigned to these partitions require access to the “Process on the supercomputer” Service with available GPU hours and the allocation of at least one GPU card (option --gres=gpu:hopper:1).
For more information on GPU computing, see the documentation.
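For example, a single H100 GPU can be requested for a batch job as follows (the script name, core count, and time limit are illustrative; accounting options depend on your Service):
$ sbatch -p lem-gpu --gres=gpu:hopper:1 -N 1 -c 8 -t 02:00:00 job.sh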
The names of previous SLURM partitions have been changed:
short → bem2-cpu-short
normal → bem2-cpu-normal
interactive → bem2-cpu-interactive
The rules for using partitions remain unchanged.
A mechanism has been introduced to automatically distribute jobs to the *-short and *-normal partitions based on the declared job duration. Thanks to this, users do not have to remember to allocate jobs to the appropriate *-short or *-normal partition.
When submitting jobs using the srun or sbatch commands, it is enough to specify the appropriate name with -p <PARTITION> (e.g. -p bem2-cpu):
bem2-cpu → splits jobs between bem2-cpu-short and bem2-cpu-normal
lem-gpu → splits jobs between lem-gpu-short and lem-gpu-normal
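For example, the job below will be routed to bem2-cpu-short or bem2-cpu-normal depending on the declared time limit (the script name and time value are illustrative):
$ sbatch -p bem2-cpu -t 02:00:00 job.sh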
A new mechanism for selecting the type of TMPDIR directory used for calculations has been introduced. Depending on the selected partition, different types of TMPDIR directories are available: local TMPDIR directories on NVME disks, shared TMPDIR directories on the Lustre TMP file system, and TMPDIR directories in RAM (/dev/shm). The TMPDIR directory is selected using the --gres=storage:<TYPE>:<QUANTITY> option, which specifies the type and the maximum amount of space for the selected TMPDIR directory type. If no TMPDIR directory type is selected, a default TMPDIR directory is allocated based on the selected partition.
More information about the new TMPDIR selection system can be found in the documentation.
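For example, a local TMPDIR with a specific quota could be requested as follows (the quota value and its format are illustrative; consult the documentation for the exact syntax on a given partition):
$ sbatch -p lem-cpu --gres=storage:local:300G job.sh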
A new sub-interactive-lem-gpu command has been provided for submitting interactive jobs using GPUs.
The amount of memory available to SLURM jobs on the bem2-cpu-short and bem2-cpu-normal partitions (the former short and normal partitions) has been changed from 183G/372G to 177G/357G. Currently, the amount of memory available to SLURM jobs is 95% of the total memory of the compute nodes.
More information about the maximum possible amount of resources per node for SLURM jobs can be found in the documentation.
The SLURM queueing system has been updated to version 24.11.3.
This update does not introduce significant changes from the user's perspective and mainly concerns the internal mechanisms of the SLURM controllers. Its primary goal is to improve the stability of the queueing system and address known issues observed at WCSS, such as problems with running the srun command within jobs.
The list of software changes is available here.
SCM is organizing a series of free webinars focused on utilizing AMS in scientific research. Topics include:
Link to the page:
https://www.scm.com/news/join-the-third-edition-of-the-amsterdam-modeling-suite-webinar-series/
Update and modernization of the ui.wcss.pl host has finished.
When attempting to connect using ssh, the message "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!" will appear. This is expected after the update, because the host keys of the server have changed.
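If the warning is caused only by this planned change, the outdated entry can be removed from the local known_hosts file with the standard OpenSSH command:
$ ssh-keygen -R ui.wcss.pl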
In case of connection problems, please contact our support center at helpdesk@e-science.pl.
The HPC Info service webpage: https://hpc-info.kdm.wcss.pl
The HPC Info service allows supercomputer users who have access to the "Process on a supercomputer" service to monitor their usage of computational resources. Users have convenient access to information such as:
For more information, refer to the documentation HPC Info.
The Open OnDemand platform webpage: https://ood.e-science.pl
The Open OnDemand platform has been created to simplify access to supercomputer resources through a web interface. It allows users to run applications, transfer files, monitor tasks and more.
For more information, refer to the documentation Open OnDemand.
The directories for temporary storage, TMPDIR=/lustre/tmp/slurm/$SLURM_JOB_ID, are kept for 14 days after the job finishes. Afterwards, they are automatically removed.
The introduced mechanism is meant to help restart failed or stopped jobs from checkpoint files (if available), or to start new jobs that use data from previous jobs as input, without the need to first copy the data to the $HOME directory or the PD storage.
WARNING! Backup copies of the temporary directories in /lustre/tmp are not created. The contents of the /lustre/tmp file system are not protected by WCSS and may be deleted or lost without warning. Users should secure their important results on their own.
The latest version of the TURBOMOLE package, 7.8, is available on Bem2.
An accounting system for resource usage by all “Process on supercomputer” Services has been introduced on the supercomputer.
To check the current usage of computing resources by the available Services, use the `service-balance` command after logging into the supercomputer.
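For example, after logging in to the supercomputer:
$ service-balance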
General rules for the newly introduced accounting system:
For more information, refer to the documentation Service Resource Usage Accounting.
Limits have been introduced on RPC calls from users to the scheduler.
The limits are as follows:
The introduction of limits aims to improve the operation of the scheduler, prevent overload, and protect the system from attacks by sending an excessive number of requests to the server in a short period of time.
RPC calls are generated by user commands such as squeue, sinfo, srun, sbatch, scancel, etc.
Example: if a user submits 150 jobs in a bash loop, 50 will be submitted immediately, and the remaining 100 will be submitted at a rate of one job every 2 seconds. Therefore, the entire operation will take about 200 seconds.
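For illustration, such a submission loop might look like this (the script name is illustrative):
$ for i in $(seq 1 150); do sbatch job.sh; done
With the limits in place, the first 50 submissions are accepted immediately and the remaining ones are throttled as described above.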
A new version of NBO is available - NBO 7.
A new "interactive" partition in the SLURM queueing system, dedicated to short interactive jobs, has become available to users of the supercomputer.
For the "interactive" partition, the following per-user limits on computational resources have been imposed:
maximum job duration: 6 hours (-t 6:00:00)
maximum number of CPU cores: 24 (-c 24)
maximum number of nodes: 2 (-N 2)
maximum amount of memory: 45 GB (--mem 45GB)
In order to run an interactive job on the "interactive" partition, the options -p interactive -q interactive -A kdm must be added when invoking the srun command. For example:
$ srun -N 1 -c 1 -t 01:00:00 -p interactive -q interactive -A kdm --pty /bin/bash
The SCM AMS package is available again, including ADF, BAND, DFTB, and REAXFF. Jobs can be submitted using the sub-ams-2022.103 script.
On February 26, 2024, the FTP service at ftp.kdm.wcss.pl was shut down.
The software from the FTP server has been made available on the nextcloud.e-science.pl platform in the kdm-software directory.
All active users of the supercomputer have access to the resource.
The SLURM scheduling system was updated to version 23.11.3.
This update does not affect users directly and is mainly related to the SLURM controllers. Its main purpose is to increase the stability of the queueing system and fix previously known bugs, which were also reported at WCSS.
The full list of changes in the SLURM system is available here.
New long-term storage space for each HPC Service, labeled PD (Polish: Przestrzeń Dyskowa, i.e. disk space), has been provided for users of the supercomputer. Each HPC Service has one dedicated PD directory in which users can store large amounts of data and share them with the other members of that HPC Service. Access to the PD directory is available to every user belonging to the given HPC Service.
Use the PD-info command to view basic information about the available PD (including paths to the available PD directories, data limits, and their fill status).
*For the HPC Services where no disk space was specified during the application and for all HPC Services started before October 2023 (i.e., before the introduction of the new E-SCIENCE.PL user and service management system).
For more information, refer to the documentation Long-term storage space for HPC Services.
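For a quick overview of the PD directories available to you (the output depends on your HPC Services):
$ PD-info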