On a supercomputer, an account of computer resource usage is kept for all "Process on a supercomputer" Services. The basic principles of the accounting register are as follows:
- resource consumption can be checked with the service-balance command;
- the default service can be changed with the set-default-service command.
ATTENTION! The pool of available CPU and GPU hours is shared among all users of a given service.
The rules for calculating the amount of used resources are defined by Specific Terms of Service "Process on a supercomputer".
The SLURM queueing system, which has information on all jobs running on the supercomputer, collects the resource consumption information for the services. In SLURM nomenclature, services are defined as "SLURM accounts". Resource consumption is calculated and billed to the service when the computational job ends, whether it completes successfully or not.
The resource consumption of a service is the amount of resources allocated (e.g. the number of CPU cores) multiplied by the actual duration of the computational job (the so-called wall time).
On the supercomputer, users can check the current resource consumption using the service-balance command:
$ service-balance [-h] [--user USER] [--service SERVICE] [--timeunit {seconds,minutes,hours}]
where:
--user USER - display information for the selected user. By default, information for the current user is displayed;
--service SERVICE - display information for the selected service only (if it is available to the selected user). By default, the program displays information about all services available to the user;
--timeunit {seconds,minutes,hours} - show resource usage in the specified time unit. By default, consumption is given in CPU/GPU hours;
-h - show the help message.
For example:
$ service-balance
#############################################################################################################################################
# SLURM Account information for user "YYYYYY" #
# #
# * Each Service is given a unique Account (name of the Account identical to the Service ID) #
# * (D) marks the default Service and (d) marks the default QoS for each Service #
#############################################################################################################################################
Service ID/Account Name | QoS Name | Resource | Available [h] | Used [h] | %
---------------------------------------------------------------------------------------------------------------------------------------------
1) (D) hpc-XXXXXXX-XXXXXXXXXX | (d) hpc-XXXXXXX-XXXXXXXXXX | CPU | 300000.0 | 696.3 | 0.2
| | GPU | 200.0 | 0.0 | 0.0
-------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------
2) hpc-YYYYYYYYYYYYYYYY-YYYYYYYYYY | (d) hpc-YYYYYYYYYYYYYYYY-YYYYYYYYYY | CPU | 200.0 | 20.3 | 10.2
| | GPU | N/A | N/A | ---
-------------------------------------------------------------------------------------------------
The first column, Service ID/Account Name, contains the identifiers of the services to which the user has access. The service marked with the symbol (D) is the default service, to which all computational jobs are charged by default. Each of the following columns contains information corresponding to the service and its QoS.
The second column, QoS Name, lists all QoS available for a given service. Currently, each service has only one QoS. In addition, the default QoS of each service is indicated by the symbol (d).
The remaining columns provide information on a specific resource:
- the Resource column specifies the type of resource;
- the Available [h] column shows the total amount of the resource available to all users of the service (in CPU/GPU hours);
- the Used [h] column shows the amount of the resource already used by all users of the service (in CPU/GPU hours);
- the % column shows the consumption percentage, i.e. Used/Available * 100.
In the above example, the user "YYYYYY" has two available services:
- hpc-XXXXXXX-XXXXXXXXXX (default), with 300 000 CPUh and 200 GPUh available. All users of this service have so far used 696.3 CPUh (0.2% resource consumption) and 0 GPUh (0.0% resource consumption).
- hpc-YYYYYYYYYYYYYYYY-YYYYYYYYYY, with 200 CPUh and no GPU hours available (N/A), i.e. the service has no access to the GPU partitions. All users of this service have so far used 20.3 CPUh (10.2% resource consumption).
In the above example, all jobs of the user are charged by default to the hpc-XXXXXXX-XXXXXXXXXX service. To use the resource pool of another service, additional parameters must be provided at job submission. For more information, see the section below.
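The % column can be recomputed from the Available [h] and Used [h] columns. The following is a minimal Python sketch using the values from the example output; the function name usage_percent is hypothetical and is not part of the service-balance tool, and treating N/A as None is an assumption made for illustration.

```python
from typing import Optional


def usage_percent(used_h: float, available_h: Optional[float]) -> Optional[float]:
    """Return Used/Available * 100, or None when the resource is not available (N/A)."""
    if available_h is None or available_h == 0:
        return None
    return used_h / available_h * 100


# Values copied from the example service-balance output.
cpu_default = usage_percent(696.3, 300000.0)  # CPU, default service
cpu_second = usage_percent(20.3, 200.0)       # CPU, second service
gpu_second = usage_percent(0.0, None)         # GPU is N/A for the second service

print(f"{cpu_default:.2f}")  # 0.23
print(f"{cpu_second:.2f}")   # 10.15
print(gpu_second)            # None
```

The tool itself displays these percentages with one decimal place (0.2 and 10.2 in the example).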
The service is charged for the allocation of the resource, i.e. the number of CPU cores or GPU cards, regardless of the actual load on the resource.
For example:
A computational job was submitted for 48 CPU cores with a wall time limit of 168 h. The job completed after 100 h, so the service was charged 48 CPU * 100 h = 4800 CPUh. The calculation does not consider whether all 48 CPU cores were fully loaded during these 100 hours.
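The same arithmetic can be written as a short Python sketch. The function name charged_hours is hypothetical; the figures are those of the example, and the 168 h case is shown only for comparison with what the full requested time would have cost.

```python
def charged_hours(allocated_units: int, wall_time_h: float) -> float:
    """Hours billed to the service: allocated units (CPU cores or GPU cards) times wall time."""
    return allocated_units * wall_time_h


actual = charged_hours(48, 100)     # job ended after 100 h
requested = charged_hours(48, 168)  # cost if the job had run for the full requested 168 h

print(actual)     # 4800
print(requested)  # 8064
```

The load on the cores does not appear anywhere in the calculation: only the allocation and the elapsed wall time matter.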
Each user is assigned a default service (in SLURM nomenclature, a default account), to which all computational jobs are charged by default. Users can change the default service using the available tools.
For example, users with only one service do not need to specify additional parameters when submitting a job.
A user may want to assign a computational job to a specific service (to which the user has access). In such a case, a few additional options are required during job submission - see Section Job submission using specified Service.
ATTENTION! Users should deliberately select the services under which they submit computational jobs.
WNSC is not responsible for the misassignment of computational jobs to services; after a job completes, its billing records cannot be changed and the job cannot be reassigned to another service.
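Because services correspond to SLURM accounts, one common way to charge a job to a non-default service is the standard sbatch option --account. The Python sketch below only assembles such a command line and does not run it. The helper name sbatch_command, the script name job.sh, and the use of --account on its own are assumptions made for illustration; the Section Job submission using specified Service remains the authoritative description, and further options (for instance a QoS) may be required.

```python
import shlex
from typing import List


def sbatch_command(service_id: str, script: str) -> List[str]:
    """Build (but do not run) an sbatch call that charges the job to service_id."""
    # --account is a standard sbatch option; here the service ID is used
    # as the SLURM account name, as described in this document.
    return ["sbatch", f"--account={service_id}", script]


# Hypothetical job script charged to the second (non-default) service.
cmd = sbatch_command("hpc-YYYYYYYYYYYYYYYY-YYYYYYYYYY", "job.sh")
print(shlex.join(cmd))
```

Building the command explicitly, rather than relying on the default service, makes the billed service visible in every submission script.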
The amount of computational resources is requested during the submission of an application for this Service via the E-SCIENCE.PL webpage.
If the assigned amount of resources proves to be insufficient for the HPC Service's needs, the Service Manager can submit a request to increase computational resources (including additional disk space for the PD directory) through the E-SCIENCE.PL webpage.
The procedure for submitting a request for resource increase is as follows:
The Service Manager should log in to E-SCIENCE.PL.
Access the user panel by clicking the button showing the user's name at the top of the page.
Go to the "Wnioski realizowane" ("Applications in progress") tab and, in the category "Wnioski dot. godzin obliczeniowych" ("Applications concerning computing hours"), select the HPC Service for which the resource increase is to be requested. After the service is selected, basic data on the previously requested resources is displayed.
Click the "Zwiększ zasoby" ("Increase resources") button, which opens a new form.
Enter the additional amount of CPU and GPU hours and make any other changes if necessary.
Submit the request by clicking the "Wyślij" ("Send") button.
The Service Manager will be notified by email of the decision on the request.