The MPI (Message Passing Interface) communication protocol allows an application to run on multiple cores or nodes simultaneously (so-called distributed computing), as long as the application is built to use MPI. MPI can also be used to run multiple instances of a single program within a single job.
With MPI jobs it is very important to distinguish between the parameters -c (CPUs per task) and -n (number of tasks). Specifying -c 10 without -n results in the launch of 10 MPI processes that are independent of each other, as if 10 separate single-core jobs had been started. Specifying -n 10 results in 10 MPI processes that run properly as ranks of a single parallel computation.
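For illustration, the difference comes down to the following two batch-script headers (a minimal sketch; only the directives themselves come from the description above):
# 10 independent single-core processes (one task with 10 cores):
#SBATCH -c 10

# 10 cooperating MPI ranks (ten tasks):
#SBATCH -n 10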
An example script mpijob.sh running 10 MPI processes on two nodes:
#!/bin/bash
# 2 nodes, 10 MPI tasks in total, 5 tasks per node, 1 core per task
#SBATCH -N2
#SBATCH -n10
#SBATCH --ntasks-per-node=5
#SBATCH -c1
# 300 MB of RAM per allocated core, 10-minute time limit
#SBATCH --mem-per-cpu=300
#SBATCH -t10
module load Python
module load OpenMPI
mpiexec python3 simulation.py
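A minimal sketch of how such a script is typically submitted and checked (standard Slurm commands; the output file name follows Slurm's default slurm-<jobid>.out pattern):
JOBID=$(sbatch --parsable mpijob.sh)   # submit the script and capture the job ID
squeue -j "$JOBID"                     # check the state of the job
cat "slurm-${JOBID}.out"               # read the program output once the job has finished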
Splitting a computation across multiple nodes can be useful when no single node has the required number of free cores, or when the speed of the computation is limited by memory bandwidth.
The recommended way to start Intel MPI programs is to launch them with srun, after first pointing Intel MPI to the location of the Slurm PMI library (the I_MPI_PMI_LIBRARY environment variable).
A sample script for a job run with Intel MPI:
#!/bin/bash
# 2 nodes, 10 MPI tasks, 10 GB of RAM per node, 1-hour time limit
#SBATCH -N2
#SBATCH -n10
#SBATCH --mem=10gb
#SBATCH --time=1:00:00
#SBATCH --job-name=intel_MPI_test
# Point Intel MPI at Slurm's PMI library so that srun can launch and manage the MPI processes
export I_MPI_PMI_LIBRARY=/opt/slurm/current/lib64/libpmi.so
module load intel/2021a
srun intel_MPI_script.sh
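The contents of intel_MPI_script.sh are not shown above; a hypothetical minimal version could look as follows (the executable name ./intel_mpi_app is a placeholder, not part of the documentation):
#!/bin/bash
# Hypothetical wrapper: srun starts one copy of this script per task,
# and the program executed here becomes one MPI rank.
exec ./intel_mpi_app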