Usage

Basic Slurm Commands

Viewing System Information

sinfo


The sinfo command shows the state of partitions and nodes managed by Slurm.


$ sinfo
PARTITION    AVAIL  TIMELIMIT   NODES  STATE  NODELIST
debug*       up     30:00       4      mix    prism-[1-4]
cpu          up     7-00:00:00  4      mix    prism-[1-4]
gpu          up     3-00:00:00  1      mix    prism-4
batch        up     7-00:00:00  4      mix    prism-[1-4]
interactive  up     1:00:00     1      mix    prism-4

When a node is under maintenance or has failed, add the -R flag to sinfo to list the affected nodes together with the reason they are unavailable.

$ sinfo -R
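
For a quick per-state summary, the default sinfo columns can be aggregated with a few lines of shell. The following is a minimal sketch, assuming the default column layout (PARTITION AVAIL TIMELIMIT NODES STATE NODELIST); the function name nodes_by_state is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Sum the NODES column per STATE from header-less `sinfo` output on stdin.
# Assumed columns: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
# Note: a node that belongs to several partitions is counted once per partition.
nodes_by_state() {
    awk 'NF >= 6 { count[$5] += $4 } END { for (s in count) print s, count[s] }' | sort
}

# Only query the cluster when sinfo is actually installed.
if command -v sinfo >/dev/null 2>&1; then
    # -h drops the header row so that only data lines reach awk.
    sinfo -h | nodes_by_state
fi
```

Because partitions on this cluster overlap, the totals describe partition-node pairs rather than distinct machines; treat them as a rough indicator only.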

Job Management Commands

sbatch

The sbatch command is used to submit a job script for later execution.

$ sbatch myjob.sh
Submitted batch job 12345

Example job script (myjob.sh):

#!/bin/bash
#SBATCH --job-name=my_test_job # Job name
#SBATCH --output=job_%j.out # Output file (%j = job ID)
#SBATCH --error=job_%j.err # Error file
#SBATCH --time=01:00:00 # Time limit hrs:min:sec
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks=1 # Number of tasks (processes)
#SBATCH --mem=2G # Memory per node
echo "My first Slurm job"
hostname
date
sleep 60
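
When a later step needs the job ID (for example to find the output file), it can be parsed from the confirmation line that sbatch prints. This is a minimal sketch, assuming the "Submitted batch job <id>" message format shown above; extract_job_id and submit_and_get_id are hypothetical helper names.

```shell
#!/bin/bash
# Extract the numeric job ID from sbatch's confirmation line,
# e.g. "Submitted batch job 12345" -> "12345".
extract_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Submit a script and print only the resulting job ID.
submit_and_get_id() {
    sbatch "$1" | extract_job_id
}

# Only submit when sbatch is available and the example script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f myjob.sh ]; then
    job_id=$(submit_and_get_id myjob.sh)
    echo "Submitted job ${job_id}; output will go to job_${job_id}.out"
fi
```

As an alternative to parsing the message, sbatch --parsable prints the job ID directly, followed by a semicolon and the cluster name when one applies.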

scancel

The scancel command is used to cancel a queued or running job.

$ scancel 12345 # Cancel job with ID 12345
$ scancel -u username # Cancel all jobs for a specific user

squeue


The squeue command shows the status of jobs in the cluster's queue, including pending, running, and completing jobs.


$ squeue
JOBID  PARTITION  NAME      USER   ST  TIME   NODES  NODELIST(REASON)
12345  batch      vllm      test1  PD  0:00   1      (Resources)
12346  batch      python    test2  PD  0:00   1      (Priority)
12347  batch      python    test2  R   2:49   1      prism-1
12348  debug      bash      test1  R   15:30  1      prism-1
12349  debug      image-la  test3  R   1:00   1      prism-2

If you are logged in as test1 and want to see only the jobs you submitted, add the -u flag.

# $USER is an environment variable that holds the current session's username.
# You can also type your username directly.
# In this example, $USER expands to test1.
$ squeue -u $USER
JOBID  PARTITION  NAME  USER   ST  TIME   NODES  NODELIST(REASON)
12345  batch      vllm  test1  PD  0:00   1      (Resources)
12348  debug      bash  test1  R   15:30  1      prism-1
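
To see at a glance why your jobs are still waiting, the pending rows and their reasons can be pulled out of the squeue listing. This is a minimal sketch, assuming the default column layout shown above and job names without spaces; pending_jobs is a hypothetical helper name.

```shell
#!/bin/bash
# Print "JOBID REASON" for every pending job (ST == PD) in squeue output on stdin.
# Assumed columns: JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
pending_jobs() {
    awk '$5 == "PD" {
        # The reason may contain spaces, so join field 8 and everything after it.
        reason = $8
        for (i = 9; i <= NF; i++) reason = reason " " $i
        # Strip the surrounding parentheses.
        gsub(/^\(|\)$/, "", reason)
        print $1, reason
    }'
}

# Only query the cluster when squeue is actually installed.
if command -v squeue >/dev/null 2>&1; then
    # -h drops the header row.
    squeue -h -u "$USER" | pending_jobs
fi
```

If only the filtering is needed, squeue -u $USER -t PENDING restricts the listing to pending jobs without any post-processing.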

Interactive Sessions

srun

The srun command is used to run jobs interactively or create job steps.

$ srun --pty bash -i # Start an interactive bash session
$ srun -N1 hostname # Run 'hostname' command on one node

GPU Job Submission

Interactive GPU Session

To request an interactive session with GPU access:

# Request 1 GPU with 8 CPU cores and 16GB memory
$ srun --partition=batch --gres=gpu:1 --cpus-per-gpu=8 --mem-per-gpu=16G --pty bash -i

Batch GPU Jobs

Example GPU job script (gpu_job.sh):

#!/bin/bash
#SBATCH --job-name=gpu_test # Job name
#SBATCH --output=gpu_%j.out # Output file (%j = job ID)
#SBATCH --error=gpu_%j.err # Error file
#SBATCH --partition=batch # Partition selection
#SBATCH --gres=gpu:1 # Number of GPUs (1 in this case)
#SBATCH --cpus-per-gpu=8 # CPUs per GPU
#SBATCH --mem-per-gpu=16G # Memory per GPU
#SBATCH --time=08:00:00 # Time limit hrs:min:sec
# Load any required modules here
# module load cuda/11.8
# Your GPU program commands here
nvidia-smi # Check GPU status
python your_gpu_script.py
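
Inside a job script like this one, it can help to confirm how many GPUs the job actually sees before starting a long run. The sketch below assumes that the cluster exports CUDA_VISIBLE_DEVICES as a comma-separated list of device indices for GPU jobs, which is common with Slurm's GPU gres support but is not confirmed for this cluster; visible_gpu_count is a hypothetical helper name.

```shell
#!/bin/bash
# Count the GPUs listed in a comma-separated device list such as "0" or "0,1".
# Uses the first argument when given, otherwise CUDA_VISIBLE_DEVICES.
# Prints 0 when the list is unset or empty.
visible_gpu_count() {
    local devices="${1-${CUDA_VISIBLE_DEVICES-}}"
    if [ -z "$devices" ]; then
        echo 0
    else
        echo "$devices" | awk -F',' '{ print NF }'
    fi
}

# Report the count at the start of the job so it appears in the output file.
echo "GPUs visible to this job: $(visible_gpu_count)"
```

A count of 0 in the job's output file usually means the job was not given a GPU, for example because the --gres option was omitted.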

Submit the GPU job:

$ sbatch gpu_job.sh

You can monitor your job’s output in real time with the tail command:

# Monitor output file (replace JOBID with your job number)
$ tail -f gpu_JOBID.out
# Monitor error file
$ tail -f gpu_JOBID.err
# Example with actual job ID 12345
$ tail -f gpu_12345.out

Resource Monitoring

scontrol

The scontrol command is the administrative tool for viewing and modifying Slurm state.

$ scontrol show job 12345 # Show details of a specific job
$ scontrol show node prism-1 # Show details of a specific node
$ scontrol show partition # Show partition information
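
The output of scontrol show job is a series of Key=Value pairs, so a single field such as JobState can be picked out in a script. This is a minimal sketch, assuming that the value of interest contains no spaces; job_field is a hypothetical helper name.

```shell
#!/bin/bash
# Print the value of one Key=Value field from `scontrol show job` output on stdin.
# Assumes the value itself contains no whitespace.
job_field() {
    local key="$1"
    # Put every Key=Value pair on its own line, then match the key exactly.
    tr -s '[:space:]' '\n' |
        awk -F'=' -v key="$key" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}

# Only query the cluster when scontrol is actually installed.
if command -v scontrol >/dev/null 2>&1; then
    # 12345 is the example job ID used throughout this page.
    scontrol show job 12345 | job_field JobState
fi
```

Fields whose values may contain spaces, such as a command path or a comment, need more careful parsing than this helper provides.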