Intro-HPC-workshop

Working on Wolffe

Wolffe is the name of the HPC cluster at WSU; the wiki page for Wolffe is available here.

Logging in

To log onto Wolffe, you will need to use the SSH protocol. Open a terminal and type the following command:

ssh <username>@wolffe.cdms.westernsydney.edu.au

Replace <username> with your WSU username. You will be prompted to enter your password. Once logged in, you will be in your home directory on the Wolffe cluster.
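
If you log in often, you can optionally add a host alias to your SSH configuration so that typing ssh wolffe is enough. This is a small convenience sketch; the alias name is arbitrary and <username> is still a placeholder for your own WSU username:

# append a host alias to your SSH config (the alias name "wolffe" is arbitrary)
cat >> ~/.ssh/config <<'EOF'
Host wolffe
    HostName wolffe.cdms.westernsydney.edu.au
    User <username>
EOF

# from now on this is equivalent to the full ssh command above
ssh wolffe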

You’ll also need to be on the CDMS network to access Wolffe. To connect, use the OpenVPN client with the profile you have received. Note that the profile is only valid until the end of the year, so staff should request a new profile each year, and students should request one through their supervisors.

Once logged in, you will see a message like:

For help with HPC matters, see:

https://wiki.cdms.westernsydney.edu.au/index.php?title=HPC_documentation

Last login: xxxxx from xxx.xxx.xxx.xxx

and any message of the day that may be displayed.

How to use any apps on Wolffe

Unlike a standard computer, an HPC system does not have all applications available by default. Instead, applications are made available through modules, which allow you to load and unload software packages as needed.

To see the available modules, you can use the command:

module avail

This will list all the available modules on Wolffe:

---------------------------------- /usr/share/Modules/modulefiles ----------------------------------
dot  module-git  module-info  modules  null  use.own  

-------------------------------------- /usr/share/modulefiles --------------------------------------
mp-x86_64  mpi/openmpi-x86_64  

-------------------------------------- /software/modulefiles ---------------------------------------
10x/cellranger-3.0.2   funtools/funtools              mpi/openmpi-4.0.2           R/4.4.2  
anaconda/conda3        gnu/gcc-7.4.0                  mpi/openmpi-4.1.5           T-RECS   
caffe/caffe-cuda-10.0  gnu/gcc-10                     nccl/nccl-2.26.2-1          
caffe/caffe-fedora     gnu/gcc-10.5.0                 nextflow/nextflow-25.04.6   
casa/casa-5.4.1        gnu/gcc-11.1.0                 PyCharm-community-2023.2.3  
casa/casa-5.6.0        Java/java-24                   Python/Python3.6            
casa/casa-5.7.0.pre    Julialang/Julia-1.6.2          Python/Python3.7.0          
cmake/cmake-3.24.4     karma/karma-1.7.25             Python/Python3.9            
cmake/cmake-3.25.3     lammps/lammps-stable-20230801  Python/Python3.10           
colmap/colmap-3.11     lammps/lammps-stable-20230822  Python/Python3.11           
cuda/cuda-10.0         matlab/matlab2016a             Python/Python3.12.3         
cuda/cuda-10.2         matlab/matlab2018a             PyTorch/Python3.7.0         
cuda/cuda-11.0         matlab/matlab2019a             PyTorch/Python3.9           
cuda/cuda-11.2         miriad/miriad                  PyTorch/Python3.10          
cuda/cuda-11.6         Montage/Montage-6.0            PyTorch/Python3.11          
cuda/cuda-12.6         mpi/openmpi-1.8.8              PyTorch/Python3.12.3        

You can load a module using the command:

module load <module_name>

For example, to load the Python 3.10 module, you would use:

module load Python/Python3.10

You can check which modules are currently loaded with:

module list
Currently Loaded Modulefiles:
 1) Python/Python3.10 
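
Modules can also be inspected before loading, removed when you no longer need them, or cleared altogether. These are standard Environment Modules commands:

module show Python/Python3.10     # see what a module would set up before loading it
module unload Python/Python3.10   # remove a single module from your environment
module purge                      # unload all currently loaded modules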

Unlike most HPC systems, Wolffe does have git installed by default, so you can use it without loading a module. You can therefore clone this repository directly into your home directory:

git clone https://github.com/CRMDS/Intro-HPC-workshop.git

Querying the queue system

Wolffe uses the Slurm workload manager to manage jobs. To find out what resources are available, you can use the command:

sinfo

This will show you the status of the nodes in the cluster:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
cpu*         up 21-00:00:0      2    mix compute-[002-003]
cpu*         up 21-00:00:0      1  alloc compute-001
ampere24     up 7-00:00:00      4  alloc a30-[002-005]
ampere80     up 7-00:00:00      1  alloc a100-100
ampere80     up 7-00:00:00      1   idle a100-101
ampere40     up 7-00:00:00      1  alloc a100-000
ampere40     up 7-00:00:00      2   idle a100-[001-002]

The columns indicate the partition name (the * marks the default partition), whether the partition is available (AVAIL), the maximum run time for jobs in that partition (TIMELIMIT), the number of nodes in each state (NODES), the node state (STATE, e.g. idle, mix for partially allocated, alloc for fully allocated), and the names of those nodes (NODELIST).

To see the jobs currently running and submitted on the cluster, you can use:

squeue

This will show you a list of jobs in the queue:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             18226  ampere24   RLGOAL 30069287  R 2-10:31:01      1 a30-005
             18287  ampere24     mat1 18870679  R   21:04:11      1 a30-002
             18288  ampere24     mat2 18870679  R   21:03:45      1 a30-003
             18289  ampere24     mat3 18870679  R   21:03:13      1 a30-004
             18100  ampere40 EA_V4_Mi 30069287  R 4-15:02:23      1 a100-000
             18304  ampere80 vec2word 30069287  R    8:52:30      1 a100-100
             17962       cpu X12.2_15 30031031  R 5-20:01:58      1 compute-001
             17963       cpu X12.2_14 30031031  R 5-20:01:55      1 compute-001
             17964       cpu X12.2_14 30031031  R 5-19:57:36      1 compute-001
             17965       cpu X12.2_13 30031031  R 5-19:52:40      1 compute-001
             17966       cpu X12.2_13 30031031  R 5-19:52:11      1 compute-001
             17967       cpu X12.2_12 30031031  R 5-19:35:30      1 compute-001
             17968       cpu X12.2_12 30031031  R 5-19:34:30      1 compute-001
             17969       cpu X12.2_11 30031031  R 5-19:29:03      1 compute-001
             17970       cpu X12.2_11 30031031  R 5-19:25:31      1 compute-001
             17971       cpu X12.2_10 30031031  R 5-18:49:38      1 compute-001
             17972       cpu X12.2_10 30031031  R 5-18:28:52      1 compute-001
             17973       cpu X12.2_09 30031031  R 5-18:24:49      1 compute-001
             ......

The columns indicate the job ID (JOBID), the partition the job was submitted to, the job name, the submitting user, the job state (ST, e.g. R for running, PD for pending), the elapsed run time, the number of nodes allocated, and the nodes the job is running on, or the reason it is still waiting (NODELIST(REASON)).

squeue can also be queried for more detailed information about jobs; check the man squeue page for details.
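
For example, the -o option lets you choose which columns to display; this is just one possible format string (see man squeue for the full list of field codes):

squeue -u $USER -o "%.10i %.9P %.20j %.8T %.12M %.6D %R"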

To see your own jobs, you can use:

squeue -u $USER

Running jobs

There are two main ways to run jobs on Wolffe: interactively and in batch mode. Note that you should never run jobs directly on the login node, as this can disrupt other users.

Interactive jobs

To ask for interactive resources, you can use the sinteractive command:

sinteractive -p cpu --time=0:30:00

This command requests an interactive session on the cpu partition for 30 minutes. You can adjust the partition and time limit as needed. Once the resources are allocated, you will be dropped into a shell on one of the compute nodes, where you can run commands interactively.
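
For example, to test GPU code you could request a short session on one of the GPU partitions instead; this assumes sinteractive accepts the same options for those partitions (partition names as listed by sinfo above):

sinteractive -p ampere24 --time=0:30:00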

We’ll run a simple neural network example. First, load the Python module:

module load Python/Python3.10

Then, run the script:

python3 -u MLP.py

Note that the -u option forces the output to be unbuffered, so you see output as soon as it is produced. This script should take only a few seconds to run, so in this case the -u flag makes little difference.
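
MLP.py itself is not reproduced here. As a rough sketch of what such a script might look like, assuming it uses scikit-learn's digits dataset and MLPClassifier (the actual workshop script may differ):

# MLP.py -- minimal sketch: train a small MLP on the digits dataset
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# load the 8x8 handwritten digits dataset
X, y = load_digits(return_X_y=True)

# hold out a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# train a small multi-layer perceptron and report the test accuracy
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))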

Use interactive jobs for testing and debugging your code; for running actual jobs, use batch mode.

Batch jobs

To run jobs in batch mode, you need to create a job script that specifies the resources your job needs and the commands to run. For this example, we’ll use the MLP.py script we used in the interactive session.

To submit a batch job, create a script file (e.g., first_script.sh) with the following content:

#! /usr/bin/env bash
#
#SBATCH --job-name=MLP
#SBATCH --output=S-res.txt
#SBATCH --error=S-err.txt
#
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --partition=cpu

# load the module
module load Python/Python3.10

# move to work directory
cd ~/Intro-HPC-workshop/02.Working_on_Wolffe/

# do the submission
python3 -u MLP.py
sleep 60

This script does the following: it names the job MLP and sends its standard output and errors to S-res.txt and S-err.txt, requests a single task for five minutes on the cpu partition, loads the Python module, moves to the working directory, and runs MLP.py, followed by a one-minute sleep.

A note about resource requests: only ask for the resources your job actually needs (tasks, time and partition), since over-requesting makes jobs harder to schedule and ties up resources that other users could be using.

We can then submit this job script using the sbatch command:

sbatch first_script.sh

You can then check the status of your job using squeue -u $USER.

You should see something like this:

(base) [30057355@wolffe 02.Working_on_Wolffe]$ sbatch first_script.sh 
Submitted batch job 18330
(base) [30057355@wolffe 02.Working_on_Wolffe]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             18330       cpu      MLP 30057355  R       0:07      1 compute-003

Once the job is complete, you can check the output in the S-res.txt file. This file will contain the output of the Python script, including any print statements. You can also check any error outputs in the S-err.txt file.

Other useful information:

We leave these items as exercises for you to try out.

In summary, we:

  1. Created a Python script that trains a simple neural network on the digits dataset.
  2. Created a job script that specifies the resources needed and the commands to run.
  3. Submitted the job script using sbatch.
  4. Checked the status of the job using squeue.
  5. Checked the output and the errors of the job in the S-res.txt and S-err.txt files.

Parallel jobs and workflow management

Parallel jobs

Much of the power of an HPC system comes from its ability to run many tasks at once. For example, when training a machine learning model you will often want to run several training runs with different hyperparameters to find the best model. You could do this in a single script with a for loop, but that is slow and hard to manage. Instead, run each training task as a separate job with its own resources and let the cluster execute them in parallel, which can speed up the overall computation significantly.

We will modify the previous Python script to accept a command line argument for the random state, and then submit multiple jobs with different random states. To do this, we create a new script called MLP_pararg.py.

Test the script in interactive mode (or in batch mode if you prefer) by passing a random state on the command line:

python3 MLP_pararg.py --random_state 42

This should run the script and print the test accuracy, and also output the results to a file named res_42.txt.
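
For reference, a minimal sketch of how MLP_pararg.py could extend the earlier script with argparse; the exact result-file format here is an assumption:

# MLP_pararg.py -- minimal sketch: same MLP, random state taken from the command line
import argparse

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

parser = argparse.ArgumentParser()
parser.add_argument("--random_state", type=int, required=True)
args = parser.parse_args()

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=args.random_state)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                    random_state=args.random_state)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(f"Test accuracy (random_state={args.random_state}): {acc:.4f}")

# write the result to res_<random_state>.txt, as described above
with open(f"res_{args.random_state}.txt", "w") as f:
    f.write(f"{args.random_state} {acc:.4f}\n")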

Next, we will create a job script that submits multiple jobs with different random states. Create a new script called second_script.sh with the following content:

#! /usr/bin/env bash
#
#SBATCH --job-name=MLP
#SBATCH --output=output/S-%a-res.txt
#SBATCH --error=output/S-%a-err.txt
#
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --partition=cpu
#SBATCH --array=1-10   # Array job with 10 tasks

# load the module
module load Python/Python3.10

# move to work directory
cd ~/Intro-HPC-workshop/02.Working_on_Wolffe/

data_file='random_state.txt'
# read the i-th line from the file and store it as "n"
n=$(sed -n "${SLURM_ARRAY_TASK_ID}p" $data_file)

echo "Running task ${SLURM_ARRAY_TASK_ID} with random state ${n}"

# do the submission
python3 -u MLP_pararg.py --random_state $n
sleep 60

This script does the following: it defines an array job with 10 tasks (--array=1-10), writes each task's output and errors to output/S-<task>-res.txt and output/S-<task>-err.txt (%a is replaced by the array task ID), loads the Python module, moves to the working directory, reads the line of random_state.txt that matches its SLURM_ARRAY_TASK_ID, and runs MLP_pararg.py with that value as the random state.

We’ll also need to create the random_state.txt file, which contains a list of random states, one per line.
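
Any ten integers, one per line, will do. One quick way to generate the file is:

shuf -i 1-10000 -n 10 > random_state.txt   # ten random integers between 1 and 10000
cat random_state.txt                       # check the contents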

You can then submit this job script using the sbatch command:

sbatch second_script.sh

You can check the status of your jobs using squeue -u $USER, and you should see something like this:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
      18341_[7-10]       cpu      MLP 30057355 PD       0:00      1 (Resources)
           18341_1       cpu      MLP 30057355  R       0:50      1 compute-003
           18341_2       cpu      MLP 30057355  R       0:50      1 compute-003
           18341_3       cpu      MLP 30057355  R       0:50      1 compute-002
           18341_4       cpu      MLP 30057355  R       0:50      1 compute-002
           18341_5       cpu      MLP 30057355  R       0:50      1 compute-002
           18341_6       cpu      MLP 30057355  R       0:50      1 compute-002

Each job in the array has its own job ID, formed from the main job ID followed by an underscore and the task number (e.g., 18341_1, 18341_2, etc.). The tasks run in parallel, and you can check the files in the output directory to see the results of each one. Tasks that are still pending have a status of PD (pending) and are grouped into a single entry with the task numbers in square brackets (e.g., 18341_[7-10]).

Workflow management

Sometimes you will want to run a series of jobs that depend on each other; for example, a job that processes all the accuracies from the MLP training runs with different random states. In this case, you can use job dependencies to chain jobs together.

To do this, you can use the --dependency option in the sbatch command. For example, if you have a job that processes the results of the MLP training and you want it to run only after all the MLP training jobs have completed, you can submit the processing job with a dependency on the MLP training jobs.

Let’s first create a Python script that gathers all the results of the MLP training into one file, and a job script to run it. We can check that everything runs by submitting the job script:

sbatch collect.sh
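
collect.sh follows the same pattern as first_script.sh, just calling the gathering script instead of MLP.py. As a sketch of what that gathering script (here called collect.py, assuming each run wrote a res_<random_state>.txt file as above) might look like:

# collect.py -- minimal sketch: gather all res_*.txt files into a single file
import glob

files = sorted(glob.glob("res_*.txt"))
with open("all_results.txt", "w") as out:
    # each res_<random_state>.txt is assumed to hold "<random_state> <accuracy>"
    for fname in files:
        with open(fname) as f:
            out.write(f.read())

print(f"Collected {len(files)} result files into all_results.txt")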

We then write another Python script that summarises the results, and a job script to run it. We can test this works by submitting the job script:

sbatch summarise.sh
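
Again, summarise.sh can wrap a short script such as this sketch (which assumes the all_results.txt format used above):

# summarise.py -- minimal sketch: report the mean and spread of the collected accuracies
import statistics

accuracies = []
with open("all_results.txt") as f:
    for line in f:
        # each line is assumed to be "<random_state> <accuracy>"
        accuracies.append(float(line.split()[1]))

print(f"Runs: {len(accuracies)}")
print(f"Mean accuracy: {statistics.mean(accuracies):.4f}")
print(f"Std deviation: {statistics.stdev(accuracies):.4f}")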

Waiting for a job to complete before submitting the next one can be tedious, so we can use job dependencies to automate this process. We can do this using the following commands:

(base) [30057355@wolffe 02.Working_on_Wolffe]$ sbatch --begin=now+120 second_script.sh 
Submitted batch job 18374
(base) [30057355@wolffe 02.Working_on_Wolffe]$ sbatch -d afterok:18374 collect.sh 
Submitted batch job 18375
(base) [30057355@wolffe 02.Working_on_Wolffe]$ sbatch -d afterok:18375 summarise.sh 
Submitted batch job 18376

The --begin=now+120 option in the first command specifies that the job should start in 120 seconds, which gives us time to submit the next jobs and set up the dependencies before it runs.

The -d afterok:18374 option in the second command specifies that the job should only run after job 18374 has completed successfully; afterok means the dependent job runs only if the previous job finished without errors. Note that we set the dependency for the collect job (18375) using the parent ID of the array job (18374), not the IDs of the individual array tasks: Slurm automatically waits for all tasks in the array to complete before running the dependent job.
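
If you would rather not copy job IDs by hand, sbatch's --parsable option prints just the job ID, so the whole chain can be scripted:

# submit the array job and capture its job ID
jid=$(sbatch --parsable --begin=now+120 second_script.sh)

# chain the dependent jobs automatically
cid=$(sbatch --parsable -d afterok:${jid} collect.sh)
sbatch -d afterok:${cid} summarise.sh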

Running squeue -u $USER will show you the status of all these jobs in the queue. You should see something like this:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
      18374_[7-10]       cpu      MLP 30057355 PD       0:00      1 (Resources)
             18375       cpu  collect 30057355 PD       0:00      1 (Dependency)
             18376       cpu summaris 30057355 PD       0:00      1 (Dependency)
           18374_1       cpu      MLP 30057355  R       0:05      1 compute-003
           18374_2       cpu      MLP 30057355  R       0:05      1 compute-003
           18374_3       cpu      MLP 30057355  R       0:05      1 compute-002
           18374_4       cpu      MLP 30057355  R       0:05      1 compute-002
           18374_5       cpu      MLP 30057355  R       0:05      1 compute-002
           18374_6       cpu      MLP 30057355  R       0:05      1 compute-002

Or we can watch the jobs move through the queue using the watch command:

(base) [30057355@wolffe 02.Working_on_Wolffe]$ watch squeue -u $USER
             JOBID PARTITION     NAME     USER ST	TIME  NODES NODELIST(REASON)
      18386_[1-10]	 cpu	  MLP 30057355 PD	0:00	  1 (BeginTime)
             18387	 cpu  collect 30057355 PD	0:00	  1 (Dependency)
             18388	 cpu summaris 30057355 PD	0:00	  1 (Dependency)
# MLP job is waiting to begin, others are pending due to dependencies

             JOBID PARTITION     NAME     USER ST	TIME  NODES NODELIST(REASON)
      18386_[7-10]	 cpu	  MLP 30057355 PD	0:00	  1 (Resources)
             18387	 cpu  collect 30057355 PD	0:00	  1 (Dependency)
             18388	 cpu summaris 30057355 PD	0:00	  1 (Dependency)
           18386_1	 cpu	  MLP 30057355  R       0:18	  1 compute-003
           18386_2	 cpu	  MLP 30057355  R       0:18	  1 compute-003
           18386_3	 cpu	  MLP 30057355  R       0:18	  1 compute-002
           18386_4	 cpu	  MLP 30057355  R       0:18	  1 compute-002
           18386_5	 cpu	  MLP 30057355  R       0:18	  1 compute-002
           18386_6	 cpu	  MLP 30057355  R       0:18	  1 compute-002
# Some of the MLP jobs are running, others are pending due to resources, and the collect and summarise jobs are pending due to dependencies

             JOBID PARTITION     NAME     USER ST	TIME  NODES NODELIST(REASON)
             18387	 cpu  collect 30057355 PD	0:00	  1 (Dependency)
             18388	 cpu summaris 30057355 PD	0:00	  1 (Dependency)
           18386_9	 cpu      MLP 30057355  R       0:03	  1 compute-002
          18386_10	 cpu	  MLP 30057355  R       0:03	  1 compute-002
           18386_7	 cpu	  MLP 30057355  R       0:08	  1 compute-003
           18386_8	 cpu	  MLP 30057355  R       0:08	  1 compute-003
# All MLP jobs are running, collect and summarise jobs are pending due to dependencies

             JOBID PARTITION     NAME     USER ST	TIME  NODES NODELIST(REASON)
             18388	 cpu summaris 30057355 PD	0:00	  1 (Dependency)
             18387	 cpu collect  30057355 R	0:00	  1 compute-003
# All MLP jobs are finished, collect job is running, summarise job is pending due to dependency

             JOBID PARTITION     NAME     USER ST	TIME  NODES NODELIST(REASON)
             18388	 cpu summaris 30057355 R	0:00	  1 compute-003
# collect job is finished, summarise job is running. 

             JOBID PARTITION     NAME     USER ST	TIME  NODES NODELIST(REASON)
# all jobs are done.             

Once everything is done, you will want to clean up the output files and any temporary files you created. A good way to do this is to put the deletions in a script, e.g. cleanup.sh, so that you know exactly which files are being removed. We leave this as an exercise.
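
As a starting point, a cleanup script might look something like this sketch; the file patterns are assumptions, so adjust them to match the files you actually created, and list before you delete:

#! /usr/bin/env bash
# cleanup.sh -- minimal sketch: remove the temporary output files from this workshop
cd ~/Intro-HPC-workshop/02.Working_on_Wolffe/

# show what would be removed, then remove it
ls res_*.txt all_results.txt S-res.txt S-err.txt output/S-* 2>/dev/null
rm -f res_*.txt all_results.txt S-res.txt S-err.txt output/S-*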

Using the GPU

Besides the ability to run massively parallel jobs, HPC systems also give you access to powerful GPUs. Wolffe currently has NVIDIA A100 and A30 GPUs available for use. To use a GPU, you need to specify a GPU partition when submitting your job.

To use the GPU, you’ll first need a Python (or other) script that actually uses it. For this example, we’ll use NN_gpu.py, which trains a neural network on the digits dataset using the GPU. This script uses the PyTorch library, a popular deep learning framework that can take advantage of GPUs for training models.
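
NN_gpu.py is not reproduced here either. A minimal sketch of what such a script might look like, assuming a small PyTorch network trained on scikit-learn's digits dataset (the actual workshop script may differ):

# NN_gpu.py -- minimal sketch: train a small network on the digits dataset,
# using the GPU when one is available
import torch
import torch.nn as nn
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# pick the GPU if PyTorch can see one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train = torch.tensor(X_train, dtype=torch.float32).to(device)
y_train = torch.tensor(y_train, dtype=torch.long).to(device)
X_test = torch.tensor(X_test, dtype=torch.float32).to(device)
y_test = torch.tensor(y_test, dtype=torch.long).to(device)

# a small fully connected network: 64 input pixels -> 10 digit classes
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    optimiser.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    optimiser.step()

with torch.no_grad():
    accuracy = (model(X_test).argmax(dim=1) == y_test).float().mean().item()
print(f"Test accuracy: {accuracy:.4f}")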

We now use a new script to run the NN_gpu.py script on the GPU. The script is similar to the previous job scripts, but it specifies the GPU partition and requests a GPU:

#! /usr/bin/env bash
#
#SBATCH --job-name=NN_gpu
#SBATCH --output=output/S-gpu-out.txt
#SBATCH --error=output/S-gpu-err.txt
#
#SBATCH --time=00:05:00
#SBATCH --partition=ampere80
#SBATCH --cpus-per-task=1

# load the module
module load PyTorch/Python3.10

# move to work directory
cd ~/Intro-HPC-workshop/02.Working_on_Wolffe/

# do the submission
python3 -u NN_gpu.py
sleep 60

This script does the following: it names the job NN_gpu, writes its output and errors to output/S-gpu-out.txt and output/S-gpu-err.txt, requests five minutes on the ampere80 GPU partition with one CPU per task, loads the PyTorch module, moves to the working directory, and runs NN_gpu.py.

Note that in other HPC systems, you may need to use #SBATCH --gres=gpu:1 (gres for “generic resources”) to request a GPU, but on Wolffe, the --partition=ampere80 option is sufficient to request a GPU.
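
A quick way to confirm that your job really landed on a GPU is to print what the node can see at the start of the job script, before the training command:

# inside the job script, before running the training
nvidia-smi
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available())"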

Tips and tricks

To make the most of the HPC resources: never run heavy computations on the login node; request only the resources (time, tasks, partition) that your jobs actually need; test your code in an interactive session before submitting large batch jobs; use job arrays and dependencies to organise related work; and clean up output and temporary files once you are finished.