High Performance Computing (HPC) refers to the use of powerful computers, often organized into clusters, to perform large-scale calculations or process massive amounts of data quickly.
HPC is used when a task is:
Examples include:
Most HPC systems have a layout something like the following:
Feature | Desktop Computer | HPC System |
---|---|---|
CPU cores | 2–16 | 100s–100,000s (across nodes) |
Memory (RAM) | Up to 64 GB | Multiple terabytes (cluster-wide) |
Storage | Local HDD/SSD | Shared high-performance storage |
Networking | Ethernet/Wi-Fi | High-speed interconnect |
Parallelism | Limited | Fundamental design feature |
Job execution | Manual run | Scheduled jobs via a queueing system |
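For a rough sense of the Desktop Computer column, you can check the core count and memory of the machine you are sitting at with a few standard Linux commands (a quick sketch only; the HPC-side figures come from each cluster's own documentation):

```bash
# Quick look at the resources of the machine you are currently logged in to (Linux).
nproc      # number of CPU cores available to you
free -h    # total and available memory (RAM), in human-readable units
lscpu      # more detailed CPU information (sockets, cores, threads)
```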
System | Kind | Access | Notes |
---|---|---|---|
NCI | HPC cluster | Australian researchers | Annual call for applications |
WSU Cluster | HPC cluster | Academic Supervisor / ITDS (ACE - SSTaRS) | No formal application required, and no time limits on accounts or projects |
Azure / GCloud / AWS | Cloud computing | Anyone with a credit card | On demand computing as a service |
An HPC system is made up of nodes, which are individual servers connected into a larger system. Common node types include login nodes, where users connect and prepare their work, and compute nodes, where the jobs themselves run.
Note: the only node you would normally connect to is the login node. At no point should you connect to individual compute nodes and run jobs on them directly.
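Connecting to the login node is done over SSH. A minimal sketch is below; the username and hostname are placeholders, not a real system, so substitute the details given by your HPC provider:

```bash
# Connect to the cluster's login node over SSH.
# "jdoe" and "hpc-login.example.edu" are placeholders for your own details.
ssh jdoe@hpc-login.example.edu
```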
HPC environments use different storage areas optimized for different tasks:
The interconnect is the high-speed network connecting all the compute nodes in the cluster.
Unlike regular Ethernet (like what you’d use at home or in an office), HPC systems use specialized networks such as InfiniBand or Omni-Path to:
Think of it like having a private, ultra-fast highway between every machine, rather than a shared local road — which keeps everything moving quickly and smoothly when the workload is distributed.
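If you are curious what the interconnect looks like from a node, the sketch below shows two commands you can try; `ibstat` is only present on InfiniBand systems with the diagnostic tools installed, so treat this as optional exploration rather than something you will need:

```bash
# List InfiniBand adapters and their link state (requires the infiniband-diags tools).
ibstat
# Generic view of all network interfaces, including any high-speed ones.
ip addr show
```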
An HPC cluster is the full system: a collection of compute nodes, login nodes, storage, and interconnects, all managed centrally.
Users submit jobs to a scheduler, which:
Clusters allow multiple users to run large jobs at the same time, efficiently sharing resources.
HPC systems are generally accessed through the command line via SSH. Users don’t run jobs directly, but instead write job scripts that are submitted to a job scheduler.
Common job schedulers:
These schedulers manage when and where jobs are run on the cluster.
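For example, on a cluster running SLURM, a few standard commands show what the scheduler is managing; the exact partitions and jobs you see will depend on the cluster:

```bash
# List the partitions (queues) and the state of their nodes.
sinfo
# List jobs that are currently queued or running, across all users.
squeue
# List only your own jobs.
squeue -u $USER
```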
A basic SLURM script might look like this:
```bash
#!/bin/bash
# Resource requests and job settings (read by the scheduler, ignored by bash)
#SBATCH --job-name=myjob
#SBATCH --ntasks=4
#SBATCH --time=00:30:00
#SBATCH --output=output.log

# Load the required software, then run the program
module load python/3.11
python myscript.py
```
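To run it, you would save the script to a file and hand it to the scheduler. The filename myjob.sh below is just an example name:

```bash
# Submit the job script; SLURM responds with a job ID.
sbatch myjob.sh
# Watch your job move through the queue.
squeue -u $USER
# When the job finishes, its output is in the file requested by --output.
cat output.log
```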
Key points:

- `#SBATCH` lines tell the scheduler what resources and settings the job needs.
- `module load` makes the required software available (we will come to this soon!)

An example of how the user connects to the login node, and how the scheduler then distributes jobs to the relevant queues, is shown below: