Intro-HPC-workshop

What is High Performance Computing?

High Performance Computing (HPC) refers to the use of powerful computers, often organized into clusters, to perform large-scale calculations or process massive amounts of data quickly.

HPC is used when a task is:

- too large to fit in the memory of a single machine,
- too slow to finish in a reasonable time on a desktop, or
- made up of many pieces that can run in parallel.

Examples include:

- climate and weather modeling
- genome assembly and other bioinformatics pipelines
- training machine learning models
- large-scale physics, chemistry, and engineering simulations

Most HPC systems have a layout something like the following:

*HPC structure, courtesy of the [Ohio Supercomputer Center](https://www.osc.edu/resources/getting_started/hpc_basics)*

HPC vs Desktop Computers

| Feature | Desktop Computer | HPC System |
| --- | --- | --- |
| CPU cores | 2–16 | 100s–100,000s (across nodes) |
| Memory (RAM) | Up to 64 GB | Multiple terabytes (cluster-wide) |
| Storage | Local HDD/SSD | Shared high-performance storage |
| Networking | Ethernet/Wi-Fi | High-speed interconnect |
| Parallelism | Limited | Fundamental design feature |
| Job execution | Run manually | Scheduled jobs via a queueing system |
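
You can see this difference for yourself by comparing the resources visible on your own machine with those reported by a cluster. A minimal sketch, assuming the cluster runs SLURM (output will vary by system):

```bash
# On your desktop: count the CPU cores available locally
nproc

# On an HPC login node (SLURM): list partitions, node counts, and states
sinfo

# Summarise partition, node count, CPUs per node, and memory per node
sinfo -o "%P %D %c %m"
```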

What HPC Resources Are Available to Me?

| System | Kind | Access | Notes |
| --- | --- | --- | --- |
| NCI | HPC cluster | Australian researchers | Annual call for applications |
| WSU Cluster | HPC cluster | Academic supervisor / ITDS (ACE - SSTaRS) | No formal application required; no time limits on accounts or projects |
| Azure / GCloud / AWS | Cloud computing | Anyone with a credit card | On-demand computing as a service |

Hardware

Different Nodes

An HPC system is made up of nodes, which are individual servers connected into a larger system. Common node types include:

- Login nodes - where users connect, edit files, and submit jobs
- Compute nodes - where submitted jobs actually run
- High-memory nodes - compute nodes with extra RAM for memory-intensive jobs
- GPU nodes - compute nodes fitted with GPUs for accelerated workloads
- Data transfer nodes - dedicated to moving large datasets in and out of the system

Note: the login node is normally the only node you connect to directly. At no point should you SSH into individual compute nodes and run jobs on them by hand - the scheduler places work on compute nodes for you.


Storage Options

HPC environments use different storage areas optimized for different tasks:

- Home directory - small and usually backed up; for scripts, configuration, and source code
- Scratch - large and fast; for temporary files while jobs run, usually not backed up and often purged periodically
- Project/shared storage - medium- to long-term space shared between members of a project
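
Standard Linux tools show how much space each area offers and how much you are using. The paths below are placeholders - check your cluster's documentation for the real ones:

```bash
# Size of everything under your home directory
du -sh ~

# Free space on the filesystem behind a path
# (/scratch/$USER is a placeholder; the real path varies by cluster)
df -h /scratch/$USER
```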


Interconnect

The interconnect is the high-speed network connecting all the compute nodes in the cluster.

Unlike regular Ethernet (like what you’d use at home or in an office), HPC systems use specialized networks such as InfiniBand or Omni-Path to:

- move data between nodes with very low latency
- provide far higher bandwidth than commodity networking
- let jobs spread across many nodes communicate without becoming a bottleneck

Think of it like having a private, ultra-fast highway between every machine, rather than a shared local road — which keeps everything moving quickly and smoothly when the workload is distributed.
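
Programs that span multiple nodes usually communicate over the interconnect with MPI. As a sketch, assuming a SLURM cluster and an already-compiled MPI program named `./hello_mpi` (a hypothetical name), you would launch it across nodes like this:

```bash
# Ask SLURM for 2 nodes with 4 MPI tasks on each; srun starts all
# 8 ranks and connects them to each other over the interconnect.
srun --nodes=2 --ntasks-per-node=4 ./hello_mpi
```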


Clusters

An HPC cluster is the full system: a collection of compute nodes, login nodes, storage, and interconnects, all managed centrally.

Users submit jobs to a scheduler, which:

- places each job in a queue
- allocates CPUs, memory, and run time as resources become free
- starts the job on the assigned compute nodes and collects its output

Clusters allow multiple users to run large jobs at the same time, efficiently sharing resources.
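
On a SLURM-managed cluster, you can watch this sharing happen. A quick sketch (the job ID is illustrative):

```bash
# Show every user's pending and running jobs
squeue

# Show only your own jobs
squeue -u $USER

# Cancel one of your jobs by its ID (12345 is a placeholder)
scancel 12345
```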


Software

HPC systems are generally accessed through the command line via SSH. Users don’t run jobs directly, but instead write job scripts that are submitted to a job scheduler.
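
Connecting is a normal SSH session; the hostname below is a placeholder for your cluster's real login address:

```bash
# Replace username and the hostname with your own details
ssh username@hpc.example.edu
```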

Job Schedulers

Common job schedulers:

- SLURM
- PBS / Torque
- Grid Engine (SGE)
- LSF

These schedulers manage when and where jobs are run on the cluster.

A basic SLURM script might look like this:

```bash
#!/bin/bash
#SBATCH --job-name=myjob      # name shown in the queue
#SBATCH --ntasks=4            # number of tasks (processes) to run
#SBATCH --time=00:30:00      # wall-time limit (HH:MM:SS)
#SBATCH --output=output.log   # file for the job's stdout and stderr

module load python/3.11       # make the required Python version available
python myscript.py            # the actual work
```
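
The `module load` line relies on an environment-modules system (Lmod or Environment Modules), which most clusters provide. A few everyday commands - exact module names vary from site to site:

```bash
module avail               # list the software the cluster provides
module load python/3.11    # add a module to your environment
module list                # show what is currently loaded
module purge               # unload everything and start clean
```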

Key points:

- Lines beginning with `#SBATCH` are directives for the scheduler, not commands run by bash
- `--ntasks`, `--time`, and `--output` tell SLURM what resources the job needs and where to write its output
- `module load` sets up the software environment before the job's commands run
- The script is submitted with `sbatch` rather than executed directly
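
Submitting and monitoring the script above looks roughly like this:

```bash
# Hand the script to the scheduler; SLURM prints the new job's ID
sbatch myjob.sh

# Watch the job move from pending (PD) to running (R)
squeue -u $USER

# When it finishes, read the file named by --output
cat output.log
```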

An example of how a user connects to the login node, and how the scheduler then distributes jobs to the relevant queues, is shown below:

*HPC structure: the user connects through the login node, and the scheduler dispatches jobs to the relevant queues within the HPC*