Intro-HPC-workshop

Additional Information: Containers and Workflows

Overview

This section introduces two critical technologies that have transformed modern HPC: containers for reproducible computing environments and workflow management systems for orchestrating complex computational pipelines.


Part 1: Containers in HPC

The Problem: Dependency Hell

Every HPC system is different. Your local workstation has Python 3.9, but the cluster only has Python 3.6. You need a specific version of NumPy, but it conflicts with another user’s requirements. The specific CUDA version you need isn’t installed, and you don’t have admin rights to install it. Your collaborator’s code works perfectly on their system but crashes on yours with cryptic library errors.

This is “dependency hell” - the nightmare of managing software dependencies across different computing environments. Traditional solutions like environment modules help, but they’re limited and system-specific.
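On clusters that use environment modules, managing software versions typically looks something like the commands below (the module names and versions are illustrative assumptions and will differ between systems):

# See which Python versions the administrators have installed on this system
module avail python

# Load one of them for the current session
module load python/3.9.4

# This only changes your environment on this cluster - your laptop and
# other systems may offer different versions, or none at all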

What Are Containers and Why Do They Matter?

Containers package applications with all their dependencies into a portable, lightweight unit that runs consistently across different computing environments.
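As a rough sketch of how this packaging works, here is a minimal Singularity/Apptainer definition file; the base image, packages, and pinned NumPy version are assumptions chosen for illustration:

Bootstrap: docker
From: ubuntu:20.04

%post
    # Install the exact Python stack the analysis needs, independent of the host system
    apt-get update && apt-get install -y python3 python3-pip
    pip3 install numpy==1.24.4

%runscript
    # Command that runs when the container image is executed directly
    python3 "$@"

Building this (for example with singularity build analysis.sif analysis.def, where the file names are just placeholders) produces a single .sif image that can be copied to any system with Singularity/Apptainer installed and will behave the same everywhere.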

Key Benefits for HPC:

Introduction to Singularity/Apptainer

Singularity (now called Apptainer) is a container platform designed specifically for HPC environments.

Why Singularity/Apptainer for HPC:

Alternative: Docker

Basic Singularity/Apptainer Concepts

Common Use Cases

Quick Demo/Examples

# Pull a pre-built container
singularity pull docker://ubuntu:20.04

# Run a command in the container using the exec command
singularity exec ubuntu_20.04.sif cat /etc/os-release

# Start an interactive shell inside the container with the shell command
singularity shell ubuntu_20.04.sif
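On a real cluster, containers usually run inside batch jobs rather than interactively. Below is a minimal sketch of a SLURM submission script; the partition name, resource requests, image name, and analysis script are placeholders that depend on your cluster and project:

#!/bin/bash
#SBATCH --job-name=container-demo
#SBATCH --partition=work          # cluster-specific partition/queue name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00

# Run the analysis inside the container; Singularity/Apptainer typically
# bind-mounts $HOME and the current working directory automatically
singularity exec my_analysis.sif python3 my_analysis.py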

Part 2: Workflow Management

The Problem: Computational Chaos

You have a complex analysis with 50 steps: downloading data, pre-processing it, setting up an ML model, optimising over a bunch of different hyper-parameters, re-training the model using the optimum parameters and datasets, making predictions, aggregating the results, and outputting wonderful plots. All up, it takes 3 days to run. Step 47 fails at 2 AM, and you have to start over. Your pipeline works great on your laptop, but to run it on an HPC cluster you need to completely rewrite the job submission scripts, because you no longer have the full admin rights you enjoy on your own machine. You want to run the same analysis on 100 datasets, but manually managing all those jobs is a nightmare. Your workflow uses both containers and bare-metal software, different queue systems on different HPC systems (you might have some jobs running on the WSU HPC and some at NCI), and various resource requirements - coordinating all of this manually is error-prone and time-consuming.

This is computational chaos - the challenge of orchestrating complex, multi-step analyses across different computing environments while managing failures, resources, and dependencies.

What Are Computational Workflows?

Workflows are automated sequences of computational tasks that process data through multiple steps, managing dependencies, inputs, outputs, and error handling.

Why Workflows Matter in HPC:

Introduction to Nextflow

Nextflow is a workflow management system designed for data-intensive computational pipelines, particularly popular in bioinformatics and scientific computing.
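To give a feel for the syntax, here is a minimal Nextflow (DSL2) pipeline sketch; the process name, inputs, and command are invented for illustration:

#!/usr/bin/env nextflow

// A process wraps one step of the pipeline; Nextflow runs each invocation
// as a separate task and tracks its inputs and outputs
process SAY_HELLO {
    input:
    val name

    output:
    stdout

    script:
    """
    echo "Hello, ${name}!"
    """
}

// The workflow block wires processes together with channels
workflow {
    names = Channel.of('alpha', 'beta', 'gamma')
    SAY_HELLO(names) | view
}

Saved as main.nf and launched with nextflow run main.nf, the three SAY_HELLO tasks run in parallel, locally or on a cluster depending on how the executor is configured.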

Key Nextflow Features:

Workflow Components

Real-World Example Scenarios

Integration with HPC Systems

Workflows automatically:
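As a rough sketch of how this typically looks with Nextflow on a SLURM cluster, the configuration below submits every process as a SLURM job and runs it inside a Singularity/Apptainer container; the queue name and resource values are assumed placeholders:

// nextflow.config - example values only, adjust for your cluster
process {
    executor = 'slurm'        // submit each task through SLURM instead of running locally
    queue    = 'work'         // cluster-specific partition/queue name
    cpus     = 4
    memory   = '8 GB'
    time     = '2h'
}

singularity {
    enabled    = true         // run every task inside its container image
    autoMounts = true         // bind-mount host paths such as $HOME automatically
}

With a configuration like this, the same pipeline script runs unchanged on a laptop (using the 'local' executor) or on an HPC cluster.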


Bringing It All Together

The Power Combination: Containers + Workflows

Getting Started Recommendations

  1. Start Small: Begin with simple, single-step containers
  2. Use Existing Resources: Leverage pre-built containers and workflows (see the example commands after this list)
  3. Community Resources:
  4. Documentation: Both technologies have excellent documentation and active communities
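
For example, assuming Singularity/Apptainer and Nextflow are available on your system, community-maintained containers and pipelines can be used directly (the image and pipeline names below are just illustrations):

# Pull a community-maintained container image from a public registry
singularity pull docker://python:3.11-slim

# Run a community-developed nf-core pipeline with its bundled test data
nextflow run nf-core/rnaseq -profile test,singularity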

Best Practices


Questions and Next Steps

Resources for Continued Learning:

Hands-on Opportunities:

Advanced Workshop Topics: