HPCC Use

To improve computational efficiency, MIPTools can be run on a high-performance computing cluster (HPCC). While we leave the details of HPCC use to the reader, briefly, clusters use a management and job-scheduling system to organize user requests. Users submit jobs to this system, which in turn schedules and executes them.

Note

This guide covers HPCCs configured to use the Slurm cluster manager. For HPCCs configured with another job manager, please review the documentation for that manager.

To submit a job via Slurm, users can run the following:

sbatch <jobscript>
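
Slurm prints the ID of the submitted job, and the standard Slurm utilities can then be used to monitor or cancel it. As a brief sketch (the job ID below is a placeholder, and sacct requires job accounting to be enabled on the cluster):

# List your queued and running jobs
squeue -u $USER

# Show accounting details for a running or finished job
sacct -j 12345

# Cancel a job
scancel 12345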

Each job script is a batch script with commands for the HPCC to execute. These files also contain configuration commands used by Slurm. The bash shebang and these configuration commands make up what we call the header section of the script. An example is shown below:

#!/bin/bash

# Request a specific partition:
# The partitions available are: batch, gpu, and bigmem
#SBATCH --partition=batch

# Configure runtime, memory usage, and the number of CPUs:
#SBATCH --time=48:00:00
#SBATCH --mem=200G
#SBATCH --cpus-per-task=32

# Notify the user with details about the job:
#SBATCH --mail-user=example@mail.com
#SBATCH --mail-type=ALL
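
Note that partition names and resource limits are site-specific; the values above are only examples. On most Slurm installations, the available partitions and their limits can be inspected with sinfo:

# List available partitions and their state
sinfo

# Show detailed limits for a single partition (the name varies by cluster)
sinfo --partition=batch --long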

The next section of the job script contains the bash code to be executed by the HPCC. This may be a command to run a MIPTools app or any other MIPTools command. For example, users could run the wrangler app:

# Paths to bind to container
project_resources=heome
fastq_dir=fastq
wrangler_dir=wrangler

# Wrangler options
probe_sets_used='HeOME96'
sample_sets_used='JJJ'
experiment_id='example_id'
sample_list='sample_list.tsv'
min_capture_length=30

singularity run \
  -B ${project_resources}:/opt/project_resources \
  -B ${fastq_dir}:/opt/data \
  -B ${wrangler_dir}:/opt/analysis \
  --app wrangler miptools.sif \
  -p ${probe_sets_used} -s ${sample_sets_used} -e ${experiment_id} \
  -l ${sample_list} -c ${SLURM_CPUS_PER_TASK} -m ${min_capture_length}
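
With the header and body saved together in a single file, say wrangler_job.sh (a filename chosen here for illustration), the job is submitted as before:

sbatch wrangler_job.sh

Because the header requests 32 CPUs, Slurm sets ${SLURM_CPUS_PER_TASK} to 32 inside the job, so the wrangler app automatically uses all of the allocated cores.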

This is just one of many possible job scripts. Almost every aspect of the MIPTools pipeline can be configured to run on an HPCC. For more examples of job scripts, see the slurm-scripts repository. Its MIPTools folder contains scripts for running different MIPTools commands. Please feel free to add more scripts via a pull request!