4 Submitting jobs
The server clusters are shared resources, and a job queue is used to manage and allocate computing resources among researchers. A job is simply a set of instructions that includes requests for resources and the specific commands, typically scripts, to be executed, such as commands for transforming or analyzing data. When a user submits a job to the server for execution, it enters the queue and is scheduled to run on a specific compute node at a specific time.
4.1 Knot
The Knot cluster has 112 regular compute nodes with 12 cores per node and either 48 GB or 64 GB of RAM per node. There are also 4 ‘fat’ nodes with either 512 GB or 1 TB of RAM and 6 GPU nodes. Knot uses TORQUE PBS to schedule jobs.
To submit a job, first create a new .pbs file with the nano editor using the command
$ nano submit.pbs
The typical structure of a .pbs file for a serial job using R is
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=2:00:00
#PBS -V
cd $PBS_O_WORKDIR
Rscript --vanilla script.R
where 1 node with 1 processor is requested, with a 2-hour limit on computation time. The walltime option can be omitted if the computation time is unknown; the default walltime is 75 hours. The cd $PBS_O_WORKDIR line changes the working directory to the directory from which the job was submitted, i.e., where the .pbs file is located. The Rscript --vanilla script.R line executes the commands in the specified R script; the filepath for the script should be either relative to the directory containing the .pbs file or an absolute path. This line would change if using different software. Many other #PBS options can also be included. Parallel jobs require different specifications, such as requesting more than one node and using mpirun commands; a sketch of such a job is given below. Also be sure to include a blank line at the end of the file.
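As a minimal sketch of such a parallel job (the program name my_mpi_program is hypothetical, and the core counts assume Knot's 12-core nodes), a .pbs file might look like
#!/bin/bash
## request 2 nodes with 12 processors each (24 cores total)
#PBS -l nodes=2:ppn=12
#PBS -l walltime=10:00:00
## name the job and export the current environment
#PBS -N parallel_job
#PBS -V
cd $PBS_O_WORKDIR
## launch the MPI program on the hosts assigned by the scheduler
mpirun -np 24 -machinefile $PBS_NODEFILE ./my_mpi_program
Here #PBS -N sets the job name, and $PBS_NODEFILE is the list of hosts assigned by the scheduler; the exact mpirun invocation depends on the MPI installation.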
To submit a job, use the command
$ qsub submit.pbs
For jobs that require less than one hour or are used for testing and debugging purposes, use the short queue to minimize waiting time with the command
$ qsub -q short submit.pbs
For jobs that require large amounts of memory, use the command
$ qsub -q largemem submit.pbs
for nodes with 256 GB of RAM, or the command
$ qsub -q xlargemem submit.pbs
for nodes with 512 GB of RAM.
The qsub command will return a job number. To check the status of a job, use the command
$ qstat <job number>
or
$ qstat -u $USER
To cancel or delete a job, use the command
$ qdel <job number>
The outputs of the analysis in the script will be returned in the same folder as the .pbs file that was submitted, typically in files named submit.pbs.o[job number] for standard output and submit.pbs.e[job number] for standard error, unless otherwise specified.
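To send the outputs elsewhere, the standard TORQUE options -o and -e can be added to the .pbs file (the filenames below are illustrative):
## redirect standard output and standard error, respectively
#PBS -o results.out
#PBS -e errors.err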
4.2 Pod
The Pod cluster is the newest on campus and offers the most compute resources. There are 64 regular compute nodes with 40 cores per node and 192 GB of RAM per node. There are also 4 ‘fat’ nodes with 1 TB of RAM per node and 3 GPU nodes. Pod uses SLURM to schedule jobs.
To submit a job, first create a new .job file with the nano editor using the command
$ nano submit.job
The typical structure of a .job file for a serial job using R is
#!/bin/bash -l
#SBATCH --nodes=1 --ntasks-per-node=1
#SBATCH --time=2:00:00
cd $SLURM_SUBMIT_DIR
module load R
Rscript --vanilla script.R
where 1 node with 1 processor is requested; since no walltime is given, the default walltime of 32 hours applies. The cd $SLURM_SUBMIT_DIR line changes the working directory to the directory from which the job was submitted, i.e., where the .job file is located. The module load R line loads R, and the Rscript --vanilla script.R line executes the commands in the specified R script; the filepath for the script should be either relative to the directory containing the .job file or an absolute path. These last two lines would change if using different software. Many other #SBATCH options can also be included. Parallel jobs require different specifications, such as requesting more than one node and using mpirun commands; a sketch of such a job is given below. Also be sure to include a blank line at the end of the file.
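As a minimal sketch of such a parallel job (the program name my_mpi_program and the module name openmpi are hypothetical and depend on the installed software), a .job file might look like
#!/bin/bash -l
## request 2 nodes with 40 tasks each (80 cores total)
#SBATCH --nodes=2 --ntasks-per-node=40
#SBATCH --time=10:00:00
#SBATCH --job-name=parallel_job
cd $SLURM_SUBMIT_DIR
## load an MPI module (name is illustrative) and launch the program
module load openmpi
mpirun -np 80 ./my_mpi_program
The exact module name and mpirun invocation depend on the MPI installation.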
To submit a job, use the command
$ sbatch submit.job
For jobs that require less than one hour or are used for testing and debugging purposes, use the short partition to minimize waiting time with the command
$ sbatch -p short submit.job
For jobs that require large amounts of memory, use the command
$ sbatch -p largemem submit.job
for the fat nodes with 1 TB of RAM.
The sbatch command will return a job number. To check the status of a job, use one of the commands
$ squeue -j <job number>
or
$ squeue -u $USER
To cancel or delete a job, use the command
$ scancel <job number>
The outputs of the analysis in the script will be returned in the same folder as the .job file that was submitted, typically in a file named slurm-[job number].out unless otherwise specified.
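To name the output files differently, the standard SLURM options --output and --error can be added to the .job file, where %j expands to the job number (the filenames below are illustrative):
## redirect standard output and standard error; %j is the job number
#SBATCH --output=results-%j.out
#SBATCH --error=errors-%j.err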