4 Submitting jobs
The server clusters are shared resources, and a job queue is used to manage and allocate computing resources among researchers. A job is simply a set of instructions that includes requests for resources and the specific commands, typically scripts, to be executed, such as commands for transforming or analyzing data. When a user submits a job to the server for execution, it enters the queue and is scheduled to run on a specific compute node at a specific time.
4.1 Knot
The Knot cluster has 112 regular compute nodes with 12 cores per node and either 48 GB or 64 GB of RAM per node. There are also 4 ‘fat’ nodes with either 512 GB or 1 TB of RAM and 6 GPU nodes. Knot uses TORQUE PBS to schedule jobs.
To submit a job, first create a new .pbs file with the nano editor using the command
$ nano submit.pbs
The typical structure of a .pbs file for a serial job using R is
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=2:00:00
#PBS -V
cd $PBS_O_WORKDIR
Rscript --vanilla script.R
where 1 node with 1 processor is requested, with a 2-hour limit on computation time. The walltime option can be omitted if the computation time is unknown; the default walltime is 75 hours. The cd $PBS_O_WORKDIR line changes the working directory to the directory from which the job was submitted, i.e., where the .pbs file is located. The Rscript --vanilla script.R line executes the commands in the specified R script; the filepath for the script should be either relative to the directory containing the .pbs file or an absolute path. This line would change if using different software. Many other #PBS options can also be included. Parallel jobs require different specifications, such as requesting more than one node and using mpirun commands; a sketch of such a job is given below. Also be sure to include a blank line at the end of the file.
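As a minimal sketch of such a parallel job (the program name my_mpi_program is hypothetical, and the core counts assume Knot's 12-core nodes), a .pbs file might look like
#!/bin/bash
## request 2 nodes with 12 processors each (24 cores total)
#PBS -l nodes=2:ppn=12
#PBS -l walltime=10:00:00
## name the job and export the current environment
#PBS -N parallel_job
#PBS -V
cd $PBS_O_WORKDIR
## launch the MPI program on the hosts assigned by the scheduler
mpirun -np 24 -machinefile $PBS_NODEFILE ./my_mpi_program
Here #PBS -N sets the job name, and $PBS_NODEFILE is the list of hosts assigned by the scheduler; the exact mpirun invocation depends on the MPI installation.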
To submit a job, use the command
$ qsub submit.pbs
For jobs that require less than one hour or are used for testing and debugging purposes, use the short queue to minimize waiting time with the command
$ qsub -q short submit.pbs
For jobs that require large amounts of memory, use the command
$ qsub -q largemem submit.pbs
for nodes with 256 GB of RAM, or the command
$ qsub -q xlargemem submit.pbs
for nodes with 512 GB of RAM.
The qsub command will return a job number. To check the status of a job, use the command
$ qstat <job number>
or
$ qstat -u $USER
To cancel or delete a job, use the command
$ qdel <job number>
The outputs of the analysis in the script will be returned in the same folder as the .pbs file that was submitted, typically in files named submit.pbs.o[job number] for standard output and submit.pbs.e[job number] for standard error, unless otherwise specified.
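To send the outputs elsewhere, the standard TORQUE options -o and -e can be added to the .pbs file (the filenames below are illustrative):
## redirect standard output and standard error, respectively
#PBS -o results.out
#PBS -e errors.err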
4.2 Pod
The Pod cluster is the newest on campus and offers the most compute resources. There are 64 regular compute nodes with 40 cores per node and 192 GB of RAM per node. There are also 4 ‘fat’ nodes with 1 TB of RAM per node and 3 GPU nodes. Pod uses SLURM to schedule jobs.
To submit a job, first create a new .job file with the nano editor using the command
$ nano submit.job
The typical structure of a .job file for a serial job using R is
#!/bin/bash -l
#SBATCH --nodes=1 --ntasks-per-node=1
#SBATCH --time=2:00:00
cd $SLURM_SUBMIT_DIR
module load R
Rscript --vanilla script.R
where 1 node with 1 processor is requested; since no walltime is given, the default walltime of 32 hours applies. The cd $SLURM_SUBMIT_DIR line changes the working directory to the directory from which the job was submitted, i.e., where the .job file is located. The module load R line loads R, and the Rscript --vanilla script.R line executes the commands in the specified R script; the filepath for the script should be either relative to the directory containing the .job file or an absolute path. These last two lines would change if using different software. Many other #SBATCH options can also be included. Parallel jobs require different specifications, such as requesting more than one node and using mpirun commands; a sketch of such a job is given below. Also be sure to include a blank line at the end of the file.
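As a minimal sketch of such a parallel job (the program name my_mpi_program and the module name openmpi are hypothetical and depend on the installed software), a .job file might look like
#!/bin/bash -l
## request 2 nodes with 40 tasks each (80 cores total)
#SBATCH --nodes=2 --ntasks-per-node=40
#SBATCH --time=10:00:00
#SBATCH --job-name=parallel_job
cd $SLURM_SUBMIT_DIR
## load an MPI module (name is illustrative) and launch the program
module load openmpi
mpirun -np 80 ./my_mpi_program
The exact module name and mpirun invocation depend on the MPI installation.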
To submit a job, use the command
$ sbatch submit.job
For jobs that require less than one hour or are used for testing and debugging purposes, use the short partition to minimize waiting time with the command
$ sbatch -p short submit.job
For jobs that require large amounts of memory, use the command
$ sbatch -p largemem submit.job
for the fat nodes with 1 TB of RAM.
The sbatch command will return a job number. To check the status of a job, use one of the commands
$ squeue -j <job number>
or
$ squeue -u $USER
To cancel or delete a job, use the command
$ scancel <job number>
The outputs of the analysis in the script will be returned in the same folder as the .job file that was submitted, typically in a file named slurm-[job number].out unless otherwise specified.
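To name the output files differently, the standard SLURM options --output and --error can be added to the .job file, where %j expands to the job number (the filenames below are illustrative):
## redirect standard output and standard error; %j is the job number
#SBATCH --output=results-%j.out
#SBATCH --error=errors-%j.err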