7.0 Submitting a Parallel Job
This section provides step-by-step instructions for submitting a parallel job on an SGE-grid cluster using a sample job script. NB: Compute node 4 (compute-0-4) is reserved for serial execution ONLY. Run parallel jobs on the hosts that have InfiniBand enabled for parallel execution.
7.1 Prerequisites
Access to an SGE-grid cluster.
Necessary permissions to submit jobs.
Required modules (e.g., OpenMPI) installed on the cluster.
Use module avail to check the available software and libraries.
7.2 Job Submission Script
Below is a sample job submission script. Save this script as parallel_job.sh or any name of your choice.
#!/bin/bash
#$ -N matmul
#$ -cwd
#$ -pe mpi 64
#$ -l h_rt=24:10:00
#$ -j y
#$ -o mpi_matmul.txt
module load openmpi
mpirun -mca btl_openib_allow_ib 1 -np 64 ./matmulmpi.x
7.3 Running the Sample Job Script
Before running the sample script, it helps to understand what each directive and command does. The following breakdown explains each component of the job submission script.
7.4 Script Breakdown
Shebang Line:
#!/bin/bash
This line indicates that the script should be run in the Bash shell.
Job Name:
#$ -N matmul
This sets the name of the job to “matmul”.
Current Working Directory:
#$ -cwd
This option runs the job in the directory from which you submitted it, so relative paths such as ./matmulmpi.x and the output file mpi_matmul.txt are resolved there.
Parallel Environment:
#$ -pe mpi 64
This requests 64 slots in the parallel environment named mpi. Parallel environment names are defined by the cluster configuration; qconf -spl lists the ones available.
Runtime Limit:
#$ -l h_rt=24:10:00
This sets a hard runtime limit of 24 hours and 10 minutes (hours:minutes:seconds). The scheduler terminates the job if it runs longer than this.
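To see what a limit such as h_rt=24:10:00 amounts to, convert it to seconds: 24 × 3600 + 10 × 60 = 87000 s. The small Bash function below does the same arithmetic; the name hrt_to_seconds is hypothetical. Grid Engine time values can also be written as a plain number of seconds, so #$ -l h_rt=87000 should request the same limit.

```shell
#!/bin/bash
# Hypothetical helper: convert an SGE time value written as
# hours:minutes:seconds into a number of seconds.
# The 10# prefix forces base 10 so fields such as "08" are not read as octal.
hrt_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

hrt_to_seconds 24:10:00   # prints 87000
```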
Job Output:
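#$ -j y
#$ -o mpi_matmul.txt
The -j y option merges standard error into standard output, and -o writes the combined output to a file named mpi_matmul.txt. To keep the two streams separate, remove -j y and name an error file with -e (for example, #$ -e mpi_matmul.err).
Loading Modules: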
module load openmpi
This command loads the OpenMPI module, which is necessary for running MPI applications.
Running the MPI Program:
mpirun -mca btl_openib_allow_ib 1 -np 64 ./matmulmpi.x
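This command launches the MPI program matmulmpi.x with 64 processes (-np 64), matching the 64 slots requested with -pe mpi 64. The -mca btl_openib_allow_ib 1 option sets an Open MPI MCA parameter that allows the openib transport (BTL) to use the InfiniBand devices, which Open MPI 4.x otherwise excludes from openib by default.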
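Because the -np value must agree with the -pe slot count, one way to keep them in step is to generate the job script from a single slot count and let mpirun read NSLOTS, the environment variable in which SGE reports the number of slots granted to the job. The sketch below assumes the same module and mpirun options as the sample script; the helper name make_mpi_job and the output-file naming (job name plus .txt) are illustrative assumptions, not part of the cluster setup.

```shell
#!/bin/bash
# Hypothetical helper: write an SGE job script whose -pe slot count and
# mpirun process count always agree.
# Usage: make_mpi_job NAME SLOTS RUNTIME EXECUTABLE OUTFILE
make_mpi_job() {
    local name="$1" slots="$2" runtime="$3" exe="$4" outfile="$5"
    # \$ keeps a literal "$" in the generated file
    cat > "$outfile" <<EOF
#!/bin/bash
#\$ -N ${name}
#\$ -cwd
#\$ -pe mpi ${slots}
#\$ -l h_rt=${runtime}
#\$ -j y
#\$ -o ${name}.txt
module load openmpi
# SGE sets NSLOTS to the number of slots granted to the job
mpirun -mca btl_openib_allow_ib 1 -np "\$NSLOTS" ${exe}
EOF
}
```

For example, make_mpi_job matmul 32 02:00:00 ./matmulmpi.x parallel_job.sh writes a 32-slot version of the sample script, which you then submit with qsub parallel_job.sh.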
7.5 Summary
This job submission script is designed to run a parallel matrix multiplication program using MPI on an SGE-grid cluster. Ensure that all prerequisites are met before submitting the job.
7.6 Running the Job
To submit your parallel job, follow these steps:
Open a terminal on your local machine and connect to the SGE-grid cluster via SSH.
Navigate to the directory where your job submission script (parallel_job.sh) is located:
cd /path/to/your/script
Submit the job using the qsub command:
qsub parallel_job.sh
Monitor your job status with the following command:
qstat -u $USER
Check the output of your job in the specified output file (mpi_matmul.txt) once it has completed.
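If you submit jobs from your own scripts, the submit-and-monitor steps above can be automated. The sketch below assumes SGE's qsub -terse option, which prints only the job ID, and assumes that qstat -u $USER lists the job ID in its first column while the job is pending or running. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: submit an SGE job script, then wait until it leaves the queue.

# Print the job ID of a newly submitted job (qsub -terse prints only the ID).
submit_job() {
    local job_id
    job_id=$(qsub -terse "$1") || return 1
    echo "$job_id"
}

# Succeed while the job ID still appears in the first column of qstat's listing.
job_in_queue() {
    qstat -u "$USER" | awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# Poll qstat every INTERVAL seconds (default 60) until the job is no longer listed.
wait_for_job() {
    local job_id="$1" interval="${2:-60}"
    while job_in_queue "$job_id"; do
        sleep "$interval"
    done
}
```

A typical use is job_id=$(submit_job parallel_job.sh) && wait_for_job "$job_id" && cat mpi_matmul.txt. Keep the polling interval at a minute or more so repeated qstat calls do not burden the scheduler.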
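7.7 Quantum ESPRESSO 7.4.1 Users
The job script below runs pw.x from the espresso/7.4.1 module on 64 MPI processes, reading input from scf.in and writing output to scf.out. It restricts the job to the parallel hosts compute-0-5 and compute-0-2.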
#!/bin/bash
#$ -N qe_scf
#$ -cwd
#$ -pe mpi 64
#$ -l h_rt=24:00:00
# compute-0-4 runs serial jobs only, so request the parallel hosts explicitly
#$ -l hostname=compute-0-5|compute-0-2
#$ -j y
#$ -o qe_job.txt
module load espresso/7.4.1
mpirun -mca btl_openib_allow_ib 1 -np 64 pw.x < scf.in > scf.out
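Once the job has finished, you can check scf.out before using the results. The sketch below looks for the messages pw.x normally prints when an SCF run converges ("convergence has been achieved") and terminates cleanly ("JOB DONE."), and prints the final energy line, which pw.x marks with a leading "!". These marker strings are assumptions based on standard pw.x output; adjust them if your output differs. The function name check_scf is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: sanity-check a pw.x output file (default: scf.out).
# Returns 0 and prints the last "!" total-energy line if the run converged
# and finished; returns non-zero otherwise.
check_scf() {
    local out="${1:-scf.out}"
    if [ ! -f "$out" ]; then
        echo "missing: $out"
        return 2
    fi
    if ! grep -q "convergence has been achieved" "$out"; then
        echo "not converged: $out"
        return 1
    fi
    if ! grep -q "JOB DONE" "$out"; then
        echo "incomplete: $out"
        return 1
    fi
    # pw.x marks the converged total energy with a leading "!"
    grep '^!' "$out" | tail -n 1
}
```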
7.8 Additional Notes For Compiling and Running in Parallel
Ensure that your executable (matmulmpi.x) is compiled and available in the same directory as your job script.
Adjust the number of slots (-pe mpi 64) and the runtime (-l h_rt=24:10:00) to your job requirements and cluster policies, and keep the mpirun -np value equal to the number of slots.
Ensure that your Makefile points to the correct library and include directories. Use module show module_name to display the paths a module sets (e.g., module show fftw/3.3.10-gnu11).
For more information on job submission options, refer to the SGE documentation or contact the system administrator.
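To turn module show output into compiler flags for a Makefile, a small filter can pick out the include and library directories. This is a sketch assuming Environment Modules (Tcl) output, with lines of the form prepend-path VARIABLE DIRECTORY; Lmod prints a different format. The variables treated as include paths (CPATH, C_INCLUDE_PATH, INCLUDE) or library paths (LIBRARY_PATH, LD_LIBRARY_PATH) are assumptions about how the modulefiles are written, and module_flags is a hypothetical name.

```shell
#!/bin/bash
# Sketch: read `module show` output on stdin and print -I/-L compiler flags
# for the include and library directories the modulefile adds.
# Assumes Environment Modules (Tcl) style lines, for example:
#   prepend-path    LD_LIBRARY_PATH /opt/fftw/3.3.10/lib
module_flags() {
    awk '
        $1 ~ /^(prepend|append)-path$/ {
            if ($2 ~ /^(CPATH|C_INCLUDE_PATH|INCLUDE)$/) flag = "-I" $3
            else if ($2 ~ /^(LIBRARY_PATH|LD_LIBRARY_PATH)$/) flag = "-L" $3
            else next
            # keep each flag once, even if several variables name the same directory
            if (!seen[flag]++) out = (out == "") ? flag : (out " " flag)
        }
        END { print out }
    '
}
```

Environment Modules writes module show output to standard error, so redirect it into the pipe, e.g. FFTW_FLAGS=$(module show fftw/3.3.10-gnu11 2>&1 | module_flags), then copy the printed flags into your Makefile's include and library variables.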