7.0 Submitting a Parallel Job

This section provides step-by-step instructions for submitting a parallel job on an SGE-grid type cluster using a sample job script. NB: Compute node 4 (compute-0-4) is reserved for serial execution only. Run parallel jobs on hosts where InfiniBand-enabled parallel execution is available.

7.1 Prerequisites

  • Access to an SGE-grid cluster.

  • Necessary permissions to submit jobs.

  • Required modules (e.g., OpenMPI) installed on the cluster.

  • Use module avail to check available software and libraries.
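As a quick sanity check before writing a job script, the sketch below reports whether the commands used in this section are on your PATH. The helper name have_cmd is hypothetical, and whether mpirun is visible before running module load depends on how your site sets up the login environment.

```shell
#!/bin/bash
# Return success if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of each tool this section relies on.
for tool in qsub qstat mpirun; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool (try 'module avail' to look for a module that provides it)"
    fi
done
```

If mpirun is reported as missing, load the MPI module first (module load openmpi) and run the check again.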

7.2 Job Submission Script

Below is a sample job submission script. Save this script as parallel_job.sh or any name of your choice.

#!/bin/bash
#$ -N matmul
#$ -cwd
#$ -pe mpi 64
#$ -l h_rt=24:10:00
#$ -j y
#$ -o mpi_matmul.txt

module load openmpi

mpirun -mca btl_openib_allow_ib 1 -np 64 ./matmulmpi.x

7.3 Understanding the Sample Job Script

Section 7.4 explains each line of the job submission script and what it does.

7.4 Script Breakdown

  1. Shebang Line:

    #!/bin/bash
    

    This line indicates that the script should be run in the Bash shell.

  2. Job Name:

    #$ -N matmul
    

    This sets the name of the job to “matmul”.

  3. Current Working Directory:

    #$ -cwd
    

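    This option tells the scheduler to run the job from the current working directory, that is, the directory from which you submit the job with qsub. Without it, SGE runs the job from your home directory.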

  4. Parallel Environment:

    #$ -pe mpi 64
    

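    This requests 64 slots in the parallel environment named mpi. Parallel environment names are defined by the cluster administrators; qconf -spl lists the ones available on your cluster.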

  5. Runtime Limit:

    #$ -l h_rt=24:10:00
    

    This sets a hard runtime limit of 24 hours and 10 minutes for the job.

  6. Job Output:

    #$ -j y
    #$ -o mpi_matmul.txt
    

    These lines combine standard output and standard error into a single file named mpi_matmul.txt. To keep the two streams separate, drop the -j y line (or use -j n) and name an error file with -e.

  7. Loading Modules:

    module load openmpi
    

    This command loads the OpenMPI module, which is necessary for running MPI applications.

  8. Running the MPI Program:

    mpirun -mca btl_openib_allow_ib 1 -np 64 ./matmulmpi.x
    

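    This command executes the MPI program matmulmpi.x using 64 processes, matching the 64 slots requested with -pe. The option -mca btl_openib_allow_ib 1 allows Open MPI's openib transport to use the InfiniBand devices.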
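Hard-coding 64 in both the -pe request and the mpirun line leaves two places to change. A minimal sketch of one alternative follows, assuming the scheduler exports NSLOTS (SGE sets it to the number of granted slots inside a job) and JOB_ID. It takes the process count from NSLOTS and writes standard output and standard error to separate files with -o and -e. The function name slot_count and the .out and .err file names are illustrative choices, not part of the sample script. Outside a job, the script only prints the command it would run.

```shell
#!/bin/bash
#$ -N matmul
#$ -cwd
#$ -pe mpi 64
#$ -l h_rt=24:10:00
#$ -o mpi_matmul.out
#$ -e mpi_matmul.err

# Number of MPI processes: NSLOTS inside an SGE job, 1 otherwise.
slot_count() {
    echo "${NSLOTS:-1}"
}

# Store the launch command in the positional parameters so it can be
# either executed or printed without repeating it.
set -- mpirun -mca btl_openib_allow_ib 1 -np "$(slot_count)" ./matmulmpi.x

if [ -n "${JOB_ID:-}" ]; then
    # Inside an SGE job: load MPI and launch for real.
    module load openmpi
    "$@"
else
    # Outside a job (for example on the login node): show the command only.
    echo "Dry run: $*"
fi
```

With this layout, changing the slot count in the -pe line is enough; the mpirun line follows automatically.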

7.5 Summary

This job submission script is designed to run a parallel matrix multiplication program using MPI on an SGE-grid cluster. Ensure that all prerequisites are met before submitting the job.

7.6 Running the Job

To submit your parallel job, follow these steps:

  1. Log in to the SGE-grid cluster, for example via SSH from a terminal on your local machine.

  2. Navigate to the directory where your job submission script (parallel_job.sh) is located.

    cd /path/to/your/script
    
  3. Submit the job using the qsub command:

    qsub parallel_job.sh
    
  4. Monitor your job status with the following command:

    qstat -u $USER
    
  5. Check the output of your job in the specified output file (mpi_matmul.txt) once it has completed.
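Steps 3 to 5 can also be scripted. The sketch below assumes qsub prints its usual confirmation line of the form Your job 12345 ("matmul") has been submitted, and that qstat -j returns a non-zero exit status once the job has left the queue; confirm both on your installation. The helper names parse_job_id and wait_for_job and the 30-second polling interval are illustrative.

```shell
#!/bin/bash
# Extract the numeric job ID from qsub's confirmation message.
parse_job_id() {
    echo "$1" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Poll until qstat no longer knows the job (assumes a non-zero exit then).
wait_for_job() {
    while qstat -j "$1" >/dev/null 2>&1; do
        sleep 30
    done
}

if command -v qsub >/dev/null 2>&1; then
    job_id="$(parse_job_id "$(qsub parallel_job.sh)")"
    if [ -z "$job_id" ]; then
        echo "Submission failed or the qsub message was not recognised" >&2
        exit 1
    fi
    echo "Submitted job ${job_id}"
    wait_for_job "$job_id"
    # Show the combined output file named in the job script.
    cat mpi_matmul.txt
else
    echo "qsub not found; run this on the cluster login node"
fi
```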

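7.7 Quantum ESPRESSO 7.4.1 Users

The following sample script runs a Quantum ESPRESSO self-consistent field (SCF) calculation with pw.x, reading its input from scf.in and writing its output to scf.out. The hostname request restricts the job to compute-0-5 or compute-0-2.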

#!/bin/bash
#$ -N qe_scf
#$ -cwd
#$ -pe mpi 64
#$ -l h_rt=24:00:00
# compute-0-4 runs serial jobs only, so restrict the job to other hosts
#$ -l hostname=compute-0-5|compute-0-2
#$ -j y
#$ -o qe_job.txt

module load espresso/7.4.1

mpirun -mca btl_openib_allow_ib 1 -np 64 pw.x < scf.in > scf.out
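
Once the job finishes, it helps to confirm that pw.x ran to completion before using the results. The sketch below is a minimal check, assuming pw.x writes the marker JOB DONE. at the end of a normal run and the phrase convergence has been achieved when the SCF cycle converges; verify both strings against your own scf.out. The function name qe_status is hypothetical.

```shell
#!/bin/bash
# Classify a pw.x output file as missing, incomplete, converged,
# or finished without convergence.
qe_status() {
    local out="$1"
    if [ ! -f "$out" ]; then
        echo "missing"
    elif ! grep -q "JOB DONE" "$out"; then
        echo "incomplete"
    elif grep -q "convergence has been achieved" "$out"; then
        echo "converged"
    else
        echo "finished-not-converged"
    fi
}

echo "scf.out: $(qe_status scf.out)"
```

An incomplete result usually means the job is still running or stopped early, for example by hitting the h_rt limit; the scheduler output file (qe_job.txt) is the next place to look.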

7.8 Additional Notes For Compiling and Running in Parallel

  • Ensure that your executable (matmulmpi.x) is compiled and available in the same directory as your job script.

  • Adjust the number of slots (-pe mpi 64) and the runtime limit (-l h_rt=24:10:00) according to your job requirements and cluster policies, and keep the -np value passed to mpirun equal to the slot count.

  • Ensure that your Makefile points to the correct library and include directories.

  • Use module show module_name to display the paths that a module sets, including its library directories (e.g., module show fftw/3.3.10-gnu11).

  • For more information on job submission options, refer to the SGE documentation or contact your system administrator.
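
When adjusting h_rt, it can be easier to work out the limit in seconds and convert it. The helper below (to_h_rt is a hypothetical name) formats a number of seconds in the hours:minutes:seconds form used in the sample scripts.

```shell
#!/bin/bash
# Convert a duration in seconds to the H:MM:SS form used by -l h_rt.
to_h_rt() {
    local total="$1"
    printf '%d:%02d:%02d\n' $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

# 24 hours and 10 minutes, as in the sample script.
to_h_rt $((24 * 3600 + 10 * 60))
```

The call above prints 24:10:00, the value used in the matmul script. The result can be pasted into the #$ -l h_rt= line of a job script.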