National Computational Infrastructure

Parallel Programming and Performance Optimisation (Exercises)

Contents

  1. Prerequisites
  2. Exercise 1
  3. Exercise 2

Prerequisites


  • We expect solid experience with the C or Fortran programming language and with Linux usage;
  • Prior experience with parallel computing is encouraged but not required;
  • A terminal with X-Windows support (XQuartz for OS X, X11 for Unix/Linux, or Xming for Windows).

Exercise 1


1.1 MPI Hello world

  • Source Code of helloworld.c:
    
    #include <stdio.h>
    #include "mpi.h"

    int main (int argc, char **argv)
    {
        int rank, size, name_len;
        char processor_name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init (&argc, &argv);                 /* starts MPI */
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);   /* get current process id */
        MPI_Comm_size (MPI_COMM_WORLD, &size);   /* get number of processes */
        MPI_Get_processor_name (processor_name, &name_len);
        printf ("Hello world from process %d of %d (processor %s)\n",
                rank, size, processor_name);
        MPI_Finalize ();
        return 0;
    }
    
  • Source Code of helloworld.f

    c     Fortran example
          program hello
          include 'mpif.h'
          integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)

          call MPI_INIT(ierror)
          call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
          call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
          print*, 'node', rank, ': Hello world'
          call MPI_FINALIZE(ierror)
          end

  • Compilation:

    $ module load openmpi/1.8.5
    $ mpicc helloworld.c -o helloworld       # C version
    $ mpifort helloworld.f -o helloworld     # Fortran version

  • PBS Script exe1.pbs:

    #!/bin/bash
    #PBS -q express
    #PBS -l walltime=00:05:00
    #PBS -l mem=2G
    #PBS -l ncpus=8
    module load openmpi/1.8.5
    cd $PBS_O_WORKDIR
    mpirun -np 8 ./helloworld

  • Run the job:
    $ qsub exe1.pbs
  • Hints
    $ qstat <jobid>    # monitor job status (Queued, Running, or Finished)
  • Sample output

    [screenshot: helloworld sample output]
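
    With 8 ranks, each process prints one line in the pattern below. The host names shown are hypothetical, and the line order differs between runs because the ranks print concurrently:

    Hello world from process 0 of 8 (processor r101)
    Hello world from process 2 of 8 (processor r101)
    Hello world from process 1 of 8 (processor r102)
    ...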

1.2 MPI Point-to-Point Communication

  • Source code of sendrecv.c

    #include <stdio.h>
    #include "mpi.h"

    int main (int argc, char **argv)
    {
        int rank, size, i, number;

        MPI_Init (&argc, &argv);                 /* starts MPI */
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);   /* get current process id */
        MPI_Comm_size (MPI_COMM_WORLD, &size);   /* get number of processes */

        if (rank == 0) {
            /* rank 0 sends one integer to every other rank */
            for (i = 1; i < size; i++)
                MPI_Send(&i, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(&number, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Process %d received number %d from process 0\n", rank, number);
        }

        MPI_Finalize();
        return 0;
    }

  • Compilation:

    $ mpicc sendrecv.c -o sendrecv  

  • PBS job and run
    Modify exe1.pbs to request 4 processors and run the sendrecv program:
    #PBS -l ncpus=4
    mpirun -np 4 ./sendrecv
  • Run the job with different numbers of processes (8, 16, 32):
    #PBS -l ncpus=32
    mpirun -np 8 ./sendrecv
    mpirun -np 16 ./sendrecv
    mpirun -np 32 ./sendrecv
  • Hint PBS script
    [screenshot: sendrecv PBS script]
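
    A minimal sketch of the full modified exe1.pbs, assuming the same queue, walltime, and memory request as in Exercise 1.1:

    #!/bin/bash
    #PBS -q express
    #PBS -l walltime=00:05:00
    #PBS -l mem=2G
    #PBS -l ncpus=32
    module load openmpi/1.8.5
    cd $PBS_O_WORKDIR
    mpirun -np 8 ./sendrecv
    mpirun -np 16 ./sendrecv
    mpirun -np 32 ./sendrecv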
  • Sample output
    [screenshot: sendrecv sample output]

1.3 MPI Collective Communication

  • Source code of scatter.c

    #include "mpi.h"
    #include <stdio.h>
    #include <stdlib.h>
    #define SIZE 4

    int main (int argc, char *argv[])
    {
        int numtasks, rank, sendcount, recvcount;
        float sendbuf[SIZE][SIZE] = {
            {1.0, 2.0, 3.0, 4.0},
            {5.0, 6.0, 7.0, 8.0},
            {9.0, 10.0, 11.0, 12.0},
            {13.0, 14.0, 15.0, 16.0} };
        float recvbuf[SIZE];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

        if (numtasks == SIZE) {
            /* root (rank 0) scatters one row of sendbuf to each rank */
            sendcount = SIZE;
            recvcount = SIZE;
            MPI_Scatter(sendbuf, sendcount, MPI_FLOAT, recvbuf, recvcount,
                        MPI_FLOAT, 0, MPI_COMM_WORLD);

            printf("rank= %d Results: %f %f %f %f\n", rank, recvbuf[0],
                   recvbuf[1], recvbuf[2], recvbuf[3]);
        }
        else
            printf("Must specify %d processors. Terminating.\n", SIZE);

        MPI_Finalize();
        return 0;
    }

  • Compilation:

    $ mpicc scatter.c -o scatter  

  • PBS job and run
    Modify exe1.pbs to request 4 processors and run the scatter program.
  • Hint PBS script
    [screenshot: scatter PBS script]
  • Sample output
    [screenshot: scatter sample output]
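
    Since MPI_Scatter hands each of the 4 ranks one row of sendbuf, the printed results should resemble the following (the order of the rank lines may vary):

    rank= 0 Results: 1.000000 2.000000 3.000000 4.000000
    rank= 1 Results: 5.000000 6.000000 7.000000 8.000000
    rank= 2 Results: 9.000000 10.000000 11.000000 12.000000
    rank= 3 Results: 13.000000 14.000000 15.000000 16.000000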

 

Exercise 2


2.1 NPB MPI Benchmark

http://www.nas.nasa.gov/publications/npb.html

  • Enter NPB 3.3 directory
    $ cd /short/c25/aaa777/parallel_exe/NPB3.3-MPI/
  • Edit config/make.def
    MPIF77 = mpif77
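
    For reference, the surrounding lines of config/make.def should then look roughly like the excerpt below; the FLINK and flag settings are the NPB template defaults and are shown only as assumed context:

    MPIF77 = mpif77
    FLINK  = $(MPIF77)
    FFLAGS = -O
    FLINKFLAGS = -O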
  • Compilation (CLASS=B selects the problem size, NPROCS=16 the number of MPI processes)
    $ module unload openmpi
    $ module load openmpi/1.6.5

    $ make cg CLASS=B NPROCS=16
  • PBS job and run

    $ cd bin
    $ cat npb.pbs
    #!/bin/bash

    #PBS -q express
    #PBS -l ncpus=16
    #PBS -l mem=10G
    #PBS -l walltime=00:05:00

    module load openmpi/1.6.5
    cd $PBS_O_WORKDIR
    mpirun -np 16 ./cg.B.16
    $ qsub npb.pbs
    Sample output

    [screenshot: cg.B.16 sample output]

  • Try compiling with the -O3 optimisation flag and then rerun the program
    Hints
    Edit config/make.def
    FFLAGS  = -O3
    FLINKFLAGS = -O3
    $ make clean
    $ make cg CLASS=B NPROCS=16
    $ cd bin
    $ qsub npb.pbs
    [screenshot: cg.B.16 sample output with -O3]

2.2 IPM Profiling

  • Edit the PBS file
    module load openmpi/1.6.5
    module load ipm
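
    A minimal sketch of the modified npb.pbs, reusing the job parameters from Exercise 2.1 and adding the ipm module:

    #!/bin/bash
    #PBS -q express
    #PBS -l ncpus=16
    #PBS -l mem=10G
    #PBS -l walltime=00:05:00

    module load openmpi/1.6.5
    module load ipm
    cd $PBS_O_WORKDIR
    mpirun -np 16 ./cg.B.16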
  • Rerun the job; the resulting IPM log file (*.ipm) can then be found in the working directory.
  • Plain-text summary in standard output
    Sample output
    [screenshot: IPM summary]
    The following is the IPM output with "-O3":
    [screenshot: IPM summary with -O3]
  • IPM viewer (X-windows is required)
    ipm_view [filename].ipm
    Sample output

    $ ssh raijin.nci.org.au -Y
    $ module load openmpi/1.6.5
    $ module load ipm
    $ cd /short/c25/aaa777/parallel_exe/NPB3.3-MPI/bin
    $ ipm_view 1211531.r-man2.aaa777.Course.1435579016.021031.ipm
    [screenshot: ipm_view window]

2.3 Cache Behaviour

  • Repeat the above scenario using the ipm/0.983-cache module
    Modify npb.pbs:
    module load ipm/0.983-cache
    $ qsub npb.pbs
    Sample output
    [screenshot: IPM cache summary]
    To view the log:
    $ module unload ipm
    $ module load ipm/0.983-cache
    $ ipm_view …
    [screenshot: ipm_view with cache data]

2.4 Vampir Tracing

  • Edit config/make.def
     MPIF77 = mpif77-vt
  • Recompile and run the program
  • View the tracing logs using Vampir (X-windows needed)
    $ module load vampir
    $ vampir [filename].otf

    Details:
    Edit config/make.def
    MPIF77 = mpif77-vt
    $ make clean
    $ make cg CLASS=B NPROCS=16
    $ cd bin
    Edit npb.pbs
    module load vampir    (replacing the ipm module)
    $ qsub npb.pbs
    $ module unload ipm   (if ipm was loaded earlier in the session)
    $ module load vampir
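
    A minimal sketch of the correspondingly modified npb.pbs, again reusing the Exercise 2.1 job parameters, with vampir loaded in place of ipm:

    #!/bin/bash
    #PBS -q express
    #PBS -l ncpus=16
    #PBS -l mem=10G
    #PBS -l walltime=00:05:00

    module load openmpi/1.6.5
    module load vampir
    cd $PBS_O_WORKDIR
    mpirun -np 16 ./cg.B.16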

    Sample output

    [screenshot: Vampir trace view]

 

 
