How to compile and run WRF on Conte with Intel Xeon Phi Coprocessors

Setting up your environment

In bash shell:
module load intel
module load impi/4.1.1.036
export PNETCDF=/apps/mic/pnetcdf/1.4.1_intel-13.1.1.163
export NETCDF=/apps/mic/pnetcdf/1.4.1_intel-13.1.1.163
export J="-j 16"
export WRFIO_NCD_LARGE_FILE_SUPPORT=1

    Note: setting J="-j 16" enables parallel compilation with 16 make jobs


WRF

Configuring

./configure

Select option #21 to build for the Xeon Phi (MIC architecture) with the Intel compilers:

21.  Linux x86_64 i486 i586 i686, Xeon Phi (MIC architecture) ifort compiler with icc  (dm+sm)

Select option #1 for the nesting option unless you need preset-move or vortex-following nests:

Compile for nesting? (1=basic, 2=preset moves, 3=vortex following) [default 1]: 1

If you're going to be driving WRF with climate model data that uses a 365-day calendar (no leap years), add the following to your configure.wrf file:

add -DNO_LEAP_CALENDAR to the end of the ARCH_LOCAL line


Note: Make sure you also compile WPS with this option: append -DNO_LEAP_CALENDAR to the CPPFLAGS line in the configure.wps file before compiling (illustrated below).
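
For illustration only; keep whatever flags are already on these lines in your own files and simply append the define at the end:

In configure.wrf:
ARCH_LOCAL      =       <existing flags> -DNO_LEAP_CALENDAR

In configure.wps:
CPPFLAGS        =       <existing flags> -DNO_LEAP_CALENDAR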

Compiling

Both front-end and compute nodes (Phi-enabled) have the required Intel tools and libraries available for building Phi programs, so compiling is no different from usual:

./compile em_real >& compile_xeonphi.log

However, it is probably better to submit a longer-running compilation as a job on a compute node. An example job script is below:

#!/bin/bash
#PBS -N compilewrf
#PBS -e compilewrf.err
#PBS -o compilewrf.out
#PBS -l nodes=1:ppn=16
#PBS -l walltime=01:00:00
#PBS -m a


cd $PBS_O_WORKDIR

module load intel
module load impi/4.1.1.036
export PNETCDF=/apps/mic/pnetcdf/1.4.1_intel-13.1.1.163
export NETCDF=/apps/mic/pnetcdf/1.4.1_intel-13.1.1.163
export J="-j 16"
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
./compile em_real >& compile_xeonphi.log
exit


Now just qsub your job script, and you should be good to go!
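
Once the compile job finishes, check the end of compile_xeonphi.log and confirm the executables were built; for the em_real case you should at least see wrf.exe and real.exe under main/:

ls -ls main/*.exe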

Running

This is where the procedure for running WRF changes quite a bit from what we are normally used to. Since we will be using pnetcdf, we need to make some modifications to the namelist.input file, namely changing the io_form_* options in the &time_control section from 2 (netcdf) to 11 (pnetcdf). Also, pnetcdf does not accept colons in the wrfout file names, so add nocolons = .true. to your namelist as well.


io_form_history                 = 11,
io_form_restart                 = 11,
io_form_input                   = 11,
io_form_boundary                = 11,
nocolons                        = .true.,

Interactive job

Single node

qsub -I -l nodes=1:ppn=16:mics=2 -l walltime=04:00:00
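
Once the interactive job starts, the launch procedure is the same as the multi-node case described below; the only difference is that the machinefile holds just this node's Phis, so wrf.exe is started with one MPI rank per Phi (a sketch, assuming both MICs on the node are used):

mpiexec.hydra -np 2 -machinefile ./host ./wrf.exe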
 

Multi-node

Multi-node support for the Phis is still experimental. A special queue called 'testmics' has been set up for users to submit to and test the performance of multiple nodes' worth of Phis for their applications:

qsub -I -l nodes=4:ppn=16:mics=2 -l walltime=04:00:00 -q testmics
cd $PBS_O_WORKDIR
cat $PBS_MICFILE > host      # build the machinefile from the list of allocated Phis
ssh mic0                     # log on to the first Phi; the remaining commands run there
cd /path/to/WRF/directory
./run_wrf_mic


We have to use $PBS_MICFILE to set up our host file; it contains a list of all of the Phis allocated to the job.

** You may have to add SSH authorized keys in order for the head MIC to be able to access the other Phis without a password. This should be taken care of by RCAC shortly.
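
One common way to set this up by hand, assuming your home directory (and therefore ~/.ssh) is also visible on the MICs, is a passwordless key pair (a sketch; skip it if RCAC has already configured this for you):

ssh-keygen -t rsa                                  # accept the defaults, empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # authorize the new key
chmod 600 ~/.ssh/authorized_keys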

Example run script ('run_wrf_mic'):

source /etc/profile

module load impi

# Shared memory within a card, TCP between cards/nodes
export I_MPI_FABRICS=shm:tcp

# Per-thread OpenMP stack size, plus an unlimited shell stack
export KMP_STACKSIZE=100m
ulimit -s unlimited

# WRF tiling: 3 x 60 = 180 tiles, one per OpenMP thread
export WRF_NUM_TILES_X=3
export WRF_NUM_TILES_Y=60
export I_MPI_PIN_MODE=mpd
# 60 cores with 3 threads per core on each Phi = 180 threads
export KMP_PLACE_THREADS=60C,3T
export OMP_NUM_THREADS=180
export KMP_AFFINITY=balanced,granularity=thread
# Keep idle OpenMP threads spinning instead of sleeping
export KMP_LIBRARY=turnaround
export KMP_BLOCKTIME=infinite

cd /scratch/lustreD/k/khoogewi/WRFV3_xeonphi/run/TEST_wrf_BC_20360415

mpiexec.hydra -np 8 -machinefile ./host ./wrf.exe

** -np 8: the number of MPI ranks equals nodes * mics (here 4 nodes x 2 Phis = 8)

If you encounter the following error, particularly after a HALO call (increase debug_level in namelist.input to see where it occurs):

forrtl: error (69): process interrupted (SIGINT)

you likely need to play around with the X and Y tiling (WRF_NUM_TILES_X and WRF_NUM_TILES_Y); see the sketch below.
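
For example, one alternative split to try, keeping the product of the two tile counts equal to OMP_NUM_THREADS as in the run script above (the best split is found by experiment):

export WRF_NUM_TILES_X=6
export WRF_NUM_TILES_Y=30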

Batch job
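
A non-interactive sketch, untested, that simply wraps the interactive multi-node steps above in a PBS script; it assumes run_wrf_mic sits in the submission directory and that the first Phi is reachable as mic0, as in the interactive example:

#!/bin/bash
#PBS -N wrfmic
#PBS -l nodes=4:ppn=16:mics=2
#PBS -l walltime=04:00:00
#PBS -q testmics

cd $PBS_O_WORKDIR

# Build the machinefile from the list of Phis allocated to this job
cat $PBS_MICFILE > host

# Launch the run script natively on the first Phi
ssh mic0 "cd $PBS_O_WORKDIR && ./run_wrf_mic"
exit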


