Setting up your environment
In bash shell:
module load intel
module load impi/4.1.1.036
export PNETCDF=/apps/mic/pnetcdf/1.4.1_intel-13.1.1.163
export NETCDF=/apps/mic/pnetcdf/1.4.1_intel-13.1.1.163
export J="-j 16"
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
Note: J="-j 16" allows for parallel compilation
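A quick sanity check before configuring, to confirm the modules and variables took effect:
module list                              # intel and impi/4.1.1.036 should be listed
echo $NETCDF $PNETCDF                    # both should point at the pnetcdf install
echo $J $WRFIO_NCD_LARGE_FILE_SUPPORT    # should print -j 16 1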
WRF
Run ./configure and select option #21 to build:
21. Linux x86_64 i486 i586 i686, Xeon Phi (MIC architecture) ifort compiler with icc (dm+sm)
Select option #1 for the nesting option, unless you need preset moves or vortex-following nests:
Compile for nesting? (1=basic, 2=preset moves, 3=vortex following) [default 1]: 1
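In practice the configure step looks like this (the WRFV3 directory name is an assumption about where you unpacked the source):
cd WRFV3
./configure        # answer 21 for the target, then 1 for nesting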
If you're going to be driving WRF with climate model data that uses a 365-day calendar (no leap years), add the following to your configure.wrf file:
add -DNO_LEAP_CALENDAR after ARCH_LOCAL
Note: Make sure you also compile WPS with this option. Paste -DNO_LEAP_CALENDAR after CPPFLAGS in the configure.wps file before compiling
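A sketch of those two edits with sed, assuming the usual whitespace layout in the configure files and a WPS checkout next to WRFV3; editing the files by hand works just as well:
# append -DNO_LEAP_CALENDAR right after the "=" on the ARCH_LOCAL line
sed -i 's/^ARCH_LOCAL *= */&-DNO_LEAP_CALENDAR /' configure.wrf
# same idea for WPS (the relative path to configure.wps is an assumption)
sed -i 's/^CPPFLAGS *= */&-DNO_LEAP_CALENDAR /' ../WPS/configure.wps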
Compiling
Both the front-end nodes and the Phi-enabled compute nodes have the required Intel tools and libraries available to compile Phi programs, so compiling should not be any different from usual.
./compile em_real >& compile_xeonphi.log
However, it is probably better to submit a longer-running compilation as a job on a compute node. An example job script is below:
#!/bin/bash
#PBS -N compilewrf
#PBS -e compilewrf.err
#PBS -o compilewrf.out
#PBS -l nodes=1:ppn=16
#PBS -l walltime=01:00:00
#PBS -m a
cd $PBS_O_WORKDIR
module load intel
module load impi/4.1.1.036
export PNETCDF=/apps/mic/pnetcdf/1.4.1_intel-13.1.1.163
export NETCDF=/apps/mic/pnetcdf/1.4.1_intel-13.1.1.163
export J="-j 16"
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
./compile em_real >& compile_xeonphi.log
exit
Now just qsub your job script, and you should be good to go!
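For example (compilewrf.pbs is just a placeholder for whatever you named the script above):
qsub compilewrf.pbs
# once the job finishes, scan the log and check that the executables were built
grep -i error compile_xeonphi.log
ls -l main/*.exe        # wrf.exe and real.exe should be present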
Running
This is where the procedure for running WRF changes quite a bit from what we are normally used to. Since we will be using pnetcdf, we need to make some modifications to the namelist.input file, namely change the io_form_* options from 2 (netcdf) to 11 (pnetcdf). Also, pnetcdf does not like colons in the wrfout file names, so add nocolons = .true. to your namelist (see the sketch after the list below).
io_form_history = 11
io_form_restart = 11
io_form_input = 11
io_form_boundary = 11
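For reference, these switches and nocolons all belong in the &time_control section of namelist.input; a minimal sketch of the relevant entries (your &time_control will contain many more settings):
&time_control
 io_form_history  = 11,
 io_form_restart  = 11,
 io_form_input    = 11,
 io_form_boundary = 11,
 nocolons         = .true.,
/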
Interactive job
Single node
qsub -I -l nodes=1:ppn=16:mics=2 -l walltime=04:00:00
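Once the interactive session starts, the launch steps are the same as in the multi-node case below, just with a two-entry host file:
cd $PBS_O_WORKDIR
cat $PBS_MICFILE > host    # mic0 and mic1 on this node
ssh mic0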
Multi-node
Multi-node support for the phis is still experimental. A special queue called 'testmics' has been set up for users to submit to and test the performance of multiple nodes' worth of phis for their applications.
qsub -I -l nodes=4:ppn=16:mics=2 -l walltime=04:00:00 -q testmics
cd $PBS_O_WORKDIR
cat $PBS_MICFILE > host
ssh mic0
cd /path/to/WRF/directory
./run_wrf_mic
We have to use $PBS_MICFILE to set up our host file; it contains a list of all of the phis allocated to the job.
**you may have to add ssh authorized keys in order for the head mic to be able to access the other phis without a password. This should be taken care of by RCAC shortly.
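In the meantime, a sketch of setting the keys up yourself, assuming your home directory (and therefore ~/.ssh) is visible from the mics:
# generate a passwordless key if you don't already have one, then authorize it
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys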
Example run script ('run_wrf_mic'):
#!/bin/sh
export I_MPI_FABRICS=shm:tcp
export KMP_STACKSIZE=100m
export WRF_NUM_TILES_Y=60
export I_MPI_PIN_MODE=mpd
export KMP_PLACE_THREADS=60C,3T
export OMP_NUM_THREADS=180
export KMP_AFFINITY=balanced,granularity=thread
export KMP_LIBRARY=turnaround
export KMP_BLOCKTIME=infinite
cd /scratch/lustreD/k/khoogewi/WRFV3_xeonphi/run/TEST_wrf_BC_20360415
mpiexec.hydra -np 8 -machinefile ./host ./wrf.exe
** -np 8: the number of MPI processes equals nodes*mics (4 nodes x 2 mics = 8 here)
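Rather than hard-coding -np, you could derive it from the host file, since it already contains one line per mic (a small sketch):
NP=$(wc -l < ./host)
mpiexec.hydra -np $NP -machinefile ./host ./wrf.exe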
If you encounter the following error, particularly after a HALO call (increase the debugging level in namelist.input to help narrow it down):
forrtl: error (69): process interrupted (SIGINT)
you likely need to experiment with the X and Y tiling, as in the sketch below.
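For example, edit run_wrf_mic and retry with different tile counts. The values below are only starting points, and WRF_NUM_TILES_X is an assumed counterpart to the variable used above, not something confirmed here:
export WRF_NUM_TILES_Y=40    # was 60 in the example script; try a few values
export WRF_NUM_TILES_X=2     # assumed variable name, per the X/Y tiling note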
Batch job