Configuration Files
imsi
imsi reads several machine-specific settings from the YAML files under canesm/CONFIG/imsi-config. These include software modules, directory paths for inputs and outputs (e.g. forcing files, RTD files), compilers, and sequencing flows. The following files need to be added or modified in the source repository in order to set up, build, and run a simulation.
- machines/imsi-sites-config.yaml: Add the name of the new machine under an existing site or create a new site for the new machine.
- machines/imsi-machine-config-<MACHINE>.yaml: General machine-specific info not covered in another file.
  - Directory paths
  - Software modules (on DRAC systems)
  - Machine-specific resource info (e.g. CPUs per node, node names, etc.)
  - Supported compiler keys from imsi-compiler-config-XXX.yaml files
  - Environment variables
  - Supported sequencers
  - Scheduler name (e.g. slurm)
- compilers/imsi-compiler-config-gnu.yaml and/or imsi-compiler-config-intel.yml: Add a new key with machine-specific compiler settings:
  - Compiler names (e.g. mpif90)
  - FFLAGS and LDFLAGS passed to the compiler (e.g. -qmkl).
- sequencing/imsi-sequencing-flow-<MACHINE>.yaml: Machine specific flow definitions for desired model configurations (e.g. coupled model, AGCM).
  - Model run commands (e.g. mpirun with specific flags like bind-to, map-by and hostfile)
  - Scheduler directive (e.g. ntasks-per-node, etc.)
- sequencing/imsi_sequencer_config_iss.yaml and/or sequencing/imsi_sequencer_config_maestro.yaml: Add the new machine name under supported_machines. maestro is only used internally within ECCC. DRAC machines use the iss sequencer. There are plans to converge to a single open-source sequencer for all platforms.
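As a quick checklist, the files listed above can be enumerated for a given machine name. This is a minimal sketch, assuming it is run from the root of the source repository and that the Intel compiler file sits in the same compilers directory as the GNU one. The function name imsi_port_files and the machine name mymachine are hypothetical and used only for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of imsi): list the config files touched when
# porting to a new machine. Paths follow the list above, relative to the repo root.
imsi_port_files() {
    local machine="$1"
    local base="canesm/CONFIG/imsi-config"
    printf '%s\n' \
        "${base}/machines/imsi-sites-config.yaml" \
        "${base}/machines/imsi-machine-config-${machine}.yaml" \
        "${base}/compilers/imsi-compiler-config-gnu.yaml" \
        "${base}/compilers/imsi-compiler-config-intel.yml" \
        "${base}/sequencing/imsi-sequencing-flow-${machine}.yaml" \
        "${base}/sequencing/imsi_sequencer_config_iss.yaml" \
        "${base}/sequencing/imsi_sequencer_config_maestro.yaml"
}

# Report which files already exist and which still need to be created.
for f in $(imsi_port_files mymachine); do
    if [ -e "$f" ]; then
        echo "found:   $f"
    else
        echo "missing: $f"
    fi
done
```

Only the two files containing the machine name are normally new; the others are existing files that gain an entry for the new machine.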
Changing the PE layout
Note
CanAM (with the spectral_canesm dynamical core) is highly limited in the number of MPI tasks it can use. Changing the default of 32 MPI tasks is not recommended. Additionally, there is little (if anything) to be gained by using more than 2 OMP threads for CanAM.
Based on the constraints of the machine you’re porting to, you may need to alter the number of parallel processes used to run the model. For CanAM, this means altering the number of OMP threads, and for CanNEMO, this means altering the number of MPI tasks. These settings are specified in the component-specific files in imsi-config/models. By default, the relevant settings are as follows:
imsi-model-config-canam51_p1.yaml
models:
  components:
    CanAM:
      resources:
        mpiprocs: 32
        ompthreads: 2
imsi-model-config-cannemo51_p1-esm.yaml
models:
  cannemo51_p1-esm:
    components:
      CanNEMO:
        resources:
          mpiprocs: 160
          ompthreads: 1
        namelists:
          nammpp:
            jpni: '16'
            jpnj: '10'
            jpnij: '160'
CanCPL is fully serial and runs on one core. The total number of cores required by the default PE layout is therefore 225 (32x2 + 160 + 1).
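The core count can be reproduced with a few lines of shell arithmetic. This is only an illustrative sketch: the function name component_cores is hypothetical, and the numbers are the default values quoted above.

```shell
#!/bin/bash
# Cores used by one component = MPI tasks x OpenMP threads per task.
component_cores() {
    echo $(( $1 * $2 ))
}

canam=$(component_cores 32 2)     # CanAM: 32 MPI tasks, 2 OMP threads each
cannemo=$(component_cores 160 1)  # CanNEMO: 160 MPI tasks, 1 OMP thread each
cancpl=$(component_cores 1 1)     # CanCPL: serial, one core

total=$(( canam + cannemo + cancpl ))
echo "total cores: ${total}"      # 64 + 160 + 1 = 225
```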
This PE layout has been found to work well on internal ECCC machines and on DRAC Niagara (no longer online), but it may not be optimal for all machines, in particular those that require jobs to use whole nodes and have a large number of cores per node. One such machine is DRAC Trillium, which has 192 cores per node. Its resources are used most efficiently by running CanESM5 on 1 node, with fewer tasks (32 + 156 + 1 = 189). Retaining the default PE layout would need 2 nodes (384 cores) to supply 225 cores, leaving 159 cores idle during each job. The following changes were applied to the model config files.
imsi-model-config-canam51_p1.yaml
models:
  components:
    CanAM:
      resources:
        mpiprocs: 32
        ompthreads: 1
imsi-model-config-cannemo51_p1-esm.yaml
models:
  cannemo51_p1-esm:
    components:
      CanNEMO:
        resources:
          mpiprocs: 156
          ompthreads: 1
        namelists:
          nammpp:
            jpni: '13'
            jpnj: '12'
            jpnij: '156'
On other machines, there may be a different optimal PE layout. Determining this requires some experimentation.
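When experimenting, it can help to check how many whole nodes a candidate layout needs, how many cores it leaves idle, and which jpni/jpnj pairs multiply to the chosen number of CanNEMO tasks (in both layouts shown above, jpni x jpnj equals jpnij and mpiprocs). The sketch below is not part of imsi; the function names are hypothetical, and the 192 cores per node is the Trillium figure quoted above.

```shell
#!/bin/bash
# Whole nodes needed for a given core count (ceiling division).
nodes_needed() {
    local cores="$1" per_node="$2"
    echo $(( (cores + per_node - 1) / per_node ))
}

# Cores left idle when only whole nodes can be requested.
idle_cores() {
    local cores="$1" per_node="$2" nodes
    nodes=$(nodes_needed "$cores" "$per_node")
    echo $(( nodes * per_node - cores ))
}

# Print every "jpni x jpnj" pair whose product equals the requested task count.
factor_pairs() {
    local n="$1" i
    for (( i = 1; i * i <= n; i++ )); do
        if (( n % i == 0 )); then
            echo "$(( n / i )) x ${i}"
        fi
    done
}

echo "default layout: $(nodes_needed 225 192) nodes, $(idle_cores 225 192) cores idle"
echo "reduced layout: $(nodes_needed 189 192) nodes, $(idle_cores 189 192) cores idle"
factor_pairs 156
```

The list of factor pairs only narrows the candidates. Which decomposition performs best on a given machine still has to be measured.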
Changing the run command
In most cases, CanESM is run on an HPC cluster via jobs managed by a scheduler like SLURM or PBS. The model code is executed using a call to mpirun, which works together with the scheduler to spread the tasks and threads across the allocated resources. However, when running on multiple nodes, the scheduler does not always determine the optimal distribution of tasks. The --host and --hostfile arguments accepted by mpirun tell it how to spread the tasks across nodes in a multi-node job. See the mpirun FAQ for more information.
The run command is specified in imsi-sequencing-flow-<MACHINE>.yaml. Different run commands can be specified for the coupled model (i.e. canesm_two_job_flow-<MACHINE>) and for AGCM simulations (canam_two_job_flow-<MACHINE>), since they may use a different number of nodes. For example, this is the run command for the coupled CanESM5 on DRAC Niagara. It creates a temporary hostfile that tells mpirun which nodes (hosts) to use for CanAM and CanCPL, and assigns the remaining nodes to CanNEMO.
imsi-sequencing-flow-niagara.yaml:
sequencing:
  sequencing_flow:
    canesm_two_job_flow-niagara:
      jobs:
        model:
          resources:
            model_run_commands: |
              export OMP_NUM_THREADS=2
              hostfile_canam=$(mktemp)
              host_arr=( $(scontrol show hostnames $SLURM_NODELIST) )
              CPU_PER_NODE=40
              N=\${#host_arr[@]}
              N1=$(($N - (\${CanNEMO_MPIPROCS} - 1) / \${CPU_PER_NODE} - 1))
              slots=$(((\${CanAM_MPIPROCS} + \${CanCPL_MPIPROCS} - 1) / \${N1} + 1))
              host1=(\${host_arr[@]:0:\${N1}})
              host2_str=\${host_arr[@]:\${N1}:\${N}}
              for i in \${host1[@]}
              do
                echo "\${i} slots=\${slots} max_slots=\${CPU_PER_NODE}" >> \${hostfile_canam}
              done
              cat \${hostfile_canam}
              mpirun --hostfile \${hostfile_canam} --map-by slot -x OMP_NUM_THREADS=\${CanAM_OMPTHREADS} -n \${CanAM_MPIPROCS} ./\${CanAM_EXEC} : \
                --host \${host1[-1]} -x OMP_NUM_THREADS=\${CanCPL_OMPTHREADS} -n \${CanCPL_MPIPROCS} ./\${CanCPL_EXEC} : \
                --host \${host2_str// /,} -x OMP_NUM_THREADS=\${CanNEMO_OMPTHREADS} -n \${CanNEMO_MPIPROCS} ./\${CanNEMO_EXEC}
              rm \${hostfile_canam}
Other machines will likely require different run commands for best results. This is another area that requires some informed experimentation. Ideally, the run command should be designed to work independent of the number of nodes requested for the job.
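To see how the Niagara run command divides the allocated nodes, its arithmetic can be rehearsed outside a job. The sketch below is a stand-alone approximation, not the production command: the six hostnames are made up, the task counts are the default PE layout, and the \$ escaping used inside the YAML file is dropped because the lines run directly in a shell.

```shell
#!/bin/bash
# Stand-alone rehearsal of the node-splitting arithmetic, with made-up inputs.
host_arr=(node01 node02 node03 node04 node05 node06)  # hypothetical hostnames
CPU_PER_NODE=40
CanAM_MPIPROCS=32
CanCPL_MPIPROCS=1
CanNEMO_MPIPROCS=160

N=${#host_arr[@]}
# Nodes left for CanAM and CanCPL once CanNEMO takes ceil(160 / 40) = 4 nodes.
N1=$(( N - (CanNEMO_MPIPROCS - 1) / CPU_PER_NODE - 1 ))
# MPI slots per CanAM/CanCPL node: ceil((32 + 1) / N1).
slots=$(( (CanAM_MPIPROCS + CanCPL_MPIPROCS - 1) / N1 + 1 ))

host1=( "${host_arr[@]:0:N1}" )   # first N1 nodes: CanAM and CanCPL
host2=( "${host_arr[@]:N1}" )     # remaining nodes: CanNEMO

echo "CanAM/CanCPL nodes (${N1}): ${host1[*]}, ${slots} slots each"
echo "CanNEMO nodes: $(IFS=,; echo "${host2[*]}")"
```

With these inputs, two nodes carry CanAM and CanCPL at 17 slots each, and the other four go to CanNEMO. Changing the length of host_arr is a quick way to check that a command behaves sensibly for other node counts.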
Postprocessing Sequencing Flow
On DRAC Niagara and on DRAC Trillium, the current SciNet machine, users may only request full nodes, so the postprocessing job wastes CPU resources. On the other DRAC machines (Fir, Nibi, and Rorqual), users may request partial nodes. On those machines, changing the resource request for postprocessing is easy: alter the SLURM directives for the postprocessing job in imsi-sequencing-flow-<MACHINE>.yaml to request --ntasks-per-node=1 (because the postprocessing does not use MPI) and --cpus-per-task=12 (because the postprocessing scripts work on the data for each month in parallel).
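The effect of these two directives can be illustrated with a plain SLURM-style script. This is a sketch under stated assumptions: in imsi the directives are set in the sequencing flow file rather than in a script header, and process_month is a hypothetical stand-in for the real per-month postprocessing step, which is not shown in this document.

```shell
#!/bin/bash
#SBATCH --ntasks-per-node=1   # postprocessing uses no MPI
#SBATCH --cpus-per-task=12    # one CPU for each month processed in parallel

outdir=$(mktemp -d)

# Hypothetical stand-in for the real per-month postprocessing step.
process_month() {
    echo "processed month $1" > "${outdir}/month_$1.log"
}

# Launch the twelve months as background jobs, then wait for all of them.
for month in 01 02 03 04 05 06 07 08 09 10 11 12; do
    process_month "$month" &
done
wait

n_done=$(ls "$outdir" | grep -c month_)
echo "${n_done} months processed"
rm -r "$outdir"
```

Outside SLURM the #SBATCH lines are ordinary comments, so the script can be tried on a login node or workstation before it is adapted.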
Node-local scratch
The DRAC machines Rorqual, Fir, and Nibi have SSDs mounted directly on the compute nodes. Doing I/O from this localscratch space is reported to be faster than from the main “network” scratch file system. To do I/O from this node-mounted temporary storage, set the following key/value pair in imsi-machine-config-<MACHINE>.yaml:
machines:
  <MACHINE NAME>:
    scratch_dir: \${SLURM_TMPDIR}
To monitor the job’s file I/O, you can use squeue --me to find the name of the node running your job, and ssh to that node. The working directory will be under the directory /localscratch/.
config-canesm system
The config-canesm system will need:
- a new directory under CONFIG/PLATFORM (see the eccc-u2 directory for what files are needed)
- a new “site profile” under CCCma_tools/generic
- consideration in the raw_env_setup_file, under CCCma_tools/generic, to detect/support the new hostname
- a new host specific env file under CCCma_tools/generic, which gets sourced by the processed env_setup_file
- a new fcm compiler file for CanNEMO (I believe under the ARCH directory, but Nicolas/Neil can confirm)