Running CanESM on Compute Canada / Cedar (Singularity)
Important note
These are legacy instructions; they are no longer maintained and are not guaranteed to work. Use them at your own risk of frustration.
Overview and resources
This document provides instructions for running the CanESM5 model on the
Compute Canada machine Cedar. It is an opinionated guide that suggests a
recommended approach. These instructions are likely relevant to other Compute
Canada systems, but they have not been tested there.
To enable external use of the model, CanESM has been ported to compile with
the GNU compiler, and suitable Docker and Singularity containers created.
Tagged versions of the model should have been tested to function on Cedar.
There is no guarantee for non-tagged versions. At the time of writing, the
latest tagged version verified on Cedar is v5.0.10.
Working with CanESM on Cedar requires working with Singularity and
interacting with the SLURM workload manager.
For information on this software, interested readers are directed to the provided links, as
well as the Cedar-specific and more general
Compute Canada wiki.
We are going to use the space ~/project/$USER to store some persistent data,
and we’ll put some important files in $HOME. You may change this if you so
desire (e.g. to ~/scratch, but note that this space is temporary and
periodically deleted).
For setting up a model run, we are going to use the ~/scratch space, which is
temporary space that can handle lots of data (note, data here will be deleted
after 60 days or so). For CanESM, we often use the concept of a RUNID as a
unique identifier of an individual run. We normally create a dedicated
directory to work in for each RUNID.
Note
Only a small number of model configurations are made available. There is no support whatsoever available for using CanESM. We do not certify that the code runs correctly. This is strictly an alpha testing project with no guarantees whatsoever.
Initial Setup (One Time)
Get the source code
Recursively clone the CanESM source code and checkout the desired version
for the model, and its helper tools. We’ll put the source code in our home
directory as it is small, we want to keep it permanently, and home has
relatively fast access:
cd ~/
# clone the code and the desired submodules
git clone --recursive https://gitlab.com/cccma/canesm
Note
You might want to clone from git@gitlab.com:cccma/canesm.git instead.
This will avoid having to enter your password on every push. It requires
setting up SSH keys and entering your public SSH key into gitlab
(see here).
Once this repository is cloned, it allows you a place to inspect/make changes to the
source code, as well as gain access to various helper tools that will make it
easier to setup and run self-contained experiments. It should be noted that the
default branch, develop_canesm, is continually updated. If you would like to
be sure you are using a stable version of the code, it is recommended that you
use one of the available tags, which by convention are created across all sub-repos in
addition to the super repo. To see the available tags, see here or run git tag within the repo.
Note
Only tags including and after v5.0.10 support the functionality discussed
in this document.
Note
To easily checkout a consistent version across all submodules and the super-repo,
you can use CCCma_tools/tools/git-scheckout BRANCH_OR_TAG_COMMIT:
CCCma_tools/tools/git-scheckout v5.0.10
Or if already on your $PATH, you can simply run
git scheckout v5.0.10
Add Tools to your $PATH
To help with the setup/development and running of CanESM, it is advised to
add a few things to your environment.
CCCma “s-scripts”
As discussed here, these scripts help manage all the sub-repos across the super-repo. To get these, add the following to your
.bashrc (or execute it every time you log in):
export PATH=${PATH}:/path/to/cloned/canesm/CCCma_tools/tools
setup-containerized-canesm
This tool was specifically designed to set up a self-contained run, with independent source code and a configured “run directory”. This tool exists at
/path/to/cloned/canesm/CCCma_tools/container/tools, which could be added to your $PATH. However, that would add all executable scripts in that directory, which could have unintended consequences. As such, the recommended solution is to copy or link the file to your ~/bin directory (making sure that directory is on your PATH). For example:
ln -s /path/to/cloned/canesm/CCCma_tools/container/tools/setup-containerized-canesm ~/bin/
See here for more information on this tool.
Create a python environment to run infrastructure scripts
For some of the supporting scripts around CanESM, specific python packages
are required. To easily install the necessary packages, it is recommended that users
utilize a python virtual environment. There are multiple ways to create/use
virtual environments, but when working on cedar, users should follow
the Compute Canada recommended method
to build a python 3.8 environment.
Once the environment is created and activated, simply install the packages via pip using
path/to/canesm/CCCma_tools/container/tools/python-requirements.txt
pip install -r path/to/canesm/CCCma_tools/container/python-requirements.txt
Build the Singularity image (once)
The model is compiled and run inside a Docker or Singularity container, which provides all the required dependencies. Using the container alleviates the need to port the model dependencies to each individual HPC system. It is in principle possible to compile the model on the native HPC. However, then all dependencies must be met, including the ESMF library. We do not provide instructions for this, and assume use of the container.
Singularity is available on Compute Canada machines, is designed for HPC, and does not significantly degrade performance relative to bare metal, and thus this guide recommends the use of Singularity.
To run CanESM in a Singularity container, you first need to build the image
from the CanESM Docker container. To do this, request an interactive session
on a compute node and then use the singularity build command to convert
the Docker image to a Singularity image, i.e.
# start an interactive session and go to a tmpdir
salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=1 --mem-per-cpu=5000
cd $SLURM_TMPDIR
# Build the Singularity image from the public Docker Hub image
singularity build canesm-docker_latest.sif docker://cccma/canesm-docker
# Create a directory to store the image for future use
mkdir -p ~/project/$USER/singularity_images
mv canesm-docker_latest.sif ~/project/$USER/singularity_images
exit
This only needs to be done once - after
you (or another existing user) build and store the image, you can simply point to it
when running CanESM.
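Because the build is a one-time step, it can be worth checking whether an image is already stored before requesting a compute node. The following is a minimal sketch only: image_status is a hypothetical helper (not part of CanESM or CCCma_tools), and the default path simply mirrors the storage location used in this guide.

```shell
# Hypothetical helper: report whether a Singularity image is already stored.
# With no argument, it checks the location used in this guide.
image_status() {
    image="${1:-$HOME/project/${USER:-$(id -un)}/singularity_images/canesm-docker_latest.sif}"
    if [ -s "$image" ]; then
        echo "found: $image"
    else
        echo "missing: $image"
        return 1
    fi
}

# Only build when the image is not there yet.
image_status || echo "Build the image as described above, then re-run this check."
```

If another user has already built the image, pass their path as the first argument instead of rebuilding.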
Download the Basic required input data
The basic input data needed to run CanESM is available on FTP for some
selected experiments. You need to download the data for the experiment that
you want to run. We typically use standard, CMIP-defined experiment names. The
example below is for the piControl experiment. This data also contains a
default restart file.
cd ~/project/$USER
mkdir -p canesm_input_data/piControl
cd canesm_input_data/piControl
wget ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/canesm5_piControl_config_04-06-2021/*
The pdSST-pdSIC experiment from PAMIP is also available at
ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/canesm5_pamip_config/. Forcing
for other CMIP6 experiments can be made available.
Executing a Run
Once you have a stored version of CanESM, the input data, and the required Singularity
image, you are ready to set up your first run, which can be done as follows:
1. Setup your run directory
Provided you’ve added setup-containerized-canesm to a suitable location, it can be used to
easily setup a suitable, self-contained run directory for a desired RUNID. For details on using
this script, see the interface via -h:
$ setup-containerized-canesm -h
Setup containerized canesm run
Creates a run directory for the given runid, using the defined repository/version, in the present working directory
Usage: setup-containerized-canesm [-h] runid=RUNID repo=SOURCE_REPOSITORY version=VERSION
RUNID : the alphanumeric string identifier for the given run.
SOURCE_REPOSITORY : the repository address/path that be cloned and used for the run.
VERSION : the commit hash or branch-name to checkout for this run.
For example, to setup a run from v5.0.10 from the main gitlab repository:
setup-containerized-canesm runid=canesm-test-run01 repo=git@gitlab.com:cccma/canesm.git version=v5.0.10
or, if you’ve set up your own forks of CanESM (and the submodules) and want to use your development
version:
setup-containerized-canesm runid=canesm-dev-run01 repo=git@gitlab.com:user123/canesm.git version=my-dev-version
In both of these examples, setup-containerized-canesm creates a run directory in PWD named after the
RUNID (e.g. canesm-test-run01), clones a new copy of CanESM from the given repo, checks out the
desired version, and sets up some useful directories and files, such as canesm.cfg, which contains
high-level configuration settings for the model.
2. Configure the run
Once your run directory is set up, navigate into it and modify canesm.cfg as
necessary for your run. At the time of writing, only two experiments are available,
piControl or pdSST-pdSIC, where the former is an ESM simulation using the
ocean, atmosphere, and the coupler, while the latter is an AMIP simulation and
only uses the atmosphere and coupler. To set which run you want, simply set the
EXPERIMENT variable within canesm.cfg.
Including EXPERIMENT, the most important variables to set are:
CONTAINER_IMAGE
CC_ACCOUNT
EXPERIMENT
INPUT_DIR
START_YEAR
STOP_YEAR
Note
The currently acceptable values for EXPERIMENT are piControl and
pdSST-pdSIC, and it should be noted that piControl is tested more
frequently than pdSST-pdSIC.
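Before running the configuration step, a small pre-flight check can catch unset variables early. The sketch below is an illustration, not part of the CanESM tooling: check_canesm_cfg is a hypothetical helper, and it assumes canesm.cfg consists of plain shell VAR=value assignments that are safe to source.

```shell
# Hypothetical helper: source a config file in a subshell and report
# which of the key variables are unset or empty.
check_canesm_cfg() (
    cfg="$1"
    [ -r "$cfg" ] || { echo "cannot read: $cfg" >&2; exit 2; }
    # Make bare file names resolve relative to the current directory.
    case "$cfg" in */*) ;; *) cfg="./$cfg" ;; esac
    # Ignore values inherited from the calling environment.
    unset CONTAINER_IMAGE CC_ACCOUNT EXPERIMENT INPUT_DIR START_YEAR STOP_YEAR
    . "$cfg"
    status=0
    for name in CONTAINER_IMAGE CC_ACCOUNT EXPERIMENT INPUT_DIR START_YEAR STOP_YEAR; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            status=1
        fi
    done
    # Only the two experiments named in this guide are accepted.
    case "${EXPERIMENT:-}" in
        piControl|pdSST-pdSIC) ;;
        *) echo "unsupported EXPERIMENT: ${EXPERIMENT:-unset}"; status=1 ;;
    esac
    exit "$status"
)
```

From the run directory, this could be used as: check_canesm_cfg canesm.cfg, proceeding to the configuration step only if it returns success.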
Once you have decided on these settings, assuming you’ve built and activated
the python environment (discussed above), pull in the necessary configuration files
by running:
bin/config-containerized-canesm
Note
If you don’t have the correct python environment activated, you will see
ModuleNotFoundError: No module named 'f90nml'
This will populate ${WRK_DIR}/config with many different configuration files that
are used to compile/run the model.
Note that informed/curious users can go into the resulting files in config/ and
modify them further for their needs - some of the more important settings will
be discussed below. With that said, to simply get one of the two given experiments running,
users can leave them as they are.
3. Compile the source code
To compile CanESM and supporting diagnostic/utility programs, the
recommended method is to utilize
/path/to/canesm/CONFIG/COMMON/compile-canesm.sh. However, as discussed
above, on cedar, the compilation
must happen within the aforementioned singularity image. Concise, cedar specific,
compilation information is discussed below, but for detailed information
on the compilation system, see the documentation here.
Note
Users may attempt to compile the model on “bare metal” without the container, but this requires compiling all the dependencies including ESMF
Compiling in batch mode
The easiest way to compile the code is to use the batch compilation script, batch_compile_cedar,
which uses compile-canesm.sh to automatically pull a copy of the source code into
$SLURM_TMPDIR, build all the necessary executables, and send them back to EXEC_STORAGE_DIR
(see canesm.cfg). This can be achieved via:
sbatch --account=XXX canesm/CCCma_tools/container/tools/batch_compile_cedar
where XXX should be replaced by your allocation account. You can monitor the progress
of this job using squeue -u $(whoami). Upon completion you should see the desired
executables within EXEC_STORAGE_DIR, and can inspect the hidden .compile-canesm*.log
files in PWD.
Note
$SLURM_TMPDIR is used to take advantage of faster I/O on the local scratch space.
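Once the compilation job finishes, a quick check along the following lines can confirm that the executables arrived in EXEC_STORAGE_DIR. This is a sketch under stated assumptions: missing_executables is a hypothetical helper, and the three file names are taken from the interactive run example later in this guide, so the list may need adjusting for other configurations.

```shell
# Hypothetical helper: print the name of each expected executable that is
# absent (or empty) in the given directory. Prints nothing if all are present.
missing_executables() {
    dir="$1"
    for exe in canam.exe cancpl.exe nemo.exe; do
        [ -s "$dir/$exe" ] || echo "$exe"
    done
}
```

After sourcing canesm.cfg, calling missing_executables "$EXEC_STORAGE_DIR" should print nothing when the build succeeded; otherwise, inspect the .compile-canesm*.log files.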
Interactively compile the source code manually
It is also possible for users to compile the model interactively. Since we need the Singularity container to meet the dependencies, it is generally not possible to compile interactively on the Cedar head nodes.
As such, to compile interactively, you will want to:
Launch an interactive session, move to a tmp directory, and launch an interactive
singularity container from the image built earlier:
salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=8 --mem-per-cpu=5000 --account=XXX
cd $SLURM_TMPDIR
singularity shell --cleanenv -B /home -B /project -B /scratch -B /localscratch --env APPEND_PATH=/path/to/canesm/CCCma_tools/scripts/comm/ ~/project/$USER/singularity_images/canesm-docker_latest.sif
Copy in the source code and compile the model using the provided utility script:
# copy in the high level configuration file
cp /path/to/run/dir/canesm.cfg .
# source the config file to get some needed settings
source canesm.cfg
# call the compilation script (from within SLURM_TMPDIR)
${CANESM_SRC_ROOT}/CONFIG/COMMON/compile-canesm.sh -l
where the -l flag forces compile-canesm.sh to copy the source code locally, and will automatically copy the final executables to EXEC_STORAGE_DIR.
Confirm the necessary executables were created within EXEC_STORAGE_DIR, and then exit the container and the interactive session. See .compile-canesm* for compilation errors.
4. Run the Model
Once the executables are compiled and stored in EXEC_STORAGE_DIR, you are
then ready to run the model, which must be done in batch mode within the
singularity container. It should be noted that the run time configuration of
the model must match the compilation options, input data, and so on. Some basic
runtime configuration is set in the canesm.cfg file (including relevant
paths for binaries, input data, outputs, etc). This section assumes you have
edited the canesm.cfg file appropriately, and run bin/config-containerized-canesm.
Using the job script
The easiest way to launch the job is to submit the provided batch_run_cedar from within
your run-directory (must be in ~/scratch):
cd ~/scratch/canesm-test-run
# Assumes that canesm.cfg is here; the paths are configured appropriately; and the executables must
# be pre-compiled and available in EXEC_STORAGE_DIR
# If needed, copy this file and modify it for your case. Submit to the queue with `sbatch`.
sbatch --account=XXX canesm/CCCma_tools/container/tools/batch_run_cedar
where XXX is your account or resource allocation.
Outputs will appear in the location defined in canesm.cfg by the
variable OUTPUT_DIR. These are unpacked raw model history files on tiles.
Launching an interactive run
When running with the batch_run_cedar script, getting the inputs and setting up the environment
are taken care of. However, it is still possible to run interactively by starting an interactive batch session
(with the appropriate resources), copying in the inputs/executables, starting the container, and then
running the model.
Warning
Running the model interactively is tested less frequently than batch mode. Guidance
is provided for interested readers, but problems may occur due to on-going changes with
CanESM, as such the recommended method is to run the model in batch mode.
Before anything else, it is worth noting that in order for CanESM to run, the simulation length
must be set consistently for all components. When run as part of the job script, this is automatically
handled for the user according to the CURRENT_YEAR setting in canesm.cfg. However,
when running interactively, this must be handled by the user. Details on the parameters that control this
are discussed below, but it is also possible for users to
mimic the behaviour in the job script by executing the following:
source canesm.cfg # get configuration settings
source $CANESM_SRC_ROOT/CCCma_tools/tools/CanESM_shell_functions.sh # get helper functions
update_agcm_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 agcm_timestep=900 \
namelist_file=config/namelists/modl.dat || :
update_nemo_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 nemo_timestep=3600 \
namelist_file=config/namelists/namelist || :
update_coupler_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 runid=$RUNID \
namelist_file=config/namelists/nl_coupler_par || :
where || : behind the update_* calls will stop your shell session from exiting if an error occurs.
Once these commands are executed, the namelists will have the correct settings to run the current year. After this, an interactive run can be achieved via
# Launch interactive session
salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=48 --mem-per-cpu=1000 --account=XXX
# Move to SLURM_TMPDIR to run
cd $SLURM_TMPDIR
# get high level config file and source the settings
cp /path/to/run/dir/canesm.cfg .
source canesm.cfg
# Copy in the input files and executables
cp $INPUT_DIR/* .
cp $WRK_DIR/config/* . # note that this will avoid capturing sub-directories
cp $WRK_DIR/config/namelists/* .
cp $EXEC_STORAGE_DIR/*.exe .
chmod +x *.exe
# Load runtime environment
source runtime_environment
# Load the correct modules
module --force purge
module unload intel openmpi
module load StdEnv/2016.4 nixpkgs/16.09 gcc/7.3.0 mpich/3.2.1 netcdf-fortran/4.4.4
module load singularity
# Create a conf. file that allows us to run the model in multiple program multiple data mode.
# SLURM does not support this very well. Note the specification of CPUs for each exe. The total
# number cannot exceed the resource request.
echo -e "0-15 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./canam.exe \n 16 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./cancpl.exe \n 17-41 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./nemo.exe" > run.conf
# Launch the run
export OMP_NUM_THREADS=1
time srun -n 42 -l --multi-prog run.conf
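The rank ranges in run.conf have to be kept consistent by hand when the task counts change. As an illustration only, the sketch below generates the same three lines from per-component task counts; make_run_conf is a hypothetical helper and is not part of the provided scripts.

```shell
# Hypothetical helper: print a SLURM --multi-prog configuration.
# Arguments: <image> <n_canam> <n_cancpl> <n_nemo>
make_run_conf() {
    image="$1"
    shift
    first=0
    for exe in canam cancpl nemo; do
        count="$1"
        shift
        last=$((first + count - 1))
        # A single task is written as one rank, several tasks as a range.
        if [ "$count" -eq 1 ]; then ranks="$first"; else ranks="$first-$last"; fi
        echo "$ranks singularity exec -B /localscratch $image ./$exe.exe"
        first=$((last + 1))
    done
}

# Reproduces the 16 + 1 + 25 = 42 task layout used above.
make_run_conf /path/to/canesm-docker_latest.sif 16 1 25
```

Redirecting the output to run.conf gives a file equivalent to the one written by the echo command above. The sum of the three counts is the value to pass to srun -n, and it must not exceed the allocated tasks.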
Notable Configurable Settings
CanESM is a complex system with a multitude of configurable options, ranging from
physical/numerical parameters and input data fields, to MPI topology and infrastructure/sequencing
settings. Due to this complexity, this beta guide cannot attempt to cover
all the options comprehensively; nevertheless, some of the more notable settings are discussed here.
Compile Time Setting: MPI size of CanAM
The number of MPI tasks to be used for CanAM must currently be specified at
compile time. This value is set in config/cppdef_sizes.h using
the variable _PAR_NNODE_A. As discussed here,
config-canesm (which is called as part of config-containerized-canesm) generates
config/cppdef_sizes.h from the settings contained in config/namelists/modl.dat.
If a user wishes to change this, after running config-containerized-canesm, they can
simply modify config/cppdef_sizes.h before executing the compilation.
Note
This specifies the number of MPI tasks, but CanAM also uses OpenMP, so
this is not the total number of cores. The total number of cores used is
_PAR_NNODE_A x OMP_NUM_THREADS, where OMP_NUM_THREADS is specified in the
runtime script (see below). Also note, the number of MPI tasks specified must
match with the resources supplied at runtime.
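The core count described in this note can be computed directly from the generated header. The following is a sketch, not part of the CanESM tooling: canam_total_cores is a hypothetical helper, and it assumes the header contains a line of the form "#define _PAR_NNODE_A <integer>", which may not match the actual file layout.

```shell
# Hypothetical helper: total CanAM cores = MPI tasks (_PAR_NNODE_A) x OpenMP threads.
# Arguments: <path to cppdef_sizes.h> [threads per task, default 1]
canam_total_cores() {
    sizes_file="$1"
    threads="${2:-1}"
    # Assumed format: "#define _PAR_NNODE_A 16"
    tasks=$(awk '$1 == "#define" && $2 == "_PAR_NNODE_A" { print $3; exit }' "$sizes_file")
    [ -n "$tasks" ] || { echo "_PAR_NNODE_A not found in $sizes_file" >&2; return 1; }
    echo $((tasks * threads))
}
```

For example, canam_total_cores config/cppdef_sizes.h "$OMP_NUM_THREADS" gives the number of cores CanAM will occupy, which can then be compared against the resources requested from SLURM.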
Compile Time Setting: Model/Experiment specific cpp macros for CanAM/CanCPL
For the Atmosphere and Coupler components of CanESM (and the coupler interface
for the Ocean), many source code files rely on a cppdef_config.h file to
set many model/experiment specific cpp macros. As discussed here, this file is generated by config-canesm
according to the values of cppdef_file and cppdef_diag. As of now, these
get set within the EXPERIMENT specific *.cfg file at
path/to/canesm/CCCma_tools/container/tools/config. If you would like to use
another set of cpp definitions, the available files can be found in
/path/to/canesm/CONFIG/[AMIP|ESM]/cppdefs. Alternatively, you can point to your
own cpp files, or simply alter the generated config/cppdef_config.h
and submit another compilation job.
Note
If a user wishes to modify any of the values in these cppdef_* files, the easiest way
is to modify the values in the generated config/cppdef_* files before executing
the compilation.
Note
Within ECCC machines, cppdefs are typically specified by the runmode
construct, and the sizes are computed and set by the compilation script. It
might be helpful to look in canesm/CONFIG/ESM/canesm.cfg to determine
the correct cppdef file to use for an experiment.
Compile Time Setting: Compilation Flags
For information on altering compilation flags, see the detailed information
here noting that the compilation environment file,
and make/mkmf templates get extracted into the local config directory.
Run Time Setting: Simulation length
Note, if you use the config-containerized-canesm and batch_run_cedar
scripts, the timers will be configured automatically from START_YEAR
and STOP_YEAR in canesm.cfg. The information below is
provided to help users customize the simulation step-size, if they so desire.
The specified length of the simulation must match between CanNEMO and
CanAM.
The number of CanNEMO steps is specified in the NEMO
namelist file (typically extracted via config-containerized-canesm into config/namelists). For CanESM5, the NEMO timestep is 1 hour, and the number of steps is specified in hours (e.g. 8760 steps for 1 year).
In CanAM, the timers are configured by
kstart, kfinal, and delt in the modl.dat file that also gets extracted into config/namelists via the configuration tool. kstart and kfinal represent the start and final step counters since year 1, and delt represents the time-step in seconds. For example, with a time-step of 900 seconds, if you want the model to start from January 1st, 5550 and go to the end of the year, then
kstart = (3600/900) * 24 * 365 * 5549 ! where 5549 is used because we start from the END of 5549
and
kfinal = (3600/900) * 24 * 365 * 5550
Note that shorter periods can also be run, but again, the number of steps must be consistent between NEMO and CanAM.
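The counter arithmetic above can be worked through in shell as a sanity check before editing the namelists. This is a sketch using the example values from the text (year 5550, a 900 second CanAM step, a 3600 second NEMO step, 365-day years); the variable names other than kstart, kfinal, and delt are illustrative.

```shell
# Step counters for one 365-day model year, following the formulas above.
year=5550          # model year to run (example value from the text)
delt=900           # CanAM time-step in seconds
nemo_dt=3600       # NEMO time-step in seconds

agcm_steps_per_year=$(( (3600 / delt) * 24 * 365 ))
kstart=$(( agcm_steps_per_year * (year - 1) ))   # end of the previous year
kfinal=$(( agcm_steps_per_year * year ))
nemo_steps=$(( (3600 / nemo_dt) * 24 * 365 ))

echo "kstart=$kstart kfinal=$kfinal nemo_steps=$nemo_steps"
# prints: kstart=194436960 kfinal=194472000 nemo_steps=8760

# Both components must cover the same number of simulated seconds.
[ $(( (kfinal - kstart) * delt )) -eq $(( nemo_steps * nemo_dt )) ] && echo "lengths match"
```

Here CanAM takes 35040 steps over the year and NEMO takes 8760, both spanning 31,536,000 seconds, so the two components stop at the same model time.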
Note
In CanESM5, CanAM takes 4 steps (15 min each) for each single NEMO timestep (1 hr each). Coupling occurs every three hours, which is 3 NEMO timesteps and 12 CanAM steps. Below we rely on the default values having been set correctly. If the timers in modl.dat and namelist are not set correctly, the run will not work.
Post-processing model data
Creating usable output from the raw model history files requires several steps (repacking, joining tiles, computing diagnostics, converting to timeseries, and finally conversion to CMOR compliant NetCDF). Within ECCC, this is achieved by a diagnostic string, which comprises many jobs, and relies on several pieces of software, including the CanAM diagnostics package (CanDIAG). At this time, this processing pipeline has not been ported to Cedar, but work is ongoing.
However, an interim series of diagnostics has been put in place to provide a basic level of usable NetCDF output from the model. In summary:
canesm/CCCma_tools/container/tools/batch_diag_cedar is a batch script that will run these diagnostics. This is automatically launched by batch_run_cedar at the end of each year of simulation, but it can also be submitted by itself, if the output is available.
canesm/CCCma_tools/container/tools/basic_diag is the underlying diagnostics script that will:
rebuild the NEMO NetCDF files from individual tiles to global files (via rebuild_nemo.exe).
rebuild the CanAM CCCma binary formatted output from individual tiles to global files (via candiag_mwe.exe).
compute monthly mean 2D variables output in NetCDF (via candiag_mwe.exe).
use modified CCCma diagnostic recipes to compute monthly mean 3D temperature and wind fields (via temp_recipe and winds_recipe).
convert the 3D CCCma binary files to NetCDF format, using the ccc2nc_linux utility (note, this is currently not compiled, and a container-compatible compiled version can temporarily be obtained from: ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/ccc2nc_linux, and placed in the run bin directory)