Running CanESM on Compute Canada / Cedar (Singularity)

Important note

These are legacy instructions, no longer maintained, and not guaranteed to work. Use these at your own risk of frustration.

Overview and resources

This document provides instructions for running the CanESM5 model on the Compute Canada machine Cedar. It is an opinionated guide that suggests a recommended approach. These instructions are likely relevant to other Compute Canada systems, but they have not been tested there.

To enable external use of the model, CanESM has been ported to compile with the GNU compiler, and suitable Docker and Singularity containers have been created. Tagged versions of the model should have been tested to function on Cedar. There is no guarantee for non-tagged versions. At the time of writing, the latest tagged version verified on Cedar is v5.0.10.

Working with CanESM on Cedar requires working with Singularity and interacting with the SLURM workload manager. For information on these tools, interested readers are directed to the provided links, as well as the Cedar-specific and more general Compute Canada wiki.

We are going to use the space ~/project/$USER to store some persistent data, and we’ll put some important files in $HOME. You may change this if you so desire (e.g. to ~/scratch, but note that this space is temporary and periodically deleted).

For setting up a model run, we are going to use the ~/scratch space, which is temporary space that can handle lots of data (note, data here will be deleted after 60 days or so). For CanESM, we often use the concept of a RUNID as a unique identifier of an individual run. We normally create a dedicated directory to work in for each RUNID.

Note

Only a small number of model configurations are made available. There is no support whatsoever available for using CanESM. We do not certify that the code runs correctly. This is strictly an alpha testing project with no guarantees whatsoever.

Initial Setup (One Time)

Get the source code

Recursively clone the CanESM source code and checkout the desired version for the model, and its helper tools. We’ll put the source code in our home directory as it is small, we want to keep it permanently, and home has relatively fast access:

cd ~/
# clone the code and the desired submodules
git clone --recursive https://gitlab.com/cccma/canesm

Note

You might want to clone from git@gitlab.com:cccma/canesm.git instead. This will avoid having to enter your password on every push. It requires setting up SSH keys and entering your public SSH key into GitLab (see here).

Once this repository is cloned, it gives you a place to inspect and change the source code, as well as access to various helper tools that make it easier to set up and run self-contained experiments. Note that the default branch, develop_canesm, is continually updated. If you would like to be sure you are using a stable version of the code, it is recommended that you use one of the available tags, which by convention are created across all sub-repos in addition to the super-repo. To see the available tags, see here or run git tag within the repo.

Note

Only tags including and after v5.0.10 support the functionality discussed in this document.
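Whether a given tag meets this requirement can be checked by comparing version strings. The following is a minimal sketch, assuming tags of the form vX.Y.Z and GNU sort (for the -V option); tag_supported and min_tag are hypothetical names used only for illustration.

```shell
# Minimum tag that supports the containerized workflow (from the note above).
min_tag="v5.0.10"

# tag_supported TAG: succeed if TAG sorts at or after min_tag in version order.
# Hypothetical helper; assumes vX.Y.Z style tags and GNU sort (-V).
tag_supported() {
    local lowest
    # sort -V orders version strings; the smaller of the two comes first
    lowest=$(printf '%s\n%s\n' "$min_tag" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$min_tag" ]
}

# Example, run inside the cloned repo, listing only the tags that qualify:
# git tag | while read -r t; do tag_supported "$t" && echo "$t"; done
```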

Note

To easily checkout a consistent version across all submodules and the super-repo, you can use CCCma_tools/tools/git-scheckout BRANCH_OR_TAG_COMMIT:

CCCma_tools/tools/git-scheckout v5.0.10

Or if already on your $PATH, you can simply run

git scheckout v5.0.10

Add Tools to your $PATH

To help with the setup/development and running of CanESM, it is advised to add a few things to your environment.

CCCma “s-scripts”

As discussed here, these scripts help manage all the sub-repos across the super-repo. To get these, add the following to your .bashrc (or execute every time you log in):
```
export PATH=${PATH}:/path/to/cloned/canesm/CCCma_tools/tools
```
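If the export line is executed more than once (for example in nested shells), the same directory is appended to PATH repeatedly. A minimal sketch of a guard against this is given below; path_append is a hypothetical helper, not part of the CanESM tools.

```shell
# path_append DIR: append DIR to PATH only if it is not already there.
# Hypothetical helper for use in .bashrc; not part of the CanESM tools.
path_append() {
    case ":${PATH}:" in
        *":$1:"*) ;;                  # already on PATH, nothing to do
        *) PATH="${PATH}:$1" ;;       # otherwise append at the end
    esac
    export PATH
}

# Example (placeholder path, as in the text above):
# path_append /path/to/cloned/canesm/CCCma_tools/tools
```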
setup-containerized-canesm

This tool was specifically designed to set up a self-contained run, with independent source code and a configured “run directory”. This tool exists at /path/to/cloned/canesm/CCCma_tools/container/tools, which could be added to your $PATH. However, that will add all executable scripts in said directory, which could have unintended consequences. As such, the recommended solution is to copy or link the file to your ~/bin directory (making sure that that directory is on your PATH). For example:
```
ln -s /path/to/cloned/canesm/CCCma_tools/container/tools/setup-containerized-canesm ~/bin/
```
See here for more information on this tool.

Create a python environment to run infrastructure scripts

For some of the supporting scripts around CanESM, specific python packages are required. To easily install the necessary packages, it is recommended that users utilize a python virtual environment. There are multiple ways to create/use virtual environments, but when working on cedar, users should follow the Compute Canada recommended method to build a python 3.8 environment.

Once the environment is created and activated, simply install the packages via pip using path/to/canesm/CCCma_tools/container/tools/python-requirements.txt

pip install -r path/to/canesm/CCCma_tools/container/python-requirements.txt

Build the Singularity image (once)

The model is compiled and run inside a Docker or Singularity container, which provides all the required dependencies. Using the container alleviates the need to port the model dependencies to each individual HPC system. It is in principle possible to compile the model on the native HPC. However, then all dependencies must be met, including the ESMF library. We do not provide instructions for this, and assume use of the container.

Singularity is available on Compute Canada machines, is designed for HPC, and does not significantly degrade performance relative to bare metal, and thus this guide recommends the use of Singularity.

To run CanESM in a Singularity container, you first need to build the image from the CanESM Docker container. To do this, request an interactive session on a compute node and then use the singularity build command to convert the Docker image to a Singularity image, i.e.

# start an interactive session and go to a tmpdir
salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=1 --mem-per-cpu=5000
cd $SLURM_TMPDIR

# Build the Singularity image from the public Docker Hub image
singularity build canesm-docker_latest.sif docker://cccma/canesm-docker

# Create a directory to store the image for future use
mkdir -p ~/project/$USER/singularity_images
mv canesm-docker_latest.sif ~/project/$USER/singularity_images
exit

This only needs to be done once - after you (or another existing user) build and store the image, you can simply point to it when running CanESM.
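Before rebuilding, it can be worth checking whether an image already exists in one of the places where you or a colleague may have stored it. The sketch below is illustrative only: find_canesm_image is a hypothetical helper, and the candidate directories are whatever you pass in.

```shell
# find_canesm_image DIR...: print the first DIR/canesm-docker_latest.sif that exists.
# Hypothetical helper; returns non-zero if none of the directories holds the image.
find_canesm_image() {
    local dir
    for dir in "$@"; do
        if [ -f "${dir}/canesm-docker_latest.sif" ]; then
            echo "${dir}/canesm-docker_latest.sif"
            return 0
        fi
    done
    return 1
}

# Example: look in your own project space first.
# find_canesm_image ~/project/$USER/singularity_images || echo "no image found, build one"
```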

Download the Basic required input data

The basic input data needed to run CanESM is available on FTP for some selected experiments. You need to download the data for the experiment that you want to run. We typically use standard, CMIP-defined experiment names. The example below is for the piControl experiment. This data also contains a default restart file.

cd ~/project/$USER
mkdir -p canesm_input_data/piControl
cd canesm_input_data/piControl
wget ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/canesm5_piControl_config_04-06-2021/*

The pdSST-pdSIC experiment from PAMIP is also available at ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/canesm5_pamip_config/. Forcing for other CMIP6 experiments can be made available.

Executing a Run

Once you have a stored version of CanESM, the input data, and the required Singularity image, you are ready to set up your first run, which can be done as follows:

1. Setup your run directory

Provided you’ve added setup-containerized-canesm to a suitable location, it can be used to easily setup a suitable, self-contained run directory for a desired RUNID. For details on using this script, see the interface via -h:

$ setup-containerized-canesm -h
Setup containerized canesm run

Creates a run directory for the given runid, using the defined repository/version, in the present working directory

Usage: setup-containerized-canesm [-h] runid=RUNID repo=SOURCE_REPOSITORY version=VERSION

RUNID                     : the alphanumeric string identifier for the given run.
SOURCE_REPOSITORY         : the repository address/path that be cloned and used for the run.
VERSION                   : the commit hash or branch-name to checkout for this run.

For example, to setup a run from v5.0.10 from the main gitlab repository:

setup-containerized-canesm runid=canesm-test-run01 repo=git@gitlab.com:cccma/canesm.git version=v5.0.10

or if you’ve setup your own forks of CanESM (and the submodules), and want to use your development version

setup-containerized-canesm runid=canesm-dev-run01 repo=git@gitlab.com:user123/canesm.git version=my-dev-version

Either command creates a run directory named after the RUNID (e.g. canesm-test-run01) in the PWD, clones a fresh copy of CanESM from the given repo, checks out the desired version, and sets up some useful directories and files, such as canesm.cfg, which contains high level configuration settings for the model.

2. Configure the run

Once your run directory is set up, navigate into it and modify canesm.cfg as necessary for your run. At the time of writing, only two experiments are available, piControl and pdSST-pdSIC, where the former is an ESM simulation using the ocean, atmosphere, and the coupler, while the latter is an AMIP simulation and only uses the atmosphere and coupler. To set which run you want, simply set the EXPERIMENT variable within canesm.cfg.

Including EXPERIMENT, the most important variables to set are:

CONTAINER_IMAGE
CC_ACCOUNT
EXPERIMENT
INPUT_DIR
START_YEAR
STOP_YEAR

Note

The currently acceptable values for EXPERIMENT are piControl and pdSST-pdSIC, and it should be noted that piControl is tested more frequently than pdSST-pdSIC.
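Because canesm.cfg is sourced as a shell file elsewhere in this guide, a quick pre-flight check of the variables listed above can be sketched as follows. check_canesm_cfg is a hypothetical helper; it only tests that the values are non-empty and that EXPERIMENT is one of the two supported names, and does not validate paths or years.

```shell
# check_canesm_cfg FILE: report required settings that are unset or empty.
# Hypothetical helper; runs in a subshell so the caller's environment is untouched.
check_canesm_cfg() {
    (
        . "$1" || exit 2
        status=0
        for var in CONTAINER_IMAGE CC_ACCOUNT EXPERIMENT INPUT_DIR START_YEAR STOP_YEAR; do
            # look up the value of the variable whose name is stored in $var
            eval "value=\${${var}:-}"
            if [ -z "$value" ]; then
                echo "missing: ${var}"
                status=1
            fi
        done
        case "${EXPERIMENT:-}" in
            piControl|pdSST-pdSIC) ;;
            *) echo "unsupported EXPERIMENT: ${EXPERIMENT:-}"; status=1 ;;
        esac
        exit "$status"
    )
}

# Example, from within the run directory:
# check_canesm_cfg ./canesm.cfg && bin/config-containerized-canesm
```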

Once you have decided on these settings, assuming you’ve built and activated the python environment (discussed above), pull in the necessary configuration files by running:

bin/config-containerized-canesm

Note

If you don’t have the correct python environment activated, you will see

ModuleNotFoundError: No module named 'f90nml'

Running bin/config-containerized-canesm will populate ${WRK_DIR}/config with many different configuration files that are used to compile/run the model.

Note that informed/curious users can go into the resulting files in config/ and modify them further for their needs - some of the more important settings will be discussed below. With that said, to simply get one of the two given experiments running, users can leave them as they are.

3. Compile the source code

To compile CanESM and supporting diagnostic/utility programs, the recommended method is to use /path/to/canesm/CONFIG/COMMON/compile-canesm.sh. However, as discussed above, on Cedar the compilation must happen within the aforementioned Singularity image. Concise, Cedar-specific compilation information is given below, but for detailed information on the compilation system, see the documentation here.

Note

Users may attempt to compile the model on “bare metal” without the container, but this requires compiling all the dependencies, including ESMF.

Compiling in batch mode

The easiest way to compile the code is to use the batch compilation script, batch_compile_cedar, which uses compile-canesm.sh to automatically pull a copy of the source code into $SLURM_TMPDIR, build all the necessary executables, and send them back to EXEC_STORAGE_DIR (see canesm.cfg). This can be easily achieved via:

sbatch --account=XXX canesm/CCCma_tools/container/tools/batch_compile_cedar

where XXX should be replaced by your allocation account. You can monitor the progress of this job using squeue -u $(whoami). Upon completion you should see the desired executables within EXEC_STORAGE_DIR, and can inspect the hidden .compile-canesm*.log files in PWD.

Note

$SLURM_TMPDIR is used to take advantage of faster I/O on the local scratch space.

Interactively compile the source code manually

It is also possible for users to compile the model interactively. Since we need the Singularity container to meet the dependencies, it is generally not possible to compile interactively on the Cedar head nodes.

As such, to compile interactively, you will want to:

Launch an interactive session, move to a tmp directory, and launch an interactive singularity container from the image built earlier:

salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=8 --mem-per-cpu=5000 --account=XXX
cd $SLURM_TMPDIR
singularity shell --cleanenv -B /home -B /project -B /scratch -B /localscratch --env APPEND_PATH=/path/to/canesm/CCCma_tools/scripts/comm/ ~/project/$USER/singularity_images/canesm-docker_latest.sif

Copy in the source code and compile the model using the provided utility script:

# copy in the high level configuration file
cp /path/to/run/dir/canesm.cfg .

# source the config file to get some needed settings
source canesm.cfg

# call the compilation script (from within SLURM_TMPDIR)
${CANESM_SRC_ROOT}/CONFIG/COMMON/compile-canesm.sh -l

where the -l flag forces compile-canesm.sh to copy the source code locally, and will automatically copy the final executables to EXEC_STORAGE_DIR.

Confirm the necessary executables were created within EXEC_STORAGE_DIR, and then exit the container and the interactive session. See .compile-canesm* for compilation errors.
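The confirmation step can be scripted. The sketch below assumes the executable names used in the interactive run example later in this guide (canam.exe, cancpl.exe, nemo.exe); check_executables is a hypothetical helper, and for an atmosphere-only experiment you would pass only the names you expect.

```shell
# check_executables DIR NAME...: verify each NAME exists in DIR and is non-empty.
# Hypothetical helper; prints every missing file and returns non-zero if any is missing.
check_executables() {
    local dir="$1" exe status=0
    shift
    for exe in "$@"; do
        if [ ! -s "${dir}/${exe}" ]; then
            echo "missing or empty: ${dir}/${exe}"
            status=1
        fi
    done
    return "$status"
}

# Example, after sourcing canesm.cfg:
# check_executables "$EXEC_STORAGE_DIR" canam.exe cancpl.exe nemo.exe
```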

4. Run the Model

Once the executables are compiled and stored in EXEC_STORAGE_DIR, you are ready to run the model, which must be done in batch mode within the Singularity container. It should be noted that the run time configuration of the model must match the compilation options, input data, and so on. Some basic runtime configuration is set in the canesm.cfg file (including relevant paths for binaries, input data, outputs, etc). This section assumes you have edited the canesm.cfg file appropriately, and have run bin/config-containerized-canesm.

Using the job script

The easiest way to launch the job is to simply submit the provided batch_run_cedar from within your run directory (which must be in ~/scratch):

cd ~/scratch/canesm-test-run

# Assumes that canesm.cfg is here; the paths are configured appropriately; and the executables must
# be pre-compiled and available in EXEC_STORAGE_DIR

# If needed, copy this file and modify it for your case. Submit to the queue with `sbatch`.
sbatch --account=XXX canesm/CCCma_tools/container/tools/batch_run_cedar

where XXX is your account or resource allocation.

Outputs will appear in the location defined in canesm.cfg by the variable OUTPUT_DIR. These are unpacked raw model history files on tiles.

Launching an interactive run

When running with the batch_run_cedar script, setting up the inputs and environment is taken care of. However, it is still possible to run interactively by starting an interactive batch session (with the appropriate resources), copying in the inputs/executables, starting the container, and then running the model.

Warning

Running the model interactively is tested less frequently than batch mode. Guidance is provided for interested readers, but problems may occur due to ongoing changes with CanESM. As such, the recommended method is to run the model in batch mode.

Before anything else, it is worth noting that in order for CanESM to run, the simulation length must be set consistently for all components. When run as part of the job script, this is automatically handled for the user according to the CURRENT_YEAR setting in canesm.cfg. However, when running interactively, this must be handled by the user. Details on the parameters that control this are discussed below, but it is also possible for users to mimic the behaviour in the job script by executing the following:

source canesm.cfg # get configuration settings
source $CANESM_SRC_ROOT/CCCma_tools/tools/CanESM_shell_functions.sh # get helper functions
update_agcm_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 agcm_timestep=900 \
                     namelist_file=config/namelists/modl.dat || :
update_nemo_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 nemo_timestep=3600 \
                     namelist_file=config/namelists/namelist || :
update_coupler_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 runid=$RUNID \
                     namelist_file=config/namelists/nl_coupler_par || :

where || : behind the update_* calls will stop your shell session from exiting if an error occurs.

Once these commands are executed, the namelists will have the correct settings to run the current year. After this, an interactive run can be achieved via

# Launch interactive session
salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=48 --mem-per-cpu=1000 --account=XXX

# Move to SLURM_TMPDIR to run
cd $SLURM_TMPDIR

# get high level config file and source the settings
cp /path/to/run/dir/canesm.cfg .
source canesm.cfg

# Copy in the input files and executables
cp $INPUT_DIR/* .
cp $WRK_DIR/config/* .               # note that this will avoid capturing sub-directories
cp $WRK_DIR/config/namelists/* .
cp $EXEC_STORAGE_DIR/*.exe .
chmod +x *.exe

# Load runtime environment
source runtime_environment

# Load the correct modules
module --force purge
module unload intel openmpi
module load StdEnv/2016.4 nixpkgs/16.09 gcc/7.3.0 mpich/3.2.1 netcdf-fortran/4.4.4
module load singularity

# Create a conf. file that allows us to run the model in multiple program multiple data mode.
# SLURM does not support this very well. Note the specification of CPUs for each exe. The total
# number cannot exceed the resource request.

echo -e "0-15 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./canam.exe \n 16 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./cancpl.exe \n 17-41 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./nemo.exe" > run.conf

# Launch the run
export OMP_NUM_THREADS=1
time srun -n 42 -l --multi-prog run.conf
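The rank ranges in run.conf have to add up to the task count passed to srun: in the example above, 16 CanAM ranks, 1 coupler rank, and 25 NEMO ranks give 42. A minimal sketch that derives the ranges from the three counts is shown below. make_run_conf and rank_range are hypothetical helpers, and the counts must still match what the executables were compiled and configured for.

```shell
# rank_range START COUNT: print "START-END" for COUNT ranks, or just START if COUNT is 1.
rank_range() {
    local start="$1" count="$2" end
    end=$(( start + count - 1 ))
    if [ "$count" -eq 1 ]; then
        echo "$start"
    else
        echo "${start}-${end}"
    fi
}

# make_run_conf N_ATM N_CPL N_OCN IMAGE: print a multi-prog configuration
# with consecutive rank ranges for CanAM, the coupler, and NEMO.
# Hypothetical helper; the command prefix mirrors the example above.
make_run_conf() {
    local n_atm="$1" n_cpl="$2" n_ocn="$3" image="$4"
    local prefix="singularity exec -B /localscratch ${image}"
    echo "$(rank_range 0 "$n_atm") ${prefix} ./canam.exe"
    echo "$(rank_range "$n_atm" "$n_cpl") ${prefix} ./cancpl.exe"
    echo "$(rank_range $(( n_atm + n_cpl )) "$n_ocn") ${prefix} ./nemo.exe"
}

# Example reproducing the layout above:
# make_run_conf 16 1 25 "$CONTAINER_IMAGE" > run.conf
# srun -n $(( 16 + 1 + 25 )) -l --multi-prog run.conf
```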

Notable Configurable Settings

CanESM is a complex system with a multitude of configurable options, ranging from physical/numerical parameters and input data fields, to MPI topology and infrastructure/sequencing settings. Due to this complexity, this beta guide cannot attempt to cover all the options comprehensively; nevertheless, some of the more important settings are discussed here.

Compile Time Setting: MPI size of CanAM

The number of MPI tasks to be used for CanAM must currently be specified at compile time. This value is set in config/cppdef_sizes.h using the variable _PAR_NNODE_A. As discussed here, config-canesm (which is called as part of config-containerized-canesm) generates config/cppdef_sizes.h from the settings contained in config/namelists/modl.dat.

If a user wishes to change this, after running config-containerized-canesm, they can simply modify config/cppdef_sizes.h before executing the compilation.

Note

This specifies the number of MPI tasks, but CanAM also uses OpenMP, so this is not the total number of cores. The total number of cores used is _PAR_NNODE_A x OMP_NUM_THREADS, where OMP_NUM_THREADS is specified in the runtime script (see below). Also note that the number of MPI tasks specified must match the resources supplied at runtime.
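As a worked example of this bookkeeping, the sketch below reads _PAR_NNODE_A from the generated header and multiplies it by the thread count. It assumes the value appears on a line of the form "#define _PAR_NNODE_A N", which is an assumption about the file layout rather than something stated in this guide; canam_total_cores is a hypothetical helper.

```shell
# canam_total_cores HEADER THREADS: print _PAR_NNODE_A * THREADS.
# Assumes HEADER contains a line like "#define _PAR_NNODE_A 16" (layout is an assumption).
canam_total_cores() {
    local header="$1" threads="$2" ntasks
    ntasks=$(awk '$1 == "#define" && $2 == "_PAR_NNODE_A" { print $3; exit }' "$header")
    if [ -z "$ntasks" ]; then
        echo "could not find _PAR_NNODE_A in ${header}" >&2
        return 1
    fi
    echo $(( ntasks * threads ))
}

# Example: canam_total_cores config/cppdef_sizes.h "${OMP_NUM_THREADS:-1}"
```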

Compile Time Setting: Model/Experiment specific cpp macros for CanAM/CanCPL

For the atmosphere and coupler components of CanESM (and the coupler interface for the ocean), many source code files rely on a cppdef_config.h file to set many model/experiment specific cpp macros. As discussed here, this file is generated by config-canesm according to the values of cppdef_file and cppdef_diag. As of now, these get set within the EXPERIMENT specific *.cfg file at path/to/canesm/CCCma_tools/container/tools/config. If you would like to use another set of cpp definitions, the available files can be found in /path/to/canesm/CONFIG/[AMIP|ESM]/cppdefs. Alternatively, you can point to your own cpp files, or simply alter the generated config/cppdef_config.h and submit another compilation job.

Note

If a user wishes to modify any of the values in these cppdef_* files, the easiest way is to modify the values in the generated config/cppdef_* files before executing the compilation.

Note

Within ECCC machines, cppdefs are typically specified by the runmode construct, and the sizes are computed and set by the compilation script. It might be helpful to look in canesm/CONFIG/ESM/canesm.cfg to determine the correct cppdef file to use for an experiment.

Compile Time Setting: Compilation Flags

For information on altering compilation flags, see the detailed information here, noting that the compilation environment file and the make/mkmf templates get extracted into the local config directory.

Run Time Setting: Simulation length

Note, if you use the config-containerized-canesm and batch_run_cedar scripts, the timers will be configured automatically from START_YEAR and STOP_YEAR in canesm.cfg. The information below is provided to help users customize the simulation step-size, if they so desire. The specified length of the simulation must match between CanNEMO and CanAM.

The number of CanNEMO steps is specified in the NEMO namelist file (typically extracted via config-containerized-canesm into config/namelists). For CanESM5, the NEMO timestep is 1 hour, so the number of steps equals the number of simulated hours (e.g. 8760 steps for 1 year).

In CanAM, the timers are configured by kstart, kfinal, and delt in the modl.dat file that also gets extracted into config/namelists via the configuration tool. kstart and kfinal represent the start and final step counters since year 1, and delt represents the time-step in seconds. For example, with a time-step of 900 seconds, if you want the model to start from January 1st, 5550 and go to the end of the year, then

kstart = (3600/900) * 24 * 365 * 5549 ! 5549 is used because we start from the END of year 5549

and

kfinal = (3600/900) * 24 * 365 * 5550

Note that shorter periods can also be run, but again, the number of steps must be consistent between NEMO and CanAM.
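The arithmetic above can be reproduced with shell integer arithmetic. This is a minimal sketch assuming the 365-day year used in the formulas; agcm_counters and nemo_steps_per_year are hypothetical helpers, not the update_* functions shipped with CanESM.

```shell
# agcm_counters YEAR DELT: print kstart and kfinal for running YEAR in full,
# with DELT the CanAM time-step in seconds (365-day years, as in the text).
agcm_counters() {
    local year="$1" delt="$2" steps_per_year
    steps_per_year=$(( 365 * 24 * 3600 / delt ))
    echo "kstart=$(( steps_per_year * (year - 1) )) kfinal=$(( steps_per_year * year ))"
}

# nemo_steps_per_year TIMESTEP: number of NEMO steps in one 365-day year.
nemo_steps_per_year() {
    echo $(( 365 * 24 * 3600 / $1 ))
}

agcm_counters 5550 900      # kstart=194436960 kfinal=194472000
nemo_steps_per_year 3600    # 8760
```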

Note

In CanESM5, CanAM takes 4 steps (15 min each) for each single NEMO timestep (1 hr each). Coupling occurs every three hours, which is 3 NEMO timesteps and 12 CanAM steps. This guide relies on the default values having been set correctly. If the timers in modl.dat and namelist are not set correctly, the run will not work.
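The ratios in this note can be checked with shell integer arithmetic. The following is a small sketch, assuming a 3-hour (10800 s) coupling interval and the time-steps quoted above; steps_per_coupling is a hypothetical helper.

```shell
# steps_per_coupling INTERVAL TIMESTEP: print how many model steps fit in one
# coupling interval (both in seconds); fail if the interval is not a whole multiple.
steps_per_coupling() {
    local interval="$1" timestep="$2"
    if [ $(( interval % timestep )) -ne 0 ]; then
        echo "coupling interval ${interval}s is not a multiple of ${timestep}s" >&2
        return 1
    fi
    echo $(( interval / timestep ))
}

steps_per_coupling 10800 900     # CanAM: 12
steps_per_coupling 10800 3600    # NEMO: 3
```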

Post-processing model data

Creating usable output from the raw model history files requires several steps (repacking, joining tiles, computing diagnostics, converting to timeseries, and finally conversion to CMOR compliant NetCDF). Within ECCC, this is achieved by a diagnostic string, which comprises many jobs, and relies on several pieces of software, including the CanAM diagnostics package (CanDIAG). At this time, this processing pipeline has not been ported to Cedar, but work is ongoing.

However, an interim series of diagnostics has been put in place to provide a basic level of usable NetCDF output from the model. In summary:

canesm/CCCma_tools/container/tools/batch_diag_cedar is a batch script that will run these diagnostics. This is automatically launched by batch_run_cedar at the end of each year of simulation, but it can also be submitted by itself, if the output is available.
canesm/CCCma_tools/container/tools/basic_diag is the underlying diagnostics script that will:
- rebuild the NEMO NetCDF files from individual tiles to global files (via rebuild_nemo.exe).
- rebuild the CanAM CCCma binary formatted output from individual tiles to global files (via candiag_mwe.exe).
- Compute monthly mean 2D variables output in NetCDF (via candiag_mwe.exe).
- Use modified CCCma diagnostic recipes to compute monthly mean 3D temperature and wind fields (via temp_recipe and winds_recipe).
- Convert the 3D CCCma binary files to NetCDF format, using the ccc2nc_linux utility (note, this is currently not compiled, and a container-compatible compiled version can temporarily be obtained from: ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/ccc2nc_linux, and placed in the run bin directory).