Running CanESM on DRAC Platforms

Overview and resources

This document provides instructions for running the CanESM5 model on the DRAC machines launched as part of the 2025 DRAC Infrastructure Renewal. On these machines, CanESM runs on bare metal (no container). In general, the directory paths, software modules, and user interface are the same on each platform.

Support for running CanESM on the updated DRAC machines will be rolled out over 2025-26. The following machines are currently supported:

  • Trillium (SciNet, hosted at the University of Toronto)

  • Fir (WESTGRID, hosted at Simon Fraser University)

  • Rorqual (Calcul Québec, hosted at the École de technologie supérieure). See the platform-specific notes for information about ongoing issues with the Rorqual filesystem.

  • Nibi (SHARCNET, hosted at the University of Waterloo). See the platform-specific notes for information about a possible bottleneck with the postprocessing jobs on Nibi.

Activating the imsi python environment

This guide relies on imsi, a model configuration system that is still in development.

To activate a pre-installed, stable version of imsi and load the required modules, run:

source /project/rpp-cp4c/cp4c/pyenvs/activate_pyenv-imsi_latest.sh

This will load the version of imsi compatible with the branch v5.1_cp4c in the CCCma CanESM GitLab repository.

Python environments (and the associated activate scripts) for older versions of imsi that are no longer compatible with v5.1_cp4c will be retained in the same directory. If you are returning to an experiment that was created using a previous version of imsi, you can still load the corresponding Python environment using these activate scripts.

You may also install imsi yourself (see imsi repo). Read the imsi documentation to get an overview of how the tools work.

Set up a piControl run

Navigate to your scratch space and set up a new piControl run as follows [1]:

cd $SCRATCH
imsi setup --repo=https://gitlab.com/cccma/canesm.git --ver=v5.1_cp4c --exp=cmip6-piControl --model=canesm51_p1 --runid=<unique-runid>

Be sure to specify a unique runid that won’t clash with those of other users, and keep it below about 15 characters. The latest version of imsi enforces the character limit for runid. Using your initials is a good approach, e.g. --runid=ncs-hist-tst-01 (see guidance below).

Next, change into the run directory (named for <runid>) and build the executables:

imsi build

Save the default restart files:

imsi save-restarts

Submit the run to the batch queue:

imsi submit

Each job completes one model year, and output will appear by year in the directory $SCRATCH/canesm_runs/<runid>/data/.

By default, the run executes 11 years, one year at a time, resubmitting a job to the Slurm queue after each year. To change this or other run settings, see the imsi documentation.

To configure email notifications for your jobs or to use a non-default allocation account, see below and the Slurm documentation.

Runid Guidance

As part of creating a run, users must select a “runid”, or run identifier, that will be used to differentiate their runs from others. In general, runids are free-form, but there are some restrictions: they must contain only lowercase alphanumeric characters ([a-z] and [0-9]), the hyphen “-”, and the period “.”. To check whether another user has already used your runid, enter it into the RTD Browser (see below). If no plot is made, the runid you entered has not been used previously.

Warning

Do not re-use the runids of existing runs.
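
The character rules above can be checked locally before calling imsi setup. The following is a minimal sketch only: is_valid_runid is a hypothetical helper (not part of imsi), and the 15-character cut-off is an assumption based on the "below about 15 characters" guidance; the limit enforced by imsi itself may differ. It does not check whether the runid is already in use.

```shell
# Hypothetical helper (not part of imsi): check a runid against the rules
# stated in this guide. The default maximum length of 15 is an assumption.
is_valid_runid() {
  local runid=$1 max_len=${2:-15}
  # Reject an empty runid.
  [[ -n $runid ]] || return 1
  # Reject any character that is not a lowercase letter, a digit,
  # a hyphen, or a period. The letters are listed explicitly so the
  # result does not depend on the locale.
  [[ $runid == *[!abcdefghijklmnopqrstuvwxyz0123456789.-]* ]] && return 1
  # Enforce the (assumed) length limit.
  (( ${#runid} <= max_len ))
}

if is_valid_runid "ncs-hist-tst-01"; then
  echo "runid looks valid"
else
  echo "runid breaks the naming rules"
fi
```

Uniqueness still has to be confirmed separately with the RTD Browser.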

Running different experiments

To run an experiment other than piControl, specify a different value of --exp to imsi setup. The list of pre-configured experiments available to CP4C can be found in canesm/CONFIG/imsi-config/experiments/ (GitLab link). Common other experiments include cmip6-historical, cmip6-ssp370, cmip6-amip, and cmip6-piClim-control.

Note

For AMIP experiments, use --model=canam51_p1 instead of --model=canesm51_p1.

Users can add new experiment files to their own CanESM fork. It is strongly recommended that new experiments (especially custom forcing files) are developed in collaboration with ECCC scientists, to ensure scientific and technical validity. To set up a run using a different fork and/or branch, change the values of --repo and --ver accordingly.

Specifying Slurm Directives

Arguments passed to Slurm at job submission (model run and postprocessing) can be set by modifying imsi_configuration_<runid>.yaml and then calling imsi config. The relevant section is under the following keys:

sequencing:
  sequencing_flow:
    jobs:
      model:
        resources:
          directives:
            - <SLURM directives for model run job>
      postproc:
        resources:
          directives:
            - <SLURM directives for post processing job>

Note

You must be granted access by CP4C in order to use the rpp-cp4c allocation.

Jobs will automatically run under your default group allocation, if you have one. For most users, this will be the allocation of their primary research group and not the CP4C Research Platforms and Portals allocation rpp-cp4c. Users should specify which allocation to use by setting the following Slurm directive:

- --account=<allocation-name>

Email notifications can be set by adding the following Slurm directives:

- --mail-type=<mail-option>
- --mail-user=<email-address>

Replace <email-address> above with your email. See the Slurm documentation for a list of values for <mail-option>, but some useful options are BEGIN (notify you when run begins execution), END (notify when run ends), FAIL (notify when run fails), and ALL (notify almost any time Slurm does something).
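
To show how the account and mail directives fit together, the sketch below prints the three entries in the list form used under the directives key. slurm_directive_entries is a hypothetical helper (not part of imsi), the address is a placeholder, and the default mail type of FAIL is only an illustrative choice. Slurm accepts a comma-separated list of mail types, so END,FAIL requests both notifications.

```shell
# Hypothetical helper (not part of imsi): print Slurm directive entries in
# the YAML list form shown in this guide, ready to paste under `directives:`.
slurm_directive_entries() {
  local account=$1 email=$2 mail_type=${3:-FAIL}
  # `printf --` stops option parsing so the leading "-" is printed literally.
  printf -- '- --account=%s\n' "$account"
  printf -- '- --mail-type=%s\n' "$mail_type"
  printf -- '- --mail-user=%s\n' "$email"
}

# Placeholder values; substitute your own allocation and address.
slurm_directive_entries rpp-cp4c user@example.com END,FAIL
```

Paste the printed lines under the directives key of the model and/or postproc job in imsi_configuration_<runid>.yaml, keeping the indentation of the existing list, and then call imsi config.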

Platform-Specific Notes

Unfortunately, the Rorqual cluster has been experiencing issues that affect the performance of jobs that involve I/O from the scratch filesystem. Occasional I/O stalls delay the reading and writing of files. This issue has a greater effect on the postprocessing jobs, which perform a large number of I/O operations on $SCRATCH, i.e. the “network scratch” disks. The model jobs are less affected, since most of their I/O is done using the node-mounted storage, i.e. “local scratch” ($SLURM_TMPDIR). The nature of the CanESM database system makes it difficult to do the postprocessing I/O through the node-mounted storage. The interim solution, advised by DRAC support, is to request more wall time for jobs run on Rorqual, to allow for the longer I/O wait times. Users are advised to configure mail notifications so that failed jobs can be re-submitted if necessary. Typical wall times for Rorqual jobs that do not experience any I/O issues are ~100 minutes for a coupled model run (one year) and ~45 minutes for a postprocessing job.

CP4C is monitoring this issue, and will keep the community updated through the regular channels (e.g. CP4C working group meetings).

The postprocessing jobs run slightly slower on Nibi than on the other three DRAC platforms because of hardware limitations. This is not a problem for coupled runs, since the postprocessing (~60 minutes) is faster than the model jobs (~80 minutes). For AMIP runs, however, the postprocessing (~50 minutes) is usually slower than the model runs (~40 minutes). For longer runs, the postprocessing could end up lagging a few years behind the model, leading to an accumulation of data (since the postprocessing reduces the data volume compared to the raw model output). Users should be aware of this possible bottleneck when managing their scratch storage.
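
The size of the lag can be estimated from the timings quoted above. The sketch below is a rough model only: postproc_backlog is a hypothetical helper, and it assumes that jobs run back to back with no queue wait, that one postprocessing job runs at a time, and that each postprocessing job starts only once its model year is complete. Real runs will deviate from this.

```shell
# Hypothetical helper: estimate how many simulated years are still waiting
# for postprocessing at the moment the model finishes its final year.
# Arguments: number of years, model minutes per year, postproc minutes per year.
postproc_backlog() {
  local years=$1 model_min=$2 post_min=$3
  # The first postprocessing job starts after one model year, so it has
  # (years - 1) * model_min minutes of work time before the model finishes.
  local done_years=$(( (years - 1) * model_min / post_min ))
  # The final model year can never be postprocessed yet, so cap the count.
  if (( done_years > years - 1 )); then
    done_years=$(( years - 1 ))
  fi
  echo $(( years - done_years ))
}

postproc_backlog 11 40 50   # AMIP timings on Nibi, default 11-year run
postproc_backlog 11 80 60   # coupled timings on Nibi
```

Under these assumptions the AMIP case leaves 3 years waiting when the model finishes, while the coupled case leaves only the final year. The AMIP backlog grows roughly in proportion to the run length, which is why longer runs need more attention to scratch usage.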

CanESM Output

CanESM produces runtime diagnostic (RTD) files, which contain high-level summaries of the simulation results. These include annual averages and monthly climatologies of global mean quantities. An interactive webpage called the RTD Browser can be used to plot the data stored in the RTD files. The RTD files are automatically copied from DRAC machines to the web server that runs the RTD Browser. Read the RTD Browser README for more info. The CP4C Reference Runs Table contains a list of runids for simulations that can be used as benchmarks.

With imsi v0.7 and later, the simulation output files are written to $SCRATCH/canesm_runs/<runid>/data/. This differs from previous versions (e.g. v0.3b, used on Niagara), where the output was saved to the case directory ($SCRATCH/<runid>/output).
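
As a convenience, the output location for imsi v0.7 and later can be built from the runid. This is a small sketch: run_data_dir is a hypothetical helper (not part of imsi) and my-runid-01 is a placeholder. It only constructs the path described above and does not check that the directory exists.

```shell
# Hypothetical helper (not part of imsi): print the output directory that
# imsi v0.7 and later uses for a given runid, as described in this guide.
run_data_dir() {
  local runid=$1
  if [[ -z ${SCRATCH:-} ]]; then
    echo "SCRATCH is not set" >&2
    return 1
  fi
  printf '%s/canesm_runs/%s/data\n' "$SCRATCH" "$runid"
}

# Example use on a DRAC login node, once at least one year has completed:
#   ls "$(run_data_dir my-runid-01)"       # list the output by year
#   du -sh "$(run_data_dir my-runid-01)"   # check how much scratch space it uses
```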

The model postprocessing job converts the output files from CCCma format to netCDF, which is a standard format for climate data. The CanESM Output Tutorial from the 2024 CP4C Workshop provides basic instructions for analyzing the netCDF output.

Footnotes