********************************************************
Running CanESM on DRAC Platforms
********************************************************

Overview and resources
######################

This document provides instructions for running the CanESM5 model on the DRAC machines launched as part of the 2025 DRAC Infrastructure Renewal. On these machines, CanESM runs on bare metal (no container). In general, the directory paths, software modules, and user interface are the same on each platform.

Support for running CanESM on the updated DRAC machines will be rolled out over 2025-26. The following machines are currently supported:

- `Trillium `_ (SciNet, hosted at the University of Toronto)
- `Fir `_ (WESTGRID, hosted at Simon Fraser University)
- `Rorqual `_ (Calcul Québec, hosted at the École de technologie supérieure)

  - See the :ref:`platform-specific notes ` for information about ongoing issues with the Rorqual filesystem.

- `Nibi `_ (SHARCNET, hosted at the University of Waterloo)

  - See the :ref:`platform-specific notes ` for information about a possible bottleneck with the postprocessing jobs on Nibi.

Activating the imsi Python environment
##########################################

This guide relies on *imsi*, a model configuration system that is still in development. To activate a pre-installed, stable version of imsi and load the required modules, run:

.. code-block:: bash

    source /project/rpp-cp4c/cp4c/pyenvs/activate_pyenv-imsi_latest.sh

This loads the version of imsi compatible with the branch ``v5.1_cp4c`` in the CCCma CanESM GitLab repository. Python environments (and the associated activate scripts) for older versions of imsi, which are no longer compatible with ``v5.1_cp4c``, will be retained in the same directory. If you are returning to an experiment that was created using a previous version of imsi, you can still load the corresponding Python environment using these activate scripts.

You may also install imsi yourself (see `imsi repo `_).

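
After sourcing the activate script, it can be useful to confirm that the ``imsi`` command is actually available in the current shell. The following is a minimal sketch that uses only standard shell built-ins; ``check_tool`` is a hypothetical helper name used for illustration and is not part of imsi:

.. code-block:: bash

    # check_tool is a hypothetical helper (not part of imsi): it reports
    # whether a command can be found on PATH in the current shell.
    check_tool() {
        if command -v "$1" >/dev/null 2>&1; then
            echo "$1 found at: $(command -v "$1")"
        else
            echo "$1 not found; source the activate script first" >&2
            return 1
        fi
    }

Calling ``check_tool imsi`` in the shell session that sourced the activate script should print the location of the ``imsi`` executable. A non-zero return status suggests that the environment was not activated in that session.
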
Read the `imsi documentation `_ to get an overview of how the tools work.

Set up a piControl run
######################

Navigate to your scratch space and set up a new piControl run as follows [#f1]_:

.. code-block:: bash

    cd $SCRATCH
    imsi setup --repo=https://gitlab.com/cccma/canesm.git --ver=v5.1_cp4c --exp=cmip6-piControl --model=canesm51_p1 --runid=

Be sure to specify a unique runid that won't clash with those of other users, and keep it below about 15 characters. The latest version of imsi enforces the character limit for the runid. Including your initials is good practice, e.g. ``--runid=ncs-hist-tst-01`` (see the guidance below).

Next, change into the run directory (named for the runid) and build the executables:

.. code-block:: bash

    imsi build

Save the default restart files:

.. code-block:: bash

    imsi save-restarts

Submit the run to the batch queue:

.. code-block:: bash

    imsi submit

Each submitted job completes one model year, and output will appear by year in the directory ``$SCRATCH/canesm_runs//data/``. By default, the run executes 11 years in total, one year at a time, resubmitting a job to the Slurm queue after each year. If you would like to change that or other run settings, see the imsi documentation `here `_. If you would like to configure email notifications for your jobs or use a non-default allocation account, see :ref:`below ` and the `Slurm documentation `_.

Runid Guidance
~~~~~~~~~~~~~~

As part of creating a run, users must select a "runid", or run identifier, which is used to differentiate their runs from others. In *general* the chosen runids are free-form, but there are *some* restrictions: they must contain **only lower case alphanumeric characters [a-z] and [0-9], the hyphen "-" and the period "."**.

To check whether another user has already used your runid, you can enter it into the `RTD Browser `_ (see below). If no plot is made, then the runid you entered has not been previously used.

.. warning::

    **Do not** re-use runids of existing runs.

Running different experiments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To run an experiment other than piControl, pass a different value of ``--exp`` to ``imsi setup``. The list of pre-configured experiments available to CP4C can be found in ``canesm/CONFIG/imsi-config/experiments/`` (`GitLab link `_). Other commonly used experiments include ``cmip6-historical``, ``cmip6-ssp370``, ``cmip6-amip``, and ``cmip6-piClim-control``.

.. note::

    For AMIP experiments, use ``--model=canam51_p1`` instead of ``--model=canesm51_p1``.

Users can add new experiment files to their own CanESM fork. It is strongly recommended that new experiments (especially custom forcing files) are developed in collaboration with ECCC scientists, to ensure scientific and technical validity.

To set up a run using a different fork and/or branch, change the values of ``--repo`` and ``--ver`` accordingly.

Specifying Slurm Directives
#################################

Arguments passed to Slurm during the job submissions (model run and postprocessing) can be set by modifying ``imsi_configuration_.yaml`` and then calling ``imsi config``. The relevant section is under the following keys:

.. code-block:: yaml

    sequencing:
      sequencing_flow:
        jobs:
          model:
            resources:
              directives:
                -
          postproc:
            resources:
              directives:
                -

Setting the DRAC account
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::

    You must be granted access by CP4C in order to use the ``rpp-cp4c`` allocation.

Jobs will automatically run under your default group allocation, if you have one. For most users, this will be the allocation of their primary research group and not the CP4C Research Platforms and Portals allocation ``rpp-cp4c``. Users should specify which allocation to use by setting the following Slurm directive:

.. code-block:: yaml

    - --account=

Configuring email notifications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Email notifications can be set by adding the following Slurm directives:

.. code-block:: yaml

    - --mail-type=
    - --mail-user=

Set ``--mail-user`` to your email address. See the `Slurm documentation `_ for the full list of values for ``--mail-type``; some useful options are ``BEGIN`` (notify when the run begins execution), ``END`` (notify when the run ends), ``FAIL`` (notify when the run fails), and ``ALL`` (notify *almost* any time Slurm does something).

Platform-Specific Notes
##########################

Rorqual I/O Stalls
~~~~~~~~~~~~~~~~~~~~~~~~~

The Rorqual cluster has been experiencing issues that affect the performance of jobs that involve I/O from the scratch filesystem. Occasional I/O stalls delay the progress of reading and writing files. This issue has a greater effect on the postprocessing jobs, which perform a large number of I/O operations from ``$SCRATCH``, i.e. the "network scratch" disks. The model jobs are less affected, since most of their I/O is done using the node-mounted storage, i.e. "local scratch" (``$SLURM_TMPDIR``). The nature of the CanESM database system makes it difficult to do the postprocessing I/O through the node-mounted storage.

The interim solution, advised by DRAC support, is to request more wall time for jobs run on Rorqual, to allow for the longer I/O wait times. Users are advised to configure email notifications so that failed jobs can be resubmitted if necessary. Typical wall times for Rorqual jobs that do not experience any I/O issues are ~100 minutes for a coupled model run (one year) and ~45 minutes for a postprocessing job.

CP4C is monitoring this issue and will keep the community updated through the regular channels (e.g. CP4C working group meetings).

Nibi AMIP postprocessing
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The postprocessing jobs run slightly slower on Nibi than on the other three DRAC platforms because of hardware limitations. This is not a problem for coupled runs, since the postprocessing (~1 hour) is faster than the model jobs (~80 minutes).
For AMIP runs, however, the postprocessing (~50 minutes) is usually slower than the model jobs (~40 minutes). For longer runs, the postprocessing could end up lagging a few years behind the model, leading to an accumulation of raw model output on disk (the postprocessing reduces the data volume relative to the raw output). Users should be aware of this possible bottleneck when managing their scratch storage.

CanESM Output
################

Runtime Diagnostics Browser
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

CanESM produces runtime diagnostic (RTD) files, which contain high-level summaries of the simulation results. These include annual averages and monthly climatologies of global mean quantities. An interactive webpage called the `RTD Browser `_ can be used to plot the data stored in the RTD files. The RTD files are automatically copied from DRAC machines to the web server that runs the RTD Browser. `Read the RTD Browser README `_ for more info.

The `CP4C Reference Runs Table `_ contains a list of runids for simulations that can be used as benchmarks.

netCDF files
~~~~~~~~~~~~~~~~~~~~~

With imsi v0.7 and later, the simulation output files are written to ``$SCRATCH/canesm_runs//data/``. This differs from previous versions (i.e. v0.3b, used on Niagara), where the output was saved to the case directory (``$SCRATCH//output``).

The model postprocessing job converts the output files from CCCma format to netCDF, which is a standard format for climate data. The `CanESM Output Tutorial `_ from the 2024 CP4C Workshop provides basic instructions for analyzing the netCDF output.

.. rubric:: Footnotes

.. [#f1] The ``https`` address for GitLab is used here because it has no additional requirements. In general, using ``--repo=git@gitlab.com:cccma/canesm.git`` is preferable, but it requires `adding your SSH keys to GitLab `_.