********************************************************
Running CanESM on Compute Canada / Cedar (Singularity)
********************************************************
Important note
##############
These are legacy instructions, no longer maintained, and not guaranteed to work.
Use these at your own risk of frustration.
Overview and resources
######################
This document provides instructions for running the CanESM5 model on the
Compute Canada machine ``Cedar``. It is an opinionated guide that suggests a
recommended approach. These instructions are likely relevant to other Compute
Canada systems, but they have not been tested there.
To enable external use of the model, ``CanESM`` has been ported to compile with
the GNU compiler, and suitable ``Docker`` and ``Singularity`` containers created.
Tagged versions of the model should have been tested to function on Cedar.
There is no guarantee for non-tagged versions. At the time of writing, the
latest tagged version verified on Cedar is ``v5.0.10``.
Working with ``CanESM`` on ``Cedar`` requires working with `singularity `_ and
interacting with the `SLURM workload manager `_.
For information on this software, interested readers are directed to the provided links, as
well as the `Cedar specific `_ and more general
`Compute Canada `_ wiki.
We are going to use the space ``~/project/$USER`` to store some persistent data,
and we'll put some important files in ``$HOME``. You may change this if you so
desire (e.g. to ``~/scratch``, but note that this space is temporary and
periodically deleted).
For setting up a model run, we are going to use the ``~/scratch`` space, which is
temporary space that can handle lots of data (note, data here will be deleted
after 60 days or so). For ``CanESM``, we often use the concept of a ``RUNID`` as a
unique identifier of an individual run. We normally create a dedicated
directory to work in for each ``RUNID``.
.. note::
Only a small number of model configurations are made available. There is
no support whatsoever available for using CanESM. We do not certify that
the code runs correctly. This is strictly an alpha testing project with no
guarantees whatsoever.
Initial Setup (One Time)
########################
Get the source code
*******************
Recursively clone the ``CanESM`` source code and check out the desired version
of the model and its helper tools. We'll put the source code in our home
directory because it is small, we want to keep it permanently, and home has
relatively fast access::
cd ~/
# clone the code and the desired submodules
git clone --recursive https://gitlab.com/cccma/canesm
.. note::
you might want to clone from ``git@gitlab.com:cccma/canesm.git`` instead.
This will avoid having to enter your password on every push. It requires
setting up SSH keys and entering your public SSH key into gitlab
(`see here `_).
Once this repository is cloned, it gives you a place to inspect and modify the
source code, as well as access to various helper tools that make it
easier to set up and run self-contained experiments. Note that the
default branch, ``develop_canesm``, is continually updated. If you would like to
be sure you are using a stable version of the code, it is recommended that you
use one of the available tags, which by convention are created across all sub-repos in
addition to the super-repo. To see the available tags, see `here
`_ or run ``git tag`` within the repo.
.. note::
Only tags including and after ``v5.0.10`` support the functionality discussed
in this document.
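As a rough illustration of the tag requirement above, the following sketch tests whether a tag name is at or after ``v5.0.10``. The ``tag_supported`` helper is hypothetical (it is not part of ``CanESM``), it relies on GNU ``sort -V`` for version ordering, and it is only meaningful for ``vX.Y.Z`` style tags, not branch names. The tag ``v5.1.0`` is a made-up input used for demonstration.

```shell
#!/bin/sh
# Hypothetical helper (not part of CanESM): succeed if the given tag is at or
# after the minimum tag that supports the workflow described in this guide.
MIN_TAG="v5.0.10"

tag_supported() {
    # GNU `sort -V` orders version strings; if the minimum tag sorts first
    # (or the two are equal), the given tag is new enough.
    oldest=$(printf '%s\n%s\n' "$MIN_TAG" "$1" | sort -V | head -n 1)
    [ "$oldest" = "$MIN_TAG" ]
}

# Demonstration with example tag names (v5.1.0 is illustrative only).
for t in v5.0.9 v5.0.10 v5.1.0; do
    if tag_supported "$t"; then
        echo "$t: supported"
    else
        echo "$t: too old"
    fi
done
```

A plain string comparison would rank ``v5.0.9`` above ``v5.0.10``, which is why a version-aware sort is used here.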
.. note::
To easily checkout a consistent version across all submodules and the super-repo,
you can use ``CCCma_tools/tools/git-scheckout BRANCH_OR_TAG_COMMIT``::
CCCma_tools/tools/git-scheckout v5.0.10
Or if already on your ``$PATH``, you can simply run ::
git scheckout v5.0.10
Add Tools to your $PATH
***********************
To help with the setup/development and running of ``CanESM``, it is advised to
add a few things to your environment.
1. CCCma "*s-scripts*"
As discussed :ref:`here `,
these scripts help manage all the sub-repos across the super-repo. To get these, add the
following to your ``.bashrc`` (or execute every time you log in)::
export PATH=${PATH}:/path/to/cloned/canesm/CCCma_tools/tools
2. ``setup-containerized-canesm``
This tool was specifically designed to setup a *self-contained* run, with
independent source code and a configured "run directory". This tool exists
at ``/path/to/cloned/canesm/CCCma_tools/container/tools``, which *could* be added
to your ``$PATH``. However, that will add *all* executable scripts in said
directory, which could have unintended consequences. As such, the recommended
solution is to copy or link the file to your ``~/bin`` directory (making sure
that this directory is on your ``$PATH``). For example::
ln -s /path/to/cloned/canesm/CCCma_tools/container/tools/setup-containerized-canesm ~/bin/
See :ref:`here <1. Setup your run directory>` for more information on this tool.
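Since both tools above depend on ``$PATH`` being set correctly, a quick check can save some confusion. The sketch below uses a hypothetical ``dir_on_path`` helper (not shipped with ``CanESM``) to test whether a directory is an entry of ``$PATH``.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is an exact entry of $PATH.
dir_on_path() {
    # Wrap both sides in colons so the match is exact and not a substring match.
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if dir_on_path "$HOME/bin"; then
    echo "$HOME/bin is on PATH"
else
    echo "$HOME/bin is NOT on PATH; consider adding it in ~/.bashrc"
fi
```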
Create a python environment to run infrastructure scripts
*********************************************************
For some of the supporting scripts around ``CanESM``, specific ``python`` packages
are required. To easily install the necessary packages, it is recommended that users
utilize a ``python`` virtual environment. There are multiple ways to create/use
virtual environments, but when working on ``cedar``, users should follow
`the Compute Canada recommended method `_
to build a python 3.8 environment.
Once the environment is created and activated, simply install the packages via ``pip`` using
``path/to/canesm/CCCma_tools/container/tools/python-requirements.txt``
.. code-block::
pip install -r path/to/canesm/CCCma_tools/container/python-requirements.txt
Build the Singularity image (once)
**********************************
The model is compiled and run inside a Docker or Singularity container, which
provides all the required dependencies. Using the container alleviates the
need to port the model dependencies to each individual HPC system. It is in
principle possible to compile the model on the native HPC. However, then all
dependencies must be met, including the ESMF library. We do not provide
instructions for this, and assume use of the container.
Singularity is available on Compute Canada machines, is designed for HPC, and
does not significantly degrade performance relative to bare metal. This guide
therefore recommends the use of Singularity.
To run ``CanESM`` in a Singularity container, you first need to build the image
from the ``CanESM`` Docker container. To do this, request an interactive session
on a compute node and then use the ``singularity build`` command to convert
the Docker image to a Singularity image, i.e.
.. code-block::
# start an interactive session and go to a tmpdir
salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=1 --mem-per-cpu=5000
cd $SLURM_TMPDIR
# Build the singularity image from the public Docker hub image
singularity build canesm-docker_latest.sif docker://cccma/canesm-docker
# Create a directory to store the image for future use
mkdir -p ~/project/$USER/singularity_images
mv canesm-docker_latest.sif ~/project/$USER/singularity_images
exit
**This only needs to be done once** - after
you (or another existing user) build and store the image, you can simply point to it
when running ``CanESM``.
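Because the build only has to happen once, it can be convenient to check for an existing image before requesting an interactive session. The following is a minimal sketch under that assumption. The ``image_needs_build`` helper is hypothetical, and the image path simply mirrors the location used in the build commands above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if a build is still needed,
# i.e. the image file is missing or empty.
image_needs_build() {
    [ ! -s "$1" ]
}

# Same location as used in the build commands above.
IMAGE="${HOME}/project/${USER:-$(id -un)}/singularity_images/canesm-docker_latest.sif"

if image_needs_build "$IMAGE"; then
    echo "No image found at $IMAGE: build it once, as described above"
else
    echo "Reusing existing image: $IMAGE"
fi
```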
Download the Basic required input data
**************************************
The basic input data needed to run CanESM is available on FTP for some
selected experiments. You need to download the data for the experiment that
you want to run. We typically use standard, CMIP-defined experiment names. The
example below is for the ``piControl`` experiment. This data also contains a
default restart file.
.. code-block::
cd ~/project/$USER
mkdir -p canesm_input_data/piControl
cd canesm_input_data/piControl
wget ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/canesm5_piControl_config_04-06-2021/*
The ``pdSST-pdSIC`` experiment from PAMIP is also available at
``ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/canesm5_pamip_config/``. Forcing
for other CMIP6 experiments can be made available.
Executing a Run
###############
Once you have a stored copy of ``canesm``, the input data, and the required ``singularity``
image, you are ready to set up your first run, which can be done as follows:
1. Setup your run directory
***************************
Provided you've added ``setup-containerized-canesm`` to a suitable location, it can be used to
easily setup a suitable, *self-contained* run directory for a desired ``RUNID``. For details on using
this script, see the interface via ``-h``:
.. code-block::
$ setup-containerized-canesm -h
Setup containerized canesm run
Creates a run directory for the given runid, using the defined repository/version, in the present working directory
Usage: setup-containerized-canesm [-h] runid=RUNID repo=SOURCE_REPOSITORY version=VERSION
RUNID : the alphanumeric string identifier for the given run.
SOURCE_REPOSITORY : the repository address/path that be cloned and used for the run.
VERSION : the commit hash or branch-name to checkout for this run.
For example, to setup a run from ``v5.0.10`` from the main gitlab repository:
.. code-block:: text
setup-containerized-canesm runid=canesm-test-run01 repo=git@gitlab.com:cccma/canesm.git version=v5.0.10
or if you've setup your own forks of ``CanESM`` (*and the submodules*), and want to use your development
version
.. code-block:: text
setup-containerized-canesm runid=canesm-dev-run01 repo=git@gitlab.com:user123/canesm.git version=my-dev-version
In both of these examples, ``setup-containerized-canesm`` creates a "run directory" named after the ``RUNID``
(e.g. ``canesm-test-run01``) in ``PWD``, clones a *new* copy of ``CanESM`` from the given repo, checks out the
desired version, and sets up some useful directories and files, such as ``canesm.cfg``, which contains high level
configuration settings for the model.
2. Configure the run
********************
Once your run directory is set up, navigate into it and modify ``canesm.cfg`` as
necessary for your run. At the time of writing, only two experiments are available,
``piControl`` or ``pdSST-pdSIC``, where the former is an ESM simulation using the
ocean, atmosphere, and the coupler, while the latter is an AMIP simulation and
only uses the atmosphere and coupler. To set which run you want, simply set the
``EXPERIMENT`` variable within ``canesm.cfg``.
Including ``EXPERIMENT``, the most important variables to set are:
- ``CONTAINER_IMAGE``
- ``CC_ACCOUNT``
- ``EXPERIMENT``
- ``INPUT_DIR``
- ``START_YEAR``
- ``STOP_YEAR``
.. note::
The currently acceptable values for ``EXPERIMENT`` are ``piControl`` and
``pdSST-pdSIC``, and it should be noted that ``piControl`` is tested more
frequently than ``pdSST-pdSIC``.
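Before moving on, it may help to confirm that the variables listed above are actually set. The sketch below assumes that ``canesm.cfg`` is plain shell syntax (it is sourced elsewhere in this guide). The helpers ``missing_cfg_vars`` and ``experiment_ok`` are hypothetical, and the values in the demonstration file are placeholders only.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each required variable that is unset
# or empty after sourcing the given config file (pass a path containing a slash).
missing_cfg_vars() (
    # The function body is a subshell, so sourced settings do not leak out.
    unset CONTAINER_IMAGE CC_ACCOUNT EXPERIMENT INPUT_DIR START_YEAR STOP_YEAR
    . "$1" || exit 2
    for name in CONTAINER_IMAGE CC_ACCOUNT EXPERIMENT INPUT_DIR START_YEAR STOP_YEAR; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
)

# Hypothetical helper: succeed only for the two experiments named in this guide.
experiment_ok() {
    case "$1" in
        piControl|pdSST-pdSIC) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration with a throwaway config file holding placeholder values.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
EXPERIMENT=piControl
START_YEAR=5550
STOP_YEAR=5551
EOF
missing_cfg_vars "$demo_cfg"   # lists CONTAINER_IMAGE, CC_ACCOUNT, INPUT_DIR
rm -f "$demo_cfg"
```

In a real run directory this would be called as ``missing_cfg_vars ./canesm.cfg``; no output means all six variables are set.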
Once you have decided on these settings, assuming you've built and activated
the ``python`` environment (discussed :ref:`above `), pull in the necessary configuration files
by running:
.. code-block::
bin/config-containerized-canesm
.. note::
If you don't have the correct ``python`` environment activated, you will see
.. code-block::
ModuleNotFoundError: No module named 'f90nml'
Running this script populates ``${WRK_DIR}/config`` with many different configuration files that
are used to compile/run the model.
Note that informed/curious users can go into the resulting files in ``config/`` and
modify them further for their needs - some of the more important settings will
be discussed below. With that said, to simply get one of the two given experiments running,
users can leave them as they are.
3. Compile the source code
**************************
To compile ``CanESM`` and supporting diagnostic/utility programs, the
recommended method is to utilize
``/path/to/canesm/CONFIG/COMMON/compile-canesm.sh``. However, as discussed
:ref:`above `, on ``cedar``, the compilation
must happen within the aforementioned ``singularity`` image. Concise, ``cedar`` specific,
compilation information is discussed below, but for detailed information
on the compilation system, see the documentation :ref:`here `.
.. note::
Users may attempt to compile the model on "bare metal" without the container, but
this requires compiling all the dependencies, including ESMF.
Compiling in batch mode
^^^^^^^^^^^^^^^^^^^^^^^
The easiest way to compile the code is to use the batch compilation script, ``batch_compile_cedar``,
which uses ``compile-canesm.sh`` to automatically pull a copy of the source code into
``$SLURM_TMPDIR``, build all the necessary executables, and send them back to ``EXEC_STORAGE_DIR``
(see ``canesm.cfg``). This can be easily achieved via::
sbatch --account=XXX canesm/CCCma_tools/container/tools/batch_compile_cedar
where ``XXX`` should be replaced by your allocation account. You can monitor the progress
of this job using ``squeue -u $(whoami)``. Upon completion you should see the desired
executables within ``EXEC_STORAGE_DIR``, and can inspect the hidden ``.compile-canesm*.log``
files in ``PWD``.
.. note::
``$SLURM_TMPDIR`` is used to take advantage of faster I/O on the local scratch space.
Interactively compile the source code manually
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It is also possible for users to compile the model interactively. Since we need
the singularity container to meet the dependencies, it is generally not possible
to compile interactively on the cedar head nodes.
As such, to compile interactively, you will want to:
1. Launch an interactive session, move to a tmp directory, and launch an
interactive ``singularity`` container from the image built earlier::
salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=8 --mem-per-cpu=5000 --account=XXX
cd $SLURM_TMPDIR
singularity shell --cleanenv -B /home -B /project -B /scratch -B /localscratch --env APPEND_PATH=/path/to/canesm/CCCma_tools/scripts/comm/ ~/project/$USER/singularity_images/canesm-docker_latest.sif
2. Copy in the source code and compile the model using the provided utility script::
# copy in the high level configuration file
cp /path/to/run/dir/canesm.cfg .
# source the config file to get some needed settings
source canesm.cfg
# call the compilation script (from within SLURM_TMPDIR)
${CANESM_SRC_ROOT}/CONFIG/COMMON/compile-canesm.sh -l
where the ``-l`` flag forces ``compile-canesm.sh`` to copy the source code locally, and will automatically
copy the final executables to ``EXEC_STORAGE_DIR``.
3. Confirm the necessary executables were created within ``EXEC_STORAGE_DIR``, and then exit the container **and** the
interactive session. See ``.compile-canesm*`` for compilation errors.
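The confirmation in the last step can be scripted. The sketch below is only an illustration: ``missing_executables`` is a hypothetical helper, and the executable names are those used in the interactive-run example of this guide for the coupled case (an atmosphere-only experiment would not need ``nemo.exe``).

```shell
#!/bin/sh
# Hypothetical helper: print each expected executable that is missing
# (or not a regular file) in the given directory.
# Usage: missing_executables DIR NAME [NAME...]
missing_executables() {
    dir="$1"
    shift
    for exe in "$@"; do
        [ -f "$dir/$exe" ] || echo "$exe"
    done
}

# EXEC_STORAGE_DIR comes from canesm.cfg; fall back to the current directory
# so that this sketch also runs outside a configured run directory.
missing_executables "${EXEC_STORAGE_DIR:-.}" canam.exe cancpl.exe nemo.exe
```

No output means that every listed executable was found.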
4. Run the Model
****************
Once the executables are compiled and stored in ``EXEC_STORAGE_DIR``, you are
then ready to run the model, which must be done in batch mode within the
singularity container. It should be noted that the run time configuration of
the model must match the compilation options, input data, and so on. Some basic
runtime configuration is set in the ``canesm.cfg`` file (including relevant
paths for binaries, input data, outputs, etc). This section assumes you have
edited the ``canesm.cfg`` file appropriately, and run ``bin/config-containerized-canesm``.
Using the job script
********************
The easiest way to launch the job is to simply submit the provided ``batch_run_cedar`` script from within
your run-directory (must be in ``~/scratch``)::
cd ~/scratch/canesm-test-run
# Assumes that canesm.cfg is here; the paths are configured appropriately; and the executables must
# be pre-compiled and available in EXEC_STORAGE_DIR
# If needed, copy this file and modify it for your case. Submit to the queue with `sbatch`.
sbatch --account=XXX canesm/CCCma_tools/container/tools/batch_run_cedar
where ``XXX`` is your account or resource allocation.
Outputs will appear in the location defined in ``canesm.cfg`` by the
variable ``OUTPUT_DIR``. These are unpacked raw model history files on tiles.
Launching an interactive run
****************************
When running with the ``batch_run_cedar`` script, getting the inputs and setting up the environment
are taken care of for you. However, it is still possible to run interactively by starting an interactive batch session
(with the appropriate resources), copying in the inputs/executables, starting the container, and then
running the model.
.. warning::
Running the model interactively is tested less frequently than batch mode. Guidance
is provided for interested readers, but problems may occur due to on-going changes with
``CanESM``, as such the recommended method is to run the model in batch mode.
Before anything else, note that in order for ``CanESM`` to run, the simulation length
**must be set consistently for all components**. When run as part of the job script, this is automatically
handled for the user according to the ``CURRENT_YEAR`` setting in ``canesm.cfg``. However,
when running interactively, this must be handled by the user. Details on the parameters that control this
are discussed :ref:`below `, but it is also possible for users to
mimic the behaviour in the job script by executing the following:
.. code-block:: bash
source canesm.cfg # get configuration settings
source $CANESM_SRC_ROOT/CCCma_tools/tools/CanESM_shell_functions.sh # get helper functions
update_agcm_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 agcm_timestep=900 \
namelist_file=config/namelists/modl.dat || :
update_nemo_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 nemo_timestep=3600 \
namelist_file=config/namelists/namelist || :
update_coupler_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 runid=$RUNID \
namelist_file=config/namelists/nl_coupler_par || :
where ``|| :`` after the ``update_*`` calls will stop your shell session from exiting if an error occurs.
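To see why the ``|| :`` suffix helps, consider the small demonstration below. It assumes that the exit is triggered by the shell's ``errexit`` option (``set -e``), which this guide does not state explicitly, and it uses ``false`` as a stand-in for a failing ``update_*`` call. Each case runs in its own ``sh -c`` process so that the demonstration cannot end the calling shell.

```shell
#!/bin/sh
# Without the guard: the child shell stops at the failing command,
# so the echo is never reached and nothing is captured.
plain=$(sh -c 'set -e; false; echo "still running"') || :

# With the guard: `:` always succeeds, so the list `false || :` succeeds
# and the child shell carries on to the echo.
guarded=$(sh -c 'set -e; false || :; echo "still running"')

echo "without '|| :': [${plain}]"
echo "with '|| :':    [${guarded}]"
```

The trade-off is that a genuine failure in an ``update_*`` call is silenced, so the namelists are worth inspecting afterwards.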
Once the ``update_*`` commands have been executed, the namelists will have the correct settings to run the current year. After
this, an interactive run can be achieved via
.. code-block:: bash
# Launch interactive session
salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=48 --mem-per-cpu=1000 --account=XXX
# Move to SLURM_TMPDIR to run
cd $SLURM_TMPDIR
# get high level config file and source the settings
cp /path/to/run/dir/canesm.cfg .
source canesm.cfg
# Copy in the input files and executables
cp $INPUT_DIR/* .
cp $WRK_DIR/config/* . # note that this will avoid capturing sub-directories
cp $WRK_DIR/config/namelists/* .
cp $EXEC_STORAGE_DIR/*.exe .
chmod +x *.exe
# Load runtime environment
source runtime_environment
# Load the correct modules
module --force purge
module unload intel openmpi
module load StdEnv/2016.4 nixpkgs/16.09 gcc/7.3.0 mpich/3.2.1 netcdf-fortran/4.4.4
module load singularity
# Create a conf. file that allows us to run the model in multiple program multiple data mode.
# SLURM does not support this very well. Note the specification of CPUs for each exe. The total
# number cannot exceed the resource request.
echo -e "0-15 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./canam.exe \n 16 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./cancpl.exe \n 17-41 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./nemo.exe" > run.conf
# Launch the run
export OMP_NUM_THREADS=1
time srun -n 42 -l --multi-prog run.conf
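The rank ranges in ``run.conf`` follow directly from the number of MPI tasks given to each executable. As a sketch of that bookkeeping, the hypothetical ``multiprog_conf`` helper below (not part of ``CanESM``) prints the same three lines for arbitrary task counts. The executable names and the ``-B /localscratch`` bind are copied from the example above, and the image name passed in the demonstration is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print a SLURM --multi-prog configuration.
# Usage: multiprog_conf N_ATM N_CPL N_NEMO IMAGE
multiprog_conf() {
    n_atm="$1"; n_cpl="$2"; n_nemo="$3"; image="$4"
    first=0
    for pair in "canam.exe:$n_atm" "cancpl.exe:$n_cpl" "nemo.exe:$n_nemo"; do
        exe=${pair%%:*}
        count=${pair##*:}
        # Skip components that get no tasks (e.g. no ocean in an AMIP run).
        [ "$count" -gt 0 ] || continue
        last=$((first + count - 1))
        if [ "$first" -eq "$last" ]; then
            ranks="$first"
        else
            ranks="$first-$last"
        fi
        echo "$ranks singularity exec -B /localscratch $image ./$exe"
        first=$((last + 1))
    done
}

# 16 + 1 + 25 = 42 tasks, matching `srun -n 42` in the example above.
multiprog_conf 16 1 25 "${CONTAINER_IMAGE:-canesm-docker_latest.sif}"
```

Redirecting the output to ``run.conf`` reproduces the file written by the ``echo -e`` command above. The sum of the three counts is the value to pass to ``srun -n`` and must not exceed the resource request.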
Notable Configurable Settings
*****************************
``CanESM`` is a complex system with a multitude of configurable options, ranging from
physical/numerical parameters and input data fields, to MPI topology and infrastructure/sequencing
settings. Due to this complexity, this *beta* guide cannot attempt to cover
all the options comprehensively; nevertheless, some of the more notable settings are discussed here.
Compile Time Setting: MPI size of CanAM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The number of MPI tasks to be used for CanAM **must currently be specified at
compile time.** This value is set in ``config/cppdef_sizes.h`` using
the variable ``_PAR_NNODE_A``. As discussed :ref:`here `,
``config-canesm`` (which is called as part of ``config-containerized-canesm``) generates
``config/cppdef_sizes.h`` from the settings contained in ``config/namelists/modl.dat``.
If a user wishes to change this value, they can simply modify ``config/cppdef_sizes.h`` after
running ``config-containerized-canesm`` and **before executing the compilation**.
.. note::
This specifies the number of MPI tasks - but CanAM also uses OpenMP, so
this is not the total number of cores. The total number of cores used is
``_PAR_NNODE_A x OMP_NUM_THREADS``, where ``OMP_NUM_THREADS`` is specified in the
runtime script (see below). Also note, the number of MPI tasks specified must
match with the resources supplied at runtime.
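The core count described in the note can be worked out from the generated header. The sketch below is illustrative only: it assumes the header contains a line of the form ``#define _PAR_NNODE_A <integer>`` (the exact layout of ``config/cppdef_sizes.h`` is not shown in this guide), and both helper functions are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the CanAM MPI task count from a header file.
# Assumes a line of the form:  #define _PAR_NNODE_A <integer>
par_nnode_a() {
    awk '$1 == "#define" && $2 == "_PAR_NNODE_A" { print $3; exit }' "$1"
}

# Total cores used by CanAM = MPI tasks x OpenMP threads per task.
canam_cores() {
    echo $(( $1 * $2 ))
}

# Demonstration with a throwaway header holding an example value.
demo_header=$(mktemp)
printf '#define _PAR_NNODE_A 16\n' > "$demo_header"
tasks=$(par_nnode_a "$demo_header")
rm -f "$demo_header"
echo "CanAM cores with OMP_NUM_THREADS=1: $(canam_cores "$tasks" 1)"
```

With 16 MPI tasks and one thread each, this matches the 16 ranks assigned to ``canam.exe`` in the interactive-run example.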
Compile Time Setting: Model/Experiment specific cpp macros for CanAM/CanCPL
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For the atmosphere and coupler components of ``CanESM`` (and the coupler interface
for the ocean), many source code files rely on a ``cppdef_config.h`` file to
set many model/experiment specific ``cpp`` macros. As discussed :ref:`here
`, this file is generated by ``config-canesm``
according to the values of ``cppdef_file`` and ``cppdef_diag``. As of now, these
get set within the ``EXPERIMENT`` specific ``*.cfg`` file at
``path/to/canesm/CCCma_tools/container/tools/config``. If you would like to use
another set of ``cpp`` definitions, the available files can be found in
``/path/to/canesm/CONFIG/[AMIP|ESM]/cppdefs``. Alternatively, you can point to your
*own* ``cpp`` files, *or* simply alter the generated ``config/cppdef_config.h``
and submit another compilation job.
.. note::
If a user wishes to modify any of the values in these ``cppdef_*`` files, the easiest way
is to modify the values in the generated ``config/cppdef_*`` files **before executing
the compilation**.
.. note::
On ECCC machines, cppdefs are typically specified by the ``runmode``
construct, and the sizes are computed and set by the compilation script. It
may be helpful to look in ``canesm/CONFIG/ESM/canesm.cfg`` to determine
the correct ``cppdef`` file to use for an experiment.
Compile Time Setting: Compilation Flags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For information on altering compilation flags, see the detailed information
:ref:`here ` noting that the compilation environment file,
and ``make``/``mkmf`` templates get extracted into the local ``config`` directory.
Run Time Setting: Simulation length
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note, if you use the ``config-containerized-canesm`` and ``batch_run_cedar``
scripts, the timers will be configured automatically from ``START_YEAR``
and ``STOP_YEAR`` in ``canesm.cfg``. The information below is
provided to help users customize the simulation step-size, if they so desire.
**The specified length of the simulation must match between CanNEMO and
CanAM.**
- The number of CanNEMO steps is specified in the NEMO ``namelist`` file
(typically extracted via ``config-containerized-canesm`` into ``config/namelists``).
For CanESM5, the nemo timestep is 1 hour, and the number of steps is specified in hours
(e.g. 8760 steps for 1 year).
- In the CanAM, the timers are configured
by ``kstart``, ``kfinal``, and ``delt`` in the ``modl.dat`` file that also
gets extracted into ``config/namelists`` via the configuration tool.
``kstart`` and ``kfinal`` represent the start and final step counters
*since year 1*, and ``delt`` represents the time-step in seconds.
For example, with a time-step of ``900`` seconds, if you want the model
to start from January 1st, 5550 and go to the end of the year, then
.. code-block:: fortran
kstart = (3600/900) * 24 * 365 * 5549 ! 5549 is used because we start from the END of year 5549
and
.. code-block:: fortran
kfinal = (3600/900) * 24 * 365 * 5550
Note that shorter periods can also be run, but again, the number of steps must be
consistent between NEMO and CanAM.
.. note::
In CanESM5, CanAM takes 4 steps (15 min each) for each single NEMO
timestep (1 hr each). Coupling occurs every three hours, which is 3 NEMO
timesteps and 12 CanAM steps. Below we rely on the default values having been
set correctly. If the timers in ``modl.dat`` and ``namelist`` are not set correctly,
the run will not work.
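The counter arithmetic above can be reproduced with a few lines of shell. This is a minimal sketch that follows the formulas in this section, assuming a 365-day year and a time step that divides evenly into a day. The function names are hypothetical.

```shell
#!/bin/sh
# Number of model steps in one 365-day year for a time step given in seconds.
steps_per_year() {
    echo $(( (86400 / $1) * 365 ))
}

# CanAM counters for running all of YEAR with time step DELT (seconds).
# kstart counts the steps up to the END of the previous year.
# Usage: canam_kstart YEAR DELT ; canam_kfinal YEAR DELT
canam_kstart() {
    echo $(( $(steps_per_year "$2") * ($1 - 1) ))
}
canam_kfinal() {
    echo $(( $(steps_per_year "$2") * $1 ))
}

echo "CanAM steps per year (delt=900): $(steps_per_year 900)"    # 35040
echo "NEMO steps per year (dt=3600):   $(steps_per_year 3600)"   # 8760
echo "kstart for year 5550: $(canam_kstart 5550 900)"            # 194436960
echo "kfinal for year 5550: $(canam_kfinal 5550 900)"            # 194472000
```

Both components cover the same simulated time: 35040 steps of 900 s and 8760 steps of 3600 s each amount to 31,536,000 s, which is the consistency that the timers must preserve.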
Post-processing model data
##########################
Creating usable output from the raw model history files requires several
steps (repacking, joining tiles, computing diagnostics, converting to
timeseries, and finally conversion to CMOR compliant NetCDF). Within ECCC,
this is achieved by a diagnostic string, which comprises many jobs, and
relies on several pieces of software, including the CanAM diagnostics package
(CanDIAG). At this time, this processing pipeline has not been ported to
Cedar, but work is ongoing.
However, an interim series of diagnostics has been put in place to provide a basic level
of usable netcdf output from the model. In summary:
- ``canesm/CCCma_tools/container/tools/batch_diag_cedar`` is a batch script that
will run these diagnostics. This is automatically launched by
``batch_run_cedar`` at the end of each year of simulation, but it can also be
submitted by itself, if the output is available.
- ``canesm/CCCma_tools/container/tools/basic_diag`` is the underlying diagnostics
script that will:
- rebuild the NEMO NetCDF files from individual tiles to global files (via
``rebuild_nemo.exe``).
- rebuild the CanAM CCCma binary formatted output from individual tiles to
global files (via ``candiag_mwe.exe``).
- Compute monthly mean 2D variables output in NetCDF (via
``candiag_mwe.exe``).
- Use modified CCCma diagnostic recipes to compute monthly mean 3D
temperature and wind fields (via ``temp_recipe`` and ``winds_recipe``).
- Convert the 3D CCCma binary files to NetCDF format, using the
``ccc2nc_linux`` utility (note, this is currently not compiled, and a
container-compatible compiled version can temporarily be obtained from:
``ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/ccc2nc_linux``, and placed in the
run bin directory)