********************************************************
Running CanESM on Compute Canada / Cedar (Singularity)
********************************************************

Important note
##############

These are legacy instructions. They are no longer maintained and are not guaranteed to work. Use them at your own risk of frustration.

Overview and resources
######################

This document provides instructions for running the CanESM5 model on the Compute Canada machine ``Cedar``. It is an opinionated guide that suggests a recommended approach. These instructions are likely relevant to other Compute Canada systems, but this has not been tested.

To enable external use of the model, ``CanESM`` has been ported to compile with the GNU compiler, and suitable ``Docker`` and ``Singularity`` containers have been created. Tagged versions of the model should have been tested to function on Cedar. There is no guarantee for non-tagged versions. At the time of writing, the latest tagged version verified on Cedar is ``v5.0.10``.

Working with ``CanESM`` on ``Cedar`` requires working with `singularity `_ and interacting with the `SLURM workload manager `_. For information on this software, interested readers are directed to the provided links, as well as the `Cedar specific `_ and more general `Compute Canada `_ wiki.

We are going to use the space ``~/project/$USER`` to store some persistent data, and we'll put some important files in ``$HOME``. You may change this if you so desire (e.g. to ``~/scratch``, but note that this space is temporary and periodically deleted). For setting up a model run, we are going to use the ``~/scratch`` space, which is temporary space that can handle lots of data (note, data here will be deleted after 60 days or so).

For ``CanESM``, we often use the concept of a ``RUNID`` as a unique identifier of an individual run. We normally create a dedicated directory to work in for each ``RUNID``.

.. note::

   Only a small number of model configurations are made available. There is no support whatsoever available for using CanESM. We do not certify that the code runs correctly. This is strictly an alpha testing project with no guarantees whatsoever.

Initial Setup (One Time)
########################

Get the source code
*******************

Recursively clone the ``CanESM`` source code, and check out the desired version of the model and its helper tools. We'll put the source code in our home directory because it is small, we want to keep it permanently, and home has relatively fast access::

    cd ~/
    # clone the code and the desired submodules
    git clone --recursive https://gitlab.com/cccma/canesm

.. note::

   You might want to clone from ``git@gitlab.com:cccma/canesm.git`` instead. This will avoid having to enter your password on every push. It requires setting up SSH keys and entering your public SSH key into gitlab (`see here `_).

Once this repository is cloned, it gives you a place to inspect and change the source code, as well as access to various helper tools that make it easier to set up and run self-contained experiments.

It should be noted that the default branch, ``develop_canesm``, is continually updated. If you would like to be sure you are using a stable version of the code, it is recommended that you use one of the available tags, which by convention are created across all sub-repos in addition to the super repo. To see the available tags, see `here `_ or run ``git tag`` within the repo.

.. note::

   Only tags including and after ``v5.0.10`` support the functionality discussed in this document.

.. note::

   To easily check out a consistent version across all submodules and the super-repo, you can use ``CCCma_tools/tools/git-scheckout BRANCH_OR_TAG_COMMIT``::

       CCCma_tools/tools/git-scheckout v5.0.10

   Or, if it is already on your ``$PATH``, you can simply run::

       git scheckout v5.0.10

Add Tools to your $PATH
***********************

To help with the setup/development and running of ``CanESM``, it is advised to add a few things to your environment.

1. CCCma "*s-scripts*"

   As discussed :ref:`here `, these scripts help manage all the sub-repos across the super-repo. To get these, add the following to your ``.bashrc`` (or execute it every time you log in)::

       export PATH=${PATH}:/path/to/cloned/canesm/CCCma_tools/tools

2. ``setup-containerized-canesm``

   This tool was specifically designed to set up a *self-contained* run, with independent source code and a configured "run directory". This tool exists at ``/path/to/cloned/canesm/CCCma_tools/container/tools``, which *could* be added to your ``$PATH``. However, that will add *all* executable scripts in said directory, which could have unintended consequences. As such, the recommended solution is to copy or link the file to your ``~/bin`` directory (making sure that that directory is on your ``PATH``). For example::

       ln -s /path/to/cloned/canesm/CCCma_tools/container/tools/setup-containerized-canesm ~/bin/

   See :ref:`here <1. Setup your run directory>` for more information on this tool.

Create a python environment to run infrastructure scripts
*********************************************************

For some of the supporting scripts around ``CanESM``, specific ``python`` packages are required. To easily install the necessary packages, it is recommended that users utilize a ``python`` virtual environment. There are multiple ways to create/use virtual environments, but when working on ``cedar``, users should follow `the Compute Canada recommended method `_ to build a python 3.8 environment.
Once the environment is created and activated, simply install the packages via ``pip`` using ``path/to/canesm/CCCma_tools/container/tools/python-requirements.txt``

.. code-block::

    pip install -r path/to/canesm/CCCma_tools/container/python-requirements.txt

Build the Singularity image (once)
**********************************

The model is compiled and run inside a Docker or Singularity container, which provides all the required dependencies. Using the container alleviates the need to port the model dependencies to each individual HPC system. It is in principle possible to compile the model natively on the HPC system. However, all dependencies must then be met, including the ESMF library. We do not provide instructions for this, and assume use of the container.

Singularity is available on Compute Canada machines, is designed for HPC, and does not significantly degrade performance relative to bare metal, so this guide recommends the use of Singularity.

To run ``CanESM`` in a Singularity container, you first need to build the image from the ``CanESM`` docker container. To do this, request an interactive session on a compute node and then use the ``singularity build`` command to convert the Docker image to a Singularity image, i.e.

.. code-block::

    # start an interactive session and go to a tmpdir
    salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=1 --mem-per-cpu=5000
    cd $SLURM_TMPDIR

    # Build the singularity image from the public Docker hub image
    singularity build canesm-docker_latest.sif docker://cccma/canesm-docker

    # Create a directory to store the image for future use
    mkdir -p ~/project/$USER/singularity_images
    mv canesm-docker_latest.sif ~/project/$USER/singularity_images
    exit

**This only needs to be done once** - after you (or another existing user) build and store the image, you can simply point to it when running ``CanESM``.

Download the Basic required input data
**************************************

The basic input data needed to run CanESM is available via FTP for some selected experiments. You need to download the data for the experiment that you want to run. We typically use standard, CMIP-defined experiment names. The example below is for the ``piControl`` experiment. This data also contains a default restart file.

.. code-block::

    cd ~/project/$USER
    mkdir -p canesm_input_data/piControl
    cd canesm_input_data/piControl
    wget ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/canesm5_piControl_config_04-06-2021/*

The ``pdSST-pdSIC`` experiment from PAMIP is also available at ``ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/canesm5_pamip_config/``. Forcing for other CMIP6 experiments can be made available.

Executing a Run
###############

Once you have a stored version of ``canesm``, the input data, and the required ``singularity`` image, you are ready to set up your first run, which can be done as follows.

1. Setup your run directory
***************************

Provided you've added ``setup-containerized-canesm`` to a suitable location, it can be used to easily set up a suitable, *self-contained* run directory for a desired ``RUNID``. For details on using this script, see the interface via ``-h``:

.. code-block::

    $ setup-containerized-canesm -h
    Setup containerized canesm run

        Creates a run directory for the given runid, using the defined repository/version,
        in the present working directory

    Usage:
        setup-containerized-canesm [-h] runid=RUNID repo=SOURCE_REPOSITORY version=VERSION

            RUNID             : the alphanumeric string identifier for the given run.
            SOURCE_REPOSITORY : the repository address/path that be cloned and used for the run.
            VERSION           : the commit hash or branch-name to checkout for this run.

For example, to set up a run from ``v5.0.10`` from the main gitlab repository:

.. code-block:: text

    setup-containerized-canesm runid=canesm-test-run01 repo=git@gitlab.com:cccma/canesm.git version=v5.0.10

or, if you've set up your own forks of ``CanESM`` (*and the submodules*) and want to use your development version:

.. code-block:: text

    setup-containerized-canesm runid=canesm-dev-run01 repo=git@gitlab.com:user123/canesm.git version=my-dev-version

For both of these examples, ``setup-containerized-canesm`` will create a "run directory" named after the ``RUNID`` (e.g. ``canesm-test-run01``) in the ``PWD``, clone a *new* copy of ``CanESM`` from the given repo (checking out the desired version), set up some useful directories, and bring in some useful files, such as ``canesm.cfg``, which contains high-level configuration settings for the model.

2. Configure the run
********************

Once your run directory is set up, navigate into it and modify ``canesm.cfg`` as necessary for your run. As of writing, only two experiments are available, ``piControl`` and ``pdSST-pdSIC``, where the former is an ESM simulation using the ocean, atmosphere, and the coupler, while the latter is an AMIP simulation that only uses the atmosphere and coupler. To select which run you want, simply set the ``EXPERIMENT`` variable within ``canesm.cfg``. Including ``EXPERIMENT``, the most important variables to set are:

- ``CONTAINER_IMAGE``
- ``CC_ACCOUNT``
- ``EXPERIMENT``
- ``INPUT_DIR``
- ``START_YEAR``
- ``STOP_YEAR``

.. note::

   The currently acceptable values for ``EXPERIMENT`` are ``piControl`` and ``pdSST-pdSIC``, and it should be noted that ``piControl`` is tested more frequently than ``pdSST-pdSIC``.

Once you have decided on these settings, and assuming you've built and activated the ``python`` environment (discussed :ref:`above `), pull in the necessary configuration files by running:

.. code-block::

    bin/config-containerized-canesm

.. note::

   If you don't have the correct ``python`` environment activated, you will see

   .. code-block::

       ModuleNotFoundError: No module named 'f90nml'

Running ``bin/config-containerized-canesm`` populates ``${WRK_DIR}/config`` with many different configuration files that are used to compile/run the model. Informed or curious users can go into the resulting files in ``config/`` and modify them further for their needs; some of the more important settings are discussed below. That said, to simply get one of the two given experiments running, users can leave them as they are.

3. Compile the source code
**************************

To compile ``CanESM`` and the supporting diagnostic/utility programs, the recommended method is to use ``/path/to/canesm/CONFIG/COMMON/compile-canesm.sh``. However, as discussed :ref:`above `, on ``cedar`` the compilation must happen within the aforementioned ``singularity`` image. Concise, ``cedar``-specific compilation information is given below; for detailed information on the compilation system, see the documentation :ref:`here `.

.. note::

   Users may attempt to compile the model on "bare metal" without the container, but this requires compiling all the dependencies, including ESMF.

Compiling in batch mode
^^^^^^^^^^^^^^^^^^^^^^^

The easiest way to compile the code is to use the batch compilation script, ``batch_compile_cedar``, which uses ``compile-canesm.sh`` to automatically pull a copy of the source code into ``$SLURM_TMPDIR``, build all the necessary executables, and send them back to ``EXEC_STORAGE_DIR`` (see ``canesm.cfg``). This can be easily achieved via::

    sbatch --account=XXX canesm/CCCma_tools/container/tools/batch_compile_cedar

where ``XXX`` should be replaced by your allocation account. You can monitor the progress of this job using ``squeue -u $(whoami)``. Upon completion you should see the desired executables within ``EXEC_STORAGE_DIR``, and can inspect the hidden ``.compile-canesm*.log`` files in ``PWD``.

.. note::

   ``$SLURM_TMPDIR`` is used to take advantage of faster I/O on the local scratch space.

Interactively compile the source code manually
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is also possible for users to compile the model interactively. Since we need the singularity container to meet the dependencies, it is generally not possible to compile interactively on the cedar head nodes. As such, to compile interactively, you will want to:

1. Launch an interactive session, move to a tmp directory, and launch an interactive ``singularity`` container from the image built earlier::

       salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=8 --mem-per-cpu=5000 --account=XXX
       cd $SLURM_TMPDIR
       singularity shell --cleanenv -B /home -B /project -B /scratch -B /localscratch --env APPEND_PATH=/path/to/canesm/CCCma_tools/scripts/comm/ ~/project/$USER/singularity_images/canesm-docker_latest.sif

2. Copy in the source code and compile the model using the provided utility script::

       # copy in the high level configuration file
       cp /path/to/run/dir/canesm.cfg .

       # source the config file to get some needed settings
       source canesm.cfg

       # call the compilation script (from within SLURM_TMPDIR)
       ${CANESM_SRC_ROOT}/CONFIG/COMMON/compile-canesm.sh -l

   where the ``-l`` flag forces ``compile-canesm.sh`` to copy the source code locally, and will automatically copy the final executables to ``EXEC_STORAGE_DIR``.

3. Confirm the necessary executables were created within ``EXEC_STORAGE_DIR``, and then exit the container **and** the interactive session. See ``.compile-canesm*`` for compilation errors.

4. Run the Model
****************

Once the executables are compiled and stored in ``EXEC_STORAGE_DIR``, you are ready to run the model, which must be done in batch mode within the singularity container. It should be noted that the run time configuration of the model must match the compilation options, input data, and so on.
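
Since the runtime configuration has to be consistent with everything prepared so far, it can be worth sanity-checking ``canesm.cfg`` before submitting a job. The following is a minimal sketch in Python and is not part of the CanESM tooling. It assumes that ``canesm.cfg`` consists of simple shell-style ``NAME=value`` assignments (the file is read with ``source`` elsewhere in this guide), and it only checks the variables and experiment names listed in the configuration step. The function names ``parse_cfg`` and ``check_cfg`` are hypothetical.

.. code-block:: python

    import re

    # Variables this guide lists as the most important ones in canesm.cfg.
    REQUIRED = ("CONTAINER_IMAGE", "CC_ACCOUNT", "EXPERIMENT",
                "INPUT_DIR", "START_YEAR", "STOP_YEAR")
    # Experiments this guide lists as currently available.
    EXPERIMENTS = ("piControl", "pdSST-pdSIC")

    # Matches lines such as `NAME=value` or `export NAME="value"`.
    # Assumption: the file only uses simple assignments like these.
    _ASSIGN = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


    def parse_cfg(text):
        """Return a dict of the simple NAME=value assignments found in text."""
        settings = {}
        for line in text.splitlines():
            match = _ASSIGN.match(line)
            if match is None:
                continue  # blank lines, comments, anything more complex
            name, value = match.groups()
            value = value.split(" #", 1)[0].strip()  # drop a trailing comment
            settings[name] = value.strip("\"'")      # drop surrounding quotes
        return settings


    def check_cfg(settings):
        """Return a list of human-readable problems (empty if none found)."""
        problems = [f"{name} is not set" for name in REQUIRED
                    if not settings.get(name)]
        experiment = settings.get("EXPERIMENT")
        if experiment and experiment not in EXPERIMENTS:
            problems.append(f"unknown EXPERIMENT: {experiment}")
        return problems

To use it, read the file into a string (for example with ``open("canesm.cfg").read()``), pass the string to ``parse_cfg``, and print whatever ``check_cfg`` returns. A file that relies on shell expansion or multi-line values would need a more careful parser than this.
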
Some basic runtime configuration is set in the ``canesm.cfg`` file (including relevant paths for binaries, input data, outputs, etc.). This section assumes you have edited the ``canesm.cfg`` file appropriately, and run ``bin/config-canesm``.

Using the job script
********************

The easiest way to launch the job is to simply submit the provided ``batch_run_cedar`` script from within your run directory (which must be in ``~/scratch``)::

    cd ~/scratch/canesm-test-run

    # Assumes that canesm.cfg is here; the paths are configured appropriately; and the executables must
    # be pre-compiled and available in EXEC_STORAGE_DIR
    # If needed, copy this file and modify it for your case. Submit to the queue with `sbatch`.
    sbatch --account=XXX canesm/CCCma_tools/container/tools/batch_run_cedar

where ``XXX`` is your account or resource allocation. Outputs will appear in the location defined in ``canesm.cfg`` by the variable ``OUTPUT_DIR``. These are unpacked raw model history files on tiles.

Launching an interactive run
****************************

When running with the ``batch_run_cedar`` script, setting up the inputs and environment is taken care of for you. However, it is still possible to run interactively by starting an interactive batch session (with the appropriate resources), copying in the inputs/executables, starting the container, and then running the model.

.. warning::

   Running the model interactively is tested less frequently than batch mode. Guidance is provided for interested readers, but problems may occur due to ongoing changes to ``CanESM``; as such, the recommended method is to run the model in batch mode.

Before anything else, it is worth noting that in order for ``CanESM`` to run, the simulation length **must be set consistently for all components**. When run as part of the job script, this is automatically handled for the user according to the ``CURRENT_YEAR`` setting in ``canesm.cfg``. However, when running interactively, this must be handled by the user.
Details on the parameters that control this are discussed :ref:`below `, but it is also possible for users to mimic the behaviour of the job script by executing the following

.. code-block:: bash

    source canesm.cfg                                                   # get configuration settings
    source $CANESM_SRC_ROOT/CCCma_tools/tools/CanESM_shell_functions.sh # get helper functions
    update_agcm_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 agcm_timestep=900 \
                         namelist_file=config/namelists/modl.dat || :
    update_nemo_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 nemo_timestep=3600 \
                         namelist_file=config/namelists/namelist || :
    update_coupler_counters start_date=${CURRENT_YEAR}-01-01 stop_date=${CURRENT_YEAR}-12-31 runid=$RUNID \
                            namelist_file=config/namelists/nl_coupler_par || :

where the ``|| :`` after each ``update_*`` call stops your shell session from exiting if an error occurs. Once these commands are executed, the namelists will have the correct settings to run the current year. After this, an interactive run can be achieved via

.. code-block:: bash

    # Launch interactive session
    salloc --time=1:0:0 --constraint=cascade --nodes=1 --ntasks-per-node=48 --mem-per-cpu=1000 --account=XXX

    # Move to SLURM_TMPDIR to run
    cd $SLURM_TMPDIR

    # get high level config file and source the settings
    cp /path/to/run/dir/canesm.cfg .
    source canesm.cfg

    # Copy in the input files and executables
    cp $INPUT_DIR/* .
    cp $WRK_DIR/config/* .   # note that this will avoid capturing sub-directories
    cp $WRK_DIR/config/namelists/* .
    cp $EXEC_STORAGE_DIR/*.exe .
    chmod +x *.exe

    # Load runtime environment
    source runtime_environment

    # Load the correct modules
    module --force purge
    module unload intel openmpi
    module load StdEnv/2016.4 nixpkgs/16.09 gcc/7.3.0 mpich/3.2.1 netcdf-fortran/4.4.4
    module load singularity

    # Create a conf. file that allows us to run the model in multiple program multiple data mode.
    # SLURM does not support this very well. Note the specification of CPUs for each exe. The total
    # number cannot exceed the resource request.
    echo -e "0-15 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./canam.exe \n 16 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./cancpl.exe \n 17-41 singularity exec -B /localscratch ${CONTAINER_IMAGE} ./nemo.exe" > run.conf

    # Launch the run
    export OMP_NUM_THREADS=1
    time srun -n 42 -l --multi-prog run.conf

Notable Configurable Settings
*****************************

``CanESM`` is a complex system with a multitude of configurable options, ranging from physical/numerical parameters and input data fields, to MPI topology and infrastructure/sequencing settings. Due to this complexity, this *beta* guide cannot cover all the options comprehensively; nevertheless, some of the more important settings are discussed here.

Compile Time Setting: MPI size of CanAM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The number of MPI tasks to be used for CanAM **must currently be specified at compile time.** This value is set in ``config/cppdef_sizes.h`` using the variable ``_PAR_NNODE_A``. As discussed :ref:`here `, ``config-canesm`` (which is called as part of ``config-containerized-canesm``) generates ``config/cppdef_sizes.h`` from the settings contained in ``config/namelists/modl.dat``. If a user wishes to change this value, they can simply modify ``config/cppdef_sizes.h`` after running ``config-containerized-canesm`` and **before executing the compilation**.

.. note::

   This specifies the number of MPI tasks, but CanAM also uses OpenMP, so this is not the total number of cores. The total number of cores used is ``_PAR_NNODE_A x OMP_NUM_THREADS``, where ``OMP_NUM_THREADS`` is specified in the runtime script (see below). Also note that the number of MPI tasks specified must match the resources supplied at runtime.
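
To make this bookkeeping concrete, the sketch below (Python, illustrative only) lays out consecutive MPI rank ranges for the three executables and checks the total against the number of tasks requested from SLURM. The 16/1/25 split, the 48-task request, and ``OMP_NUM_THREADS=1`` are taken from the interactive-run example in this guide. The helper name ``rank_ranges`` is hypothetical, and whether 16 matches ``_PAR_NNODE_A`` for a given configuration should be checked in ``config/cppdef_sizes.h``.

.. code-block:: python

    def rank_ranges(task_counts):
        """Assign consecutive MPI rank ranges to executables, in order.

        task_counts maps an executable name to its number of MPI tasks.
        Returns (ranges, total), where ranges maps each name to a string
        such as "0-15" (or "16" for a single task).
        """
        ranges = {}
        first = 0
        for name, count in task_counts.items():
            if count < 1:
                raise ValueError(f"{name} needs at least one MPI task")
            last = first + count - 1
            ranges[name] = str(first) if count == 1 else f"{first}-{last}"
            first = last + 1
        return ranges, first


    # Split used in the interactive-run example of this guide.
    layout, total_tasks = rank_ranges(
        {"canam.exe": 16, "cancpl.exe": 1, "nemo.exe": 25})

    requested_tasks = 48   # --ntasks-per-node in the salloc example
    omp_num_threads = 1    # OMP_NUM_THREADS in the same example

    # Cores used by CanAM alone: MPI tasks times OpenMP threads.
    canam_cores = 16 * omp_num_threads

    if total_tasks > requested_tasks:
        raise SystemExit("more MPI tasks than were requested from SLURM")

    for name, ranks in layout.items():
        print(f"{ranks} ./{name}")

With these numbers the layout uses 42 of the 48 requested tasks, which is the value passed to ``srun -n`` in the example. Changing the CanAM count here only makes sense together with the matching change to ``_PAR_NNODE_A`` and a recompilation.
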

Compile Time Setting: Model/Experiment specific cpp macros for CanAM/CanCPL
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For the Atmosphere and Coupler components of ``CanESM`` (and the coupler interface for the Ocean), many source code files rely on a ``cppdef_config.h`` file to set many model/experiment specific ``cpp`` macros. As discussed :ref:`here `, this file is generated by ``config-canesm`` according to the values of ``cppdef_file`` and ``cppdef_diag``. As of now, these are set within the ``EXPERIMENT`` specific ``*.cfg`` file at ``path/to/canesm/CCCma_tools/container/tools/config``. If you would like to use another set of ``cpp`` definitions, the available files can be found in ``/path/to/canesm/CONFIG/[AMIP|ESM]/cppdefs``. Alternatively, you can point to your *own* ``cpp`` files, *or* simply alter the generated ``config/cppdef_config.h`` and submit another compilation job.

.. note::

   If a user wishes to modify any of the values in these ``cppdef_*`` files, the easiest way is to modify the values in the generated ``config/cppdef_*`` files **before executing the compilation**.

.. note::

   Within ECCC machines, cppdefs are typically specified by the ``runmode`` construct, and the sizes are computed and set by the compilation script. It might be helpful to look in ``canesm/CONFIG/ESM/canesm.cfg`` to determine the correct ``cppdef`` file to use for an experiment.

Compile Time Setting: Compilation Flags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For information on altering compilation flags, see the detailed information :ref:`here `, noting that the compilation environment file and the ``make``/``mkmf`` templates are extracted into the local ``config`` directory.

Run Time Setting: Simulation length
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Note that if you use the ``config-containerized-canesm`` and ``batch_run_cedar`` scripts, the timers will be configured automatically from ``START_YEAR`` and ``STOP_YEAR`` in ``canesm.cfg``.
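
As a rough illustration of what those timers amount to, the Python sketch below computes the CanAM step counters and the NEMO step count for a single model year. It assumes 365-day years with no leap days, a 900 s CanAM time-step and a 3600 s NEMO time-step, which are the CanESM5 values quoted in the rest of this section. The function names are hypothetical and are not part of the CanESM tools.

.. code-block:: python

    SECONDS_PER_DAY = 24 * 3600
    DAYS_PER_YEAR = 365  # assumption: no leap years


    def agcm_counters(year, delt=900):
        """Return (kstart, kfinal) for running the whole of `year` in CanAM.

        Counters are in steps since year 1, so a run of year N starts at
        the end of year N - 1.
        """
        steps_per_year = SECONDS_PER_DAY // delt * DAYS_PER_YEAR
        return steps_per_year * (year - 1), steps_per_year * year


    def nemo_steps(years=1, timestep=3600):
        """Return the number of NEMO steps needed to cover `years` years."""
        return SECONDS_PER_DAY // timestep * DAYS_PER_YEAR * years


    kstart, kfinal = agcm_counters(5550)
    print(kstart, kfinal)   # 194436960 194472000
    print(nemo_steps())     # 8760

    # Both components must cover the same simulated time.
    assert (kfinal - kstart) * 900 == nemo_steps() * 3600

The final assertion is the consistency requirement in miniature: 35040 CanAM steps of 900 s and 8760 NEMO steps of 3600 s both span one 365-day year.
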
The information below is provided to help users customize the simulation length and step size, if they so desire.

**The specified length of the simulation must match between CanNEMO and CanAM.**

- The number of CanNEMO steps is specified in the NEMO ``namelist`` file (typically extracted via ``config-containerized-canesm`` into ``config/namelists``). For CanESM5, the NEMO timestep is 1 hour, so the number of steps equals the run length in hours (e.g. 8760 steps for 1 year).
- In CanAM, the timers are configured by ``kstart``, ``kfinal``, and ``delt`` in the ``modl.dat`` file, which also gets extracted into ``config/namelists`` by the configuration tool. ``kstart`` and ``kfinal`` represent the start and final step counters *since year 1*, and ``delt`` represents the time-step in seconds. For example, with a time-step of ``900`` seconds, if you want the model to start from January 1st, 5550 and run to the end of that year, then

  .. code-block:: fortran

      kstart = (3600/900) * 24 * 365 * 5549  ! where 5549 is used because we start from the END of 5549

  and

  .. code-block:: fortran

      kfinal = (3600/900) * 24 * 365 * 5550

Note that shorter periods can also be run, but again, the number of steps must be consistent between NEMO and CanAM.

.. note::

   In CanESM5, CanAM takes 4 steps (15 min each) for each single NEMO timestep (1 hr each). Coupling occurs every three hours, which is 3 NEMO timesteps and 12 CanAM steps. Below we rely on the default values having been set correctly. If the timers in ``modl.dat`` and ``namelist`` are not set correctly, the run will not work.

Post-processing model data
##########################

Creating usable output from the raw model history files requires several steps (repacking, joining tiles, computing diagnostics, converting to timeseries, and finally conversion to CMOR-compliant NetCDF). Within ECCC, this is achieved by a diagnostic string, which comprises many jobs and relies on several pieces of software, including the CanAM diagnostics package (CanDIAG).
At this time, this processing pipeline has not been ported to Cedar, but work is ongoing. However, an interim series of diagnostics has been put in place to provide a basic level of usable NetCDF output from the model. In summary:

- ``canesm/CCCma_tools/container/tools/batch_diag_cedar`` is a batch script that will run these diagnostics. It is automatically launched by ``batch_run_cedar`` at the end of each year of simulation, but it can also be submitted by itself, if the output is available.

- ``canesm/CCCma_tools/container/tools/basic_diag`` is the underlying diagnostics script that will:

  - rebuild the NEMO NetCDF files from individual tiles to global files (via ``rebuild_nemo.exe``).
  - rebuild the CanAM CCCma binary formatted output from individual tiles to global files (via ``candiag_mwe.exe``).
  - compute monthly mean 2D variables output in NetCDF (via ``candiag_mwe.exe``).
  - use modified CCCma diagnostic recipes to compute monthly mean 3D temperature and wind fields (via ``temp_recipe`` and ``winds_recipe``).
  - convert the 3D CCCma binary files to NetCDF format, using the ``ccc2nc_linux`` utility (note, this is currently not compiled, and a container-compatible compiled version can temporarily be obtained from ``ftp://ftp.cccma.ec.gc.ca/pub/CCCMA/nswart/ccc2nc_linux`` and placed in the run ``bin`` directory).