Configuration Files
===================

.. contents::
   :local:
   :depth: 2

imsi
----

``imsi`` reads several machine-specific settings from the YAML files under ``canesm/CONFIG/imsi-config``. These include software modules, directory paths for inputs and outputs (e.g. forcing files, RTD files), compilers, and sequencing flows. The following files need to be added or modified in the source repository in order to set up, build, and run a simulation.

- ``machines/imsi-sites-config.yaml``: Add the name of the new machine under an existing site or create a new site for the new machine.
- ``machines/imsi-machine-config-<MACHINE>.yaml``: General machine-specific info not covered in another file.
    - Directory paths
    - Software modules (on DRAC systems)
    - Machine-specific resource info (e.g. CPUs per node, node names, etc.)
    - Supported compiler keys from ``imsi-compiler-config-XXX.yaml`` files
    - Environment variables
    - Supported sequencers
    - Scheduler name (e.g. slurm)
- ``compilers/imsi-compiler-config-gnu.yaml`` and/or ``compilers/imsi-compiler-config-intel.yaml``: Add a new key with machine-specific compiler settings:
    - Compiler names (e.g. ``mpif90``) 
    - FFLAGS and LDFLAGS passed to the compiler (e.g. ``-qmkl``).
- ``sequencing/imsi-sequencing-flow-<MACHINE>.yaml``: Machine-specific flow definitions for the desired model configurations (e.g. coupled model, AGCM).
    - Model run commands (e.g. ``mpirun`` with specific flags like ``bind-to``, ``map-by`` and ``hostfile``)
    - Scheduler directives (e.g. ``ntasks-per-node``)
- ``sequencing/imsi_sequencer_config_iss.yaml`` and/or ``sequencing/imsi_sequencer_config_maestro.yaml``: Add the new machine name under ``supported_machines``.
    - ``maestro`` is only used internally within ECCC. DRAC machines use the ``iss`` sequencer. 
    - There are plans to converge to a single open-source sequencer for all platforms.

Changing the PE layout
~~~~~~~~~~~~~~~~~~~~~~

.. note::
   CanAM (with the spectral_canesm dynamical core) is highly limited in the number of MPI tasks it can use. Changing the default of 32 MPI tasks is not recommended. Additionally, there is little (if anything) to be gained by using more than 2 OMP threads for CanAM.

Depending on the constraints of the machine you are porting to, you may need to alter the number of parallel processes used to run the model. For CanAM, this means altering the number of OMP threads, and for CanNEMO, this means altering the number of MPI tasks. These settings are specified in the component-specific files in ``imsi-config/models``. By default, the relevant settings are as follows:

``imsi-model-config-canam51_p1.yaml``

.. code-block:: yaml

    models:
      components:
        CanAM:
          resources:
            mpiprocs: 32
            ompthreads: 2

``imsi-model-config-cannemo51_p1-esm.yaml``

.. code-block:: yaml
 
    models:
      cannemo51_p1-esm:
        components:
          CanNEMO:
            resources:
              mpiprocs: 160
              ompthreads: 1
            namelists:
              nammpp:
                jpni: '16'
                jpnj: '10'
                jpnij: '160'

CanCPL is fully serial and runs on one core. The total number of cores required by the default PE layout is therefore 225 (32x2 + 160 + 1).

This PE layout has been found to work well on internal ECCC machines and on DRAC Niagara (no longer online), but it may not be optimal for all machines, in particular those that require jobs to use whole nodes and have a large number of cores per node. One such machine is DRAC Trillium, which has 192 cores per node. Its resources are used most efficiently by running CanESM5 on 1 node with fewer tasks (32 + 156 + 1 = 189). Retaining the default PE layout would require two nodes and leave many cores idle during each job. The following changes were applied to the model config files.

``imsi-model-config-canam51_p1.yaml``

.. code-block:: yaml

    models:
      components:
        CanAM:
          resources:
            mpiprocs: 32
            ompthreads: 1

``imsi-model-config-cannemo51_p1-esm.yaml``

.. code-block:: yaml

    models:
      cannemo51_p1-esm:
        components:
          CanNEMO:
            resources:
              mpiprocs: 156
              ompthreads: 1
            namelists:
              nammpp:
                jpni: '13'
                jpnj: '12'
                jpnij: '156'

On other machines, there may be a different optimal PE layout. Determining this requires some experimentation.
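When experimenting with candidate layouts, it can help to check the arithmetic before editing the config files. The following is a minimal sketch, not part of ``imsi``: the function name ``check_layout`` and its argument order are hypothetical, and the values are typed in by hand rather than read from the YAML files. It assumes, as in both layouts shown above, that ``jpnij`` equals ``jpni`` times ``jpnj`` and matches the CanNEMO ``mpiprocs``; other NEMO setups may differ.

.. code-block:: bash

    # Hypothetical helper: report the cores, whole nodes, and idle cores of a PE layout.
    # Arguments: CanAM mpiprocs, CanAM ompthreads, jpni, jpnj, jpnij, cores per node.
    check_layout() {
      local canam_mpi=$1 canam_omp=$2 jpni=$3 jpnj=$4 jpnij=$5 cores_per_node=$6
      if (( jpni * jpnj != jpnij )); then
        echo "inconsistent decomposition: ${jpni} x ${jpnj} != ${jpnij}" >&2
        return 1
      fi
      # CanAM uses mpiprocs x ompthreads cores, CanNEMO one core per MPI task,
      # and the serial CanCPL one core.
      local total=$(( canam_mpi * canam_omp + jpnij + 1 ))
      # Round up to whole nodes, as required on full-node machines.
      local nodes=$(( (total + cores_per_node - 1) / cores_per_node ))
      local idle=$(( nodes * cores_per_node - total ))
      echo "cores=${total} nodes=${nodes} idle=${idle}"
    }

    check_layout 32 2 16 10 160 192   # default layout on a 192-core node
    check_layout 32 1 13 12 156 192   # reduced layout used on Trillium

With 192 cores per node, the default layout needs two nodes and leaves 159 cores idle, while the reduced layout fits on one node with 3 cores idle. The check says nothing about throughput, which still has to be measured on the target machine.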

Changing the run command
~~~~~~~~~~~~~~~~~~~~~~~~

In most cases, CanESM is run on an HPC cluster via jobs managed by a scheduler like SLURM or PBS. The model code is executed using a call to ``mpirun``, which works together with the scheduler to spread the tasks and threads across the allocated resources. However, when running on multiple nodes, the scheduler does not always determine the optimal distribution of tasks. For a multi-node job, the ``--host`` or ``--hostfile`` arguments accepted by ``mpirun`` tell it how to spread the tasks across nodes. See the `mpirun FAQ <https://www.open-mpi.org/faq/?category=running#mpirun-hostfile>`_ for more information.

The run command is specified in ``imsi-sequencing-flow-<MACHINE>.yaml``. Different run commands can be specified for the coupled model (i.e. ``canesm_two_job_flow-<MACHINE>``) and for AGCM simulations (``canam_two_job_flow-<MACHINE>``), since they may use a different number of nodes. For example, this is the run command for the coupled CanESM5 on DRAC Niagara. It creates a temporary ``hostfile`` that tells ``mpirun`` which nodes (hosts) to use for CanAM and CanCPL, and assigns the remaining nodes to CanNEMO.

``imsi-sequencing-flow-niagara.yaml``:

.. code-block:: yaml

    sequencing:
      sequencing_flow:
        canesm_two_job_flow-niagara:
          jobs:
            model:
              resources:
                model_run_commands: |
                  export OMP_NUM_THREADS=2
                  hostfile_canam=$(mktemp)
                  host_arr=( $(scontrol show hostnames $SLURM_NODELIST) )
                  CPU_PER_NODE=40
                  N=\${#host_arr[@]}
                  N1=$(($N - (\${CanNEMO_MPIPROCS} - 1) / \${CPU_PER_NODE} - 1))
                  slots=$(((\${CanAM_MPIPROCS} + \${CanCPL_MPIPROCS} - 1) / \${N1} + 1))
                  host1=(\${host_arr[@]:0:\${N1}})
                  host2_str=\${host_arr[@]:\${N1}:\${N}}
                  for i in \${host1[@]}
                  do
                    echo "\${i} slots=\${slots} max_slots=\${CPU_PER_NODE}" >> \${hostfile_canam}
                  done
                  cat \${hostfile_canam}
                  mpirun --hostfile \${hostfile_canam} --map-by slot -x OMP_NUM_THREADS=\${CanAM_OMPTHREADS} -n \${CanAM_MPIPROCS} ./\${CanAM_EXEC} : \
                         --host \${host1[-1]} -x OMP_NUM_THREADS=\${CanCPL_OMPTHREADS} -n \${CanCPL_MPIPROCS} ./\${CanCPL_EXEC} : \
                         --host \${host2_str// /,} -x OMP_NUM_THREADS=\${CanNEMO_OMPTHREADS} -n \${CanNEMO_MPIPROCS} ./\${CanNEMO_EXEC}
                  rm \${hostfile_canam}

Other machines will likely require different run commands for best results. This is another area that requires some informed experimentation. Ideally, the run command should be designed to work independently of the number of nodes requested for the job.
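The node-splitting arithmetic in the Niagara run command can be tried out without SLURM or MPI. The sketch below repeats the same integer arithmetic, without the backslash escapes used in the YAML file, on a hypothetical list of six 40-core nodes standing in for the ``scontrol`` output. The node names and the node count are assumptions for illustration only.

.. code-block:: bash

    # Dry run of the node split: nothing is launched, the result is only printed.
    host_arr=(node01 node02 node03 node04 node05 node06)   # hypothetical host names
    CPU_PER_NODE=40
    CanAM_MPIPROCS=32
    CanCPL_MPIPROCS=1
    CanNEMO_MPIPROCS=160

    N=${#host_arr[@]}
    # Nodes left for CanAM and CanCPL after CanNEMO takes ceil(160 / 40) = 4 nodes.
    N1=$(( N - (CanNEMO_MPIPROCS - 1) / CPU_PER_NODE - 1 ))
    # Slots per node: the CanAM and CanCPL ranks divided over N1 nodes, rounded up.
    slots=$(( (CanAM_MPIPROCS + CanCPL_MPIPROCS - 1) / N1 + 1 ))
    host1=( "${host_arr[@]:0:${N1}}" )   # first N1 nodes: CanAM and CanCPL
    host2=( "${host_arr[@]:${N1}}" )     # remaining nodes: CanNEMO

    echo "CanAM/CanCPL nodes: ${host1[*]} (slots=${slots} each)"
    echo "CanNEMO nodes:      ${host2[*]}"

With these inputs, CanNEMO takes the last four nodes, and CanAM and CanCPL share the first two with 17 slots each. Changing the length of ``host_arr`` or ``CPU_PER_NODE`` shows how the split responds to a different allocation before any job is submitted.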

Postprocessing Sequencing Flow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On DRAC Niagara and the current SciNet machine DRAC Trillium, users may only request full nodes, so the postprocessing job wastes CPU resources. On the other DRAC machines (Fir, Nibi, and Rorqual), users may request partial nodes. To change the resource request for postprocessing, alter the SLURM directives for the postprocessing job in ``imsi-sequencing-flow-<MACHINE>.yaml`` to request ``--ntasks-per-node=1`` (because the postprocessing does not use MPI) and ``--cpus-per-task=12`` (because the postprocessing scripts work on data for each month in parallel).

Node-local scratch
~~~~~~~~~~~~~~~~~~~~~~

The DRAC machines Rorqual, Fir, and Nibi have SSDs mounted directly on the compute nodes. Doing I/O from this ``localscratch`` space is allegedly faster than from the main "network" scratch file system. To do I/O from this node-mounted temporary storage, set the following key/value pair in ``imsi-machine-config-<MACHINE>.yaml``:

.. code-block:: yaml

   machines:
     <MACHINE NAME>:
       scratch_dir: \${SLURM_TMPDIR}

To monitor the job's file I/O, you can use ``squeue --me`` to find the name of the node running your job, and ``ssh`` to that node. The working directory will be under the directory ``/localscratch/``.

config-canesm system
--------------------

The ``config-canesm`` system will need:

- a new directory under ``CONFIG/PLATFORM`` (see the ``eccc-u2`` directory for what files are needed)
- a new "site profile" under ``CCCma_tools/generic``
- consideration in the ``raw_env_setup_file``, under ``CCCma_tools/generic``, to detect/support the new hostname
- a new host specific env file under ``CCCma_tools/generic``, which gets sourced by the processed ``env_setup_file``
- a new ``fcm`` compiler file for ``CanNEMO`` (I believe under the ``ARCH`` directory, but Nicolas/Neil can confirm)