Input Files
===========

With all the software dependencies met, the final step to having a running modelling system is
to port the input files and build the database that tracks them.

.. contents::
   :local:
   :depth: 2

Input File Directory Tree
-------------------------

Due to the wide range of experiments that are run with ``CanESM``, along with its multiple
modelling realms (atmosphere, ocean, land, etc.), there are many potential input files for
the model, ranging from reference restarts to forcing and geophysical input files. Fortunately,
porting these files is quite easy, although the exact method may depend on the specific platform
you are porting to.

.. note::

   During initial testing, it's okay to put this directory tree in user space (assuming
   you have the storage available). However, **once other users need to use it, you will
   need to move it to a group location that is readable by others**.
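
As a minimal sketch of that last point, assuming a POSIX ``chmod`` and that the tree already
sits in a group-owned location, group read access could be opened up as follows. The
``make_group_readable`` helper and the path are hypothetical; adapt them to your platform.

.. code-block:: bash

   # Hypothetical helper: grant the group read access to everything under a
   # directory. The capital X adds execute (traverse) permission only to
   # directories and to files that are already executable.
   make_group_readable() {
       chmod -R g+rX "$1"
   }

   # Example call (placeholder path; uncomment once the tree is in place):
   # make_group_readable /path/to/new/input_file/tree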

.. warning::

   Be sure to check the documentation for the platform you are using, or contact the sys-admins
   to determine the preferred way to transfer large(r) files. Some systems discourage the direct
   use of ``scp`` or ``rsync`` to/from headnodes, and many have specific data transfer guidance.

From Niagara
~~~~~~~~~~~~

If starting from ``niagara``, the input file directory tree is located at

.. code-block::

   /project/c/cp4c/cp4c/forcing_data

Simply bring this over to the new platform to get going.
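
If the platforms involved permit ``rsync`` (see the warning above), the transfer might look
like the sketch below. The source login and the destination path are placeholders, not real
endpoints, and the command is only printed so that it can be reviewed before being run.

.. code-block:: bash

   # Placeholders: replace with your own login, host, and destination.
   src_host="user@source.example"
   src_path="/project/c/cp4c/cp4c/forcing_data"
   dest_path="/path/to/new/input_file/tree"

   # -a preserves permissions, times, and symlinks; --partial keeps partially
   # transferred files so an interrupted copy can resume.
   rsync_cmd=(rsync -a --partial "${src_host}:${src_path}/" "${dest_path}/")

   # Print the command for review; run it with "${rsync_cmd[@]}" when ready.
   printf '%s\n' "${rsync_cmd[*]}"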

From ECCC's U2 Platform
~~~~~~~~~~~~~~~~~~~~~~~

If starting from ECCC's ``u2`` platform, the input file directory tree is located at

.. code-block::

   /space/hall5/sitestore/eccc/crd/ccrn/forcing

or

.. code-block::

   /space/hall6/sitestore/eccc/crd/ccrn/forcing

Simply bring one of these over to the new platform to get going.

Input File Database
-------------------

Why a database?
~~~~~~~~~~~~~~~

A clever observer with access to both ECCC's ``u2`` platform and DRAC's
``niagara`` might notice that the two platforms have notably different
directory structures. This is possible because the ``CanESM`` input file system
currently relies on a ``sqlite`` database to track the location of files. For
example, on a platform where the forcing database is stored at
.. code-block::

   /path/to/forcing/file/database.db

a user could retrieve a file into their local working directory via

.. code-block::

   access local_name <filename>

provided their environment has:

* ``DATAPATH_DB`` exported, pointing to ``/path/to/forcing/file/database.db``
* ``access`` (and supporting tools) on their ``PATH``

(see below on setting up this environment)
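
As a rough sketch, a shell session meeting both requirements might be prepared like this.
The ``tools_dir`` variable and both paths are assumptions for illustration; the real
location of ``access`` depends on where the ``CanESM`` repo was cloned.

.. code-block:: bash

   # Assumed locations; substitute the real paths on your platform.
   export DATAPATH_DB=/path/to/forcing/file/database.db
   tools_dir=/path/to/directory/containing/access   # hypothetical

   # Append the tools directory to PATH only if it is not already there.
   case ":${PATH}:" in
       *":${tools_dir}:"*) ;;
       *) export PATH="${PATH}:${tools_dir}" ;;
   esac

   # Report anything that would stop access from working.
   [ -f "${DATAPATH_DB}" ] || echo "warning: ${DATAPATH_DB} not found" >&2
   command -v access >/dev/null 2>&1 || echo "warning: access is not on PATH" >&2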

Creating a new database
~~~~~~~~~~~~~~~~~~~~~~~

Instead of simply copying an existing database from another platform, **you will need to
BUILD a new database** after copying the input file directory tree onto the new platform. 
See below for how to do this:

1. **Transfer the input file tree to the platform** - let's call the location ``/path/to/new/input_file/tree``
2. **Get the** ``CanESM`` **source repo** - `link <https://gitlab.com/cccma/canesm>`_

    .. note::

        The location/version of this repo doesn't matter too much, but the version needs to have
        the following tools within the ``CanESM`` repo:
        
            * ``save``
            * ``fdb``

        as they will be used as helper tools to build the database. For ``CP4C``, a safe bet is
        the ``v5.1_cp4c`` branch.

3. **Find the required scripts in the** ``CanESM`` **repo and add their locations to** ``PATH``

    These tools are likely under ``/path/to/cloned/CanESM/CCCma_tools/scripts/subproc`` and
    ``/path/to/cloned/CanESM/CCCma_tools/scripts/comm``, but if you can't find them, you can
    use ``find /path/to/cloned/CanESM -name "<TOOL_NAME>"`` to locate them. Once you
    have the *absolute* paths of the containing directories, add them to ``PATH`` with:

    .. code-block:: bash

       export PATH=${PATH}:${loc1} # where loc1 is the absolute path to the directory containing save or access
       export PATH=${PATH}:${loc2} #    and likewise for loc2
    
4. **Tell the scripts where the input directory tree is**:

    This is done via the environment variable ``RUNPATH_ROOT``, so simply execute:

    .. code-block:: bash
       
       export RUNPATH_ROOT=/path/to/new/input_file/tree

5. **Define the path to the new desired database**

    This is done via the environment variable ``DATAPATH_DB``, so simply execute:

    .. code-block:: bash

       export DATAPATH_DB=/path/to/desired/database.db

6. **Build the database!**

    This is done using the ``fdb`` command, which calls ``save`` under the hood and
    uses the environment variables set in the previous steps.
    Simply execute:

    .. code-block:: bash

       fdb update
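
Taken together, steps 3 to 6 can be sketched as a single script. This is an illustration
rather than an official tool: every path is a placeholder taken from the steps above, the
tool directories are only the likely locations mentioned in step 3, and the final check
assumes the ``sqlite3`` command-line shell is installed.

.. code-block:: bash

   #!/usr/bin/env bash
   # Placeholder paths from the steps above; replace with real locations.
   canesm_repo=/path/to/cloned/CanESM
   export RUNPATH_ROOT=/path/to/new/input_file/tree
   export DATAPATH_DB=/path/to/desired/database.db

   # Step 3: add the assumed tool directories to PATH.
   for d in "${canesm_repo}/CCCma_tools/scripts/subproc" \
            "${canesm_repo}/CCCma_tools/scripts/comm"; do
       export PATH="${PATH}:${d}"
   done

   # Confirm the helper tools are reachable before attempting the build.
   missing=0
   for tool in save fdb; do
       if ! command -v "${tool}" >/dev/null 2>&1; then
           echo "missing tool: ${tool}" >&2
           missing=1
       fi
   done

   # Step 6: build the database, then list its tables as a quick sanity check.
   if [ "${missing}" -eq 0 ]; then
       fdb update
       if command -v sqlite3 >/dev/null 2>&1; then
           sqlite3 "${DATAPATH_DB}" ".tables"
       fi
   fi

If a tool is reported missing, revisit step 3 before running ``fdb update`` by hand.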