Instructions for Using mksurfdata_esmf to Create Surface Datasets
Purpose
This tool is intended to generate fsurdat files (surface datasets) for the CTSM. It can generate global, regional, and single-point fsurdat files, as long as a mesh file is available for the grid.
The subset_data tool allows users to make fsurdat files from existing fsurdat files when a mesh file is unavailable. Generally, users should consider the subset_data tool for generating regional and single-point fsurdat files.
Building
Build Requirements
mksurfdata_esmf is a distributed-memory parallel program (using the Message Passing Interface, MPI) that uses ESMF (Earth System Modeling Framework) for regridding and PIO (Parallel I/O) with NetCDF for output. As such, the following libraries must be built:
- MPI
- NetCDF
- PIO
- ESMF
In addition, the build requires python, a bash shell, CMake, and GNU Make.
These libraries must be built so that they can all work together in the same executable. Hence, they may need to be built in the order listed above.
The required CTSM externals are cime and ccs_config; see the Building section for how to get them. A python environment that includes particular packages is also required; we demonstrate how to use the ctsm_pylib environment that we support in CTSM.
Note, PNETCDF is an optional library that can be used, but is NOT required.
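Before building, it can help to confirm that the build tools listed above are on your PATH. The following is a minimal sketch and not part of the tool itself: the helper name check_tools is hypothetical, and the command names (python3, cmake, make) are assumptions that may differ on your machine (for example, python instead of python3, or gmake instead of make).

```shell
# Hypothetical helper (not part of CTSM): report which commands are missing.
# Returns 0 if every command was found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Command names are assumptions; adjust them for your machine.
check_tools python3 bash cmake make || echo "Install or load the missing tools before building."
```

This only checks executables. MPI, NetCDF, PIO, and ESMF are libraries and are not covered by this check.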
Use cime to manage the build requirements
See the section IMPORTANT NOTE: ONLY WORKING ON DERECHO CURRENTLY below.
On machines that have been ported to cime, you can use the build script to build the tool. On other machines, you will first need to port cime to that machine and tell it how to build there, as described in the cime documentation. You will also have to make some modifications to the build script.
https://github.com/ESMCI/cime/wiki/Porting-Overview
Machines that already run CTSM or CESM have been ported to cime. So if you can run the model on your machine, you will be able to build the tool there.
To get a list of the machines that have been ported to cime:
# Assuming pwd is the tools/mksurfdata_esmf directory
cd ../../cime/scripts # or ../../../../cime/scripts for a CESM checkout
./query_config --machines
NOTE:
In addition to having a port to cime, the machine also needs PIO built and referenced through the environment variable PIO, which needs to be set as part of the port for that machine. An independent PIO library is available on supported CESM machines.
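A quick sanity check on the PIO variable might look like the sketch below. It assumes that PIO holds the path of a directory (the PIO installation), which this document does not state explicitly; the helper name check_pio_dir is hypothetical.

```shell
# Hypothetical helper: succeed only if the given path is a non-empty,
# existing directory. Assumes PIO is meant to hold a directory path.
check_pio_dir() {
    if [ -z "$1" ]; then
        echo "PIO is not set"
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "PIO is set to '$1', which is not a directory"
        return 1
    fi
    echo "PIO points to $1"
}

check_pio_dir "${PIO:-}" || echo "Set PIO in the machine port before building."
```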
IMPORTANT NOTE: ONLY WORKING ON DERECHO CURRENTLY
Important
Currently, we have only run and tested mksurfdata_esmf on Derecho. Please see this GitHub issue about mksurfdata_esmf on other CESM machines:
https://github.com/ESCOMP/CTSM/issues/2341
Building the executable
Before starting, be sure that you have run the following from the top level of the CTSM/CESM checkout:
# Assuming pwd is the top level of the CTSM/CESM checkout
./manage_externals/checkout_externals
This brings in CIME and ccs_config, which are required for building. Then build the executable:
# Assuming pwd is the tools/mksurfdata_esmf directory
./gen_mksurfdata_build # For machines with a cime build
Note: The pio_iotype value gets set and written to a simple .txt file by this build script. The value depends on your machine. If not running on derecho, casper, or izumi, you may need to update this, though a default value does get set for other machines.
Running for a single submission
Setup ctsm_pylib
Work in the ctsm_pylib environment. On Derecho, setting it up requires the following steps. On other machines the steps will be similar, but how you get conda into your path and activate the ctsm_pylib environment may differ.
# Assuming pwd is the tools/mksurfdata_esmf directory
module load conda
cd ../.. # or ../../../.. for a CESM checkout
./py_env_create # Assuming at the top level of the CTSM/CESM checkout
conda activate ctsm_pylib
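To confirm that the environment is active before running the tools, one option is to look at CONDA_DEFAULT_ENV, which conda sets when an environment is activated. This is a small sketch; the helper name conda_env_is_active is hypothetical.

```shell
# Hypothetical helper: succeed if the named conda environment is the active one.
# CONDA_DEFAULT_ENV is set by conda when an environment is activated.
conda_env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if conda_env_is_active ctsm_pylib; then
    echo "ctsm_pylib is active"
else
    echo "ctsm_pylib is not active; run: conda activate ctsm_pylib"
fi
```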
To generate your target namelist, first check the available options:
# Assuming pwd is the tools/mksurfdata_esmf directory
./gen_mksurfdata_namelist --help
For example, try --res 1.9x2.5 --start-year 1850 --end-year 1850:
# Assuming pwd is the tools/mksurfdata_esmf directory
./gen_mksurfdata_namelist --res <resolution> --start-year <year1> --end-year <year2>
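Filling in the placeholders with the example values suggested above gives a concrete invocation. The sketch below only prints the command so that it can be inspected first; the helper name namelist_cmd is hypothetical, and only the three flags shown in this document are used.

```shell
# Hypothetical helper: print the gen_mksurfdata_namelist command for a
# resolution, start year, and end year, without running it.
namelist_cmd() {
    printf './gen_mksurfdata_namelist --res %s --start-year %s --end-year %s\n' "$1" "$2" "$3"
}

# Example values taken from the text above.
namelist_cmd 1.9x2.5 1850 1850

# To actually run it from the tools/mksurfdata_esmf directory:
# ./gen_mksurfdata_namelist --res 1.9x2.5 --start-year 1850 --end-year 1850
```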
Tip
If files are missing from /inputdata, a target namelist will be generated, but with a generic name and with a warning to run ./download_input_data next. If a smaller set of files is still missing after running ./download_input_data and rerunning ./gen_mksurfdata_namelist, then rerun ./gen_mksurfdata_namelist with the options you need and rerun ./download_input_data until ./gen_mksurfdata_namelist finds all files.
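The loop described in the tip above can be sketched as a bounded retry. This is an illustrative sketch only: retry_until is a hypothetical helper, and how to detect that all files were found is not specified in this document, so the check is left as a function you would write yourself (for example, by inspecting the output of ./gen_mksurfdata_namelist).

```shell
# Hypothetical helper: retry_until MAX CHECK FIX
# Runs CHECK; while it fails, runs FIX and checks again.
# FIX runs at most MAX times. Returns the status of the last CHECK.
retry_until() {
    _ru_max="$1"; _ru_check="$2"; _ru_fix="$3"
    _ru_i=1
    while [ "$_ru_i" -le "$_ru_max" ]; do
        if "$_ru_check"; then
            return 0
        fi
        "$_ru_fix"
        _ru_i=$((_ru_i + 1))
    done
    "$_ru_check"   # final status after the last FIX
}

# Hypothetical wiring (both function bodies are assumptions):
# all_inputs_found() { ...your own test that no input files are missing...; }
# fetch_and_regen()  { ./download_input_data && ./gen_mksurfdata_namelist --res 1.9x2.5 --start-year 1850 --end-year 1850; }
# retry_until 3 all_inputs_found fetch_and_regen
```

Capping the number of rounds avoids looping forever if a file can never be downloaded.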
For example, to generate and submit your target jobscript (again, use --help for instructions):
# Assuming pwd is the tools/mksurfdata_esmf directory
./gen_mksurfdata_jobscript_single --number-of-nodes 2 --tasks-per-node 128 --namelist-file target.namelist
qsub mksurfdata_jobscript_single.sh
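The two commands above can be chained so that the job is submitted only when the namelist exists and jobscript generation succeeds. This is a sketch under assumptions: submit_single is a hypothetical wrapper, and it assumes that gen_mksurfdata_jobscript_single exits with a nonzero status on failure.

```shell
# Hypothetical wrapper: generate and submit the job only if the namelist exists.
submit_single() {
    namelist="$1"
    if [ ! -f "$namelist" ]; then
        echo "Namelist file not found: $namelist"
        return 1
    fi
    # Node and task counts are the example values from this document.
    # Assumes the generator returns nonzero on failure, so qsub is skipped.
    ./gen_mksurfdata_jobscript_single --number-of-nodes 2 --tasks-per-node 128 \
        --namelist-file "$namelist" &&
        qsub mksurfdata_jobscript_single.sh
}

# Usage, from the tools/mksurfdata_esmf directory:
# submit_single target.namelist
```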
Read the note about regional grids at the end of this document.
Running for the generation of multiple datasets
Work in the ctsm_pylib environment, as explained in the Setup ctsm_pylib section above.
gen_mksurfdata_jobscript_multi runs ./gen_mksurfdata_namelist for you:
# Assuming pwd is the tools/mksurfdata_esmf directory
./gen_mksurfdata_jobscript_multi --number-of-nodes 2 --scenario global-present
qsub mksurfdata_jobscript_multi.sh
If you want to generate all (or a large number) of the datasets, or the single-point (1x1) datasets, you are best off using the Makefile. For example:
# Assuming pwd is the tools/mksurfdata_esmf directory
make all # ...or
make all-subset
NOTES
Guidelines for input datasets to mksurfdata_esmf
Tip
ALL raw dataset *.nc files MUST NOT be NetCDF4.
Example to convert to CDF5:
nccopy -k cdf5 oldfile newfile
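To find out whether a file needs converting, ncdump -k prints the format kind of a NetCDF file (for example classic, cdf5, or netCDF-4). The sketch below converts only when the kind is a NetCDF4 variant; the helper names is_netcdf4_kind and convert_if_netcdf4 are hypothetical.

```shell
# Hypothetical helper: succeed if the kind string reported by `ncdump -k`
# is a NetCDF4 variant ("netCDF-4" or "netCDF-4 classic model").
is_netcdf4_kind() {
    case "$1" in
        netCDF-4*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: convert INFILE to CDF5 as OUTFILE only if needed.
convert_if_netcdf4() {
    infile="$1"; outfile="$2"
    kind="$(ncdump -k "$infile")" || return 1
    if is_netcdf4_kind "$kind"; then
        nccopy -k cdf5 "$infile" "$outfile"
    else
        echo "$infile is already $kind; no conversion needed"
    fi
}

# Usage:
# convert_if_netcdf4 oldfile newfile
```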
Tip
The LAI raw dataset *.nc file MUST HAVE an "unlimited" time dimension.
Example to change time to an unlimited dimension using the NCO operator ncks:
ncks --mk_rec_dmn time file_with_time_equals_12.nc -o file_with_time_unlimited.nc
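One way to confirm the result is to look at the file header, where ncdump -h lists an unlimited dimension as UNLIMITED. The sketch below reads header text on standard input; the helper name has_unlimited_time is hypothetical, and it assumes the dimension is named time, as in the example above.

```shell
# Hypothetical helper: read `ncdump -h` output on stdin and succeed if the
# "time" dimension is declared as UNLIMITED.
has_unlimited_time() {
    grep -Eq '^[[:space:]]*time = UNLIMITED'
}

# Usage:
# ncdump -h file_with_time_unlimited.nc | has_unlimited_time && echo "time is unlimited"
```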
IMPORTANT: THERE HAVE BEEN PROBLEMS with REGIONAL grids!
Caution
See
In general, we recommend using subset_data and/or fsurdat_modifier for regional grids.