How to organize code and data for simulations at NERSC

I recently improved my strategy for organizing code and data for simulations run at NERSC, I'll write it here for reference.

Libraries

I mostly use Python (often with C/C++ extensions), so I first rely on the Anaconda module maintained by NERSC, currently python/3.6-anaconda-4.4.

If I need to add many more packages I can create a conda environment, but for just installing 1 or 2 packages I prefer to just add them to my PYTHONPATH.

I have core libraries that I rely on and often modify to run my simulations, those should be installed on Global Common Software: /global/common/software/projectname which is specifically designed to access small files like Python packages. I generally create a subfolder and reference it with an environment variable:

 export PREFIX=/global/common/software/projectname/zonca/python_prefix

Then I create a env.sh script in the source folder of the package (in Global Home) that loads the environment:

module load python/3.6-anaconda-4.4
export PREFIX=/global/common/software/projectname/zonca/python_prefix
export PATH=$PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
export PYTHONPATH=$PREFIX/lib/python3.6/site-packages:$PYTHONPATH

This environment is automatically propagated to the computing nodes when I submit a SLURM script, therefore I do not add any of these environment details to my SLURM scripts.

Then I can install a package there with:

python setup.py install --prefix=$PREFIX

or from pip:

pip install apackage --prefix=$PREFIX

It is also common to install a newer version of a package which is already provided by the base environment:

pip install apackage --ignore-installed --upgrade --no-deps --prefix=$PREFIX

Simulations SLURM scripts and configuration files

I first create a repository on Github for my simulations and clone it to my home folder at NERSC. I generally create a repository for each experiment, then I create a subfolder for each type of simulation I am working on.

Inside a folder I create parameters files to configure my run and slurm scripts to launch the simulations and put everything under version control immediately, I often create a Pull Request on Github and ask my collaborators to cross-check the configuration before a submit a run.

Smaller input data files, even binaries, can be added for convenience to the Github repository.

Once a run has been validated, inside the simulation type folder I createa a subfolder runs/201806_details_about_run and add a README.md, this will include all the details about the simulation. I also tag both the core library I depend on and the simulation repository with the same name e.g.:

git tag -a 201806_details_about_run -m "software version used for 201806_details_about_run"

I'll also add the path at NERSC of the input data and output results.

Then for future simulations I'll keep modifying the SLURM scripts and parameter files but always have a reference to each previous version.

Larger input data and output data

Larger input data and outputs are not suitable for version control and should live in a SCRATCH filesystem. I always use the Global Scratch $CSCRATCH which is available both on Edison on Cori and also from the Jupyter Notebook environment at: https://jupyter.nersc.gov.

I create a root folder for the project at:

$CSCRATCH/projectname

Then a subfolder for each simulation type:

$CSCRATCH/projectname/simulation_type_1
$CSCRATCH/projectname/simulation_type_2

Then I symlink those inside the simulation repository as the folder out/:

cd $HOME/projectname/simulation_type_1
ln -s $CSCRATCH/projectname/simulation_type_1 out

Therefore I can setup my simulation software to save all results inside out/201806_details_about_run and this is going to be written to CSCRATCH.

This setup makes it very convenient to regularly backup everything to tape using cput which just backs up files that are not already on tape, e.g.:

cd $CSCRATCH
hsi
cput -R projectname

This is going to synchronize the backup on tape with the latest results on CSCRATCH.

I do the same for input files:

mkdir $CSCRATCH/projectname/input_simulation_type_1
cd $HOME/projectname/simulation_type_1
ln -s $CSCRATCH/projectname/input_simulation_type_1 input