Modelblocks data module template

A template for modular data workflows built with Snakemake. This template is part of the Modelblocks toolset.

Note

Looking for general information on Modelblocks and modular workflows?

Features

  • Stable Snakemake development using pixi's lockfile and conda-pinning functionality, with the following environments:
    • default: the development environment, including Snakemake and conda as dependencies. This is never delivered to module users!
    • module: the environment used by rules in the Snakemake workflow. It should only contain minimal dependencies needed by your module's processing steps.

Important

All software dependencies should be defined in pixi.toml. Before running your module for the first time, use the export-snakemake-env pixi command to export the required Snakemake environments to conda-compatible dependency files. This step is necessary as long as Snakemake does not directly support pixi. Export at least the module environment, plus any additional environments you have created for your rules. See the commands section for more information.

  • Standardised input-output structure across modules:
    • resources/: files needed for the module's processes.
      • user/: files that should be provided by users. Document them well!
      • automatic/: files that the module downloads or prepares in intermediate steps.
    • results/: files generated by the module's algorithms that are relevant to the user.
  • Preconfigured integration setup for your module.
    • Continuous Integration (CI) settings, ready for pre-commit.ci.
    • Contributor recognition via All Contributors.
    • GitHub Actions to automate chores during pull requests and releases.
  • Fully compliant with the Snakemake workflow catalogue listing requirements, so modules can be included automatically once published. Read more about those requirements in the Snakemake workflow catalogue documentation.
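As a rough illustration of the standardised input-output structure listed above, the sketch below recreates the folder layout in a throwaway directory. The helper name `make_module_layout` is hypothetical and only serves the illustration; a project generated from the template is expected to manage these folders itself.

```shell
set -eu

# Hypothetical helper: create the standard module folders under a root directory.
make_module_layout() {
    local root="$1"
    mkdir -p "$root/resources/user" "$root/resources/automatic" "$root/results"
}

# Demonstrate in a temporary directory so no real project is touched.
demo_root="$(mktemp -d)"
make_module_layout "$demo_root"

# List the folders that were created.
find "$demo_root" -mindepth 1 -type d | sort
```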

Important

Keep these points in mind.

  • Modules do not work like regular Snakemake workflows.
    • They must be tested externally using Snakemake's module: statement, passing user resources and requesting specific results. Check the pre-made example in tests/integration for details.
    • Internal access (e.g., calling the all: rule) is discouraged, as the module may not have the necessary resources/ to execute properly.
  • Please maintain the following files to ensure Modelblocks compatibility:
    • INTERFACE.yaml: a simple description of the module's input/output structure.
    • config/config.yaml: a basic functioning example of how to configure this module.
    • workflow/internal/config.schema.yaml: the module's configuration schema, used by Snakemake for validation.
    • AUTHORS / CITATION.cff / LICENSE: licensing and attribution of this module's code and methods.
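One way to keep an eye on the files listed above is a small check like the following. It is only a sketch: `check_module_files` is a hypothetical helper, not part of the template, and it assumes that AUTHORS, CITATION.cff and LICENSE are all expected at the repository root.

```shell
set -eu

# Hypothetical helper: print each missing file and return non-zero if any is absent.
# Assumption: every file below is expected relative to the module's root directory.
check_module_files() {
    local root="${1:-.}"
    local missing=0
    local f
    for f in \
        INTERFACE.yaml \
        config/config.yaml \
        workflow/internal/config.schema.yaml \
        AUTHORS \
        CITATION.cff \
        LICENSE
    do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

From the module's root directory, `check_module_files . && echo "all files present"` reports whether anything is missing.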

How to use this template

This template uses pixi as its package manager. Once installed, do the following:

  1. Install the templater tool copier.

    pixi global install copier
  2. Use copier to build a project with this template. A new module will be created in the directory you chose. We recommend you use the module name as the directory name.

    copier copy https://github.com/modelblocks-org/data-module-template.git ./<module_name>

Tip

If copier is not available in your terminal, you may need to update your PATH variable to include ~/.pixi/bin.

  3. Answer a few questions so we can pre-fill licensing, citation files, etc.

  4. Initialise the pixi project environment of your new module.

    cd ./<module_name>  # navigate to the new project
    pixi install --all  # install the project's environments
    pixi run export-snakemake-env module  # initialise the Snakemake environment
  5. Register your project in pre-commit.ci and allcontributors.org to benefit from CI and contributor task automation.

  6. Extra: run the auto-generated example module!

    cd tests/integration  # go to the integration test...
    pixi run snakemake --use-conda  # run it!
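If copier is not found after installing it (see the tip on PATH above), something like the following should make it visible in the current shell session. This is a minimal sketch that assumes pixi placed its global binaries in ~/.pixi/bin, as the tip states.

```shell
# Add ~/.pixi/bin to PATH for this session, unless it is already there.
pixi_bin="$HOME/.pixi/bin"
case ":$PATH:" in
    *":$pixi_bin:"*) ;;                    # already on PATH, nothing to do
    *) export PATH="$pixi_bin:$PATH" ;;    # prepend so pixi-installed tools are found
esac
```

To make the change permanent, the same lines can go into your shell's startup file (for example ~/.bashrc).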

pixi task commands

pixi run export-snakemake-env <ENVIRONMENT>

Export <ENVIRONMENT> to conda-compatible dependency files, saved in workflow/envs, allowing Snakemake to use them during rule execution. This will generate both an <ENVIRONMENT>.yaml file and platform-specific pin files for Windows, Linux and macOS (e.g., <ENVIRONMENT>.win-64.pin.txt).
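To check whether an environment has already been exported, a helper along these lines may be useful. It is a sketch only: `has_exported_env` is a hypothetical name, and the pin-file pattern is inferred from the `<ENVIRONMENT>.win-64.pin.txt` example above rather than taken from the template's code.

```shell
set -eu

# Hypothetical helper: succeed only if the exported files for an environment exist.
# Usage: has_exported_env <ENVIRONMENT> [envs_dir]   (envs_dir defaults to workflow/envs)
has_exported_env() {
    local env_name="$1"
    local envs_dir="${2:-workflow/envs}"
    local f

    # The conda-compatible environment file must exist.
    [ -f "$envs_dir/$env_name.yaml" ] || return 1

    # At least one platform-specific pin file must exist as well.
    # Assumption: pin files are named <ENVIRONMENT>.<platform>.pin.txt.
    for f in "$envs_dir/$env_name".*.pin.txt; do
        if [ -f "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `has_exported_env module || pixi run export-snakemake-env module` would run the export only when the files are missing.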

pixi run test-integration

Run a minimal set of standardised tests to ensure your module complies with Modelblocks requirements. These are executed by GitHub's CI during pull requests.

Contributors ✨

Thanks go to these wonderful people, sorted alphabetically (emoji key):

  • Bryn Pickering: 💻 🤔 👀
  • Ivan Ruiz Manuel: 💻 🤔 📖
  • Jann Launer: 🤔 📓
  • Stefan Pfenninger-Lee: 📖 💻 🤔

This project follows the all-contributors specification. Contributions of any kind welcome!

About

A template for modular data workflows, making energy systems analysis more understandable and transparent!
