A template for modular data workflows built with Snakemake. This template is part of the Modelblocks toolset.
Note
Looking for general information on Modelblocks and modular workflows?
- Visit the Modelblocks official website and documentation.
- Read about Snakemake modularisation in the Snakemake documentation.
- Stable Snakemake development using pixi's lockfile and conda-pinning functionality, with the following environments:
  - default: the development environment, including Snakemake and conda as dependencies. This is never delivered to module users!
  - module: the environment used by rules in the Snakemake workflow. It should contain only the minimal dependencies needed by your module's processing steps.
Important
All software dependencies should be defined in pixi.toml.
Before running your module for the first time, use the export-snakemake-env pixi command to export the required Snakemake environments to conda-compatible dependency files. This is necessary as long as Snakemake does not directly support the use of pixi.
The export must include at least the module environment, as well as any additional environments you create for use in Snakemake rules.
See the commands section for more information.
- Standardised input-output structure across modules:
  - resources/: files needed for the module's processes.
    - user/: files that should be provided by users. Document them well!
    - automatic/: files that the module downloads or prepares in intermediate steps.
  - results/: files generated by the module's algorithms that are relevant to the user.
- Preconfigured integration setup for your module.
- Continuous Integration (CI) settings, ready for pre-commit.ci.
- Contributor recognition via All Contributors.
- GitHub Actions to automate chores during pull requests and releases.
- Fully compliant with the Snakemake workflow catalogue listing requirements, so modules can be included automatically once published. Read more about those requirements in the catalogue's documentation.
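The standardised input-output structure listed above can be sketched as a directory layout. This is only an illustration: `module_root` is a hypothetical scratch location, and a module generated from the template is expected to manage these directories itself.

```shell
# Create the standard module layout in a scratch directory.
# "module_root" is a hypothetical location used only for this illustration.
module_root="$(mktemp -d)"
mkdir -p "$module_root/resources/user" \
         "$module_root/resources/automatic" \
         "$module_root/results"

# List the directories that were created, relative to the module root.
(cd "$module_root" && find . -mindepth 1 -type d | sort)
```

Keeping user-provided files under resources/user/ and module-generated files under resources/automatic/ makes it clear which inputs need to be documented for users.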
Important
Keep these points in mind.
- Modules do not work like regular Snakemake workflows.
  - They must be tested externally using the module: command in Snakemake, passing user resources, and requesting specific results. Check the pre-made example in tests/integration for details.
  - Internal access (e.g., calling the all: rule) is discouraged, as the module may not have the necessary resources/ to execute properly.
- Please maintain the following files to ensure Modelblocks compatibility:
  - INTERFACE.yaml: a simple description of the module's input/output structure.
  - config/config.yaml: a basic functioning example of how to configure this module.
  - workflow/internal/config.schema.yaml: the module's configuration schema, used by Snakemake for validation.
  - AUTHORS / CITATION.cff / LICENSE: licensing and attribution of this module's code and methods.
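As a rough illustration of what "external" access means, the snippet below writes a wrapper Snakefile that loads a module through Snakemake's generic `module` and `use rule` statements. Everything in it is an assumption made for illustration: the name `my_module`, the relative path to the module's Snakefile, the `config.yaml` file and the `prefix` value. It is not necessarily how this template's integration test is set up, so treat tests/integration as the authoritative example.

```shell
# Write a hypothetical wrapper Snakefile into a scratch directory.
wrapper_dir="$(mktemp -d)"
cat > "$wrapper_dir/Snakefile" <<'EOF'
# Hypothetical wrapper workflow: all names and paths are placeholders.
configfile: "config.yaml"

module my_module:
    snakefile:
        "../../workflow/Snakefile"   # assumed location of the module's Snakefile
    config:
        config["my_module"]          # assumed key holding the module's configuration
    prefix:
        "my_module"                  # run the module's files under this subdirectory

# Import the module's rules under a name prefix to avoid clashes.
use rule * from my_module as my_module_*
EOF

# Show the generated wrapper.
cat "$wrapper_dir/Snakefile"
```

The wrapper, not the module, is then responsible for supplying the user resources and for requesting the results it needs.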
This template uses pixi as its package manager. Once installed, do the following:
- Install the templater tool copier.

      pixi global install copier

- Use copier to build a project with this template. A new module will be created in the directory you choose. We recommend using the module name as the directory name.

      copier copy https://github.com/modelblocks-org/data-module-template.git ./<module_name>

Tip
If copier is not available in your terminal, you may need to update your PATH variable to include ~/.pixi/bin.
- Answer a few questions so we can pre-fill licensing, citation files, etc.
- Initialise the pixi project environment of your new module.

      cd ./<module_name>                     # navigate to the new project
      pixi install --all                     # install the project's environments
      pixi run export-snakemake-env module   # initialise the Snakemake environment

- Register your project in pre-commit.ci and allcontributors.org to benefit from CI and contributor task automation.
- Extra: run the auto-generated example module!

      cd tests/integration            # go to the integration test...
      pixi run snakemake --use-conda  # run it!

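For the tip about copier not being found, one possible fix is to prepend pixi's global bin directory to PATH. This is a minimal sketch that assumes a POSIX-like shell such as bash or zsh. It only affects the current session, so add the export line to your shell's startup file if you want it to persist.

```shell
# Prepend pixi's global bin directory to PATH for the current shell session.
export PATH="$HOME/.pixi/bin:$PATH"

# Check whether copier can now be found.
if command -v copier >/dev/null 2>&1; then
    echo "copier found"
else
    echo "copier not found: check that 'pixi global install copier' succeeded"
fi
```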
export-snakemake-env <ENVIRONMENT>: export <ENVIRONMENT> to conda-compatible dependency files, saved in workflow/envs, allowing Snakemake to use them during rule execution.
This will generate both an <ENVIRONMENT>.yaml file and platform-specific pin files for Windows, Linux and macOS (e.g., <ENVIRONMENT>.win-64.pin.txt).
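To confirm that an export produced the files described above, a small check like the following could be used. The helper `check_env_exports` is hypothetical and not part of the template. It only relies on the naming pattern given in the text, `<ENVIRONMENT>.yaml` plus `<ENVIRONMENT>.<platform>.pin.txt`, and makes no assumption about the exact platform tags beyond that pattern.

```shell
# Hypothetical helper: report which exported files exist for an environment.
# Usage: check_env_exports <ENVIRONMENT> [directory, default: workflow/envs]
check_env_exports() {
    env_name="$1"
    env_dir="${2:-workflow/envs}"
    status=0

    # The conda-compatible dependency file.
    if [ -f "$env_dir/$env_name.yaml" ]; then
        echo "found: $env_name.yaml"
    else
        echo "missing: $env_name.yaml"
        status=1
    fi

    # Platform-specific pin files, matched by pattern only.
    pin_count=0
    for pin in "$env_dir/$env_name".*.pin.txt; do
        [ -f "$pin" ] || continue
        echo "found: ${pin##*/}"
        pin_count=$((pin_count + 1))
    done
    if [ "$pin_count" -eq 0 ]; then
        echo "missing: $env_name.<platform>.pin.txt"
        status=1
    fi

    return "$status"
}

# Example: check the module environment and suggest the export command on failure.
check_env_exports module || echo "try: pixi run export-snakemake-env module"
```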
Run a minimal set of standardised tests to ensure your module complies with Modelblocks requirements. These are executed by GitHub's CI during pull requests.
Thanks go to these wonderful people, sorted alphabetically (emoji key):
- Bryn Pickering 💻 🤔 👀
- Ivan Ruiz Manuel 💻 🤔 📖
- Jann Launer 🤔 📓
- Stefan Pfenninger-Lee 📖 💻 🤔
This project follows the all-contributors specification. Contributions of any kind welcome!