Zstash documentation
What is zstash?
Zstash is an HPSS long-term archiving solution for E3SM.
Zstash is written entirely in Python using standard libraries. Its design is intentionally minimalistic to provide an effective long-term HPSS archiving solution without creating an overly complicated (and hard to maintain) tool.
Key features:
Files are archived into standard tar files with a user-specified maximum size.
Tar files are first created locally, then transferred to HPSS.
Checksums (md5) of input files are computed on-the-fly during archiving. For large files, this saves a considerable amount of time compared to separate checksumming and archiving steps. Checksums are also computed on-the-fly for tars.
Checksums and additional metadata (size, modification time, tar file and offset) are stored in a sqlite3 index database.
The database speeds up retrieval of individual files by recording which tar file holds each file, as well as its location (offset) within that tar file.
File integrity is verified by computing checksums on-the-fly while extracting files.
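The features above can be illustrated with a minimal sketch. It is not zstash's actual implementation: the names HashingReader, create_index, archive_files and read_file, as well as the single-table schema, are hypothetical and chosen only for illustration. The sketch hashes each file in the same pass that writes it into the tar, records the metadata in a sqlite3 index, and later uses the stored offset to jump straight to one member and verify it while reading. It assumes CPython's tarfile module, whose TarFile objects expose the current write position as the offset attribute.

```python
import hashlib
import os
import sqlite3
import tarfile


class HashingReader:
    """Wrap a binary file object and update an MD5 digest on every read."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self.md5 = hashlib.md5()

    def read(self, size=-1):
        chunk = self._fileobj.read(size)
        self.md5.update(chunk)
        return chunk


def create_index(db_path):
    """Open (or create) a hypothetical index database with one table."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "name TEXT, size INTEGER, mtime REAL, md5 TEXT, tar TEXT, offset INTEGER)"
    )
    return con


def archive_files(con, root, names, tar_path):
    """Add files under root to a tar, hashing each one while it is written."""
    with tarfile.open(tar_path, "w") as tar:
        for name in names:
            path = os.path.join(root, name)
            info = tar.gettarinfo(path, arcname=name)
            # Assumption: TarFile.offset is where the next member's header
            # will start (behavior of CPython's tarfile implementation).
            offset = tar.offset
            with open(path, "rb") as src:
                reader = HashingReader(src)
                tar.addfile(info, reader)  # tarfile pulls data via reader.read()
            con.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
                (name, info.size, info.mtime, reader.md5.hexdigest(),
                 tar_path, offset),
            )
    con.commit()


def read_file(con, name):
    """Seek to the indexed offset, read one member, and verify its checksum."""
    row = con.execute(
        "SELECT md5, tar, offset FROM files WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    expected_md5, tar_path, offset = row
    md5 = hashlib.md5()
    chunks = []
    with open(tar_path, "rb") as raw:
        raw.seek(offset)  # skip everything stored before this member
        with tarfile.open(fileobj=raw, mode="r:") as tar:
            member = tar.next()  # first header found at the seek position
            src = tar.extractfile(member)
            for chunk in iter(lambda: src.read(1 << 20), b""):
                md5.update(chunk)  # checksum computed while extracting
                chunks.append(chunk)
    if md5.hexdigest() != expected_md5:
        raise ValueError(f"checksum mismatch for {name}")
    return b"".join(chunks)
```

In this sketch, calling create_index once, then archive_files for a batch of files, fills the index; read_file then returns the bytes of a single file without scanning the rest of the tar. Transfer to HPSS and the maximum tar size are deliberately left out.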
Source code is available on GitHub: https://github.com/E3SM-Project/zstash.
To change the documentation version, use the version selector in the bottom left-hand corner.
For documentation not included in the version selector (<= v1.0.1):
The documentation is organized into two major sections:
User Guide for installation, day-to-day usage, Globus setup, and archive management
Developer Guide for contributing, testing, release work, and internal implementation details
User Guide pages
The user-facing documentation is organized under User Guide and includes:
Design considerations for the high-level architecture and implementation overview
Getting started for installation and first-time setup
Usage for command-line usage details
Globus for Globus account setup, transfer workflows, and .zstash.ini configuration details
Best practices for E3SM for archive management recommendations
Database for the archive index database layout
Support for where to ask questions or report issues
Archived Documentation for older documentation that may still be useful as a reference
Developer Guide pages
The contributor and maintainer documentation is organized under Developer Guide and includes:
Project Standards for coding standards and conventions
Understanding CI for continuous integration details
Tar Tracking Modes for tar tracking behavior in each storage mode
Testing for the test layout and execution guidance
How to Prepare a Release for the release process
Contributing to This Documentation for development environment setup and contribution workflow