Pack Format

All files in the repository except Key and Pack files just contain raw data, stored as IV || Ciphertext || MAC. Pack files may contain one or more Blobs of data.
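As a rough illustration of the `IV || Ciphertext || MAC` layout, the sketch below splits one encrypted unit into its three parts. The 16-byte IV and 16-byte MAC sizes are assumptions not stated in this section, and `split_encrypted` is a hypothetical helper name. No decryption or MAC verification is performed here.

```python
from typing import Tuple

IV_SIZE = 16   # assumption: size of the IV in bytes
MAC_SIZE = 16  # assumption: size of the MAC in bytes


def split_encrypted(raw: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split IV || Ciphertext || MAC into its three parts (no crypto is done)."""
    if len(raw) < IV_SIZE + MAC_SIZE:
        raise ValueError("input is too short to hold an IV and a MAC")
    iv = raw[:IV_SIZE]
    ciphertext = raw[IV_SIZE:len(raw) - MAC_SIZE]
    mac = raw[len(raw) - MAC_SIZE:]
    return iv, ciphertext, mac
```

The ciphertext may be empty; only the IV and MAC have a fixed size under these assumptions.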

A Pack’s structure is as follows:

EncryptedBlob1 || ... || EncryptedBlobN || EncryptedHeader || Header_Length

At the end of the Pack file is a header, which describes the content. The header is encrypted and authenticated. Header_Length is the length of the encrypted header, encoded as a four-byte integer in little-endian encoding. Placing the header at the end of the file allows the blobs to be written in a continuous stream as soon as they are read during the backup phase. This reduces code complexity and avoids having to rewrite the file once the pack is complete and the content and length of the header are known.
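The streaming property described above can be sketched as follows. This is a minimal sketch, not the actual writer: `write_pack` and the `make_encrypted_header` callback are hypothetical names, and the callback stands in for whatever code builds and encrypts the header once all blob lengths are known.

```python
import io
import struct
from typing import BinaryIO, Callable, Iterable, List


def write_pack(
    out: BinaryIO,
    encrypted_blobs: Iterable[bytes],
    make_encrypted_header: Callable[[List[int]], bytes],
) -> int:
    """Stream blobs to `out`, then append the header and its length.

    Returns the total number of bytes written.
    """
    lengths: List[int] = []
    for blob in encrypted_blobs:
        out.write(blob)            # blobs are written as soon as they arrive
        lengths.append(len(blob))  # remembered only to build the header later
    # Hypothetical callback: builds and encrypts the header from the lengths.
    encrypted_header = make_encrypted_header(lengths)
    out.write(encrypted_header)
    # Header_Length: four-byte little-endian integer at the very end.
    out.write(struct.pack("<I", len(encrypted_header)))
    return sum(lengths) + len(encrypted_header) + 4
```

Because the header and its length come last, nothing already written ever has to be revisited, which is the point the paragraph above makes.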

All the blobs (EncryptedBlob1 through EncryptedBlobN) are authenticated and encrypted independently. This enables repository reorganisation without having to touch the encrypted Blobs. It also allows efficient indexing, because only the header needs to be read in order to find out which Blobs are contained in the Pack. Since the header is authenticated, its authenticity can be checked without having to read the complete Pack.

After decryption, a Pack’s header consists of the following elements:

Type_Blob1 || Data_Blob1 ||
[...]
Type_BlobN || Data_BlobN ||

The Blob type field is a single byte. What follows it depends on the type. The following Blob types are defined:

Type	Meaning	Data
0b00	data blob	Length(encrypted_blob) || Hash(plaintext_blob)
0b01	tree blob	Length(encrypted_blob) || Hash(plaintext_blob)
0b10	compressed data blob	Length(encrypted_blob) || Length(plaintext_blob) || Hash(plaintext_blob)
0b11	compressed tree blob	Length(encrypted_blob) || Length(plaintext_blob) || Hash(plaintext_blob)

This is enough to calculate the offsets for all the Blobs in the Pack. The length fields are encoded as four-byte integers in little-endian format. In the Data column, Length(plaintext_blob) is the length of the blob's data after decryption and decompression.
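A minimal sketch of parsing an already decrypted header and deriving the offsets is shown below. It rests on several assumptions not stated in this section: the hash is 32 bytes long (as for SHA-256), each entry's fields are concatenated in the order type, encrypted length, plaintext length (compressed types only), hash, and the entries appear in the same order as the blobs in the Pack, starting at offset 0. `BlobEntry` and `parse_header` are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional

HASH_SIZE = 32  # assumption: 32-byte plaintext hash (e.g. SHA-256)

# type byte -> (kind, compressed)
BLOB_TYPES = {
    0b00: ("data", False),
    0b01: ("tree", False),
    0b10: ("data", True),
    0b11: ("tree", True),
}


@dataclass
class BlobEntry:
    kind: str
    compressed: bool
    offset: int                      # start of the encrypted blob in the pack
    length: int                      # Length(encrypted_blob)
    plaintext_length: Optional[int]  # Length(plaintext_blob), compressed only
    blob_hash: bytes                 # Hash(plaintext_blob)


def parse_header(header: bytes) -> List[BlobEntry]:
    """Parse a decrypted pack header into entries with computed offsets."""
    entries: List[BlobEntry] = []
    pos = 0
    offset = 0  # assumption: the first blob starts at offset 0
    while pos < len(header):
        type_byte = header[pos]
        if type_byte not in BLOB_TYPES:
            raise ValueError(f"invalid blob type {type_byte:#04b}")
        kind, compressed = BLOB_TYPES[type_byte]
        pos += 1
        needed = (8 if compressed else 4) + HASH_SIZE
        if pos + needed > len(header):
            raise ValueError("truncated header entry")
        (length,) = struct.unpack_from("<I", header, pos)
        pos += 4
        plaintext_length = None
        if compressed:
            (plaintext_length,) = struct.unpack_from("<I", header, pos)
            pos += 4
        blob_hash = header[pos:pos + HASH_SIZE]
        pos += HASH_SIZE
        entries.append(
            BlobEntry(kind, compressed, offset, length, plaintext_length, blob_hash)
        )
        offset += length  # the next blob follows directly after this one
    return entries
```

Each offset is simply the running sum of the preceding encrypted lengths, which is why no explicit offset field is needed in the header.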

All other types are invalid; more types may be added in the future. The compressed types are only valid for repository format version 2. Data and tree blobs may be compressed with the Zstandard compression algorithm.

In repository format version 1, data and tree blobs should be stored in separate pack files. In version 2, they must be stored in separate files. Compressed and non-compressed blobs of the same type may be mixed in a pack file.
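The type and version rules can be sketched as a small check over the type bytes of one Pack's header. `check_blob_types` is a hypothetical helper. Reading the high bit as "compressed" and the low bit as "tree" is an observation about the four currently defined values, not a documented guarantee.

```python
from typing import Iterable


def check_blob_types(type_bytes: Iterable[int], repo_version: int) -> bool:
    """Validate the blob types of one pack against the version rules.

    Raises ValueError on a hard violation. Returns True if a version 1 pack
    mixes data and tree blobs (discouraged but not forbidden), else False.
    """
    if repo_version not in (1, 2):
        raise ValueError("only repository format versions 1 and 2 are handled")
    kinds = set()
    for t in type_bytes:
        if t not in (0b00, 0b01, 0b10, 0b11):
            raise ValueError(f"invalid blob type {t:#04b}")
        if t & 0b10 and repo_version == 1:
            raise ValueError("compressed blob types require format version 2")
        kinds.add("tree" if t & 0b01 else "data")
    mixed = len(kinds) > 1
    if mixed and repo_version == 2:
        raise ValueError("version 2 packs must not mix data and tree blobs")
    return mixed
```

A caller might log a warning when the function returns True, since version 1 only recommends the separation.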

To reconstruct the index or parse a pack without an index, first read the last four bytes to find the length of the header. The header can then be read, decrypted and parsed, which yields the plaintext hashes, types, offsets and lengths of all included blobs.
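The first step of this procedure, locating the encrypted header from the trailing length field, might look like the sketch below. `read_encrypted_header` is a hypothetical name. The function works on any seekable binary file object and leaves decryption and authentication to the caller.

```python
import io
import struct
from typing import BinaryIO, Tuple


def read_encrypted_header(f: BinaryIO) -> Tuple[int, bytes]:
    """Return (header_offset, encrypted_header) for a seekable pack file."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    if size < 4:
        raise ValueError("file is too small to contain Header_Length")
    # Header_Length is the last four bytes, little-endian.
    f.seek(size - 4)
    (header_length,) = struct.unpack("<I", f.read(4))
    if header_length > size - 4:
        raise ValueError("Header_Length exceeds the file size")
    # The encrypted header sits directly in front of Header_Length.
    header_offset = size - 4 - header_length
    f.seek(header_offset)
    return header_offset, f.read(header_length)
```

The returned offset also equals the combined size of all encrypted blobs, which can serve as a sanity check against the sum of the lengths listed in the header.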
