Storage Modules

Design

Motivation

The storage module contains all runtime functionality pertaining to managing the Joystream storage and distribution network. As such, it holds information on the actors participating in the network, as well as on the data that should be retrievable from it.

Structure

The storage module consists of the following sub-modules, each with its own detailed specification.

  1. Data Object Type Registry: manages how data may be stored on the network.
  2. Data Directory: manages what data exists on the network.
  3. Data Object Storage Registry: manages where data exists on the network.
  4. Storage Staking: manages joining and leaving storage tranches.

Also related is the Content Directory, which provides information for users to discover stored content, but it is not a proper module.

Concepts

  • DataObjectType: a structure describing the type of data objects that can be stored. This is not to be confused with file types. Instead, data object types will be used to group files that should follow the same storage patterns. See the Data Object Type Registry for details.

  • ContentId: a unique identifier for DataObject and ContentMetadata entries.

  • DataObject: an entry in the Data Directory describing a single piece of content in the network.

  • ContentMetadata: a structure for describing content metadata in a hierarchical fashion. Refers to one or more DataObject entries.

  • SchemaId: an identifier for a metadata schema. Metadata schemas are used to validate ContentMetadata entries.

  • Liaison: the actor account that is responsible for accepting uploads for a DataObject, and making the content available to other storage nodes.

  • StorageRelationship: an entry in the Data Object Storage Registry, describing which actor has stored a particular DataObject.

  • Storage provider: an actor who has staked for a storage tranche.

ContentId, DataObject, ContentMetadata

There is a somewhat strange relationship between these three concepts, as ContentId identifies both DataObject and ContentMetadata.

Each ContentId can be thought of as a file name in a file system: it identifies the file contents on disk (i.e. DataObject here), as well as some metadata, such as file ownership, permissions, etc.

In our system, we do not manage ownership or permissions in quite this manner. However, in order to make content discoverable by humans, we do manage descriptive information, namely ContentMetadata.

The most often used term for such identifiers is a content identifier, hence the ContentId and corresponding ContentMetadata names. They best reflect the consumer's point of view, that content has a name and some information.

The DataObject on the other hand refers to any generic data BLOB. Rather than introducing a DataObjectId and creating a 1:1 mapping between them and ContentIds, the latter is simply re-used.
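The shared-key arrangement can be pictured as two maps over the same identifier type. The following Rust sketch is illustrative only: the `u64` representation of ContentId, the struct fields, and the `Directory` type are assumptions made for the example, not the runtime's actual definitions.

```rust
use std::collections::HashMap;

/// Assumed representation; the runtime's actual ContentId type may differ.
type ContentId = u64;

/// Hypothetical, simplified DataObject: only the size of the stored BLOB.
#[derive(Debug, Clone, PartialEq)]
struct DataObject {
    size: u64,
}

/// Hypothetical, simplified ContentMetadata: only a JSON payload.
#[derive(Debug, Clone, PartialEq)]
struct ContentMetadata {
    json: String,
}

/// Two independent maps that share one key space, so no separate
/// DataObjectId (and no 1:1 mapping table) is required.
#[derive(Default)]
struct Directory {
    data_objects: HashMap<ContentId, DataObject>,
    metadata: HashMap<ContentId, ContentMetadata>,
}

impl Directory {
    /// Look up both entries for one ContentId; either may be absent.
    fn lookup(&self, id: ContentId) -> (Option<&DataObject>, Option<&ContentMetadata>) {
        (self.data_objects.get(&id), self.metadata.get(&id))
    }
}
```

Because either half of the returned pair may be `None`, the sketch also covers a ContentId that has metadata but no data of its own, or data without metadata.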

Storage Providers

Storage providers, as indicated above, are actor accounts (public keys) that have staked for a storage tranche. The specs use the terms storage provider and storage node interchangeably, which may give the impression that the runtime stores information on how storage machines can be contacted, such as IP addresses or host names. It does not.

At the level of abstraction of the storage module, only actor account IDs are managed. It is the purpose of the Discovery Module to resolve actor account IDs to up-to-date contact information.

Architecture

The basic unit of storage is a DataObject, for which a unique ContentId is entered into the DataDirectory. Each DataObject is associated with a DataObjectType, which describes storage parameters such as maximum permissible file sizes, etc.
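As a rough illustration of how a DataObjectType might constrain a DataObject, the sketch below checks a proposed object size against a maximum. The field names (`description`, `max_size_bytes`), the error type, and the `check_size` function are hypothetical; the Data Object Type Registry specification defines the real parameters.

```rust
/// Hypothetical storage parameters attached to a DataObjectType.
struct DataObjectType {
    description: String,
    max_size_bytes: u64,
}

/// Hypothetical error returned when an object violates its type's limits.
#[derive(Debug, PartialEq)]
enum AddError {
    TooLarge { size: u64, max: u64 },
}

/// Accept the object only if it fits within the type's size limit.
fn check_size(do_type: &DataObjectType, size: u64) -> Result<(), AddError> {
    if size > do_type.max_size_bytes {
        return Err(AddError::TooLarge {
            size,
            max: do_type.max_size_bytes,
        });
    }
    Ok(())
}
```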

For each DataObject, one storage provider acts as the Liaison, accepting and validating the actual content upload, and making the content available to other storage providers. The Liaison, and any other storage provider that makes the content available, records this fact in the runtime as a StorageRelationship.
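The Liaison flow described above might be modelled roughly as follows. This is a sketch under stated assumptions, not the runtime's implementation: the `UploadState` enum, the numeric account and content ID types, both function names, and the rule that relationships can only be recorded after acceptance are all introduced here for illustration.

```rust
type AccountId = u32; // assumed stand-in for an actor account (public key)
type ContentId = u64; // assumed representation

/// Hypothetical upload state tracked per DataObject.
#[derive(Debug, PartialEq)]
enum UploadState {
    Pending,
    Accepted,
}

struct DataObject {
    liaison: AccountId,
    state: UploadState,
}

#[derive(Debug, PartialEq)]
struct StorageRelationship {
    content_id: ContentId,
    provider: AccountId,
}

#[derive(Debug, PartialEq)]
enum Error {
    NotLiaison,
    NotAccepted,
}

/// Only the Liaison may mark the upload as accepted.
fn accept_upload(obj: &mut DataObject, caller: AccountId) -> Result<(), Error> {
    if caller != obj.liaison {
        return Err(Error::NotLiaison);
    }
    obj.state = UploadState::Accepted;
    Ok(())
}

/// Any provider holding the content records that fact as a StorageRelationship.
/// Assumption: this is only permitted once the upload has been accepted.
fn record_relationship(
    obj: &DataObject,
    content_id: ContentId,
    provider: AccountId,
    registry: &mut Vec<StorageRelationship>,
) -> Result<(), Error> {
    if obj.state != UploadState::Accepted {
        return Err(Error::NotAccepted);
    }
    registry.push(StorageRelationship { content_id, provider });
    Ok(())
}
```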

For purposes of content discovery, ContentMetadata is added to the runtime. Each ContentMetadata is identified by a ContentId; that is, one ContentId usually maps to both a DataObject and a ContentMetadata entry. The ContentMetadata has a JSON payload, and a SchemaId indicating to clients how they are to interpret the payload.

ContentMetadata can be used hierarchically. Each entry can have any number of ContentId as children. These child IDs can be used to store DataObject and/or ContentMetadata entries of their own, allowing for organizing DataObject entries into hierarchical structures, e.g. for:

  • Podcast episodes in a Podcast
  • Series episodes in a video series
  • Individual language audio files for translated videos, or subtitle texts.
  • etc.
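A hierarchy of this kind can be sketched as ContentMetadata entries that list child ContentIds. In the example below, the field names, the integer ID types, and the `descendants` helper are assumptions for illustration; the traversal also guards against cycles, which the text above neither permits nor forbids.

```rust
use std::collections::{HashMap, HashSet};

type ContentId = u64; // assumed representation
type SchemaId = u32; // assumed representation

/// Hypothetical, simplified ContentMetadata entry.
struct ContentMetadata {
    schema: SchemaId,
    json: String,
    children: Vec<ContentId>,
}

/// Collect the IDs of all descendants of `root` in depth-first pre-order.
/// IDs without a metadata entry (e.g. plain DataObjects) are leaves.
fn descendants(store: &HashMap<ContentId, ContentMetadata>, root: ContentId) -> Vec<ContentId> {
    let mut out = Vec::new();
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    while let Some(id) = stack.pop() {
        if !seen.insert(id) {
            continue; // already visited; protects against cycles
        }
        if let Some(meta) = store.get(&id) {
            // Push in reverse so children are visited in listed order.
            for &child in meta.children.iter().rev() {
                stack.push(child);
            }
        }
        if id != root {
            out.push(id);
        }
    }
    out
}
```

For instance, a podcast entry could list its episodes as children, and each episode could in turn list the ContentIds of its audio files.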

The runtime imposes no restrictions on how SchemaId is to be used; however, the intent is to eventually add a schema registry that stores, for example, well-documented schemas or Joystream-specific derivatives of them.

Traits

Most of the storage module's sub-modules only make use of each other, so there is little need to document traits as interfaces between them. However, one public trait does exist, for use by the related Content Directory:

  • DataObjectHasActiveStorageRelationships: implements a method has_active_storage_relationships(content_id) that returns true if there exist active StorageRelationship entries, and false otherwise.
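To make the shape of this trait concrete, here is a minimal, self-contained Rust sketch. Only the trait and method names come from the text above; the `&self` receiver, the integer ID types, the `Registry` struct, and the `active` flag are assumptions, and the runtime's real signature (for example, generic over a runtime configuration trait) may well differ.

```rust
type ContentId = u64; // assumed representation
type AccountId = u32; // assumed stand-in for an actor account

/// Trait named in the specification; the exact signature is an assumption.
trait DataObjectHasActiveStorageRelationships {
    fn has_active_storage_relationships(&self, content_id: ContentId) -> bool;
}

/// Hypothetical StorageRelationship with an `active` flag.
struct StorageRelationship {
    content_id: ContentId,
    provider: AccountId,
    active: bool,
}

/// Hypothetical in-memory stand-in for the Data Object Storage Registry.
struct Registry {
    relationships: Vec<StorageRelationship>,
}

impl DataObjectHasActiveStorageRelationships for Registry {
    /// True if at least one active relationship exists for `content_id`.
    fn has_active_storage_relationships(&self, content_id: ContentId) -> bool {
        self.relationships
            .iter()
            .any(|r| r.content_id == content_id && r.active)
    }
}
```

A consumer such as the Content Directory could call this method to decide whether a piece of content is currently served by any storage provider before presenting it to users.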