# Storage Modules

## Table Of Contents

- [Design](#design)
- [Concepts](#concepts)
- [Architecture](#architecture)
- [Traits](#traits)

## Design

### Motivation

The storage module contains all runtime functionality pertaining to managing the Joystream storage and distribution network. As such, it contains information on the actors participating in the network, as well as on the data that should be retrievable.

### Structure

The storage module consists of the following modules, each with its own detailed specification.

1. [Data Object Type Registry](data-object-type-registry-module.md): manages *how* data may be stored on the network.
2. [Data Directory](data-directory-module.md): manages *what* data exists on the network.
3. [Data Object Storage Registry](data-object-storage-registry-module.md): manages *where* data exists on the network.
4. [Storage Staking](storage-staking-module.md): manages joining and leaving storage *tranches*.

Also related is the [Content Directory](content-directory.md), which provides information for users to discover stored content, but it is not a proper module.

## Concepts

- `DataObjectType`: a structure describing the type of data objects that can be stored. This is not to be confused with file types. Instead, data object types will be used to group files that should follow the same storage patterns. See the [Data Object Type Registry](data-object-type-registry-module.md) for details.
- `ContentId`: a unique identifier for `DataObject` and `ContentMetadata` entries.
- `DataObject`: an entry in the [Data Directory](data-directory-module.md) describing a single piece of content in the network.
- `ContentMetadata`: a structure for describing content metadata in a hierarchical fashion. Refers to one or more `DataObject` entries.
- `SchemaId`: an identifier for a metadata schema. Metadata schemas are used to validate `ContentMetadata` entries.
- `Liaison`: the actor account that is responsible for accepting uploads for a `DataObject`, and for making the content available to other storage nodes.
- `StorageRelationship`: an entry in the [Data Object Storage Registry](data-object-storage-registry-module.md), describing which actor has stored a particular `DataObject`.
- A storage provider is an `actor` who has staked for a storage tranche.

#### ContentId, DataObject, ContentMetadata

There is a somewhat strange relationship between these three concepts, as `ContentId` identifies both `DataObject` and `ContentMetadata`.

Each `ContentId` can be thought of as a file name in a file system: it identifies the file contents on disk (i.e. the `DataObject` here), as well as some metadata, such as file ownership, permissions, etc. In our system, we do not manage ownership or permissions in quite this manner, but in order to make content discoverable by humans, we *do* manage descriptive information, aka `ContentMetadata`.

The most often used term for such identifiers is a *content identifier*, hence the `ContentId` and corresponding `ContentMetadata` names. They best reflect the consumer's point of view, that content has a name and some information.

The `DataObject`, on the other hand, refers to any generic data BLOB. Rather than introducing a `DataObjectId` and creating a 1:1 mapping between it and `ContentId`, the latter is simply re-used.

#### Storage Providers

Storage providers, as indicated above, are actor accounts (public keys) which have staked for a storage tranche. The specs treat these and storage nodes interchangeably, which may leave the impression that the runtime stores information on where storage machines can be contacted, such as IP addresses or host names. This is not true. At the level of abstraction of the storage module, *only* actor account IDs are managed.
It is the purpose of the [Discovery Module](discovery-module.md) to resolve actor account IDs further to up-to-date contact information.

## Architecture

The basic unit of storage is a `DataObject`, for which a unique `ContentId` is entered into the `DataDirectory`.

Each `DataObject` is associated with a `DataObjectType`, which describes storage parameters such as maximum permissible file sizes, etc.

For each `DataObject`, one storage provider acts as the `Liaison`, accepting and validating the actual content upload, and making the content available to other storage providers. The `Liaison` and any other storage provider that holds the content available each enter this fact into the runtime as a `StorageRelationship`.

For purposes of content discovery, `ContentMetadata` is added to the runtime. Each `ContentMetadata` is identified by a `ContentId`; that is, one `ContentId` usually maps to both a `DataObject` and a `ContentMetadata` entry. The `ContentMetadata` has a JSON payload, and a `SchemaId` indicating to clients how they are to interpret the payload.

`ContentMetadata` *can* be used hierarchically. Each entry can have any number of `ContentId` values as children. These child IDs can be used to store `DataObject` and/or `ContentMetadata` entries of their own, allowing for organizing `DataObject` entries into hierarchical structures, e.g. for:

- Podcast episodes in a podcast
- Series episodes in a video series
- Individual language audio files for translated videos, or subtitle texts
- etc.

The runtime imposes no restrictions on how `SchemaId` is to be used; however, the intent is to eventually add a schema registry that stores e.g. [well documented schemas](https://schema.org), or some Joystream-specific derivatives.

## Traits

Most of the storage module's sub-modules only make use of each other, so there is not much need for documenting traits as interfaces between them.
However, one public trait, to be used by the related [Content Directory](content-directory.md), does exist:

- `DataObjectHasActiveStorageRelationships`: implements a method `has_active_storage_relationships(content_id)` that returns true if there exist active `StorageRelationship` entries, and false otherwise.
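
The trait described above can be illustrated with a minimal Rust sketch. Only the trait name and the method name come from this document. Everything else is an assumption made for illustration: the `ContentId` and `AccountId` aliases, the fields of `StorageRelationship` (in particular the `active` flag), the `&self` receiver, and the hypothetical `InMemoryStorageRegistry` that stands in for the runtime's actual storage.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the runtime's `ContentId` type.
pub type ContentId = u64;
/// Hypothetical stand-in for an actor account ID.
pub type AccountId = u64;

/// Assumed shape of a `StorageRelationship` entry. The `active` flag is an
/// assumption; the actual entry may track readiness differently.
#[derive(Debug, Clone, PartialEq)]
pub struct StorageRelationship {
    pub content_id: ContentId,
    pub storage_provider: AccountId,
    pub active: bool,
}

/// The public trait named in this document. The `&self` receiver and the
/// borrowed argument are assumptions of this sketch.
pub trait DataObjectHasActiveStorageRelationships {
    fn has_active_storage_relationships(&self, content_id: &ContentId) -> bool;
}

/// Hypothetical in-memory registry, used here instead of runtime storage.
#[derive(Default)]
pub struct InMemoryStorageRegistry {
    relationships: BTreeMap<ContentId, Vec<StorageRelationship>>,
}

impl InMemoryStorageRegistry {
    /// Records that `storage_provider` holds the content for `content_id`.
    pub fn add_relationship(
        &mut self,
        content_id: ContentId,
        storage_provider: AccountId,
        active: bool,
    ) {
        self.relationships
            .entry(content_id)
            .or_default()
            .push(StorageRelationship {
                content_id,
                storage_provider,
                active,
            });
    }
}

impl DataObjectHasActiveStorageRelationships for InMemoryStorageRegistry {
    /// Returns true if at least one relationship for `content_id` is active,
    /// and false if there are none or the content ID is unknown.
    fn has_active_storage_relationships(&self, content_id: &ContentId) -> bool {
        self.relationships
            .get(content_id)
            .map_or(false, |rels| rels.iter().any(|r| r.active))
    }
}
```

Under these assumptions, a consumer such as the Content Directory would depend only on the trait, not on the registry type, and could use the method to skip content that no storage provider currently serves.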