OpenSK/libraries/persistent_store/src/lib.rs

// Copyright 2019-2020 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// TODO(ia0): Add links once the code is complete.
//! Store abstraction for flash storage
//!
//! # Specification
//!
//! The store provides a partial function from keys to values on top of a storage
//! interface. The store total capacity depends on the size of the storage. Store
//! updates may be bundled in transactions. Mutable operations are atomic, including
//! when interrupted.
//!
//! The store is flash-efficient in the sense that it uses the storage lifetime
//! efficiently. For each page, all words are written at least once between erase
//! cycles and all erase cycles are used. However, not all written words are user
//! content: lifetime is also consumed with metadata and compaction.
//!
//! The store is extendable with other entries than key-values. It is essentially a
//! framework providing access to the storage lifetime. The partial function is
//! simply the most common usage and can be used to encode other usages.
//!
//! ## Definitions
//!
//! An _entry_ is a pair of a key and a value. A _key_ is a number between 0
//! and 4095. A _value_ is a byte slice with a length between 0 and 1023 bytes (for
//! large enough pages).
//!
//! The store provides the following _updates_:
//! -   Given a key and a value, `Insert` updates the store such that the value is
//!     associated with the key. The value for other keys are left unchanged.
//! -   Given a key, `Remove` updates the store such that no value is associated for
//!     the key. The value for other keys are left unchanged. Additionally, if there
//!     was a value associated with the key, the value is wiped from the storage
//!     (all its bits are set to 0).
//!
//! The store provides the following _read-only operations_:
//! -   `Iter` iterates through the store returning all entries exactly once. The
//!     iteration order is not specified but stable between mutable operations.
//! -   `Capacity` returns how many words can be stored before the store is full.
//! -   `Lifetime` returns how many words can be written before the storage lifetime
//!     is consumed.
//!
//! The store provides the following _mutable operations_:
//! -   Given a set of independent updates, `Transaction` applies the sequence of
//!     updates.
//! -   Given a threshold, `Clear` removes all entries with a key greater or equal
//!     to the threshold.
//! -   Given a length in words, `Prepare` makes one step of compaction unless that
//!     many words can be written without compaction. This operation has no effect
//!     on the store but may still mutate its storage. In particular, the store has
//!     the same capacity but a possibly reduced lifetime.
//!
//! A mutable operation is _atomic_ if, when power is lost during the operation, the
//! store is either updated (as if the operation succeeded) or left unchanged (as if
//! the operation did not occur). If the store is left unchanged, lifetime may still
//! be consumed.
//!
//! The store relies on the following _storage interface_:
//! -   It is possible to read a byte slice. The slice won't span multiple pages.
//! -   It is possible to write a word slice. The slice won't span multiple pages.
//! -   It is possible to erase a page.
//! -   The pages are sequentially indexed from 0. If the actual underlying storage
//!     is segmented, then the storage layer should translate those indices to
//!     actual page addresses.
//!
//! The store has a _total capacity_ of `C = (N - 1) * (P - 4) - M - 1` words, where
//! `P` is the number of words per page, `N` is the number of pages, and `M` is the
//! maximum length in words of a value (256 for large enough pages). The capacity
//! used by each mutable operation is given below (a transient word only uses
//! capacity during the operation):
//! -   `Insert` uses `1 + ceil(len / 4)` words where `len` is the length of the
//!     value in bytes. If an entry was replaced, the words used by its insertion
//!     are freed.
//! -   `Remove` doesn't use capacity if alone in the transaction and 1 transient
//!     word otherwise. If an entry was deleted, the words used by its insertion are
//!     freed.
//! -   `Transaction` uses 1 transient word. In addition, the updates of the
//!     transaction use and free words as described above.
//! -   `Clear` doesn't use capacity and frees the words used by the insertion of
//!     the deleted entries.
//! -   `Prepare` doesn't use capacity.
//!
//! The _total lifetime_ of the store is below `L = ((E + 1) * N - 1) * (P - 2)` and
//! above `L - M` words, where `E` is the maximum number of erase cycles. The
//! lifetime is used when capacity is used, including transiently, as well as when
//! compaction occurs. The more the store is loaded (few remaining words of
//! capacity), the more compactions are frequent, and the more lifetime is used.
//!
//! It is possible to approximate the cost of transient words in terms of capacity:
//! `L` transient words are equivalent to `C - x` words of capacity where `x` is the
//! average capacity (including transient) of operations.
//!
//! ## Preconditions
//!
//! The store may behave in unexpected ways if the following assumptions don't hold:
//! -   A word can be written twice between erase cycles.
//! -   A page can be erased `E` times after the first boot of the store.
//! -   When power is lost while writing a slice or erasing a page, the next read
//!     returns a slice where a subset (possibly none or all) of the bits that
//!     should have been modified have been modified.
//! -   Reading a slice is deterministic. When power is lost while writing a slice
//!     or erasing a slice (erasing a page containing that slice), reading that
//!     slice repeatedly returns the same result (until it is overwritten or its
//!     page is erased).
//! -   To decide whether a page has been erased, it is enough to test if all its
//!     bits are equal to 1.
//! -   When power is lost while writing a slice or erasing a page, that operation
//!     does not count towards the limits. However, completing that write or erase
//!     operation would count towards the limits, as if the number of writes per
//!     word and number of erase cycles could be fractional.
//! -   The storage is only modified by the store. Note that completely erasing the
//!     storage is supported, essentially losing all content and lifetime tracking.
//!     It is preferred to use `Clear` with a threshold of 0 to keep the lifetime
//!     tracking.
//!
//! The store properties may still hold outside some of those assumptions but with
//! weaker probabilities as the usage diverges from them.
//!
//! # Implementation
//!
//! We define the following constants:
//! -   `E < 65536` the number of times a page can be erased.
//! -   `3 <= N < 64` the number of pages in the storage.
//! -   `8 <= P <= 1024` the number of words in a page.
//! -   `Q = P - 2` the number of words in a virtual page.
//! -   `K = 4096` the maximum number of keys.
//! -   `M = min(Q - 1, 256)` the maximum length in words of a value.
//! -   `V = (N - 1) * (Q - 1) - M` the virtual capacity.
//! -   `C = V - N` the user capacity.
//!
//! We build a virtual storage from the physical storage using the first 2 words of
//! each page:
//! -   The first word contains the number of times the page has been erased.
//! -   The second word contains the starting word to which this page is being moved
//!     during compaction.
//!
//! The virtual storage has a length of `(E + 1) * N * Q` words and represents the
//! lifetime of the store. (We reserve the last `Q + M` words to support adding
//! emergency lifetime.) This virtual storage has a linear address space.
//!
//! We define a set of overlapping windows of `N * Q` words at each `Q`-aligned
//! boundary. We call `i` the window spanning from `i * Q` to `(i + N) * Q`. Only
//! those windows actually exist in the underlying storage. We use compaction to
//! shift the current window from `i` to `i + 1`, preserving the content of the
//! store.
//!
//! For a given state of the virtual storage, we define `h_i` as the position of the
//! first entry of the window `i`. We call it the head of the window `i`. Because
//! entries are at most `M + 1` words, they can overlap on the next page only by `M`
//! words. So we have `i * Q <= h_i <= i * Q + M` . Since there are no entries
//! before the first page, we have `h_0 = 0`.
//!
//! We define `t_i` as one past the last entry of the window `i`. If there are no
//! entries in that window, we have `t_i = h_i`. We call `t_i` the tail of the
//! window `i`. We define the compaction invariant as `t_i - h_i <= V`.
//!
//! We define `|x|` as the capacity used before position `x`. We have `|x| <= x`. We
//! define the capacity invariant as `|t_i| - |h_i| <= C`.
//!
//! Using this virtual storage, entries are appended to the tail as long as there is
//! both virtual capacity to preserve the compaction invariant and capacity to
//! preserve the capacity invariant. When virtual capacity runs out, the first page
//! of the window is compacted and the window is shifted.
//!
//! Entries are identified by a prefix of bits. The prefix has to contain at least
//! one bit set to zero to differentiate from the tail. Entries can be one of:
//! -   Padding: A word whose first bit is set to zero. The rest is arbitrary. This
//!     entry is used to mark words partially written after an interrupted operation
//!     as padding such that they are ignored by future operations.
//! -   Header: A word whose second bit is set to zero. It contains the following fields:
//!     -   A bit indicating whether the entry is deleted.
//!     -   A bit indicating whether the value is word-aligned and has all bits set
//!         to 1 in its last word. The last word of an entry is used to detect that
//!         an entry has been fully written. As such it must contain at least one
//!         bit equal to zero.
//!     -   The key of the entry.
//!     -   The length in bytes of the value. The value follows the header. The
//!         entry is word-aligned if the value is not.
//!     -   The checksum of the first and last word of the entry.
//! -   Erase: A word used during compaction. It contains the page to be erased and
//!     a checksum.
//! -   Clear: A word used during the `Clear` operation. It contains the threshold
//!     and a checksum.
//! -   Marker: A word used during the `Transaction` operation. It contains the
//!     number of updates following the marker and a checksum.
//! -   Remove: A word used during the `Transaction` operation. It contains the key
//!     of the entry to be removed and a checksum.
//!
//! Checksums are the number of bits equal to 0.
//!
//! # Proofs
//!
//! ## Compaction
//!
//! It should always be possible to fully compact the store, after what the
//! remaining capacity should be available in the current window (restoring the
//! compaction invariant). We consider all notations on the virtual storage after
//! the full compaction. We will use the `|x|` notation although we update the state
//! of the virtual storage. This is fine because compaction doesn't change the
//! status of an existing word.
//!
//! We want to show that the next `N - 1` compactions won't move the tail past the
//! last page of their window, with `I` the initial window:
//!
//! ```
//! forall 1 <= i <= N - 1, t_{I + i} <= (I + i + N - 1) * Q
//! ```
//!
//! We assume `i` between `1` and `N - 1`.
//!
//! One step of compaction advances the tail by how many words were used in the
//! first page of the window with the last entry possibly overlapping on the next
//! page.
//!
//! ```
//! forall j, t_{j + 1} = t_j + |h_{j + 1}| - |h_j| + 1
//! ```
//!
//! By induction, we have:
//!
//! ```
//! t_{I + i} <= t_I + |h_{I + i}| - |h_I| + i
//! ```
//!
//! We have the following properties:
//!
//! ```
//! t_I <= h_I + V
//! |h_{I + i}| - |h_I| <= h_{I + i} - h_I
//! h_{I + i} <= (I + i) * Q + M
//! ```
//!
//! Replacing into our previous equality, we can conclude:
//!
//! ```
//! t_{I + i}  = t_I + |h_{I + i}| - |h_I| + i
//!           <= h_I + V + (I + i) * Q + M - h_I + i
//!            = (N - 1) * (Q - 1) - M + (I + i) * Q + M + i
//!            = (N - 1) * (Q - 1) + (I + i) * Q + i
//!            = (I + i + N - 1) * Q + i - (N - 1)
//!           <= (I + i + N - 1) * Q
//! ```
//!
//! We also want to show that after `N - 1` compactions, the remaining capacity is
//! available without compaction.
//!
//! ```
//! V - (t_{I + N - 1} - h_{I + N - 1}) >=    // The available words in the window.
//!   C - (|t_{I + N - 1}| - |h_{I + N - 1}|) // The remaining capacity.
//!   + 1                                     // Reserved for Clear.
//! ```
//!
//! We can replace the definition of `C` and simplify:
//!
//! ```
//! V - (t_{I + N - 1} - h_{I + N - 1}) >= V - N - (|t_{I + N - 1}| - |h_{I + N - 1}|) + 1
//! iff t_{I + N - 1} - h_{I + N - 1} <= |t_{I + N - 1}| - |h_{I + N - 1}| + N - 1
//! ```
//!
//! We have the following properties:
//!
//! ```
//! t_{I + N - 1} = t_I + |h_{I + N - 1}| - |h_I| + N - 1
//! |t_{I + N - 1}| - |h_{I + N - 1}| = |t_I| - |h_I| // Compaction preserves capacity.
//! |h_{I + N - 1}| - |t_I| <= h_{I + N - 1} - t_I
//! ```
//!
//! From which we conclude:
//!
//! ```
//! t_{I + N - 1} - h_{I + N - 1} <= |t_{I + N - 1}| - |h_{I + N - 1}| + N - 1
//! iff t_I + |h_{I + N - 1}| - |h_I| + N - 1 - h_{I + N - 1} <= |t_I| - |h_I| + N - 1
//! iff t_I + |h_{I + N - 1}| - h_{I + N - 1} <= |t_I|
//! iff |h_{I + N - 1}| - |t_I| <= h_{I + N - 1} - t_I
//! ```
//!
//!
//! ## Checksum
//!
//! The main property we want is that all partially written/erased words are either
//! the initial word, the final word, or invalid.
//!
//! We say that a bit sequence `TARGET` is reachable from a bit sequence `SOURCE` if
//! both have the same length and `SOURCE & TARGET == TARGET` where `&` is the
//! bitwise AND operation on bit sequences of that length. In other words, when
//! `SOURCE` has a bit equal to 0 then `TARGET` also has that bit equal to 0.
//!
//! The only written entries start with `101` or `110` and are written from an
//! erased word. Marking an entry as padding or deleted is a single bit operation,
//! so the property trivially holds. For those cases, the proof relies on the fact
//! that there is exactly one bit equal to 0 in the 3 first bits. Either the 3 first
//! bits are still `111` in which case we expect the remaining bits to bit equal
//! to 1. Otherwise we can use the checksum of the given type of entry because those
//! 2 types of entries are not reachable from each other.
//!
//! To show that valid entries of a given type are not reachable from each other, we
//! show 3 lemmas:
//!
//! 1.  A bit sequence is not reachable from another if its number of bits equal to
//!     0 is smaller.
//!
//! 2.  A bit sequence is not reachable from another if they have the same number of
//!     bits equals to 0 and are different.
//!
//! 3.  A bit sequence is not reachable from another if it is bigger when they are
//!     interpreted as numbers in binary representation.
//!
//! From those lemmas we consider the 2 cases. If both entries have the same number
//! of bits equal to 0, they are either equal or not reachable from each other
//! because of the second lemma. If they don't have the same number of bits equal to
//! 0, then the one with less bits equal to 0 is not reachable from the other
//! because of the first lemma and the one with more bits equal to 0 is not
//! reachable from the other because of the third lemma and the definition of the
//! checksum.
//!
//! # Fuzzing
//!
//! For any sequence of operations and interruptions starting from an erased
//! storage, the store is checked against its model and some internal invariant at
//! each step.
//!
//! For any sequence of operations and interruptions starting from an arbitrary
//! storage, the store is checked not to crash.

#![cfg_attr(not(feature = "std"), no_std)]

#[macro_use]
extern crate alloc;

#[macro_use]
mod bitfield;
mod format;
mod storage;
mod store;

pub use self::storage::{Storage, StorageError, StorageIndex, StorageResult};
pub use self::store::{StoreError, StoreResult};