Files
OpenSK/libraries/persistent_store/src/lib.rs
2022-08-19 12:47:29 +02:00

418 lines
20 KiB
Rust
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
// Copyright 2019-2021 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// The documentation is easier to read from a browser:
// - Run: cargo doc --document-private-items --features=std
// - Open: target/doc/persistent_store/index.html
//! Store abstraction for flash storage
//!
//! # Specification
//!
//! The [store](Store) provides a partial function from keys to values on top of a
//! [storage](Storage) interface. The store total [capacity](Store::capacity) depends on the size of
//! the storage. Store [updates](StoreUpdate) may be bundled in [transactions](Store::transaction).
//! Mutable operations are atomic, including when interrupted.
//!
//! The store is flash-efficient in the sense that it uses the storage [lifetime](Store::lifetime)
//! efficiently. For each page, all words are written at least once between erase cycles and all
//! erase cycles are used. However, not all written words are user content: Lifetime is also
//! consumed with metadata and compaction.
//!
//! The store is extendable with other entries than key-values. It is essentially a framework
//! providing access to the storage lifetime. The partial function is simply the most common usage
//! and can be used to encode other usages.
//!
//! ## Definitions
//!
//! An _entry_ is a pair of a key and a value. A _key_ is a number between 0 and
//! [4095](format::MAX_KEY_INDEX). A _value_ is a byte slice with a length between 0 and
//! [1023](format::Format::max_value_len) bytes (for large enough pages).
//!
//! The store provides the following _updates_:
//! - Given a key and a value, [`StoreUpdate::Insert`] updates the store such that the value is
//! associated with the key. The values for other keys are left unchanged.
//! - Given a key, [`StoreUpdate::Remove`] updates the store such that no value is associated with
//! the key. The values for other keys are left unchanged. Additionally, if there was a value
//! associated with the key, the value is wiped from the storage (all its bits are set to 0).
//!
//! The store provides the following _read-only operations_:
//! - [`Store::iter`] iterates through the store returning all entries exactly once. The iteration
//! order is not specified but stable between mutable operations.
//! - [`Store::capacity`] returns how many words can be stored before the store is full.
//! - [`Store::lifetime`] returns how many words can be written before the storage lifetime is
//! consumed.
//!
//! The store provides the following _mutable operations_:
//! - Given a set of independent updates, [`Store::transaction`] applies the sequence of updates.
//! - Given a threshold, [`Store::clear`] removes all entries with a key greater or equal to the
//! threshold.
//! - Given a length in words, [`Store::prepare`] makes one step of compaction unless that many
//! words can be written without compaction. This operation has no effect on the store but may
//! still mutate its storage. In particular, the store has the same capacity but a possibly
//! reduced lifetime.
//!
//! A mutable operation is _atomic_ if, when power is lost during the operation, the store is either
//! updated (as if the operation succeeded) or left unchanged (as if the operation did not occur).
//! If the store is left unchanged, lifetime may still be consumed.
//!
//! The store relies on the following _storage interface_:
//! - It is possible to [read](Storage::read_slice) a byte slice. The slice won't span multiple
//! pages.
//! - It is possible to [write](Storage::write_slice) a word slice. The slice won't span multiple
//! pages.
//! - It is possible to [erase](Storage::erase_page) a page.
//! - The pages are sequentially indexed from 0. If the actual underlying storage is segmented,
//! then the storage layer should translate those indices to actual page addresses.
//!
//! The store has a _total capacity_ of C = (N - 1) × (P - 4) - M - 1 words, where:
//! - P is the number of words per page
//! - [N](format::Format::num_pages) is the number of pages
//! - [M](format::Format::max_prefix_len) is the maximum length in words of a value (256 for large
//! enough pages)
//!
//! The capacity used by each mutable operation is given below (a transient word only uses capacity
//! during the operation):
//!
//! | Operation/Update | Used capacity | Freed capacity | Transient capacity |
//! | ----------------------- | ---------------- | ----------------- | ------------------ |
//! | [`StoreUpdate::Insert`] | 1 + value length | overwritten entry | 0 |
//! | [`StoreUpdate::Remove`] | 0 | deleted entry | see below\* |
//! | [`Store::transaction`] | 0 + updates | 0 + updates | 1 |
//! | [`Store::clear`] | 0 | deleted entries | 0 |
//! | [`Store::prepare`] | 0 | 0 | 0 |
//!
//! \*0 if the update is alone in the transaction, otherwise 1.
//!
//! The _total lifetime_ of the store is below L = ((E + 1) × N - 1) × (P - 2) and above L - M
//! words, where E is the maximum number of erase cycles. The lifetime is used when capacity is
//! used, including transiently, as well as when compaction occurs. Compaction frequency and
//! lifetime consumption are positively correlated to the store load factor (the ratio of used
//! capacity to total capacity).
//!
//! It is possible to approximate the cost of transient words in terms of capacity: L transient
//! words are equivalent to C - x words of capacity where x is the average capacity (including
//! transient) of operations.
//!
//! ## Preconditions
//!
//! The following assumptions need to hold, or the store may behave in unexpected ways:
//! - A word can be written [twice](Storage::max_word_writes) between erase cycles.
//! - A page can be erased [E](Storage::max_page_erases) times after the first boot of the store.
//! - When power is lost while writing a slice or erasing a page, the next read returns a slice
//! where a subset (possibly none or all) of the bits that should have been modified have been
//! modified.
//! - Reading a slice is deterministic. When power is lost while writing a slice or erasing a
//! slice (erasing a page containing that slice), reading that slice repeatedly returns the same
//! result (until it is overwritten or its page is erased).
//! - To decide whether a page has been erased, it is enough to test if all its bits are equal
//! to 1.
//! - When power is lost while writing a slice or erasing a page, that operation does not count
//! towards the limits. However, completing that write or erase operation would count towards
//! the limits, as if the number of writes per word and number of erase cycles could be
//! fractional.
//! - The storage is only modified by the store. Note that completely erasing the storage is
//! supported, essentially losing all content and lifetime tracking. It is preferred to use
//! [`Store::clear`] with a threshold of 0 to keep the lifetime tracking.
//!
//! The store properties may still hold outside some of those assumptions, but with an increasing
//! chance of failure.
//!
//! # Implementation
//!
//! We define the following constants:
//! - [E](format::Format::max_page_erases) ≤ [65535](format::MAX_ERASE_CYCLE) the number of times
//! a page can be erased.
//! - 3 ≤ [N](format::Format::num_pages) < 64 the number of pages in the storage.
//! - 8 ≤ P ≤ 1024 the number of words in a page.
//! - [Q](format::Format::virt_page_size) = P - 2 the number of words in a virtual page.
//! - [M](format::Format::max_prefix_len) = min(Q - 1, 256) the maximum length in words of a
//! value.
//! - [W](format::Format::window_size) = (N - 1) × Q - M the window size.
//! - [V](format::Format::virt_size) = (N - 1) × (Q - 1) - M the virtual capacity.
//! - [C](format::Format::total_capacity) = V - N the user capacity.
//!
//! We build a virtual storage from the physical storage using the first 2 words of each page:
//! - The first word contains the number of times the page has been erased.
//! - The second word contains the starting word to which this page is being moved during
//! compaction.
//!
//! The virtual storage has a length of (E + 1) × N × Q words and represents the lifetime of the
//! store. (We reserve the last Q + M words to support adding emergency lifetime.) This virtual
//! storage has a linear address space.
//!
//! We define a set of overlapping windows of N × Q words at each Q-aligned boundary. We call i the
//! window spanning from i × Q to (i + N) × Q. Only those windows actually exist in the underlying
//! storage. We use compaction to shift the current window from i to i + 1, preserving the content
//! of the store.
//!
//! For a given state of the virtual storage, we define h\_i as the position of the first entry of
//! the window i. We call it the head of the window i. Because entries are at most M + 1 words, they
//! can overlap on the next page only by M words. So we have i × Q ≤ h_i ≤ i × Q + M . Since there
//! are no entries before the first page, we have h\_0 = 0.
//!
//! We define t\_i as one past the last entry of the window i. If there are no entries in that
//! window, we have t\_i = h\_i. We call t\_i the tail of the window i. We define the compaction
//! invariant as t\_i - h\_i ≤ V and the window invariant as t\_i - h\_i ≤ W. The compaction
//! invariant may temporarily be broken during a sequence of (at most N - 1) compactions.
//!
//! We define |x| as the capacity used before position x. We have |x| ≤ x. We define the capacity
//! invariant as |t\_i| - |h\_i| ≤ C.
//!
//! Using this virtual storage, entries are appended to the tail as long as there is both virtual
//! capacity to preserve the compaction invariant and capacity to preserve the capacity invariant.
//! When virtual capacity runs out, the first page of the window is compacted and the window is
//! shifted.
//!
//! Entries are identified by a prefix of bits. The prefix has to contain at least one bit set to
//! zero to differentiate from the tail. Entries can be one of:
//! - [Padding](format::ID_PADDING): A word whose first bit is set to zero. The rest is arbitrary.
//! This entry is used to mark words partially written after an interrupted operation as padding
//! such that they are ignored by future operations.
//! - [Header](format::ID_HEADER): A word whose second bit is set to zero. It contains the
//! following fields:
//! - A [bit](format::HEADER_DELETED) indicating whether the entry is deleted.
//! - A [bit](format::HEADER_FLIPPED) indicating whether the value is word-aligned and has all
//! bits set to 1 in its last word. The last word of an entry is used to detect that an
//! entry has been fully written. As such it must contain at least one bit equal to zero.
//! - The [key](format::HEADER_KEY) of the entry.
//! - The [length](format::HEADER_LENGTH) in bytes of the value. The value follows the header.
//! The entry is word-aligned if the value is not.
//! - The [checksum](format::HEADER_CHECKSUM) of the first and last word of the entry.
//! - [Erase](format::ID_ERASE): A word used during compaction. It contains the
//! [page](format::ERASE_PAGE) to be erased and a [checksum](format::WORD_CHECKSUM).
//! - [Clear](format::ID_CLEAR): A word used during the clear operation. It contains the
//! [threshold](format::CLEAR_MIN_KEY) and a [checksum](format::WORD_CHECKSUM).
//! - [Marker](format::ID_MARKER): A word used during a transaction. It contains the [number of
//! updates](format::MARKER_COUNT) following the marker and a [checksum](format::WORD_CHECKSUM).
//! - [Remove](format::ID_REMOVE): A word used inside a transaction. It contains the
//! [key](format::REMOVE_KEY) of the entry to be removed and a
//! [checksum](format::WORD_CHECKSUM).
//!
//! Checksums are the number of bits equal to 0.
//!
//! # Proofs
//!
//! ## Compaction
//!
//! Let I be a window at which all invariants hold. We will show that the next N - 1 compactions
//! will preserve the window invariant (the capacity invariant is trivially preserved) after each
//! compaction. We will also show that after N - 1 compactions, the compaction invariant is
//! restored.
//!
//! We consider all notations on the virtual storage after the full compaction. We will use the |x|
//! notation although we update the state of the virtual storage. This is fine because compaction
//! doesn't change the status of an existing word.
//!
//! We first show that after each compaction, the window invariant is preserved.
//!
//! ```text
//! ∀(1 ≤ i ≤ N - 1) t_{I + i} - h_{I + i} ≤ W
//! ```
//!
//! We assume i between 1 and N - 1.
//!
//! One step of compaction advances the tail by how many words were used in the first page of the
//! window with the last entry possibly overlapping on the next page.
//!
//! ```text
//! ∀j t_{j + 1} = t_j + |h_{j + 1}| - |h_j| + 1
//! ```
//!
//! By induction, we have:
//!
//! ```text
//! t_{I + i} = t_I + |h_{I + i}| - |h_I| + i
//! ```
//!
//! We have the following properties:
//!
//! ```text
//! t_I ≤ h_I + V
//! |h_{I + i}| - |h_I| ≤ h_{I + i} - h_I
//! ```
//!
//! Replacing into our previous equality, we can conclude:
//!
//! ```text
//! t_{I + i} = t_I + |h_{I + i}| - |h_I| + i
//! ≤ h_I + V + h_{I + 1} - h_I + i
//! iff
//! t_{I + i} - h_{I + 1} ≤ V + i
//! ≤ V + N - 1
//! = W
//! ```
//!
//! An important corollary is that the tail stays within the window:
//!
//! ```text
//! t_{I + i} ≤ (I + i + N - 1) × Q
//! ```
//!
//! We have the following property:
//!
//! ```text
//! h_{I + i} ≤ (I + i) × Q + M
//! ```
//!
//! From which we conclude with the definition of W:
//!
//! ```text
//! t_{I + i} ≤ h_{I + i} + W
//! ≤ (I + i) × Q + M + (N - 1) × Q - M
//! = (I + i + N - 1) × Q
//! ```
//!
//! We finally show that after N - 1 compactions, the compaction invariant is restored. In
//! particular, the remaining capacity is available without compaction.
//!
//! ```text
//! V - (t_{I + N - 1} - h_{I + N - 1}) ≥ C - (|t_{I + N - 1}| - |h_{I + N - 1}|) + 1
//! ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~
//! immediate capacity remaining capacity |
//! reserved for clear
//! ```
//!
//! We can replace the definition of C and simplify:
//!
//! ```text
//! V - (t_{I + N - 1} - h_{I + N - 1}) ≥ V - N - (|t_{I + N - 1}| - |h_{I + N - 1}|) + 1
//! iff t_{I + N - 1} - h_{I + N - 1} ≤ |t_{I + N - 1}| - |h_{I + N - 1}| + N - 1
//! ```
//!
//! We have the following properties:
//!
//! ```text
//! t_{I + N - 1} = t_I + |h_{I + N - 1}| - |h_I| + N - 1
//! |t_{I + N - 1}| - |h_{I + N - 1}| = |t_I| - |h_I|
//! |h_{I + N - 1}| - |t_I| ≤ h_{I + N - 1} - t_I
//! ```
//!
//! From which we conclude:
//!
//! ```text
//! t_{I + N - 1} - h_{I + N - 1} ≤ |t_{I + N - 1}| - |h_{I + N - 1}| + N - 1
//! iff t_I + |h_{I + N - 1}| - |h_I| + N - 1 - h_{I + N - 1} ≤ |t_I| - |h_I| + N - 1
//! iff t_I + |h_{I + N - 1}| - h_{I + N - 1} ≤ |t_I|
//! iff |h_{I + N - 1}| - |t_I| ≤ h_{I + N - 1} - t_I
//! ```
//!
//! ## Checksum
//!
//! The main property we want is that all partially written/erased words are either the initial
//! word, the final word, or invalid.
//!
//! We say that a bit sequence `TARGET` is reachable from a bit sequence `SOURCE` if both have the
//! same length and `SOURCE & TARGET == TARGET` where `&` is the bitwise AND operation on bit
//! sequences of that length. In other words, when `SOURCE` has a bit equal to 0 then `TARGET` also
//! has that bit equal to 0.
//!
//! The only written entries start with `101` or `110` and are written from an erased word. Marking
//! an entry as padding or deleted is a single bit operation, so the property trivially holds. For
//! those cases, the proof relies on the fact that there is exactly one bit equal to 0 in the 3
//! first bits. Either the 3 first bits are still `111` in which case we expect the remaining bits
//! to be equal to 1. Otherwise we can use the checksum of the given type of entry because those 2
//! types of entries are not reachable from each other. Here is a visualization of the partitioning
//! based on the first 3 bits:
//!
//! | First 3 bits | Description | How to check |
//! | ------------:| ------------------ | ---------------------------- |
//! | `111` | Erased word | All bits set to `1` |
//! | `101` | User entry | Contains a checksum |
//! | `110` | Internal entry | Contains a checksum |
//! | `100` | Deleted user entry | No check, atomically written |
//! | `0??` | Padding entry | No check, atomically written |
//!
//! To show that valid entries of a given type are not reachable from each other, we show 3 lemmas:
//!
//! 1. A bit sequence is not reachable from another if its number of bits equal to 0 is smaller.
//! 2. A bit sequence is not reachable from another if they have the same number of bits equals to
//! 0 and are different.
//! 3. A bit sequence is not reachable from another if it is bigger when they are interpreted as
//! numbers in binary representation.
//!
//! From those lemmas we consider the 2 cases. If both entries have the same number of bits equal to
//! 0, they are either equal or not reachable from each other because of the second lemma. If they
//! don't have the same number of bits equal to 0, then the one with less bits equal to 0 is not
//! reachable from the other because of the first lemma and the one with more bits equal to 0 is not
//! reachable from the other because of the third lemma and the definition of the checksum.
//!
//! # Fuzzing
//!
//! For any sequence of operations and interruptions starting from an erased storage, the store is
//! checked against its model and some internal invariant at each step.
//!
//! For any sequence of operations and interruptions starting from an arbitrary storage, the store
//! is checked not to crash.
#![cfg_attr(not(feature = "std"), no_std)]
#[macro_use]
extern crate alloc;
#[cfg(feature = "std")]
mod buffer;
pub mod concat;
#[cfg(feature = "std")]
mod driver;
#[cfg(feature = "std")]
mod file;
mod format;
pub mod fragment;
#[cfg(feature = "std")]
mod model;
mod storage;
mod store;
#[cfg(test)]
mod test;
#[cfg(feature = "std")]
pub use self::buffer::{BufferCorruptFunction, BufferOptions, BufferStorage};
#[cfg(feature = "std")]
pub use self::driver::{
StoreDriver, StoreDriverOff, StoreDriverOn, StoreInterruption, StoreInvariant,
};
#[cfg(feature = "std")]
pub use self::file::{FileOptions, FileStorage};
#[cfg(feature = "std")]
pub use self::model::{StoreModel, StoreOperation};
pub use self::storage::{Storage, StorageError, StorageIndex, StorageResult};
pub use self::store::{
Store, StoreError, StoreHandle, StoreIter, StoreRatio, StoreResult, StoreUpdate,
};
/// Internal representation of natural numbers.
///
/// In Rust natural numbers are represented as `usize`. However, internally we represent them as
/// `u32`. This is done to preserve semantics across different targets. This is useful when tests
/// run with `usize = u64` while the actual target has `usize = u32`.
///
/// To avoid too many conversions between `usize` and `Nat` which are necessary when interfacing
/// with Rust, `usize` is used instead of `Nat` in code meant only for tests.
///
/// Currently, the store only supports targets with `usize = u32`.
// Make sure production builds have `usize = 32`.
#[cfg(any(target_pointer_width = "32", feature = "std"))]
type Nat = u32;
/// Returns the internal representation of a Rust natural number.
///
/// # Panics
///
/// Panics if the conversion overflows.
fn usize_to_nat(x: usize) -> Nat {
use core::convert::TryFrom;
Nat::try_from(x).unwrap()
}