// Copyright 2019-2021 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// The documentation is easier to read from a browser:
// - Run: cargo doc --document-private-items --features=std
// - Open: target/doc/persistent_store/index.html

//! Store abstraction for flash storage
//!
//! # Specification
//!
//! The [store](Store) provides a partial function from keys to values on top of a
//! [storage](Storage) interface. The store total [capacity](Store::capacity) depends on the size of
//! the storage. Store [updates](StoreUpdate) may be bundled in [transactions](Store::transaction).
//! Mutable operations are atomic, including when interrupted.
//!
//! The store is flash-efficient in the sense that it uses the storage [lifetime](Store::lifetime)
//! efficiently. For each page, all words are written at least once between erase cycles and all
//! erase cycles are used. However, not all written words are user content: lifetime is also
//! consumed by metadata and compaction.
//!
//! The store can be extended with entries other than key-value pairs. It is essentially a framework
//! providing access to the storage lifetime. The partial function is simply the most common usage
//! and can be used to encode other usages.
//!
//! ## Definitions
//!
//! An _entry_ is a pair of a key and a value. A _key_ is a number between 0 and
//! [4095](format::MAX_KEY_INDEX). A _value_ is a byte slice with a length between 0 and
//! [1023](format::Format::max_value_len) bytes (for large enough pages).
//!
//! The store provides the following _updates_:
//! - Given a key and a value, [`StoreUpdate::Insert`] updates the store such that the value is
//!   associated with the key. The values for other keys are left unchanged.
//! - Given a key, [`StoreUpdate::Remove`] updates the store such that no value is associated with
//!   the key. The values for other keys are left unchanged. Additionally, if there was a value
//!   associated with the key, the value is wiped from the storage (all its bits are set to 0).
//!
//! The store provides the following _read-only operations_:
//! - [`Store::iter`] iterates through the store returning all entries exactly once. The iteration
//!   order is not specified but stable between mutable operations.
//! - [`Store::capacity`] returns how many words can be stored before the store is full.
//! - [`Store::lifetime`] returns how many words can be written before the storage lifetime is
//!   consumed.
//!
//! The store provides the following _mutable operations_:
//! - Given a set of independent updates, [`Store::transaction`] applies the sequence of updates.
//! - Given a threshold, [`Store::clear`] removes all entries with a key greater than or equal to
//!   the threshold.
//! - Given a length in words, [`Store::prepare`] makes one step of compaction unless that many
//!   words can be written without compaction. This operation has no effect on the store but may
//!   still mutate its storage. In particular, the store has the same capacity but a possibly
//!   reduced lifetime.
//!
//! A mutable operation is _atomic_ if, when power is lost during the operation, the store is either
//! updated (as if the operation succeeded) or left unchanged (as if the operation did not occur).
//! If the store is left unchanged, lifetime may still be consumed.
//!
//! The store relies on the following _storage interface_:
//! - It is possible to [read](Storage::read_slice) a byte slice. The slice won't span multiple
//!   pages.
//! - It is possible to [write](Storage::write_slice) a word slice. The slice won't span multiple
//!   pages.
//! - It is possible to [erase](Storage::erase_page) a page.
//! - The pages are sequentially indexed from 0. If the actual underlying storage is segmented,
//!   then the storage layer should translate those indices to actual page addresses.
//!
//! The store has a _total capacity_ of C = (N - 1) × (P - 4) - M - 1 words, where:
//! - P is the number of words per page
//! - [N](format::Format::num_pages) is the number of pages
//! - [M](format::Format::max_prefix_len) is the maximum length in words of a value (256 for large
//!   enough pages)
//!
//! The capacity used by each mutable operation is given below (a transient word only uses capacity
//! during the operation):
//!
//! | Operation/Update        | Used capacity    | Freed capacity    | Transient capacity |
//! | ----------------------- | ---------------- | ----------------- | ------------------ |
//! | [`StoreUpdate::Insert`] | 1 + value length | overwritten entry | 0                  |
//! | [`StoreUpdate::Remove`] | 0                | deleted entry     | see below\*        |
//! | [`Store::transaction`]  | 0 + updates      | 0 + updates       | 1                  |
//! | [`Store::clear`]        | 0                | deleted entries   | 0                  |
//! | [`Store::prepare`]      | 0                | 0                 | 0                  |
//!
//! \*0 if the update is alone in the transaction, otherwise 1.
//!
//! The _total lifetime_ of the store is below L = ((E + 1) × N - 1) × (P - 2) and above L - M
//! words, where E is the maximum number of erase cycles. The lifetime is used when capacity is
//! used, including transiently, as well as when compaction occurs. Compaction frequency and
//! lifetime consumption are positively correlated to the store load factor (the ratio of used
//! capacity to total capacity).
//!
//! It is possible to approximate the cost of transient words in terms of capacity: L transient
//! words are equivalent to C - x words of capacity where x is the average capacity (including
//! transient) of operations.
//!
//! ## Preconditions
//!
//! The following assumptions need to hold, or the store may behave in unexpected ways:
//! - A word can be written [twice](Storage::max_word_writes) between erase cycles.
//! - A page can be erased [E](Storage::max_page_erases) times after the first boot of the store.
//! - When power is lost while writing a slice or erasing a page, the next read returns a slice
//!   where a subset (possibly none or all) of the bits that should have been modified have been
//!   modified.
//! - Reading a slice is deterministic. When power is lost while writing a slice or erasing a
//!   slice (erasing a page containing that slice), reading that slice repeatedly returns the same
//!   result (until it is overwritten or its page is erased).
//! - To decide whether a page has been erased, it is enough to test if all its bits are equal
//!   to 1.
//! - When power is lost while writing a slice or erasing a page, that operation does not count
//!   towards the limits. However, completing that write or erase operation would count towards
//!   the limits, as if the number of writes per word and number of erase cycles could be
//!   fractional.
//! - The storage is only modified by the store. Note that completely erasing the storage is
//!   supported, essentially losing all content and lifetime tracking. It is preferred to use
//!   [`Store::clear`] with a threshold of 0 to keep the lifetime tracking.
//!
//! The store properties may still hold outside some of those assumptions, but with an increasing
//! chance of failure.
//!
//! # Implementation
//!
//! We define the following constants:
//! - [E](format::Format::max_page_erases) ≤ [65535](format::MAX_ERASE_CYCLE) the number of times
//!   a page can be erased.
//! - 3 ≤ [N](format::Format::num_pages) < 64 the number of pages in the storage.
//! - 8 ≤ P ≤ 1024 the number of words in a page.
//! - [Q](format::Format::virt_page_size) = P - 2 the number of words in a virtual page.
//! - [M](format::Format::max_prefix_len) = min(Q - 1, 256) the maximum length in words of a
//!   value.
//! - [W](format::Format::window_size) = (N - 1) × Q - M the window size.
//! - [V](format::Format::virt_size) = (N - 1) × (Q - 1) - M the virtual capacity.
//! - [C](format::Format::total_capacity) = V - N the user capacity.
//!
//! We build a virtual storage from the physical storage using the first 2 words of each page:
//! - The first word contains the number of times the page has been erased.
//! - The second word contains the starting word to which this page is being moved during
//!   compaction.
//!
//! The virtual storage has a length of (E + 1) × N × Q words and represents the lifetime of the
//! store. (We reserve the last Q + M words to support adding emergency lifetime.) This virtual
//! storage has a linear address space.
//!
//! We define a set of overlapping windows of N × Q words at each Q-aligned boundary. We call i the
//! window spanning from i × Q to (i + N) × Q. Only those windows actually exist in the underlying
//! storage. We use compaction to shift the current window from i to i + 1, preserving the content
//! of the store.
//!
//! For a given state of the virtual storage, we define h\_i as the position of the first entry of
//! the window i. We call it the head of the window i. Because entries are at most M + 1 words, they
//! can overlap on the next page only by M words. So we have i × Q ≤ h\_i ≤ i × Q + M. Since there
//! are no entries before the first page, we have h\_0 = 0.
//!
//! We define t\_i as one past the last entry of the window i. If there are no entries in that
//! window, we have t\_i = h\_i. We call t\_i the tail of the window i. We define the compaction
//! invariant as t\_i - h\_i ≤ V and the window invariant as t\_i - h\_i ≤ W. The compaction
//! invariant may temporarily be broken during a sequence of (at most N - 1) compactions.
//!
//! We define |x| as the capacity used before position x. We have |x| ≤ x. We define the capacity
//! invariant as |t\_i| - |h\_i| ≤ C.
//!
//! Using this virtual storage, entries are appended to the tail as long as there is both virtual
//! capacity to preserve the compaction invariant and capacity to preserve the capacity invariant.
//! When virtual capacity runs out, the first page of the window is compacted and the window is
//! shifted.
//!
//! Entries are identified by a prefix of bits. The prefix has to contain at least one bit set to
//! zero to differentiate from the tail. Entries can be one of:
//! - [Padding](format::ID_PADDING): A word whose first bit is set to zero. The rest is arbitrary.
//!   This entry is used to mark words partially written after an interrupted operation as padding
//!   such that they are ignored by future operations.
//! - [Header](format::ID_HEADER): A word whose second bit is set to zero. It contains the
//!   following fields:
//!     - A [bit](format::HEADER_DELETED) indicating whether the entry is deleted.
//!     - A [bit](format::HEADER_FLIPPED) indicating whether the value is word-aligned and has all
//!       bits set to 1 in its last word. The last word of an entry is used to detect that an
//!       entry has been fully written. As such it must contain at least one bit equal to zero.
//!     - The [key](format::HEADER_KEY) of the entry.
//!     - The [length](format::HEADER_LENGTH) in bytes of the value. The value follows the header.
//!       The entry is word-aligned if the value is not.
//!     - The [checksum](format::HEADER_CHECKSUM) of the first and last word of the entry.
//! - [Erase](format::ID_ERASE): A word used during compaction. It contains the
//!   [page](format::ERASE_PAGE) to be erased and a [checksum](format::WORD_CHECKSUM).
//! - [Clear](format::ID_CLEAR): A word used during the clear operation. It contains the
//!   [threshold](format::CLEAR_MIN_KEY) and a [checksum](format::WORD_CHECKSUM).
//! - [Marker](format::ID_MARKER): A word used during a transaction. It contains the [number of
//!   updates](format::MARKER_COUNT) following the marker and a [checksum](format::WORD_CHECKSUM).
//! - [Remove](format::ID_REMOVE): A word used inside a transaction. It contains the
//!   [key](format::REMOVE_KEY) of the entry to be removed and a
//!   [checksum](format::WORD_CHECKSUM).
//!
//! Checksums are the number of bits equal to 0.
//!
//! # Proofs
//!
//! ## Compaction
//!
//! Let I be a window at which all invariants hold. We will show that the next N - 1 compactions
//! will preserve the window invariant (the capacity invariant is trivially preserved) after each
//! compaction. We will also show that after N - 1 compactions, the compaction invariant is
//! restored.
//!
//! We consider all notations on the virtual storage after the full compaction. We will use the |x|
//! notation although we update the state of the virtual storage. This is fine because compaction
//! doesn't change the status of an existing word.
//!
//! We first show that after each compaction, the window invariant is preserved.
//!
//! ```text
//! ∀(1 ≤ i ≤ N - 1) t_{I + i} - h_{I + i} ≤ W
//! ```
//!
//! We assume i between 1 and N - 1.
//!
//! One step of compaction advances the tail by how many words were used in the first page of the
//! window with the last entry possibly overlapping on the next page.
//!
//! ```text
//! ∀j t_{j + 1} = t_j + |h_{j + 1}| - |h_j| + 1
//! ```
//!
//! By induction, we have:
//!
//! ```text
//! t_{I + i} = t_I + |h_{I + i}| - |h_I| + i
//! ```
//!
//! We have the following properties:
//!
//! ```text
//! t_I ≤ h_I + V
//! |h_{I + i}| - |h_I| ≤ h_{I + i} - h_I
//! ```
//!
//! Replacing into our previous equality, we can conclude:
//!
//! ```text
//! t_{I + i} = t_I + |h_{I + i}| - |h_I| + i
//!           ≤ h_I + V + h_{I + i} - h_I + i
//! iff
//! t_{I + i} - h_{I + i} ≤ V + i
//!                       ≤ V + N - 1
//!                       = W
//! ```
//!
//! An important corollary is that the tail stays within the window:
//!
//! ```text
//! t_{I + i} ≤ (I + i + N - 1) × Q
//! ```
//!
//! We have the following property:
//!
//! ```text
//! h_{I + i} ≤ (I + i) × Q + M
//! ```
//!
//! From which we conclude with the definition of W:
//!
//! ```text
//! t_{I + i} ≤ h_{I + i} + W
//!           ≤ (I + i) × Q + M + (N - 1) × Q - M
//!           = (I + i + N - 1) × Q
//! ```
//!
//! We finally show that after N - 1 compactions, the compaction invariant is restored. In
//! particular, the remaining capacity is available without compaction.
//!
//! ```text
//! V - (t_{I + N - 1} - h_{I + N - 1}) ≥ C - (|t_{I + N - 1}| - |h_{I + N - 1}|) + 1
//! ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   ~
//!          immediate capacity                     remaining capacity              |
//!                                                                        reserved for clear
//! ```
//!
//! We can replace the definition of C and simplify:
//!
//! ```text
//!     V - (t_{I + N - 1} - h_{I + N - 1}) ≥ V - N - (|t_{I + N - 1}| - |h_{I + N - 1}|) + 1
//! iff t_{I + N - 1} - h_{I + N - 1} ≤ |t_{I + N - 1}| - |h_{I + N - 1}| + N - 1
//! ```
//!
//! We have the following properties:
//!
//! ```text
//! t_{I + N - 1} = t_I + |h_{I + N - 1}| - |h_I| + N - 1
//! |t_{I + N - 1}| - |h_{I + N - 1}| = |t_I| - |h_I|
//! |h_{I + N - 1}| - |t_I| ≤ h_{I + N - 1} - t_I
//! ```
//!
//! From which we conclude:
//!
//! ```text
//!     t_{I + N - 1} - h_{I + N - 1} ≤ |t_{I + N - 1}| - |h_{I + N - 1}| + N - 1
//! iff t_I + |h_{I + N - 1}| - |h_I| + N - 1 - h_{I + N - 1} ≤ |t_I| - |h_I| + N - 1
//! iff t_I + |h_{I + N - 1}| - h_{I + N - 1} ≤ |t_I|
//! iff |h_{I + N - 1}| - |t_I| ≤ h_{I + N - 1} - t_I
//! ```
//!
//! ## Checksum
//!
//! The main property we want is that all partially written/erased words are either the initial
//! word, the final word, or invalid.
//!
//! We say that a bit sequence `TARGET` is reachable from a bit sequence `SOURCE` if both have the
//! same length and `SOURCE & TARGET == TARGET` where `&` is the bitwise AND operation on bit
//! sequences of that length. In other words, when `SOURCE` has a bit equal to 0 then `TARGET` also
//! has that bit equal to 0.
//!
//! The only written entries start with `101` or `110` and are written from an erased word. Marking
//! an entry as padding or deleted is a single bit operation, so the property trivially holds. For
//! written entries, the proof relies on the fact that there is exactly one bit equal to 0 in the
//! first 3 bits. Either the first 3 bits are still `111`, in which case we expect the remaining
//! bits to be equal to 1, or we can use the checksum of the given type of entry because those 2
//! types of entries are not reachable from each other. Here is a visualization of the partitioning
//! based on the first 3 bits:
//!
//! | First 3 bits | Description        | How to check                 |
//! | ------------:| ------------------ | ---------------------------- |
//! | `111`        | Erased word        | All bits set to `1`          |
//! | `101`        | User entry         | Contains a checksum          |
//! | `110`        | Internal entry     | Contains a checksum          |
//! | `100`        | Deleted user entry | No check, atomically written |
//! | `0??`        | Padding entry      | No check, atomically written |
//!
//! To show that valid entries of a given type are not reachable from each other, we show 3 lemmas:
//!
//! 1. A bit sequence is not reachable from another if its number of bits equal to 0 is smaller.
//! 2. A bit sequence is not reachable from another if they have the same number of bits equal to
//!    0 and are different.
//! 3. A bit sequence is not reachable from another if it is bigger when they are interpreted as
//!    numbers in binary representation.
//!
//! From those lemmas we consider the 2 cases. If both entries have the same number of bits equal to
//! 0, they are either equal or not reachable from each other because of the second lemma. If they
//! don't have the same number of bits equal to 0, then the one with fewer bits equal to 0 is not
//! reachable from the other because of the first lemma and the one with more bits equal to 0 is not
//! reachable from the other because of the third lemma and the definition of the checksum.
//!
//! # Fuzzing
//!
//! For any sequence of operations and interruptions starting from an erased storage, the store is
//! checked against its model and some internal invariant at each step.
//!
//! For any sequence of operations and interruptions starting from an arbitrary storage, the store
//! is checked not to crash.

#![cfg_attr(not(feature = "std"), no_std)]

#[macro_use]
extern crate alloc;

#[cfg(feature = "std")]
mod buffer;
#[cfg(feature = "std")]
mod driver;
#[cfg(feature = "std")]
mod file;
mod format;
pub mod fragment;
#[cfg(feature = "std")]
mod model;
mod storage;
mod store;
#[cfg(test)]
mod test;

#[cfg(feature = "std")]
pub use self::buffer::{BufferCorruptFunction, BufferOptions, BufferStorage};
#[cfg(feature = "std")]
pub use self::driver::{
    StoreDriver, StoreDriverOff, StoreDriverOn, StoreInterruption, StoreInvariant,
};
#[cfg(feature = "std")]
pub use self::file::{FileOptions, FileStorage};
#[cfg(feature = "std")]
pub use self::model::{StoreModel, StoreOperation};
pub use self::storage::{Storage, StorageError, StorageIndex, StorageResult};
pub use self::store::{
    Store, StoreError, StoreHandle, StoreIter, StoreRatio, StoreResult, StoreUpdate,
};

/// Internal representation of natural numbers.
///
/// In Rust natural numbers are represented as `usize`. However, internally we represent them as
/// `u32`. This is done to preserve semantics across different targets. This is useful when tests
/// run with `usize = u64` while the actual target has `usize = u32`.
///
/// To avoid too many conversions between `usize` and `Nat` which are necessary when interfacing
/// with Rust, `usize` is used instead of `Nat` in code meant only for tests.
///
/// Currently, the store only supports targets with `usize = u32`.
// Make sure production builds have `usize = u32`.
#[cfg(any(target_pointer_width = "32", feature = "std"))]
type Nat = u32;

/// Returns the internal representation of a Rust natural number.
///
/// # Panics
///
/// Panics if the conversion overflows.
fn usize_to_nat(x: usize) -> Nat {
    use core::convert::TryFrom;
    Nat::try_from(x).unwrap()
}
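
// The constants listed under "Implementation" (Q, M, W, V, C) and the lifetime bound L from the
// specification can be sanity-checked numerically. The sketch below is illustrative only and is
// not part of this crate: `Geometry`, `geometry`, and `lifetime_upper_bound` are hypothetical
// names, and the formulas are copied from the module documentation rather than from
// `format::Format`. It shows, for example, that W = V + N - 1 and that C = V - N matches the
// specification formula C = (N - 1) × (P - 4) - M - 1.

```rust
/// Derived constants for a storage geometry (hypothetical helper, not the crate's `Format`).
#[derive(Debug, PartialEq)]
struct Geometry {
    /// Q: number of words in a virtual page.
    q: u32,
    /// M: maximum length in words of a value.
    m: u32,
    /// W: window size.
    w: u32,
    /// V: virtual capacity.
    v: u32,
    /// C: user capacity.
    c: u32,
}

/// Computes the documented constants from P (words per page) and N (number of pages).
///
/// Panics if P or N is outside the documented ranges 8 ≤ P ≤ 1024 and 3 ≤ N < 64.
fn geometry(p: u32, n: u32) -> Geometry {
    assert!((8..=1024).contains(&p), "P must satisfy 8 <= P <= 1024");
    assert!((3..64).contains(&n), "N must satisfy 3 <= N < 64");
    let q = p - 2; // Q = P - 2
    let m = core::cmp::min(q - 1, 256); // M = min(Q - 1, 256)
    let w = (n - 1) * q - m; // W = (N - 1) × Q - M
    let v = (n - 1) * (q - 1) - m; // V = (N - 1) × (Q - 1) - M
    let c = v - n; // C = V - N
    Geometry { q, m, w, v, c }
}

/// Upper bound L = ((E + 1) × N - 1) × (P - 2) on the total lifetime in words.
///
/// Computed in `u64` because the product can exceed `u32::MAX` for large E.
fn lifetime_upper_bound(p: u32, n: u32, e: u32) -> u64 {
    ((u64::from(e) + 1) * u64::from(n) - 1) * (u64::from(p) - 2)
}
```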