OSTEP Chapter 46 (Fin)

ridethepig 2023-04-16 22:01:45 +08:00
parent 539ca5c3d1
commit 5647679359
6 changed files with 1355 additions and 26 deletions

File diff suppressed because it is too large


@@ -342,7 +342,7 @@
;; ;use triple underscore `___` for slash `/` in page title
;; ;use Percent-encoding for other invalid characters
:file/name-format :triple-lowbar
- :ui/show-brackets? true
+ :ui/show-brackets? false
:feature/enable-timetracking? false
;; specify the format of the filename for journal files


@@ -1671,31 +1671,27 @@ file-path:: ../assets/ostep_1681115599584_0.pdf
- COW: never overwrite in place
- back-pointer: add backward pointer to inode to check consistency
- optimistic crash consistency: issues as many writes as possible at once, relying on a generalized form of the transaction checksum
- premise: to introduce; to state beforehand; to serve as the premise for
ls-type:: annotation
hl-page:: 563
hl-color:: green
id:: 643b824d-2732-46ae-961d-74a06db18138
- tad: a small amount; a little bit
ls-type:: annotation
hl-page:: 563
hl-color:: green
id:: 643b824f-c588-4b82-93e6-393016d3b5b1
- hideous: dreadful; ugly, repulsive
ls-type:: annotation
hl-page:: 572
hl-color:: green
id:: 643b9c40-6329-4720-9d26-75a78701392c
- hairy
ls-type:: annotation
hl-page:: 572
hl-color:: green
id:: 643b9c49-9e7c-4379-a640-8aff152ea511
- ## Log-structured File Systems
hl-page:: 579
ls-type:: annotation
id:: 643b8dad-3813-4048-8d04-5eb93a6bd182
hl-color:: yellow
collapsed:: true
- **Writing To Disk Sequentially**
hl-page:: 580
ls-type:: annotation
@@ -1803,38 +1799,337 @@ file-path:: ../assets/ostep_1681115599584_0.pdf
ls-type:: annotation
id:: 643bc10e-be20-47b3-bab4-713493dd5153
hl-color:: yellow
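- mandate: authority (granted to a government or organization through election); (a government's) term of office; commission, instruction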
hl-page:: 586
ls-type:: annotation
id:: 643bb439-b7d0-4170-9417-cd900062bfbd
hl-color:: green
- entail: to involve; to require; to make necessary
hl-page:: 586
ls-type:: annotation
id:: 643bb533-1fee-4b9a-9965-7a63016d5591
hl-color:: green
- ceremonious: formal; attentive to etiquette
hl-page:: 587
ls-type:: annotation
id:: 643bb6e8-6466-4574-aa04-4ea25b3e9034
hl-color:: green
- cease: to stop, to halt, to end
ls-type:: annotation
hl-page:: 595
hl-color:: green
id:: 643bc4b4-dc22-471c-9229-558a42904cc8
- ## Flash-based SSDs
ls-type:: annotation
hl-page:: 595
hl-color:: yellow
id:: 643ba369-83df-42f9-9ee9-b45d4652e8fb
collapsed:: true
- Storing a Single Bit
ls-type:: annotation
hl-page:: 595
hl-color:: yellow
id:: 643bce2b-3e0d-4860-a7fd-34c0b0565fe2
- Flash chips are designed to store one or more bits in a single transistor; the level of charge trapped within the transistor is mapped to a binary value. Examples: SLC (single-level cell, 1 bit: 0 or 1), MLC (multi-level cell, 2 bits: 00, 01, 10, 11), TLC (3 bits) and even QLC (4 bits).
hl-page:: 595
ls-type:: annotation
id:: 643bce49-47a3-43c6-ae82-825fd5224dd4
hl-color:: yellow
- From Bits to Banks
ls-type:: annotation
hl-page:: 596
hl-color:: yellow
id:: 643bcf1a-e41d-4ad2-83a3-ed550f9be123
- page: a few KB in size
- block (erase block): hundreds of KB, consists of many pages
- bank/plane: a flash chip is organized into banks (planes), each consisting of a large number of cells grouped into blocks.
- Basic Flash Operations
ls-type:: annotation
hl-page:: 597
hl-color:: yellow
id:: 643bcf8f-05db-476d-8b73-e9a052d91e4d
- **Read** (a page): can read ==any page==; fast (tens of microseconds); any location can be accessed ==uniformly quickly== (random access)
- **Erase** (a ==block==): before any page can be written, its whole enclosing block must be *erased* (all bits set to 1), so any live data in that block must first be copied elsewhere. ==Expensive== (a few milliseconds). Each erase ==wears out== the block a little.
- **Program** (a page): once a block has been erased, it can be *programmed* page by page, changing some of the 1s to 0s to write the desired content. Slower than *read* (hundreds of microseconds), but faster than *erase*.
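- One way to think about flash chips is that each page has a state: pages start INVALID; erasing a block sets all of its pages to ERASED; programming an ERASED page makes it VALID. A page that is not ERASED cannot be programmed.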
hl-page:: 597
ls-type:: annotation
id:: 643bd219-634d-4c9a-abf7-e266b5b3c2d7
hl-color:: yellow
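- The page life cycle above can be sketched as a tiny state machine. This is a minimal illustration, not real FTL code: the block size `PAGES_PER_BLOCK`, the one-byte page contents, and the helpers `flash_erase`/`flash_program` are all hypothetical.
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical geometry, for illustration only. */
#define PAGES_PER_BLOCK 4

/* INVALID is listed first so a zero-initialized block starts all-INVALID. */
typedef enum { INVALID, ERASED, VALID } page_state;

typedef struct {
    page_state    state[PAGES_PER_BLOCK];
    unsigned char data[PAGES_PER_BLOCK];  /* one byte of content per page, for brevity */
} flash_block;

/* Erase acts on the whole block: every page becomes ERASED (all bits 1). */
void flash_erase(flash_block *b)
{
    for (int i = 0; i < PAGES_PER_BLOCK; i++) {
        b->state[i] = ERASED;
        b->data[i]  = 0xFF;
    }
}

/* Program acts on one page and is legal only if that page is ERASED. */
bool flash_program(flash_block *b, int page, unsigned char value)
{
    assert(page >= 0 && page < PAGES_PER_BLOCK);
    if (b->state[page] != ERASED)
        return false;            /* must erase the enclosing block first */
    b->data[page] &= value;      /* programming can only turn 1s into 0s */
    b->state[page] = VALID;
    return true;
}
```
- A second program of the same page fails until `flash_erase` is called on the whole block, which is exactly what makes in-place overwrites expensive.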
- Reliability Problems
- Wear out
- When a flash block is erased and programmed, it slowly accrues a little extra charge. Over time, as that extra charge builds up, it becomes increasingly difficult to differentiate between a 0 and a 1.
ls-type:: annotation
hl-page:: 599
hl-color:: yellow
id:: 643bd3c4-c868-44fb-bd96-1ac7f3fe14c0
- Disturbance
- When a particular page within a flash chip is read or programmed, some bits in neighboring pages may get flipped (*read disturb* or *program disturb*).
ls-type:: annotation
hl-page:: 599
hl-color:: yellow
id:: 643bd3e8-304f-4e43-93a6-a8630df283b0
- Most SSDs will write pages in order (i.e., low to high), reducing reliability problems related to program disturbance.
ls-type:: annotation
hl-page:: 603
hl-color:: yellow
id:: 643bd8fd-99ed-4d7b-adaa-50be9ee619dc
- Flash Translation Layer (FTL)
hl-page:: 600
ls-type:: annotation
id:: 643bd544-e923-48b0-a513-2e8d3753e0c2
hl-color:: yellow
- The FTL turns client reads and writes into internal flash operations, i.e., it accepts requests on logical blocks and issues low-level commands on the underlying physical blocks and pages.
- **write amplification**: the total write traffic (in bytes) issued to the flash chips by the FTL $\div$ the total write traffic (in bytes) issued by the client.
hl-page:: 600
ls-type:: annotation
id:: 643bd5c6-9fbd-4bac-a71a-0e86a73b7ce2
hl-color:: yellow
- Goals: exploit parallelism, reduce write amplification, reduce wear out, minimize program disturbance
- Direct mapped FTL
hl-page:: 601
ls-type:: annotation
id:: 643bd69a-5cc5-4ce1-97c9-805f422a0562
hl-color:: yellow
- A logical page is mapped directly to a physical page.
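- Bad idea. Writes are slow and suffer severe write amplification, because overwriting a single page requires reading the whole block, erasing it, and programming it again. Reliability also suffers: repeatedly overwritten data (e.g., file system metadata) erases and programs the same block over and over, wearing it out quickly.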
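- To make the amplification concrete, the sketch below counts the flash operations needed to overwrite one logical page under direct mapping. It assumes the FTL reads the entire block, erases it, and programs every page back. `flash_ops`, `direct_mapped_overwrite` and `write_amplification` are illustrative names.
```c
/* Flash operations needed to overwrite ONE logical page with a direct-mapped FTL. */
typedef struct {
    unsigned page_reads;
    unsigned block_erases;
    unsigned page_programs;
} flash_ops;

flash_ops direct_mapped_overwrite(unsigned pages_per_block)
{
    flash_ops ops;
    ops.page_reads    = pages_per_block;  /* read the whole enclosing block */
    ops.block_erases  = 1;                /* erase it */
    ops.page_programs = pages_per_block;  /* program the old pages plus the new one */
    return ops;
}

/* Write amplification: pages programmed on flash / pages written by the client. */
double write_amplification(flash_ops ops, unsigned client_pages)
{
    return (double)ops.page_programs / client_pages;
}
```
- For example, with 4 KB pages and 256 KB blocks (64 pages per block), each single-page overwrite programs 64 pages. That is a write amplification of 64, on top of one costly erase.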
- Log-Structured FTL
ls-type:: annotation
hl-page:: 602
hl-color:: yellow
id:: 643bd777-947b-4627-844b-b84fd5573657
- Upon a write to logical block N, the device appends the write to the next free spot in the currently-being-written-to block.
hl-page:: 602
ls-type:: annotation
id:: 643bd89f-4561-4b1e-94fa-9bd46914d870
hl-color:: yellow
- To allow for subsequent reads of block N, the device keeps a mapping table which stores the physical address of each logical block in the system.
ls-type:: annotation
hl-page:: 602
hl-color:: yellow
id:: 643bd8de-7dda-4882-ab7c-bbbe75f2a925
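- Here is a minimal sketch of the log-structured write and read paths described above. It assumes a hypothetical tiny device (`NUM_LOGICAL`, `NUM_PHYSICAL`) with mapping at page granularity. Erase blocks and garbage collection are deliberately left out. The sketch only shows that a write appends at the log head and repoints the mapping table, which leaves the old copy dead.
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tiny device, for illustration only. */
#define NUM_LOGICAL   8   /* logical pages exposed to the client */
#define NUM_PHYSICAL 16   /* physical pages on flash (includes spare space) */
#define UNMAPPED     -1

typedef struct {
    int  map[NUM_LOGICAL];    /* mapping table: logical page -> physical page */
    int  data[NUM_PHYSICAL];  /* page contents (one int per page, for brevity) */
    bool live[NUM_PHYSICAL];  /* false = free or dead (superseded) */
    int  next_free;           /* log head: next physical page to program */
} log_ftl;

void ftl_init(log_ftl *f)
{
    for (int i = 0; i < NUM_LOGICAL; i++)
        f->map[i] = UNMAPPED;
    for (int i = 0; i < NUM_PHYSICAL; i++)
        f->live[i] = false;
    f->next_free = 0;
}

/* Write: append at the log head, then repoint the mapping.
 * The previous physical copy (if any) becomes dead garbage. */
bool ftl_write(log_ftl *f, int lpage, int value)
{
    assert(lpage >= 0 && lpage < NUM_LOGICAL);
    if (f->next_free == NUM_PHYSICAL)
        return false;                  /* log full: garbage collection needed */
    int ppage = f->next_free++;
    f->data[ppage] = value;
    f->live[ppage] = true;
    if (f->map[lpage] != UNMAPPED)
        f->live[f->map[lpage]] = false;
    f->map[lpage] = ppage;
    return true;
}

/* Read: consult the mapping table to find the current physical copy. */
bool ftl_read(const log_ftl *f, int lpage, int *value)
{
    assert(lpage >= 0 && lpage < NUM_LOGICAL);
    if (f->map[lpage] == UNMAPPED)
        return false;
    *value = f->data[f->map[lpage]];
    return true;
}
```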
- Garbage Collection
ls-type:: annotation
hl-page:: 604
hl-color:: yellow
id:: 643bdcba-9e46-4dfc-8366-6472c734abdb
- Find a block that contains dead pages, read its live pages, write those live pages to the log, and reclaim the entire block.
id:: 643bdcd2-053a-4c18-bf9a-393fd367ebef
- GC can be ==expensive==, requiring reading and rewriting of live data. The ideal candidate for reclamation is a ==block that consists of only dead pages==.
- overprovisioning: with extra flash capacity, cleaning can be delayed and pushed to the background
hl-page:: 606
ls-type:: annotation
id:: 643bdd25-8551-40be-9072-2cc3342f6c42
hl-color:: yellow
- **trim** operation: informs the FTL that a logical block has been deleted, so the device no longer needs to track it (and GC need not copy it).
hl-page:: 606
ls-type:: annotation
id:: 643bde06-ad8f-42a7-a322-85d8b511d56e
hl-color:: yellow
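- The cleaning loop can be sketched as follows. The sketch assumes a hypothetical 3-block device and a greedy policy that picks the block with the fewest live pages to copy. That policy is a common heuristic, not necessarily what any given SSD does. The sketch also models *trim*: a trimmed page turns dead immediately, so GC never copies it. All names are illustrative.
```c
#include <stdbool.h>

/* Hypothetical tiny device: 3 blocks x 4 pages, 4 logical pages. */
#define PAGES_PER_BLOCK 4
#define NUM_BLOCKS      3
#define NUM_PAGES       (PAGES_PER_BLOCK * NUM_BLOCKS)
#define NUM_LOGICAL     4
#define NONE            (-1)

typedef enum { ERASED, LIVE, DEAD } pstate;

typedef struct {
    int    map[NUM_LOGICAL];  /* logical page -> physical page, or NONE */
    int    owner[NUM_PAGES];  /* physical page -> logical page stored there */
    pstate state[NUM_PAGES];
} ftl;

void ftl_init(ftl *f)
{
    for (int l = 0; l < NUM_LOGICAL; l++)
        f->map[l] = NONE;
    for (int p = 0; p < NUM_PAGES; p++) {
        f->owner[p] = NONE;
        f->state[p] = ERASED;
    }
}

/* Lowest erased page outside block `skip` (pages are written low to high). */
static int next_erased(const ftl *f, int skip)
{
    for (int p = 0; p < NUM_PAGES; p++)
        if (f->state[p] == ERASED && p / PAGES_PER_BLOCK != skip)
            return p;
    return NONE;
}

/* Log-structured write: program a fresh page; the old copy becomes dead. */
bool ftl_write(ftl *f, int lpage)
{
    int p = next_erased(f, NONE);
    if (p == NONE)
        return false;                     /* no erased page left: run GC first */
    if (f->map[lpage] != NONE)
        f->state[f->map[lpage]] = DEAD;
    f->state[p] = LIVE;
    f->owner[p] = lpage;
    f->map[lpage] = p;
    return true;
}

/* TRIM: the client deleted lpage, so its page is dead and GC need not copy it. */
void ftl_trim(ftl *f, int lpage)
{
    if (f->map[lpage] != NONE)
        f->state[f->map[lpage]] = DEAD;
    f->map[lpage] = NONE;
}

/* Greedy victim: a block with some dead pages and the fewest live pages. */
int pick_victim(const ftl *f)
{
    int best = NONE, best_live = PAGES_PER_BLOCK + 1;
    for (int b = 0; b < NUM_BLOCKS; b++) {
        int live = 0, dead = 0;
        for (int p = b * PAGES_PER_BLOCK; p < (b + 1) * PAGES_PER_BLOCK; p++) {
            live += (f->state[p] == LIVE);
            dead += (f->state[p] == DEAD);
        }
        if (dead > 0 && live < best_live) {
            best = b;
            best_live = live;
        }
    }
    return best;
}

/* Copy the victim's live pages to the log, then erase the whole block.
 * Returns the number of pages copied, or -1 if there is no room. */
int gc_block(ftl *f, int victim)
{
    int base = victim * PAGES_PER_BLOCK, live = 0, room = 0;
    for (int i = 0; i < PAGES_PER_BLOCK; i++)
        live += (f->state[base + i] == LIVE);
    for (int p = 0; p < NUM_PAGES; p++)
        room += (f->state[p] == ERASED && p / PAGES_PER_BLOCK != victim);
    if (room < live)
        return -1;
    for (int p = base; p < base + PAGES_PER_BLOCK; p++) {
        if (f->state[p] == LIVE) {        /* relocate live data */
            int q = next_erased(f, victim);
            f->state[q] = LIVE;
            f->owner[q] = f->owner[p];
            f->map[f->owner[p]] = q;
        }
        f->state[p] = ERASED;             /* erase: the block is writable again */
        f->owner[p] = NONE;
    }
    return live;
}
```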
- Mapping Table Size
ls-type:: annotation
hl-page:: 606
hl-color:: yellow
id:: 643bdf64-fb3a-4417-8dc8-3cc736841285
- Page-level mapping takes up too much space: e.g., a 1 TB SSD with 4 KB pages and 4-byte entries needs a 1 GB mapping table.
- Block-Based Mapping
ls-type:: annotation
hl-page:: 606
hl-color:: yellow
id:: 643bdf97-2eae-45e1-b6be-c93c7c47112b
- Block-level mapping is akin to using a larger page size in VM: the mapping unit grows from a page to a block, shrinking the table by a factor of pages-per-block.
- Poor performance with a log-structured FTL: even when a write is small (one page), the FTL must read the live pages of the old block and write them, together with the new page, into a new block. This leads to severe write amplification.
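- Block-based translation splits a logical page number into a chunk (logical block) number and an offset, and only the chunk is mapped. Below is a sketch with a hypothetical 4-page block and table size. `block_map` and `translate` are illustrative names.
```c
#include <assert.h>

/* Hypothetical geometry: 4 pages per block, 1024 logical blocks. */
#define PAGES_PER_BLOCK 4u
#define NUM_CHUNKS      1024u

/* One entry per logical block instead of one per logical page. */
static unsigned block_map[NUM_CHUNKS];

unsigned translate(unsigned logical_page)
{
    unsigned chunk  = logical_page / PAGES_PER_BLOCK;  /* which logical block */
    unsigned offset = logical_page % PAGES_PER_BLOCK;  /* fixed position inside it */
    assert(chunk < NUM_CHUNKS);
    return block_map[chunk] * PAGES_PER_BLOCK + offset;
}
```
- Because the offset within the block is fixed, overwriting only logical page 2002 forces the FTL to copy pages 2000, 2001 and 2003 into the new block as well.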
- Hybrid Mapping
ls-type:: annotation
hl-page:: 608
hl-color:: yellow
id:: 643be2e5-4f61-4aa5-9885-e0fc862c3df6
- **log table**: FTL keeps a few blocks erased and directs all writes to them, and keeps per-page mappings for these *log blocks*.
- **data table**: per-block mappings
- When looking for a logical address, FTL first consults the *log table*, and consults the *data table* if not found.
- To keep the log table small, the FTL periodically examines the *log blocks* and switches them into *data blocks* (which can be pointed to by a single block-level mapping). There are three cases, depending on which pages a log block holds; see the book's example for details.
- switch merge: the log block holds all pages of one logical block, in order, so it simply becomes the data block (one data-table pointer changes) and the old data block is erased
hl-page:: 609
ls-type:: annotation
id:: 643be6ec-00bb-413f-96a1-7268f5b01709
hl-color:: yellow
- partial merge: the log block holds only some pages of a logical block; the FTL copies the missing pages from the old data block into it, then switches it in
hl-page:: 610
ls-type:: annotation
id:: 643be6f3-d351-4b04-a201-03dda410950d
hl-color:: yellow
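- full merge: the log block holds pages from many different logical blocks, so the FTL must gather pages from many other blocks to rebuild each data block. Expensive; frequent full merges seriously hurt performance and should be avoided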
hl-page:: 610
ls-type:: annotation
id:: 643be6f7-4656-4f93-a208-88a6fa9be6e0
hl-color:: yellow
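- The lookup order for hybrid mapping can be sketched as below, together with the check that decides whether a log block qualifies for a cheap switch merge. The table sizes, the linear scan of the log table, and all names are hypothetical simplifications.
```c
#include <assert.h>
#include <stdbool.h>

#define PAGES_PER_BLOCK 4u
#define NUM_CHUNKS      1024u
#define LOG_ENTRIES     8u     /* hypothetical: few log blocks => few page-level entries */

typedef struct { unsigned logical, physical; bool used; } log_entry;

typedef struct {
    log_entry log_table[LOG_ENTRIES];  /* per-page mappings for log blocks */
    unsigned  data_table[NUM_CHUNKS];  /* per-block mappings for data blocks */
} hybrid_ftl;

unsigned hybrid_lookup(const hybrid_ftl *f, unsigned logical_page)
{
    /* 1. Recently written pages sit in log blocks: check the page-level table. */
    for (unsigned i = 0; i < LOG_ENTRIES; i++)
        if (f->log_table[i].used && f->log_table[i].logical == logical_page)
            return f->log_table[i].physical;
    /* 2. Otherwise fall back to the block-level data table. */
    unsigned chunk = logical_page / PAGES_PER_BLOCK;
    assert(chunk < NUM_CHUNKS);
    return f->data_table[chunk] * PAGES_PER_BLOCK + logical_page % PAGES_PER_BLOCK;
}

/* Switch merge is possible when page i of the log block holds logical page
 * chunk*PAGES_PER_BLOCK + i for every i. */
bool can_switch_merge(const unsigned logical_of_page[PAGES_PER_BLOCK], unsigned *chunk_out)
{
    unsigned chunk = logical_of_page[0] / PAGES_PER_BLOCK;
    for (unsigned i = 0; i < PAGES_PER_BLOCK; i++)
        if (logical_of_page[i] != chunk * PAGES_PER_BLOCK + i)
            return false;
    *chunk_out = chunk;
    return true;
}
```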
- Page Mapping Plus Caching
hl-page:: 610
ls-type:: annotation
id:: 643be86a-9bd8-4962-92ac-76832cc93a6c
hl-color:: yellow
collapsed:: true
- Akin to paging in VM: keep only the active part of the page-level mappings cached in memory.
- If the working set fits in the cache, this works well. Otherwise, frequent misses cost extra flash reads to fetch mappings, and evicting a dirty mapping costs an extra flash write.
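- Below is a sketch of caching page-level mappings. It assumes a hypothetical direct-mapped cache of `CACHE_SLOTS` entries and read-only lookups; a real FTL must also write dirty entries back to flash on eviction. It counts hits and misses to show the working-set effect.
```c
#include <stdbool.h>

#define CACHE_SLOTS 4u   /* hypothetical: only 4 mapping entries fit in device DRAM */

typedef struct { unsigned logical, physical; bool valid; } cache_slot;

typedef struct {
    cache_slot      slot[CACHE_SLOTS];
    const unsigned *full_table;   /* complete page map (kept on flash in a real FTL) */
    unsigned        hits, misses;
} map_cache;

/* Direct-mapped cache of page-level mappings: a miss costs a flash read. */
unsigned cached_lookup(map_cache *c, unsigned logical_page)
{
    cache_slot *s = &c->slot[logical_page % CACHE_SLOTS];
    if (s->valid && s->logical == logical_page) {
        c->hits++;
        return s->physical;
    }
    c->misses++;                  /* fetch the entry, evicting whatever was there */
    s->logical  = logical_page;
    s->physical = c->full_table[logical_page];
    s->valid    = true;
    return s->physical;
}
```
- With a working set of four pages, the cache misses only on first touch. By contrast, two pages that collide on one slot (e.g., logical pages 0 and 4) miss on every access.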
- Wear Leveling
ls-type:: annotation
hl-page:: 611
hl-color:: yellow
id:: 643be88d-4648-4dc3-8f5a-fc7c45fa144a
collapsed:: true
- Spread erase/program across the blocks of the device evenly.
- The log-structured approach does most of the work toward this goal, but one problem remains: blocks filled with long-lived data rarely get overwritten, so they do not receive their fair share of the write load.
- One simple solution is to periodically read all the live data out of such blocks and write it elsewhere, making those blocks available for new writes; this increases write amplification.
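- One way to decide when to move long-lived data is to track per-block erase counts and act when the spread grows too large. The sketch below assumes a hypothetical threshold `WEAR_GAP`; both the policy and the threshold are illustrative only.
```c
#include <stdbool.h>

#define NUM_BLOCKS 8
#define WEAR_GAP   100   /* hypothetical threshold on the erase-count spread */

/* If the least-erased block (likely holding cold, long-lived data) lags the
 * most-erased block by more than WEAR_GAP, report the pair: moving the cold
 * data onto the worn block lets the little-worn block rejoin the write rotation. */
bool wear_level_candidate(const unsigned erase_count[NUM_BLOCKS], int *cold, int *hot)
{
    int lo = 0, hi = 0;
    for (int b = 1; b < NUM_BLOCKS; b++) {
        if (erase_count[b] < erase_count[lo]) lo = b;
        if (erase_count[b] > erase_count[hi]) hi = b;
    }
    if (erase_count[hi] - erase_count[lo] <= WEAR_GAP)
        return false;
    *cold = lo;
    *hot  = hi;
    return true;
}
```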
- SSD Performance
ls-type:: annotation
hl-page:: 611
hl-color:: yellow
id:: 643bdf55-53c3-407f-b87f-86b3d8f1141b
- SSDs dramatically outperform HDDs on random I/O, while the gap is much smaller for sequential I/O.
- On some SSDs, random writes are faster than random reads, because the log-structured FTL turns random writes into sequential ones.
- accrue: to increase gradually; to accumulate
hl-page:: 599
ls-type:: annotation
id:: 643bd3a4-af24-4e7f-905b-f3c3a8739831
hl-color:: green
- rigid: inflexible; stiff
hl-page:: 600
ls-type:: annotation
id:: 643bd351-d4f4-406a-9910-f44ab31bc83f
hl-color:: green
- ## Data Integrity and Protection
ls-type:: annotation
hl-page:: 619
hl-color:: yellow
id:: 643ba392-acd9-4255-930e-a97f94fb28ef
collapsed:: true
- Disk Failure Modes
ls-type:: annotation
hl-page:: 619
hl-color:: yellow
id:: 643bec95-40fd-4df9-9981-1f6d641ec520
- Latent-sector errors
- LSEs arise when a disk sector (or group of sectors) has been damaged in some way.
ls-type:: annotation
hl-page:: 620
hl-color:: yellow
id:: 643beca7-e6d1-4a17-93ec-d7445eee92c1
- Causes include head crashes (the disk head touches the surface and damages it) and cosmic rays flipping bits.
- Can be detected or even corrected by in-disk ECC (error correcting code).
- Block Corruption
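- Not detectable by the disk itself: the disk returns bad data without reporting an error (a silent fault).
- Causes: buggy disk firmware, or a faulty bus that corrupts data in transit.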
- Handling Latent Sector Errors
ls-type:: annotation
hl-page:: 621
hl-color:: yellow
id:: 643bed56-bc80-4332-b799-933755811759
- Since LSEs are ==easily detected== (the disk reports an error), the storage system simply uses whatever ==redundancy mechanism== it has (e.g., a mirror or parity) ==to recover== the data.
- Detecting Corruption: The Checksum
ls-type:: annotation
hl-page:: 622
hl-color:: yellow
id:: 643beee5-af3c-44c2-bf55-716c0a4ce0c4
- A checksum function takes a chunk of data as input and produces ==a small summary of the data== (e.g., 4 or 8 bytes), the checksum. The system detects corruption by ==re-computing the checksum and comparing== it with the stored one.
- Common Checksum Functions
ls-type:: annotation
hl-page:: 623
hl-color:: yellow
id:: 643befcd-f69a-4c19-bb76-21d8945d4cc8
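- XOR: XOR all chunks together; misses corruption whenever an even number of bits flip in the same bit position (e.g., the same bit in two different words)
- Two's-complement addition (ignoring overflow): fast, but order-insensitive, so it misses errors such as shifted or reordered data
- Fletcher checksum: two running sums, the second making it order-sensitive; almost as strong as a CRC, detecting all single-bit errors, all double-bit errors, and many burst errors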
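- ```C
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16: two running sums modulo 255 over the input bytes. */
uint16_t Fletcher16(const uint8_t *data, size_t count)
{
    uint16_t sum1 = 0;  /* plain sum of the bytes */
    uint16_t sum2 = 0;  /* sum of the running sums: makes the result order-sensitive */
    for (size_t index = 0; index < count; ++index) {
        sum1 = (sum1 + data[index]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```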
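- CRC (cyclic redundancy check): treat the data block `D` as a large binary number and divide it by an agreed value `k`; the remainder is the CRC value. Strictly, this is polynomial division over GF(2), which reduces to shifts and XORs and is cheap to compute.
- No checksum is perfect: since the checksum is much smaller than the data, collisions (different data producing the same checksum) are unavoidable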
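- The weaknesses listed above are easy to demonstrate. The sketch below shows XOR and additive checksums over 32-bit words, plus a bitwise CRC-32 using the common reflected polynomial `0xEDB88320` (as in Ethernet and zlib). The function names are illustrative. Production code would use a table-driven or hardware-accelerated CRC.
```c
#include <stddef.h>
#include <stdint.h>

/* XOR of 32-bit words: two flips in the same bit position cancel out. */
uint32_t xor_checksum(const uint32_t *words, size_t n)
{
    uint32_t c = 0;
    for (size_t i = 0; i < n; i++)
        c ^= words[i];
    return c;
}

/* Addition with unsigned wrap-around (overflow ignored): order-insensitive,
 * so swapped or reordered words go unnoticed. */
uint32_t add_checksum(const uint32_t *words, size_t n)
{
    uint32_t c = 0;
    for (size_t i = 0; i < n; i++)
        c += words[i];
    return c;
}

/* Bitwise CRC-32: polynomial division over GF(2), one bit at a time. */
uint32_t crc32_bitwise(const uint8_t *data, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```
- With words {1, 2, 4}, flipping bit 0 of both the first and second word gives {0, 3, 4}. That change leaves both the XOR and the additive checksum unchanged, while the CRC differs.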
- Checksum Layout
ls-type:: annotation
hl-page:: 624
hl-color:: yellow
id:: 643bf039-a6cb-475c-b990-df21d8f3919f
- If the drive manufacturer supports it, format the drive with 520-byte sectors: 512 bytes of data plus an 8-byte checksum.
- Otherwise, the FS packs checksums into 512-byte sectors, each followed by the data sectors its checksums cover. This works on any disk, but overwriting a data block also requires a read-modify-write of its checksum sector.
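- Using Checksums: on read, compare the *stored checksum* with the *computed checksum*; on a mismatch, recover from a redundant copy if one exists, otherwise report an error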
hl-page:: 625
ls-type:: annotation
id:: 643bf2c6-9c4f-44a8-bcc5-0af3570b64be
hl-color:: yellow
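- Below is a sketch of a checksum-verified read that falls back to a redundant copy (e.g., a mirror), reusing Fletcher-16. `stored_block`, `checked_read` and the 16-byte `BLOCK_SIZE` are hypothetical.
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16   /* hypothetical, tiny for illustration */

/* Fletcher-16, repeated here so the snippet stands alone. */
static uint16_t fletcher16(const uint8_t *data, size_t count)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < count; i++) {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

typedef struct {
    uint8_t  data[BLOCK_SIZE];
    uint16_t stored_checksum;   /* written alongside the data */
} stored_block;

typedef enum { READ_OK, READ_RECOVERED, READ_CORRUPT } read_status;

/* Verify the primary copy; on mismatch, try the redundant copy. */
read_status checked_read(const stored_block *primary, const stored_block *mirror,
                         uint8_t out[BLOCK_SIZE])
{
    if (fletcher16(primary->data, BLOCK_SIZE) == primary->stored_checksum) {
        memcpy(out, primary->data, BLOCK_SIZE);
        return READ_OK;
    }
    if (mirror != NULL && fletcher16(mirror->data, BLOCK_SIZE) == mirror->stored_checksum) {
        memcpy(out, mirror->data, BLOCK_SIZE);
        return READ_RECOVERED;
    }
    return READ_CORRUPT;        /* no good copy: report an error upward */
}
```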
- Misdirected Writes
ls-type:: annotation
hl-page:: 626
hl-color:: yellow
id:: 643bf2f7-fc6a-4289-a50b-784e6a765eb9
- Disk or RAID controllers write the data correctly but ==to the wrong location==. A checksum alone does not help: the misplaced block and its checksum are consistent with each other.
hl-page:: 626
ls-type:: annotation
id:: 643bf30b-6fcf-4d5f-9c13-dd29d4284f63
hl-color:: yellow
- Fix: store a *physical ID* (disk and block number) with each checksum; on read, check that it matches the location actually read. The checksum still verifies the data itself.
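- Here is a sketch of the physical-ID check. It assumes a hypothetical on-disk entry that stores the checksum together with the disk number and block number the data was meant for.
```c
#include <stdint.h>

/* Hypothetical on-disk checksum entry with the block's intended location. */
typedef struct {
    uint32_t checksum;
    uint32_t disk_id;    /* which disk in the array */
    uint64_t block_no;   /* which block on that disk */
} checksum_entry;

typedef enum { BLOCK_OK, BLOCK_CORRUPT, BLOCK_MISDIRECTED } verify_result;

/* `computed` is the checksum recomputed over the data just read from (disk, block). */
verify_result verify_block(const checksum_entry *e, uint32_t computed,
                           uint32_t disk, uint64_t block)
{
    if (e->disk_id != disk || e->block_no != block)
        return BLOCK_MISDIRECTED;   /* intact data, but it belongs somewhere else */
    if (e->checksum != computed)
        return BLOCK_CORRUPT;
    return BLOCK_OK;
}
```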
- Lost Writes
ls-type:: annotation
hl-page:: 627
hl-color:: yellow
id:: 643bf3ff-eeec-4c89-b385-6a104d0596bd
- The device tells the upper layer that a write has ==completed, but it was never persisted==. A checksum (even with a physical ID) does not help: the old block and its old checksum are still on disk and still match.
hl-page:: 627
ls-type:: annotation
id:: 643bf40f-573a-4004-9b3f-443502a7a198
hl-color:: yellow
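- Solutions: a write verify (read-after-write), which is slow because every write needs an extra read; or a checksum stored elsewhere in the system (e.g., ZFS keeps a checksum in each inode and indirect block for every block they point to), so a lost write shows up as a mismatch.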
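- Write verify can be sketched against a simulated device that silently drops writes. The `lossy_disk` model and the retry policy are hypothetical. The sketch also assumes the read-back really comes from the media rather than from a cache.
```c
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 8
#define NUM_BLOCKS 4

/* Simulated device that may silently drop a write (hypothetical). */
typedef struct {
    unsigned char blocks[NUM_BLOCKS][BLOCK_SIZE];
    bool drop_next_write;
} lossy_disk;

static void disk_write(lossy_disk *d, int b, const unsigned char *buf)
{
    if (d->drop_next_write) {       /* the caller still believes it succeeded */
        d->drop_next_write = false;
        return;
    }
    memcpy(d->blocks[b], buf, BLOCK_SIZE);
}

static void disk_read(const lossy_disk *d, int b, unsigned char *buf)
{
    memcpy(buf, d->blocks[b], BLOCK_SIZE);
}

/* Read-after-write: compare the read-back with what was written; retry a
 * bounded number of times. Each attempt costs one extra read. */
bool write_verify(lossy_disk *d, int b, const unsigned char *buf, int retries)
{
    unsigned char check[BLOCK_SIZE];
    for (int attempt = 0; attempt <= retries; attempt++) {
        disk_write(d, b, buf);
        disk_read(d, b, check);
        if (memcmp(check, buf, BLOCK_SIZE) == 0)
            return true;
    }
    return false;
}
```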
- Disk Scrubbing
hl-page:: 628
ls-type:: annotation
id:: 643bf592-4dec-43c4-b8ff-996d765e071b
hl-color:: yellow
- Most data is rarely accessed and would thus remain unchecked for a long time, hurting reliability: corruption can go unnoticed until the data is needed.
- Many systems use disk scrubbing: periodically (e.g., nightly or weekly) reading every block and verifying its checksum.
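- A single scrub pass is a checksum check over every block, whether or not anyone has read it recently. The sketch below assumes a hypothetical in-memory `volume` with one Fletcher-16 checksum per block. A real scrubber would then repair the listed blocks from redundancy.
```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16
#define NUM_BLOCKS 8

typedef struct {
    uint8_t  data[NUM_BLOCKS][BLOCK_SIZE];
    uint16_t checksum[NUM_BLOCKS];   /* stored checksums */
} volume;

/* Fletcher-16, repeated here so the snippet stands alone. */
static uint16_t fletcher16(const uint8_t *data, size_t count)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < count; i++) {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* One scrub pass: verify every block and record the ones that fail. */
size_t scrub(const volume *v, int bad[NUM_BLOCKS])
{
    size_t nbad = 0;
    for (int b = 0; b < NUM_BLOCKS; b++)
        if (fletcher16(v->data[b], BLOCK_SIZE) != v->checksum[b])
            bad[nbad++] = b;
    return nbad;
}
```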
- Overheads Of Checksumming
hl-page:: 628
ls-type:: annotation
id:: 643bf4f5-4ec4-4b42-962d-8c3a7729b64e
hl-color:: yellow
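- Space: on disk, checksums take up space otherwise usable for data (e.g., an 8-byte checksum per 4 KB block is about 0.19%); in memory, checksums are mostly short-lived, so not a problem
- Time: CPU (computing the checksum over every byte on each write and read) and extra I/O (when checksums are stored separately from the data, and for background scrubbing)
- CPU overhead can be reduced by combining data copying and checksumming into one pass, since the copy (e.g., from the kernel page cache into a user buffer) is needed anyway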
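- The sketch below does copy and checksum in one loop: Fletcher-16 is computed while the bytes are copied, so each byte is loaded once. `copy_and_fletcher16` is a hypothetical name, not a kernel API.
```c
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes and compute Fletcher-16 over them in the same pass. */
uint16_t copy_and_fletcher16(uint8_t *dst, const uint8_t *src, size_t n)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t byte = src[i];   /* single load serves both the copy and the sum */
        dst[i] = byte;
        sum1 = (sum1 + byte) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```
- For the input `"abcde"` it returns 0xC8F0, the same value as a separate Fletcher-16 pass.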
- beverage: a drink (other than water)
hl-page:: 623
ls-type:: annotation
id:: 643befc3-80a8-40de-a3b9-c994a90c0f0a
hl-color:: green
- scrub: to scour; to scrub clean; brushwood, scrubland
hl-page:: 627
ls-type:: annotation
id:: 643bf4d3-df61-4530-928f-ed524699c44f
hl-color:: green
- spouse: husband or wife; marriage partner
ls-type:: annotation
hl-page:: 633
hl-color:: green
id:: 643ba3b2-5a2a-4589-a871-62ad213de195
- levity: frivolity; a lack of seriousness
hl-page:: 633
ls-type:: annotation
id:: 643bfdfa-6681-4fcc-b7c1-b84887afeecd
hl-color:: green
- sarcastic: mocking, scornfully ironic
hl-page:: 634
ls-type:: annotation
id:: 643bfe9d-913f-4ab7-aba3-a3fac83d1dfb
hl-color:: green
- scribble: to jot down hastily; to scrawl or doodle; hasty, careless writing
hl-page:: 634
ls-type:: annotation
id:: 643bfeb8-34d1-428d-82d5-0bfefb871d4e
hl-color:: green