OSTEP Chapter 46 (Fin)
parent
539ca5c3d1
commit
5647679359
@ -342,7 +342,7 @@
|
||||
;; ;use triple underscore `___` for slash `/` in page title
|
||||
;; ;use Percent-encoding for other invalid characters
|
||||
:file/name-format :triple-lowbar
|
||||
:ui/show-brackets? true
|
||||
:ui/show-brackets? false
|
||||
:feature/enable-timetracking? false
|
||||
|
||||
;; specify the format of the filename for journal files
|
||||
|
||||
@ -1671,31 +1671,27 @@ file-path:: ../assets/ostep_1681115599584_0.pdf
|
||||
- COW: never overwrite in place
|
||||
- back-pointer: add backward pointer to inode to check consistency
|
||||
- optimistic crash consistency: kind of transaction checksum
|
||||
- premise
|
||||
- premise: to introduce or state in advance; to serve as the basis for something
|
||||
ls-type:: annotation
|
||||
hl-page:: 563
|
||||
hl-color:: green
|
||||
id:: 643b824d-2732-46ae-961d-74a06db18138
|
||||
- tad
|
||||
- tad: a small amount; a little bit
|
||||
ls-type:: annotation
|
||||
hl-page:: 563
|
||||
hl-color:: green
|
||||
id:: 643b824f-c588-4b82-93e6-393016d3b5b1
|
||||
- hideous
|
||||
- hideous: dreadful; ugly
|
||||
ls-type:: annotation
|
||||
hl-page:: 572
|
||||
hl-color:: green
|
||||
id:: 643b9c40-6329-4720-9d26-75a78701392c
|
||||
- hairy
|
||||
ls-type:: annotation
|
||||
hl-page:: 572
|
||||
hl-color:: green
|
||||
id:: 643b9c49-9e7c-4379-a640-8aff152ea511
|
||||
- ## Log-structured File Systems
|
||||
hl-page:: 579
|
||||
ls-type:: annotation
|
||||
id:: 643b8dad-3813-4048-8d04-5eb93a6bd182
|
||||
hl-color:: yellow
|
||||
collapsed:: true
|
||||
- **Writing To Disk Sequentially**
|
||||
hl-page:: 580
|
||||
ls-type:: annotation
|
||||
@ -1803,38 +1799,337 @@ file-path:: ../assets/ostep_1681115599584_0.pdf
|
||||
ls-type:: annotation
|
||||
id:: 643bc10e-be20-47b3-bab4-713493dd5153
|
||||
hl-color:: yellow
|
||||
- mandate: authority (granted to a government through an election); (a government's) term of office; to entrust
|
||||
hl-page:: 586
|
||||
ls-type:: annotation
|
||||
id:: 643bb439-b7d0-4170-9417-cd900062bfbd
|
||||
hl-color:: green
|
||||
- entail: to involve; to require; to make necessary
|
||||
hl-page:: 586
|
||||
ls-type:: annotation
|
||||
id:: 643bb533-1fee-4b9a-9965-7a63016d5591
|
||||
hl-color:: green
|
||||
- ceremonious: observant of etiquette; formal
|
||||
hl-page:: 587
|
||||
ls-type:: annotation
|
||||
id:: 643bb6e8-6466-4574-aa04-4ea25b3e9034
|
||||
hl-color:: green
|
||||
- cease: to stop; to terminate; to end
|
||||
ls-type:: annotation
|
||||
hl-page:: 595
|
||||
hl-color:: green
|
||||
id:: 643bc4b4-dc22-471c-9229-558a42904cc8
|
||||
- ## Flash-based SSDs
|
||||
ls-type:: annotation
|
||||
hl-page:: 595
|
||||
hl-color:: yellow
|
||||
id:: 643ba369-83df-42f9-9ee9-b45d4652e8fb
|
||||
collapsed:: true
|
||||
- Storing a Single Bit
|
||||
ls-type:: annotation
|
||||
hl-page:: 595
|
||||
hl-color:: yellow
|
||||
id:: 643bce2b-3e0d-4860-a7fd-34c0b0565fe2
|
||||
- Flash chips are designed to store one or more bits in a single transistor; the level of charge trapped within the transistor is mapped to a binary value. Examples: SLC (one bit per cell: 0, 1), MLC (two bits: 00, 01, 10, 11), TLC (three bits), and even QLC (four bits).
|
||||
hl-page:: 595
|
||||
ls-type:: annotation
|
||||
id:: 643bce49-47a3-43c6-ae82-825fd5224dd4
|
||||
hl-color:: yellow
|
||||
- From Bits to Banks
|
||||
ls-type:: annotation
|
||||
hl-page:: 596
|
||||
hl-color:: yellow
|
||||
id:: 643bcf1a-e41d-4ad2-83a3-ed550f9be123
|
||||
- page: a few KB in size
|
||||
- block (erase block): hundreds of KB, consists of many pages
|
||||
- bank/plane: flash chips are organized into banks/planes, consisting of a large number of cells.
|
||||
- Basic Flash Operations
|
||||
ls-type:: annotation
|
||||
hl-page:: 597
|
||||
hl-color:: yellow
|
||||
id:: 643bcf8f-05db-476d-8b73-e9a052d91e4d
|
||||
- **Read** (a page): ==Any page==; Fast; Access any location ==uniformly quickly==
|
||||
- **Erase** (a ==block==): Before writing to a page, the page's enclosing block must be *erased* (all bits set to 1). ==Expensive==. A flash block ==wears out== as it is repeatedly erased and programmed.
|
||||
- **Program** (a page): Once a block has been erased, it can be *programmed* by page, changing some of the 1s to 0s in order to write the desired content. Slower than *read*, but faster than *erase*.
|
||||
- One way to think about flash chips is that each page has a state associated with it, namely INVALID, VALID and ERASED.
|
||||
hl-page:: 597
|
||||
ls-type:: annotation
|
||||
id:: 643bd219-634d-4c9a-abf7-e266b5b3c2d7
|
||||
hl-color:: yellow
|
||||
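- The three operations and the per-page states above can be sketched as a tiny simulation. This is only an illustrative model, not real device code: the geometry (`PAGES_PER_BLOCK`, `PAGE_SIZE`) and all function names are assumptions made up for this note.
```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGES_PER_BLOCK 4   /* assumed tiny geometry, for illustration only */
#define PAGE_SIZE 8

typedef enum { PAGE_INVALID, PAGE_ERASED, PAGE_VALID } page_state_t;

typedef struct {
    page_state_t state[PAGES_PER_BLOCK];
    uint8_t data[PAGES_PER_BLOCK][PAGE_SIZE];
    unsigned erase_count;   /* every erase wears the block a little */
} flash_block_t;

/* A fresh block has unknown contents, so every page starts INVALID. */
void block_init(flash_block_t *b) {
    memset(b, 0, sizeof *b);
    for (int i = 0; i < PAGES_PER_BLOCK; i++)
        b->state[i] = PAGE_INVALID;
}

/* Erase works on the whole block: all bits become 1, all pages ERASED. */
void block_erase(flash_block_t *b) {
    memset(b->data, 0xFF, sizeof b->data);
    for (int i = 0; i < PAGES_PER_BLOCK; i++)
        b->state[i] = PAGE_ERASED;
    b->erase_count++;
}

/* Program works on one page and is only legal on an ERASED page. */
bool page_program(flash_block_t *b, int page, const uint8_t *src) {
    if (page < 0 || page >= PAGES_PER_BLOCK || b->state[page] != PAGE_ERASED)
        return false;
    memcpy(b->data[page], src, PAGE_SIZE);
    b->state[page] = PAGE_VALID;
    return true;
}

/* Read works on any single page, but only VALID pages hold meaningful data. */
bool page_read(const flash_block_t *b, int page, uint8_t *dst) {
    if (page < 0 || page >= PAGES_PER_BLOCK || b->state[page] != PAGE_VALID)
        return false;
    memcpy(dst, b->data[page], PAGE_SIZE);
    return true;
}
```
- In this model, overwriting a page in place is impossible: a second `page_program` on the same page fails until `block_erase` has reset the whole block, which also discards the block's other pages.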
- Reliability Problem
|
||||
- Wear out
|
||||
- when a flash block is erased and programmed, it slowly accrues a little bit of extra charge. Over time, as that extra charge builds up, it becomes increasingly difficult to differentiate between a 0 and a 1
|
||||
ls-type:: annotation
|
||||
hl-page:: 599
|
||||
hl-color:: yellow
|
||||
id:: 643bd3c4-c868-44fb-bd96-1ac7f3fe14c0
|
||||
- Disturbance
|
||||
- When accessing a particular page within a flash, it is possible that some bits get flipped in neighboring pages
|
||||
ls-type:: annotation
|
||||
hl-page:: 599
|
||||
hl-color:: yellow
|
||||
id:: 643bd3e8-304f-4e43-93a6-a8630df283b0
|
||||
- Most SSDs will write pages in order (i.e., low to high), reducing reliability problems related to program disturbance.
|
||||
ls-type:: annotation
|
||||
hl-page:: 603
|
||||
hl-color:: yellow
|
||||
id:: 643bd8fd-99ed-4d7b-adaa-50be9ee619dc
|
||||
- Flash Translation Layer (FTL)
|
||||
hl-page:: 600
|
||||
ls-type:: annotation
|
||||
id:: 643bd544-e923-48b0-a513-2e8d3753e0c2
|
||||
hl-color:: yellow
|
||||
- The FTL turns client reads and writes into internal flash operations, i.e., it accepts requests on logical blocks and issues low-level commands on the underlying physical blocks and pages.
|
||||
- **write amplification**: The total traffic issued to the flash chips by FTL $\div$ the total traffic issued by the client.
|
||||
hl-page:: 600
|
||||
ls-type:: annotation
|
||||
id:: 643bd5c6-9fbd-4bac-a71a-0e86a73b7ce2
|
||||
hl-color:: yellow
|
||||
- Goal: More parallelism, Less write amplification, Reduce wear out, Minimize program disturbance
|
||||
- Direct mapped FTL
|
||||
hl-page:: 601
|
||||
ls-type:: annotation
|
||||
id:: 643bd69a-5cc5-4ce1-97c9-805f422a0562
|
||||
hl-color:: yellow
|
||||
- A logical page is mapped directly to a physical page.
|
||||
- Bad idea. Writes are slow and cause severe write amplification, because updating a single page requires reading, erasing, and reprogramming the whole enclosing block. Frequently overwritten blocks also wear out quickly.
|
||||
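- A rough arithmetic sketch of why direct mapping amplifies writes so badly, using the write amplification ratio defined earlier. The page size and pages-per-block values in the usage note are assumptions chosen for round numbers, and the function names are hypothetical.
```c
#include <stdint.h>

/* Write amplification = bytes written to flash / bytes written by the client. */
double write_amplification(uint64_t flash_bytes, uint64_t client_bytes) {
    if (client_bytes == 0)
        return 0.0;
    return (double)flash_bytes / (double)client_bytes;
}

/* Direct-mapped update of one page: the whole enclosing block is
 * read, erased, and programmed again, so flash traffic is one block. */
double direct_mapped_amplification(uint32_t page_bytes, uint32_t pages_per_block) {
    uint64_t client = page_bytes;
    uint64_t flash = (uint64_t)page_bytes * pages_per_block;
    return write_amplification(flash, client);
}
```
- For example, with 4 KB pages and 64 pages per block, `direct_mapped_amplification(4096, 64)` gives 64: every 4 KB client write costs 256 KB of programming, plus an erase.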
- Log-Structured FTL
|
||||
ls-type:: annotation
|
||||
hl-page:: 602
|
||||
hl-color:: yellow
|
||||
id:: 643bd777-947b-4627-844b-b84fd5573657
|
||||
- Upon a write to logical block N , the device appends the write to the next free spot in the currently-being-written-to block.
|
||||
hl-page:: 602
|
||||
ls-type:: annotation
|
||||
id:: 643bd89f-4561-4b1e-94fa-9bd46914d870
|
||||
hl-color:: yellow
|
||||
- To allow for subsequent reads of block N , the device keeps a mapping table which stores the physical address of each logical block in the system.
|
||||
ls-type:: annotation
|
||||
hl-page:: 602
|
||||
hl-color:: yellow
|
||||
id:: 643bd8de-7dda-4882-ab7c-bbbe75f2a925
|
||||
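- The append-and-remap behaviour described above can be sketched as follows. This is a simplified model under stated assumptions: sizes are tiny, one 32-bit word stands in for a page's contents, there is no erase or garbage collection, and all names (`ftl_t`, `ftl_write`, `ftl_read`) are hypothetical.
```c
#include <stdint.h>

#define NUM_LOGICAL  8      /* assumed number of logical blocks */
#define NUM_PHYSICAL 16     /* assumed number of physical pages */
#define UNMAPPED     (-1)

typedef struct {
    int32_t  map[NUM_LOGICAL];      /* mapping table: logical block -> physical page */
    uint32_t flash[NUM_PHYSICAL];   /* one word stands in for a page's contents */
    int32_t  next_free;             /* head of the log */
} ftl_t;

void ftl_init(ftl_t *f) {
    for (int i = 0; i < NUM_LOGICAL; i++)
        f->map[i] = UNMAPPED;
    for (int i = 0; i < NUM_PHYSICAL; i++)
        f->flash[i] = 0;
    f->next_free = 0;
}

/* Append the write at the log head and repoint the mapping.
 * Returns the physical page used, or -1 if the log is full
 * or the logical address is out of range. */
int32_t ftl_write(ftl_t *f, int32_t logical, uint32_t value) {
    if (logical < 0 || logical >= NUM_LOGICAL || f->next_free >= NUM_PHYSICAL)
        return -1;
    int32_t phys = f->next_free++;
    f->flash[phys] = value;
    f->map[logical] = phys;     /* any older copy is now garbage */
    return phys;
}

/* Look up the mapping table to serve a read.
 * Returns 0 on success, -1 if the block was never written. */
int ftl_read(const ftl_t *f, int32_t logical, uint32_t *out) {
    if (logical < 0 || logical >= NUM_LOGICAL || f->map[logical] == UNMAPPED)
        return -1;
    *out = f->flash[f->map[logical]];
    return 0;
}
```
- Overwriting a logical block never touches its old physical page; the old copy simply becomes dead, which is what makes garbage collection necessary.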
- Garbage Collection
|
||||
ls-type:: annotation
|
||||
hl-page:: 604
|
||||
hl-color:: yellow
|
||||
id:: 643bdcba-9e46-4dfc-8366-6472c734abdb
|
||||
- Find a block that contains dead pages, read its live pages, write those live pages to the log, and reclaim the entire block.
|
||||
id:: 643bdcd2-053a-4c18-bf9a-393fd367ebef
|
||||
- GC can be ==expensive==, requiring reading and rewriting of live data. The ideal candidate for reclamation is a ==block that consists of only dead pages==.
|
||||
- overprovisioning: by adding extra flash capacity, cleaning can be delayed and pushed to the background
|
||||
hl-page:: 606
|
||||
ls-type:: annotation
|
||||
id:: 643bdd25-8551-40be-9072-2cc3342f6c42
|
||||
hl-color:: yellow
|
||||
- **trim** operation: informs the FTL that a logical block has been deleted, so the device no longer needs to track it.
|
||||
hl-page:: 606
|
||||
ls-type:: annotation
|
||||
id:: 643bde06-ad8f-42a7-a322-85d8b511d56e
|
||||
hl-color:: yellow
|
||||
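- A minimal sketch of victim selection for garbage collection, following the idea above that the best candidate is a block with only dead pages. It assumes every block is fully written, so a page is either live or dead; the greedy "fewest live pages" policy and all names are assumptions for illustration, not the policy of any particular device.
```c
#include <stdbool.h>

#define GC_BLOCKS 4
#define GC_PAGES_PER_BLOCK 4

/* live[b][p] is true when page p of block b still holds current data. */
typedef struct {
    bool live[GC_BLOCKS][GC_PAGES_PER_BLOCK];
} gc_state_t;

/* Number of live pages in a block = pages GC must copy before erasing it. */
int gc_live_pages(const gc_state_t *s, int block) {
    int n = 0;
    for (int p = 0; p < GC_PAGES_PER_BLOCK; p++)
        if (s->live[block][p])
            n++;
    return n;
}

/* Pick the block with the fewest live pages among blocks that contain
 * at least one dead page. Returns -1 when nothing can be reclaimed. */
int gc_pick_victim(const gc_state_t *s) {
    int best = -1;
    int best_live = GC_PAGES_PER_BLOCK;   /* a fully live block never qualifies */
    for (int b = 0; b < GC_BLOCKS; b++) {
        int n = gc_live_pages(s, b);
        if (n < best_live) {
            best = b;
            best_live = n;
        }
    }
    return best;
}
```
- A victim with zero live pages can be erased immediately with no copying, while a victim with many live pages adds that many extra page writes to the log.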
- Mapping Table Size
|
||||
ls-type:: annotation
|
||||
hl-page:: 606
|
||||
hl-color:: yellow
|
||||
id:: 643bdf64-fb3a-4417-8dc8-3cc736841285
|
||||
- Page-level mapping takes up too much space
|
||||
- Block-Based Mapping
|
||||
ls-type:: annotation
|
||||
hl-page:: 606
|
||||
hl-color:: yellow
|
||||
id:: 643bdf97-2eae-45e1-b6be-c93c7c47112b
|
||||
- Block-level mapping is akin to using a larger page size in virtual memory: the mapping unit grows from a page to a block, so the table shrinks accordingly.
|
||||
- Terrible performance under a log-structured scheme. Even if a write is small (a single page), the FTL has to read the live pages from the old block and write the whole updated block to the log. This leads to severe write amplification.
|
||||
- Hybrid Mapping
|
||||
ls-type:: annotation
|
||||
hl-page:: 608
|
||||
hl-color:: yellow
|
||||
id:: 643be2e5-4f61-4aa5-9885-e0fc862c3df6
|
||||
- **log table**: FTL keeps a few blocks erased and directs all writes to them, and keeps per-page mappings for these *log blocks*.
|
||||
- **data table**: per-block mappings
|
||||
- When looking for a logical address, FTL first consults the *log table*, and consults the *data table* if not found.
|
||||
- To keep the log table small, the FTL has to periodically examine the *log blocks* and switch them into *data blocks* (which can be pointed to by a single block-level mapping). There are three cases, depending on the contents of the log block; see the example in the book for details.
|
||||
- switch merge: all pages in the log block belong to the same logical block and sit at the right offsets, so the log block can become a data block as is
|
||||
hl-page:: 609
|
||||
ls-type:: annotation
|
||||
id:: 643be6ec-00bb-413f-96a1-7268f5b01709
|
||||
hl-color:: yellow
|
||||
- partial merge: only some pages of a logical block are in the log block, so the FTL must copy the remaining pages from the old data block into it to form a complete data block
|
||||
hl-page:: 610
|
||||
ls-type:: annotation
|
||||
id:: 643be6f3-d351-4b04-a201-03dda410950d
|
||||
hl-color:: yellow
|
||||
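- full merge: the pages in the log block belong to different logical blocks, so the FTL must gather pages from many other blocks to clean it. This is very expensive and should be avoided where possible.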
|
||||
hl-page:: 610
|
||||
ls-type:: annotation
|
||||
id:: 643be6f7-4656-4f93-a208-88a6fa9be6e0
|
||||
hl-color:: yellow
|
||||
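- The two-level lookup described above (log table first, then data table) might look roughly like this. The table sizes, the linear scans, and all type and function names are assumptions made for this sketch; a real FTL would use faster structures.
```c
#include <stdint.h>

#define HY_PAGES_PER_BLOCK 4
#define HY_LOG_ENTRIES     4
#define HY_DATA_ENTRIES    4
#define HY_NOT_FOUND       (-1)

/* Log table entry: per-page mapping for pages living in log blocks. */
typedef struct { int32_t logical_page; int32_t physical_page; } log_entry_t;

/* Data table entry: per-block mapping for whole data blocks. */
typedef struct { int32_t logical_block; int32_t physical_block; } data_entry_t;

typedef struct {
    log_entry_t  log[HY_LOG_ENTRIES];
    int          log_count;
    data_entry_t data[HY_DATA_ENTRIES];
    int          data_count;
} hybrid_ftl_t;

/* Translate a logical page number to a physical page number. */
int32_t hybrid_lookup(const hybrid_ftl_t *f, int32_t logical_page) {
    /* 1. Log table: scan newest-first so the latest write wins. */
    for (int i = f->log_count - 1; i >= 0; i--)
        if (f->log[i].logical_page == logical_page)
            return f->log[i].physical_page;

    /* 2. Data table: split the address into block number and offset. */
    int32_t lblock = logical_page / HY_PAGES_PER_BLOCK;
    int32_t offset = logical_page % HY_PAGES_PER_BLOCK;
    for (int i = 0; i < f->data_count; i++)
        if (f->data[i].logical_block == lblock)
            return f->data[i].physical_block * HY_PAGES_PER_BLOCK + offset;

    return HY_NOT_FOUND;
}
```
- With logical block 250 mapped to physical block 2, logical page 1002 resolves to physical page 10 through the data table; once page 1000 is rewritten into a log block, its log-table entry shadows the data-table mapping.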
- Page Mapping Plus Caching
|
||||
hl-page:: 610
|
||||
ls-type:: annotation
|
||||
id:: 643be86a-9bd8-4962-92ac-76832cc93a6c
|
||||
hl-color:: yellow
|
||||
collapsed:: true
|
||||
- As with paging in virtual memory, keep only a small active set of the page-level mappings in memory.
|
||||
- If the working set fits in the cache, this approach works well. Otherwise every miss costs an extra flash read (plus a write if the evicted mapping is dirty), and frequent eviction hurts performance.
|
||||
- Wear Leveling
|
||||
ls-type:: annotation
|
||||
hl-page:: 611
|
||||
hl-color:: yellow
|
||||
id:: 643be88d-4648-4dc3-8f5a-fc7c45fa144a
|
||||
collapsed:: true
|
||||
- Spread erase/program across the blocks of the device evenly.
|
||||
- The log-structured approach does most of the work toward this goal, but one problem remains: blocks filled with long-lived data rarely get overwritten and thus do not receive their fair share of the write load.
|
||||
- One simple solution is to periodically move such long-lived data elsewhere so that its blocks become available for writing, but this increases write amplification.
|
||||
- SSD Performance
|
||||
ls-type:: annotation
|
||||
hl-page:: 611
|
||||
hl-color:: yellow
|
||||
id:: 643bdf55-53c3-407f-b87f-86b3d8f1141b
|
||||
- SSDs outperform HDDs dramatically in random I/O, while the difference is much smaller in sequential I/O.
|
||||
- On an SSD, random reads are slower than random writes, because the log-structured design turns random writes into sequential ones.
|
||||
- accrue: to increase gradually; to accumulate
|
||||
hl-page:: 599
|
||||
ls-type:: annotation
|
||||
id:: 643bd3a4-af24-4e7f-905b-f3c3a8739831
|
||||
hl-color:: green
|
||||
- rigid: inflexible; stiff
|
||||
hl-page:: 600
|
||||
ls-type:: annotation
|
||||
id:: 643bd351-d4f4-406a-9910-f44ab31bc83f
|
||||
hl-color:: green
|
||||
- ## Data Integrity and Protection
|
||||
ls-type:: annotation
|
||||
hl-page:: 619
|
||||
hl-color:: yellow
|
||||
id:: 643ba392-acd9-4255-930e-a97f94fb28ef
|
||||
- spouse
|
||||
collapsed:: true
|
||||
- Disk Failure Modes
|
||||
ls-type:: annotation
|
||||
hl-page:: 619
|
||||
hl-color:: yellow
|
||||
id:: 643bec95-40fd-4df9-9981-1f6d641ec520
|
||||
- Latent-sector errors
|
||||
- LSEs arise when a disk sector (or group of sectors) has been damaged in some way.
|
||||
ls-type:: annotation
|
||||
hl-page:: 620
|
||||
hl-color:: yellow
|
||||
id:: 643beca7-e6d1-4a17-93ec-d7445eee92c1
|
||||
- Head crash (disk head somehow touches the surface and damages it) or Cosmic rays!
|
||||
- Can be detected or even corrected by in-disk ECC (error correcting code).
|
||||
- Block Corruption
|
||||
- Not detectable by the disk itself. Silent faults
|
||||
- Buggy firmware, faulty bus
|
||||
- Handling Latent Sector Errors
|
||||
ls-type:: annotation
|
||||
hl-page:: 621
|
||||
hl-color:: yellow
|
||||
id:: 643bed56-bc80-4332-b799-933755811759
|
||||
- Since LSEs can be ==easily detected==, the storage system simply uses whatever ==redundancy mechanism== it has (e.g., a mirror copy or RAID parity) ==to recover== the lost data.
|
||||
- Detecting Corruption: The Checksum
|
||||
ls-type:: annotation
|
||||
hl-page:: 622
|
||||
hl-color:: yellow
|
||||
id:: 643beee5-af3c-44c2-bf55-716c0a4ce0c4
|
||||
- A checksum function takes a chunk of data as input and produces ==a small summary of the data==, which is the checksum. The system detects corruption by ==re-computing== the checksum on read ==and matching== it against the stored value.
|
||||
- Common Checksum Functions
|
||||
ls-type:: annotation
|
||||
hl-page:: 623
|
||||
hl-color:: yellow
|
||||
id:: 643befcd-f69a-4c19-bb76-21d8945d4cc8
|
||||
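- XOR: simple, but limited. If an even number of bits flip in the same bit position across the checksummed chunks, the flips cancel out and the corruption goes undetected.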
|
||||
- 2's complement addition (ignoring overflow): detects many changes, but is vulnerable to shifted data
|
||||
- Fletcher checksum: almost as strong as the CRC, detecting all single-bit, double-bit errors, and many burst errors
|
||||
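- ```C
#include <stdint.h>

/* Fletcher-16: two running sums over the bytes, each reduced modulo 255. */
uint16_t Fletcher16( const uint8_t *data, int count )
{
    uint16_t sum1 = 0;  /* running sum of the data bytes */
    uint16_t sum2 = 0;  /* running sum of the successive sum1 values */
    int index;
    for ( index = 0; index < count; ++index ) {
        sum1 = (sum1 + data[index]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    /* sum2 goes in the high byte, sum1 in the low byte */
    return (uint16_t)((sum2 << 8) | sum1);
}
```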
|
||||
- CRC: Treat the data block `D` as a large binary number and divide it by an agreed value `k`. The remainder is the CRC value.
|
||||
- No checksum is perfect: because the checksum is much smaller than the data, collisions (non-identical data producing identical checksums) are always possible.
|
||||
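- To make the XOR weakness and the CRC idea above concrete, here is a small sketch. Two choices are assumptions not taken from the notes: the XOR checksum works on single bytes to keep the example short, and the CRC uses the 8-bit polynomial `0x07` with initial value 0, picked only for brevity.
```c
#include <stddef.h>
#include <stdint.h>

/* XOR checksum: XOR all bytes together. */
uint8_t xor_checksum(const uint8_t *data, size_t len) {
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum ^= data[i];
    return sum;
}

/* Bitwise CRC-8: the remainder of dividing the data (as a binary
 * polynomial) by x^8 + x^2 + x + 1, i.e. polynomial 0x07, initial value 0. */
uint8_t crc8(const uint8_t *data, size_t len) {
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80)
                crc = (uint8_t)((crc << 1) ^ 0x07);
            else
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}
```
- Flipping the lowest bit of two different bytes leaves `xor_checksum` unchanged, because the two flips cancel, while `crc8` changes for the same corruption.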
- Checksum Layout
|
||||
ls-type:: annotation
|
||||
hl-page:: 624
|
||||
hl-color:: yellow
|
||||
id:: 643bf039-a6cb-475c-b990-df21d8f3919f
|
||||
- If supported by the drive manufacturer, one solution is to format the drive with 520-byte sectors: 512 bytes of data plus an 8-byte checksum per sector.
|
||||
- Another solution: the FS packs several checksums into one 512-byte sector, followed by the data sectors those checksums cover.
|
||||
- Using Checksums: compare *stored checksum* and *computed checksum*
|
||||
hl-page:: 625
|
||||
ls-type:: annotation
|
||||
id:: 643bf2c6-9c4f-44a8-bcc5-0af3570b64be
|
||||
hl-color:: yellow
|
||||
- Misdirected Writes
|
||||
ls-type:: annotation
|
||||
hl-page:: 626
|
||||
hl-color:: yellow
|
||||
id:: 643bf2f7-fc6a-4289-a50b-784e6a765eb9
|
||||
- Disk/RAID controllers write the data to disk correctly but ==in the wrong location==. A checksum alone won't help in this situation.
|
||||
hl-page:: 626
|
||||
ls-type:: annotation
|
||||
id:: 643bf30b-6fcf-4d5f-9c13-dd29d4284f63
|
||||
hl-color:: yellow
|
||||
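- Add a *physical ID* (e.g., disk number and sector offset) to each checksum. On a read, the stored ID is compared with the location actually being read, which exposes a block that landed in the wrong place even though its data and checksum match.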
|
||||
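- A minimal sketch of the check described above: first the checksum, then the physical ID. The metadata layout, the additive stand-in checksum, and all names are assumptions for illustration only.
```c
#include <stddef.h>
#include <stdint.h>

/* Per-block metadata stored alongside the data. */
typedef struct {
    uint16_t checksum;  /* checksum over the block's data */
    uint32_t disk;      /* physical ID: which disk this block belongs on */
    uint32_t sector;    /* physical ID: which sector it belongs at */
} block_meta_t;

typedef enum { BLOCK_OK, BLOCK_CORRUPT, BLOCK_MISDIRECTED } block_status_t;

/* Stand-in checksum: plain byte sum, truncated to 16 bits. */
static uint16_t simple_sum(const uint8_t *data, size_t len) {
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint16_t)(sum + data[i]);
    return sum;
}

/* Build the metadata at write time for the intended location. */
block_meta_t make_meta(const uint8_t *data, size_t len,
                       uint32_t disk, uint32_t sector) {
    block_meta_t m = { simple_sum(data, len), disk, sector };
    return m;
}

/* Verify at read time: (disk, sector) is where the block was read from. */
block_status_t verify_block(const uint8_t *data, size_t len,
                            const block_meta_t *meta,
                            uint32_t disk, uint32_t sector) {
    if (simple_sum(data, len) != meta->checksum)
        return BLOCK_CORRUPT;
    if (meta->disk != disk || meta->sector != sector)
        return BLOCK_MISDIRECTED;
    return BLOCK_OK;
}
```
- A block written for disk 1, sector 4 but read back from sector 5 passes the checksum test and is caught only by the ID comparison.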
- Lost Writes
|
||||
ls-type:: annotation
|
||||
hl-page:: 627
|
||||
hl-color:: yellow
|
||||
id:: 643bf3ff-eeec-4c89-b385-6a104d0596bd
|
||||
- The device informs the upper layer that a write is ==completed but in fact not persisted==. Checksum won't help, since the new checksum does not get to disk either.
|
||||
hl-page:: 627
|
||||
ls-type:: annotation
|
||||
id:: 643bf40f-573a-4004-9b3f-443502a7a198
|
||||
hl-color:: yellow
|
||||
- Solutions: perform a write verify (read-after-write), though this is slow because every write needs an extra read; or add a checksum elsewhere in the system to detect lost writes.
|
||||
- Disk Scrubbing
|
||||
hl-page:: 628
|
||||
ls-type:: annotation
|
||||
id:: 643bf592-4dec-43c4-b8ff-996d765e071b
|
||||
hl-color:: yellow
|
||||
- Most data is rarely accessed and would therefore stay unchecked, which hurts reliability.
|
||||
- Many systems use disk scrubbing (i.e., periodically reading through every block and verifying its checksum)
|
||||
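- A scrubbing pass can be sketched as a loop over all blocks. The in-memory block array, the one-byte XOR checksum, and the function names are assumptions to keep the example short; a real scrubber would read from disk and also trigger repair.
```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in checksum for the sketch: XOR of all bytes in the block. */
static uint8_t scrub_xor(const uint8_t *data, size_t len) {
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum ^= data[i];
    return sum;
}

/* Walk every block, recompute its checksum, and count the blocks
 * whose computed checksum no longer matches the stored one. */
size_t scrub_pass(const uint8_t *blocks, size_t nblocks, size_t block_size,
                  const uint8_t *stored) {
    size_t bad = 0;
    for (size_t b = 0; b < nblocks; b++) {
        const uint8_t *block = blocks + b * block_size;
        if (scrub_xor(block, block_size) != stored[b])
            bad++;
    }
    return bad;
}
```
- Running such a pass periodically finds corruption in cold data before the only remaining good copy is also lost.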
- Overheads Of Checksumming
|
||||
hl-page:: 628
|
||||
ls-type:: annotation
|
||||
id:: 643bf4f5-4ec4-4b42-962d-8c3a7729b64e
|
||||
hl-color:: yellow
|
||||
- Space: disk (take up user data space) and memory (mostly short-lived, not a problem)
|
||||
- Time: CPU (has to compute through the data) and IO (checksum stored elsewhere, or scrubbing)
|
||||
- CPU overheads can be reduced by combining data copying and checking, since copy is needed anyhow
|
||||
- beverage: a drink (other than water)
|
||||
hl-page:: 623
|
||||
ls-type:: annotation
|
||||
id:: 643befc3-80a8-40de-a3b9-c994a90c0f0a
|
||||
hl-color:: green
|
||||
- scrub: to scour; to brush clean; low bushes
|
||||
hl-page:: 627
|
||||
ls-type:: annotation
|
||||
id:: 643bf4d3-df61-4530-928f-ed524699c44f
|
||||
hl-color:: green
|
||||
- spouse: husband or wife
|
||||
ls-type:: annotation
|
||||
hl-page:: 633
|
||||
hl-color:: green
|
||||
id:: 643ba3b2-5a2a-4589-a871-62ad213de195
|
||||
- mandate
|
||||
- levity: frivolous behavior; flippancy
|
||||
hl-page:: 633
|
||||
ls-type:: annotation
|
||||
hl-page:: 586
|
||||
id:: 643bfdfa-6681-4fcc-b7c1-b84887afeecd
|
||||
hl-color:: green
|
||||
id:: 643bb439-b7d0-4170-9417-cd900062bfbd
|
||||
- entail
|
||||
- sarcastic: mocking; ironic
|
||||
hl-page:: 634
|
||||
ls-type:: annotation
|
||||
hl-page:: 586
|
||||
id:: 643bfe9d-913f-4ab7-aba3-a3fac83d1dfb
|
||||
hl-color:: green
|
||||
id:: 643bb533-1fee-4b9a-9965-7a63016d5591
|
||||
- ceremonious
|
||||
- scribble: to jot down hastily; to scrawl; careless handwriting
|
||||
hl-page:: 634
|
||||
ls-type:: annotation
|
||||
hl-page:: 587
|
||||
hl-color:: green
|
||||
id:: 643bb6e8-6466-4574-aa04-4ea25b3e9034
|
||||
- cease
|
||||
ls-type:: annotation
|
||||
hl-page:: 595
|
||||
hl-color:: green
|
||||
id:: 643bc4b4-dc22-471c-9229-558a42904cc8
|
||||
id:: 643bfeb8-34d1-428d-82d5-0bfefb871d4e
|
||||
hl-color:: green
|
||||