file:: [ostep_1681115599584_0.pdf](../assets/ostep_1681115599584_0.pdf)
|
||
file-path:: ../assets/ostep_1681115599584_0.pdf
|
||
|
||
- # Part II
|
||
- ## thread
|
||
ls-type:: annotation
|
||
hl-page:: 311
|
||
hl-color:: yellow
|
||
id:: 6433ca28-1bdf-433d-8ed9-0d54bf5ba940
|
||
- share the same address space and thus can access the same data
|
||
- context switch: the address space remains the same
|
||
hl-page:: 311
|
||
ls-type:: annotation
|
||
id:: 6433cb70-d168-4863-8268-1e969df6ce06
|
||
hl-color:: yellow
|
||
- thread control blocks
|
||
ls-type:: annotation
|
||
hl-page:: 311
|
||
hl-color:: yellow
|
||
id:: 6433cb56-fbef-46da-83c2-13fa2dba2967
|
||
- thread-local storage: one stack per thread in the address space
|
||
hl-page:: 312
|
||
ls-type:: annotation
|
||
id:: 6433cba2-61bd-4549-a29f-2ad85b3e30cd
|
||
hl-color:: yellow
|
||
- Why thread?
|
||
- possible speedup through parallelization
|
||
- enable overlap of IO in a single program
|
||
- Though both could be achieved with multiple processes, threads make sharing data easier.
|
||
- KEY CONCURRENCY TERMS
|
||
ls-type:: annotation
|
||
hl-page:: 323
|
||
hl-color:: yellow
|
||
id:: 6433eabf-48d6-4776-b66f-a5f7804d1ddc
|
||
collapsed:: true
|
||
- **indeterminate**: the result depends on the timing of the code's execution, so it can differ from run to run.
|
||
- race condition
|
||
ls-type:: annotation
|
||
hl-page:: 320
|
||
hl-color:: yellow
|
||
id:: 6433e4cc-69e4-4057-8cc6-1766240d82f4
|
||
- A **critical section** is a piece of code that accesses a shared variable (or resource) and must not be concurrently executed by more than one thread.
|
||
hl-page:: 320
|
||
ls-type:: annotation
|
||
id:: 6433e52b-1f38-4f7c-b168-0aed624f9bdf
|
||
hl-color:: yellow
|
||
- **mutual exclusion**: This property guarantees that if one thread is executing within the *critical section*, the others will be prevented from doing so.
|
||
hl-page:: 320
|
||
ls-type:: annotation
|
||
id:: 6433e566-e6ef-45b3-84b1-eba981be914a
|
||
hl-color:: yellow
|
||
- Atomicity: *as a unit*, or, *all or none*
|
||
hl-page:: 321
|
||
ls-type:: annotation
|
||
id:: 6433e6a1-407c-4936-b184-dee868ef4107
|
||
hl-color:: yellow
|
||
- synchronization primitives
|
||
ls-type:: annotation
|
||
hl-page:: 322
|
||
hl-color:: yellow
|
||
id:: 6433e729-7043-453b-8d60-6e6c41560543
|
||
- sane: of sound mind; sensible; reasonable
|
||
ls-type:: annotation
|
||
hl-page:: 322
|
||
hl-color:: green
|
||
id:: 6433e6e7-d995-4b69-96b3-261b79f94c1d
|
||
- Thread API
|
||
hl-page:: 327
|
||
ls-type:: annotation
|
||
id:: 6433f35b-403b-4b25-b9f9-076e9e34777e
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- `pthread_create` `pthread_join` `pthread_mutex_lock` `pthread_cond_*`
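- A minimal sketch of the create/join half of the API listed above. The worker `square` and the helper `sum_of_squares` are hypothetical names for illustration; error handling is reduced to returning -1.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

// Worker: receives its argument through void* and returns its result the same way.
static void *square(void *arg) {
    intptr_t n = (intptr_t)arg;
    return (void *)(n * n);
}

// Spawn `count` threads (at most 16), join each one, and sum their results.
// Returns -1 if count is out of range or a pthread call fails.
long sum_of_squares(int count) {
    pthread_t tids[16];
    if (count < 0 || count > 16) return -1;
    for (int i = 0; i < count; i++) {
        if (pthread_create(&tids[i], NULL, square, (void *)(intptr_t)(i + 1)) != 0)
            return -1;
    }
    long total = 0;
    for (int i = 0; i < count; i++) {
        void *ret;
        if (pthread_join(tids[i], &ret) != 0) return -1;  // blocks until thread i exits
        total += (long)(intptr_t)ret;
    }
    return total;
}
```

- `pthread_join` is what makes the result safe to read: the return value is only available once the thread has finished.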
|
||
- ## Locks
|
||
ls-type:: annotation
|
||
hl-page:: 339
|
||
hl-color:: yellow
|
||
id:: 6433f45b-0345-4790-8379-3d1a94e57ef5
|
||
collapsed:: true
|
||
- A lock is just a variable
|
||
hl-page:: 339
|
||
ls-type:: annotation
|
||
id:: 6433f4ba-f2e4-4743-a536-e2b7747433b7
|
||
hl-color:: yellow
|
||
- **lock variable**: some type of variable that holds the *state* of the lock (and maybe additional data such as its holder or a queue for acquisition)
|
||
- **lock state**: available (or unlocked or free); acquired (or locked or held)
|
||
- **lock routines**:
|
||
- `lock()` tries to acquire the lock. If no other thread holds it, the caller acquires the lock (becoming its owner) and enters the critical section. Otherwise, the call does not return while another thread holds the lock.
|
||
- `unlock()`: the owner of the lock calls `unlock()`, which makes the lock *available* again. If there are waiting threads, one of them will (eventually) notice (or be informed of) this change of the lock's state, acquire the lock, and enter the critical section.
|
||
- Locks help transform the chaos that is traditional OS scheduling into a more controlled activity
|
||
hl-page:: 340
|
||
ls-type:: annotation
|
||
id:: 6433f5e6-bc06-42a9-866e-e9a3053f528f
|
||
hl-color:: yellow
|
||
- Controlling Interrupts
|
||
ls-type:: annotation
|
||
hl-page:: 342
|
||
hl-color:: yellow
|
||
id:: 6433fbfd-a1bf-4fd9-a54d-e15189c77b15
|
||
- For *single-processor* systems, **disable interrupts** for critical sections.
|
||
- Problems
|
||
- Disabling interrupts is a privileged operation, so the OS must trust the calling program not to abuse it. In the worst case, a program never re-enables interrupts and the OS never regains control.
|
||
- Does NOT work on multi-processor systems: each CPU has its own interrupt state, so threads on other CPUs can still enter the critical section.
|
||
- important interrupts may get lost
|
||
- inefficient
|
||
- Just Using Loads/Stores(Fail)
|
||
hl-page:: 343
|
||
ls-type:: annotation
|
||
id:: 6433fe7e-2221-41ee-ad6b-7deaa4459aa5
|
||
hl-color:: yellow
|
||
- use a simple variable (flag) to indicate whether some thread has possession of a lock
|
||
hl-page:: 343
|
||
ls-type:: annotation
|
||
id:: 6433ff4a-856d-4e4b-af30-6cb600aefeb5
|
||
hl-color:: yellow
|
||
- On acquisition, load and test the flag. If it is free, set it; if not, spin-wait (keep loading and testing).
|
||
- On releasing, clear the flag.
|
||
- Problem
|
||
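- If a thread is interrupted after testing the flag but before setting it, another thread can also see the flag as free, and both enter the critical section: *mutual exclusion* is broken.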
|
||
- Low efficiency because of spin-waiting.
|
||
- **spin lock**
|
||
- ((6436aafd-c85f-414c-8aee-acdc71e9138e))
|
||
- Requires a preemptive scheduler (otherwise a spinning thread may spin forever) and gives NO fairness guarantee
|
||
- On single-processor systems, performance is terrible: the thread holding the lock cannot make progress toward releasing it until it is scheduled again, so every other thread waiting for the lock can do nothing but spin even when it is scheduled.
|
||
- On multi-processor systems, a spin lock may work well when thread B on CPU1 waits for thread A on CPU0 and the critical section is short. Because the lock owner keeps making progress, spinning doesn't waste many cycles.
|
||
- **Priority Inversion**: Threads with high priority wait for locks held by threads with low priority.
|
||
hl-page:: 355
|
||
ls-type:: annotation
|
||
id:: 6435099b-0834-483e-9ef2-98a0b795cf00
|
||
hl-color:: yellow
|
||
Solution: **priority inheritance** (temporarily boost the priority of the lock holder), or avoid the problem by giving all threads the same priority.
|
||
- **Test-And-Set (Atomic Exchange)**
|
||
hl-page:: 344
|
||
ls-type:: annotation
|
||
id:: 643401e0-fcec-41d3-9898-d5c4175ac464
|
||
hl-color:: yellow
|
||
- Returns the old value pointed to by the `old_ptr`, and simultaneously updates said value to `new`.
|
||
- "test" the old value (which is what is returned) while simultaneously "set" the memory location to a new value
|
||
- ((6436af87-3f1b-4ee8-a2c8-4de0f1961f1a))
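- A sketch of a test-and-set spin lock, assuming C11 `atomic_exchange` stands in for the hardware instruction. The names `spinlock_t`, `spin_lock`, `spin_unlock` and `run_counter` are made up for this example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { atomic_int flag; } spinlock_t;   // 0 = free, 1 = held

void spin_init(spinlock_t *l)   { atomic_init(&l->flag, 0); }

// atomic_exchange returns the old value and stores 1 in one atomic step:
// seeing 0 means we took the lock, seeing 1 means someone else holds it.
void spin_lock(spinlock_t *l)   { while (atomic_exchange(&l->flag, 1) == 1) { /* spin */ } }
void spin_unlock(spinlock_t *l) { atomic_store(&l->flag, 0); }

static spinlock_t lock;
static long counter;

static void *worker(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        spin_lock(&lock);
        counter++;               // critical section
        spin_unlock(&lock);
    }
    return NULL;
}

// Run up to 8 threads that each increment the shared counter per_thread times.
long run_counter(int nthreads, long per_thread) {
    pthread_t t[8];
    if (nthreads > 8) nthreads = 8;
    spin_init(&lock);
    counter = 0;
    for (int i = 0; i < nthreads; i++) pthread_create(&t[i], NULL, worker, &per_thread);
    for (int i = 0; i < nthreads; i++) pthread_join(t[i], NULL);
    return counter;
}
```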
|
||
- **Compare-And-Swap**
|
||
hl-page:: 348
|
||
ls-type:: annotation
|
||
id:: 6434f8ac-d762-40a4-abb0-2955c2c8b396
|
||
hl-color:: yellow
|
||
- Test whether the value at the address specified by `ptr` is equal to `expected`.
|
||
hl-page:: 348
|
||
ls-type:: annotation
|
||
id:: 6434fab0-08de-4f28-8d8e-f48f7e04aaaa
|
||
hl-color:: yellow
|
||
If so, update the memory location with the `new` value.
|
||
If not, do nothing.
|
||
Return the old value at the memory location.
|
||
- ((6436c5c7-32e7-4071-b909-4fdc14bb479d))
|
||
- ((b7679e9b-aabe-4bd3-8c2c-eb0a23fad491))
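- A sketch of compare-and-swap with the "return the old value" contract described above, built on C11 `atomic_compare_exchange_strong` (which itself returns a success flag). `cas_trylock` and `cas_unlock` are hypothetical helpers.

```c
#include <stdatomic.h>

// Returns the value that was at *ptr before the call.
// On success the old value equals `expected`; on failure the C11 call
// writes the current value into `old`, so `old` is correct in both cases.
int compare_and_swap(atomic_int *ptr, int expected, int new_value) {
    int old = expected;
    atomic_compare_exchange_strong(ptr, &old, new_value);
    return old;
}

// Spin until the flag changes from 0 (free) to 1 (held) under our CAS.
void cas_lock(atomic_int *flag) {
    while (compare_and_swap(flag, 0, 1) == 1) { /* spin */ }
}

// Non-blocking variant: returns 1 if the lock was taken, 0 otherwise.
int cas_trylock(atomic_int *flag) { return compare_and_swap(flag, 0, 1) == 0; }

void cas_unlock(atomic_int *flag) { atomic_store(flag, 0); }
```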
|
||
- **load-linked** and **store-conditional**
|
||
hl-page:: 349
|
||
ls-type:: annotation
|
||
id:: 6434fde1-9d19-4381-805e-f2a972875dc2
|
||
hl-color:: yellow
|
||
- The **load-linked** operates much like a typical load instruction, and simply fetches a value from memory and places it in a register.
|
||
ls-type:: annotation
|
||
hl-page:: 349
|
||
hl-color:: yellow
|
||
id:: 6434fe1c-47f3-422c-a317-be72f08d6aef
|
||
- **store-conditional** only succeeds if no intervening store to the address has taken place.
|
||
hl-page:: 349
|
||
ls-type:: annotation
|
||
id:: 6434fe62-0e92-4414-86cc-b0c37fcf51ec
|
||
hl-color:: yellow
|
||
On success, return 1 and update the value at `ptr` to `value`.
|
||
On failure, return 0 and the value at `ptr` is not updated.
|
||
- ((6436c620-4884-45a7-9273-b7952a6521ae))
|
||
- ((c38274a9-22dd-40e2-b74a-d3a9be63600e))
|
||
- **Fetch-And-Add**
|
||
ls-type:: annotation
|
||
hl-page:: 350
|
||
hl-color:: yellow
|
||
id:: 64350170-c853-4080-9ed1-2777ea3a18c8
|
||
- Atomically increments a value while returning the old value at a particular address
|
||
- ((6436c66c-807b-4e9d-93ed-b1d9703e6dc2))
|
||
- **ticket lock**
|
||
hl-page:: 351
|
||
ls-type:: annotation
|
||
id:: 64350331-8fbb-4c41-9ac1-1a4ba852f772
|
||
hl-color:: yellow
|
||
- ((6436af5c-0000-4bfb-9a27-1d7cf0a830db))
|
||
- Ensures progress for all threads. Once a thread is assigned its ticket value, it will be scheduled at some point in the future (i.e. it will definitely get its turn as `unlock()` operations increase the global `turn` value).
|
||
hl-page:: 351
|
||
ls-type:: annotation
|
||
id:: 64350420-ca8a-4cac-af2f-f4e7deb5d1be
|
||
hl-color:: yellow
|
||
In contrast, a thread spinning on a test-and-set lock may starve if it is very unlucky (it never wins the contention).
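- A sketch of the ticket lock, assuming C11 `atomic_fetch_add` plays the role of fetch-and-add. `ticket_acquire` returns the ticket it drew only so the ordering is visible; that return value is an addition for this example.

```c
#include <stdatomic.h>

typedef struct {
    atomic_uint ticket;  // next ticket to hand out
    atomic_uint turn;    // ticket that is currently allowed in
} ticket_lock_t;

void ticket_init(ticket_lock_t *l) {
    atomic_init(&l->ticket, 0);
    atomic_init(&l->turn, 0);
}

// Draw a ticket atomically, then spin until it is our turn.
unsigned ticket_acquire(ticket_lock_t *l) {
    unsigned my_ticket = atomic_fetch_add(&l->ticket, 1);
    while (atomic_load(&l->turn) != my_ticket) { /* spin */ }
    return my_ticket;
}

// Advancing `turn` hands the lock to the holder of the next ticket.
void ticket_release(ticket_lock_t *l) { atomic_fetch_add(&l->turn, 1); }
```

- Because tickets are handed out in order, waiting threads enter in FIFO order, which is where the progress guarantee comes from.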
|
||
- Simple **Yield Lock**
|
||
hl-page:: 353
|
||
ls-type:: annotation
|
||
id:: 64350781-6995-41db-8b8e-2de0eb84136a
|
||
hl-color:: yellow
|
||
- `yield`: a system call that moves the caller from the running state to the ready state, and thus promotes another thread to running.
|
||
hl-page:: 353
|
||
ls-type:: annotation
|
||
id:: 643507af-1153-46c1-b232-31a9a203e5df
|
||
hl-color:: yellow
|
||
- ((6436c684-ac4a-4144-9e7e-b4cb8f976c1f))
|
||
- Problem: Starvation is still possible; Context switch overhead, though better than spinning
|
||
- **Lock With Queues**, Test-and-set, Yield, And Wakeup
|
||
ls-type:: annotation
|
||
hl-page:: 354
|
||
hl-color:: yellow
|
||
id:: 64350b44-dfae-4544-93f9-ff2b343fefd4
|
||
- The real problem is that we have little control over which thread runs next, which leads to wasted work.
|
||
hl-page:: 353
|
||
ls-type:: annotation
|
||
id:: 64350b4e-9559-49d9-aa37-eda9fe425b7f
|
||
hl-color:: yellow
|
||
- `park()`: put a calling thread to sleep
|
||
hl-page:: 354
|
||
ls-type:: annotation
|
||
id:: 64350bfb-64f7-4d41-8cc2-260dbec3372d
|
||
hl-color:: yellow
|
||
- `unpark(threadID)`: wake a particular thread
|
||
hl-page:: 354
|
||
ls-type:: annotation
|
||
id:: 64350c01-39bb-4d15-b554-0287b13806ee
|
||
hl-color:: yellow
|
||
- ((6436b05f-2873-4af4-952c-86d82685b583))
|
||
- When a thread is woken up, it will be as if it is returning from `park()`. Thus, when unparking a thread, the lock is passed directly from the releasing thread to the next thread acquiring it; the flag is not set to 0 in between.
|
||
- wakeup/waiting race: if a thread is descheduled just before it calls `park()`, and the lock owner then calls `unpark()` on it, the wakeup is lost and the thread may sleep forever.
|
||
hl-page:: 356
|
||
ls-type:: annotation
|
||
id:: 64351ba3-d4b5-4999-bc61-7733d5e0a061
|
||
hl-color:: yellow
|
||
- One solution is to use `setpark()`: indicate the thread is about to `park`. If it happens to be interrupted and another thread calls `unpark` before `park` is actually called, the subsequent park returns immediately instead of sleeping.
|
||
- Peterson's algorithm: mutual exclusion lock for 2 threads without hardware atomic instruction. Use 2 intention flags and a turn flag.
|
||
hl-page:: 345
|
||
ls-type:: annotation
|
||
id:: 6434edd3-2a7b-4e11-af18-29854e628bc6
|
||
hl-color:: yellow
|
||
- **two-phase lock**
|
||
hl-page:: 358
|
||
ls-type:: annotation
|
||
id:: 643522a7-4b16-4998-9b2f-47a852681a16
|
||
hl-color:: yellow
|
||
- A combination of spin lock and sleep lock
|
||
- In the first phase, the lock spins for a while, hoping that it can acquire the lock.
|
||
hl-page:: 358
|
||
ls-type:: annotation
|
||
id:: 6435230e-d84a-4c91-8329-b7608b0d543a
|
||
hl-color:: yellow
|
||
- A second phase is entered if the lock is not acquired, where the caller is put to sleep, and only woken up when the lock becomes free later.
|
||
ls-type:: annotation
|
||
hl-page:: 358
|
||
hl-color:: yellow
|
||
id:: 64352344-d140-468c-987c-e8afa05c2171
|
||
- Linux System Call **futex**
|
||
hl-page:: 356
|
||
ls-type:: annotation
|
||
id:: 64351e9a-6505-4176-a6fb-ddf63f3245a8
|
||
hl-color:: yellow
|
||
- each `futex` is associated with ==a specific physical memory location==, and ==an in-kernel queue==
|
||
- `futex_wake(address)` wakes one thread that is waiting on the queue.
|
||
- `futex_wait(address, expected)` puts the calling thread to sleep, assuming the value at `address` is equal to `expected`. If it is not equal, the call returns immediately.
|
||
- Figure 28.10: Linux-based Futex Locks
|
||
ls-type:: annotation
|
||
hl-page:: 357
|
||
hl-color:: yellow
|
||
id:: 64352221-d590-4371-a5f0-29e9cfa75ccb
|
||
- efficacy: effectiveness; the power to produce a desired result
|
||
ls-type:: annotation
|
||
hl-page:: 341
|
||
hl-color:: green
|
||
id:: 6433fb69-1425-46b4-996f-f91da5d3e8d0
|
||
- foil
|
||
ls-type:: annotation
|
||
hl-page:: 347
|
||
hl-color:: green
|
||
id:: 6434f523-44b7-40ab-8fea-528969c5acfd
|
||
- delve: to research in depth; to explore; to dig
|
||
ls-type:: annotation
|
||
hl-page:: 349
|
||
hl-color:: green
|
||
id:: 6434fb8c-2b3b-4d80-83fb-3b34da4dcd28
|
||
- brag: to boast
|
||
ls-type:: annotation
|
||
hl-page:: 351
|
||
hl-color:: green
|
||
id:: 643501c1-f11b-4e85-8125-d2a5a31f69b0
|
||
- scourge: to whip; to torment; to cause great suffering
|
||
- ## Lock-based Concurrent Data Structures
|
||
ls-type:: annotation
|
||
hl-page:: 361
|
||
hl-color:: yellow
|
||
id:: 643525b0-e245-489b-877d-a2a1d63e7ea6
|
||
collapsed:: true
|
||
- **Concurrent Counters**
|
||
hl-page:: 361
|
||
ls-type:: annotation
|
||
id:: 643525e5-fb85-48d4-905a-2a88b9ac0b0d
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- **Counter with lock**
|
||
- Wrap all the operations with a single lock.
|
||
- Performance is bad due to lock contention and it gets worse when the number of threads increases.
|
||
- **perfect scaling**: the increase in thread number doesn't harm the performance
|
||
hl-page:: 363
|
||
ls-type:: annotation
|
||
id:: 64352751-d9bd-4d5e-a8ba-cd18f86b1a15
|
||
hl-color:: yellow
|
||
- **approximate counter**
|
||
hl-page:: 363
|
||
ls-type:: annotation
|
||
id:: 64352794-d7c8-42f9-8321-f874967cebf2
|
||
hl-color:: yellow
|
||
- represent a single logical counter via ==numerous local physical counters==(one per CPU core), as well as ==a single global counter==. Each actual counter has a ==lock==.
|
||
- To increment the counter, acquire the ==local lock== and increase the local counter, thus avoiding contention.
|
||
- To read the counter, acquire the ==global lock== and read.
|
||
- To keep the global counter up to date, the local values are periodically transferred to the global counter and reset, which requires ==global lock and local lock==. A threshold `S` determines how often this transfer happens, tuning the trade-off between scalability and precision.
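- The scheme above can be sketched as follows. `NUMCPUS` is an assumed core count, and mapping a thread to a local counter by `thread_id % NUMCPUS` (with a non-negative `thread_id`) is an assumption of this example.

```c
#include <pthread.h>
#include <stddef.h>

#define NUMCPUS 4  // assumed number of cores for this sketch

typedef struct {
    int global;                      // global count
    pthread_mutex_t glock;           // protects global
    int local[NUMCPUS];              // per-CPU counts
    pthread_mutex_t llock[NUMCPUS];  // one lock per local counter
    int threshold;                   // S: how much accumulates before a transfer
} counter_t;

void counter_init(counter_t *c, int threshold) {
    c->threshold = threshold;
    c->global = 0;
    pthread_mutex_init(&c->glock, NULL);
    for (int i = 0; i < NUMCPUS; i++) {
        c->local[i] = 0;
        pthread_mutex_init(&c->llock[i], NULL);
    }
}

// Usually touches only the local lock; takes the global lock
// only when the local count reaches the threshold.
void counter_update(counter_t *c, int thread_id, int amount) {
    int cpu = thread_id % NUMCPUS;
    pthread_mutex_lock(&c->llock[cpu]);
    c->local[cpu] += amount;
    if (c->local[cpu] >= c->threshold) {
        pthread_mutex_lock(&c->glock);
        c->global += c->local[cpu];
        pthread_mutex_unlock(&c->glock);
        c->local[cpu] = 0;
    }
    pthread_mutex_unlock(&c->llock[cpu]);
}

// Returns the global value only, so the result may lag the true count.
int counter_get(counter_t *c) {
    pthread_mutex_lock(&c->glock);
    int value = c->global;
    pthread_mutex_unlock(&c->glock);
    return value;
}
```

- With `S = 5`, twelve increments on one CPU leave 10 in the global counter and 2 still local, which is the imprecision the threshold trades for scalability.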
|
||
- **Concurrent Linked Lists**
|
||
ls-type:: annotation
|
||
hl-page:: 367
|
||
hl-color:: yellow
|
||
id:: 643530d8-9d09-4c8a-9e92-47dfe814ef50
|
||
collapsed:: true
|
||
- Again, the simplest way to implement this is to wrap all operations on the list with a single lock.
|
||
- Assuming `malloc` is ==thread-safe==, we can improve the code a little by narrowing the critical section: only operations on the shared list structure need to be locked.
|
||
- **hand-over-hand locking**: a lock per node.
|
||
hl-page:: 369
|
||
ls-type:: annotation
|
||
id:: 64353237-4b74-4148-b7c1-5854d83a18c7
|
||
hl-color:: yellow
|
||
- When traversing the list, the code first grabs the next node's lock and then releases the current node's lock.
|
||
- In practice, it ==doesn't work== due to prohibitive overhead
|
||
- **Concurrent Queues**
|
||
ls-type:: annotation
|
||
hl-page:: 370
|
||
hl-color:: yellow
|
||
id:: 64353353-9de2-421b-967d-dc80a597eecd
|
||
collapsed:: true
|
||
- Two locks: a head lock for `dequeue` and a tail lock for `enqueue`, so the two operations can run concurrently.
|
||
- Add a dummy node to separate head and tail operation. Without this, `dequeue` operation needs to acquire both locks in case the queue is empty.
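- A sketch of the two-lock queue with a dummy node. The `next` field is declared `_Atomic` here because, on an empty queue, `enqueue` writes it under the tail lock while `dequeue` reads it under the head lock; that choice, the function names and the -1 error convention are assumptions of this example.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *_Atomic next;  // shared between head and tail sides when the queue is empty
} node_t;

typedef struct {
    node_t *head;               // always points at the dummy node
    node_t *tail;
    pthread_mutex_t head_lock, tail_lock;
} queue_t;

int queue_init(queue_t *q) {
    node_t *dummy = malloc(sizeof *dummy);
    if (dummy == NULL) return -1;
    dummy->next = NULL;
    q->head = q->tail = dummy;
    pthread_mutex_init(&q->head_lock, NULL);
    pthread_mutex_init(&q->tail_lock, NULL);
    return 0;
}

int queue_enqueue(queue_t *q, int value) {
    node_t *n = malloc(sizeof *n);   // allocation happens outside the lock
    if (n == NULL) return -1;
    n->value = value;
    n->next = NULL;
    pthread_mutex_lock(&q->tail_lock);
    q->tail->next = n;
    q->tail = n;
    pthread_mutex_unlock(&q->tail_lock);
    return 0;
}

// Returns 0 and stores the value, or -1 if the queue is empty.
int queue_dequeue(queue_t *q, int *value) {
    pthread_mutex_lock(&q->head_lock);
    node_t *old_dummy = q->head;
    node_t *first = old_dummy->next;
    if (first == NULL) {
        pthread_mutex_unlock(&q->head_lock);
        return -1;
    }
    *value = first->value;
    q->head = first;                 // the dequeued node becomes the new dummy
    pthread_mutex_unlock(&q->head_lock);
    free(old_dummy);
    return 0;
}
```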
|
||
- **Concurrent Hash Table**
|
||
hl-page:: 372
|
||
ls-type:: annotation
|
||
id:: 6435360d-c176-494a-9d61-b1fd0107a9bd
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- instead of having a single lock for the entire structure, it uses a lock per hash bucket
|
||
ls-type:: annotation
|
||
hl-page:: 372
|
||
hl-color:: yellow
|
||
id:: 6435363d-c697-42a6-bfd0-8a2332cef394
|
||
- ubiquitous: seemingly everywhere; very common
|
||
ls-type:: annotation
|
||
hl-page:: 372
|
||
hl-color:: green
|
||
id:: 6435365a-b5d6-46fc-a9a1-25b0d23aa529
|
||
- humble: modest; meek; unassuming; (verb) to abase
|
||
ls-type:: annotation
|
||
hl-page:: 373
|
||
hl-color:: green
|
||
id:: 6435367f-dd9e-449d-b0e4-3d8c9e14f6c2
|
||
- sloppy: careless, slapdash; (of clothes) loose and baggy; too runny, not thick enough
|
||
hl-page:: 376
|
||
ls-type:: annotation
|
||
id:: 643536c8-fc05-4bbe-8d1d-0f4f6d1c4fee
|
||
hl-color:: green
|
||
- gross: total, before deductions; flagrant, extreme; crude; bloated; rough
|
||
hl-page:: 378
|
||
ls-type:: annotation
|
||
id:: 643537d3-7d01-442b-b47e-59433c2aa6db
|
||
hl-color:: green
|
||
- ## condition variable
|
||
hl-page:: 378
|
||
ls-type:: annotation
|
||
id:: 643537ff-1028-4725-8d7a-c0338cc946d3
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- A ==condition variable== is an explicit queue that threads can put themselves on when some state of execution(condition) is not as desired (by *waiting on the condition*); some other thread, when it changes said state, can then wake one (or more) of those waiting threads and thus allow them to continue (by *signaling*).
|
||
hl-page:: 378
|
||
ls-type:: annotation
|
||
id:: 64353882-7697-4c16-8e53-c8f59ea256c1
|
||
hl-color:: yellow
|
||
- Operations
|
||
- `wait()` puts the caller to sleep. `pthread_cond_wait(pthread_cond_t *c, pthread_mutex_t *m);`
|
||
hl-page:: 378
|
||
ls-type:: annotation
|
||
id:: 643538d5-9ea3-4399-9fa2-d75fdf0e1dd4
|
||
hl-color:: yellow
|
||
- `signal()` wakes up a sleeping thread waiting on this condition. `pthread_cond_signal(pthread_cond_t *c);`
|
||
hl-page:: 379
|
||
ls-type:: annotation
|
||
id:: 643538de-cc40-4dd2-8f03-9492004f209b
|
||
hl-color:: yellow
|
||
- The `wait()` also takes a mutex as a parameter; it assumes that this mutex is locked when `wait()` is called. The responsibility of `wait()` is to ==release the lock and put the calling thread to sleep== (atomically); when the thread wakes up, it must ==re-acquire the lock before returning== to the caller. The design is helpful to avoid some race conditions when trying to sleep.
|
||
- use a while loop instead of just an if statement when deciding whether to wait on the condition.
|
||
ls-type:: annotation
|
||
hl-page:: 380
|
||
hl-color:: yellow
|
||
id:: 643547c5-1613-49e9-899e-0e86f59a1462
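- A sketch of the wait/signal pattern with a state variable, a lock, and a `while` loop, using a parent that waits for its child. `thr_exit`, `thr_join` and `wait_for_child` are illustrative names.

```c
#include <pthread.h>
#include <stddef.h>

static int done = 0;  // the state the parent is waiting for
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;

// Child side: change the state and signal, both while holding the lock.
static void thr_exit(void) {
    pthread_mutex_lock(&m);
    done = 1;
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
}

// Parent side: re-check the state after every wakeup.
// If the child already ran, done == 1 and we never sleep.
static void thr_join(void) {
    pthread_mutex_lock(&m);
    while (done == 0)
        pthread_cond_wait(&c, &m);  // releases m while asleep, re-acquires before returning
    pthread_mutex_unlock(&m);
}

static void *child(void *arg) {
    (void)arg;
    thr_exit();
    return NULL;
}

// Returns 1 once the child has signaled, or -1 if the thread could not be created.
int wait_for_child(void) {
    pthread_t t;
    done = 0;
    if (pthread_create(&t, NULL, child, NULL) != 0) return -1;
    thr_join();
    pthread_join(t, NULL);
    return done;
}
```

- Dropping the `done` variable or the lock reintroduces the races: the signal can fire before the parent sleeps, and the parent then waits forever.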
|
||
- stem: the stalk of a plant; the stalk of a flower or leaf; (verb) to stop, to block, to hold back
|
||
hl-page:: 379
|
||
ls-type:: annotation
|
||
id:: 64353eb8-8ed8-4680-a3c0-91608b429408
|
||
hl-color:: green
|
||
- **stem from sth**: to be the result of; to originate from; to be rooted in
|
||
- **Producer/Consumer Problem**
|
||
hl-page:: 382
|
||
ls-type:: annotation
|
||
id:: 64354974-adea-4b20-90f4-a12ebe1e4d5b
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- **Mesa semantics**: Signaling a thread only wakes them up; it is thus a hint that the state of the world has ==changed==, but there is ==no guarantee== that when the woken thread runs, the state will ==still be as desired==. (Another guy may run before the thread and change the state again)
|
||
hl-page:: 385
|
||
ls-type:: annotation
|
||
id:: 64354cc4-14c5-408d-b879-7d4d011b2b5c
|
||
hl-color:: yellow
|
||
- So, always use while loops. A while loop re-checks the condition after every wakeup, so the thread proceeds only when the state of the world is as desired; this handles ((64355502-f41f-40dd-b71f-e0abdbc76716)) and supports ((64355441-5a1b-4015-baa1-65917526079c))
|
||
hl-page:: 386
|
||
ls-type:: annotation
|
||
id:: 64354db0-8c74-4c14-b063-d26378a10555
|
||
hl-color:: yellow
|
||
- **Hoare semantics**: provides a stronger guarantee that the woken thread will run immediately upon being woken
|
||
hl-page:: 386
|
||
ls-type:: annotation
|
||
id:: 64354d46-4286-44fd-9e82-2ba562a50f25
|
||
hl-color:: yellow
|
||
- Incorrect Solution: single condition variable. The problem arises from the ==undirected wakeup operation==: God knows which thread is to be woken up.
|
||
- Envision multiple consumers and one producer:
|
||
1. producer `P1` increases count to 1, signals the CV and sleeps
|
||
2. consumer `C1` is woken, reduces count to 0, signals the CV and sleeps
|
||
3. another consumer `C2` is woken up ==by accident==, finds out count is 0, sleeps
|
||
4. In this case, they all sleep and thus nobody will signal any of them
|
||
- If the producer `P1` is woken up in step 3 instead, everything is fine. Obviously, one solution is to ==exert control over which thread is to be woken up==. Waking up all threads may also solve this problem, see ((64355441-5a1b-4015-baa1-65917526079c)).
|
||
- Correct solution: two condition variables.
|
||
- Producer threads wait on the condition `empty` and signal `fill`. Conversely, consumer threads wait on `fill` and signal `empty`.
|
||
- ((6436b07d-9279-46bb-9c6b-985eb2324df8))
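- A sketch of the two-condition-variable solution with one producer and one consumer. The buffer size `MAX`, the helper `run_pipeline`, and summing the consumed values are assumptions made so the example can be checked.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX 4  // buffer capacity (assumed for this sketch)

static int buffer[MAX];
static int fill_ptr, use_ptr, count;
static int items;          // how many values pass through the buffer
static long consumed_sum;

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t empty = PTHREAD_COND_INITIALIZER;  // signaled when a slot frees up
static pthread_cond_t fill  = PTHREAD_COND_INITIALIZER;  // signaled when a value arrives

static void put(int value) { buffer[fill_ptr] = value; fill_ptr = (fill_ptr + 1) % MAX; count++; }
static int  get(void)      { int v = buffer[use_ptr]; use_ptr = (use_ptr + 1) % MAX; count--; return v; }

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= items; i++) {
        pthread_mutex_lock(&mutex);
        while (count == MAX)                 // wait for a free slot
            pthread_cond_wait(&empty, &mutex);
        put(i);
        pthread_cond_signal(&fill);          // wakes only consumers
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < items; i++) {
        pthread_mutex_lock(&mutex);
        while (count == 0)                   // wait for a value
            pthread_cond_wait(&fill, &mutex);
        consumed_sum += get();
        pthread_cond_signal(&empty);         // wakes only producers
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

// Pushes 1..n through the buffer and returns the sum seen by the consumer.
long run_pipeline(int n) {
    pthread_t p, c;
    items = n;
    consumed_sum = 0;
    fill_ptr = use_ptr = count = 0;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return consumed_sum;
}
```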
|
||
- **spurious wakeups**
|
||
hl-page:: 390
|
||
ls-type:: annotation
|
||
id:: 64355502-f41f-40dd-b71f-e0abdbc76716
|
||
hl-color:: yellow
|
||
- In some thread packages, due to details of the implementation, it is possible that two threads get woken up though just a single signal has taken place.
|
||
- **covering condition**
|
||
hl-page:: 391
|
||
ls-type:: annotation
|
||
id:: 64355441-5a1b-4015-baa1-65917526079c
|
||
hl-color:: yellow
|
||
- covers all the cases where a thread needs to wake up, other threads simply wake up, re-check condition and go back to sleep
|
||
- `pthread_cond_broadcast()` wakes up all waiting threads
|
||
- albeit: although; even though
|
||
ls-type:: annotation
|
||
hl-page:: 390
|
||
hl-color:: green
|
||
id:: 64354f54-b26c-48dc-a328-4ae355b680f3
|
||
- spurious: false; fake; based on mistaken ideas or reasoning; fallacious
|
||
hl-page:: 390
|
||
ls-type:: annotation
|
||
id:: 643554f4-75a7-48fa-9366-87058ee723fb
|
||
hl-color:: green
|
||
- ## Semaphores
|
||
hl-page:: 396
|
||
ls-type:: annotation
|
||
id:: 64356d96-cce8-48ad-80f1-e3e02a1a4684
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- A semaphore is an ==object with an integer value== that we can manipulate with two routines `sem_wait()` and `sem_post()`. The initial value determines its behavior, so we need to give it an initial value through `sem_init()`
|
||
hl-page:: 396
|
||
ls-type:: annotation
|
||
id:: 64356dba-48b4-49b8-8182-c962f12f03a5
|
||
hl-color:: yellow
|
||
- Semaphore: Definitions Of **Wait And Post**
|
||
ls-type:: annotation
|
||
hl-page:: 397
|
||
hl-color:: yellow
|
||
id:: 6435744b-a300-40ad-ba91-157666d8cd2a
|
||
- `sem_wait(sem_t *s)`: First decrement the value of the semaphore by one. Then wait if the value of semaphore is negative
|
||
- `sem_post(sem_t *s)`: First increment the value of the semaphore by one. If there is any thread waiting, wake up one of them
|
||
- The value of the semaphore, *when negative*, is equal to the ==number of waiting threads==
|
||
hl-page:: 397
|
||
ls-type:: annotation
|
||
id:: 64357512-e25b-4226-961a-caec367fc8a3
|
||
hl-color:: yellow
|
||
- **Binary Semaphores (Locks)**
|
||
ls-type:: annotation
|
||
hl-page:: 398
|
||
hl-color:: yellow
|
||
id:: 6435753a-65b5-4e46-82bc-54c11c1cd533
|
||
- Initialize semaphore to 1, indicating we only have one piece of resource (the critical section).
|
||
- Wrap the critical section with `sem_wait` and `sem_post`
|
||
- When the lock is free (value 1), `sem_wait` decrements the value to 0 and returns immediately, so the caller holds the lock. If another thread then calls `sem_wait`, the value drops to -1 and that caller sleeps until the holder calls `sem_post`.
|
||
- **Semaphores For Ordering (Condition Variable, or Ordering Primitive)**
|
||
hl-page:: 399
|
||
ls-type:: annotation
|
||
id:: 64357930-2d96-4867-bc3d-2fe89990ce5f
|
||
hl-color:: yellow
|
||
- Initialize the semaphore to 0
|
||
- Consider the *join* operation. The parent calls `sem_wait` and the child calls `sem_post`. No matter which thread is scheduled first, the semaphore guarantees the desired result.
|
||
- **The Producer/Consumer (Bounded Buffer) Problem (Again)**
|
||
hl-page:: 401
|
||
ls-type:: annotation
|
||
id:: 64357c6d-381e-492e-b901-095454f5315e
|
||
hl-color:: yellow
|
||
- 2 semaphores `empty` and `full` for coordination between consumer and producer, and 1 semaphore for lock
|
||
- Initialize `empty <- MAX`, and `full <- 0`
|
||
- The consumer waits for `full` and posts `empty`; conversely, the producer waits for `empty` and posts `full`
|
||
- Special case for `MAX=1`
|
||
- When only one slot is available in the buffer, we don't even need a lock! In that case `empty` and `full` behave as binary semaphores, which not only control entry to the buffer but also work as a lock.
|
||
- Otherwise, there will be a ==data race== inside the `put/get` operation due to potential multi-thread access to these procedures (when `MAX > 1`, the `sem_wait(&empty)` may allow in more than one thread).
|
||
- Deadlock avoidance
|
||
- If the lock semaphore is the outermost one, deadlock occurs: a thread may sleep in `sem_wait(&empty)` while still holding `mutex`. Therefore, acquire the lock inside the `empty`/`full` semaphore pair.
|
||
- ((6436bebd-0681-4f94-9d04-4d8e4a554512))
|
||
- **Readers-Writer Locks**
|
||
ls-type:: annotation
|
||
hl-page:: 406
|
||
hl-color:: yellow
|
||
id:: 643583b4-26b1-4cbf-801c-11ed6e63976e
|
||
- Either allow ==multiple readers to read== concurrently, or allow ==only one writer to write==.
|
||
- Two sets of operation
|
||
- `rwlock_acquire/release_writelock()`: simply `wait/post` the `writelock`
|
||
- `rwlock_acquire/release_readlock()`: acquire `writelock` when the ==first reader acquires==, and release it when the ==last reader releases==
|
||
- ((6436c668-5be8-4ce1-b701-1f2a00d34cc9))
|
||
- Problem: More overhead; Unfairness, writer is much more likely to starve.
|
||
- To tackle the writer starvation problem, we may manually wake up the writers (if ever suspended) every time read lock releases. [Wiki](https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock)
|
||
- **The Dining Philosophers**
|
||
hl-page:: 408
|
||
ls-type:: annotation
|
||
id:: 643587a7-ade4-4f09-be50-aea233ff02c0
|
||
hl-color:: yellow
|
||
- Background setting
|
||
hl-page:: 408
|
||
ls-type:: annotation
|
||
id:: 6435889f-1375-4b94-8630-b3d0d7bdfa56
|
||
hl-color:: yellow
|
||
- 5 "philosophers" around a table.
|
||
Between each pair of philosophers is a single fork (and thus, 5 total).
|
||
The philosophers each have times where they think (don’t need forks), and times where they eat.
|
||
In order to eat, a philosopher needs two forks (left and right).
|
||
The contention for these forks is our synchronization problem.
|
||
- Solution
|
||
- A semaphore per fork, and helper function `left/right(p)` which is the fork on philosopher `p`'s left/right.
|
||
- Deadlock: if each philosopher tries to grab the fork on their left first, there will be a deadlock. When all of them get their left-side forks, all of the forks are locked and no one could get their right-side fork.
|
||
- Non-deadlock: force one philosopher to try to grab the right-side fork first
|
||
- ((6436bebd-0681-4f94-9d04-4d8e4a554512))
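- A sketch of the non-deadlocking fork order, assuming unnamed POSIX semaphores (`sem_init`) are available on the platform. The `meals` array, `rounds`, and the driver `dine` are hypothetical additions so the result can be observed.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdint.h>

#define N 5

static sem_t forks[N];
static int meals[N];   // meals[p] is written only by philosopher p
static int rounds;

static int left(int p)  { return p; }
static int right(int p) { return (p + 1) % N; }

// The last philosopher grabs the right fork first, which breaks the cycle.
static void get_forks(int p) {
    if (p == N - 1) {
        sem_wait(&forks[right(p)]);
        sem_wait(&forks[left(p)]);
    } else {
        sem_wait(&forks[left(p)]);
        sem_wait(&forks[right(p)]);
    }
}

static void put_forks(int p) {
    sem_post(&forks[left(p)]);
    sem_post(&forks[right(p)]);
}

static void *philosopher(void *arg) {
    int p = (int)(intptr_t)arg;
    for (int i = 0; i < rounds; i++) {
        get_forks(p);
        meals[p]++;      // eat
        put_forks(p);
    }
    return NULL;
}

// Each philosopher eats r times; returns the total number of meals.
int dine(int r) {
    pthread_t t[N];
    rounds = r;
    for (int i = 0; i < N; i++) { sem_init(&forks[i], 0, 1); meals[i] = 0; }
    for (int i = 0; i < N; i++) pthread_create(&t[i], NULL, philosopher, (void *)(intptr_t)i);
    int total = 0;
    for (int i = 0; i < N; i++) { pthread_join(t[i], NULL); total += meals[i]; }
    for (int i = 0; i < N; i++) sem_destroy(&forks[i]);
    return total;
}
```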
|
||
- Implement Semaphores
|
||
ls-type:: annotation
|
||
hl-page:: 411
|
||
hl-color:: yellow
|
||
id:: 643589a6-31e6-4603-9259-999e9c8860f7
|
||
- Implementing Zemaphores With One Lock And One CV: the book's authors provide a simple implementation of a semaphore.
|
||
hl-page:: 412
|
||
ls-type:: annotation
|
||
id:: 64358de1-f418-44fd-8a77-bc0faa368059
|
||
hl-color:: yellow
|
||
- ((6436c47e-dc86-4452-b9b5-4e7997dbfbfb))
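- A sketch of a semaphore built from one lock and one condition variable, plus the ordering use case (initial value 0, parent waits, child posts). Unlike the definition earlier in these notes, this version never lets the value go negative, so it does not track the number of waiters. `parent_waits_for_child` and `child_ran` are illustrative names.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    int value;
    pthread_cond_t cond;
    pthread_mutex_t lock;
} zem_t;

void zem_init(zem_t *s, int value) {
    s->value = value;
    pthread_cond_init(&s->cond, NULL);
    pthread_mutex_init(&s->lock, NULL);
}

// Sleep while the value is not positive, then take one unit.
void zem_wait(zem_t *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value <= 0)
        pthread_cond_wait(&s->cond, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

// Add one unit and wake one waiter, if any.
void zem_post(zem_t *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

static zem_t ready;
static int child_ran;

static void *child(void *arg) {
    (void)arg;
    child_ran = 1;
    zem_post(&ready);   // tell the parent we are done
    return NULL;
}

// Returns what the parent observed after waiting: 1 means the child ran first.
int parent_waits_for_child(void) {
    pthread_t t;
    child_ran = 0;
    zem_init(&ready, 0);            // 0: the parent must wait for a post
    if (pthread_create(&t, NULL, child, NULL) != 0) return -1;
    zem_wait(&ready);
    int seen = child_ran;
    pthread_join(t, NULL);
    return seen;
}
```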
|
||
- salient: most important; notable; prominent
|
||
ls-type:: annotation
|
||
hl-page:: 397
|
||
hl-color:: green
|
||
id:: 64357404-d348-42b3-96a3-ba28575baa66
|
||
- ensue: to happen afterwards; to follow as a result
|
||
ls-type:: annotation
|
||
hl-page:: 408
|
||
hl-color:: green
|
||
id:: 64358802-3b22-46ed-a0e2-71cc9df69a7b
|
||
- throttle: a throttle valve; the throat; (verb) to choke; to restrict flow
|
||
hl-page:: 411
|
||
ls-type:: annotation
|
||
id:: 64358758-cb9c-4e8d-aaa4-f8e50457db88
|
||
hl-color:: green
|
||
- bog: a marsh; a mire; (verb) to get stuck in mud; to bring to a standstill
|
||
hl-page:: 411
|
||
ls-type:: annotation
|
||
id:: 64358755-1fae-4ea2-93a3-8c9d3d3e11c3
|
||
hl-color:: green
|
||
- ramification: a consequence (one of many, complex and hard to predict)
|
||
hl-page:: 410
|
||
ls-type:: annotation
|
||
id:: 64358b0c-e441-4d0a-852d-ecfde369306c
|
||
hl-color:: green
|
||
- **Non-Deadlock Bugs**: A large fraction (97%) of non-deadlock bugs studied by Lu et al. are either ==atomicity violations== or ==order violations==.
|
||
hl-page:: 420
|
||
ls-type:: annotation
|
||
id:: 64361e4c-62eb-4599-9809-0f77f9ce1cd0
|
||
hl-color:: yellow
|
||
- ## Deadlock
|
||
hl-page:: 420
|
||
ls-type:: annotation
|
||
id:: 64361fb7-5aa6-45cd-8b1e-aa0d0c300ad2
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- **Conditions for Deadlock**
|
||
hl-page:: 422
|
||
ls-type:: annotation
|
||
id:: 64361fd1-49ff-4023-8493-840ac423086a
|
||
hl-color:: yellow
|
||
- If any of these four conditions is not met, deadlock cannot occur.
|
||
- **Mutual exclusion**: Threads claim exclusive control of resources that they require
|
||
- **Hold-and-wait**: Threads hold resources allocated to them while waiting for additional resources
|
||
- **No preemption**: Resources cannot be forcibly removed from threads that are holding them.
|
||
- **Circular wait**: There exists a circular chain of threads such that each thread holds one or more resources that are being requested by the next thread in the chain.
|
||
- **Prevention**: break the conditions for deadlock
|
||
hl-page:: 422
|
||
ls-type:: annotation
|
||
id:: 643620d9-cdb6-4073-89f4-f9f8ac223073
|
||
hl-color:: yellow
|
||
- **Circular Wait**: Never induce a circular wait.
|
||
hl-page:: 422
|
||
ls-type:: annotation
|
||
id:: 643620fb-edc6-43b2-b4b2-43b010cfc46e
|
||
hl-color:: yellow
|
||
- total ordering and partial ordering of lock acquisition (think about your Discrete Math, total ordering is a restricted form of partial ordering, in partial ordering, some pairs of elements are not comparable)
|
||
- Either way, follow some kind of ordering when acquiring locks in order to avoid cycles.
|
||
- ENFORCE LOCK ORDERING BY LOCK ADDRESS
|
||
ls-type:: annotation
|
||
hl-page:: 423
|
||
hl-color:: yellow
|
||
id:: 64362497-58cd-45da-8ab5-84f96e899e16
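- A sketch of ordering two locks by address, so that callers passing the same pair in either order still acquire them in one consistent order. `lock_pair` and `unlock_pair` are hypothetical helper names; comparing through `uintptr_t` and handling the same-lock case are choices of this example.

```c
#include <pthread.h>
#include <stdint.h>

// Always take the lock at the higher address first, whatever the argument order.
void lock_pair(pthread_mutex_t *m1, pthread_mutex_t *m2) {
    if (m1 == m2) {                       // same lock passed twice: take it once
        pthread_mutex_lock(m1);
        return;
    }
    if ((uintptr_t)m1 > (uintptr_t)m2) {
        pthread_mutex_lock(m1);
        pthread_mutex_lock(m2);
    } else {
        pthread_mutex_lock(m2);
        pthread_mutex_lock(m1);
    }
}

// Release order does not matter for deadlock avoidance.
void unlock_pair(pthread_mutex_t *m1, pthread_mutex_t *m2) {
    pthread_mutex_unlock(m1);
    if (m2 != m1) pthread_mutex_unlock(m2);
}
```

- With this helper, `lock_pair(&a, &b)` in one thread and `lock_pair(&b, &a)` in another cannot form a circular wait.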
|
||
- **Hold-and-wait**: acquiring all locks at once, atomically.
|
||
hl-page:: 423
|
||
ls-type:: annotation
|
||
id:: 643625fe-423c-4b18-8c22-32d38720c5d0
|
||
hl-color:: yellow
|
||
- Not practical
|
||
- **No Preemption**
|
||
hl-page:: 424
|
||
ls-type:: annotation
|
||
id:: 64362632-50e8-41dd-a1bc-bbf3d4312b0f
|
||
hl-color:: yellow
|
||
- `trylock` either grabs the lock (if it is available) and returns success or returns an error code indicating the lock is held
|
||
- Instead of blocking at the lock call, release all previously acquired locks and start over if any of the locks is not available.
|
||
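- ```C
// Take lock1, then try lock2 without blocking; on failure, back off and retry.
while (true) {
    pthread_mutex_lock(&lock1);
    if (pthread_mutex_trylock(&lock2) == 0) break;  // got both locks
    pthread_mutex_unlock(&lock1);                   // release lock1 and start over
}
```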
|
||
- **livelock** problem: in some special cases, two threads may keep trying and giving up locks due to each other's intervention
|
||
hl-page:: 424
|
||
ls-type:: annotation
|
||
id:: 6436281f-4fdc-4586-83fb-b686cec3b76b
|
||
hl-color:: yellow
|
||
- random delay before looping back and trying the entire thing over again
|
||
- **Mutual Exclusion**: lock-free data structures
|
||
hl-page:: 425
|
||
ls-type:: annotation
|
||
id:: 643629ba-e746-41a6-b073-1199b3db3691
|
||
hl-color:: yellow
|
||
- use atomic instructions provided by hardware
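- A sketch of a lock-free insert at the head of a list, assuming C11 `atomic_compare_exchange_weak` as the hardware primitive. Only insertion is lock-free here; `peek` and `list_length` are hypothetical helpers for inspection and are meant to be called when no insert is in flight.

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;
} node_t;

static _Atomic(node_t *) head = NULL;

// Retry until the head we linked against is still the head when we swap.
int push(int value) {
    node_t *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->value = value;
    n->next = atomic_load(&head);
    // On failure the call stores the current head into n->next, so we just loop.
    while (!atomic_compare_exchange_weak(&head, &n->next, n)) { /* retry */ }
    return 0;
}

// Returns 0 and stores the most recently pushed value, or -1 if the list is empty.
int peek(int *value) {
    node_t *h = atomic_load(&head);
    if (h == NULL) return -1;
    *value = h->value;
    return 0;
}

int list_length(void) {
    int len = 0;
    for (node_t *p = atomic_load(&head); p != NULL; p = p->next) len++;
    return len;
}
```

- No lock is ever held, so this path cannot deadlock, though a thread may retry repeatedly under contention.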
|
||
- **Avoidance**
|
||
hl-page:: 427
|
||
ls-type:: annotation
|
||
id:: 64362af4-9b35-4e27-8ba2-0f5f8817526a
|
||
hl-color:: yellow
|
||
- By careful scheduling, deadlock could be avoided.
|
||
- Limited usage: OS does not always have sufficient knowledge to make deadlock-free scheduling. Such approaches also limit concurrency.
|
||
- [[Banker's Algorithm]]
|
||
- **Detect and Recover**
|
||
ls-type:: annotation
|
||
hl-page:: 428
|
||
hl-color:: yellow
|
||
id:: 64362c62-3a12-4bcb-95ae-baf1ca69312e
|
||
- Allow deadlocks to occasionally occur, and then take some action once such a deadlock has been detected.
|
||
- terrific: excellent; wonderful; remarkable; very great
|
||
ls-type:: annotation
|
||
hl-page:: 428
|
||
hl-color:: green
|
||
id:: 64362b38-6dfb-4c00-8aa6-b756e8983de4
|
||
- maxim: a saying; an adage; a motto
|
||
ls-type:: annotation
|
||
hl-page:: 428
|
||
hl-color:: green
|
||
id:: 64362b40-5f07-418f-83f3-c83eb5927c94
|
||
- nasty: very bad; disgusting; unpleasant; unfriendly
|
||
ls-type:: annotation
|
||
hl-page:: 432
|
||
hl-color:: green
|
||
id:: 64364569-01b4-45e1-83f8-ac1bd8af5850
|
||
- ## Event-based Concurrency
|
||
hl-page:: 432
|
||
ls-type:: annotation
|
||
id:: 64364585-ace4-4920-87fe-87aad004dffd
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- event loop: waits for something to do and then, for each event returned, processes them, one at a time
|
||
hl-page:: 433
|
||
ls-type:: annotation
|
||
id:: 643658f3-4761-4d0c-b044-4cadcfea27aa
|
||
hl-color:: yellow
|
||
- event handler
|
||
ls-type:: annotation
|
||
hl-page:: 433
|
||
hl-color:: yellow
|
||
id:: 643658f9-5eee-4d1a-a3d6-4f8eb9ed3d7b
|
||
- `select` or `poll`
|
||
hl-page:: 433
|
||
ls-type:: annotation
|
||
id:: 64365db8-a249-46bc-bd9c-237251c544b5
|
||
hl-color:: yellow
|
||
- Check whether there is any incoming I/O that should be attended to.
|
||
- ```C
int select(int nfds,
           fd_set *restrict readfds,
           fd_set *restrict writefds,
           fd_set *restrict errorfds,
           struct timeval *restrict timeout);
```
|
||
- Examines the given descriptor sets to see whether some of their descriptors are ready for reading/writing or have an exceptional condition pending. The first `nfds` descriptors are checked in each set
|
||
hl-page:: 434
|
||
ls-type:: annotation
|
||
id:: 64365eb6-5310-4893-9d11-5e332ef84c4a
|
||
hl-color:: yellow
|
||
- `select` replaces the given descriptor sets with ==subsets of ready descriptors==. `select()` ==returns the total number of ready descriptors== in all the sets.
|
||
hl-page:: 434
|
||
ls-type:: annotation
|
||
id:: 64365ef8-3c62-4d78-8bc6-d0a4b2c81d49
|
||
hl-color:: yellow
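- A minimal sketch of using `select` to check one descriptor for readability with a timeout. The helper name `wait_readable` is hypothetical; a real event loop would watch many descriptors at once.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if `fd` becomes readable within `timeout_ms`, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms) {
    fd_set readfds;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds is the highest descriptor number in any set, plus one */
    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    /* select rewrote readfds to contain only the ready descriptors */
    return (ready > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

Calling it on the read end of an empty pipe returns 0 after the timeout; once a byte has been written to the pipe it returns 1.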
|
||
- Blocking I/O: no blocking calls are allowed in event-based systems, because a single blocking call stalls the event loop and therefore the whole process.
|
||
- **Asynchronous I/O**
|
||
ls-type:: annotation
|
||
hl-page:: 437
|
||
hl-color:: yellow
|
||
id:: 643693db-d363-46ee-b0d6-910b30408946
|
||
- Issue an I/O request and return control immediately to the caller, before completion. Additional interfaces let the application determine whether the I/Os have completed.
|
||
hl-page:: 437
|
||
ls-type:: annotation
|
||
id:: 64369701-8a39-4aa4-9985-129572c04f53
|
||
hl-color:: yellow
|
||
- AIO control block `aiocb`
|
||
- `int aio_read(struct aiocb *aiocbp);` issues an asynchronous read request
|
||
- `int aio_error(const struct aiocb *aiocbp);` checks whether the request (designated by the `aiocb`) has completed
|
||
- Repeatedly checking for I/O completion is inefficient; an interrupt-based approach (e.g. UNIX signals) can instead inform the application when an asynchronous I/O completes.
|
||
- Problems
|
||
- State management
|
||
- manual stack management: when an event handler issues an asynchronous I/O, it must package up some ==program state for the next event handler== to use when the I/O finally completes; this additional work is ==not needed in thread-based programs==, as the state the program needs is on the stack of the thread.
|
||
hl-page:: 438
|
||
ls-type:: annotation
|
||
id:: 6436a3d9-ee29-4378-af79-4efc770cc209
|
||
hl-color:: yellow
|
||
- continuation: record the needed information to finish processing this event in some data structure; when the event happens (i.e., when the disk I/O completes), look up the needed information and process the event.
|
||
hl-page:: 440
|
||
ls-type:: annotation
|
||
id:: 6436a40a-121f-4fab-b428-b278e4cb65d3
|
||
hl-color:: yellow
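- A minimal sketch of manual stack management with continuations. Everything here is hypothetical (the fixed-size table, `continuation_save`, `continuation_run`, and the example handler `mark_done`): the state is stored when the asynchronous I/O is issued and looked up again when the completion event arrives.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

/* What to run, and with which saved state, once an outstanding I/O finishes. */
struct continuation {
    int in_use;
    int io_id;                    /* identifies the outstanding I/O */
    void (*resume)(void *state);  /* handler to run on completion */
    void *state;                  /* program state packaged for that handler */
};

static struct continuation pending[MAX_PENDING];

/* Record a continuation when issuing the I/O; returns 0 on success, -1 if the table is full. */
int continuation_save(int io_id, void (*resume)(void *), void *state) {
    assert(resume != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i] = (struct continuation){1, io_id, resume, state};
            return 0;
        }
    }
    return -1;
}

/* Called by the event loop when I/O `io_id` completes; returns 0 if a continuation ran. */
int continuation_run(int io_id) {
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && pending[i].io_id == io_id) {
            pending[i].in_use = 0;
            pending[i].resume(pending[i].state);
            return 0;
        }
    }
    return -1;
}

/* Example handler: marks the request as finished. */
void mark_done(void *state) {
    *(int *)state = 1;
}
```

In a thread-based program the same information would simply sit in local variables on the thread's stack while the thread is blocked.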
|
||
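- Utilizing multiple CPUs: running several event handlers in parallel brings back the usual synchronization problems (e.g. critical sections), so locks are needed again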
|
||
hl-page:: 440
|
||
ls-type:: annotation
|
||
id:: 6436a46c-f845-4c7b-8bb1-97da71589c67
|
||
hl-color:: yellow
|
||
- Implicit blocking such as paging
|
||
hl-page:: 440
|
||
ls-type:: annotation
|
||
id:: 6436a485-7a70-4974-93d2-9e11b010a948
|
||
hl-color:: yellow
|
||
- Messy code base due to complicated asynchronous logic
|
||
- obstinate: stubborn; hard to deal with; difficult to remove
|
||
hl-page:: 448
|
||
ls-type:: annotation
|
||
id:: 6436ca1f-f4e7-431e-9620-be7764825acd
|
||
hl-color:: green
|
||
- pickle: pickled vegetables
|
||
ls-type:: annotation
|
||
hl-page:: 448
|
||
hl-color:: green
|
||
id:: 6436caa1-6fe0-4de8-9ad4-2a057960fc1a
|
||
- ## System Architecture
|
||
ls-type:: annotation
|
||
hl-page:: 450
|
||
hl-color:: yellow
|
||
id:: 6436cc2e-b1af-4555-9d1d-808e6de120b1
|
||
collapsed:: true
|
||
- memory bus, general IO bus, peripheral bus
|
||
- **Canonical Device**
|
||
hl-page:: 452
|
||
ls-type:: annotation
|
||
id:: 643786f0-5f9c-4441-8898-82ccd6a1a464
|
||
hl-color:: yellow
|
||
- Two parts: a hardware interface (with a protocol) that lets OS software control the device, and an internal structure that implements the abstraction the device presents
|
||
- **Canonical Protocol**
|
||
hl-page:: 453
|
||
ls-type:: annotation
|
||
id:: 64378926-ce8a-4e38-a3fe-62fb5c4994e6
|
||
hl-color:: yellow
|
||
- Interface is comprised of 3 registers: *status*, *command*, *data*.
|
||
- 1. Poll the device, i.e. repeatedly read the *status* register to see if the device is ready
|
||
2. Transfer some data to *data* register
|
||
3. Write a command to the *command* register, which tells the device to start working
|
||
4. Poll again to see if it is completed
|
||
- programmed I/O (PIO): CPU is involved with the data movement
|
||
hl-page:: 453
|
||
ls-type:: annotation
|
||
id:: 64378c55-677c-4ab7-94c6-02ff41b90ded
|
||
hl-color:: yellow
|
||
- **Interrupt** instead of poll
|
||
- Polling wastes CPU time, which motivates interrupts. The OS ==issues a request, puts the caller to sleep, and context switches== to another task. When the device is done, it raises a hardware interrupt, causing the CPU to jump to the ==interrupt service routine== (ISR), which ==finishes the request and wakes up the process==.
|
||
- Interrupts are no panacea.
|
||
- Not suitable for ==high speed devices== that may complete the work by the first poll; the interrupt then only adds overhead
|
||
- Not suitable for networks due to possible *livelock*: with a ==huge amount of packets incoming==, the system may find itself ==only processing interrupts== and never allowing a user process to run and service these requests.
|
||
- Interrupt coalescing: the device waits a bit before raising an interrupt, so that several completed requests can be signaled with a single interrupt.
|
||
hl-page:: 455
|
||
ls-type:: annotation
|
||
id:: 64378e9e-0f95-4312-a19e-3ee9d0b4ef1e
|
||
hl-color:: yellow
|
||
- **Direct Memory Access (DMA)**
|
||
hl-page:: 456
|
||
ls-type:: annotation
|
||
id:: 64379241-c097-4aaa-b545-582df132b35f
|
||
hl-color:: yellow
|
||
- Programmed I/O also wastes CPU: the CPU does nothing but tediously copy data.
|
||
- To transfer data to a device, the OS tells the DMA controller the data address and size and then context switches to other work. The DMA controller does the copying, which overlaps with CPU execution, and raises an interrupt when the transfer completes.
|
||
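- Two ways to talk to a device: explicit I/O instructions (e.g. `in` and `out` on x86) and memory-mapped I/O (device registers appear as memory locations)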
|
||
- **Device Driver**
|
||
hl-page:: 457
|
||
ls-type:: annotation
|
||
id:: 6437989d-c18e-4cc7-9cb0-737384cc7960
|
||
hl-color:: yellow
|
||
- Encapsulates any ==specifics of device== interaction. It is the ==software in the OS== that knows the details of a device at the ==lowest level==.
|
||
- Figure 36.4: The Linux File System Stack
|
||
ls-type:: annotation
|
||
hl-page:: 458
|
||
hl-color:: yellow
|
||
id:: 643799a7-dfae-46e0-88e6-ebf587755d75
|
||
- System Call API, File System/Raw, Generic Block Interface(block r/w), Generic Block Layer, Specific Block Interface (protocol r/w), Device Driver
|
||
- A Simple IDE Disk Driver
|
||
ls-type:: annotation
|
||
hl-page:: 458
|
||
hl-color:: yellow
|
||
id:: 64379e9a-840a-48c9-b804-03e6b179a6a6
|
||
- A walkthrough of the xv6 IDE driver, which gives some intuition for how a driver works; fairly straightforward.
|
||
- manifold
|
||
ls-type:: annotation
|
||
hl-page:: 450
|
||
hl-color:: green
|
||
id:: 64378274-897c-4aac-b246-49bda634b872
|
||
- oblivious
|
||
ls-type:: annotation
|
||
hl-page:: 457
|
||
hl-color:: green
|
||
id:: 64379a07-5bc3-49b2-93e2-f371ad2b5347
|
||
- haul
|
||
ls-type:: annotation
|
||
hl-page:: 460
|
||
hl-color:: green
|
||
id:: 64379b8b-7c37-4d7e-8135-1d025eb42ae3
|
||
- trailer
|
||
ls-type:: annotation
|
||
hl-page:: 460
|
||
hl-color:: green
|
||
id:: 64379b93-cb30-45a8-afe6-53052c08fa6f
|
||
- obscure
|
||
ls-type:: annotation
|
||
hl-page:: 460
|
||
hl-color:: green
|
||
id:: 64379ba3-e41d-411f-ab6d-9a5f1424ac26
|
||
- ## Hard Disk Drives
|
||
ls-type:: annotation
|
||
hl-page:: 464
|
||
hl-color:: yellow
|
||
id:: 64379f7c-b440-4023-bc10-fd27071ec742
|
||
collapsed:: true
|
||
- Address Space of HDD: Array of sectors (512-byte block), numbered from 0 to n-1, which can be read/written as a unit.
|
||
hl-page:: 464
|
||
ls-type:: annotation
|
||
id:: 6437a316-6185-4eae-bc56-eeca9c5dfc0d
|
||
hl-color:: yellow
|
||
- Only a ==single sector write is atomic==, though multi-sector operations are possible (e.g. widely-used 4KB r/w)
|
||
- one can usually assume that accessing two blocks near one-another within the drive’s address space will be faster than accessing two blocks that are far apart. One can also usually assume that accessing blocks in a contiguous chunk (i.e., a sequential read or write) is the fastest access mode, and usually much faster than any more random access pattern.
|
||
ls-type:: annotation
|
||
hl-page:: 465
|
||
hl-color:: yellow
|
||
id:: 6437a4a9-3103-4830-abc7-dba0b1067b76
|
||
- **Components of Disk**
|
||
hl-page:: 465
|
||
ls-type:: annotation
|
||
id:: 6437a4da-bca4-4f13-b018-30f3400d169f
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- platter: a circular hard surface on which data is stored; an HDD has one or more platters
|
||
hl-page:: 465
|
||
ls-type:: annotation
|
||
id:: 6437a4f2-5d89-495a-a984-b427a3d03e74
|
||
hl-color:: yellow
|
||
- surface: one side of a platter; each platter has 2 surfaces
|
||
hl-page:: 465
|
||
ls-type:: annotation
|
||
id:: 6437a4f9-3de4-451b-a7cc-faf67b8530e8
|
||
hl-color:: yellow
|
||
- spindle: the platters are bound together around it; it is connected to a motor that spins them at a fixed rate, measured in rotations per minute (RPM)
|
||
hl-page:: 465
|
||
ls-type:: annotation
|
||
id:: 6437a4fd-450b-49ff-acd1-e46d3b507079
|
||
hl-color:: yellow
|
||
- track: a concentric circle of sectors, a surface consists of many tracks.
|
||
hl-page:: 465
|
||
ls-type:: annotation
|
||
id:: 6437a503-6b91-4b61-b288-9cea9c2ea832
|
||
hl-color:: yellow
|
||
- disk head: magnetic sensor, one per surface
|
||
hl-page:: 465
|
||
ls-type:: annotation
|
||
id:: 6437a50b-a53a-476b-8ef2-8bcbc21d7073
|
||
hl-color:: yellow
|
||
- disk arm: the disk heads are attached to the disk arm, which moves across the surface to position a head over the desired track
|
||
hl-page:: 465
|
||
ls-type:: annotation
|
||
id:: 6437a50f-5c6c-47ff-9179-ac48118342d7
|
||
hl-color:: yellow
|
||
- **IO time**
|
||
- **Rotational Delay**: wait for the desired sector to rotate under the disk head
|
||
hl-page:: 466
|
||
ls-type:: annotation
|
||
id:: 6437a841-9b37-42dc-a8dc-339085099a5a
|
||
hl-color:: yellow
|
||
- **Seek operation**: move the *disk head* to the ==desired track==.
|
||
hl-page:: 467
|
||
ls-type:: annotation
|
||
id:: 6437aa03-61a9-40c1-ba53-98d0e1ab87b9
|
||
hl-color:: yellow
|
||
- Seek phases: Acceleration (start), Coasting (move at full speed), Deceleration (slow down), Settling (position the head precisely over the track; often a significant part of the seek time)
|
||
- **General IO process**: 1. seek; 2. waiting for the rotational delay; 3. finally the transfer.
|
||
hl-page:: 467
|
||
ls-type:: annotation
|
||
id:: 6437abff-a6b8-4d28-8a4e-8e67fe9cdd4d
|
||
hl-color:: yellow
|
||
- Mathematical Analysis
|
||
ls-type:: annotation
|
||
hl-page:: 469
|
||
hl-color:: yellow
|
||
id:: 6437bbc8-c313-4ec2-81ef-c3b0969214e4
|
||
- IO time: $T_{IO} = T_{seek} + T_{rotation} + T_{transfer}$
|
||
- IO rate: $R_{IO} = \frac{Size_{\text{trans}}}{T_{IO}}$
|
||
- random workload, issues small (e.g., 4KB) reads to random locations on the disk
|
||
hl-page:: 470
|
||
ls-type:: annotation
|
||
id:: 6437d014-ef82-45e4-8083-974da0d39296
|
||
hl-color:: yellow
|
||
- sequential workload, reads a large number of sectors consecutively from the disk
|
||
ls-type:: annotation
|
||
hl-page:: 470
|
||
hl-color:: yellow
|
||
id:: 6437d023-d891-45f1-89e5-c08801e33d71
|
||
- For a **random** workload, $T_{transfer} \approx \frac{Size_{\text{trans}}}{\text{Peak Transfer Rate}}$, $T_{rotation} \approx \frac{1}{2}\cdot\frac{1}{\text{RPM}/60}$ (half a rotation, in seconds), and $T_{seek}$ is the average seek time reported by the manufacturer.
|
||
id:: 6437bb98-57d2-4924-af5c-74b6be542e8f
|
||
- For a **sequential** workload, we can assume ==a single seek and rotation== before ==a long transfer==, so $R_{IO}$ comes very close to the *Peak Transfer Rate*, especially when the read size is very large.
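- A small calculator for the formulas above, as a sketch. The function names are hypothetical, 1 MB is taken as 1024 KB, and any drive parameters passed in (for example 4 ms average seek, 15000 RPM, 125 MB/s peak rate) are illustrative assumptions rather than values recorded in these notes.

```c
#include <assert.h>

/* Average rotational delay in ms: half of one full rotation. */
double rotation_ms(double rpm) {
    assert(rpm > 0);
    return 0.5 * (60.0 * 1000.0 / rpm);
}

/* Transfer time in ms for `size_kb` KB at `peak_mb_s` MB/s (1 MB = 1024 KB). */
double transfer_ms(double size_kb, double peak_mb_s) {
    assert(peak_mb_s > 0);
    return size_kb / (peak_mb_s * 1024.0) * 1000.0;
}

/* T_IO = T_seek + T_rotation + T_transfer, in ms. */
double io_time_ms(double seek_ms, double rpm, double size_kb, double peak_mb_s) {
    return seek_ms + rotation_ms(rpm) + transfer_ms(size_kb, peak_mb_s);
}

/* R_IO = Size_trans / T_IO, in MB/s. */
double io_rate_mb_s(double seek_ms, double rpm, double size_kb, double peak_mb_s) {
    double t_s = io_time_ms(seek_ms, rpm, size_kb, peak_mb_s) / 1000.0;
    return (size_kb / 1024.0) / t_s;
}
```

With the illustrative parameters above, a 4KB random read yields roughly 0.65 MB/s, while a 100MB sequential transfer yields about 124 MB/s, close to the peak rate.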
|
||
- Average seek time is commonly cited as roughly 1/3 of a full seek (from the innermost track to the outermost); the figure comes from the average seek *distance*, which a simple integral shows to be 1/3 of the full distance
|
||
hl-page:: 472
|
||
ls-type:: annotation
|
||
id:: 6437d17a-0bee-478e-a843-fea71d3b74e2
|
||
hl-color:: yellow
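- A sketch of the integral behind the 1/3 figure, assuming a disk with $N$ tracks (treated as continuous) and a start track $x$ and target track $y$ chosen uniformly and independently:
$$\frac{1}{N^2}\int_{0}^{N}\int_{0}^{N}|x-y|\,dy\,dx = \frac{1}{N^2}\int_{0}^{N}\left(x^2 - Nx + \frac{N^2}{2}\right)dx = \frac{1}{N^2}\cdot\frac{N^3}{3} = \frac{N}{3}$$
The inner integral splits at $y = x$ into $\frac{x^2}{2} + \frac{(N-x)^2}{2}$, which gives the middle expression.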
|
||
- Miscellaneous details about HDD
|
||
- **Track skew**: sectors on adjacent tracks are offset so that a sequential read crossing a track boundary does not miss the next block while the head repositions
|
||
hl-page:: 467
|
||
ls-type:: annotation
|
||
id:: 6437acc8-bc95-466b-9d04-acfe22b0eeee
|
||
hl-color:: yellow
|
||
- **Multi-zoned Disk**: outer tracks tend to have more sectors than inner tracks. A zone is a set of consecutive tracks with the same number of sectors per track, and a disk is organized into multiple zones
|
||
hl-page:: 468
|
||
ls-type:: annotation
|
||
id:: 6437ad1b-5292-4a42-80c4-8a1ff9f7f691
|
||
hl-color:: yellow
|
||
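- cache (track buffer); write-back (acknowledge a write once the data is in the cache) vs. write-through (acknowledge only after the data is on disk)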
|
||
hl-page:: 468
|
||
ls-type:: annotation
|
||
id:: 6437ada7-4a51-4032-bdcc-110b47796be9
|
||
hl-color:: yellow
|
||
- **Disk Scheduling**
|
||
hl-page:: 473
|
||
ls-type:: annotation
|
||
id:: 6437d1c9-ffce-44b6-b9ee-9e8c4d29a3fc
|
||
hl-color:: yellow
|
||
- **FCFS**
|
||
- Not covered in the textbook; listed here for completeness.
|
||
- **SSTF: Shortest Seek Time First**
|
||
ls-type:: annotation
|
||
hl-page:: 473
|
||
hl-color:: yellow
|
||
id:: 6437d47e-ea32-439d-98c2-364af2d48f58
|
||
- First complete requests on the ==track nearest== to the disk head's current track.
|
||
hl-page:: 473
|
||
ls-type:: annotation
|
||
id:: 6437d48d-ce5b-4a23-b2b3-00d1696a54b5
|
||
hl-color:: yellow
|
||
- **Nearest Block First (NBF)**: schedule by block address, because the *track* information is unavailable to the OS (the OS only sees an array of blocks).
|
||
- Problem: ==starvation== of requests to far-away tracks
|
||
- Figure 37.8: SSTF: Sometimes Not Good Enough
|
||
ls-type:: annotation
|
||
hl-page:: 475
|
||
hl-color:: yellow
|
||
id:: 6437e177-28f4-4b48-a501-f8e0620b3026
|
||
- When $T_{seek} \gg T_{rotation}$, SSTF is a good policy
|
||
- When $T_{seek} \lt T_{rotation}$, it is sometimes better to seek to a request on another track than to wait most of a rotation for a request on a nearer track.
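- The nearest-block-first selection described above can be sketched as follows. The function name `nbf_pick` and the flat array of pending block addresses are assumptions made for illustration; rotation is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Distance between two block addresses. */
static unsigned long block_distance(unsigned long a, unsigned long b) {
    return a > b ? a - b : b - a;
}

/* Returns the index of the pending request nearest to `head`; ties go to the earlier entry. */
size_t nbf_pick(const unsigned long *pending, size_t count, unsigned long head) {
    assert(pending != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (block_distance(pending[i], head) < block_distance(pending[best], head))
            best = i;
    }
    return best;
}
```

Repeatedly picking the nearest request is exactly what lets a steady stream of nearby requests starve a far-away one.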
|
||
- **Elevator (SCAN)**
|
||
hl-page:: 474
|
||
ls-type:: annotation
|
||
id:: 6437d990-abcf-4f2d-a9c1-13f7c853c00a
|
||
hl-color:: yellow
|
||
- Move back and forth across the disk servicing requests in order across the tracks. If a request arrives for a block on a track already serviced in this *sweep* (a single pass from outer to inner tracks, or the reverse), it is not handled until the next *sweep*.
|
||
hl-page:: 474
|
||
ls-type:: annotation
|
||
id:: 6437da6a-2921-4909-a9b9-b5cbd844e04b
|
||
hl-color:: yellow
|
||
- **F-SCAN**: freeze the queue during a sweep, which avoids starvation of far-away requests at the cost of delaying late-arriving (but nearer) requests.
|
||
- **C-SCAN**: sweep in a single direction (and then reset) rather than in both. A bit fairer to outer and inner tracks, because a bi-directional sweep passes over the middle tracks twice before returning to either edge.
|
||
- Problem: SCAN and its variants ignore rotation, so they make no real effort to emulate SJF. Instead, they ==only prevent starvation==.
|
||
- **SPTF: Shortest Positioning Time First**
|
||
ls-type:: annotation
|
||
hl-page:: 475
|
||
hl-color:: yellow
|
||
id:: 6437e26b-14ef-49f3-968a-956509d62296
|
||
- SSTF is not the best policy for modern HDDs where seek time and rotation time are roughly equal.
|
||
- SPTF requires detailed knowledge of the disk internals (geometry, head position). Thus it is implemented in the disk controller rather than in the OS driver: the OS issues a few requests to the disk controller, and the disk itself decides in what order to serve them.
|
||
- in/at a pinch: if necessary; when there is no other choice
|
||
hl-page:: 475
|
||
ls-type:: annotation
|
||
id:: 6437e0d2-c585-4c74-a7d1-500ae29b38df
|
||
hl-color:: green
|
||
- gem: a precious stone
|
||
hl-page:: 475
|
||
ls-type:: annotation
|
||
id:: 6437e0db-ceb2-4b13-ae37-5598fa7dd519
|
||
hl-color:: green
|
||
- ## Redundant Arrays of Inexpensive Disks (RAIDs)
|
||
hl-page:: 480
|
||
ls-type:: annotation
|
||
id:: 6437e8b0-b179-46c1-9173-e9b080273f7e
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- RAID Interface
|
||
id:: 643e8edc-8d60-4c99-ac89-8fb4720a1ac4
|
||
- Looks like a ==big, fast and reliable disk== to the host, providing the abstraction of ==a linear array of blocks==. Usually, a RAID is connected to the host through ==standard interfaces== (e.g. SATA)
|
||
- Internally, the RAID controller decides how to perform ==physical I/Os== in order to complete a single ==logical I/O==.
|
||
- At a high level, a RAID is very much a specialized computer system: it has a processor, memory, and disks; however, instead of running applications, it runs specialized software designed to operate the RAID.
|
||
ls-type:: annotation
|
||
hl-page:: 482
|
||
hl-color:: yellow
|
||
id:: 6437ef13-e1d1-4dce-bbd9-1a6f09dae4f0
|
||
- Fault Model
|
||
ls-type:: annotation
|
||
hl-page:: 482
|
||
hl-color:: yellow
|
||
id:: 6437ef79-0ca3-4937-9861-2648b2579524
|
||
- **fail-stop** fault model
|
||
- A disk is either *working* or *failed*. If working, all blocks can be read/written. If failed, all of its data is assumed permanently lost (more realistic errors such as corruption or latent sector errors are ignored).
|
||
- Disk failure can be immediately detected
|
||
- **RAID0**: Striping
|
||
hl-page:: 483
|
||
ls-type:: annotation
|
||
id:: 6437f261-2d97-4f0c-85aa-06dd6d230ce0
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- spread the blocks of the array across the disks in a round-robin fashion
|
||
ls-type:: annotation
|
||
hl-page:: 483
|
||
hl-color:: yellow
|
||
id:: 6437fe8e-f81a-4646-aab2-87c5b3376e91
|
||
- Chunk Size: number of consecutive blocks placed in one disk before moving on to the next disk
|
||
hl-page:: 484
|
||
ls-type:: annotation
|
||
id:: 6437feab-eceb-4f11-9ced-ae43e2798c0c
|
||
hl-color:: yellow
|
||
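- A small *chunk size* increases the parallelism of reads and writes to a single file, but positioning time also increases, since a request spread over several disks waits for the slowest of them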
|
||
- A large chunk size reduces this intra-file parallelism but also reduces positioning time (more consecutive reads on the same disk)
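- A sketch of how a striped array could map a logical block to a disk and an offset for a given chunk size. The struct and function names are hypothetical, and the layout assumed is plain round-robin placement of chunks.

```c
#include <assert.h>

struct raid0_location {
    unsigned long disk;    /* which disk holds the block */
    unsigned long offset;  /* block offset within that disk */
};

/* Map logical block `lba` onto `ndisks` disks with `chunk` blocks per chunk. */
struct raid0_location raid0_map(unsigned long lba, unsigned long ndisks, unsigned long chunk) {
    assert(ndisks > 0 && chunk > 0);
    unsigned long chunk_index = lba / chunk;      /* which chunk, counted over the whole array */
    unsigned long stripe = chunk_index / ndisks;  /* which row of chunks */
    struct raid0_location loc;
    loc.disk = chunk_index % ndisks;
    loc.offset = stripe * chunk + lba % chunk;
    return loc;
}
```

With 4 disks and a chunk size of 1 block, logical block 14 lands on disk 2 at offset 3; with a chunk size of 2 blocks, logical block 10 lands on disk 1 at offset 2.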
|
||
- **Capacity**: Full utilization; `N` disks, each of `B` blocks, make `N*B` blocks available
|
||
- **Reliability**: No redundancy at all.
|
||
- **Performance**: All disks are used in parallel. *Single-request latency* is about the same as that of a single disk, while *steady-state sequential throughput* reaches the full bandwidth of all disks.
|
||
- Two performance metrics
|
||
- **single-request latency**: the latency of one logical I/O; it reveals how much parallelism can exist during a single logical I/O operation
|
||
- **steady-state throughput**: total bandwidth of many concurrent requests, under two basic types of workload: ==random and sequential==
|
||
- **RAID1**: Mirroring
|
||
hl-page:: 486
|
||
ls-type:: annotation
|
||
id:: 64380351-ec20-460c-bf79-a423d22e59e3
|
||
hl-color:: yellow
|
||
- make more than one copy of each block in the system
|
||
- Read: read from any one of these copies; Write: update all copies
|
||
- ((64382b19-3f45-4fb0-9160-638bdbfdf481))
|
||
- **Capacity**: expensive; with 2 copies of each block the useful capacity is `(N*B)/2`, half that of RAID0
|
||
- **Reliability**: RAID1 tolerates any single disk failure, and up to `N/2` failures if no two failed disks hold copies of the same blocks
|
||
- **Performance**
|
||
- Single-request latency:
|
||
- 1. As for read, identical to a single disk.
|
||
2. As for write, slightly higher than a single disk: the copies are written in parallel, but the logical write completes only when the slowest physical write does.
|
||
- Steady-state throughput:
|
||
- 1. Under sequential workload (read or write), only half the total bandwidth, `(N/2)*S`.
|
||
2. Random write also gets half of the bandwidth, `(N/2)*R`.
|
||
3. However, for random read the full bandwidth `N*R` is available by distributing reads across all disks. (This does not work for sequential read: each disk is asked for only every other block, and while it rotates over a skipped block it delivers no useful bandwidth, whereas in RAID0 each disk reads consecutively; see ((64380cd0-34b3-4c34-8c35-a5cf1bf77eee)).)
|
||
- To see that this is not the case
|
||
hl-page:: 489
|
||
ls-type:: annotation
|
||
id:: 64380cd0-34b3-4c34-8c35-a5cf1bf77eee
|
||
hl-color:: yellow
|
||
- **RAID4**: Parity
|
||
hl-page:: 489
|
||
ls-type:: annotation
|
||
id:: 64380d3f-30a1-4780-aff2-96cfeb474786
|
||
hl-color:: yellow
|
||
- Add an additional disk to store the parity information for other disks.
|
||
- Each block in the parity disk stores the XOR of other disks' blocks which are in the same stripe.
|
||
- XOR indicates whether the input contains an odd or even number of 1s. Given the parity bit and the remaining bits, any one lost bit can be recovered: count the 1s in the remaining bits, and the lost bit is whatever value restores the recorded parity.
|
||
id:: 6438114c-fb40-4e2f-a60a-59f48922f5db
|
||
- **Parity computation (single write)**
|
||
id:: 64382061-9676-4d98-af30-d033789eed50
|
||
- **additive parity**: read all other data blocks in the same stripe in parallel, do XOR, and write the new data block and new parity block in parallel
|
||
hl-page:: 491
|
||
ls-type:: annotation
|
||
id:: 643821ec-6db3-410c-87bf-0a2c9928cdf9
|
||
hl-color:: yellow
|
||
- **subtractive parity**: read the old data block and the old parity block in parallel; for each bit, if the new data bit equals the old one the parity bit stays unchanged, otherwise it flips; finally write the new data and new parity in parallel. (Since this is done over whole blocks, the parity block rarely stays entirely unchanged.)
|
||
ls-type:: annotation
|
||
hl-page:: 491
|
||
hl-color:: yellow
|
||
id:: 64382387-01fb-4f28-b5bf-d68dcb529642
|
||
The calculation can be expressed as $P_{new} = (C_{old} \oplus C_{new}) \oplus P_{old}$
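- Both parity methods can be sketched over byte buffers. The function names and the flat layout (`n` data blocks of `len` bytes stored back to back) are assumptions made for illustration; the second function applies $P_{new} = (C_{old} \oplus C_{new}) \oplus P_{old}$ byte by byte.

```c
#include <assert.h>
#include <stddef.h>

/* Additive parity: XOR of all `n` data blocks of a stripe, each `len` bytes, stored contiguously. */
void parity_additive(const unsigned char *data, size_t n, size_t len, unsigned char *parity) {
    assert(data != NULL && parity != NULL);
    for (size_t j = 0; j < len; j++) {
        unsigned char p = 0;
        for (size_t i = 0; i < n; i++)
            p ^= data[i * len + j];
        parity[j] = p;
    }
}

/* Subtractive parity: update `parity` in place from the old and new contents of one block. */
void parity_subtractive(const unsigned char *c_old, const unsigned char *c_new,
                        size_t len, unsigned char *parity) {
    assert(c_old != NULL && c_new != NULL && parity != NULL);
    for (size_t j = 0; j < len; j++)
        parity[j] ^= (unsigned char)(c_old[j] ^ c_new[j]);
}
```

After a single block changes, both methods produce the same parity; the subtractive one just gets there without reading the other data blocks.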
|
||
- **Capacity**: `(N-1)*B` useful capacity
|
||
- **Reliability**: tolerate 1 disk failure and only 1
|
||
- **Performance**:
|
||
- Single-request latency
|
||
id:: 643811af-3534-4f5c-bc93-ae6da9056d5d
|
||
- 1. Single read is identical to a single disk
|
||
2. Single write is roughly twice that of a single disk (2 reads in parallel, then 2 writes in parallel).
|
||
- Steady-state throughput
|
||
- 1. *Sequential read* can use all disks except the parity disk, i.e. `(N-1)*S`.
|
||
2. *Sequential write* is also `(N-1)*S` on average. The blocks are consecutive and numerous, so we can perform a *full-stripe write*, i.e. calculate the parity and write the whole stripe (including the parity disk) in parallel, without extra reads.
|
||
3. *Random read* is similar to sequential read, `(N-1)*R`
|
||
4. *Random write* is only half of a single disk, `R/2`. **small-write problem**: even though the data disks can be written concurrently, the parity disk forces the writes to serialize. With subtractive parity ((64382061-9676-4d98-af30-d033789eed50)), each logical write costs the parity disk 1 read and 1 write, thus halving the bandwidth.
|
||
- **RAID5**: Rotating Parity
|
||
hl-page:: 493
|
||
ls-type:: annotation
|
||
id:: 6438241b-f487-4cf3-b717-60811340a5bd
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- An improved version of *RAID4*: RAID5 rotates the parity block across drives.
|
||
- ((64382b1e-7c59-4729-a70f-68005b0640b4))
|
||
- **Performance**
|
||
- Except for *random read/write*, everything else is almost identical to *RAID4*
|
||
- *Random read*: slightly larger bandwidth, since we can use all disks now, `N*R`
|
||
- *Random write*: allows some parallelism across requests, but each write still occupies two disks (its data block and its parity block), so there can be ==at most N/2 operations at the same time==, each at `R/2`: `N/2 * (R/2) = N/4 * R`
|
||
- Figure 38.8: RAID Capacity, Reliability, and Performance
|
||
ls-type:: annotation
|
||
hl-page:: 494
|
||
hl-color:: yellow
|
||
id:: 64382a3d-6747-462d-8431-7775ad76cc22
|
||
- latent: potential; dormant; hidden
|
||
ls-type:: annotation
|
||
hl-page:: 482
|
||
hl-color:: green
|
||
id:: 6437f025-9841-4e02-a629-66940e341341
|
||
- incur: to bring upon oneself; to cause; to suffer
|
||
hl-page:: 484
|
||
ls-type:: annotation
|
||
id:: 6437fffc-d2d1-4dbc-a28e-89e95b6efdfa
|
||
hl-color:: green
|
||
- deem: to consider; to regard as; to believe
|
||
hl-page:: 485
|
||
ls-type:: annotation
|
||
id:: 64380150-55a0-493e-9fee-a5c666a095d4
|
||
hl-color:: green
|
||
- taxonomy: the science of classification; a classification system
|
||
ls-type:: annotation
|
||
hl-page:: 495
|
||
hl-color:: green
|
||
id:: 64382546-3a1e-438d-9f0e-434018661bda
|
||
- tandem: in series; arranged one behind another
|
||
hl-page:: 498
|
||
ls-type:: annotation
|
||
id:: 64382ab8-ad41-4c20-be74-3ce7446f20d6
|
||
hl-color:: green
|
||
- ## Files And Directories
|
||
ls-type:: annotation
|
||
hl-page:: 498
|
||
hl-color:: yellow
|
||
id:: 6438d8bf-19fd-4a4a-b491-3887c425aebf
|
||
collapsed:: true
|
||
- File Directory
|
||
- File: a linear array of bytes, each of which can be read or written
|
||
- Directory: a list of *(user-readable name, low-level name)* pairs
|
||
- directory tree, root directory, separator, sub-directories, absolute pathname
|
||
- inode number: low-level name of a file or directory
|
||
- File operations
|
||
- Creating Files: system call `open` with `O_CREAT` flag
|
||
hl-page:: 500
|
||
ls-type:: annotation
|
||
id:: 6438dd4b-55ba-4649-90fd-7de69ed9c2ba
|
||
hl-color:: yellow
|
||
- file descriptor: an integer, ==private per process==, used to access files
|
||
hl-page:: 501
|
||
ls-type:: annotation
|
||
id:: 6438dd78-68fe-47c6-8ca6-b2de3206f4f1
|
||
hl-color:: yellow
|
||
- can be seen as a handle for file operations, or a pointer to an object of type file
|
||
- Reading And Writing Files: system call `read` and `write`
|
||
hl-page:: 502
|
||
ls-type:: annotation
|
||
id:: 6438e073-2636-4ec8-807d-3ecf83a5c0c0
|
||
hl-color:: yellow
|
||
- Non-sequential access: system call `lseek(fd, offset, whence)`
|
||
- explicit and implicit update to file offset
|
||
- open file table: represents all currently open files in the system
|
||
- Shared File Table Entries
|
||
ls-type:: annotation
|
||
hl-page:: 506
|
||
hl-color:: yellow
|
||
id:: 6438e88b-e3e6-44a4-9f79-411c1f27ae71
|
||
- On every `open` call, the OS creates a new entry in the *open file table* even for the same file (same inode). Thus, they have ==independent offsets==.
|
||
- Through `fork` or `dup`, we can make 2 file descriptors point to the ==same entry==. In this case, ==reference count== is needed to track when to release the entry.
|
||
- Figure 39.3: Processes Sharing An Open File Table Entry
|
||
ls-type:: annotation
|
||
hl-page:: 508
|
||
hl-color:: yellow
|
||
id:: 6438e98b-b46f-4038-a984-eb172a628cc7
|
||
- Writing Immediately
|
||
ls-type:: annotation
|
||
hl-page:: 509
|
||
hl-color:: yellow
|
||
id:: 643a7463-39e5-4645-8b43-b8be7b8ea7bc
|
||
- `write` system call is generally buffered, so the change may be applied to the disk some time later
|
||
- In rare cases (e.g. a crash before the buffered data reaches the disk), this can lose data, which is unacceptable for software such as a DBMS.
|
||
- `fsync()` a particular file descriptor, and then the FS will force all dirty data to disk.
|
||
- In some cases (e.g. when the file is newly created), calling `fsync` on the directory containing the target file is also necessary.
|
||
id:: 643a7594-e799-4dec-a9f6-fa94fada363f
|
||
- Renaming File: a dedicated system call, `rename(const char *old, const char *new)`, which is usually implemented to be atomic with respect to system crashes.
|
||
hl-page:: 509
|
||
ls-type:: annotation
|
||
id:: 643a782a-b874-4f34-9968-50b69a04b849
|
||
hl-color:: yellow
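- A sketch of how `write`, `fsync` and `rename` combine into an all-or-nothing file update. The function name `atomic_update` and the caller-supplied temporary path are assumptions made for illustration; syncing the containing directory is left out for brevity.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replace the contents of `path` with `data` by way of `tmp_path`; returns 0 on success. */
int atomic_update(const char *path, const char *tmp_path, const char *data) {
    assert(path != NULL && tmp_path != NULL && data != NULL);
    size_t len = strlen(data);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;
    /* A short write is treated as failure here to keep the sketch small. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    /* Only after the new data is on disk is the old name switched over. */
    if (close(fd) != 0 || rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A reader opening `path` sees either the complete old contents or the complete new contents, never a half-written file.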
|
||
- File Information: the `stat` or `fstat` system call, which fetches information stored in the file's *inode*
|
||
hl-page:: 510
|
||
ls-type:: annotation
|
||
id:: 643a789e-6346-4845-9d38-95d26927a32b
|
||
hl-color:: yellow
|
||
- Removing Files: `unlink`
|
||
hl-page:: 511
|
||
ls-type:: annotation
|
||
id:: 643a79be-d297-4f8d-8825-83c9f830af0e
|
||
hl-color:: yellow
|
||
- Directory Operations
|
||
- Making Directories: `mkdir`. Even an empty (newly created) directory has 2 entries: `.` and `..`
|
||
hl-page:: 512
|
||
ls-type:: annotation
|
||
id:: 643a7a52-0c85-435c-8250-3f4198a09fc0
|
||
hl-color:: yellow
|
||
- Reading Directories: 3 calls, `opendir`, `readdir` and `closedir`, and a `dirent` structure with a few fields. A directory "file" consists of many such entries.
|
||
hl-page:: 513
|
||
ls-type:: annotation
|
||
id:: 643a7c32-0be4-4a2e-9532-31869f4e725a
|
||
hl-color:: yellow
|
||
- Deleting Directories: `rmdir`. Note that this syscall can only remove empty directories; on a non-empty directory it simply fails.
|
||
hl-page:: 514
|
||
ls-type:: annotation
|
||
id:: 643a7d55-768c-4881-b78d-f721c8d7929d
|
||
hl-color:: yellow
|
||
- Links
|
||
- Hard Links
|
||
hl-page:: 514
|
||
ls-type:: annotation
|
||
id:: 643a7d87-b945-4068-94dd-2b2d203eae67
|
||
hl-color:: yellow
|
||
- syscall `link` creates another name in the directory which refers to ==the same inode== of the original file.
|
||
- The *inode* keeps a reference count indicating how many hard links refer to it. On each `unlink`, RC decreases and the file will be deleted once the RC gets to 0.
|
||
- Hard links are essentially ==entries in directories== and hard links pointing to the same *inode* are just ==identical except their names==.
|
||
- An interesting use of `link` is renaming: link the new name, then unlink the old one
|
||
- Limitation: cannot link to a directory, cannot link to file on another partition (because *inode numbers* are only unique in the same FS/partition)
|
||
- Symbolic Links (Soft Links)
|
||
hl-page:: 516
|
||
ls-type:: annotation
|
||
id:: 643a8017-1499-4cd3-a015-0f1e8d143e93
|
||
hl-color:: yellow
|
||
- syscall `symlink`
|
||
- A symbolic link is essentially a ==special type of file==, which holds the pathname of the linked-to file.
|
||
- A dangling reference is possible when the original file is deleted.
|
||
- Permission Bits
|
||
ls-type:: annotation
|
||
hl-page:: 518
|
||
hl-color:: yellow
|
||
id:: 643a84cb-102e-4beb-a138-e8690f68356f
|
||
- 10 characters (as shown in the output of `ls -l`)
|
||
- The leftmost character indicates the type of the file, such as `-` for regular, `d` for directory, `l` for symbolic link and so on.
|
||
- The other 9 characters are grouped by 3, each character corresponding to one bit. The three groups give the permissions of the *owner*, the *group* and *anyone else*, and the bits within a group mean r/w/x respectively. Note that for directories, the x bit is the permission to enter (e.g. `cd` into) the directory.
|
||
- eponymous: having the same name (as the title)
|
||
hl-page:: 512
|
||
ls-type:: annotation
|
||
id:: 643a7a83-66d0-4bbc-8b17-bd7604a0ed5f
|
||
hl-color:: green
|
||
- hamster: a small rodent often kept as a pet
|
||
hl-page:: 515
|
||
ls-type:: annotation
|
||
id:: 643a7dda-5c26-4ed5-8e06-c03e5a0e9fb7
|
||
hl-color:: green
|
||
- ## File System Implementation
|
||
hl-page:: 526
|
||
ls-type:: annotation
|
||
id:: 643a88bf-206b-4b6d-9d92-95516bcbe270
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- The Mental Model: data structure of the FS and its access methods
|
||
hl-page:: 526
|
||
ls-type:: annotation
|
||
id:: 643a8c8e-b3ae-4de8-9280-75201b02d6db
|
||
hl-color:: yellow
|
||
- Disk Organization
|
||
id:: 643e8edc-c382-48ae-80d2-d661571bc5dd
|
||
- **blocks**: divide the disk into blocks (e.g. the commonly used 4KB)
|
||
hl-page:: 527
|
||
ls-type:: annotation
|
||
id:: 643a8ce0-15b0-4d37-987d-6e649a59b616
|
||
hl-color:: yellow
|
||
- **data region**: most of the space is user data
|
||
hl-page:: 527
|
||
ls-type:: annotation
|
||
id:: 643a8ef8-7d3f-4794-9f00-4973a3be9bb7
|
||
hl-color:: yellow
|
||
- **inode table**: an array of inodes containing metadata to track per-file information
|
||
hl-page:: 528
|
||
ls-type:: annotation
|
||
id:: 643a8f4b-ac22-4cb8-b68d-e7f3078709c1
|
||
hl-color:: yellow
|
||
- **allocation structures**: recording information about free blocks, such as free list or bitmap. The `vsfs` from the book has one for inodes and another one for user data.
|
||
hl-page:: 528
|
||
ls-type:: annotation
|
||
id:: 643a8fa4-9783-4e86-8b04-2fad1f3a5c66
|
||
hl-color:: yellow
|
||
- **superblock**: information about the whole file system
|
||
hl-page:: 529
|
||
ls-type:: annotation
|
||
id:: 643a9022-da94-4092-8a64-e22cd2364365
|
||
hl-color:: yellow
|
||
- Inode
|
||
ls-type:: annotation
|
||
hl-page:: 529
|
||
hl-color:: yellow
|
||
id:: 643a92a1-90df-4342-a183-85c55242b8af
|
||
- **i-number**: each inode is referred to by a number (its low-level name). Given an i-number, the on-disk location of the inode can be calculated directly. Note that disks are not byte-addressable, so the FS must read the whole sector that contains the inode.
|
||
- **metadata**: Inside each inode is virtually all of the information you need about a file. In addition, some necessary data for looking up data blocks.
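- A minimal sketch of the i-number calculation mentioned above. The block size, sector size, inode size, and inode-table start address are assumptions chosen to resemble the book's `vsfs` layout; a real FS reads them from its superblock.

```python
BLOCK_SIZE = 4096              # bytes per FS block (assumed)
SECTOR_SIZE = 512              # bytes per disk sector (assumed)
INODE_SIZE = 256               # bytes per on-disk inode (assumed)
INODE_TABLE_START = 12 * 1024  # byte address where the inode table begins (assumed)

def inode_location(inumber):
    """Return (sector number, byte offset inside that sector) of an inode."""
    byte_addr = INODE_TABLE_START + inumber * INODE_SIZE
    return byte_addr // SECTOR_SIZE, byte_addr % SECTOR_SIZE
```

  With these assumed numbers, inode 32 starts at byte 20KB, i.e., at the beginning of sector 40, and inode 33 sits in the second half of the same sector.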
|
||
- The Multi-Level Index
|
||
ls-type:: annotation
|
||
hl-page:: 531
|
||
hl-color:: yellow
|
||
id:: 643a93f0-e122-4ea5-a228-48607f93e404
|
||
- **direct pointer**: refer to one disk block that belongs to the file
|
||
- **indirect pointer**: points to a ==block that contains more pointers==, each pointing to a user data block.
|
||
- An inode may have ==some fixed number of direct pointers== and ==a single indirect pointer==. If a file grows large enough, an indirect block is allocated (from the ==data-block region of the disk==), and the inode’s slot for an indirect pointer is set to point to it. If the file grows even larger, add a double (or triple, ...) indirect pointer to the inode. A double indirect pointer refers to a ==block containing pointers to indirect blocks==, each of which is as described above.
|
||
hl-page:: 531
|
||
ls-type:: annotation
|
||
id:: 643a950d-c2c2-419b-8206-0b957b7de178
|
||
hl-color:: yellow
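- A quick worked calculation of how far the multi-level index reaches. The defaults (12 direct pointers, 4KB blocks, 4-byte disk addresses) are assumptions for illustration, and `max_file_size` is a hypothetical helper.

```python
def max_file_size(direct=12, levels=1, block_size=4096, ptr_size=4):
    """Largest file in bytes with `direct` direct pointers plus one indirect
    pointer of each depth 1..levels (single, double, triple, ...)."""
    ptrs_per_block = block_size // ptr_size
    blocks = direct + sum(ptrs_per_block ** k for k in range(1, levels + 1))
    return blocks * block_size

for levels in (1, 2, 3):
    print(levels, max_file_size(levels=levels))
```

  Under these assumptions, a single indirect pointer gives (12 + 1024) blocks of 4KB each, and every extra level multiplies the reach by about 1024.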
|
||
- Figure 40.2: File System Measurement Summary -- Most files are small
|
||
hl-page:: 533
|
||
ls-type:: annotation
|
||
id:: 643a96d1-5114-40f7-9dad-4291462344ff
|
||
hl-color:: yellow
|
||
- Extent-based approaches
|
||
hl-page:: 532
|
||
ls-type:: annotation
|
||
id:: 643a975d-f0a2-47e7-bf85-63a300a2c504
|
||
hl-color:: yellow
|
||
- **extent**: a disk pointer and a length (how many contiguous blocks are there starting from the pointer)
|
||
- The advantage of this approach is that it is more compact, thus ==saving a lot of metadata==. The disadvantage is that it is not always easy to find large ==contiguous chunks== of free space, so it works best when the disk has plenty of free space.
|
||
id:: 643a97ff-4b62-41bf-aa72-7316a8f3b974
|
||
- Linked-based approaches (FAT)
|
||
hl-page:: 534
|
||
ls-type:: annotation
|
||
id:: 643a9c2b-a519-4800-9188-253b13e9c2e5
|
||
hl-color:: yellow
|
||
- For each file, there is only one ==pointer to the first block== of the file. If more blocks are needed, add a ==pointer to another block at the end of this block==.
|
||
- Directory Organization
|
||
ls-type:: annotation
|
||
hl-page:: 533
|
||
hl-color:: yellow
|
||
id:: 643a990d-2b82-4bed-a49d-46746821b5a4
|
||
- A linear ==array of entries==. Each entry is an `(entry name, inode number)` pair, perhaps with an additional `length` (total bytes of this entry) and `strlen` (length of the name).
|
||
- A directory, from the FS's perspective, is a ==special type of file==. Directories have their data blocks allocated in the *data region* and also have corresponding inodes.
|
||
- Deleting a file can leave an empty space in the middle of the directory's block, and thus the FS needs to handle that (maybe mark it for reuse?).
|
||
- Free Space Management
|
||
ls-type:: annotation
|
||
hl-page:: 535
|
||
hl-color:: yellow
|
||
id:: 643a9d3c-3847-4a2e-b50e-f8423b089b39
|
||
- For our simple `vsfs`, search through the bitmap.
|
||
- pre-allocation: when allocating, look for a sequence of blocks (contiguous on disk) and give them to the new file, in order to improve performance
|
||
hl-page:: 535
|
||
ls-type:: annotation
|
||
id:: 643a9d6e-eab7-475c-9822-9b2efa21e6f0
|
||
hl-color:: yellow
|
||
- Reading A File From Disk
|
||
ls-type:: annotation
|
||
hl-page:: 536
|
||
hl-color:: yellow
|
||
id:: 643a9e6c-c2bd-48b9-80b8-b98340f91f7a
|
||
- Open: traverse the pathname and locate the desired inode. Since the inode of root directory `/` is fixed, we can start from `/`: load inode, read directory data, search for the next-level entry and recursively go down until the desired file's inode is loaded.
|
||
hl-page:: 536
|
||
ls-type:: annotation
|
||
id:: 643a9e8a-5f34-4dc9-817b-30466e84ffe2
|
||
hl-color:: yellow
|
||
- The amount of I/O generated by the open is proportional to the length of the pathname.
|
||
ls-type:: annotation
|
||
hl-page:: 537
|
||
hl-color:: yellow
|
||
id:: 643ab82b-d15c-4298-b52b-30c27da1deb0
|
||
- Read: first consult the file's inode for block address, and may update `access_time` field in the inode after read.
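- A back-of-the-envelope sketch of why the cost of `open` grows with the path length. It assumes nothing is cached and that every directory's entries fit in one data block; `open_reads` is a hypothetical helper.

```python
def open_reads(path):
    """Count the disk reads needed to open `path` on a cold cache."""
    components = [c for c in path.split("/") if c]
    # Each component costs one read of its parent directory's data block
    # plus one read of an inode; the root inode adds one more read.
    return 2 * len(components) + 1

print(open_reads("/foo/bar"))
```

  For `/foo/bar` the reads are: root inode, root data, `foo` inode, `foo` data, `bar` inode.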
|
||
- Writing A File To Disk
|
||
ls-type:: annotation
|
||
hl-page:: 537
|
||
hl-color:: yellow
|
||
id:: 643ab83e-d2fe-4f02-b467-b969ab2482cc
|
||
- Write (with new block allocation): besides the cost of `open`, write to new block needs 5 IOs: read-write data bitmap (to allocate new free block), read-write inode (give the new block to the file), and finally write the block itself.
|
||
- Create: first, walk the path to the parent directory (many I/Os, similar to `open`). Then read and write the inode bitmap (to allocate a new inode), write the new inode, write the directory's data block (to link the file name to its inode number), and read and write the parent directory's inode. Finally, if the directory's blocks cannot accommodate the new entry, more I/Os are needed to allocate a new directory block.
|
||
- Caching and Buffering
|
||
ls-type:: annotation
|
||
hl-page:: 539
|
||
hl-color:: yellow
|
||
id:: 643abc13-2fdb-4a66-932a-eea1eafa449a
|
||
- static partitioning: a fixed-size cache that holds popular blocks, with a replacement policy such as LRU. Can be wasteful, though
|
||
hl-page:: 539
|
||
ls-type:: annotation
|
||
id:: 643abc6f-93a1-44e8-bb62-e56b2a9f2541
|
||
hl-color:: yellow
|
||
- dynamic partitioning: modern OSs integrate VM pages and FS pages into a unified page cache
|
||
hl-page:: 539
|
||
ls-type:: annotation
|
||
id:: 643abc73-d5b4-4627-9ab9-ef8863bc8f3d
|
||
hl-color:: yellow
|
||
- write buffering: FS can batch some updates into one IO to disk; FS can schedule subsequent IOs; some writes can be eliminated (such as an overwrite)
|
||
hl-page:: 540
|
||
ls-type:: annotation
|
||
id:: 643abc3c-040a-4f2f-89e4-ca14c9c0c230
|
||
hl-color:: yellow
|
||
- akin: similar; of the same kind
|
||
ls-type:: annotation
|
||
hl-page:: 532
|
||
hl-color:: green
|
||
id:: 643a96a0-f55d-4a84-bf93-4732a282b124
|
||
- readily: easily; willingly
|
||
ls-type:: annotation
|
||
hl-page:: 533
|
||
hl-color:: green
|
||
id:: 643a96cb-bb7d-44c7-ac86-1e55a7ea3249
|
||
- per se: in itself; by itself
|
||
hl-page:: 534
|
||
ls-type:: annotation
|
||
id:: 643a9be6-58e7-47a5-9ef2-bfff1f4c88a7
|
||
hl-color:: green
|
||
- bad mouth: to speak ill of someone
|
||
hl-page:: 541
|
||
ls-type:: annotation
|
||
id:: 643abbf4-1d1a-4b32-9d91-345983406ce5
|
||
hl-color:: green
|
||
- ## Fast File System
|
||
ls-type:: annotation
|
||
hl-page:: 544
|
||
hl-color:: yellow
|
||
id:: 643abf69-04dc-428c-996d-b139cba0fa1f
|
||
collapsed:: true
|
||
- Problems with the rudimentary FS
|
||
- Data spread over the space, leading to long seek time between, e.g., inode and its data
|
||
- Free space gets fragmented, making sequential reads of a file slow
|
||
- Block size (512B) too small
|
||
- On-disk Structure
|
||
hl-page:: 546
|
||
ls-type:: annotation
|
||
id:: 643ac12f-817c-459b-b9ba-bbb1745519fb
|
||
hl-color:: yellow
|
||
- **cylinder**: ==a set of tracks== on different surfaces of a hard drive that are the ==same distance from the center== of the drive
|
||
hl-page:: 546
|
||
ls-type:: annotation
|
||
id:: 643ac141-3bff-4b00-9c18-57ad7c108fe5
|
||
hl-color:: yellow
|
||
- **cylinder groups**: aggregates consecutive cylinders into a group
|
||
hl-page:: 546
|
||
ls-type:: annotation
|
||
id:: 643ac144-24f3-46ca-9354-47c9d2f4ffe0
|
||
hl-color:: yellow
|
||
- **block groups**: modern HDDs tend not to expose internal structure, so the OS cannot have cylinder groups. Instead, organize the drive into block groups (they are consecutive anyways).
|
||
hl-page:: 547
|
||
ls-type:: annotation
|
||
id:: 643ac1ec-08d9-4584-b6ed-4332f8f791a0
|
||
hl-color:: yellow
|
||
- FFS spreads the components of a FS into each cylinder group.
|
||
- Each group has a copy of the super block.
|
||
- A per-group inode bitmap and data bitmap, keeping free block info in the group
|
||
- The remaining blocks are data blocks.
|
||
- Allocation Policies
|
||
hl-page:: 548
|
||
ls-type:: annotation
|
||
id:: 643ac42c-edf9-4b68-8890-b04d724c2ac2
|
||
hl-color:: yellow
|
||
- Basic mantra: Keep related stuff together.
|
||
- **Directory** placement: put the new directory in the cylinder group with a ==low number of allocated directories== (to balance directories across groups) and a ==high number of free inodes== (to subsequently be able to allocate a bunch of files)
|
||
hl-page:: 548
|
||
ls-type:: annotation
|
||
id:: 643ac55f-b4a2-4bfc-ad0b-23c785c4871e
|
||
hl-color:: yellow
|
||
- **File** placement: allocate the data blocks of a file in the ==same group as its inode== (avoiding long seeks between inode and data), and place files that are in the ==same directory== into the cylinder group of their enclosing directory.
|
||
hl-page:: 548
|
||
ls-type:: annotation
|
||
id:: 643ac616-ab90-40d5-94b2-7b8c294c4360
|
||
hl-color:: yellow
|
||
- The Large-File Exception
|
||
ls-type:: annotation
|
||
hl-page:: 551
|
||
hl-color:: yellow
|
||
id:: 643ac8cc-7164-4066-9730-231063c6c7c0
|
||
- Large file hurts locality by filling up the group and preventing other related files being placed in the same group
|
||
- After some number of blocks are allocated into the first block group, FFS places the next =="large" chunk of the file in another block group==. For example, put the blocks pointed to by the direct pointers (12 blocks) in the first group, and those pointed to by the indirect block (1K blocks) in another.
|
||
hl-page:: 551
|
||
ls-type:: annotation
|
||
id:: 643ac9e7-1479-4be9-81fe-acb750f363b4
|
||
hl-color:: yellow
|
||
- Potential Performance Problem: large sequential read from a large file. However, with selected chunk size (threshold of going to another group), ==cost of seek between groups can be amortized==. The larger size of a chunk, the higher average bandwidth you will reach.
|
||
- Measuring File Locality
|
||
hl-page:: 550
|
||
ls-type:: annotation
|
||
id:: 643ac769-6511-4954-a43a-957f3b561c56
|
||
hl-color:: yellow
|
||
- The metric can be: the distance to the common ancestor of the 2 files which are consecutively opened.
|
||
- About 40% are either the same or under same directory (FFS captures this), and 25% have distance of 2 (FFS failed to capture this)
|
||
- A Few Other Things About FFS
|
||
hl-page:: 553
|
||
ls-type:: annotation
|
||
id:: 643acef0-5592-4bad-8557-878567dc18a1
|
||
hl-color:: yellow
|
||
- divide blocks into sub-blocks to save disk space (4KB is too large for small files), and modify *libc* to buffer writes in 4KB chunks
|
||
- parameterization: a different on-disk block layout to improve sequential reads
|
||
- replica: a copy; a duplicate
|
||
ls-type:: annotation
|
||
hl-page:: 547
|
||
hl-color:: green
|
||
id:: 643ac2c5-a62f-473b-b0e5-32807c7b2e6a
|
||
- corollary: a natural consequence (or conclusion)
|
||
hl-page:: 548
|
||
ls-type:: annotation
|
||
id:: 643ac41b-c2e0-4351-b0e3-012e792cf426
|
||
hl-color:: green
|
||
- extensive: broad, wide-ranging; large in amount or scale
|
||
ls-type:: annotation
|
||
hl-page:: 549
|
||
hl-color:: green
|
||
id:: 643ac4ec-df99-430a-9e2e-78101c74b14e
|
||
- nuance: a subtle difference (in meaning, sound, color, feeling, etc.)
|
||
hl-page:: 549
|
||
ls-type:: annotation
|
||
id:: 643ac4f0-6809-4acd-b141-2a96174f5395
|
||
hl-color:: green
|
||
- watershed: a turning point; a dividing line
|
||
hl-page:: 555
|
||
ls-type:: annotation
|
||
id:: 643acd1b-5a18-4b14-acdc-006c2f97c0e8
|
||
hl-color:: green
|
||
- ## Crash Consistency
|
||
hl-page:: 558
|
||
ls-type:: annotation
|
||
id:: 643acfc2-eef7-4c7f-a8a5-1740c8788159
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- Crash Scenarios
|
||
ls-type:: annotation
|
||
hl-page:: 560
|
||
hl-color:: yellow
|
||
id:: 643b7d72-4da5-42cf-abd5-dbbc3d9332d7
|
||
collapsed:: true
|
||
- Consider a ==write operation with new data block allocation== in the `vsfs` introduced above, which involves 3 independent writes to the disk (data block, inode, and data bitmap)
|
||
- Only one operation is done
|
||
- data block: not a problem for the FS, as if the write never happened, though the user data is lost
|
||
- inode: FS inconsistency, bitmap says it is not allocated while inode says it is, read garbage from the block
|
||
- data bitmap: FS inconsistency and a space leak; the block will never be used again
|
||
- Two operations are done
|
||
- inode and bitmap: read garbage, though FS consistent
|
||
- data block and inode (no bitmap), or data block and bitmap (no inode): FS inconsistent
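- The crash cases above can be condensed into three rules. This is only a sketch; the function `crash_outcome` and its result keys are names invented here for illustration.

```python
def crash_outcome(data, inode, bitmap):
    """Classify the on-disk state given which of the three writes reached the disk."""
    return {
        # Metadata agrees only if inode and bitmap were both written or both lost.
        "metadata_consistent": inode == bitmap,
        # The inode points at a block whose new contents never arrived.
        "reads_garbage": inode and not data,
        # The bitmap marks the block as used, but no inode references it.
        "space_leak": bitmap and not inode,
    }

for case in [(True, False, False), (False, True, False), (False, False, True),
             (False, True, True), (True, True, False), (True, False, True)]:
    print(case, crash_outcome(*case))
```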
|
||
- The File System Checker
|
||
ls-type:: annotation
|
||
hl-page:: 562
|
||
hl-color:: yellow
|
||
id:: 643b8161-a7b8-496d-9121-4ce20ee8deb6
|
||
collapsed:: true
|
||
- Let inconsistencies happen and then fix them later when rebooting. This approach cannot solve all problems (like data loss), the only goal is to make the FS metadata consistent internally. Run before the FS is mounted
|
||
hl-page:: 562
|
||
ls-type:: annotation
|
||
id:: 643b81ae-edfd-4a9e-9df2-fdba2175dde2
|
||
hl-color:: yellow
|
||
- Basic summary of what `fsck` does
|
||
- Superblock: if corrupt, use an ==alternative copy==
|
||
- Free blocks: scan inodes, (double/triple...) indirect blocks to collect ==information about allocated blocks== and use this information to ==correct the bitmap==.
|
||
- Inode links: traverse the whole directory tree and calculate the ==reference count for each inode==, then verify it against the link count stored in the inode. If an inode is allocated but no directory refers to it, move it to `lost+found`.
|
||
- Duplicates: multiple inode pointers point to the same block. Copy the block or clear inode
|
||
- Bad blocks, Inode state, Directory checks, etc.
|
||
- Problem: too slow
|
||
- Journaling (or Write-Ahead Logging)
|
||
ls-type:: annotation
|
||
hl-page:: 564
|
||
hl-color:: yellow
|
||
id:: 643b86d9-afcc-4494-87d1-275502df79a7
|
||
- Basic Idea: Before writing the structures in place, first write a log elsewhere on the disk. If crash takes place during the actual update, FS can fix inconsistency according to the log.
|
||
- **Data Journaling**
|
||
ls-type:: annotation
|
||
hl-page:: 565
|
||
hl-color:: yellow
|
||
id:: 643b8dc6-e380-49e3-ab08-ce27aa8767e2
|
||
- **physical logging**: put the exact physical contents of the update in the journal
|
||
hl-page:: 565
|
||
ls-type:: annotation
|
||
id:: 643b8ebd-b694-4862-90f0-9fb0f1a847f5
|
||
hl-color:: yellow
|
||
- **checkpointing**: overwrite the old structures in the FS
|
||
hl-page:: 565
|
||
ls-type:: annotation
|
||
id:: 643b8eed-3b37-4ac0-b924-9a2d035f2517
|
||
hl-color:: yellow
|
||
- **transaction identifier** (TID): stored in the transaction-begin block (*Tx Begin*), together with information about the pending update, and in the transaction-end marker (*Tx End*)
|
||
hl-page:: 565
|
||
ls-type:: annotation
|
||
id:: 643b8f7d-76a2-477e-89ea-1321475b3dbe
|
||
hl-color:: yellow
|
||
- Journal write: Write the transaction (*Tx Begin* mark, data to update, *Tx End* mark) to log
|
||
- To make things faster, instead of issuing serial write requests, we may ==merge these requests.==
|
||
id:: 643b9070-8e4c-422a-8173-388fb801930d
|
||
- To avoid a torn transaction within a single request (the disk may reorder the writes internally, so a crash can leave *Tx End* on disk before the rest), the *Tx End* mark must be written with ==a separate request==, while the other parts of the log can be issued together.
|
||
- Adding a checksum is also a solution. With a checksum of the journal contents stored in the begin and end blocks, the whole transaction can be written in a single request. If the disk failed to write the entire transaction, the mismatch will be noticed during the recovery scan on reboot and the transaction will be skipped.
|
||
hl-page:: 567
|
||
ls-type:: annotation
|
||
id:: 643b9816-e478-4f45-a5bd-fbe168fdc406
|
||
hl-color:: yellow
|
||
- Thus, this step can be split into 2 stages, ==Journal Write and Journal Commit==: the first writes the Tx Begin mark and the pending update, and the second writes the Tx End mark.
|
||
- To re-use the log region, add a journal superblock on the disk for information about transaction checkpoint completion (free checkpointed ones). Perhaps a circular log.
|
||
- Protocol
|
||
- hl-page:: 570
|
||
ls-type:: annotation
|
||
id:: 643b9e00-4597-4a1a-890a-be95041f6b3b
|
||
hl-color:: yellow
|
||
1. **Journal write**: Write the contents of the transaction (Tx Begin, contents of the update) to the log; wait for these writes to complete.
|
||
2. **Journal commit**: Write the transaction commit block (Tx End) to the log; wait for the write to complete; the transaction is now committed.
|
||
3. **Checkpoint**: Write the contents of the update to their final locations within the file system.
|
||
4. **Free**: Some time later, mark the transaction free in the journal by updating the journal superblock.
|
||
- Recovery
|
||
ls-type:: annotation
|
||
hl-page:: 568
|
||
hl-color:: yellow
|
||
id:: 643b9301-4a07-459a-a413-5c2738560e10
|
||
- Crash before transaction commit, skip.
|
||
- Crash after transaction commit (but before checkpointing complete), replay.
|
||
- Redo Logging: On reboot, scan the log for committed transactions and try to write them again.
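- A toy sketch of redo logging during recovery. The journal record format (`"TxB"`, `"write"`, `"TxE"` tuples) and the dict-based "disk" are assumptions made for illustration, not an on-disk format from the book.

```python
def recover(journal, disk):
    """Replay committed transactions in log order; skip uncommitted ones.

    journal: list of ("TxB", tid), ("write", tid, addr, value), ("TxE", tid)
    disk:    dict mapping a block address to its contents
    """
    committed = {rec[1] for rec in journal if rec[0] == "TxE"}
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _, addr, value = rec
            disk[addr] = value  # replaying again yields the same state
    return disk

journal = [
    ("TxB", 1), ("write", 1, "inode2", "I[v2]"), ("write", 1, "bitmap", "B[v2]"),
    ("write", 1, "block5", "Db"), ("TxE", 1),
    ("TxB", 2), ("write", 2, "inode3", "I[v3]"),  # crash before TxE: skipped
]
disk = recover(journal, {"inode2": "I[v1]", "bitmap": "B[v1]"})
print(disk)
```

  Because replay simply rewrites the same blocks with the same contents, a crash during recovery is harmless: recovery can be run again from the start.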
|
||
- **Metadata Journaling**
|
||
ls-type:: annotation
|
||
hl-page:: 570
|
||
hl-color:: yellow
|
||
id:: 643b96ec-18c9-4053-be7e-3b7d3b7dbbbd
|
||
- Data journaling doubles the traffic to disk, and seek between log area and main data area is costly.
|
||
- Metadata journaling writes metadata to log without data block. Data block is written directly to main data area before metadata is logged.
|
||
- Protocol
|
||
- hl-page:: 571
|
||
ls-type:: annotation
|
||
id:: 643b9b7c-a43e-4c50-9dec-1d8f30bae712
|
||
hl-color:: yellow
|
||
1. **Data write**: Write data to final location; wait for completion (optional).
|
||
2. **Journal metadata write**: Write the begin block and metadata to log; wait for writes to complete.
|
||
3. **Journal commit**: Write the transaction commit block (Tx End) to log; wait for the write to complete; the transaction (including data) is now committed.
|
||
4. **Checkpoint metadata**: Write the contents of the metadata update to their final locations in FS.
|
||
5. **Free**: Later, mark the transaction free in journal superblock.
|
||
- Actually, step 1 and step 2 can be issued concurrently, but Step 3 must wait for Step 1 and 2.
|
||
- Tricky Case: Block Reuse
|
||
ls-type:: annotation
|
||
hl-page:: 572
|
||
hl-color:: yellow
|
||
id:: 643b9c5a-c06b-4a4b-af75-9a2242069fc8
|
||
- Replay can overwrite a data block when the block was re-used after a deletion and the old log entry has not yet been freed.
|
||
- The key point is that directory contents are treated as metadata and are therefore journaled. If the original block belonged to a directory, the following sequence causes a problem: modify the directory's entries, delete the directory, then re-use the directory's block for a file. Because file data is not journaled, recovery will overwrite the file's data block with the old, deleted directory contents.
|
||
- Other Approaches
|
||
ls-type:: annotation
|
||
hl-page:: 574
|
||
hl-color:: yellow
|
||
id:: 643ba14d-00f4-4f92-921c-740f3b6def61
|
||
- Soft updates: carefully order the writes to ensure on-disk structure is consistent at any time
|
||
- COW: never overwrite in place
|
||
- back-pointer: add backward pointer to inode to check consistency
|
||
- optimistic crash consistency: kind of transaction checksum
|
||
- premise: to state beforehand; to serve as the precondition for
|
||
ls-type:: annotation
|
||
hl-page:: 563
|
||
hl-color:: green
|
||
id:: 643b824d-2732-46ae-961d-74a06db18138
|
||
- tad: a small amount; a little bit
|
||
ls-type:: annotation
|
||
hl-page:: 563
|
||
hl-color:: green
|
||
id:: 643b824f-c588-4b82-93e6-393016d3b5b1
|
||
- hideous: dreadful; ugly
|
||
ls-type:: annotation
|
||
hl-page:: 572
|
||
hl-color:: green
|
||
id:: 643b9c40-6329-4720-9d26-75a78701392c
|
||
- ## Log-structured File Systems
|
||
hl-page:: 579
|
||
ls-type:: annotation
|
||
id:: 643b8dad-3813-4048-8d04-5eb93a6bd182
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- **Writing To Disk Sequentially**
|
||
hl-page:: 580
|
||
ls-type:: annotation
|
||
id:: 643bab9f-04f5-4e2f-b232-d6cfead45619
|
||
hl-color:: yellow
|
||
- write all updates (including metadata) to the disk sequentially, e.g. write a new data block, and then write its newly updated inode sequentially after it (rather than seek to the inode region far away)
|
||
- **Write Buffering**
|
||
hl-page:: 581
|
||
ls-type:: annotation
|
||
id:: 643bac5e-ce70-4d9b-92c5-4fb6dda099d6
|
||
hl-color:: yellow
|
||
- Writing sequentially alone doesn't mean good performance. A ==large number of contiguous writes or one large write== is the key to good write performance.
|
||
- Before writing to the disk, LFS ==keeps track of updates in memory==; when it has received a sufficient number of updates (a *segment*), it writes them to disk all at once.
|
||
hl-page:: 581
|
||
ls-type:: annotation
|
||
id:: 643bac81-cd5a-49aa-a81b-aff6c2405a40
|
||
hl-color:: yellow
|
||
- Segment size: similar to evaluation here ((6437feab-eceb-4f11-9ced-ae43e2798c0c)). The larger chunk size, the better performance.
|
||
hl-page:: 582
|
||
ls-type:: annotation
|
||
id:: 643bb0bc-9781-484e-b249-224d89414165
|
||
hl-color:: yellow
|
||
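- The effective write rate $R_{\text{effective}}$ as a function of chunk size $D$, and the $D$ needed to reach a fraction $F$ of the peak rate (i.e., $R_{\text{effective}} = F \times R_{\text{peak}}$):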
|
||
$$R_{\text{effective}} = \frac{D}{T_{\text{write}}} = \frac{D}{T_{\text{position}}+\frac{D}{R_{\text{peak}}} } \\ D = \frac{F}{1-F}\times R_{\text{peak}} \times T_{\text{position}}$$
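- A short numeric check of the two formulas above. The example values (10 ms positioning time, 100 MB/s peak rate, 90% target) are assumptions for illustration; the function names are hypothetical.

```python
def chunk_size(fraction, r_peak, t_position):
    """Chunk size D needed so that R_effective = fraction * r_peak."""
    return fraction / (1 - fraction) * r_peak * t_position

def effective_rate(d, r_peak, t_position):
    """Effective bandwidth when each chunk of size d costs one positioning delay."""
    return d / (t_position + d / r_peak)

# Units: MB for sizes, MB/s for rates, seconds for time.
d = chunk_size(0.9, r_peak=100.0, t_position=0.010)
print(d, effective_rate(d, r_peak=100.0, t_position=0.010))
```

  Pushing the target fraction closer to 1 makes the required chunk size grow without bound, since $F/(1-F)$ diverges as $F \to 1$.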
|
||
- **The Inode Map**, Finding inodes
|
||
hl-page:: 583
|
||
ls-type:: annotation
|
||
id:: 643bb0f6-b84c-469e-8188-0db6e86f36e8
|
||
hl-color:: yellow
|
||
- The i-map is a structure that maps inode-number to the disk address of the most recent version of the inode
|
||
hl-page:: 583
|
||
ls-type:: annotation
|
||
id:: 643bb162-9f99-4740-8fd6-859f236c1855
|
||
hl-color:: yellow
|
||
- LFS places chunks of the ==inode map right next to the other new information==. For example, when appending a data block to a file, LFS actually writes the new data block, its inode, and a piece of the inode map all together.
|
||
- **The Checkpoint Region**
|
||
hl-page:: 585
|
||
ls-type:: annotation
|
||
id:: 643bb250-205a-4ae9-8cff-0d715cfa6b7d
|
||
hl-color:: yellow
|
||
- Contains pointers to the latest pieces of the inode map. Note that the checkpoint region is only updated periodically, so it does not reduce performance too much.
|
||
- The look up process
|
||
- First look up CR for i-map (often cached in memory), then consult i-map for the directory's inode, then get file inode number from directory, finally consult i-map again for file's inode
|
||
- recursive update problem: Whenever an inode is updated, its location on disk changes. This would have also entailed an update to the directory that points to this file (change the pointer field, thus the directory needs to be written to a new location), which then would have mandated a change to the parent of that directory, and so on, all the way up the file system tree.
|
||
hl-page:: 586
|
||
ls-type:: annotation
|
||
id:: 643bb4de-bc1f-4f61-a5dd-036867e85fe7
|
||
hl-color:: yellow
|
||
- This is not a problem for LFS. The imap maps inode numbers to addresses, and directories store inode numbers rather than addresses, so even if an inode moves to a new location there is no need to change the directory.
|
||
- Garbage Collection
|
||
ls-type:: annotation
|
||
hl-page:: 587
|
||
hl-color:: yellow
|
||
id:: 643bb6cb-61ae-4231-aaf4-d78f1b1a7851
|
||
- LFS leaves old versions of file structures scattered throughout the disk, though only the latest version is needed. Therefore, LFS has to periodically ==clean these old versions== of data and metadata.
|
||
- LFS cleaner works on a ==segment-by-segment basis==. Read in a number of old segments, collect live blocks, write them out to a new set of segments and finally free the old segments.
|
||
hl-page:: 588
|
||
ls-type:: annotation
|
||
id:: 643bb91c-62a0-4154-8d12-c9ae356a4fc7
|
||
hl-color:: yellow
|
||
- Determining Block Liveness
|
||
ls-type:: annotation
|
||
hl-page:: 588
|
||
hl-color:: yellow
|
||
id:: 643bb7e4-6d90-4d59-8930-29a243862288
|
||
- segment summary block: inode number and in-file offset of each data block
|
||
hl-page:: 588
|
||
ls-type:: annotation
|
||
id:: 643bba1e-4f7b-4291-bdd8-966dd366748c
|
||
hl-color:: yellow
|
||
- Pseudocode depiction
|
||
- ```python
def is_live(A, segment_summary, imap, read):
    # A -> block address
    # N -> inode number
    # T -> offset in file
    N, T = segment_summary[A]
    # Fetch the latest version of the inode through the inode map.
    inode = read(imap[N])
    # The block is live only if the inode still points at A for offset T.
    return inode[T] == A
```
|
||
- **version number**: in some cases (e.g., file deleted), LFS records file's version number in imap and summary block, and compares them during GC to speed up the check
|
||
hl-page:: 589
|
||
ls-type:: annotation
|
||
id:: 643bbc21-a025-4dd4-bdc2-4a1eb68abf5e
|
||
hl-color:: yellow
|
||
- Crash Recovery
|
||
ls-type:: annotation
|
||
hl-page:: 590
|
||
hl-color:: yellow
|
||
id:: 643bbd43-a342-4445-a808-b9800790a83c
|
||
- General write scheme
|
||
- LFS organizes writes in a log, i.e. the CR points to a head and tail segment, and each segment points to the next segment to write. CR is propagated to disk periodically.
|
||
- To make it clear, there is no separate "log" space on the disk similar to what journaling FSs do. The segments written to the disk are logs by themselves. See [Page 30, Figure 4-1, R92](https://www2.eecs.berkeley.edu/Pubs/TechRpts/1992/CSD-92-696.pdf)
|
||
- Checkpoint Region
|
||
- LFS keeps 2 CRs (one at each end of the disk) and writes to them alternately. When updating a CR, LFS first writes a header (with a timestamp), then the body, and finally a last block (also with a timestamp). A crash during the update can then be detected through inconsistent timestamps, and LFS uses the most recent CR whose timestamps are consistent.
|
||
- Roll Forward
|
||
hl-page:: 590
|
||
ls-type:: annotation
|
||
id:: 643bc080-2693-4a74-b261-56f92e3c75e4
|
||
hl-color:: yellow
|
||
- The basic idea is to start with the last checkpoint region, find the end of the log (included in the CR), and then use that to read through the next segments and see if there are any valid updates.
|
||
hl-page:: 590
|
||
ls-type:: annotation
|
||
id:: 643bc10e-be20-47b3-bab4-713493dd5153
|
||
hl-color:: yellow
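- A minimal sketch of how recovery might choose between the two checkpoint regions described above. The dict keys (`head_ts`, `tail_ts`, `imap_ptrs`) and the sample values are hypothetical, chosen only to illustrate the timestamp check.

```python
def pick_checkpoint(regions):
    """Return the newest checkpoint region whose header and last-block
    timestamps agree, or None if every region is torn."""
    valid = [cr for cr in regions if cr["head_ts"] == cr["tail_ts"]]
    if not valid:
        return None
    return max(valid, key=lambda cr: cr["head_ts"])

cr_a = {"head_ts": 41, "tail_ts": 41, "imap_ptrs": [100, 180]}
cr_b = {"head_ts": 42, "tail_ts": 40, "imap_ptrs": [100, 260]}  # torn by a crash
chosen = pick_checkpoint([cr_a, cr_b])
print(chosen)
```

  Roll forward would then start from the end of the log recorded in the chosen region and scan the segments written after it.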
|
||
- mandate: authority granted (to a government through an election); a term of office; to require or commission
|
||
hl-page:: 586
|
||
ls-type:: annotation
|
||
id:: 643bb439-b7d0-4170-9417-cd900062bfbd
|
||
hl-color:: green
|
||
- entail: to involve; to require; to make necessary
|
||
hl-page:: 586
|
||
ls-type:: annotation
|
||
id:: 643bb533-1fee-4b9a-9965-7a63016d5591
|
||
hl-color:: green
|
||
- ceremonious: formal; attentive to etiquette
|
||
hl-page:: 587
|
||
ls-type:: annotation
|
||
id:: 643bb6e8-6466-4574-aa04-4ea25b3e9034
|
||
hl-color:: green
|
||
- cease: to stop; to end
|
||
ls-type:: annotation
|
||
hl-page:: 595
|
||
hl-color:: green
|
||
id:: 643bc4b4-dc22-471c-9229-558a42904cc8
|
||
- ## Flash-based SSDs
|
||
ls-type:: annotation
|
||
hl-page:: 595
|
||
hl-color:: yellow
|
||
id:: 643ba369-83df-42f9-9ee9-b45d4652e8fb
|
||
collapsed:: true
|
||
- Storing a Single Bit
|
||
ls-type:: annotation
|
||
hl-page:: 595
|
||
hl-color:: yellow
|
||
id:: 643bce2b-3e0d-4860-a7fd-34c0b0565fe2
|
||
- Flash chips are designed to store one or more bits in a single transistor; the level of charge trapped within the transistor is mapped to a binary value. Examples: SLC (0, 1), MLC (00, 01, 10, 11), TLC, and even QLC.
|
||
hl-page:: 595
|
||
ls-type:: annotation
|
||
id:: 643bce49-47a3-43c6-ae82-825fd5224dd4
|
||
hl-color:: yellow
|
||
- From Bits to Banks
|
||
ls-type:: annotation
|
||
hl-page:: 596
|
||
hl-color:: yellow
|
||
id:: 643bcf1a-e41d-4ad2-83a3-ed550f9be123
|
||
- page: a few KB in size
|
||
- block (erase block): hundreds of KB, consists of many pages
|
||
- bank/plane: flash chips are organized into banks/planes, consisting of a large number of cells.
|
||
- Basic Flash Operations
|
||
ls-type:: annotation
|
||
hl-page:: 597
|
||
hl-color:: yellow
|
||
id:: 643bcf8f-05db-476d-8b73-e9a052d91e4d
|
||
- **Read** (a page): ==Any page==; Fast; Access any location ==uniformly quickly==
|
||
- **Erase** (a ==block==): Before writing to a page, the page's enclosing block must be *erased* (all bits set to 1). ==Expensive==. Flash blocks ==wear out== as they are repeatedly erased and programmed.
|
||
- **Program** (a page): Once a block has been erased, it can be *programmed* by page, changing some of the 1s to 0s in order to write the desired content. Slower than *read*, but faster than *erase*.
|
||
- One way to think about flash chips is that each page has a state associated with it, namely INVALID, VALID and ERASED.
|
||
hl-page:: 597
|
||
ls-type:: annotation
|
||
id:: 643bd219-634d-4c9a-abf7-e266b5b3c2d7
|
||
hl-color:: yellow
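- The page states and the three operations above can be modeled as a tiny state machine. This is a toy sketch: the class `FlashBlock` and its fields are invented here, and real chips expose nothing like this interface.

```python
class FlashBlock:
    """Toy erase block; each page is INVALID, ERASED, or VALID."""

    def __init__(self, pages=4):
        self.state = ["INVALID"] * pages
        self.data = [None] * pages
        self.erase_count = 0  # every erase brings the block closer to wear-out

    def erase(self):
        # Erase works on the whole block and destroys every page in it.
        self.state = ["ERASED"] * len(self.state)
        self.data = [None] * len(self.data)
        self.erase_count += 1

    def program(self, page, value):
        # A page can be programmed only once after an erase.
        if self.state[page] != "ERASED":
            raise ValueError("page must be erased before it is programmed")
        self.state[page], self.data[page] = "VALID", value

    def read(self, page):
        if self.state[page] != "VALID":
            raise ValueError("page holds no valid data")
        return self.data[page]
```

  Overwriting a single page in place is impossible in this model: the whole block has to be erased first, which is what makes a direct-mapped design expensive.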
|
||
- Reliability Problem
|
||
- Wear out
|
||
- when a flash block is erased and programmed, it slowly accrues a little bit of extra charge. Over time, as that extra charge builds up, it becomes increasingly difficult to differentiate between a 0 and a 1
|
||
ls-type:: annotation
|
||
hl-page:: 599
|
||
hl-color:: yellow
|
||
id:: 643bd3c4-c868-44fb-bd96-1ac7f3fe14c0
|
||
- Disturbance
|
||
- When accessing a particular page within a flash, it is possible that some bits get flipped in neighboring pages
|
||
ls-type:: annotation
|
||
hl-page:: 599
|
||
hl-color:: yellow
|
||
id:: 643bd3e8-304f-4e43-93a6-a8630df283b0
|
||
- Most SSDs will write pages in order (i.e., low to high), reducing reliability problems related to program disturbance.
|
||
ls-type:: annotation
|
||
hl-page:: 603
|
||
hl-color:: yellow
|
||
id:: 643bd8fd-99ed-4d7b-adaa-50be9ee619dc
|
||
- Flash Translation Layer (FTL)
|
||
hl-page:: 600
|
||
ls-type:: annotation
|
||
id:: 643bd544-e923-48b0-a513-2e8d3753e0c2
|
||
hl-color:: yellow
|
||
- FTL turns client reads and writes into internal flash operations, i.e., it accepts requests on logical blocks and issues low-level commands on the underlying physical blocks and pages.
|
||
- **write amplification**: The total traffic issued to the flash chips by FTL $\div$ the total traffic issued by the client.
|
||
hl-page:: 600
|
||
ls-type:: annotation
|
||
id:: 643bd5c6-9fbd-4bac-a71a-0e86a73b7ce2
|
||
hl-color:: yellow
|
||
- Goal: More parallelism, Less write amplification, Reduce wear out, Minimize program disturbance
|
||
- Direct mapped FTL
|
||
hl-page:: 601
|
||
ls-type:: annotation
|
||
id:: 643bd69a-5cc5-4ce1-97c9-805f422a0562
|
||
hl-color:: yellow
|
||
- A logical page is mapped directly to a physical page.
|
||
- Bad idea. Write is slow and leads to severe amplification, because it needs to read, erase and program the whole block for a single page.
|
||
- Log-Structured FTL
|
||
ls-type:: annotation
|
||
hl-page:: 602
|
||
hl-color:: yellow
|
||
id:: 643bd777-947b-4627-844b-b84fd5573657
|
||
- Upon a write to logical block N , the device appends the write to the next free spot in the currently-being-written-to block.
|
||
hl-page:: 602
|
||
ls-type:: annotation
|
||
id:: 643bd89f-4561-4b1e-94fa-9bd46914d870
|
||
hl-color:: yellow
|
||
- To allow for subsequent reads of block N , the device keeps a mapping table which stores the physical address of each logical block in the system.
|
||
ls-type:: annotation
|
||
hl-page:: 602
|
||
hl-color:: yellow
|
||
id:: 643bd8de-7dda-4882-ab7c-bbbe75f2a925
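- A toy page-mapped, log-structured FTL illustrating the two points above (append on write, mapping table on read). The class `LogFTL`, its fields, and the sample logical addresses are assumptions for illustration; garbage collection is left out.

```python
class LogFTL:
    """Toy log-structured FTL with a page-level mapping table."""

    def __init__(self, num_pages=16):
        self.flash = [None] * num_pages  # physical pages
        self.mapping = {}                # logical block -> physical page
        self.next_free = 0               # head of the log
        self.dead = set()                # physical pages holding stale versions

    def write(self, logical, value):
        if self.next_free >= len(self.flash):
            raise RuntimeError("log full; a real FTL would garbage-collect here")
        if logical in self.mapping:
            self.dead.add(self.mapping[logical])  # old version becomes garbage
        self.flash[self.next_free] = value
        self.mapping[logical] = self.next_free
        self.next_free += 1

    def read(self, logical):
        return self.flash[self.mapping[logical]]

ftl = LogFTL()
for lba, val in [(100, "a1"), (101, "a2"), (2000, "b1"), (2001, "b2"), (100, "c1")]:
    ftl.write(lba, val)
print(ftl.mapping, ftl.dead)
```

  The overwrite of logical block 100 lands in a fresh page and leaves the old page as garbage, which is why a log-structured FTL eventually needs garbage collection.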
|
||
- Garbage Collection
|
||
ls-type:: annotation
|
||
hl-page:: 604
|
||
hl-color:: yellow
|
||
id:: 643bdcba-9e46-4dfc-8366-6472c734abdb
|
||
- Find a block that contains dead pages, read its live pages, write those live pages to the log, and reclaim the entire block.
|
||
id:: 643bdcd2-053a-4c18-bf9a-393fd367ebef
|
||
- GC can be ==expensive==, requiring reading and rewriting of live data. The ideal candidate for reclamation is a ==block that consists of only dead pages==.
|
||
- overprovision: adding extra flash capacity, cleaning can be delayed and pushed to the background
|
||
hl-page:: 606
|
||
ls-type:: annotation
|
||
id:: 643bdd25-8551-40be-9072-2cc3342f6c42
|
||
hl-color:: yellow
|
||
- **trim** operation: informs the FTL that a logical block has been deleted, so the device no longer needs to track it.
|
||
hl-page:: 606
|
||
ls-type:: annotation
|
||
id:: 643bde06-ad8f-42a7-a322-85d8b511d56e
|
||
hl-color:: yellow
|
||
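- A rough sketch of victim selection for garbage collection. The greedy "fewest live pages" policy, the page-state enum, and the function names are assumptions for illustration; the notes only say that a block made entirely of dead pages is the ideal candidate.
- ```C
#include <assert.h>
#include <stddef.h>

#define PAGES_PER_BLOCK 4

enum page_state { PAGE_FREE, PAGE_LIVE, PAGE_DEAD };

struct block {
    enum page_state pages[PAGES_PER_BLOCK];
};

static int count_state(const struct block *b, enum page_state s)
{
    int n = 0;
    for (int i = 0; i < PAGES_PER_BLOCK; i++)
        if (b->pages[i] == s)
            n++;
    return n;
}

/* Pick the block that has at least one dead page and the fewest live
   pages (the cheapest to clean). Returns -1 if no block has dead pages. */
int gc_pick_victim(const struct block *blocks, int nblocks)
{
    assert(blocks != NULL && nblocks >= 0);
    int victim = -1;
    int best_live = PAGES_PER_BLOCK + 1;
    for (int b = 0; b < nblocks; b++) {
        if (count_state(&blocks[b], PAGE_DEAD) == 0)
            continue;
        int live = count_state(&blocks[b], PAGE_LIVE);
        if (live < best_live) {
            best_live = live;
            victim = b;
        }
    }
    return victim;
}

/* Reclaim a block. Returns how many live pages had to be copied,
   which is the cost of cleaning this block. */
int gc_reclaim(struct block *b)
{
    assert(b != NULL);
    int copied = count_state(b, PAGE_LIVE);
    /* A real FTL would read the live pages and append them to the log here. */
    for (int i = 0; i < PAGES_PER_BLOCK; i++)
        b->pages[i] = PAGE_FREE;   /* erase the whole block */
    return copied;
}
```
- A block of only dead pages yields a copy cost of 0, matching the "ideal candidate" above.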
- Mapping Table Size
|
||
ls-type:: annotation
|
||
hl-page:: 606
|
||
hl-color:: yellow
|
||
id:: 643bdf64-fb3a-4417-8dc8-3cc736841285
|
||
- Page-level mapping takes up too much space
|
||
- Block-Based Mapping
|
||
ls-type:: annotation
|
||
hl-page:: 606
|
||
hl-color:: yellow
|
||
id:: 643bdf97-2eae-45e1-b6be-c93c7c47112b
|
||
- Block-level mapping is akin to using a larger page size in VM: the basic mapping unit grows from a page to a block.
|
||
- Terrible performance under the log-structured scheme. Even if a write is small (a single page), the FTL has to read the live pages from the old block and write the whole updated block to the log. This leads to severe write amplification.
|
||
- Hybrid Mapping
|
||
ls-type:: annotation
|
||
hl-page:: 608
|
||
hl-color:: yellow
|
||
id:: 643be2e5-4f61-4aa5-9885-e0fc862c3df6
|
||
- **log table**: FTL keeps a few blocks erased and directs all writes to them, and keeps per-page mappings for these *log blocks*.
|
||
- **data table**: per-block mappings
|
||
- When looking for a logical address, FTL first consults the *log table*, and consults the *data table* if not found.
|
||
- To keep the log table small, the FTL periodically examines the *log blocks* and switches them into *data blocks* (which can be pointed to by a single block-level mapping). There are three cases, depending on the contents of the log block; see the example in the book for details.
|
||
- switch merge: the log block holds exactly the pages of one logical block, in order, so it can become a data block by switching to a single block-level pointer; no data is copied
|
||
hl-page:: 609
|
||
ls-type:: annotation
|
||
id:: 643be6ec-00bb-413f-96a1-7268f5b01709
|
||
hl-color:: yellow
|
||
- partial merge: only some pages of a logical block are in the log block, so the FTL must copy the remaining pages of that logical block into it before it can become a data block
|
||
hl-page:: 610
|
||
ls-type:: annotation
|
||
id:: 643be6f3-d351-4b04-a201-03dda410950d
|
||
hl-color:: yellow
|
||
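- full merge: the pages in the log block belong to different logical blocks, so the FTL must gather pages from many other blocks to build each data block. This is expensive and should be avoided where possible.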
|
||
hl-page:: 610
|
||
ls-type:: annotation
|
||
id:: 643be6f7-4656-4f93-a208-88a6fa9be6e0
|
||
hl-color:: yellow
|
||
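- The two-level lookup in hybrid mapping can be sketched as below. This is only a sketch under assumed sizes: `log_table`, `data_table`, `hybrid_lookup`, and the `NOT_FOUND` sentinel are hypothetical names, and a linear scan stands in for whatever structure a real FTL uses.
- ```C
#include <assert.h>

#define PAGES_PER_BLOCK 4
#define LOG_ENTRIES     4
#define NUM_LBLOCKS     4
#define NOT_FOUND       (-1)

/* Per-page mapping for pages living in log blocks.
   lpn == NOT_FOUND marks an empty slot. */
struct log_entry {
    int lpn;   /* logical page number */
    int ppn;   /* physical page number */
};

struct log_entry log_table[LOG_ENTRIES];
int data_table[NUM_LBLOCKS];   /* logical block -> physical block */

void hybrid_init(void)
{
    for (int i = 0; i < LOG_ENTRIES; i++) {
        log_table[i].lpn = NOT_FOUND;
        log_table[i].ppn = NOT_FOUND;
    }
    for (int i = 0; i < NUM_LBLOCKS; i++)
        data_table[i] = NOT_FOUND;
}

/* Translate a logical page to a physical page:
   consult the log table first, then fall back to the data table. */
int hybrid_lookup(int lpn)
{
    assert(lpn >= 0 && lpn < NUM_LBLOCKS * PAGES_PER_BLOCK);
    for (int i = 0; i < LOG_ENTRIES; i++)
        if (log_table[i].lpn == lpn)
            return log_table[i].ppn;

    int pblock = data_table[lpn / PAGES_PER_BLOCK];
    if (pblock == NOT_FOUND)
        return NOT_FOUND;
    /* Within a data block, the page offset is fixed by the logical address. */
    return pblock * PAGES_PER_BLOCK + lpn % PAGES_PER_BLOCK;
}
```
- The fixed offset inside a data block is what makes a switch merge cheap: if the log block already holds a logical block's pages in order, one block pointer replaces several per-page entries.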
- Page Mapping Plus Caching
|
||
hl-page:: 610
|
||
ls-type:: annotation
|
||
id:: 643be86a-9bd8-4962-92ac-76832cc93a6c
|
||
hl-color:: yellow
|
||
collapsed:: true
|
||
- Akin to paging in VM: keep only a small active set of the page-level mappings in memory.
|
||
- If the working set of mappings fits in memory, this approach works well. Otherwise, frequent evictions will hurt performance.
|
||
- Wear Leveling
|
||
ls-type:: annotation
|
||
hl-page:: 611
|
||
hl-color:: yellow
|
||
id:: 643be88d-4648-4dc3-8f5a-fc7c45fa144a
|
||
collapsed:: true
|
||
- Spread erase/program cycles evenly across the blocks of the device.
|
||
- The log-structured approach does most of the work for this goal, but one problem remains. Blocks filled with long-lived data rarely get overwritten and thus do not receive their fair share of the write load.
|
||
- One simple solution is to periodically move such data elsewhere so the block can be rewritten, but this increases write amplification.
|
||
- SSD Performance
|
||
ls-type:: annotation
|
||
hl-page:: 611
|
||
hl-color:: yellow
|
||
id:: 643bdf55-53c3-407f-b87f-86b3d8f1141b
|
||
- SSDs outperform HDDs dramatically in random IO, while the difference is much smaller in sequential IO.
|
||
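- Random read performance is not as good as random write performance on SSDs: the log-structured design turns random writes into sequential ones, which makes random writes unexpectedly fast.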
|
||
- accrue: to increase gradually; to accumulate
|
||
hl-page:: 599
|
||
ls-type:: annotation
|
||
id:: 643bd3a4-af24-4e7f-905b-f3c3a8739831
|
||
hl-color:: green
|
||
- rigid: inflexible; stiff
|
||
hl-page:: 600
|
||
ls-type:: annotation
|
||
id:: 643bd351-d4f4-406a-9910-f44ab31bc83f
|
||
hl-color:: green
|
||
- ## Data Integrity and Protection
|
||
ls-type:: annotation
|
||
hl-page:: 619
|
||
hl-color:: yellow
|
||
id:: 643ba392-acd9-4255-930e-a97f94fb28ef
|
||
collapsed:: true
|
||
- Disk Failure Modes
|
||
ls-type:: annotation
|
||
hl-page:: 619
|
||
hl-color:: yellow
|
||
id:: 643bec95-40fd-4df9-9981-1f6d641ec520
|
||
- Latent-sector errors
|
||
- LSEs arise when a disk sector (or group of sectors) has been damaged in some way.
|
||
ls-type:: annotation
|
||
hl-page:: 620
|
||
hl-color:: yellow
|
||
id:: 643beca7-e6d1-4a17-93ec-d7445eee92c1
|
||
- Causes: a head crash (the disk head somehow touches the surface and damages it) or cosmic rays!
|
||
- Can be detected or even corrected by in-disk ECC (error correcting code).
|
||
- Block Corruption
|
||
- Not detectable by the disk itself. Silent faults
|
||
- Buggy firmware, faulty bus
|
||
- Handling Latent Sector Errors
|
||
ls-type:: annotation
|
||
hl-page:: 621
|
||
hl-color:: yellow
|
||
id:: 643bed56-bc80-4332-b799-933755811759
|
||
- Since LSEs can be ==easily detected==, the storage system simply uses whatever ==redundancy mechanism it has to recover== the data.
|
||
- Detecting Corruption: The Checksum
|
||
ls-type:: annotation
|
||
hl-page:: 622
|
||
hl-color:: yellow
|
||
id:: 643beee5-af3c-44c2-bf55-716c0a4ce0c4
|
||
- A checksum function takes a chunk of data as input and produces ==a small summary of the data==, which is the checksum. The system detects data corruption by ==re-computing the checksum and matching== it against the stored one.
|
||
- Common Checksum Functions
|
||
ls-type:: annotation
|
||
hl-page:: 623
|
||
hl-color:: yellow
|
||
id:: 643befcd-f69a-4c19-bb76-21d8945d4cc8
|
||
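- XOR: simple, but it only detects an odd number of flips per bit position; if two bits in the same position of different chunks flip, the checksum does not change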
|
||
- 2's complement addition (ignoring overflow): fast, but weak against some changes, e.g. shifted data
|
||
- Fletcher checksum: almost as strong as the CRC, detecting all single-bit, double-bit errors, and many burst errors
|
||
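- ```C
#include <stdint.h>

/* Fletcher-16: sum1 is a running sum of the bytes and sum2 a running sum
   of sum1, both modulo 255, so the result depends on byte order as well
   as byte values. */
uint16_t Fletcher16( const uint8_t *data, int count )
{
    uint16_t sum1 = 0;
    uint16_t sum2 = 0;
    int index;

    for ( index = 0; index < count; ++index ) {
        sum1 = (sum1 + data[index]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    /* High byte holds sum2, low byte holds sum1. */
    return (sum2 << 8) | sum1;
}
```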
|
||
- CRC: Treat the data block `D` as a large binary number and divide it by an agreed value `k`. The remainder is the CRC value.
|
||
- No checksum is perfect: the summary is much smaller than the data, so collisions (non-identical data that generate identical checksums) always exist. A good function minimizes their chance.
|
||
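- The weaknesses of the two simplest functions can be seen in a small sketch. Byte-wide checksums and the names `xor_checksum` / `add_checksum` are assumptions chosen to keep the example short; real systems checksum larger chunks.
- ```C
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR every byte together. Flipping the same bit position in two
   different bytes cancels out, so such a change goes undetected. */
uint8_t xor_checksum(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum ^= data[i];
    return sum;
}

/* 2's complement addition over every byte, ignoring overflow.
   Addition is commutative, so reordered bytes collide. */
uint8_t add_checksum(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + data[i]);
    return sum;
}
```
- For example, changing `{0x12, 0x34}` to `{0x13, 0x35}` flips bit 0 in both bytes and leaves the XOR checksum unchanged, while swapping the two bytes leaves the additive checksum unchanged. Fletcher avoids the second case because its second sum depends on position.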
- Checksum Layout
|
||
ls-type:: annotation
|
||
hl-page:: 624
|
||
hl-color:: yellow
|
||
id:: 643bf039-a6cb-475c-b990-df21d8f3919f
|
||
- If supported by the drive manufacturer, one solution is to format the drive with 520-byte sectors: 512 bytes of data plus an 8-byte checksum per sector.
|
||
- Another solution: the FS packs several checksums together into one 512-byte sector, followed by the data blocks they cover. Overwriting a data block then also requires updating its checksum sector.
|
||
- Using Checksums: compare *stored checksum* and *computed checksum*
|
||
hl-page:: 625
|
||
ls-type:: annotation
|
||
id:: 643bf2c6-9c4f-44a8-bcc5-0af3570b64be
|
||
hl-color:: yellow
|
||
- Misdirected Writes
|
||
ls-type:: annotation
|
||
hl-page:: 626
|
||
hl-color:: yellow
|
||
id:: 643bf2f7-fc6a-4289-a50b-784e6a765eb9
|
||
- Disk/RAID controllers write the data to disk correctly but ==in the wrong location==. Checksum itself won't help in this situation.
|
||
hl-page:: 626
|
||
ls-type:: annotation
|
||
id:: 643bf30b-6fcf-4d5f-9c13-dd29d4284f63
|
||
hl-color:: yellow
|
||
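- Add a *physical ID* (disk number and sector offset) to each checksum. On a read, the stored ID is compared with the location actually read; a mismatch reveals a misdirected write even though the data and its checksum agree.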
|
||
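- A sketch of the read-side check that combines the checksum with a physical ID. The `struct sector` layout, the field names, and the use of Fletcher-16 as the checksum are assumptions for illustration, not a real on-disk format.
- ```C
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_DATA 512

struct sector {
    uint8_t  data[SECTOR_DATA];
    uint16_t checksum;   /* stored checksum of data */
    uint32_t disk;       /* physical ID: disk number */
    uint32_t offset;     /* physical ID: sector offset */
};

enum verify_result { VERIFY_OK, VERIFY_CORRUPT, VERIFY_MISDIRECTED };

static uint16_t fletcher16(const uint8_t *data, size_t count)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < count; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Write path: store the data with its checksum and intended location. */
void sector_fill(struct sector *s, const uint8_t *data,
                 uint32_t disk, uint32_t offset)
{
    assert(s != NULL && data != NULL);
    memcpy(s->data, data, SECTOR_DATA);
    s->checksum = fletcher16(s->data, SECTOR_DATA);
    s->disk = disk;
    s->offset = offset;
}

/* Read path: compare stored vs. computed checksum, then check that the
   sector really belongs at the location it was read from. */
enum verify_result sector_verify(const struct sector *s,
                                 uint32_t disk, uint32_t offset)
{
    assert(s != NULL);
    if (fletcher16(s->data, SECTOR_DATA) != s->checksum)
        return VERIFY_CORRUPT;
    if (s->disk != disk || s->offset != offset)
        return VERIFY_MISDIRECTED;
    return VERIFY_OK;
}
```
- Neither check catches a lost write: the stale sector still carries a matching checksum and the correct physical ID.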
- Lost Writes
|
||
ls-type:: annotation
|
||
hl-page:: 627
|
||
hl-color:: yellow
|
||
id:: 643bf3ff-eeec-4c89-b385-6a104d0596bd
|
||
- The device informs the upper layer that a write is ==completed but in fact not persisted==. Checksums and physical IDs won't help: the new data and new checksum never reach the disk, so the old block still matches its old checksum.
|
||
hl-page:: 627
|
||
ls-type:: annotation
|
||
id:: 643bf40f-573a-4004-9b3f-443502a7a198
|
||
hl-color:: yellow
|
||
- Solutions: perform a write verify (read-after-write), though this is slow, or add a checksum elsewhere in the system to detect lost writes.
|
||
- Disk Scrubbing
|
||
hl-page:: 628
|
||
ls-type:: annotation
|
||
id:: 643bf592-4dec-43c4-b8ff-996d765e071b
|
||
hl-color:: yellow
|
||
- Most data is rarely accessed, and thus would stay unchecked, which affects the reliability.
|
||
- Many systems utilize disk scrubbing (i.e., periodically reading through every block and verifying its checksum)
|
||
- Overheads Of Checksumming
|
||
hl-page:: 628
|
||
ls-type:: annotation
|
||
id:: 643bf4f5-4ec4-4b42-962d-8c3a7729b64e
|
||
hl-color:: yellow
|
||
- Space: disk (take up user data space) and memory (mostly short-lived, not a problem)
|
||
- Time: CPU (has to compute through the data) and IO (checksum stored elsewhere, or scrubbing)
|
||
- CPU overheads can be reduced by combining data copying and checking, since copy is needed anyhow
|
||
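- As a rough worked example of the disk space overhead, assuming an 8-byte checksum for every 4 KB data block (an illustrative ratio, not a property of any particular system):
- $$\frac{8\ \text{bytes}}{4096\ \text{bytes}} = 0.001953\ldots \approx 0.2\%$$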
- beverage: a drink (other than water)
|
||
hl-page:: 623
|
||
ls-type:: annotation
|
||
id:: 643befc3-80a8-40de-a3b9-c994a90c0f0a
|
||
hl-color:: green
|
||
- scrub: to clean by rubbing hard; (noun) brushwood, low bushes
|
||
hl-page:: 627
|
||
ls-type:: annotation
|
||
id:: 643bf4d3-df61-4530-928f-ed524699c44f
|
||
hl-color:: green
|
||
- spouse: a husband or wife
|
||
ls-type:: annotation
|
||
hl-page:: 633
|
||
hl-color:: green
|
||
id:: 643ba3b2-5a2a-4589-a871-62ad213de195
|
||
- levity: frivolous behavior; flippancy
|
||
hl-page:: 633
|
||
ls-type:: annotation
|
||
id:: 643bfdfa-6681-4fcc-b7c1-b84887afeecd
|
||
hl-color:: green
|
||
- sarcastic: mocking; ironic
|
||
hl-page:: 634
|
||
ls-type:: annotation
|
||
id:: 643bfe9d-913f-4ab7-aba3-a3fac83d1dfb
|
||
hl-color:: green
|
||
- scribble: to jot down hastily; to doodle; (noun) hasty, careless writing
|
||
hl-page:: 634
|
||
ls-type:: annotation
|
||
id:: 643bfeb8-34d1-428d-82d5-0bfefb871d4e
|
||
hl-color:: green