OSTEP Chapter 43

This commit is contained in:
ridethepig 2023-04-16 17:50:10 +08:00
parent 1b2c362cdd
commit 539ca5c3d1
9 changed files with 1385 additions and 7 deletions


@@ -343,6 +343,7 @@
;; ;use Percent-encoding for other invalid characters
:file/name-format :triple-lowbar
:ui/show-brackets? true
:feature/enable-timetracking? false
;; specify the format of the filename for journal files
;; :journal/file-name-format "yyyy_MM_dd"


@@ -0,0 +1,7 @@
:root {
--ls-font-family: Noto Sans CJK SC, Helvetica Neue, sans-serif;
}
* {
font-variant-ligatures: none !important;
}


@@ -908,7 +908,6 @@ file-path:: ../assets/ostep_1681115599584_0.pdf
id:: 6437a50f-5c6c-47ff-9179-ac48118342d7
hl-color:: yellow
- **IO time**
collapsed:: true
- **Rotational Delay**: wait for the desired sector to rotate under the disk head
hl-page:: 466
ls-type:: annotation
@@ -1049,6 +1048,7 @@ file-path:: ../assets/ostep_1681115599584_0.pdf
ls-type:: annotation
id:: 6437f261-2d97-4f0c-85aa-06dd6d230ce0
hl-color:: yellow
collapsed:: true
- spread the blocks of the array across the disks in a round-robin fashion
ls-type:: annotation
hl-page:: 483
@@ -1129,6 +1129,7 @@ file-path:: ../assets/ostep_1681115599584_0.pdf
ls-type:: annotation
id:: 6438241b-f487-4cf3-b717-60811340a5bd
hl-color:: yellow
collapsed:: true
- An improved version of *RAID4*: RAID5 rotates the parity block across the drives.
- ((64382b1e-7c59-4729-a70f-68005b0640b4))
- **Performance**
@@ -1504,7 +1505,7 @@ file-path:: ../assets/ostep_1681115599584_0.pdf
ls-type:: annotation
id:: 643ac9e7-1479-4be9-81fe-acb750f363b4
hl-color:: yellow
- Potential Performance: large sequential read from a large file. However, with selected chunk size (threshold of going to another group), ==cost of seek between groups can be amortized==. The larger size of a chunk, the higher average bandwidth you will reach.
- Potential Performance Problem: large sequential read from a large file. However, with selected chunk size (threshold of going to another group), ==cost of seek between groups can be amortized==. The larger size of a chunk, the higher average bandwidth you will reach.
- Measuring File Locality
hl-page:: 550
ls-type:: annotation
@@ -1549,3 +1550,291 @@ file-path:: ../assets/ostep_1681115599584_0.pdf
ls-type:: annotation
id:: 643acfc2-eef7-4c7f-a8a5-1740c8788159
hl-color:: yellow
collapsed:: true
- Crash Scenarios
ls-type:: annotation
hl-page:: 560
hl-color:: yellow
id:: 643b7d72-4da5-42cf-abd5-dbbc3d9332d7
collapsed:: true
- Consider a ==write operation with new data block allocation== in the `vsfs` introduced above, which involves 3 independent writes to the disk
- Only one write reaches the disk
- data block: not a problem for the FS, as if the write never happened, though the user data is lost
- inode: FS inconsistency, the data bitmap says the block is unallocated while the inode says it is allocated; reads return garbage from the block
- data bitmap: FS inconsistency, a space leak; the block will never be used again
- Two writes reach the disk
- inode and bitmap: reads return garbage, though the FS is consistent
- data block and inode (or data block and bitmap): inconsistent, since inode and bitmap disagree
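- The enumeration above can be sketched as a toy classifier (all names below are illustrative, not part of vsfs):
```python
# Toy sketch: classify the crash outcome by which of the three vsfs
# writes (data block, inode, data bitmap) reached the disk first.
def classify(survived):
    d = "data" in survived
    i = "inode" in survived
    b = "bitmap" in survived
    if i != b:
        return "inconsistent"              # inode and bitmap disagree
    if i and b and not d:
        return "consistent, garbage data"  # inode points at junk
    return "consistent"                    # looks fine to the FS
```
For example, `classify({"data"})` reports the benign case (as if the write never happened), while `classify({"inode"})` reports an inconsistency.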
- The File System Checker
ls-type:: annotation
hl-page:: 562
hl-color:: yellow
id:: 643b8161-a7b8-496d-9121-4ce20ee8deb6
collapsed:: true
- Let inconsistencies happen and then fix them later, at reboot. This approach cannot solve all problems (such as data loss); the only goal is to make the FS metadata internally consistent. `fsck` runs before the FS is mounted
hl-page:: 562
ls-type:: annotation
id:: 643b81ae-edfd-4a9e-9df2-fdba2175dde2
hl-color:: yellow
- Basic summary of what `fsck` does
- Superblock: if corrupt, use an ==alternative copy==
- Free blocks: scan the inodes and the (double/triple/...) indirect blocks to collect ==information about allocated blocks==, and use this information to ==correct the bitmap==
- Inode links: traverse the whole directory tree and compute a ==reference count for each inode==, then verify it against the count stored in each inode. If an inode is allocated but no directory refers to it, move it to `lost+found`
- Duplicates: multiple inode pointers point to the same block; copy the block or clear one of the inodes
- Bad blocks, inode state, directory checks, etc.
- Problem: too slow
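- A minimal sketch of the inode-link pass, with dicts standing in for the directory tree and the per-inode link counts (every name here is hypothetical, not real `fsck` code):
```python
# Hypothetical sketch of fsck's link-count pass: walk the directory
# tree, count references to each inode, and trust the computed counts.
def check_links(tree, inode_counts, lost_found):
    refs = {}
    def walk(d):
        for entry in d.values():
            if isinstance(entry, dict):          # a subdirectory
                walk(entry)
            else:                                # entry is an inode number
                refs[entry] = refs.get(entry, 0) + 1
    walk(tree)
    fixed = {}
    for ino in inode_counts:
        actual = refs.get(ino, 0)
        if actual == 0:
            lost_found.append(ino)               # allocated but unreferenced
        fixed[ino] = actual                      # computed count wins
    return fixed
```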
- Journaling (or Write-Ahead Logging)
ls-type:: annotation
hl-page:: 564
hl-color:: yellow
id:: 643b86d9-afcc-4494-87d1-275502df79a7
- Basic Idea: Before writing the structures in place, first write a log elsewhere on the disk. If a crash takes place during the actual update, the FS can repair the inconsistency from the log.
- **Data Journaling**
ls-type:: annotation
hl-page:: 565
hl-color:: yellow
id:: 643b8dc6-e380-49e3-ab08-ce27aa8767e2
- **physical logging**: put the exact physical contents of the update in the journal
hl-page:: 565
ls-type:: annotation
id:: 643b8ebd-b694-4862-90f0-9fb0f1a847f5
hl-color:: yellow
- **checkpointing**: overwrite the old structures in the FS
hl-page:: 565
ls-type:: annotation
id:: 643b8eed-3b37-4ac0-b924-9a2d035f2517
hl-color:: yellow
- **transaction identifier**: a transaction-begin block containing information about the pending update, plus a transaction-end marker
hl-page:: 565
ls-type:: annotation
id:: 643b8f7d-76a2-477e-89ea-1321475b3dbe
hl-color:: yellow
- Journal write: Write the transaction (*Tx Begin* mark, data to update, *Tx End* mark) to log
- To make things faster, instead of issuing serial write requests, we may ==merge these requests.==
id:: 643b9070-8e4c-422a-8173-388fb801930d
- To avoid possible data loss within a single issued request (due to internal disk scheduling), the *Tx End* mark must be written as ==a separate request==, while the other parts of the log can be issued as one batch.
- Alternatively, adding a checksum also works. With a checksum, all of these blocks can be written in a single request; if the disk fails to propagate all of the bits, the mismatch will be noticed during the reboot scan and the log entry will be skipped.
hl-page:: 567
ls-type:: annotation
id:: 643b9816-e478-4f45-a5bd-fbe168fdc406
hl-color:: yellow
- Thus, this step splits into 2 stages: ==Journal Write and Journal Commit==, which write the Tx Begin mark plus the pending update, and the Tx End mark, respectively.
- To re-use the log region, add a journal superblock on the disk that records which transactions have been checkpointed (so they can be freed); the log is typically treated as circular.
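- The checksum variant can be sketched with a toy in-memory log, `zlib.crc32` standing in for whatever checksum the FS actually uses:
```python
import zlib

# Toy sketch: store a checksum of the transaction body in both the
# begin and end blocks, so the whole journal write can go out as one
# request and a torn write is detected at the reboot scan.
def write_tx(log, blocks):
    csum = zlib.crc32(b"".join(blocks))
    log.append(("TxBegin", csum))
    log.extend(blocks)
    log.append(("TxEnd", csum))

def tx_valid(log):
    (_, c1), (_, c2) = log[0], log[-1]
    return c1 == c2 == zlib.crc32(b"".join(log[1:-1]))
```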
- Protocol
- hl-page:: 570
ls-type:: annotation
id:: 643b9e00-4597-4a1a-890a-be95041f6b3b
hl-color:: yellow
1. **Journal write**: Write the contents of the transaction (Tx Begin, contents of the update) to the log; wait for these writes to complete.
2. **Journal commit**: Write the transaction commit block (Tx End) to the log; wait for the write to complete; the transaction is now committed.
3. **Checkpoint**: Write the contents of the update to their final locations within the file system.
4. **Free**: Some time later, mark the transaction free in the journal by updating the journal superblock.
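- The four steps map onto a toy in-memory version roughly like this (a list as the journal, a dict as the main FS area; all names illustrative):
```python
# Illustrative sketch of the data-journaling protocol. On a real disk,
# each step would wait for the previous writes to complete.
def journaled_update(journal, disk, updates):
    journal.append("TxBegin")            # 1. journal write: begin mark...
    journal.extend(updates.items())      #    ...plus the pending contents
    journal.append("TxEnd")              # 2. journal commit (separate write)
    for addr, data in updates.items():   # 3. checkpoint to final locations
        disk[addr] = data
    journal.clear()                      # 4. free the transaction
```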
- Recovery
ls-type:: annotation
hl-page:: 568
hl-color:: yellow
id:: 643b9301-4a07-459a-a413-5c2738560e10
- Crash before transaction commit, skip.
- Crash after transaction commit (but before checkpointing complete), replay.
- Redo Logging: On reboot, scan the log for committed transactions and try to write them again.
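- Redo logging can be sketched over a toy in-memory log (illustrative names): replay only transactions that have both marks, skip an uncommitted tail.
```python
# Illustrative recovery scan: committed transactions (TxBegin..TxEnd)
# are replayed; a transaction missing its end mark is skipped.
def recover(journal, disk):
    i = 0
    while i < len(journal):
        if journal[i] != "TxBegin":
            i += 1
            continue
        try:
            end = journal.index("TxEnd", i)
        except ValueError:
            break                        # crash before commit: skip
        for addr, data in journal[i + 1:end]:
            disk[addr] = data            # redo the checkpoint writes
        i = end + 1
```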
- **Metadata Journaling**
ls-type:: annotation
hl-page:: 570
hl-color:: yellow
id:: 643b96ec-18c9-4053-be7e-3b7d3b7dbbbd
- Data journaling doubles the write traffic to the disk, and seeking between the log area and the main data area is costly.
- Metadata journaling writes only metadata to the log, no data blocks. Each data block is written directly to the main data area before its metadata is logged.
- Protocol
- hl-page:: 571
ls-type:: annotation
id:: 643b9b7c-a43e-4c50-9dec-1d8f30bae712
hl-color:: yellow
1. **Data write**: Write data to final location; wait for completion (optional).
2. **Journal metadata write**: Write the begin block and metadata to log; wait for writes to complete.
3. **Journal commit**: Write the transaction commit block (Tx End) to log; wait for the write to complete; the transaction (including data) is now committed.
4. **Checkpoint metadata**: Write the contents of the metadata update to their final locations in FS.
5. **Free**: Later, mark the transaction free in journal superblock.
- Actually, Steps 1 and 2 can be issued concurrently, but Step 3 must wait for both Steps 1 and 2 to complete.
- Tricky Case: Block Reuse
ls-type:: annotation
hl-page:: 572
hl-color:: yellow
id:: 643b9c5a-c06b-4a4b-af75-9a2242069fc8
- Replay can overwrite a data block when the block is re-used after a deletion and the corresponding log entry has not yet been freed.
- The key point is that directory contents count as metadata. If the original block held a directory, this sequence causes the problem: modify the directory's entries, delete the directory, re-use the directory's block for a file. Replay during recovery then overwrites the file's data block with the old, deleted directory contents.
- Other Approaches
ls-type:: annotation
hl-page:: 574
hl-color:: yellow
id:: 643ba14d-00f4-4f92-921c-740f3b6def61
- Soft updates: carefully order the writes so that the on-disk structures are consistent at all times
- COW: never overwrite in place
- back-pointer: add a back pointer to each block so consistency can be checked against its inode
- optimistic crash consistency: a kind of transaction checksum
- premise
ls-type:: annotation
hl-page:: 563
hl-color:: green
id:: 643b824d-2732-46ae-961d-74a06db18138
- tad
ls-type:: annotation
hl-page:: 563
hl-color:: green
id:: 643b824f-c588-4b82-93e6-393016d3b5b1
- hideous
ls-type:: annotation
hl-page:: 572
hl-color:: green
id:: 643b9c40-6329-4720-9d26-75a78701392c
- hairy
ls-type:: annotation
hl-page:: 572
hl-color:: green
id:: 643b9c49-9e7c-4379-a640-8aff152ea511
- ## Log-structured File Systems
hl-page:: 579
ls-type:: annotation
id:: 643b8dad-3813-4048-8d04-5eb93a6bd182
hl-color:: yellow
- **Writing To Disk Sequentially**
hl-page:: 580
ls-type:: annotation
id:: 643bab9f-04f5-4e2f-b232-d6cfead45619
hl-color:: yellow
- write all updates (including metadata) to the disk sequentially, e.g. write a new data block and then write its newly updated inode right after it (rather than seeking to the far-away inode region)
- **Write Buffering**
hl-page:: 581
ls-type:: annotation
id:: 643bac5e-ce70-4d9b-92c5-4fb6dda099d6
hl-color:: yellow
- Writing sequentially alone doesn't mean good performance. A ==large number of contiguous writes or one large write== is the key to good write performance.
- Before writing to the disk, LFS ==keeps track of updates in memory==; when it has received a sufficient number of updates (a *segment*), it writes them to disk all at once.
hl-page:: 581
ls-type:: annotation
id:: 643bac81-cd5a-49aa-a81b-aff6c2405a40
hl-color:: yellow
- Segment size: similar to the evaluation here ((6437feab-eceb-4f11-9ced-ae43e2798c0c)). The larger the chunk size, the better the performance.
hl-page:: 582
ls-type:: annotation
id:: 643bb0bc-9781-484e-b249-224d89414165
hl-color:: yellow
- The effective write rate $R_{\text{effective}}$ for chunk size $D$, and the chunk size needed to reach a fraction $F$ of peak bandwidth:
$$R_{\text{effective}} = \frac{D}{T_{\text{write}}} = \frac{D}{T_{\text{position}}+\frac{D}{R_{\text{peak}}}} \qquad D = \frac{F}{1-F}\times R_{\text{peak}} \times T_{\text{position}}$$
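- Plugging in some illustrative numbers ($T_{\text{position}} = 10\,\text{ms}$, $R_{\text{peak}} = 100\,\text{MB/s}$, target $F = 0.9$):
```python
# Worked example of the two formulas above with illustrative numbers.
T_position = 0.010   # seconds of positioning (seek + rotation)
R_peak = 100.0       # peak transfer rate, MB/s
F = 0.9              # target fraction of peak bandwidth

D = F / (1 - F) * R_peak * T_position        # required chunk size: 9 MB
R_effective = D / (T_position + D / R_peak)  # achieved rate: 90 MB/s
```
So reaching 90% of peak bandwidth on this hypothetical disk requires writing about 9 MB per positioning.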
- **The Inode Map**, Finding inodes
hl-page:: 583
ls-type:: annotation
id:: 643bb0f6-b84c-469e-8188-0db6e86f36e8
hl-color:: yellow
- The i-map is a structure that maps an inode number to the disk address of the most recent version of that inode
hl-page:: 583
ls-type:: annotation
id:: 643bb162-9f99-4740-8fd6-859f236c1855
hl-color:: yellow
- LFS places chunks of the ==inode map right next to the other new information==. For example, when appending a data block to a file, LFS actually writes the new data block, its inode, and a piece of the inode map all together.
- **The Checkpoint Region**
hl-page:: 585
ls-type:: annotation
id:: 643bb250-205a-4ae9-8cff-0d715cfa6b7d
hl-color:: yellow
- Contains pointers to the latest pieces of the inode map. Note that the checkpoint region is only updated periodically, so it does not hurt performance too much.
- The lookup process
- First consult the CR for the i-map (often cached in memory), then consult the i-map for the address of the directory's inode, read the directory to get the file's inode number, and finally consult the i-map again for the file's inode
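- As a toy sketch (dicts standing in for the CR, the i-map, and on-disk blocks; all field names are made up), looking up a file in a directory goes:
```python
# Illustrative LFS lookup path: CR -> i-map -> directory inode ->
# file inode number -> i-map again -> file inode.
def lookup(cr, disk, dir_ino, name):
    imap = cr["imap"]                      # CR points at the latest i-map
    dir_inode = disk[imap[dir_ino]]        # i-map: inode number -> address
    file_ino = dir_inode["entries"][name]  # directories hold inode NUMBERS
    return disk[imap[file_ino]]            # consult the i-map again
```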
- recursive update problem: Whenever an inode is updated, its location on disk changes. This would have also entailed an update to the directory that points to this file (change the pointer field, thus the directory needs to be written to a new location), which then would have mandated a change to the parent of that directory, and so on, all the way up the file system tree.
hl-page:: 586
ls-type:: annotation
id:: 643bb4de-bc1f-4f61-a5dd-036867e85fe7
hl-color:: yellow
- This is not a problem for LFS: the i-map sits between inode numbers and disk addresses, and directories store inode numbers rather than addresses, so even if an inode moves to a new location the directory need not change.
- Garbage Collection
ls-type:: annotation
hl-page:: 587
hl-color:: yellow
id:: 643bb6cb-61ae-4231-aaf4-d78f1b1a7851
- LFS leaves old versions of file structures scattered throughout the disk, though only the latest version is needed. Therefore, LFS has to periodically ==clean these old versions== of data and metadata.
- The LFS cleaner works on a ==segment-by-segment basis==: read in a number of old segments, collect the live blocks, write them out to a new set of segments, and finally free the old ones.
hl-page:: 588
ls-type:: annotation
id:: 643bb91c-62a0-4154-8d12-c9ae356a4fc7
hl-color:: yellow
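- A minimal sketch of that loop, with lists of blocks standing in for segments and a caller-supplied liveness test (names are illustrative):
```python
# Illustrative segment cleaner: keep only live blocks from the old
# segments, compact them into new segments of K blocks, free the old.
def clean(segments, is_live, K):
    live = [b for seg in segments for b in seg if is_live(b)]
    segments.clear()                     # old segments are now free
    return [live[i:i + K] for i in range(0, len(live), K)]
```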
- Determining Block Liveness
ls-type:: annotation
hl-page:: 588
hl-color:: yellow
id:: 643bb7e4-6d90-4d59-8930-29a243862288
- segment summary block: inode number and in-file offset of each data block
hl-page:: 588
ls-type:: annotation
id:: 643bba1e-4f7b-4291-bdd8-966dd366748c
hl-color:: yellow
- Pseudocode depiction
- ```python
# A: block address; N: inode number; T: offset within the file
(N, T) = SegmentSummary[A]
inode = Read(imap[N])      # latest version of inode N
if inode[T] == A:          # does the inode still point at A?
    return live
else:
    return dead
```
- **version number**: in some cases (e.g., a deleted file), LFS records the file's version number in the imap and in the summary block, and compares the two during GC to short-circuit the liveness check
hl-page:: 589
ls-type:: annotation
id:: 643bbc21-a025-4dd4-bdc2-4a1eb68abf5e
hl-color:: yellow
- Crash Recovery
ls-type:: annotation
hl-page:: 590
hl-color:: yellow
id:: 643bbd43-a342-4445-a808-b9800790a83c
- General write scheme
- LFS organizes its writes in a log: the CR points to the head and tail segments, and each segment points to the next segment to write. The CR is propagated to disk periodically.
- To make it clear, there is no separate "log" space on the disk similar to what journaling FSs do. The segments written to the disk are logs by themselves. See [Page 30, Figure 4-1, R92](https://www2.eecs.berkeley.edu/Pubs/TechRpts/1992/CSD-92-696.pdf)
- Checkpoint Region
- LFS keeps 2 CRs (one at each end of the disk) and writes to them alternately. When writing a CR, LFS first writes a header (with a timestamp), then the body, and finally one last block (with a timestamp). A crash mid-update thus shows up as inconsistent timestamps, and LFS uses the latest CR whose timestamps agree.
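- Picking the CR at reboot can be sketched as (field names are illustrative):
```python
# Illustrative sketch: a CR whose header and trailer timestamps differ
# was torn by a crash; use the newest CR whose two timestamps agree.
def choose_cr(cr_a, cr_b):
    intact = [cr for cr in (cr_a, cr_b) if cr["head_ts"] == cr["tail_ts"]]
    return max(intact, key=lambda cr: cr["head_ts"])
```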
- Roll Forward
hl-page:: 590
ls-type:: annotation
id:: 643bc080-2693-4a74-b261-56f92e3c75e4
hl-color:: yellow
- The basic idea is to start with the last checkpoint region, find the end of the log (included in the CR), and then use that to read through the next segments and see if there are any valid updates.
hl-page:: 590
ls-type:: annotation
id:: 643bc10e-be20-47b3-bab4-713493dd5153
hl-color:: yellow
- ## Flash-based SSDs
ls-type:: annotation
hl-page:: 595
hl-color:: yellow
id:: 643ba369-83df-42f9-9ee9-b45d4652e8fb
- ## Data Integrity and Protection
ls-type:: annotation
hl-page:: 619
hl-color:: yellow
id:: 643ba392-acd9-4255-930e-a97f94fb28ef
- spouse
ls-type:: annotation
hl-page:: 633
hl-color:: green
id:: 643ba3b2-5a2a-4589-a871-62ad213de195
- mandate
ls-type:: annotation
hl-page:: 586
hl-color:: green
id:: 643bb439-b7d0-4170-9417-cd900062bfbd
- entail
ls-type:: annotation
hl-page:: 586
hl-color:: green
id:: 643bb533-1fee-4b9a-9965-7a63016d5591
- ceremonious
ls-type:: annotation
hl-page:: 587
hl-color:: green
id:: 643bb6e8-6466-4574-aa04-4ea25b3e9034
- cease
ls-type:: annotation
hl-page:: 595
hl-color:: green
id:: 643bc4b4-dc22-471c-9229-558a42904cc8