Linux IO Model
cpu 基本概念
CPU 执行的流程
A CPU core is a thread execution unit – if it has no runnable thread, it
physically
cannot do work.
flowchart LR A[CPU Core] -->|Fetches| B[Instruction Pointer] B -->|Executes| C[Current Thread's Code] C -->|Needs data| D[Registers/Memory] D -->|If data not ready| E[Stall Pipeline] E -->|If stall too long| F[Switch to another thread] F -->|If no threads ready| G[HALT Instruction] G --> H[CPU IDLE]
现代 CPU idle(空闲) 会怎样?
- modern CPU 在空闲时会进入低功耗(低电压)状态,称为 C-state。
- C-state 分为多个级别,C0 是 CPU 正在工作,C1、C2 等是不同的低功耗状态。
- CPU 会根据负载情况自动调整 C-state。
等待发生在何处?
The Brutal Truth: Waiting Never Disappears
Physics law: I/O operations (disk/network) are orders of magnitude slower than CPU.
- Disk seek: ~10ms (10,000,000 CPU cycles)
- Network roundtrip: ~100ms (100,000,000 CPU cycles)
No technology can eliminate this latency—it’s physics. Async I/O doesn’t remove waiting; it relocates who bears the cost of waiting.
Where Waiting Happens: The 3 Layers
1. Hardware Layer (Unavoidable Waiting)
- Disk controller waits for platter rotation → kernel cannot bypass this
- Network card waits for packets → kernel cannot bypass this
- This waiting is PHYSICAL and happens regardless of software model.
2. Kernel Layer (The “Delegation” Illusion)
- What the kernel actually does:
// Simplified Linux kernel pseudo-code for async I/O void handle_async_read(request *req) { if (data_ready_in_hardware) { // Hardware completed wait copy_data_to_user(req->buffer); trigger_user_notification(req->callback); // "Wake up" user-space } else { queue_request_for_later(req); // Defer until hardware finishes } }
- The kernel is waiting—but not in your thread’s context.
- It uses kernel threads (for
aio_*
) or completion rings (io_uring
) to track hardware progress. - Your thread isn’t blocked, but the kernel is managing the wait elsewhere.
- It uses kernel threads (for
3. User-Space Layer (The Critical Nuance)
- Your thread does “wait”—but in a non-wasteful way:
// Java AsynchronousFileChannel example channel.read(buffer, 0, null, new CompletionHandler<>() { public void completed(Integer bytes, Object att) { // THIS is where you "wait" - but only AFTER data arrives System.out.println("Data processed: " + bytes); } }); System.out.println("Thread continues immediately");
- Before data arrives: Your thread executes other code → no CPU wasted.
- When data arrives: Your thread must run the callback → this is “waiting” in disguise.
用户空间获取数据前提是什么?
- 数据准备阶段
- 内核准备数据,这就引出了问题,内核还没准备好数据,等(进程进入 sleep 状态)还是不等?
- 比如内核
sock
的 sk_receive_queue 没有数据
- 比如内核
- 内核没准备好数据,进程进入 sleep 状态 ->
阻塞 IO
- 内核没准备好数据,进程不进入 sleep 状态 ->
非阻塞 IO
- 内核准备数据,这就引出了问题,内核还没准备好数据,等(进程进入 sleep 状态)还是不等?
- 数据拷贝阶段
- 数据从内核空间复制到用户空间,这就引出了问题,谁来负责复制?
- 比如内核
sock
的 sk_receive_queue 的数据由谁来复制到用户空间
- 比如内核
- 由
用户线程
的内核态来负责 ->同步 IO
- 由
内核线程
负责 ->异步 IO
- 数据从内核空间复制到用户空间,这就引出了问题,谁来负责复制?
- 根据以上两大阶段理解 block/non-block sync/async
I/O 模型
IO 的同步、异步、阻塞、非阻塞取决于相关 syscall 的实现。
Blocking I/O
The process is blocked until the I/O operation completes. 阻塞
- 用户调用阻塞的
read
后,没有数据进程就放弃 CPU,进入阻塞状态(sleep)
Non-blocking I/O
The system call returns
immediately
, either with the data or with an error indicating that the data is not ready. The process can thenperiodically check
(poll) for readiness. 非阻塞
- 用户自己去调用非阻塞的
read
轮询
I/O Multiplexing
(e.g., select, poll, epoll): The process can
block
on multiple I/O sources atonce
, and is woken up whenany
of them are ready. This allows one thread to manage multiple I/O operations.
同步非阻塞, java 的 NIO 就是这个模型。有一个线程阻塞在
epoll_wait()
Signal-Driven I/O
The kernel notifies the process via a signal (e.g., SIGIO) when the I/O operation is ready to proceed. 信号驱动 SIGIO 只有一种状态
kill -l | grep SIGIO
1) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
Asynchronous I/O
The I/O operation is initiated, and the process continues. The kernel sends a signal when the entire I/O operation is completed.
Async I/O =
Decoupling
request submission from result processing, where the duration of thread occupancy is minimized to the time required to handle the result—not the time required for the physical I/O operation. 发起请求, 处理结果
异步 io_uring