BFQ (Budget Fair Queueing) is a Proportional Share, or equivalently Fair Queueing, disk scheduler that allows each process/thread to be assigned a fraction of the disk bandwidth.
In BFQ, as in CFQ, synchronous requests are collected in per-task queues, and asynchronous requests are collected in per-device (or, in the case of hierarchical scheduling, per group) queues. When the underlying device driver asks for the next request to serve and there is no queue being served, BFQ uses B-WF2Q+, a modified version of WF2Q+, to choose a queue, then selects the first request from that queue in C-LOOK order and returns it to the driver. When a new queue is selected it is assigned a budget, in disk sector units, decremented each time a request from the same queue is served. When the device driver asks for new requests and there is a queue under service, they are chosen from that queue until one of the following conditions is met:
- the queue exhausts its budget, or
- the queue is spending too much time (see the timeout_* parameters below) to consume its budget, or
- the queue has no more requests to serve.
As in CFQ, if a sync queue has no more requests to serve, but it has some budget left, the scheduler idles (i.e., it tells to the device driver that it has no requests to serve even if there are other active queues) for a short period, waiting for a new request from the task owning the queue.
When hierarchical scheduling is enabled, queues are collected in a tree of groups, and there is a distinct B-WFQ2+ scheduler on each non-leaf node. Leaf nodes are request queues as in the non-hierarchical case.
BFQ supports ioprio classes at each hierarchy level, enforcing a strict priority ordering among classes. This means that idle queues/groups are served only if there are no best effort queues/group in the same scheduler, and best effort queues/groups are served only if there are no real-time queues/groups.
The parameters available to the user for configuring BFQ, exported through the canonical I/O scheduler sysfs interface are:
- timeout_sync, timeout_async: the maximum amount of disk
time that can be given to a task once it has been selected
for service, respectively for synchronous and asyncronous
queues. It allows the user to specify a maximum slice
length to put an upper bound to the latencies imposed by the
NOTE: the timeout itself is not the upper bound to the maximum latency, that depends also on the queue length, on the load of the system and on the aggregate throughput. The paper on Paolo's page will soon be updated to consider this parameter in the formal description of the scheduler's guarantees.
- max_budget: the maximum amount of service, measured in disk sectors, that can be provided to a queue once it is selected. Bigger values increase the throughput for the single tasks and for the system, at the price of increasing the maximum latency a request can incur in, too. The default value is 0, which means that BFQ will try to estimate it by itself, assuming that timeout_sync is an upper bound for the maximum slice length acceptable for the system.
- max_budget_async_rq: in addition to the max_budget, limit, async queues are served for a maximum number of requests, after that a new queue is selected.
All the other parameters have the same meaning they have in CFQ.
BFQ supports full hierarchical scheduling; internally it controls all the devices independently from each other, but to simplify the user interface, by now, it exports only a single attribute set per-group.
Each group has the same I/O scheduling parameters that also tasks have:
- bfqio.ioprio_class: the scheduling class the group belongs to.
- bfqio.ioprio: the priority (linearly mapped into a weight) of the group inside its parent.
Last updated April 1st, 2008