Disk scheduling is a well-established topic in computer science, yet some of the well-known results reached by the research community have not made it into FreeBSD, for more than one reason. A previous Summer of Code project introduced a modular framework for pluggable schedulers and a novel fair queueing algorithm; however, due to architectural limitations, it did not explore the feasibility of anticipatory scheduling, a technique that can significantly increase the throughput of synchronous sequential workloads on rotational media.
With Luigi Rizzo, we have developed a prototype framework to introduce pluggable disk schedulers in the GEOM layer. It consists of a GEOM class that queues the requests going to the provider it is attached to and releases them according to the scheduling algorithm, which is implemented in an external module.
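The split between the queueing class and the external algorithm can be pictured with a small userspace model. This is only a sketch of the idea, not the kernel code: the names `SchedNode`, `FifoAlgorithm`, `LowestOffsetFirst`, `enqueue` and `next_request` are hypothetical and do not come from the geom_sched sources, and a plain Python list stands in for the underlying provider.

```python
from collections import deque


class FifoAlgorithm:
    """Hypothetical algorithm module: plain first-in, first-out."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, request):
        self.queue.append(request)

    def next_request(self):
        # Return the next request to release, or None if nothing is queued.
        return self.queue.popleft() if self.queue else None


class LowestOffsetFirst:
    """Hypothetical alternative module: release the lowest offset first."""

    def __init__(self):
        self.pending = []

    def enqueue(self, request):
        self.pending.append(request)

    def next_request(self):
        if not self.pending:
            return None
        best = min(self.pending, key=lambda r: r["offset"])
        self.pending.remove(best)
        return best


class SchedNode:
    """Stands in for the class that sits on top of a provider."""

    def __init__(self, algorithm):
        self.algorithm = algorithm  # the pluggable part
        self.provider = []          # stands in for the real provider

    def start(self, request):
        # Incoming requests are queued, never passed down directly.
        self.algorithm.enqueue(request)

    def dispatch(self):
        # Release queued requests in the order the algorithm chooses.
        while True:
            request = self.algorithm.next_request()
            if request is None:
                break
            self.provider.append(request)


def run(algorithm, offsets):
    """Feed the offsets through a node and return the release order."""
    node = SchedNode(algorithm)
    for offset in offsets:
        node.start({"offset": offset})
    node.dispatch()
    return [r["offset"] for r in node.provider]
```

Swapping the object passed to `run` changes the release order without touching `SchedNode`, which is the property the framework is after.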
We also implemented a couple of algorithms: gsched_as, a simple anticipatory scheduler with no per-client information, and gsched_rr, which does anticipation and round robin among per-client queues.
How to Try the Code
The full sources of the scheduling framework and of the implemented algorithms can be downloaded here.
To use it, the following steps are necessary:
- extract the archive contents;
- compile the source code, using make from the geom_sched directory:
    $ cd geom_sched
    $ make
- install the compiled modules and library (needs root privileges):
    # make install
- create a GEOM that uses the scheduler:
    # geom sched create -a rr ad1
  The command above creates the ad1.sched. provider on top of /dev/ad1 and attaches an rr scheduler to it.
- changing the scheduler on an existing geom_sched provider:
    # geom sched configure -a as ad1.sched.
- destroying an existing provider:
    # geom sched destroy ad1.sched.
- inserting the scheduler in a live GEOM mesh:
    # geom sched insert -a bfq ad1
Along with the framework and the generic code, the tarball contains three schedulers: an Anticipatory one, a Round-Robin one, and a BFQ port.
The Anticipatory scheduler (as) is just a demonstration of how idling can be implemented. It is a pure throughput booster for highly sequential synchronous loads, but it lacks the machinery needed to handle mixed workloads reasonably. It can be used as a reference for its high throughput when working in ideal conditions.
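To see why idling helps synchronous sequential readers, consider a toy model with two clients, each reading its own contiguous region and issuing the next request only after the previous one completes. Everything here is an assumption made for illustration: the cost constants (`SEEK`, `TRANSFER`, `THINK`, `WINDOW`), the function name `simulate`, and the policy itself are not taken from gsched_as, and the resulting figures are properties of the model, not measurements.

```python
SEEK = 8.0      # assumed cost (ms) of switching to another client's region
TRANSFER = 1.0  # assumed cost (ms) of transferring one request
THINK = 0.5     # assumed delay (ms) before a client issues its next request
WINDOW = 2.0    # assumed maximum idle time (ms) spent anticipating


def simulate(num_requests, anticipate):
    """Return the time at which two synchronous readers both finish."""
    ready_at = {"A": 0.0, "B": 0.0}  # when each client's next request arrives
    remaining = {"A": num_requests, "B": num_requests}
    now = 0.0
    last_client = None
    while any(remaining.values()):
        active = [c for c in ready_at if remaining[c] > 0]
        if anticipate and last_client in active and ready_at[last_client] > now:
            # Idle instead of seeking away: the last client is likely to
            # issue a nearby request very soon.
            deadline = now + WINDOW
            if ready_at[last_client] <= deadline:
                now = ready_at[last_client]  # the wait paid off
            else:
                now = deadline               # anticipation timed out
        pending = [c for c in active if ready_at[c] <= now]
        if last_client in pending:
            choice = last_client             # no seek needed
        elif pending:
            choice = min(pending, key=ready_at.get)
        else:
            choice = min(active, key=ready_at.get)
            now = ready_at[choice]           # disk sits idle until it arrives
        if choice != last_client:
            now += SEEK
        now += TRANSFER
        remaining[choice] -= 1
        ready_at[choice] = now + THINK
        last_client = choice
    return now


without_idling = simulate(4, anticipate=False)  # 72.0 in this model
with_idling = simulate(4, anticipate=True)      # 27.0 in this model
```

Without idling, the disk always finds the other client's request pending and pays a seek on every request. With idling, it waits out the short think time and serves each client's run back to back. The model also shows the weakness noted above: one client is served to completion before the other gets any service.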
The Round-Robin scheduler (rr) is considerably more complex. It ensures some isolation between different processes accessing the disk concurrently, and it also implements idling to achieve high throughput with synchronous workloads. Of course, the higher level of fairness it provides has a cost: a certain degree of throughput loss on mixed workloads, which may be noticeable.
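The per-client rotation can be sketched as follows. This is a minimal model under stated assumptions, not the gsched_rr implementation: the class name `RoundRobin`, the `quantum` parameter (a per-turn budget counted in requests) and the method names are hypothetical, and the sketch deliberately leaves out the idling step, so a client whose queue empties simply loses the rest of its turn.

```python
from collections import OrderedDict, deque


class RoundRobin:
    """Toy round robin among per-client queues (no idling)."""

    def __init__(self, quantum):
        self.quantum = quantum       # assumed budget: requests per turn
        self.queues = OrderedDict()  # client id -> deque of its requests
        self.served = 0              # requests served in the current turn

    def enqueue(self, client, request):
        self.queues.setdefault(client, deque()).append(request)

    def next_request(self):
        if not self.queues:
            return None
        # The client at the front of the rotation owns the current turn.
        client, queue = next(iter(self.queues.items()))
        request = queue.popleft()
        self.served += 1
        if not queue or self.served >= self.quantum:
            # Turn over: move the client to the back of the rotation,
            # or drop it entirely if it has nothing left queued.
            del self.queues[client]
            if queue:
                self.queues[client] = queue
            self.served = 0
        return (client, request)


sched = RoundRobin(quantum=2)
for i in range(5):
    sched.enqueue("A", i)
for i in range(2):
    sched.enqueue("B", i)

order = []
while True:
    item = sched.next_request()
    if item is None:
        break
    order.append(item)
# order: A0, A1, B0, B1, A2, A3, A4
```

Client B is served after only two of A's requests even though A queued five first, which is the isolation property; the price is that A's sequential run is interrupted.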
For more information on BFQ, see the original pages.
Last updated August 16th, 2009