Live Blog Day 1: morning

Sep 22, 2025

Welcome to the 2025 edition of Kernel Recipes. In this live blog, we’ll follow each talk live – please come back regularly.

Introduction

Anne is presenting this 12th edition of Kernel Recipes; the agenda has two social events. Frank was not available to do his usual drawings, so Emma created new mascots for this year, on a “French” theme. Jean-Christophe and Erwan are doing video and audio. Benji and Willy helped welcome the attendees. Yours truly is doing this live blog. Anne did most of the organization of this conference. The Godfather for this edition is Paul McKenney, who will also give the introductory talk.

Anne

Attendees should be careful of the flying mic; this year’s charity auction will support Les Restos du Cœur, well-known in France. To conclude, Anne thanked every sponsor by name.

Breaking Up Is Hard To Do — Paul E McKenney

L’épreuve de la Séparation, ou: Breaking up is hard to do is the title of this presentation. It’s a follow-up to Paul’s two previous Kernel Recipes presentations (2023 and 2024) on synchronization.

With a jump back in time, Paul presents a 1989 dual-8086 board with many chips: memory, cache, floating-point units. Those computers were very expensive back then, and only a small number of people had access to them, so there were almost no hobbyists. Back then, there was no way to learn concurrent programming other than “apprenticeship” with experienced people; there was almost no publicly available concurrent software, and the few books that existed catered only to experienced programmers, not beginners.

When writing software, a rule of thumb is that the more generally applicable a piece of software is, the more users it has. Paul talks about the “Iron Triangle” of Productivity, Performance and Generality: pick two. But Generality is a must-have, Paul says. If you also want Performance, then you have to sacrifice Productivity. Aiming this at kernel developers complaining about RCU complexity, he says that’s the consequence of a choice they made.

But excessive generality is also a way to shoot yourself in the foot. Following YAGNI (You Ain’t Gonna Need It) helps. In the context of RCU, Paul chose Generality and Productivity (deadlock immunity), and this limits applicability.

One way to “solve” this problem is to make concurrency go away, through multiple strategies: run multiple instances of a sequential application, let other parallel software do the heavy lifting (as a black box), or just optimize the sequential application. Paul used computing the digits of pi as an example where parallel algorithms exist, but are less efficient than sequential ones. Even faster is to compute pi once and cache it in your program.
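Paul’s “compute it once and cache it” point can be sketched in userspace Rust (a hypothetical illustration, not code from the talk), using std::sync::OnceLock so an expensive sequential computation runs at most once:

```rust
use std::sync::OnceLock;

// Computed-once cache: the first call to cached_pi() runs compute_pi(),
// every later call returns the stored value.
static PI: OnceLock<f64> = OnceLock::new();

// A deliberately slow approximation of pi (Leibniz series), standing in
// for any expensive sequential computation.
fn compute_pi() -> f64 {
    let mut sum = 0.0;
    for k in 0u32..5_000_000 {
        let sign = if k % 2 == 0 { 1.0 } else { -1.0 };
        sum += sign / (2 * k + 1) as f64;
    }
    4.0 * sum
}

fn cached_pi() -> f64 {
    *PI.get_or_init(compute_pi)
}

fn main() {
    let first = cached_pi(); // computed here
    let second = cached_pi(); // served from the cache
    assert!((first - std::f64::consts::PI).abs() < 1e-6);
    assert_eq!(first, second);
    println!("pi ~= {first}");
}
```

OnceLock is also safe to hit from multiple threads: only one caller ever runs the computation, the rest wait for the cached value.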

Paul McKenney

Parallel programming has a few pitfalls. One is to start by choosing the parallel access control (the synchronization mechanism: mutex, RCU, etc.); this leads to very bad designs with high overhead. Instead, one should start by partitioning the data and choosing a replication strategy: that follows the laws of hardware and physics, Paul says.

Paul asks the audience how to improve an example hash-table API to make it concurrent. One aspect is to remove the exact counting on hash_add, Paul says, since exact counts introduce a bottleneck. This means concurrency forces a tradeoff between performance and generality.
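The counting bottleneck can be sketched in userspace Rust (a hypothetical illustration, not Paul’s code): an exact counter shared by all threads contends on a single atomic, while per-thread “sharded” counters stay uncontended and only pay for exactness when a total is actually requested:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Compare an exact, shared counter (every thread contends on the same
// atomic) with per-thread "sharded" counters (uncontended updates; an
// exact total is only computed on demand by summing the shards).
fn run(threads: usize, per_thread: usize) -> (usize, usize) {
    let exact = AtomicUsize::new(0);
    let shards: Vec<AtomicUsize> =
        (0..threads).map(|_| AtomicUsize::new(0)).collect();
    thread::scope(|s| {
        let exact = &exact;
        for shard in &shards {
            s.spawn(move || {
                for _ in 0..per_thread {
                    // Contended: all threads hit the same cache line.
                    exact.fetch_add(1, Ordering::Relaxed);
                    // Uncontended: each thread hits its own shard.
                    shard.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    let total = shards.iter().map(|c| c.load(Ordering::Relaxed)).sum();
    (exact.load(Ordering::Relaxed), total)
}

fn main() {
    let (exact, sharded) = run(4, 100_000);
    assert_eq!(exact, 400_000);
    assert_eq!(sharded, 400_000);
    println!("exact = {exact}, sharded total = {sharded}");
}
```

This is the same idea as the kernel’s per-CPU counters: both end up with the right total, but the sharded version avoids making every insertion serialize on one location.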

“Concurrency doesn’t play fair,” Paul says: it will use your own perceived intelligence against you, and you’ll dig a deeper hole for yourself before realizing you are in trouble.

For the first experiment, Paul distributed a set of Lego bricks: two green Duplos, and 16 blue 2x4x1 bricks; both sets have the same total volume. Assembling the two green “big” bricks is an order of magnitude faster than assembling the same shape from the smaller bricks, which gives an intuition of sequential versus concurrent solutions.

Paul then presented the “Dining philosophers” concurrency problem, but with chopsticks instead of forks: there are 5 philosophers and 5 chopsticks, and there are textbook solutions to avoid deadlock and starvation; but Paul says the simpler solution is to just give each philosopher a second chopstick: chopsticks are cheap enough.

As takeaways, Paul says not to always reach for parallel programming first, but to try to optimize a sequential solution first; to let other people or programs handle the parallel programming where possible; and finally, to always use the right tool for the job.


Kworkflow: Mix & Match Kernel Recipes End-to-end — Melissa Wen

Melissa used to co-maintain vkms and v3d, and nowadays works on AMD graphics drivers, like the one on Steam Deck. Kworkflow is a community-maintained tool used for kernel development. It contains multiple recipes for traditional kernel development.

Melissa Wen

In a first recipe, Melissa shows how to develop a GPU driver. Using kworkflow’s kw CLI, which has multiple subcommands, Melissa shows how to manage hardware and config files: these are the ingredients. The ingredients are then “mixed” by building the kernel with kw build, and baked with kw deploy, which can also reboot the machine with the proper argument once the kernel is deployed.

Tasting is done with kw debug: it fetches dmesg and, in the example, tries to unload the kernel module, which fails. This might require searching for a solution online, which is done with kw patch-hub: it fetches patches from mailing lists and applies them. Building and deploying can be done in a single command with kw bd (build and deploy).

In a second recipe, Melissa shows using a Raspberry Pi 4 with a mainline kernel. Changing ingredients is done with kw env, which switches kernel sources, toolchain, config file and target device. This can all be set up with kw init --template, which has pre-made templates for existing workflows.

In a third recipe, a mainline kernel is run on a Steam Deck, in a live demo. In order to have working audio, Melissa applied a patch, then rebuilt and deployed the kernel, forcing the install (because of Steam Deck details).

To send patches, there is the kw send command, which sends patches to a mailing list. It automatically runs git format-patch and fetches the maintainers before sending.

In the audience, Andrea asked how the kernel is packaged before sending: Melissa says it’s not a distro package; it is sent as a tarball during deploy. To answer another question, Melissa says there is no dkms support yet. Kworkflow’s main selling point, Melissa says, is having common scripts instead of every developer maintaining their own separate productivity scripts; it can be done once globally for each sub-project: device, architecture, etc. The tool is designed for single users, and does not yet allow sharing targets. It does not have automatic integration of external test suites like xfstests, but they can be run normally through kw debug. Support for virtme-ng is being contemplated, and the project in general is open to contributions.

So you want to write a driver in Rust? — Alice Ryhl

Alice is a co-author of the Android Binder driver (Rust version) and the Tyr GPU driver (in Rust), both upstream, and of the ashmem Android driver (Rust version) downstream. In userspace, Alice is also a Tokio maintainer, the “asynchronous standard library for Rust”.

Why Rust? In the introduction, Alice says that with Rust, one does not need to choose only two among Performance, Reliability and Productivity. To add a second language to C in the kernel, Rust is the only one that is both performant and mature enough while having memory safety. But while Reliability includes memory safety, that’s not the only thing: logic bugs, for example, should also be taken into account. Rust can help with those too, with proper design. In response to a remark from Joel Fernandes in the audience that one can have the same advantages in any language with proper abstractions, Alice says that’s true, but in general only at runtime, not at compile time like Rust does.

By encapsulating “unsafe” code behind safe abstractions, Rust helps remove bad usages of APIs, and lets reviews concentrate on the “unsafe” parts. In general those represent only a tiny portion of a driver. Alice illustrates this with “the Fallacy of Gray”.
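The pattern can be sketched in plain userspace Rust (a toy example, not kernel code): the unsafe block is confined to one small, auditable spot, and the safe public API cannot be misused from safe code:

```rust
// The `unsafe` block is confined to one small, reviewable spot; callers
// of the safe function cannot trigger the unchecked accesses on an
// empty slice.

/// Returns the first and last elements of a slice, or None if empty.
fn ends<T: Copy>(s: &[T]) -> Option<(T, T)> {
    if s.is_empty() {
        return None;
    }
    // SAFETY: the slice is non-empty, so index 0 and s.len() - 1 are
    // both in bounds.
    unsafe { Some((*s.get_unchecked(0), *s.get_unchecked(s.len() - 1))) }
}

fn main() {
    assert_eq!(ends(&[1, 2, 3]), Some((1, 3)));
    assert_eq!(ends::<i32>(&[]), None);
    println!("ok");
}
```

A reviewer only needs to audit the SAFETY comment against the check above it; every caller gets the guarantee for free.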

Another good selling point of Rust is productivity: with the common “If it compiles, it works”. But there is some ramp-up time to learn. This ramp-up time is not just for authors, but reviewers and maintainers as well. Alice recommends meeting maintainers live at the next conference and going through the patchset with them.

The first step to building a driver is to read the documentation in Documentation/rust/quick-start.rst, then install the dependencies and check that they are all there with make rustavailable. There are a few sample drivers to look at in samples/rust; Alice shows the MiscDevice sample driver first, one of the simplest drivers. There is a struct describing the module, and another for the device registration. On the module struct, one implements a “trait” called kernel::Module, which means adding a function with a pre-defined prototype that will be called at init.
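As a rough userspace sketch (the real kernel::Module trait lives in the kernel crate and its init function takes a module reference and returns a kernel Result; the names below only mirror the shape), implementing such a trait means providing one function with a pre-defined prototype that the framework calls at init time:

```rust
// A trait with one pre-defined, constructor-like function; the
// framework (here, just main) calls it at init time. Names are
// illustrative only, not the real kernel API.
trait Module: Sized {
    fn init() -> Result<Self, i32>;
}

struct HelloModule {
    message: String,
}

impl Module for HelloModule {
    fn init() -> Result<Self, i32> {
        // In the kernel, this would run when the module is loaded.
        Ok(HelloModule { message: String::from("hello from init") })
    }
}

fn main() {
    let module = HelloModule::init().expect("init failed");
    assert_eq!(module.message, "hello from init");
    println!("{}", module.message);
}
```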

On the device struct, the RustMiscDevice trait is implemented; it has multiple functions to implement. First is the open function, which takes a private pointer to store data, but in Rust it needs to be appropriately typed. The function allocates memory with KBox, which describes a kernel heap allocation; the return value is wrapped in a Result, a standard type for fallible functions, which combines with the ? (“try”) operator that automatically returns from the current function when a Result contains an error Err. Alice shows how the Result enum would relate to a C definition, and how Rust can make it mandatory to handle errors from fallible functions.
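The Result-plus-? pattern can be sketched with plain std Rust (a hypothetical userspace example, not the driver’s code): fallible functions return a Result, and ? propagates an Err out of the current function early:

```rust
// Fallible functions return a Result; `?` unwraps an Ok value, or
// returns the Err from the current function immediately.

fn parse_port(s: &str) -> Result<u16, String> {
    let n: u32 = s.parse().map_err(|e| format!("not a number: {e}"))?;
    u16::try_from(n).map_err(|_| format!("{n} is out of range for a port"))
}

fn connect_string(host: &str, port: &str) -> Result<String, String> {
    let port = parse_port(port)?; // early return on Err
    Ok(format!("{host}:{port}"))
}

fn main() {
    assert_eq!(connect_string("example.org", "8080").unwrap(),
               "example.org:8080");
    assert!(connect_string("example.org", "99999").is_err());
    assert!(connect_string("example.org", "abc").is_err());
    println!("ok");
}
```

Because connect_string returns a Result, its own callers are in turn forced by the compiler to handle or propagate the error; that is the “mandatory error handling” property described above.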

Another example is how, in Rust, a Mutex is a container of data. This means it’s impossible to get to the data without holding the mutex’s lock; another field that does not need to be protected is simply not put in the container. Alice shows a fuller example of mutex use in the implementation of the write_iter function, and how the lock guard’s destructor (its Drop trait implementation) automatically releases the mutex, just like the C guard scopes now being used in the kernel.
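The “Mutex as a container of data” idea can be sketched in userspace Rust with std::sync::Mutex (a toy example, not the write_iter implementation): the protected field is only reachable through the lock guard, and the guard’s Drop releases the lock:

```rust
use std::sync::Mutex;
use std::thread;

// The Mutex owns the data it protects: `count` is only reachable
// through the lock. A field that needs no protection simply lives
// outside the Mutex.
struct Stats {
    count: Mutex<u64>,
    name: &'static str,
}

fn add_hits(stats: &Stats, threads: usize, per_thread: u64) {
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // lock() returns a guard; the guard's Drop releases
                    // the lock at the end of the scope, like the C guard
                    // scopes now used in the kernel.
                    let mut n = stats.count.lock().unwrap();
                    *n += 1;
                } // guard dropped here, lock released
            });
        }
    });
}

fn main() {
    let stats = Stats { count: Mutex::new(0), name: "demo" };
    add_hits(&stats, 4, 1_000);
    assert_eq!(*stats.count.lock().unwrap(), 4_000);
    println!("{}: {}", stats.name, *stats.count.lock().unwrap());
}
```

There is no way to reach `count` while forgetting to lock, and no way to forget the unlock: both mistakes are compile-time impossible rather than review findings.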

Another MiscDevice function implemented was read_iter, which also shows reading from a locked buffer.

Alice shared a few learning resources: “the book”, Rustlings course, Rust by Example; in Linux, look at samples and real drivers, and “Talk to us!” Alice says. There are other resources like “Comprehensive Rust”, or the Embedded book.

It’s possible to write many kinds of drivers already: phy, block, misc, drm, (pwm, hid); bus devices as well: pci, platform, aux, faux, (usb, i2c). The “pure data manipulation” drivers, like the QR code driver, are the simplest ones. There are many other already-upstream Rust abstractions, and Alice showed a slide full of them.

What can go upstream? The ideal is to pick an entirely new driver, not a duplicate of an existing one. There are exceptions, like reference drivers. When sending abstractions, one should also have a user for them, so reference drivers help here even if they duplicate functionality. There are multiple ways to introduce Rust in a subsystem: the ideal one is to become part of an existing subsystem, where contributors can become maintainers or reviewers. Some subsystems don’t want Rust at all, and “that’s okay”, Alice says, for example if no one needs to interact with the code.

An approach to experimentation and learning is to land Rust code upstream without letting the possibility of bugs hold you back (at the beginning); the code should always build, though.

In the audience, someone asked if it was as easy to call Rust code from C as it is to call C code from Rust: the answer is yes, and a good example of that is the DRM panic QR code driver. Another person asked how easy Rust code is to modify when requirements change, since one generally encodes the constraints in the type system; Alice says it hasn’t been much of an issue in her experience. Paul McKenney asked if field projections are coming in order to have proper RCU bindings: “we are working on it”, Alice says. Julia Lawall asked about the difference between Rust’s Mutex and C’s scoped guards: Alice says it’s more powerful in Rust; for example, the lock can be explicitly dropped, and it’s possible to create shorter scopes with curly brackets. An attendee asked if it was possible to write filesystems in Rust: not upstream yet, but there are existing abstractions in downstream patches one could use in order not to start from zero.

That’s it for this morning! Continue reading the afternoon live blog!
