This is the live stream for this afternoon.

Checking your work: Linux kernel testing and CI – David Vernet

There are many ways to test the kernel: kselftests, KUnit, xfstests, etc.

kselftests are userspace programs, usually written in C, that live in-tree at tools/testing/selftests. Some are simple scripts, others have a kernel module counterpart. They output results in TAP format.
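
As an illustration of what consuming TAP output can look like, here is a minimal sketch of a result counter. It is not part of kselftest itself; the function name `summarize_tap` and the test names in the sample are made up for this example, and it only handles numbered result lines with an optional SKIP directive.

```python
import re
from collections import Counter

# Matches TAP result lines such as "ok 1 name" or "not ok 2 name # SKIP reason".
RESULT_RE = re.compile(
    r"^\s*(?P<status>ok|not ok)\s+(?P<num>\d+)\s*(?:-\s*)?"
    r"(?P<desc>[^#]*?)\s*(?:#\s*(?P<directive>\w+).*)?$"
)


def summarize_tap(text):
    """Count pass/fail/skip results in a TAP stream (hypothetical helper)."""
    counts = Counter()
    for line in text.splitlines():
        match = RESULT_RE.match(line)
        if not match:
            continue  # version line, plan line ("1..N") or "# ..." diagnostics
        directive = (match.group("directive") or "").upper()
        if directive == "SKIP":
            counts["skip"] += 1
        elif match.group("status") == "ok":
            counts["pass"] += 1
        else:
            counts["fail"] += 1
    return dict(counts)


# Hypothetical sample output; the test names are invented.
sample = """TAP version 13
1..3
ok 1 selftests: example: test_alpha
not ok 2 selftests: example: test_beta
ok 3 selftests: example: test_gamma # SKIP missing config
"""

print(summarize_tap(sample))
```

A real consumer would probably also check the plan line against the number of results seen, which this sketch skips.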

KUnit is a unit testing framework for testing individual Linux kernel functions. It’s compiled into the kernel and configured at build time. C source files usually live next to driver code. There’s a Python script to run them in tools/testing/kunit.

xfstests is a project focused on filesystems. And there are others that are out of tree, like Linux Kernel Performance (lkp-tests), Phoronix Test Suite (pts), and Linux Test Project (ltp).

For test run projects, there are a few options: KernelCI, Linux kernel test robot, patchwork, syzbot, etc.

KernelCI is now a Linux Foundation project. It builds the kernel across a variety of trees, branches, toolchains and configs, and runs tests on lots of different architectures. All the results are shown on a web UI dashboard, with build and run logs available.

Linux Kernel Performance is run by the 0-day team at Intel that also runs the kernel test robot. They also have a dashboard, which is in mailing list format. It test-builds patches before they are merged, which is quite useful.

Patchwork can be combined with some scripts and CI actions to show build information. For example, the bpf project does that, with GitHub Actions runners.

syzbot is the Google fuzzer; it has a dashboard; it focuses on fuzzing, so it doesn’t run developer-written tests.

btrfs tests runs xfstests for btrfs, and hosts a dashboard. It’s used to track performance of the project.

All those CI systems have the same goal: testing the kernel. Can they be combined?

kselftests are nice, but need more work. They were initially intended as a dumping ground for small test programs. The suites should be upgraded to advertise whether a test is flaky or not.

A test system shouldn’t annoy maintainers. Its goal is to alleviate pressure on maintainers. So flaky tests should be fixed or removed, as they provide negative value.

As a bonus after Q&A, David showed how fast and easy it is to write a kselftest. It’s integrated in the Linux kernel build system.

The complete story of the in-kernel sloppy GPIO logic analyzer – Wolfram Sang

This is a story about hacking to scratch one’s itch, Wolfram says. It works in Linux with irqs + preemption disabled by polling GPIOs. It’s called “sloppy” for a reason.

A common task for Wolfram is to get a new board and enable the IP cores on the SoC. For this one, he could not test his patches, because he did not have physical access to the board, which sat in Japan while he worked from Germany. And he would not submit untested work upstream.

He did not have multiple logic analyzers in the lab, or someone to operate them remotely all the time. But someone could set up wires once, and he had a lot of idle CPU cores.

He knew that it was possible to build a software logic analyzer, but he wasn’t an expert in CPU isolation.

Despite that, he tried to get his work upstream, but it isn’t merged yet.

SGLA (Sloppy GPIO Logic Analyzer) is the kernel part: it’s configured and accessed through debugfs. It locks the CPU while it runs to do its task. The userspace part is a script that wraps debugfs access, isolates the CPU, and converts the output to sigrok data.

The device-tree configuration was quite simple: it configured the pins on which the analysis should be done, and bound them properly.

The display was done in sigrok, an open source GUI software for logic analyzers.

In the initial test, CPU isolation did not work because SMP support was not done yet on the prototype. So he ran the code on the other CPUs without Linux SMP support anyway.

While doing the testing, he found that he did not need custom wires to do the analysis: he could directly read the GPIO value even if the pins were muxed to I²C on this hardware. This needed support in two places: in the GPIO subsystem, where it is available as non-strict mode, and in the pinctrl driver for the hardware, which luckily was a simple patch.

On the next hardware revision, the hardware no longer supported non-strict mode with GPIOs.

In the current incarnation, it works reasonably well, although it’s still not a logic analyzer. It was fun to create though.

Linux on RISC-V – Drew Fustini

RISC-V is the clean-slate RISC instruction set architecture coming out of Berkeley. It has 32-bit and 64-bit variants, as well as a 128-bit one for future-proofing.

There are 32 general-purpose registers, and many extensions for other features; the most common variant for Linux is RV64GC, where G stands for the general-purpose set of extensions and C for compressed instructions.
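
To make the naming concrete, here is a small sketch that expands an ISA string such as rv64gc into its register width and extension list. The function name `expand_isa` is hypothetical, and the parser is deliberately simplified: it assumes multi-letter extensions are separated by underscores and does no validation of extension names.

```python
# "G" is shorthand for the base integer ISA plus these standard extensions.
G_EXPANSION = ["i", "m", "a", "f", "d", "zicsr", "zifencei"]


def expand_isa(isa):
    """Return (xlen, extensions) for a simple ISA string (hypothetical helper)."""
    isa = isa.lower()
    if not isa.startswith("rv"):
        raise ValueError("ISA string must start with 'rv'")

    # Split the register width (32, 64 or 128) from the extension letters.
    rest = isa[2:]
    digits = ""
    while rest and rest[0].isdigit():
        digits, rest = digits + rest[0], rest[1:]
    if digits not in ("32", "64", "128"):
        raise ValueError(f"unsupported register width: {digits!r}")

    # Single-letter extensions come first; multi-letter ones follow,
    # separated by underscores (a simplification made for this sketch).
    single, *multi = rest.split("_")
    extensions = []
    for letter in single:
        for ext in (G_EXPANSION if letter == "g" else [letter]):
            if ext not in extensions:
                extensions.append(ext)
    for ext in multi:
        if ext and ext not in extensions:
            extensions.append(ext)
    return int(digits), extensions


print(expand_isa("rv64gc"))
```

The same helper accepts the long form, so `expand_isa("rv64imafdc_zicsr_zifencei")` names the same set of extensions in a different order.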

Since the RISC-V extensions vary a lot, general “profiles” are in the works for pre-selecting a set of extensions: one for microcontrollers and one for application processors. There are multiple books on the architecture already.

The specifications are controlled by RISC-V International, a non-profit, with 2700+ members. Many vendors have already shipped many RISC-V cores, like NVIDIA in their GPUs, or Seagate in their disk drives.

There is no ISA licensing fee or royalties, avoiding some complexities. The ISA is open, but the cores can be proprietary or open source. The open ISA makes the latter possible, and there are already a few open source cores, from many different groups.

It also has a well-supported software ecosystem: support is in many OSes, toolchains, libraries, languages and runtimes.

The RISC-V architecture has three privilege modes: user, supervisor (OS), and machine (firmware). There are a few Control and Status Registers (CSRs) to get the machine status and configure it. These are used for many things, like controlling virtual memory, supervisor mode or trap handling.

The spec is still being worked on; for example, new interrupt controllers are coming to enhance the current PLIC and CLIC ones.

There’s a non-ISA RISC-V specification for describing the Supervisor Binary Interface (SBI), the calling convention between supervisor and machine mode.

In RISC-V, a hart is a hardware thread. In the context of SMT, a single core might have multiple harts.

There are SBI extensions, and those are used to control low-level platform features: Hart State Management (lifecycle), Performance, system reset. The hypervisor extension adds another level between machine and supervisor, as well as SBI interfaces for communication between levels.
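
As a side note on how these extensions are identified: in the SBI specification, the extension IDs (EIDs) of several of them are simply the ASCII encoding of a short name. The sketch below shows the conversion in both directions; the helper names are hypothetical, and not every EID follows this scheme (the base and legacy extensions use small integers).

```python
def name_to_eid(name):
    """Encode a short ASCII name as an SBI-style extension ID."""
    return int.from_bytes(name.encode("ascii"), "big")


def eid_to_name(eid):
    """Decode an extension ID back to its ASCII name."""
    length = max(1, (eid.bit_length() + 7) // 8)
    return eid.to_bytes(length, "big").decode("ascii")


# Hart State Management, System Reset, Performance Monitoring Unit.
for name in ("HSM", "SRST", "PMU"):
    print(f"{name}: {name_to_eid(name):#x}")
```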

There’s an open source implementation of SBI called OpenSBI.

UEFI support is also available. Both U-Boot and EDK2 have support for RISC-V EFI mode, and GRUB 2 can be used as a RISC-V payload. There’s a RISC-V EFI Boot Protocol for discovering the boot hart.

The RISC-V platform specification describes common requirements for hardware in order to be able to boot an OS. It has different levels/profiles depending on the type of OS: generic, embedded or server. There’s a RISC-V ACPI Platform Specification for RISC-V specific tables like hart and timer capabilities.

QEMU can easily emulate both RISCV32 and RISCV64, and supports some extensions like hypervisor mode.

In Linux, most relevant architecture-dependent features are implemented for RISC-V. Recently, KVM support was added, as well as 5-level page tables to have 57 bits of address space. Perf support was improved. Since 5.18, the cpuidle and suspend drivers support the SBI HSM extension, and the kernel can list the capabilities of a given CPU. In 5.19, kexec_file() support was submitted.
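
The 57-bit figure follows from the page-table layout: with 4 KiB pages, each level of the RISC-V Sv39/Sv48/Sv57 schemes indexes 512 entries (9 bits), on top of a 12-bit page offset. A short worked calculation, with a hypothetical helper name:

```python
PAGE_OFFSET_BITS = 12    # 4 KiB base pages
VPN_BITS_PER_LEVEL = 9   # 512 eight-byte entries fit in one 4 KiB table


def virtual_address_bits(levels):
    """Virtual address width for a given number of page-table levels."""
    return levels * VPN_BITS_PER_LEVEL + PAGE_OFFSET_BITS


for name, levels in (("Sv39", 3), ("Sv48", 4), ("Sv57", 5)):
    print(f"{name}: {levels} levels -> {virtual_address_bits(levels)} bits")
```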

Work in progress includes vector ISA support in Linux for the new Vector 1.0 extension, as well as IPI and Sstc support.

Fedora has been working on RISC-V for a while now. In Debian, 95% of the packages are supported in the non-official architecture port. In Ubuntu, official images are provided for some systems. Both Yocto and Buildroot have good support for RISC-V, too.

SiFive shipped a few high-end boards to be able to run the latest chips, but those were expensive and hard to get.

The Kendryte K210 is a much smaller board with 8MB of RAM with support in Buildroot to build a small Linux system.

T-Head, from Alibaba, produces RISC-V hardware such as the C910, which has support for Android. They also have a smaller design, the C906, found on a very affordable official devboard (D1 Nezha) from Allwinner International. Some people have been working on mainline Linux support for the D1. Unfortunately, it has a non-standard design (a non-coherent interconnect), which needed special support upstream in the form of the Svpbmt patchset, finally merged in 5.19.

It’s possible to join RISC-V International free-of-cost as an individual to participate in the discussions. Drew also runs a bi-weekly virtual meetup for the community.

Powerful and programmable kernel debugging with drgn – Omar Sandoval

drgn is a “programmable debugger”: it exposes a REPL with specific helpers instead of a custom CLI, and can work on both kernel core dumps and a live Linux kernel.

drgn was created because Omar came up against some very tricky bugs, and wasn’t able to use existing tools to do what he needed to do. And it was designed from the start to be used as a library.

The next part was a demo of the debugger. It uses a Python shell by default. The program being debugged is accessed through the prog variable, and kernel objects are looked up by name using the dictionary syntax. Omar showed how to iterate over the children of a given running process, in a VM being debugged by drgn.

drgn provides many helpers for common manipulations: for example, to list elements in a list. This is all documented inline within Python with help(drgn.helpers.linux); the same content is also available in the official drgn documentation.
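
The following is a sketch of what iterating over a task's children can look like with drgn's helpers. It is not the code shown during the talk; `child_tasks` is a hypothetical name, while `find_task` and `list_for_each_entry` are helpers from drgn.helpers.linux. It has to run inside a drgn session (as root for a live kernel), where prog is already defined.

```python
def child_tasks(prog, pid):
    """Yield (pid, comm) for each child of the task with the given PID."""
    # Imported here so this file can be loaded on machines without drgn.
    from drgn.helpers.linux.list import list_for_each_entry
    from drgn.helpers.linux.pid import find_task

    task = find_task(prog, pid)
    if not task:  # find_task() returns a NULL pointer when no task matches
        raise LookupError(f"no task with PID {pid}")

    # Children are linked through task_struct.sibling, anchored at
    # the parent's task_struct.children list head.
    for child in list_for_each_entry(
        "struct task_struct", task.children.address_of_(), "sibling"
    ):
        yield child.pid.value_(), child.comm.string_().decode()
```

In the drgn shell, `for pid, comm in child_tasks(prog, 1): print(pid, comm)` would then print the direct children of PID 1.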

Case Study

A report indicated a container creation failure with ENOSPC. Using strace and retsnoop, it was found to come from a limit on the number of IPC namespaces. But there were only a few IPC namespaces alive on the machines.

After analysis of the code, it was found to be an error returned when a given atomic counter in a hashtable reached a maximum value. Using drgn, one can search this hashtable for the proper id, and get the value of the counter. The counter did reach the maximum value, while there were only a few of those namespaces being used.

The decrement path then had to be analyzed. It was called from a kernel workqueue, so tasks were enumerated with drgn to find out if any blocked task was in the system. The parameters of the workqueue callback function could then be printed. It was found that the free path (free_ipc) was using synchronize_rcu, making the close path slower, and was further delayed because it ran in a workqueue. A crash-looping container was creating IPC namespaces faster than the free mechanism could release them, reaching the upper limit of IPC namespaces.
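
The dynamics are easy to reproduce in a toy model. The sketch below is only an illustration with made-up rates and a made-up limit, not the kernel's accounting: namespaces are created in bursts and released immediately, but the counter only drops when a slow worker gets around to freeing them.

```python
def ticks_until_limit(create_per_tick, free_per_tick, limit, max_ticks=1_000_000):
    """Return the tick at which creation first fails, or None if it never does."""
    live = 0     # namespaces still in use
    pending = 0  # namespaces released but not yet freed by the worker
    for tick in range(1, max_ticks + 1):
        for _ in range(create_per_tick):
            if live + pending >= limit:
                return tick  # counter at the limit: creation fails
            live += 1
        # The crash-looping container exits at once, releasing its namespaces...
        pending += live
        live = 0
        # ...but the worker only frees a few of them per tick.
        pending -= min(pending, free_per_tick)
    return None


# Hypothetical rates: 10 creations and 4 frees per tick, limit of 100.
print(ticks_until_limit(create_per_tick=10, free_per_tick=4, limit=100))
```

Note that when creation fails in this model, almost no namespace is actually in use: the counter is dominated by entries waiting to be freed, which matches what was observed.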

All this can be found dynamically in drgn’s familiar programming environment. And the Python code doing this can be reused and shared.

Underneath, drgn uses a C library called libdrgn that does the heavy lifting and provides the core abstractions.

The limitations of drgn are that it’s racy for live targets, needs to be kept in sync with the kernel, and requires DWARF. The latter is being worked on by Stephen Brennan to use BTF, ORC and kallsyms. But BTF is missing local variable descriptions, which if they are added will make the file bigger: from ~4MB to ~6MB on Stephen’s machine.

drgn can go beyond debugging, Omar says, thanks to its modular design: as a learning tool, for automation, etc.