#+Title: What I should not forgot when designing a new userspace <-> kernel interface #+Author: Yann Droneaud #+Email: ydroneaud@opteya.com #+Date: 2014-09-24 #+Description: a talk for Kernel Recipes #+Language: en #+OPTIONS: toc:nil #+OPTIONS: H:1 #+OPTIONS: num:0 #+BEGIN_COMMENT Copyright (C) 2014 OPTEYA SAS, distributed under Creative Commons Attribution-ShareAlike 4.0 International http://creativecommons.org/licenses/by-sa/4.0/ #+END_COMMENT * Where do you want to go today ? I have this shiny new subsystem/driver/... I need a mean to exchange data between userspace and kernel, so that my kernel code can do some work. * What are you going to do ? I'm going to design a new - character device, - ioctl, - syscall. * Are you sure ? #+BEGIN_CENTER I'm "/Not Sure/" ! #+END_CENTER * You're probably not #+BEGIN_CENTER Indeed. #+END_CENTER * There's always alternative Am I really in desperate need for a new - character device, - ioctl, - syscall ? * You could try It would be so much fun ! * Show me your data structure #+BEGIN_SRC C struct foo { char a; unsigned long b; #ifdef CONFIG_EXTENDED void *c; #endif }; #+END_SRC C * Don't use conditionals in public datastructure #+BEGIN_SRC C struct foo { char a; unsigned long b; #ifdef CONFIG_EXTENDED void *c; #endif }; #+END_SRC C I should have noted that's never gonna work: userspace compiled without define vs kernel compiled with the define: #+BEGIN_CENTER ABI mismatch ! #+END_CENTER * You're using the wrong data types (1/4) #+BEGIN_SRC C struct foo { char a; unsigned long b; void *c; }; #+END_SRC Dependending on the ABI, =char= signedness would change, so I should use fixed signedness data type. * You're using the wrong data types (2/4) #+BEGIN_SRC C struct foo { u8 a; unsigned long b; void *c; }; #+END_SRC C Dependending on the ABI, =sizeof(long)= would change, so I should use fixed length data types. * You're using the wrong data types (3/4) #+BEGIN_SRC C struct foo { u8 a; u64 b; void *c; }; #+END_SRC C Depending on the ABI, =sizeof(void *)= would change, so I should use a fixed data type for pointer too. I should also note that such pointer doesn't have =\_\_user= annotation, which might hide some access violation warnings. (Hint: install =sparse=, and use =make C=2=) * You're using the wrong data types (4/4) #+BEGIN_SRC C struct foo { u8 a; u64 b; u64 c; }; #+END_SRC C What might happen if I don't rewrite my data structure this way ? * You don't want to maintain a "compat" layer, do you ? What's "compat" ? * It's not the layer you're looking for I should probably avoid introducing new interfaces which require a compat layer. * You're still using the wrong data types #+BEGIN_SRC C struct foo { u8 a; u64 b; u64 c; }; #+END_SRC C I should not forgot that kernel data types are not exposed asis in userspace. Kernel data types are prefixed with =\_\_=. * Your data structure is almost correct #+BEGIN_SRC C struct foo { __u8 a; __u64 b; __u64 c; }; #+END_SRC C * Let's talk about ABI I should note the following about ABIs for some common architectures. #+BEGIN_CENTER | | sizeof(u64) | alignof(u64) | |---------+-------------+--------------| | i386 | 8 | 4 | | x86\_64 | 8 | 8 | | arm | 8 | 8 | #+END_CENTER In particular, i386 is special in this regard. * Your're digging hole There's implicit padding in my data structure. #+BEGIN_SRC C struct foo { __u8 a; /* u8 padding[n]; with n in 3 to 7 */ __u64 b; __u64 c; }; #+END_SRC C * Still down the rabbit hole Is this better ? #+BEGIN_SRC C struct foo { __u64 b; __u64 c; __u8 a; /* u8 padding[n]; with n in 3 to 7 */ }; #+END_SRC C Padding can be introduced at the end of the structure too. * You should know about padding I've discovered many ways to create holes in data structure. Such holes are possible discrepancy between ABIs: a compat layer might be needed for 32bits emulation on 64bits kernel. It's also wasted space: my data structure is using more bytes than strictly required. * How to initialize what doesn't have a name Since this padding is implicit, I don't initialize it. As I'm allocating the data structure on stack, I'm going to leak information to userspace. I will probably receive a semi-automatic email from Dan Carpenter who maintains and run =smatch= on kernel code. * You have to fill the hole #+BEGIN_SRC C struct foo { __u8 a; __u8 padding[7]; __u64 b; __u64 c; }; #+END_SRC C When filling hole, I have to use the largest alignment requirement, so I should not use i386 constraints. * You have to fill some headers for userspace My data structure must be described in a header in =include/uapi/linux=. It should not be in =include/linux= nor in =drivers/...= or whatever. I could read [[https://lwn.net/Articles/494993/][The UAPI header file split]], by Michael Kerrisk\\ http://lwn.net/Articles/507794/ * Take care of ABI stability Once designed, once merged, once exposed to userspace, my ABI cannot be changed without breaking existing userspace program. So I should start thinking right now about extension. * What you can do to prevent such issues (1/2) I have to ensure that data structure has no implicit padding, and alignment/size are the same for multi-ABI architectures (i386/amd64, sparc32/64, ppc32/64, etc.). * What you can do to prevent such issues (2/2) When reading data from userspace: #+BEGIN_CENTER /Check every unused bits for being 0/ #+END_CENTER When writing data to userspace: #+BEGIN_CENTER /Set every unused bits to 0/ #+END_CENTER * Tools that can help you (gcc) ** GCC GCC has a nice warning option =-Wpadded= #+BEGIN_SRC sh $ gcc -Wpadded -c foo.c #+END_SRC #+BEGIN_EXAMPLE foo.c:6:15: attention : padding struct to align ‘b’ [-Wpadded] __u64 b; ^ #+END_EXAMPLE #+BEGIN_EXAMPLE fuu.c:9:1: warning: padding struct size to alignment boundary [-Wpadded] }; ^ #+END_EXAMPLE But I don't want to compile the whole kernel with this option. * Tools that can help you (pahole 1/5) ** Pahole I could use pahole to check the data structure layout. Pahole is a tool written by Arnaldo Carvalho de Melo (acme). http://git.kernel.org/cgit/devel/pahole/pahole.git I can use it to look after padding, and I could use it to compare the layout of the structure for i386 and x86\_64. [[http://lwn.net/Articles/206805/][pahole and other DWARF2 utilities]]\\ http://lwn.net/Articles/206805/ Part of =dwarves= packages on some distribution. * You can check earlier structure with pahole (2/5) #+BEGIN_SRC sh $ gcc -c -m32 foo.c $ pahole foo.o #+END_SRC #+BEGIN_SRC C struct foo { __u8 a; /* 0 1 */ /* XXX 3 bytes hole, try to pack */ __u64 b; /* 4 8 */ __u64 c; /* 12 8 */ /* size: 20, cachelines: 1, members: 3 */ /* sum members: 17, holes: 1, sum holes: 3 */ /* last cacheline: 20 bytes */ }; #+END_SRC * You can check earlier reordered structure with pahole (3/5) #+BEGIN_SRC sh $ gcc -c -m32 foo.c $ pahole foo.o #+END_SRC #+BEGIN_SRC C struct foo { __u64 b; /* 0 8 */ __u64 c; /* 8 8 */ __u8 a; /* 16 1 */ /* size: 20, cachelines: 1, members: 3 */ /* padding: 3 */ /* last cacheline: 20 bytes */ }; #+END_SRC * You can check ABI differences with pahole (4/5) #+BEGIN_SRC sh $ gcc -c -m32 foo.c -o foo-32.o $ pahole foo-32.o > foo-32.o.c $ gcc -c -m64 foo.c -o foo-64.o $ pahole foo-64.o > foo-64.o.c #+END_SRC * and diff (5/5) #+BEGIN_SRC sh $ diff -u foo-32.o.c foo-64.o.c #+END_SRC #+BEGIN_EXAMPLE --- foo-32.o.c 2014-09-25 14:28:03.988105812 +0200 +++ foo-64.o.c 2014-09-25 14:28:11.591073455 +0200 @@ -3,7 +3,7 @@ __u64 c; /* 8 8 */ __u8 a; /* 16 1 */ -/* size: 20, cachelines: 1, members: 3 */ -/* padding: 3 */ -/* last cacheline: 20 bytes */ +/* size: 24, cachelines: 1, members: 3 */ +/* padding: 7 */ +/* last cacheline: 24 bytes */ }; #+END_EXAMPLE * Tools that can help you (trinity) ** Trinity I could add my new kernel interface to Trinity. [[http://codemonkey.org.uk/projects/trinity/][Trinity is a Linux system call fuzzer]] written by Dave Jones. http://codemonkey.org.uk/projects/trinity/\\ https://github.com/kernelslacker/trinity [[http://lwn.net/Articles/536173/][LCA: The Trinity fuzz tester]], by Michael Kerrisk\\ http://lwn.net/Articles/536173/ * You could find some help As usual, I should submit patches earlier in my development cycle to [[mailto:linux-kernel@vger.kernel.org][linux-kernel@vger.kernel.org]] in order to get some reviews. But another mailing list may be of interest: [[https://www.kernel.org/doc/man-pages/linux-api-ml.html][The linux-api mailing list]]\\ https://www.kernel.org/doc/man-pages/linux-api-ml.html #+BEGIN_CENTER [[mailto:linux-api@vger.kernel.org][linux-api@vger.kernel.org]] #+END_CENTER * Links [[http://www.ukuug.org/events/linux2007/2007/papers/Bergmann.pdf][How to not invent kernel interfaces]], by Arnd Bergmann\\ http://www.ukuug.org/events/linux2007/2007/papers/Bergmann.pdf [[http://lwn.net/Articles/588444/][Botching up ioctls]], by Daniel Vetter\\ http://blog.ffwll.ch/2013/11/botching-up-ioctls.html [[http://lwn.net/Articles/588444/][Proper handling of unknown flags in system calls]], by Michael Kerrisk\\ http://lwn.net/Articles/588444/ [[http://lwn.net/Articles/585415/][Flags as a system call API design pattern]], by Michael Kerrisk\\ http://lwn.net/Articles/585415/ * More Links [[https://lwn.net/Articles/494993/][Fixing the unfixable autofs ABI]], by Jonathan Corbet\\ https://lwn.net/Articles/494993/ [[http://lwn.net/Articles/311630/][System calls and 64-bit architectures]], by Jake Edge\\ http://lwn.net/Articles/311630/ [[http://lwn.net/Articles/567894/][A perf ABI fix]], by Jonathan Corbet\\ http://lwn.net/Articles/567894/ * Questions ? * One last thing #+BEGIN_CENTER Please use =O\_CLOEXEC= by default ! #+END_CENTER * Final words #+BEGIN_CENTER Thank You ! #+END_CENTER