Linux Compatibility on BSD for the PPC Platform: Part 5
by Emmanuel Dreyfus
Debugging the debugger
At various stages of Linux emulation debugging, the lack of a strong debugging tool such as
gdb is a big issue. Getting native threads working with the Java Virtual Machine (JVM) is so tricky that it really requires a working
gdb to understand what is going on. NetBSD's
gdb is able to work on Linux processes, but it is not able to work with dynamic Linux programs, because it knows nothing about Linux's
ld.so. Thus having Linux's
gdb working is highly desirable. In this article, we will take a look at the Linux emulation fixes needed to have a fully functional Linux gdb.
Spurious terminal hangup
The first issue we had with
gdb was rather rude:
gdb loaded successfully, displayed the credit lines, the prompt, and then it exited, taking the whole session down at the same time. When running Linux
gdb in a telnet session, I was simply logged off. By tracing
gdb, we were able to discover that the reason was a hangup signal (
SIGHUP) issued to
gdb, and probably to all the processes in the process group operating on the terminal, because all were killed.
The question was: Where was this spurious signal coming from? The kernel trace showed no
kill() system call from
gdb, so gdb was not requesting the whole session to die. The decision to send the signal therefore had to be made in the kernel.
Getting a spurious
SIGHUP is quite unusual. Most of the time, runaway processes get unexpected
SIGBUS signals because they attempted to access invalid memory locations in their address spaces.
SIGHUP is rare enough that we were able to locate where it was coming from in the kernel by searching for
SIGHUP in the kernel sources. There are basically four places in the NetBSD kernel where a hangup signal is sent to a process.
By adding a few
printf() calls before each of these locations in the kernel, it was possible to discover that the problem was coming from
ttioctl().
As its name suggests,
ttioctl() is an
ioctl method. Looking at the kernel trace just before the
SIGHUP is caught by
gdb gives a better idea of where the problem was coming from:
1594 gdb CALL ioctl(0,TIOCGWINSZ,0x7fffe138)
1594 gdb RET ioctl 0
1594 gdb CALL ioctl(0,TIOCSWINSZ,0x7fffe138)
1594 gdb RET ioctl 0
1594 gdb CALL ioctl(0,TIOCGETA,0x7fffe028)
1594 gdb RET ioctl 0
1594 gdb CALL ioctl(0,TIOCSETAW,0x7fffe0e8)
1594 gdb RET ioctl 0
1594 gdb CALL rt_sigprocmask(0x2,0x101c29bc,0,0x8)
1594 gdb RET rt_sigprocmask 0
1594 gdb PSIG SIGHUP caught handler=0x1003c9c4 mask=(20) code=0x0 SIGHUP SIG_DFL
Knowing that the problem is related to
ioctl() calls, we wanted to take a closer look at the four
ioctl() calls that occur before the hangup signal. We started with the last one, the
ioctl() TIOCSETAW command, which happened to be the
ioctl() call leading to the spurious hangup signal generation. But first, let us introduce this command.
As we explained in part three of this series,
ioctl() is used to perform various nonstandard operations on files, that is, operations other than read, write, and so on. The Linux
TIOCSETAW ioctl() command is used to set terminal properties, but only after the current I/O operation has finished. Using this system call,
gdb just tries to adjust a terminal setting.
Let us now see how this
ioctl() call happens to invoke ttioctl(). The Linux
ioctl() system call is implemented as
sys/compat/linux/common/linux_termios.c:linux_sys_ioctl() for Linux processes. Like most Linux wrapper functions, the job of
linux_sys_ioctl() is to make appropriate translations and then call the native
ioctl implementation. The native
ioctl() implementation depends on the file on which the
ioctl() system call was made.
linux_sys_ioctl() loads the appropriate function address in the
*bsdioctl function pointer, like this:
bsdioctl = fp->f_ops->fo_ioctl;
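The dispatch through a per-file function pointer can be sketched outside the kernel as follows. This is a simplified model, not the NetBSD code: the structures are reduced to the one field used in the line above, and the handler names toy_tty_ioctl() and toy_vnode_ioctl(), as well as dispatch_ioctl(), are hypothetical.

```c
#include <stddef.h>

struct file;

/* Simplified signature for a per-file ioctl method (an assumption). */
typedef int (*ioctl_fn)(struct file *fp, unsigned long com, void *data);

/* Reduced stand-ins for the kernel structures: only fo_ioctl is modeled. */
struct fileops {
	ioctl_fn fo_ioctl;
};

struct file {
	const struct fileops *f_ops;
};

/* Hypothetical handlers; the return value only identifies which one ran. */
static int toy_tty_ioctl(struct file *fp, unsigned long com, void *data)
{
	(void)fp; (void)com; (void)data;
	return 1;
}

static int toy_vnode_ioctl(struct file *fp, unsigned long com, void *data)
{
	(void)fp; (void)com; (void)data;
	return 2;
}

static const struct fileops toy_tty_ops = { toy_tty_ioctl };
static const struct fileops toy_vnode_ops = { toy_vnode_ioctl };

/* Look up the ioctl method attached to the file, then call through it. */
int dispatch_ioctl(struct file *fp, unsigned long com, void *data)
{
	ioctl_fn bsdioctl = fp->f_ops->fo_ioctl;

	return (*bsdioctl)(fp, com, data);
}
```

The caller never needs to know what kind of file it holds: a file whose f_ops points to toy_tty_ops ends up in the terminal handler, and any other file ends up in its own handler.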
linux_sys_ioctl() tests the command argument (com) of the
ioctl() system call, doing different Linux-to-BSD translations depending on the command. The Linux
ioctl() command we are looking at,
TIOCSETAW, is implemented as two NetBSD ioctl() commands, one of which is the native
TIOCSETAW. Both of these commands are executed using bsdioctl. The
ioctl() operation is done on file descriptor zero (first argument of the
ioctl() system call), which is the standard input. If the standard input is a terminal (as opposed to a regular file or a pipe), its
ioctl() method is the
ioctl() method for terminals, which happens to be
ttioctl(). In the
ttioctl() implementation, we can see that
SIGHUP is issued when executing the
TIOCSETAW command if the terminal output speed is null (zero).
Our problem here is that Linux sometimes has a null output speed for a terminal because it does not need to have a value for a virtual terminal, whereas NetBSD uses this value to detect a terminal hangup.
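The convention that a zero output speed means "hang up" is also visible in the POSIX terminal interface, where the B0 speed is defined as a request to terminate the connection. The sketch below only inspects a termios structure in memory and never touches a real terminal; the helper name requests_hangup() is hypothetical.

```c
#include <termios.h>

/*
 * Hypothetical helper: return 1 if applying *t to a terminal would ask
 * the driver to hang up, that is, if its output speed is B0.
 */
int requests_hangup(const struct termios *t)
{
	return cfgetospeed(t) == B0;
}
```

A structure whose output speed was set with cfsetospeed(&t, B9600) is reported as harmless, whereas one set with cfsetospeed(&t, B0) is reported as a hangup request.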
The fix was to fool the NetBSD kernel into thinking that the terminal output speed was not null, even though the Linux process had set it to zero. This was achieved by modifying the termios translation functions, such as
bsd_termios_to_linux_termios(), from
sys/compat/linux/common/linux_termios.c, whose job is to translate between Linux and NetBSD
termios structures. The fix is simple: When a Linux process stores a null value in the output speed field
c_ospeed, we set the field to
-1 so that the NetBSD kernel will not hang up the terminal:
/*
 * A null c_ospeed causes NetBSD to hangup the terminal.
 * Linux does not do this, and it sets c_ospeed to zero
 * sometimes. If it is null, we store -1 in the kernel
 */
if (bts->c_ospeed == 0)
	bts->c_ospeed = -1;
And when the Linux process reads a
struct termios from the kernel, if
c_ospeed is -1, then we translate it back to 0. The Linux process thus has a consistent value for c_ospeed:
/*
 * A null c_ospeed causes NetBSD to hangup the terminal.
 * Linux does not do this, and it sets c_ospeed to zero
 * sometimes. If it is null, we store -1 in the kernel
 */
if (bts->c_ospeed == -1)
	bts->c_ospeed = 0;
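Taken together, the two translations form a round trip that can be checked in isolation. The following is a self-contained model rather than the kernel code: struct toy_termios keeps only the c_ospeed field, and the names toy_linux_to_bsd(), toy_bsd_to_linux(), and OSPEED_HIDDEN_ZERO are hypothetical.

```c
/* Hypothetical name for the sentinel stored in place of a zero speed. */
#define OSPEED_HIDDEN_ZERO (-1)

/* Reduced stand-in for a termios structure: only the output speed. */
struct toy_termios {
	int c_ospeed;
};

/* Linux to kernel direction: hide a zero output speed behind the sentinel. */
void toy_linux_to_bsd(const struct toy_termios *lts, struct toy_termios *bts)
{
	bts->c_ospeed = lts->c_ospeed;
	if (bts->c_ospeed == 0)
		bts->c_ospeed = OSPEED_HIDDEN_ZERO;
}

/* Kernel to Linux direction: turn the sentinel back into zero. */
void toy_bsd_to_linux(const struct toy_termios *bts, struct toy_termios *lts)
{
	lts->c_ospeed = bts->c_ospeed;
	if (lts->c_ospeed == OSPEED_HIDDEN_ZERO)
		lts->c_ospeed = 0;
}
```

In this model a zero written by the Linux side is stored as -1 and read back as zero, while any other speed passes through both functions unchanged.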
The value -1 is arbitrary; it was chosen to be negative so that it cannot interfere with any valid value for
c_ospeed. With this fix,
gdb was able to start up without immediately hanging up the whole session. The next step was to actually use it.