Vanishing Features of the 2.6 Kernel
by Jerry Cooperstein
Many developers are eagerly awaiting the 2.6 Linux kernel. The feature freeze has passed, with a code freeze planned for January and final release slated for the second quarter of 2003. There is considerable excitement about anticipated enhancements, especially regarding scalability and performance.
However, some developers may first notice what doesn't work anymore. Some techniques and APIs have been removed, and existing device drivers and modular plugins may no longer work. At the same time, it will take some time to take advantage of new features and to find replacements for old ones.
Some deprecated techniques, such as task queues, have finally been eliminated. Other facilities, including in-kernel Web acceleration, have been supplanted by newer advances. Other changes, notably banishing the system call table from the list of exported symbols available to modules, have flowed more from philosophical and licensing issues than from technical considerations.
Export of the System Call Table
The Linux kernel has a monolithic architecture; it is one big program. All
parts of the kernel are visible to each other unless their scope has been
explicitly limited. Arguments are passed on the stack, as in any other
C program. At the same time, Linux makes extensive use of modules: facilities that may be loaded and unloaded dynamically. (These are often, but not always, device drivers.) Modules can only see explicitly exported symbols (functions, variables, etc.). Unless the kernel or a previously loaded module includes EXPORT_SYMBOL(foobar);, the module cannot refer to foobar.
Extensive modularization does not render the kernel any less monolithic. The critical difference between monolithic and microkernels stems from how components communicate with each other. As long as the Linux kernel prefers function calls to message passing, its basic structure will remain monolithic.
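The export rule described earlier (a module sees only what has been passed to EXPORT_SYMBOL) can be pictured with a small user-space model. This is only a sketch, not kernel code: the export_table array, the resolve_symbol() helper, and the hidden_counter variable are hypothetical names invented for illustration, and the real module loader is considerably more involved.

```c
#include <stddef.h>
#include <string.h>

/* Toy user-space model of an export table; NOT real kernel code. */
struct exported_symbol {
    const char *name;   /* name a module would ask for */
    void       *addr;   /* address the loader would hand back */
};

static int foobar = 42;         /* exported below */
static int hidden_counter = 7;  /* never exported (hypothetical) */

/* Core code may use anything directly, exported or not. */
int bump_hidden(void)
{
    return ++hidden_counter;
}

/* Stand-in for the effect of EXPORT_SYMBOL(foobar); */
static const struct exported_symbol export_table[] = {
    { "foobar", &foobar },
};

/* Hypothetical loader helper: address of the symbol, or NULL if
 * nothing by that name has been exported. */
void *resolve_symbol(const char *name)
{
    size_t n = sizeof(export_table) / sizeof(export_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(export_table[i].name, name) == 0)
            return export_table[i].addr;
    }
    return NULL;
}
```

In this model a lookup of "foobar" succeeds, while a lookup of "hidden_counter" returns NULL even though the variable exists, which mirrors how a module fails to load when it refers to a symbol nobody exported.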
The system call table is a vector containing the addresses of the functions executed whenever a system call is made from user space. When a system call is invoked, the kernel receives the number of the call, the number of arguments, and the arguments themselves. It uses the call number as an offset into the table; the arguments are placed in registers rather than passed on the stack. Then it jumps to the appropriate address to execute the system call.
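The table lookup just described can be sketched in ordinary user-space C. This is a simplified model under stated assumptions, not the kernel's entry code: the three toy handlers, the toy_call_table array, and toy_dispatch() are all hypothetical, and plain function arguments stand in for the registers.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_NR_CALLS 3

/* Every handler takes the same fixed set of "register" arguments. */
typedef long (*syscall_fn)(long a0, long a1, long a2);

static long toy_getvalue(long a0, long a1, long a2)
{
    (void)a0; (void)a1; (void)a2;   /* this call takes no arguments */
    return 100;
}

static long toy_add(long a0, long a1, long a2)
{
    (void)a2;
    return a0 + a1;
}

static long toy_max3(long a0, long a1, long a2)
{
    long m = a0 > a1 ? a0 : a1;
    return m > a2 ? m : a2;
}

/* The "system call table": a vector of handler addresses. */
static syscall_fn toy_call_table[TOY_NR_CALLS] = {
    toy_getvalue,   /* call number 0 */
    toy_add,        /* call number 1 */
    toy_max3,       /* call number 2 */
};

/* Use the call number as an index into the table and jump there. */
long toy_dispatch(unsigned long nr, long a0, long a1, long a2)
{
    if (nr >= TOY_NR_CALLS || toy_call_table[nr] == NULL)
        return -ENOSYS;             /* no such call */
    return toy_call_table[nr](a0, a1, a2);
}
```

Because the dispatcher only ever consults the table, whoever can write to the table controls what each call number does.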
Exporting the system call table allows modules to substitute system calls
with replacements of their own devising. To replace the basic kernel
read() system call requires a simple code fragment:
    extern void *sys_call_table[];

    read_save = sys_call_table[__NR_read];  /* remember the original handler */
    sys_call_table[__NR_read] = read_sub;   /* install the replacement */
Here read_sub() has been defined somewhere in the module, and the pointer to the original system call has been saved so that it can be restored upon module unloading:
    sys_call_table[__NR_read] = read_save;
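The whole save, replace, and restore cycle can be tried safely in user space. The following is a minimal sketch that assumes a one-entry toy table. The names toy_read(), toy_syscall_read(), hook_install(), hook_remove(), and hook_count() are hypothetical; only read_save and read_sub echo the fragment above. It also ignores the concurrency problems a real kernel would face.

```c
#include <stddef.h>

typedef long (*syscall_fn)(long fd, void *buf, unsigned long count);

enum { TOY_NR_READ = 0, TOY_NR_CALLS = 1 };

/* Stand-in for the original read(): pretends every byte was read. */
static long toy_read(long fd, void *buf, unsigned long count)
{
    (void)fd; (void)buf;
    return (long)count;
}

static syscall_fn toy_call_table[TOY_NR_CALLS] = { toy_read };

static syscall_fn read_save;        /* original handler, kept for restore */
static unsigned long intercepted;   /* calls seen by the replacement */

/* Replacement: count the call, then defer to the saved original. */
static long read_sub(long fd, void *buf, unsigned long count)
{
    intercepted++;
    return read_save(fd, buf, count);
}

/* What the module would do when it is loaded. */
void hook_install(void)
{
    read_save = toy_call_table[TOY_NR_READ];
    toy_call_table[TOY_NR_READ] = read_sub;
}

/* What the module would do when it is unloaded. */
void hook_remove(void)
{
    toy_call_table[TOY_NR_READ] = read_save;
}

/* A caller only ever goes through the table. */
long toy_syscall_read(long fd, void *buf, unsigned long count)
{
    return toy_call_table[TOY_NR_READ](fd, buf, count);
}

unsigned long hook_count(void)
{
    return intercepted;
}
```

Calls made between hook_install() and hook_remove() pass through read_sub(); calls made before or after go straight to the original. Nothing in this sketch protects a call that is already in progress while the table entry changes.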
So what is wrong with this technique?
On the practical side, it is easy to incur race conditions, especially on multi-processor systems where the replacement happens while an application is using the system call. Various locking techniques can offer some protection, but the details are non-trivial. However, the abolition of this method is not primarily due to practical difficulties.
Some system calls penetrate deep into the kernel's heart. Binary-only modules, where the source is not available under a GPL-compatible license, have enjoyed the use of this technique. Exported symbols have been visible to all modules.
The rules governing binary modules and GPL violations have always been fuzzy. Some argue that it is permissible for any such module to restrict itself to exported symbols. Others maintain it depends on whether or not the module fiddles with core kernel facilities. The line between central and peripheral matters has always been very gray.
To sharpen this delineation, the 2.4.10 version of
modutils, which handles
loading and unloading of modules, introduced module licenses. In addition, the
EXPORT_SYMBOL_GPL macro, introduced in the 2.4.11 kernel, created
two classes of exported symbols. Only modules with an acceptable open-source
license can have access to symbols exported under the GPL. All previously
exported symbols were grandfathered in.
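The two classes of exported symbols can be modeled in user space as a flag on each export plus a license check at lookup time. This is a rough sketch under stated assumptions: the toy_export structure, both symbol names, and resolve_for_module() are hypothetical, and the list of accepted license strings is deliberately shortened rather than being the kernel's actual list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_export {
    const char *name;
    void       *addr;
    bool        gpl_only;   /* true models EXPORT_SYMBOL_GPL */
};

static int generic_helper;  /* hypothetical ordinary export */
static int core_internal;   /* hypothetical GPL-only export */

static const struct toy_export exports[] = {
    { "generic_helper", &generic_helper, false },
    { "core_internal",  &core_internal,  true  },
};

/* Assumption: a shortened list of acceptable license strings. */
static bool license_is_gpl_compatible(const char *license)
{
    return license != NULL &&
           (strcmp(license, "GPL") == 0 ||
            strcmp(license, "Dual BSD/GPL") == 0);
}

/* Resolve a symbol on behalf of a module with the given license.
 * Returns NULL if the symbol is unknown or off limits. */
void *resolve_for_module(const char *name, const char *module_license)
{
    size_t n = sizeof(exports) / sizeof(exports[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(exports[i].name, name) != 0)
            continue;
        if (exports[i].gpl_only &&
            !license_is_gpl_compatible(module_license))
            return NULL;    /* GPL-only symbol, incompatible license */
        return exports[i].addr;
    }
    return NULL;            /* not exported at all */
}
```

In this model a module declaring a proprietary license still resolves generic_helper but is refused core_internal, while a module declaring "GPL" resolves both.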
This led to some loud arguments. Perhaps if the macro had been called
EXPORT_SYMBOL_INTERNAL, it would have shown an intent of
differentiating between modules implementing central and peripheral kernel
facilities, rather than making a choice based on the kernel programmer's
choice of license.
Choosing to use
EXPORT_SYMBOL_GPL(sys_call_table) would have
satisfied many objections. Instead, the more draconian choice was made:
embargoing all export of the system call table. Red Hat did this in the patched
2.4.18 kernel shipped with Red Hat Linux 8.0, and Linus Torvalds did the same
in the 2.5.41 development kernel. As a result, a module can no longer replace
a system call through the simple code above. Its replacement adds support for
registering new system calls dynamically. This feature may continue to grow.
Most observers foresee a tightening of the limits on binary modules. This may very well break some rather expensive commercial Linux products, but that doesn't seem to bother most kernel developers. Reminding the purveyors of binary modules that they continue to operate at the pleasure of the Linux kernel developers and their open-source licenses is seen to be a necessary (even enjoyable) task. It has probably always been true that the only way to protect investment in Linux deployment of drivers and other kernel facilities (not applications) is to go open source, even if that is difficult for commercial enterprises to absorb. Recent developments seem to re-emphasize this.