diff --git a/en_US.ISO8859-1/articles/5-roadmap/article.sgml b/en_US.ISO8859-1/articles/5-roadmap/article.sgml index bb8905a442..d14d10847e 100644 --- a/en_US.ISO8859-1/articles/5-roadmap/article.sgml +++ b/en_US.ISO8859-1/articles/5-roadmap/article.sgml @@ -1,670 +1,670 @@ %man; %freebsd; %authors; %teams; %mailing-lists; RELENG_3"> RELENG_4"> RELENG_5"> RELENG_5_1"> RELENG_5_2"> HEAD"> ]>
The Roadmap for 5-STABLE The &os; Release Engineering Team $FreeBSD$ 2003 The &os; Release Engineering Team Introduction and Background After nearly three years of work, &os; 5.0 was released in January of 2003. Features like the GEOM block layer, Mandatory Access Controls, ACPI, sparc64 and ia64 platform support, and UFS snapshots, background filesystem checks, and 64-bit inode sizes make it an exciting operating system for both desktop and production users. However, some important features are not complete. The foundations for fine-grained locking and preemption in the kernel exist, but much more work is left to be done. Work on Kernel Schedulable Entities (KSE), similar to Scheduler Activations, has been ongoing but needs a push to realize its benefit. Performance compared to &os; 4.X has declined and must be restored and surpassed. This is somewhat similar to the situation that &os; faced in the 3.X series. Work on 3-CURRENT trudged along seemingly forever, and finally a cry was made to just ship it and clean up later. This decision resulted in the 3.0 and 3.1 releases being very unsatisfying for most, and it wasn't until 3.2 that the series was considered stable. To make matters worse, the &t.releng.3; branch was created along with the 3.0 release, and the &t.releng.head; branch was allowed to advance immediately towards 4-CURRENT. This resulted in a quick divergence between &t.releng.head; and &t.releng.3;, making maintenance of the &t.releng.3; branch very difficult. &os; 2.2.8 was left for quite a while as the last production-quality version of &os;. Our intent is to avoid repeating that scenario with &os; 5.x. Delaying the &t.releng.5; branch until it is stable and production quality will ensure that it stays maintainable and provides a compelling reason to upgrade from 4.X. To do this, we must identify the current areas of weakness and set clear goals for resolving them. 
This document contains what we as the release engineering team feel are the milestones and issues that must be resolved for the &t.releng.5; branch. It does not dictate every aspect of &os; development, and we welcome further input. Nothing that follows is meant to be a slight against any person or group, or to trivialize any work that has been done. There are some significant issues, though, that need decisive and unbiased action. Major issues The state of SMPng and kernel lockdown is the biggest concern for 5.X. To date, few major systems have come out from under the kernel-wide mutex known as Giant. The SMP status page at http://www.FreeBSD.org/smp provides a comprehensive breakdown of the overall SMPng status. Status specific to SMPng progress in device drivers can be found at http://www.FreeBSD.org/projects/busdma. In summary: VM: the kmem_malloc(M_NOWAIT) path no longer needs Giant held. The kmem_malloc(M_WAITOK) path is in progress and is expected to be finished in the coming weeks. Other facets of the VM system, like the vfs interface, buffer/cache, etc, are largely untouched. GEOM: The GEOM block layer was designed to run free of Giant, but at this time no block drivers can run without Giant. Additionally, it has the potential to suffer performance loss due to its upcall/downcall data paths happening in kernel threads. Lightweight context switches might help this. Network: Locking of the TCP and UDP portions of the stack is complete. Work is in progress to lock up the IP stack, including the routing tree, ARP code, raw IP, and ifaddr and inet data structures. IPv6 has been lightly touched during the inp locking but is hindered by the KAME code being significantly out of date. Work has not started on any of the other protocols such as AppleTalk, XNS, or IPX. Locking of the socket layer is in progress but has been largely untested. None of the hardware drivers or Ethernet layers have been locked. VFS: Initial pre-cleanup started. 
buffer/cache: Initial work complete. Proc: Work on locking the proc structure was ongoing for a while but seems to have stalled. CAM: No significant work has occurred on the CAM SCSI layer. Newbus: some work has started on locking down the device_t structure. Pipes: complete with the exception of VM-related optimizations. File descriptors: complete. Process accounting: jails, credentials, MAC labels, and scheduler are out from under Giant. MAC Framework: complete Timekeeping: complete kernel encryption: crypto drivers and core &man.crypto.4; framework are Giant-free. KAME IPsec and FAST IPSec have not been locked. Sound subsystem: complete kernel preemption: preemption for interrupt threads is enabled. However, contention due to Giant covering much of the kernel and most of the device driver interrupt routines causes excessive context switches and might actually be hurting performance. Work is underway to explore ways to make preemption conditional. Another issue with SMPng is interrupt latency. The overhead of doing a complete context switch to a kernel interrupt thread is high and shows noticeable latency. Work is ongoing to implement lazy context switching on all platforms. Fine grained locking of drivers will also help this, as will converting drivers to be as efficient as possible in their interrupt routines. Next, the state of KSE must be resolved for &t.releng.5;. Work on it has slowed noticeably in the past 6 months but appears to be picking up again. There are a number of issues that must be addressed: The userland threading library, currently called libkse, is immature and has not been used for any significant threaded application. KSE has the potential to uncover latent race conditions and create new ones. An audit needs to be performed to ensure that no obvious problems exist. According to the release schedule below, KSE kernel and userland components must be functionally complete by June 2003 in order to be included in the &t.releng.5; branch. 
For security and stability reasons, if KSE cannot be finished in time then, by default, all KSE-specific syscalls should be modified to return ENOSYS and all other KSE-specific interfaces disabled. Deprecating KSE from &t.releng.5; but keeping it in the &t.releng.head; branch will pose problems in porting bugfixes and features between the two branches, so every effort should be made to finish it on time. Goals for 5-STABLE The goals for the &t.releng.5; branch point are: All subsystems and interfaces must be mature enough to be maintainable for improvements and bug fixes. Equal or better stability from &os; 4.8. No functional regressions from 4.8. It is important to make sure that users do not avoid upgrading to 5.x because of lost functionality. Performance on par with &os; 4.8 for most common operations. Both UP and SMP configurations should be evaluated. SMP has the potential to perform much better than 4.X, though for the purposes of creating the &t.releng.5; branch, comparable performance between the two should be acceptable. It is unrealistic to expect that the SMPng project will be fully complete by &t.releng.5;, or that performance will be significantly better than 4.X. However, focusing on a subset of the outstanding tasks will give enough benefit for the branch to be viable and maintainable. To break it down: ABI/API/Infrastructure stability - Enough infrastructure must be in place and stable to allow fixes from &t.releng.head; to easily and safely be merged into &t.releng.5;. Also, we must draw a line as to what subsystems are to be locked down when we go into 5-STABLE. SMPng VM: Most codepaths, other than the ones that interact with VFS, should be Giant-free for &t.releng.5;. Network: Taking the network stack out from under Giant poses the risk of uncovering latent bugs and races. Locking it down but not removing Giant imposes further performance penalties. 
A decision on which parts of the network stack should be locked and taken out from under Giant for &t.releng.5; should be made no later than March 15. Work on the IP, TCP, UDP, raw IP, routing sockets, and Unix domain sockets stands a good chance of being complete in time for &t.releng.5;. If the decision is made to not lift Giant from the stack, then the locks in these layers could be optimized out with a kernel config option. Having a Giant-free path from the hardware layer to the IP queues should be investigated as it could allow significant performance gains in the network benchmarks. If this can be achieved then the hardware interface layer needs to allow for drivers to incrementally become free of Giant. Locking down at least two Ethernet drivers would be highly desirable. If the semantics are too complex to have the stack free of Giant but not the hardware drivers, investigation should be done into making it configurable. Lesser-used network stacks like netatalk, netipx, etc, should not break while this work is going on. However, locking them is not a high priority. Special kernel config options might be needed in order for these layers to operate with the rest of the stack being locked and Giant free. GEOM: At least 2 block drivers should be locked in order to demonstrate that others can also be locked without changing the interface to GEOM. The ATA driver is a good candidate for this, though caution should be taken as it is also extremely high-profile and any problems with it will affect nearly all users of &os;. Lazy context switching: sparc64 is the only platform that performs lazy context switching when entering the kernel. The performance gains promised by this are significant enough to - require that it be implemented for all other Tier 1 + require that it be implemented for all other Tier-1 platforms. KSE: The kernel side of KSE must be functionally complete and have undergone a security audit. 
libkse must be complete enough to demonstrate a real-world application running correctly on it using the standard POSIX Threads API. Examples would be apache 2.0, Java, and/or mozilla. A functional regression test suite is also a requirement for &t.releng.5; and should test signal delivery, scheduling, performance, and process security/credentials for both KSE and non-KSE processes. KSE kernel and userland components must also reach the same level of functionality for all Tier-1 platforms in both UP and SMP configurations. The definition of Tier-1 platforms can be found in http://www.FreeBSD.org/doc/en_US.ISO8859-1/articles/committers-guide/archs.html. busdma interface and drivers: architectures like PAE/i386 and sparc64 which don't have a direct mapping between host memory address space and expansion bus address space require the elimination of vtophys() and friends. The busdma interface was created to handle exactly this problem, but many drivers do not use it yet. The busdma project at http://www.FreeBSD.org/projects/busdma tracks the progress of this and should be used to determine which drivers must be converted for &t.releng.5; and which can be left behind. Also, there has been talk by several developers and the original author to give the busdma interface a minor overhaul. If this is to happen, it needs to happen before &t.releng.5;. Otherwise, differences between the old and new API will make driver maintenance difficult. PCI resource allocation: PC2003 compliance requires that x86 systems no longer configure PCI devices from the system BIOS, leaving this task solely to the OS. &os; must gain the ability to manage and allocate PCI memory resources on its own. Implementing this should take into account cardbus, PCI-HotPlug, and laptop dockstation requirements. This feature will become increasingly critical through the lifetime of &t.releng.5;, and therefore is a requirement for the &t.releng.5; branch. 
Performance: most performance gains hinge on the progress of SMPng. Areas that should be concentrated on are: Storage I/O: I/O performance suffers from two problems: too many expensive context switches, and too much work being done in interrupt threads. Specifically, it takes 3 context switches for most drivers to get from the hardware completion interrupt to unblocking the user process: one for the interrupt thread, one for the GEOM g_up thread, and one to get back to the user thread. Drivers that attempt to be efficient and quick in their interrupt handlers (as all should be) usually also schedule a taskqueue, which adds a context switch in between the interrupt thread and the g_up thread and brings the total up to 4. Two things need to be done to attack this: Make all drivers defer most of their processing out of their interrupt thread. Significant performance gains have been shown recently in the &man.aac.4; driver by making its interrupt handler INTR_MPSAFE and moving all processing to a taskqueue. Investigate eliminating the taskqueue context switch by adding a callback to the g_up thread that allows a driver to do its interrupt processing there instead of in the taskqueue. Network: Network drivers suffer from the interrupt latency previously mentioned as well as from the network stack being partially locked down but not free from Giant. Possible strategies for addressing this are described in the previous section. Other locking - XXX? Benchmarks and performance testing: Having a source of reliable and useful benchmarks is essential to identifying performance problems and guarding against performance regressions. A performance team that is made up of people and resources for formulating, developing, and executing benchmark tests should be put into place soon. Comparisons should be made against both &os; 4.X and Linux 2.4.x. 
Tests to consider are: the classic worldstone webstone: www/webstone Fstress: http://www.cs.duke.edu/ari/fstress ApacheBench: www/p5-ApacheBench netperf: benchmarks/netperf Web Polygraph: http://www.web-polygraph.org Note: does not compile with gcc 3.x yet. Features: ACPI: Intel's ACPI power management and device configuration subsystem has become an integral part of &os;'s x86 and ia64 device configuration model. However, many bugs exist in Intel's vendor code, our OS-specific code, and motherboard BIOSes, causing many ACPI-enabled systems to fail to boot, misdetect drivers, and/or have many other problems. Fixing these problems seems to be an uphill battle and often gives a poor first impression of &os; 5.0. Most x86 systems can function with ACPI disabled, and logic should be added to the bootloader and sysinstall to allow users to easily and intuitively turn it off. Turning off ACPI by default is also prone to problems, as many newer systems rely on it to provide correct interrupt routing information. Also, a centralized resource should be created to track ACPI problems and solutions. Linux uses the same Intel vendor sources as &os;, so we should investigate how they have handled some of the known problems. NEWCARD/OLDCARD: The NEWCARD subsystem was made the default for &os; 5.0. Unfortunately, it contains no support for non-Cardbus bridges and falls victim to interrupt routine problems on some laptops. The classic 16-bit bridge support, OLDCARD, still exists and can be compiled in, but this is highly inconvenient for users of older laptops. If OLDCARD cannot be completely deprecated for &t.releng.5;, then provisions must be made to allow users to easily install an OLDCARD-enabled kernel. Documentation should be written to help transition users from OLDCARD to NEWCARD and from &man.pccardd.8; to &man.devd.8;. 
The power management and dumpcis functionality of &man.pccardc.8; needs to be brought forward to work with NEWCARD, along with the ability to load CIS quirk entries. Most of this functionality can be integrated into &man.devd.8; and &man.devctl.4;. New scheduler framework: The new scheduler framework is in place, and users can select between the classic 44bsd scheduler and the new ULE scheduler. A scheduler that demonstrates processor affinity, HyperThreading and KSE awareness, and no regressions in performance or interactivity characteristics must be available for &t.releng.5;. sparc64 local console: neither syscons nor vt work on sparc64, leaving it with only serial and fake OFW console support. This is a major support hole for what is a - Tier 1 platform. Whether syscons can be shoe-horned in or + Tier-1 platform. Whether syscons can be shoe-horned in or wscons be adopted from NetBSD is up for debate. However, sparc64 must have local console support for &t.releng.5;. Having this will also enable the XFree86 server to run, which is also a requirement for &t.releng.5;. gcc/toolchain: gcc 3.3 might be available in time for &t.releng.5; and might offer some attractive benefits, but is also likely to introduce ABI incompatibility with prior gcc versions. ABI compatibility should be locked down for the &t.releng.5; branch. There has also been a request to move /usr/include/g++ to /usr/include/g++-v3 to be more compliant with the stock behavior of gcc. This should also be investigated for &t.releng.5;. gdb: gdb from the base system should work for sparc64. It should also understand KSE thread semantics, assuming that KSE is included in the &t.releng.5; branch. gdb 5.3 is available and there are reports that it should address the sparc64 issue. &man.disklabel.8; regressions: The biggest casualty of the introduction of GEOM appears to be the disklabel utility. The option gives unpredictable results in most cases now and should be removed or fixed. 
Work is planned for a new unified interface for modifying labels and slices, however this should not preclude disklabel from being fixed. Documentation: The manual pages, Handbook, and FAQ should be free from content specific to &os; 4.X, i.e. all text should be equally applicable to &os; 5.X. The installation section of the handbook needs the most work in this area. The release documentation needs to be complete and accurate - for all Tier 1 architectures. The hardware notes and + for all Tier-1 architectures. The hardware notes and installation guides need specific attention. If &os; 5.1 is not the branch point for &t.releng.5; then the Early Adopters Guide needs to be updated. This document should then be removed just before the release closest to the &t.releng.5; branch point. Schedule If branching &t.releng.5; at the 5.1 release is paramount, 5.1 will probably need to move out by at least 3 months. The schedule would be: Jun 30, 2003: KSE and SMPng feature freeze Aug 4, 2003: 5.1-BETA, general code freeze Aug 18, 2003: 5.1-RC1, &t.releng.5; and &t.releng.5.1; branched Aug 25, 2003: 5.1-RC2 Sept 1, 2003: 5.1-RELEASE Taking an incremental approach might be more beneficial. Releasing 5.1 in time for USENIX ATC 2003 will provide a wide audience for productive feedback and will keep &os; visible. In this scenario, 5.1 should offer a significant improvement over 5.0 in terms of bug fixes and performance. Lockdowns and improvements to the storage subsystem and scheduler should be expected, the NEWCARD/OLDCARD issues should be addressed, and all known bugs and regressions from the 5.0 errata list should be fixed. KSE and other SMPng tasks that cannot finish in time for 5.1 should also not reduce the stability of the release. 
The schedule for this would be: May 5, 2003: 5.1-BETA, general code freeze May 19, 2003: 5.1-RC1, &t.releng.5.1; branched May 27, 2003: 5.1-RC2 Jun 2, 2003: 5.1-RELEASE Jun 30, 2003: KSE and SMPng feature freeze Sept 1, 2003: 5.2-BETA, general code freeze Sept 15, 2003: 5.2-RC1, &t.releng.5; and &t.releng.5.2; branched Sept 22, 2003: 5.2-RC2 Sept 29, 2003: 5.2-RELEASE Post &t.releng.5; direction As with all -STABLE development streams, the focus should be bug fixes and incremental improvements. Just like normal, everything should be vetted through the &t.releng.head; branch first and committed to &t.releng.5; with caution. As before, new device drivers, incremental features, etc, will be welcome in the branch once they have been proven in &t.releng.head;. Further SMPng lockdowns will be divided into two categories, driver and subsystem. The only subsystem that will be sufficiently locked down for &t.releng.5; will be GEOM, so incrementally locking down device drivers under it is a worthy goal for the branch. Full subsystem lockdowns will have to be fully tested and proven in &t.releng.head; before consideration will be given to merging them into &t.releng.5;.