diff --git a/en/projects/netperf/index.sgml b/en/projects/netperf/index.sgml
index f9859326ae..c2e37b703f 100644
--- a/en/projects/netperf/index.sgml
+++ b/en/projects/netperf/index.sgml
@@ -1,296 +1,296 @@
-
+
 %includes;
 <!ENTITY status.na "N/A">
 <!ENTITY status.done "Done">
 <!ENTITY status.prototyped "Prototyped">
 <!ENTITY status.merged "Merged to HEAD; RELENG_5 candidate">
 <!ENTITY status.new "New task">
 <!ENTITY status.unknown "Unknown">
 %developers;
 ]>
 &header;

Contents

Project Goal

The netperf project is working to enhance the performance of the FreeBSD network stack. This work grew out of the SMPng Project, which moved the FreeBSD kernel from a "Giant Lock" to more fine-grained locking and multi-threading. SMPng brought the network stack both performance improvements and degradations: it improved parallelism and preemption, but substantially increased per-packet processing costs. The netperf project is primarily focused on further improving parallelism in network processing while reducing SMP synchronization overhead, which in turn will lead to higher processing throughput and lower processing latency.

Project Strategies

Robert Watson

The two primary focuses of this work are increasing parallelism and decreasing overhead. Several activities are under way to advance these goals:

Project Tasks

Each entry below lists the task, the responsible developer, the date of the last update (YYYYMMDD), and the current status, followed by notes.
Task: Prefer file descriptor reference counts to socket reference counts for system calls.
Responsible: &a.rwatson; | Last updated: 20041124 | Status: &status.done;
Notes: Sockets and file descriptors both carry reference counts to prevent these objects from being freed while in use. When a socket is reached via a file descriptor, however, the two counts are somewhat interchangeable, as holding either prevents undesired garbage collection. For socket system calls, overhead can therefore be reduced by relying on the file descriptor reference count alone, avoiding the synchronized operations needed to modify the socket reference count; the VFS code takes the same approach. This change has been made for most socket system calls and committed to HEAD (6.x), and has been merged to RELENG_5 for inclusion in 5.4. A sketch of the pattern appears below.
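As a hedged illustration of the pattern (not the committed diff), the sketch below uses the fget()/fdrop() file descriptor interfaces of the era; the function name is hypothetical. The reference held on the struct file keeps the socket in fp->f_data from being freed, so the system call never touches the socket's own reference count.

#include <sys/param.h>
#include <sys/file.h>
#include <sys/proc.h>
#include <sys/socketvar.h>

static int
sockop_sketch(struct thread *td, int fd)
{
	struct file *fp;
	struct socket *so;
	int error;

	error = fget(td, fd, &fp);	/* one synchronized op: file refcount */
	if (error != 0)
		return (error);
	so = fp->f_data;		/* socket stays valid while fp is held */

	/* ... perform the socket operation on so ... */

	fdrop(fp, td);			/* single release; no sorele() needed */
	return (0);
}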
Task: Mbuf queue library.
Responsible: &a.rwatson; | Last updated: 20041124 | Status: &status.prototyped;
Notes: To facilitate handing off queues of packets between network stack components, create an mbuf queue primitive, struct mbufqueue. The initial implementation is complete, and the primitive is now being applied in several sample cases to determine whether it offers the desired semantics and benefits. The implementation can be found in the rwatson_dispatch Perforce branch. Additional work is also needed to explore the performance impact of queues versus arrays of mbuf pointers, which are likely to behave better from a caching perspective. A hypothetical sketch of such a primitive appears below.
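The branch implementation is not reproduced here; as a hypothetical sketch only, a primitive of this shape can be built from the existing m_nextpkt linkage, with a tail pointer for O(1) enqueue. The field and function names below are assumptions, not the rwatson_dispatch code.

#include <sys/param.h>
#include <sys/mbuf.h>

struct mbufqueue {
	struct mbuf	*mq_head;	/* first packet, or NULL if empty */
	struct mbuf	*mq_tail;	/* last packet, or NULL if empty */
	int		 mq_len;	/* number of packets queued */
};

static __inline void
mbufqueue_enqueue(struct mbufqueue *mq, struct mbuf *m)
{
	m->m_nextpkt = NULL;
	if (mq->mq_tail == NULL)
		mq->mq_head = m;
	else
		mq->mq_tail->m_nextpkt = m;
	mq->mq_tail = m;
	mq->mq_len++;
}

static __inline struct mbuf *
mbufqueue_dequeue(struct mbufqueue *mq)
{
	struct mbuf *m;

	if ((m = mq->mq_head) != NULL) {
		if ((mq->mq_head = m->m_nextpkt) == NULL)
			mq->mq_tail = NULL;
		m->m_nextpkt = NULL;
		mq->mq_len--;
	}
	return (m);
}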
Task: Employ queued dispatch in the interface send API.
Responsible: &a.rwatson; | Last updated: 20041106 | Status: &status.prototyped;
Notes: An experimental if_start_mbufqueue() interface to struct ifnet has been added, which passes an mbuf queue to the device driver for processing, avoiding redundant synchronization against the interface queue even when additional queueing is required. This has not yet been benchmarked. A subset change that dispatches a single mbuf to the driver has also been prototyped, and benchmarked at a several-percentage-point improvement in packet send rates from user space. A sketch of the dispatch path appears below.
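A hedged sketch of how the send-side dispatch might look: the if_start_mbufqueue member and its signature are assumptions about the experimental interface, and the fallback path uses the era's IF_HANDOFF() macro. When the driver implements the batched method, the whole queue crosses the driver boundary in a single call.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>

static void
if_dispatch_sketch(struct ifnet *ifp, struct mbufqueue *mq)
{
	struct mbuf *m;

	if (ifp->if_start_mbufqueue != NULL) {	/* assumed new ifnet member */
		/* One call, one round of driver locking, for the batch. */
		(*ifp->if_start_mbufqueue)(ifp, mq);
		return;
	}
	/* Fallback: classic per-packet enqueue on the interface queue. */
	while ((m = mbufqueue_dequeue(mq)) != NULL)
		(void)IF_HANDOFF(&ifp->if_snd, m, ifp); /* frees m if full */
}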
Task: Employ queued dispatch in the interface receive API.
Responsible: &a.rwatson; | Last updated: 20041106 | Status: &status.new;
Notes: Similar to if_start_mbufqueue(), allow input of a queue of mbufs from the device driver into the lowest protocol layers, via something like an ether_input_mbufqueue(). A sketch appears below.
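A hedged sketch of the receive-side counterpart: the driver hands a batch of received packets to the link layer in one call, rather than invoking the per-packet (*ifp->if_input)() hook once per packet from its interrupt handler. The function body is an assumption, reusing the mbufqueue sketch above.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>

void
ether_input_mbufqueue(struct ifnet *ifp, struct mbufqueue *mq)
{
	struct mbuf *m;

	/* Drain the batch into the existing per-packet input path. */
	while ((m = mbufqueue_dequeue(mq)) != NULL)
		(*ifp->if_input)(ifp, m);
}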
Task: Employ queued dispatch across the netisr dispatch API.
Responsible: &a.rwatson; | Last updated: 20041124 | Status: &status.prototyped;
Notes: Pull all of the mbufs in the netisr ifqueue into a thread-local mbuf queue, avoiding repeated lock operations to access the queue, and use a lock-free test for whether the queue contains any packets. This has been prototyped in the rwatson_netperf branch. A sketch appears below.
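A hedged sketch of the batched drain: one acquisition of the ifqueue mutex steals the entire packet chain into a local list, which is then processed without the lock held. The unlocked emptiness test simply reads the head pointer; a racing enqueue is picked up on the next netisr pass. The function and its deliver callback are illustrative; the ifqueue fields and IF_LOCK()/IF_UNLOCK() macros are those of the era.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>

static void
netisr_drain_sketch(struct ifqueue *inq, void (*deliver)(struct mbuf *))
{
	struct mbuf *m, *chain;

	if (inq->ifq_head == NULL)	/* lock-free: queue appears empty */
		return;

	IF_LOCK(inq);
	chain = inq->ifq_head;		/* steal the whole chain at once */
	inq->ifq_head = inq->ifq_tail = NULL;
	inq->ifq_len = 0;
	IF_UNLOCK(inq);

	while ((m = chain) != NULL) {	/* deliver with no lock held */
		chain = m->m_nextpkt;
		m->m_nextpkt = NULL;
		deliver(m);
	}
}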
Task: Modify the UMA allocator to use critical sections rather than mutexes for per-CPU caches.
Responsible: &a.rwatson; | Last updated: 20041124 | Status: &status.prototyped;
Notes: The mutexes protecting the per-CPU caches require atomic operations on SMP systems; since the caches are per-CPU objects, the cost of synchronizing access to them can be reduced by using CPU pinning and/or critical sections instead. A prototype has been implemented in the rwatson_percpu branch, but is waiting on the critical section performance optimizations that will keep this change from hurting uniprocessor performance. The critical section changes from John Baldwin have been posted for public review. A sketch of the idea appears below.
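A hedged sketch of the idea; the structure and names are illustrative, not UMA's actual layout. A per-CPU cache is only ever touched from its own CPU, so pinning the thread with critical_enter() makes the access safe without the bus-locked atomic instruction that mtx_lock() requires on SMP.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>
#include <sys/pcpu.h>

struct pcpu_cache_sketch {
	void	*c_items[64];		/* per-CPU stash of free items */
	int	 c_count;		/* valid entries in c_items */
};

static void *
pcpu_cache_alloc(struct pcpu_cache_sketch cache[MAXCPU])
{
	struct pcpu_cache_sketch *c;
	void *item = NULL;

	critical_enter();		/* no preemption, no migration */
	c = &cache[PCPU_GET(cpuid)];	/* stable: we cannot change CPUs */
	if (c->c_count > 0)
		item = c->c_items[--c->c_count];
	critical_exit();

	return (item);			/* NULL means take the slow path */
}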
Task: Optimize critical section performance.
Responsible: &a.jhb; | Last updated: 20041124 | Status: &status.prototyped;
Notes: Critical sections prevent preemption of a thread on a CPU, as well as migration of that thread to another CPU, and may be used both to synchronize access to per-CPU data structures and to prevent recursion in interrupt processing. Currently, critical sections disable interrupts on the CPU. Previous versions of FreeBSD (4.x and before) included optimizations that disabled interrupts in software, lowering the cost of critical sections in the common case by avoiding expensive microcode operations on the CPU. By restoring this model, or a variation on it, critical sections can be made substantially cheaper to enter. In particular, this change will bring the cost of a critical section on UP down to approximately that of a mutex, so that optimizations substituting critical sections for mutexes on SMP will not harm UP performance. A prototype of this change is present in the jhb_lock Perforce branch, and patches have been posted to per-architecture mailing lists for review. A sketch of the scheme appears below.
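Hedged pseudocode for the software interrupt disabling scheme described above; td_critnest is the real per-thread nesting count, but the pending flag and replay helper are hypothetical. Entering a critical section becomes a plain counter increment; a hardware interrupt arriving while the count is non-zero is latched rather than handled, and the outermost exit replays it, keeping cli/sti-style microcode operations off the common path.

#include <sys/param.h>
#include <sys/proc.h>

extern int	interrupts_pending;		/* hypothetical latch */
void		replay_pending_interrupts(void);	/* hypothetical helper */

static __inline void
critical_enter_sketch(void)
{
	curthread->td_critnest++;	/* defer interrupts in software */
}

static __inline void
critical_exit_sketch(void)
{
	struct thread *td = curthread;

	if (td->td_critnest == 1) {
		td->td_critnest = 0;
		if (interrupts_pending != 0)
			replay_pending_interrupts();
	} else
		td->td_critnest--;
}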

Netperf Cluster

Through the generous donations and investment of Sentex Data Communications, FreeBSD Systems, IronPort Systems, and the FreeBSD Foundation, a network performance testbed has been created in Ontario, Canada for use by FreeBSD developers working on network performance. A similar cluster, made possible by a generous donation from Verio, is being prepared in Virginia, US for more general SMP performance work. Each cluster consists of several SMP systems interconnected with gigabit Ethernet so that relatively arbitrary topologies can be constructed to test host-to-host, IP forwarding, and bridging performance scenarios. Systems are network booted and have serial consoles and remote power control, in order to maximize availability and minimize configuration overhead. These systems are available on a check-out basis for experimentation and performance measurement by FreeBSD developers working on the netperf project and in related areas.

More detailed information on the netperf
- cluster can be found by following this linka.
+ cluster can be found by following this link.

Links

Some useful links relating to the netperf work:

&footer;