diff --git a/en/events/2002/usenix-devsummit.sgml b/en/events/2002/usenix-devsummit.sgml new file mode 100644 index 0000000000..0a35c67d0c --- /dev/null +++ b/en/events/2002/usenix-devsummit.sgml @@ -0,0 +1,1095 @@ + + + + + + %includes; + %developers; +]> + + +&header; + +

The third FreeBSD Developer Summit was held on June 11-12, 2002, at +the Monterey Marriott in Monterey, CA, immediately prior to the USENIX +Annual Technical Conference at the same location. The FreeBSD +Developer Summit was sponsored by DARPA, the FreeBSD Foundation, FreeBSD Mall, Network +Associates Laboratories, and AT&T. Notes were taken by George +Neville-Neil, Bruce Mah, +and Robert Watson. +Markup by Murray +Stokely.

+ +

These notes cover day 2, which began at 9:30am, and ended at 5:00pm.

+ +

Agenda

+ + + +

NOTE: As usual I missed some names; please add those I missed.

+ +

Attending:

+ +

In person:

+ + +

On The Phone:

+ +

??

+ +

Via webcast:

+ +

??

+ +

The meeting followed a format where each section was led by an + individual and then a discussion ensued. Not all of the discussion + was caught but we have tried to make those sections + understandable.

+ +
+
+

09:30 Opening Remarks

+
+ +
+
+

KSE - Julian Elischer

+ +

KSE has not changed much since the last summit (Feb). The major +change is that upcalls work more like signals than like fork(). +That is to say, you give the system a function pointer to call +instead of relying on "return twice" semantics, so that it supports +all architectures.
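
As an illustration only, the signal-style registration described above can be sketched in a few lines of Python. Every name here (`ToyKernel`, `register_upcall`, `thread_blocked`, `scheduler_upcall`) is hypothetical and is not the KSE interface; the sketch only shows the shape of "give the system a function pointer to call".

```python
from typing import Callable, List, Optional


class ToyKernel:
    """Stand-in for the kernel side; hypothetical, not the real KSE interface."""

    def __init__(self) -> None:
        self._upcall: Optional[Callable[[str], None]] = None

    def register_upcall(self, handler: Callable[[str], None]) -> None:
        # Signal-style: hand over a function pointer once, up front.
        self._upcall = handler

    def thread_blocked(self, thread_name: str) -> None:
        # The kernel enters userland through the registered function,
        # rather than making an earlier call appear to return a second time.
        if self._upcall is None:
            raise RuntimeError("no upcall handler registered")
        self._upcall(thread_name)


events: List[str] = []


def scheduler_upcall(blocked_thread: str) -> None:
    # A userland thread scheduler would pick another runnable thread here.
    events.append(f"upcall: {blocked_thread} blocked")


kernel = ToyKernel()
kernel.register_upcall(scheduler_upcall)
kernel.thread_blocked("worker-1")
print(events)
```

Because the entry point is an ordinary function, nothing in this shape depends on an architecture being able to resume a saved call frame twice.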

+ +

The names in the system are deliberately different from other +threading packages. This was to reduce confusion during +development.

+ +

The process structure has been broken into 4 parts. This is in +-CURRENT at the moment. It's still "really" one structure but is +being accessed as 4 different ones.

+ +

Looking for more people to run the code.

+ +

Right now we're rewriting to clean up how the functions work.

+ +

Other architectures can be stubbed out as well.

+ +

Right now there is no support for Sparc or IA64 but he would like +to commit now. Not committing now means that it has to come out of +perforce and people have to patch it.

+ +

Discussion

+ +
+ +

RW : What about userland?

+ +

JE : It can run different threads +in userland. The primitives are all there; it just needs a bit more +help. I would like to put an idea out. Is it a good idea to be able +to have non-threaded programs linking with threaded libraries?

+ +

RW : Putting async I/O into such a +thing would make sense.

+ +

JE : The library would not care +who was accessing it.

+ +

RW : For instance libc could be +threaded or not.

+ +

JE : That would be interesting. I +don't know if the two interfaces are incompatible.

+ +

JB : X does this.

+ +

MD : It is very doable but you +have to make it non-preemptive. If you're switching non-preemptively +you can use library routines which are non threaded.

+ +

JE : If I do what I'm thinking of +doing then each lib will have its own KSE group.

+ +

MD : stdio does not have to be +thread aware if you don't schedule preemptively. It all matters where +it blocks.

+ +

JE : Since you're a non-threaded +program you don't know that.

+ +

RW : If you're going to support +that, libc has to support threads.

+ +

RW : It sounds like some +complexity goes away. Can we use one libc which has threading?

+ +

JE : Do we want to go down this +path?

+ +

RW : Now or later?

+ +

JE : What do I design now to do +this?

+ +

JB : For example libc_r does not +work with rfork.

+ +

JE : The answer is that yes we +should move forward. Tricky issues, signals...

+ +

WL : Have people talked about +pthread programs and cancellation points?

+ +

JE : The pthreads library does not +assume that you're only going to change threads at yield() points. We +are going to have cancellation points. There is an unimplemented call +which will be able to send a thread targeted signal even into the +kernel.

+ +

JE : When a thread is scheduled +onto a KSE there is a mailbox that the userland thread scheduler +updates.

+ +

JE : Is there anyone else who has +some time to test it? How many people should test this before I check +it in? There is a patch that's continuously updated on my web site so +that it can be applied to -CURRENT. There is a CVSUP target from cvsup +10 which will bring down the sources. If you go to my web page on +freefall there is a pointer there to a web page that explains how to +CVSUP from source.

+ +

RW : What about SMP locking for +this?

+ +

JE : Handled by the proc locking. +Has not been tried on SMP machines yet.

+ +

DO : What about on Sparc?

+ +

JE : You may need to stub things +out.

+ +

JB : Is the paper on the web site?

+ +

JE : The updated copy has disappeared.

+ +

?? : What's the difference between +NetBSD and FreeBSD on this?

+ +

JE : Logically not a tremendous +difference but Net follows the paper closely and Free takes the idea +and makes it into a production system. There were some tough battles +on -arch about this. The tricky point is that the proc structure has +to be broken into 4 instead of 2. If you want to be able to do POSIX +you need threads to be treated as different processes, but in other +systems you need to group the threads. You wind up with two classes +of threads. You need another structure to do the aggregation. In the +end we ended up breaking up the proc structure into 4 pieces to not +overwhelm the CPU when scheduling threads. This is the major +difference.

+ +

JE : I greatly admire the NetBSD +way which is to take an idea and not dilute it.

+ +

JE : Net is also putting a Solaris +compatible threads package on top of their scheduler activations in +the Solaris ABI.

+
+
+ +
+
+

SMPng - John Baldwin

+ +

JB : Yesterday we talked about SMP related things so I'll give a summary +and then give a list of things for 5.0. + +

JB : The big thing for 5.0 is to get the network stack out from under +Giant. + +

JB : Jefferey Xu and Jennifer Ying were here to talk about this. They +have the PCBs checked in now. + +

JY : Interface Queues and SynCache might be done. + +

The remaining chunks of the network code are:

+
+ + + +

JB : Aside from the network, the newbus +locking needs to be done (Warner Losh) and also CAM stuff. No known +status on CAM. Perhaps CAM is not needed for 5.0.

+ +

JB : Disk drive interrupts? Would +help performance. Going to talk to Poul-Henning Kamp.

+ +

JB : Alan Cox is working on the VM +system. Working based on the old Mach stuff. Objective for 5.0 is to +get zero fill and execute on write to work without Giant. In future +he wants to look at locking down pmap() functions.

+ +

JB : Still some stability issues. +UMA breaks some assumptions. For instance sockets assume that once +memory is a socket it's a socket forever; this is no longer true.

+ +

JB : Talked to Mike Smith about +5.0 and have decided to stop adding features so that we can start +cleaning up 5.0 and make it a real release. This might require hacks.

+ +

RW : For example in the UMA case there could be a flag to just say +"don't reclaim this zone" -- this would help with issues such as the +socket code assuming memory is type stable. + +Over to AC on the VM system. Nothing to say. + +
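
The "don't reclaim this zone" flag RW suggests can be pictured with a toy allocator. This is a sketch under stated assumptions, not the UMA API: `Zone`, `no_reclaim`, and `reclaim` are hypothetical names, and real zones manage memory rather than Python dicts.

```python
class Zone:
    """Toy zone allocator; hypothetical names, not the UMA interface."""

    def __init__(self, name, no_reclaim=False):
        self.name = name
        self.no_reclaim = no_reclaim  # the flag discussed above
        self.free_items = []          # cached items that keep their type

    def alloc(self):
        # Reuse a cached item when possible so it stays the same type.
        if self.free_items:
            return self.free_items.pop()
        return {"zone": self.name}

    def free(self, item):
        self.free_items.append(item)


def reclaim(zones):
    """Drop cached free items under memory pressure; return how many were dropped."""
    released = 0
    for zone in zones:
        if zone.no_reclaim:
            continue  # memory in this zone stays type stable
        released += len(zone.free_items)
        zone.free_items.clear()
    return released


sockets = Zone("socket", no_reclaim=True)
buffers = Zone("buffer")
sockets.free(sockets.alloc())
buffers.free(buffers.alloc())
released = reclaim([sockets, buffers])
print(released, len(sockets.free_items), len(buffers.free_items))
```

The flagged zone keeps its cached item across a reclaim pass, which is the property the socket code is said to rely on.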

BM : As much as I might get hated for this, will the preemption stuff +go away by 5.0? + +

JB : No, that's a 6.0 thing. There are things to do first. + +

??? Phone : Could this come in in the life time of 5.? 5.1? + +

RW : This is a release issue really. + +

JB : Yes, the kernel is pre-emptive. + +

RW : Perhaps what we should talk about is performance goals. What are the +comparisons to make? Perhaps head of 4 with head of 5. We'll see a +mix. + +

JB : I need to run benchmarks. + +

RW : In terms of SMP features when will VM be ready to be measured? I +can't put a date on it. + +

AC : I think what I told John was in time for release. I'm already doing +performance testing so we've already started. + +

RW : We'll pick a date to start doing measurements. Perhaps 2 or 3 +weeks from now. + +

AC : My guess is that the pmap locking is going to take some time to +shake out. On the other hand the next major module we should be +working on is the machine-dependent level. Last we should try approaching +the vmobject level. I'll start worrying about performance in the near +term. + +

RW : Will threading improve latency or throughput for networking? + +

BM : I would like it if we could actually start before that. + +

RW : Do you have a timeline for the interrupt threading stuff? + +

BM : I finished some things last night but there are still issues. +In a couple of weeks it should be ready for first commit. + +

RW : Informally beginning to measure performance now. What are the +right sets of tests? Need to discuss on -arch. + +

AC : It would be nice to have that committed to the tools directory. + +

JB : Which statistics analysis package are we using? + +

BM : I have had some good success with netpipe for overall measurement. + +

RW : Need to be using consistent compilers because of the compiler +change. Also all our debugging stuff will slow down the benchmarking. + +

Benchmark Ideas

+ + +

Tests to be run on:

+ + +

Targets:

+ + +

MD : Debug stuff on 5.0. I think +it might be reasonable then to take the space hit and always have the +debugging in but turn it on and off with sysctl.

+ +

RW : We should commit an optimized +kernel configuration and benchmarking guidelines to the tree as +well.

+ +
+BREAK +
+ +

RW : I think we should continue +the performance discussion. We want to accomplish a couple of things. +One is stability measurement. What are the things we need to be +measuring? What is our definition of useful?

+ +

Jefferey : End to end measurement +with gigabit cards. For latency test connections per second. Can use +ttcp or netbench in ports.

+ +

gnn : need to make sure we run +against all of 4.6

+ +

JE : Need to really have 3 tests. +4.6 (forever) 4.x (following updates) and -CURRENT.

+ +

RW : There are other dimensions. +Degree of parallelism for instance. We might see degradation in uni +but get good stuff in multi case.

+ +

JE : Test for impact of KSE +complications as well.

+ +

AP : I think as the results come +through people should not be too worried about it. Perhaps we should +benchmark database type stuff as well. Need to do something +comprehensive.

+ +

DO : What does the test matrix +look like? Different architectures and different numbers of +processors.
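
One way to picture the matrix DO asks about is as a cross product of branches, architectures, and processor counts. The branches follow JE's three tests above; the architecture list and CPU counts are placeholder assumptions, and a real run would drop combinations that a given branch does not support.

```python
from itertools import product

# Branches follow the three tests named in the discussion.
BRANCHES = ["4.6", "4.x", "-CURRENT"]
# Assumed placeholders: the notes do not fix these lists.
ARCHES = ["i386", "alpha"]
CPU_COUNTS = [1, 2, 4]


def build_matrix(branches, arches, cpu_counts):
    """Return one benchmark configuration per (branch, arch, cpus) combination."""
    return [
        {"branch": branch, "arch": arch, "cpus": cpus}
        for branch, arch, cpus in product(branches, arches, cpu_counts)
    ]


matrix = build_matrix(BRANCHES, ARCHES, CPU_COUNTS)
for config in matrix:
    print(config)
```

Each added dimension, such as RW's degree of parallelism, multiplies the number of runs, which is one reason to agree on the dimensions first.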

+ +

RW : Can we make a multi-processor +run uni-processor?

+ +

AP : Run queue and scheduler stuff?

+ +

JE : Will talk to Alfred.

+ +

RW : Is scalability testing important?

+ +

DavidM : NFS testing.

+ +

RW : What about UI testing?

+ +

JX : x11perf is the way to do that.

+ +

MD : Currently we have a directory +for regression tests, should we do one for performance tests?

+ +

gnn : talk to sleepycat for DB +tests, see if they have some

+ +

AP : Really nice to test DB +applications that are heavily thread dependent.

+ +

Jefferey : Apache 2 has threads.

+ +

RW : What about commercial folks? +What do you do?

+ +

Paul Saab : Normally what we end +up doing is using the snapshot on some machines and see if the bugs +are out. There is no performance testing really.

+ +

RW : Again, what about performance?

+ +

Paul Saab : We've really never had +one. It's more just bugs. We've just never found the performance to +be a problem.

+ +

RW : We need to create a forum for +talking about performance. We need reproducible test cases.

+ +

Paul Saab : There's also other +things. We've been doing lots of looking at this. FreeBSD gets +kicked down by attacks for instance. We have a lot of tools to get to +the project though.

+ +

RW : I will set up the mailing list.

+
+ +
+
+

New Hardware Architectures

+ +

Alpha

+ +
+ +

JB : Questions about alpha?

+ +

RW : KSE on alpha?

+ +

JE : We have patches so it +compiles and runs non-KSE programs. You can have the patched version +of the alpha kernel up and running though.

+ +

RW : Is the task of making +this work on Alpha owned?

+ +
+ +

IA64

+ +
+ +

DR : It works as far as I get to +use it. It's not used in production right now.

+ +

PS : Intel shipped me a quad +processor IA64 board. (McKinley is the name of the board).

+ +

RW : What does it need for 5.0?

+ +

DR : It works, it works for SMP. +Self-hosts, builds world. sysinstall compiles but needs more kicking +to work.

+ +

Paul Saab : Intel wants us to ship +a CD.

+ +

DR : There is no thread support +right now (threading library needs to move to get/setcontext rather +than longjmp).

+ +

DR : Need to move every driver to +use BUS DMA for large memory machines to get bounce buffers.

+ +

JB : PHK is working on using a new +libwhisk so that sysinstall et al work on all systems.

+ +
+ +

Sparc64

+ +
+ +

Jake B : Take control of KSE stuff +on Sparc 64.

+ +

RW : Do we have a Sparc 64 in the +cluster?

+ +

Jake B : It's not in the cluster +yet. It's a serial cluster issue.

+ +

RW : Package building on S64?

+ +

Jake B : Perhaps a bunch of Ultra +60s for a package build.

+ +

David : 1500 build right now?

+ +

Jake B : Yes, but a lot of +packages are broken by the same bug.

+ +

JB : Timeline for X?

+ +

Jake B : Not really.

+ +

RW : In terms of 5.0 how +comfortable are you?

+ +

Jake B : sysinstall is the only problem.

+
+ +

PowerPC

+ +
+ +

Benno Rice : I got it up to +execing a fake init in the simulator and printing "hello world". +Trying to work with real hardware. I now have some semblance of +busdma and am working on other stuff. GEM on iMac is in an embryonic +state. Should get to NFS mount in a few weeks.

+ +

RW : How do you feel about your +timeline?

+ +

Benno : I'm not sure we'll have +something fully workable for 5.0.

+ +

RW : You're not at the point yet +on working on KSE are you?

+ +

Benno : No, need a useful system +first.

+ +
+ +

AMD64

+ +
+ +

RW : I know that we're having +simulator problems.

+ +

DO : The issues are about legal +and NDA. AMD decided on FreeBSD +Mall as the NDA person. I have not had a working simulator since +September.

+ +

Paul : I can make that happen, as +well as real hardware.

+ +

DO : I've got a cross tool chain in +the tree. I have some untested pmap stuff. Hardware has been +available for a month or so. We could boot FreeBSD 4.6 today if only +we had hardware.

+ +

RW : What do you think about 5.0? +Should we discuss that at another date?

+ +
+ +

MIPS

+ +
+ +

??? : Juniper offered.

+ +

DO : But we have no hardware.

+ +

??? : Juniper thinks it's OK but +doesn't want to have it rot in the tree.

+ +

BD : I have a line on a company +that does compact PCI with R6Ks.

+ +

RW : We're waiting for someone to +turn up.

+ +
+
+ +
+LUNCH +
+ +
+
+ +

Trusted BSD

+ +

RW : MAC framework is what is of +interest today.

+ +See Slides from Robert + +
+ +

JE : Are the labels the same on +all structures?

+ +

RW : You can modify this but there +are issues with memory: is the space needed for a label too large to +add to an mbuf header, for example? The label is small, but there +are a lot of them.

+ +

BM : When you're freeing the mbuf +do you write the label data?

+ +

RW : We blank it when we free it.

+ +

BM : I do not think the 36 bytes +in the mbuf header is a problem.

+ +

JE : I'm more interested in the +"why" than the how.

+ +

RW : A lot of people are +interested in this. Some of the things that do interest a lot of +people are things like doing on the fly security for a web server.

+ +

JE : Is there a black hatted TLA +interested?

+ +

RW : Yes and several gov'ts. As +well as plenty of financial folks.

+ +

RW : There's a lot of userland +stuff that's not done yet.

+
+
+ +
+
+

Release Engineering

+ +

MS : Shows a slide of releases. +4.6 is ready to go but having issues with ISO images. DP1, a lot of +goals were met. 1000 packages were building on -CURRENT to get DP1 +out. Polished 4.2. We need to start making decisions on 5.0. +November is still the date we're shooting for. We're going to do a +4.7 and a 4.8. DP3?

+ +

***GET SLIDE FROM MURRAY***

+ +
+ +

MS : Release engineering area of +the web site www.freebsd.org/releng. For DP2 question about p4 or +CVS? Will probably use p4 for DP2 as well. USB subsystem? Perl +removal? KSE?

+ +

JE : KSE should be able to run +simple tests.

+ +

DO : Will whatever you have +committed by DP2 be the same as the release?

+ +

JE : It will be a subset.

+ +

MS : What will the status be of +KSE in userland for 5.0?

+ +

JE : Can't answer that right +now. We're not removing the old libraries. The userland work will +happen between DP2 and release. The next step is MP as well as +UP.

+ +

DO : Are we heading for a release?

+ +

MS : yes.

+ +

DO : Then we have to stop having +major commits.

+ +

MS : Yes, the discussion today is +what are the major must have features.

+ +

RW : We need to decide if there +are major upcoming problems and reduce risk on things like KSE.

+ +

JE : That's why I want to get MS 3 +in now.

+ +

RW : Do you think that KSE related +changes from later milestones are going to be isolated to KSE or +pervasive?

+ +

JE : Hard to say. My guess is +that MS 4 stuff should be less pervasive.

+ +

RW : What happens if KSE just +doesn't work?

+ +

JE : Well it does work, the +patches work, it's a question of risk. We need to check on new +things, like locking two threads in the same process.

+ +

MD : KSEs only become fragile when +pthread uses them. That's the turning point.

+ +

DO : I'd like the rules for the +rest of the summer, I hope we'll talk about that.

+ +

MS : Earlier is better.

+ +

JM : I think the cutoff point for +KSE might be MS 3.

+ +

RW : It's the kind of thing where +if we need to back out we can.

+ +

JE : If you're not going to run +KSEs then you're OK.

+ +

RW : I think it's low risk. Let's +avoid the risk is the message.

+ +

JE : The next DP2 (where we'd like +MS4).

+ +

AP : We really need KSE so all +this concern about stuff that no one really uses is not a big deal. +People just need to play catch up. We have performance problems and +we need to solve those.

+ +

DO : We quickly need to figure out +our policy on multiple archs.

+ +

RW : I briefly want to respond to +Alfred. We have asserted that KSE will be experimental. It will be +in and 5.0 will go out but there might be issues.

+ +

JB : Realistically, the goal for the network +stack is that IPv4 sockets will not be under Giant. But this is only in the +network stack world. Several people are working on it.

+ +

RW : The GEOM stuff will be +enabled by default in 5.0. Sparc depends on it. I do not know what +the impediments are to that though.

+ +

JE : The kernel stuff is there but +the user space is not. It can't become the default until everything +is there.

+ +

WL : What level of control are you +going to exercise over the tree in the coming months?

+ +

MS : You're going to see more +level of control but we expect the requests to be reasonable. It's a +very open process.

+ +

JB : How are we going to address the 5/6 split? + +

MS : Carefully is all I can +say.

+ +

RW : For 5.0 we need to have a +more informed decision. The release engineers will be trying to +reduce the number of large code changes more as time goes by. We +don't have to wait for 5.x to be perfectly stable before we branch.

+ +

MS : Let's move it to more general +discussion of DP2? Specific technologies.

+ +

BM : Is there a strategy to lock +other protocols that are not locked down now?

+ +

DO : How much more do we need to +do before 5.0?

+ +

JB : Bug fixing is what we're doing.

+ +

RW : The answer on the network +stack. We need to choose a strategy on how to handle the other +protocols.

+ +

DO : The crux is that socket +locking must be in 5.0.

+ +

RW : There are 2 or 3 problems. +Routing code is a problem. See earlier discussions.

+ +

Doug : RCng is essentially done. +What it needs is testers.

+ +

AP : What about libh (I think libh +is wrong but this is what I heard)?

+ +

JB : It's very far along but not a +5.0 thing.

+ +

WL : Problems with interrupt +routing in ACPI?

+ +

Watanabe : Cannot handle PCI PCI +interrupt routing. Many 802.11x have this problem.

+ +

JE : Is it a problem from Intel?

+ +

Watanabe : This is not an Intel +problem but a problem on our side. PCI PCI routing code should be +added. New code is necessary.

+ +
+Whiteboard
+
+UFS2		rcNG		KSE M3			CAM SMPng
+
+GEOM		TrustedBSD MAC	BusDMA			Newbus SMPng
+
+C++		Cardbus		libwhisk/sysinstall	KOBJ? (no!)
+				sparc64
+
+Perl Removal	ACPI		Alpha SMP Stability	Pkgs for
+							sparc64, IA64
+
+devd		PCI intr route	document hints		release docs
+							for new
+							platform
+
+ +

??? : Firewire?

+ +

RW : What hardware shipping on +IA64?

+ +

DR : Intel stuff

+ +

RW : What about on Sparc64?

+ +

DO : Very limited (hme...)

+ +

RW : KOBJ extensions discussed at +BSDCon?

+ +

WL : Not sure, probably not for +5.0. Pervasive, so no.

+ +

RW : How broken is C++?

+ +

DO : Only on sparc64. Don't +really know yet, but it's probably a library issue. The compiler is a +pre-release snapshot. The diffs are now getting large from May 5 to +now. We should attempt to be as far along this gcc branch as possible +come release.

+
+
+ +
+
+

rc.d

+
+ +
+ +

GT : Talking about rc.d stuff. +Import from NetBSD. Right now we have patches out there that are +translated from the current boot order. It's in perforce. After the +conference it will go into the mainline. Single toggle for +booting.

+ +

RW : How in sync are the bits in +the new stuff with the old stuff?

+ +

GT : Last patch is from June 3rd, +but it's tracking closely.

+ +

RW : What is the schedule for +committing to the main tree?

+ +

GT : We have large patches so +we're going to re-import from NetBSD.

+ +

RW : How about you have it done by +July 1?

+ +

GT : We could probably do that. +Definitely want to be in DP2.

+ +

GS : How long will we keep the old +stuff for?

+ +

GT : We'll keep them both in for a +while. Not more than 1.5 months though.

+ +

JE : Have you had a look at all at +the Mac OS X startup code?

+ +

GT : No.

+ +

JE : Do you deal with dependencies?

+ +

GT : There is meta data in each +script that says what needs what. There is a program that orders +everything correctly.
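
The ordering step GT describes amounts to a topological sort over the "what needs what" metadata. In NetBSD's rc.d that metadata lives in `# PROVIDE:` and `# REQUIRE:` comment lines read by rcorder(8); the sketch below does not parse those lines and instead uses a hypothetical dictionary with made-up script names, relying on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical metadata: each script states what it provides and requires.
SCRIPTS = {
    "network": {"provide": "network", "require": []},
    "mountfs": {"provide": "mountfs", "require": []},
    "syslogd": {"provide": "syslogd", "require": ["mountfs", "network"]},
    "sshd": {"provide": "sshd", "require": ["network", "syslogd"]},
}


def boot_order(scripts):
    """Order scripts so that every requirement runs before its dependents."""
    # Map each provided facility back to the script that provides it.
    providers = {meta["provide"]: name for name, meta in scripts.items()}
    # graphlib expects node -> set of predecessors.
    graph = {
        name: {providers[req] for req in meta["require"]}
        for name, meta in scripts.items()
    }
    # static_order() raises graphlib.CycleError on circular requirements.
    return list(TopologicalSorter(graph).static_order())


order = boot_order(SCRIPTS)
print(order)
```

Scripts with no ordering constraint between them, such as `network` and `mountfs` here, may come out in either order.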

+ +

??? : How does this affect the rc +script for ports install?

+ +

GT : We could make this available +to ports but won't on the first version.

+ +

AP : Can I recommend that you +recommend this to ports?

+ +

GT : Yes, the problem is that we +have so many ports.

+ +

RW : The reason for this is for +rebundlers of FreeBSD in their environments. We don't have to have it +for DP2 but it should be an ultimate goal. We might need to have a +policy statement on this: that at date X all ports must use the new +system.

+ +
+ +
+
+

Other Issues

+
+ +
+ +

SL : I've been working on hardware +crypto. I'm looking for consensus on getting hardware crypto in the +kernel. This will not happen in 5.0.

+ +

Syscall vector change for 64bits

+ + +

MD : Two ways to go. One is to +create a new syscall vector. The other is to do a one-off replacement. +Prefer the former.

+ +

RW : Perhaps we need to create a +FreeBSD 5 syscall vector. Could be a new ABI.

+ +

JE : Aren't there enough other numbers?

+ +

RW : That's one way to look at it +and other platforms have done that. Is that too heavyweight?

+ +

JE : It sounds that way to me. +You end up having to replicate the old ones into the new one.

+ +

MD : The issue is about pollution.

+ +

DR : Seems like too much work for 5.x

+ +

JE : It's more work. There are +now two places. Why not talk to OpenBSD?

+ +

RW : Should there be a BSD API? +Tough to do across projects.

+ +

DO : Who here is going to see that +through? We have not talked to NetBSD about even SMP.

+ +

AP : Does changing the syscall +table allow us to do clean up?

+ +

RW : We could do that without +doing 64bit syscall table.

+ +

5.x ABI stability

+ + +

RW : There are new functions in +5.x. At what point do we stop changing?

+ +

DR : When people start really using it.

+ +

RW : How do we tell? How did Solaris do it?

+ +

Everyone : No one knows.

+ +

DR : It's too hard to add a +syscall vector. Library issues are a problem.

+ +

DO : We can use ELF to handle that.

+ +

DR : Let's just add 20 new +syscalls instead of adding new work that we don't really really need.

+ +

RW : Punt on lack of time to do +this.

+ +

MD : I see DO's point with the +libraries but I have done this with time_t at 64 bits.

+ +

devd

+ + +

RW : The devd stuff was to +integrate cardbus, newbus, etc.

+ +

JE : To monitor requests to mount +or create new devices.

+ +

RW : Is this a 5.0 requirement? +Is there anyone to do this?

+ +

GT (from IRC) : PHK has patches +that make having devd unnecessary.

+ +

BD : Need something that does what +pccardd did.

+ +

JE : Need to be able to do this +through a file.

+ +

WL : (from IRC): That's a 6.0 +feature.

+ +

JE : It would not be a large step +to put something in the middle to handle this.

+ +

JE : Sometime in the 5 lifetime we +need this.

+ +

WL : There is no way to monitor +events in newbus but it would be easy to add.

+ +

JE : I'm not sure I understood you +correctly.

+ +

WL : What happens now in PCI is +that it makes a call to pci_get_devid() and the driver would say "yes +I am" or "no I'm not", so you'd have to change each of the busses to +do this, but that's not too tough because we have a small # of +busses.

+ +

JB : Mike Smith gave us an +informal tour of OS X. OS X uses XML to do this. They have the DEVID +in XML.

+ +

BD : I looked at some PCI drivers +and some work that way but some don't.

+ +

JE : It seems to me we need to not +have to modify every single driver. If you've got a device that's not +supported you ask all drivers. At the point when you run out you make +an outcall. The outcall returns and does a substitution.
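
One reading of JE's proposal is a probe loop with a fallback: ask every driver, and only when none claims the device make an outcall that returns a substitute ID to probe again. The sketch below assumes that reading; the driver names, device IDs, and the `userland_outcall` table are all made up, and nothing here is newbus code.

```python
# Hypothetical drivers and the device IDs each one claims.
DRIVERS = {
    "toy_net": {0x0001, 0x0002},
    "toy_disk": {0x0010},
}

# Hypothetical table mapping an unknown ID to a compatible known one.
SUBSTITUTIONS = {0x1001: 0x0001}


def probe_all(drivers, devid):
    """Ask each driver in turn; return the first that says 'yes I am'."""
    for name, supported in drivers.items():
        if devid in supported:
            return name
    return None


def userland_outcall(devid):
    """Stand-in for the outcall: return a substitute ID, or None."""
    return SUBSTITUTIONS.get(devid)


def attach(drivers, devid, outcall):
    """Probe normally, then retry once with the outcall's substitute ID."""
    driver = probe_all(drivers, devid)
    if driver is not None:
        return driver
    substitute = outcall(devid)
    if substitute is None:
        return None
    return probe_all(drivers, substitute)


print(attach(DRIVERS, 0x0010, userland_outcall))
print(attach(DRIVERS, 0x1001, userland_outcall))
print(attach(DRIVERS, 0x9999, userland_outcall))
```

The drivers themselves are untouched in this shape; only the code that runs out of candidates knows about the outcall.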

+ +

RW : Time up, time to wrap up.

+
+ + &footer; + + +