diff --git a/en/events/2002/usenix-devsummit.sgml b/en/events/2002/usenix-devsummit.sgml new file mode 100644 index 0000000000..0a35c67d0c --- /dev/null +++ b/en/events/2002/usenix-devsummit.sgml @@ -0,0 +1,1095 @@ + + + + + + %includes; + %developers; +]> + + +&header; + +
The third FreeBSD Developer Summit was held on June 11-12, 2002, at +the Monterey Marriott in Monterey, CA, immediately prior to the USENIX +Annual Technical Conference at the same location. The FreeBSD +Developer Summit was sponsored by DARPA, the FreeBSD Foundation, FreeBSD Mall, Network +Associates Laboratories, and AT&T. Notes were taken by George +Neville-Neil, Bruce Mah, +and Robert Watson. +Markup by Murray +Stokely.
+ +These notes cover day 2, which began at 9:30am, and ended at 5:00pm.
+ +NOTE: As usual I missed some names, please add those I missed.
+ +In person:
+On The Phone:
+ +??
+ +Via webcast:
+ +??
+ +The meeting followed a format where each section was led by an + individual and then a discussion ensued. Not all of the discussion + was caught but we have tried to make those sections + understandable.
+ +KSE has not changed much since the last summit (Feb). The major +change is that upcalls now work more like signals than like fork(). +That is, you give the system a function pointer to call +instead of relying on "return twice" semantics, so that it supports +all architectures.
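The signal-like registration described above can be illustrated with a small sketch. This is a toy model only: `ToyKernel`, `register_upcall`, and `thread_blocked` are hypothetical names invented for illustration and are not the KSE interface. The point is simply that the program hands over a function once and the "kernel" calls it later, rather than one call returning twice.

```python
from typing import Callable, List, Optional


class ToyKernel:
    """Hypothetical stand-in for the kernel side; NOT the real KSE API."""

    def __init__(self) -> None:
        self._upcall: Optional[Callable[[str], None]] = None

    def register_upcall(self, handler: Callable[[str], None]) -> None:
        # Signal-style: remember a function pointer to call later.
        self._upcall = handler

    def thread_blocked(self, name: str) -> None:
        # When a thread blocks, invoke the registered handler (the "upcall").
        if self._upcall is not None:
            self._upcall(name)


events: List[str] = []


def scheduler_upcall(blocked: str) -> None:
    # Stand-in for the userland thread scheduler picking another thread.
    events.append(f"reschedule after {blocked} blocked")


kernel = ToyKernel()
kernel.register_upcall(scheduler_upcall)
kernel.thread_blocked("thread-1")
print(events)
```

Because nothing here depends on a call returning twice, the same shape carries over to any architecture, which is the motivation given in the notes.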
+ +The names in the system are deliberately different from other +threading packages. This was to reduce confusion during +development.
+ +The process structure has been broken into 4 parts. This is in +-CURRENT at the moment. It's still "really" one structure but is +being accessed as 4 different ones.
+ +Looking for more people to run the code.
+ +Right now we're rewriting to clean up how the functions work.
+ +Other architectures can be stubbed out as well.
+ +Right now there is no support for Sparc or IA64 but he would like +to commit now. Not committing now means that it has to come out of +perforce and people have to patch it.
+ +RW : What about userland?
+ +JE : It can run different threads +in userland. The primitives are all there it just needs a bit more +help. I would like to put an idea out. Is it a good idea to be able +to have non-threaded programs linking with threaded libraries?
+ +RW : Putting async I/O into such a +thing would make sense.
+ +JE : The library would not care +who was accessing it.
+ +RW : For instance libc could be +threaded or not.
+ +JE : That would be interesting. I +don't know if the two interfaces are incompatible.
+ +JB : X does this.
+ +MD : It is very doable but you +have to make it non-preemptive. If you're switching non-preemptively +you can use library routines which are non threaded.
+ +JE : If I do what I'm thinking of +doing then each lib will have its own KSE group.
+ +MD : stdio does not have to be +thread aware if you don't schedule preemptively. It all matters where +it blocks.
+ +JE : Since you're a non-threaded +program you don't know that.
+ +RW : If you're going to support +that, libc has to support threads.
+ +RW : It sounds like some +complexity goes away. Can we use one libc which has threading?
+ +JE : Do we want to go down this +path?
+ +RW : Now or later?
+ +JE : What do I design now to do +this?
+ +JB : For example libc_r does not +work with rfork.
+ +JE : The answer is that yes we +should move forward. Tricky issues, signals...
+ +WL : Have people talked about +pthread programs and cancellation points?
+ +JE : The pthreads library does not +assume that you're only going to change threads at yield() points. We +are going to have cancellation points. There is an unimplemented call +which will be able to send a thread targeted signal even into the +kernel.
+ +JE : When a thread is scheduled +onto a KSE there is a mailbox that the userland thread scheduler +updates.
+ +JE : Is there anyone else who has +some time to test it? How many people should test this before I check +it in? There is a continuously updated patch on my web site +that applies to -CURRENT. There is a CVSUP target from cvsup +10 which will bring down the sources. If you go to my web page on +freefall there is a pointer there to a web page that explains how to +CVSUP from source.
+ +RW : What about SMP locking for +this?
+ +JE : Handled by the proc locking. +Has not been tried on SMP machines yet.
+ +DO : What about on Sparc?
+ +JE : You may need to stub things +out.
+ +JB : Is the paper on the web site?
+ +JE : The updated copy has disappeared.
+ +?? : What's the difference between +NetBSD and FreeBSD on this?
+ +JE : Logically not a tremendous +difference but Net follows the paper closely and Free takes the idea +and makes it into a production system. There were some tough battles +on -arch about this. The tricky point is that the proc structure has +to be broken into 4 instead of 2. If you want to be able to do POSIX +you need threads to be treated as different processes but in other +systems you need to group the threads. You wind up with two classes +of threads. You need another structure to do the aggregation. In the +end we ended up breaking up the proc structure into 4 pieces to not +overwhelm the CPU when scheduling threads. This is the major +difference.
+ +JE : I greatly admire the NetBSD +way which is to take an idea and not dilute it.
+ +JE : Net is also putting a Solaris +compatible threads package on top of their scheduler activations in +the Solaris ABI.
+JB : Yesterday we talked about SMP related things so I'll give a summary +and then give a list of things for 5.0. + +
JB : The big thing for 5.0 is to get the network stack out from under +Giant. + +
JB : Jefferey Xu and Jennifer Ying were here to talk about this. They +have the PCBs checked in now. + +
JY : Interface Queues and SynCache might be done. + +
The remaining chunks of the network code are:
+JB : Aside from network the newbus +locking needs to be done (Warner Losh) and also CAM stuff. No known +status on CAM. Perhaps CAM is not needed for 5.0
+ +JB : Disk drive interrupts? Would +help performance. Going to talk to Poul-Henning Kamp.
+ +JB : Alan Cox is working on the VM +system. Working based on the old Mach stuff. Objective for 5.0 is to +get zero fill and execute on write to work without Giant. In future +he wants to look at locking down pmap() functions.
+ +JB : Still some stability issues. +UMA breaks some assumptions. For instance sockets assume that once +memory is a socket it's a socket forever; this is no longer true.
+ +JB : Talked to Mike Smith about +5.0 and have decided to stop adding features so that we can start +cleaning up 5.0 and make it a real release. This might require hacks.
+ +RW : For example in the UMA case there could be a flag to just say +"don't reclaim this zone" -- this would help with issues such as the +socket code assuming memory is type stable. + +Over to AC on the VM system. Nothing to say. + +
BM : As much as I might get hated for this. Will preemption stuff +go away by 5.0? + +
JB : No, that's a 6.0 thing. There are things to do first. + +
??? Phone : Could this come in in the life time of 5.? 5.1? + +
RW : This is a release issue really. + +
JB : Yes, the kernel is pre-emptive. + +
RW : Perhaps what we should talk about is performance goals. What are the +comparisons to make? Perhaps head of 4 with head of 5. We'll see a +mix. + +
JB : I need to run benchmarks. + +
RW : In terms of SMP features when will VM be ready to be measured? I +can't put a date on it. + +
AC : I think I told John it would be in time for release. I'm already doing +performance testing so we've already started. + +
RW : We'll pick a date to start doing measurements. Perhaps 2 or 3 +weeks from now. + +
AC : My guess is that the pmap locking is going to take some time to +shake out. On the other hand the next major module we should be +working on is the machine dependent level. Last we should try approaching +the vmobject level. I'll start worrying about performance in the near +term. + +
RW : Will threading improve latency or throughput for networking? + +
BM : I would like if we could actually start before. + +
RW : Do you have a timeline for the interrupt threading stuff? + +
BM : I finished some things last night but there are still issues. +In a couple of weeks it should be ready for first commit. + +
RW : Informally beginning to measure performance now. What are the +right sets of tests? Need to discuss on -arch. + +
AC : It would be nice to have that committed to the tools directory. + +
JB : What statistics analysis package are we using? + +
BM : I have some good success with netpipe for overall measurement. + +
RW : Need to be using consistent compilers because of the compiler +change. Also all our debugging stuff will slow down the benchmarking. + +
Benchmark Ideas
+Tests to be run on:
+Targets:
+MD : Debug stuff on 5.0. I think +it might be reasonable then to take the space hit and always have the +debugging in but turn it on and off with sysctl.
+ +RW : We should commit an optimized +kernel configuration and benchmarking guidelines to the tree as +well.
+ +RW : I think we should continue +the performance discussion. We want to accomplish a couple of things. +One is stability measurement. What are the things we need to be +measuring? What is our definition of useful?
+ +Jefferey : End to end measurement +with gigabit cards. For latency test connections per second. Can use +ttcp or netbench in ports.
+ +gnn : need to make sure we run +against all of 4.6
+ +JE : Need to really have 3 tests. +4.6 (forever) 4.x (following updates) and -CURRENT.
+ +RW : There are other dimensions. +Degree of parallelism for instance. We might see degradation in uni +but get good stuff in multi case.
+ +JE : Test for impact of KSE +complications as well.
+ +AP : I think as the results come +through people should not be too worried about it. Perhaps we should +benchmark database type stuff as well. Need to do something +comprehensive.
+ +DO : What does the test matrix +look like? Different architectures and different numbers of +processors.
+ +RW : Can we make a multi-processor +run uni-processor?
+ +AP : Run queue and scheduler stuff?
+ +JE : Will talk to Alfred.
+ +RW : Is scalability testing important?
+ +DavidM : NFS testing.
+ +RW : What about UI testing?
+ +JX : x11perf is the way to do that.
+ +MD : Currently we have a directory +for regression tests, should we do one for performance tests?
+ +gnn : talk to sleepycat for DB +tests, see if they have some
+ +AP : Really nice to test DB +applications that are heavily thread dependent.
+ +Jefferey : Apache 2 has threads.
+ +RW : What about commercial folks? +What do you do.
+ +Paul Saab : Normally what we end +up doing is using the snapshot on some machines and see if the bugs +are out. There is no performance testing really.
+ +RW : Again, what about performance?
+ +Paul Saab : We've really never had +one. It's more just bugs. We've just never found the performance to +be a problem.
+ +RW : We need to create a forum for +talking about performance. We need reproducible test cases.
+ +Paul Saab : There's also other +things. We've been doing lots of looking at this. FreeBSD gets +kicked down by attacks for instance. We have a lot of tools to get to +the project though.
+ +RW : I will set up the mailing list.
+JB : Questions about alpha?
+ +RW : KSE on alpha?
+ +JE : We have patches so it +compiles and runs non-KSE programs. You can have the patched version +of the alpha kernel up and running though.
+ +RW : Is the task of +making this work on Alpha owned?
+ +DR : It works as far as I get to +use it. It's not used in production right now.
+ +PS : Intel shipped me a quad +processor IA64 board. (McKinley is the name of the board).
+ +RW : What does it need for 5.0?
+ +DR : It works, it works for SMP. +Self hosts, build worlds. sysinstall compiles but needs more kicking +to work.
+ +Paul Saab : Intel wants us to ship +a CD.
+ +DR : There is no thread support +right now (threading library needs to move to get/setcontext rather +than longjmp).
+ +DR : Need to move every driver to +use BUS DMA for large memory machines to get bounce buffers.
+ +JB : PHK is working on using a new +libwhisk so that sysinstall et al work on all systems.
+ +Jake B : Take control of KSE stuff +on Sparc 64.
+ +RW : Do we have a Sparc 64 in the +cluster?
+ +Jake B : It's not in the cluster +yet. It's a serial cluster issue.
+ +RW : Package building on S64?
+ +Jake B : Perhaps a bunch of Ultra +60s for a package build.
+ +David : 1500 build right now?
+ +Jake B : Yes, but a lot of +packages are broken by the same bug.
+ +JB : Timeline for X?
+ +Jake B : Not really.
+ +RW : In terms of 5.0 how +comfortable are you?
+ +Jake B : sysinstall is the only problem.
+Benno Rice : I got it up to +execing a fake init in the simulator and printing "hello world". +Trying to work with real hardware. I now have some semblance of +busdma and am working on other stuff. GEM on iMac is in an embryonic +state. Should get to NFS mount in a few weeks.
+ +RW : How do you feel about your +timeline?
+ +Benno : I'm not sure we'll have +something fully workable for 5.0.
+ +RW : You're not at the point yet +on working on KSE are you?
+ +Benno : No, need a useful system +first.
+ +RW : I know that we're having +simulator problems.
+ +DO : The issues are about legal +and NDA. AMD decided on FreeBSD +Mall as the NDA person. I have not had a working simulator since +September.
+ +Paul : I can make that happen, as +well as real hardware.
+ +DO : I've got a cross tool chain in +the tree. I have some untested pmap stuff. Hardware has been +available for a month or so. We could boot FreeBSD 4.6 today if only +we had hardware.
+ +RW : What do you think about 5.0? +Should we discuss that at another date?
+ +??? : Juniper offered.
+ +DO : But we have no hardware.
+ +??? : Juniper thinks it's OK but +doesn't want to have it rot in the tree.
+ +BD : I have a line on a company +that does compact PCI with R6Ks.
+ +RW : We're waiting for someone to +turn up.
+ +RW : MAC framework is what is of +interest today.
+ +See Slides from Robert + +JE : Are the labels the same on +all structures?
+ +RW : You can modify this but there +are issues with memory: is the space needed for a label too large to +add to an mbuf header, for example? The label is small, but there +are a lot of them.
+ +BM : When you're freeing the mbuf +do you write the label data?
+ +RW : We blank it when we free it.
+ +BM : I do not think the 36 bytes +in the mbuf header is a problem.
+ +JE : I'm more interested in the +"why" than the how.
+ +RW : A lot of people are +interested in this. Some of the things that do interest a lot of +people are things like doing on the fly security for a web server.
+ +JE : Is there a black hatted TLA +interested?
+ +RW : Yes and several gov'ts. As +well as plenty of financial folks.
+ +RW : There's a lot of userland +stuff that's not done yet.
+MS : Shows a slide of releases. +4.6 is ready to go but having issues with ISO images. DP1, a lot of +goals were met. 1000 packages were building on -CURRENT to get DP1 +out. Polished 4.2. We need to start making decisions on 5.0. +November is still the date we're shooting for. We're going to do a +4.7 and a 4.8. DP3?
+ +***GET SLIDE FROM MURRAY***
+ +MS : Release engineering area of +the web site www.freebsd.org/releng. For DP2 question about p4 or +CVS? Will probably use p4 for DP2 as well. USB subsystem? Perl +removal? KSE?
+ +JE : KSE should be able to run +simple tests.
+ +DO : Will whatever you have +committed by DP2 be the same as the release?
+ +JE : It will be a subset.
+ +MS : What will the status be of +KSE in userland for 5.0?
+ +JE : Can't answer that right +now. We're not removing the old libraries. The userland work will +happen between DP2 and release. The next step is MP as well as +UP.
+ +DO : Are we heading for a release?
+ +MS : yes.
+ +DO : Then we have to stop having +major commits.
+ +MS : Yes, the discussion today is +what are the major must have features.
+ +RW : We need to decide if there +are major upcoming problems and reduce risk on things like KSE.
+ +JE : That's why I want to get MS 3 +in now.
+ +RW : Do you think that KSE related +changes from later milestones are going to be isolated to KSE or +pervasive?
+ +JE : Hard to say. My guess is +that MS 4 stuff should be less pervasive.
+ +RW : What happens if KSE just +doesn't work?
+ +JE : Well it does work, the +patches work, it's a question of risk. We need to check on new +things, like locking two threads in the same process.
+ +MD : KSEs only become fragile when +pthread uses them. That's the turning point.
+ +DO : I'd like the rules for the +rest of the summer, I hope we'll talk about that.
+ +MS : Earlier is better.
+ +JM : I think the cutoff point for +KSE might be MS 3.
+ +RW : It's the kind of thing where +if we need to back out we can.
+ +JE : If you're not going to run +KSEs then you're OK.
+ +RW : I think it's low risk. Let's +avoid the risk is the message.
+ +JE : The next DP2 (where we'd like +MS4).
+ +AP : We really need KSE so all +this concern about stuff that no one really uses is not a big deal. +People just need to play catch up. We have performance problems and +we need to solve those.
+ +DO : We quickly need to figure out +our policy on multiple archs.
+ +RW : I briefly want to respond to +Alfred. We have asserted that KSE will be experimental. It will be +in and 5.0 will go out but there might be issues.
+ +JB : Realistically, for the network +stack, IPv4 sockets will not be under Giant. But this is only in the +network stack world. Several people are working on it.
+ +RW : The GEOM stuff will be +enabled by default in 5.0. Sparc depends on it. I do not know what +the impediments are to that though.
+ +JE : The kernel stuff is there but +the user space is not. It can't become the default until everything +is there.
+ +WL : What level of control are you +going to exercise over the tree in the coming months?
+ +MS : You're going to see more +level of control but we expect the requests to be reasonable. It's a +very open process.
+ +JB : How are we going to address the 5/6 split? + +
MS : Carefully is all I can +say.
+ +RW : For 5.0 we need to make a +more informed decision. The release engineers will be trying to +reduce the number of large code changes more as time goes by. We +don't have to wait for 5.x to be perfectly stable before we branch.
+ +MS : Let's move it to more general +discussion of DP2? Specific technologies.
+ +BM : Is there a strategy to lock +other protocols that are not locked down now?
+ +DO : How much more do we need to +do before 5.0?
+ +JB : Bug fixing is what we're doing.
+ +RW : The answer on the network +stack. We need to choose a strategy on how to handle the other +protocols.
+ +DO : The crux is that socket +locking must be in 5.0.
+ +RW : There are 2 or 3 problems. +Routing code is a problem. See earlier discussions.
+ +Doug : RCng is essentially done. +What it needs is testers.
+ +AP : What about libh (I think libh +is wrong but this is what I heard)?
+ +JB : It's very far along but not a +5.0 thing.
+ +WL : Problems with interrupt +routing in ACPI?
+ +Watanabe : Cannot handle PCI PCI +interrupt routing. Many 802.11x have this problem.
+ +JE : Is it a problem from Intel?
+ +Watanabe : This is not an Intel +problem but a problem on our side. PCI PCI routing code should be +added. New code is necessary.
+ ++Whiteboard + +UFS2 rcNG KSE M3 CAM SMPng + +GEOM TrustedBSD MAC BusDMA Newbus SMPng + +C++ Cardbus libwhisk/sysinstall KOBJ? (no!) + sparc64 + +Perl Removal ACPI Alpha SMP Stability Pkgs for + sparc64, IA64 + +devd PCI intr route document hints release docs + for new + platform ++ +
??? : Firewire?
+ +RW : What hardware shipping on +IA64?
+ +DR : Intel stuff
+ +RW : What about on Sparc64?
+ +DO : Very limited (hme...)
+ +RW : KOBJ extensions discussed at +BSDCon?
+ +WL : Not sure, probably not for +5.0. Pervasive, so no.
+ +RW : How broken is C++?
+ +DO : Only on sparc64. Don't +really know yet, but it's probably a library issue. The compiler is a +pre-release snapshot. The diffs are now getting large from May 5 to +now. We should attempt to be as far along this gcc branch as possible +come release.
+GT : Talking about rc.d stuff. +Import from NetBSD. Right now we have patches out there that are +translated from the current boot order. It's in perforce. After the +conference it will go into the mainline. Single toggle for +booting.
+ +RW : How in sync are the bits in +the new stuff with the old stuff.
+ +GT : Last patch is from June 3rd, +but it's tracking closely.
+ +RW : What is the schedule for +committing to the main tree.
+ +GT : We have large patches so +we're going to re-import from NetBSD.
+ +RW : How about you have it done by +July 1?
+ +GT : We could probably do that. +Definitely want to be in DP2.
+ +GS : How long will we keep the old +stuff for?
+ +GT : We'll keep them both in for a +while. Not more than 1.5 months though.
+ +JE : Have you had a look at all at +the Mac OS/X startup code?
+ +GT : No.
+ +JE : Do you deal with dependencies?
+ +GT : There is meta data in each +script that says what needs what. There is a program that orders +everything correctly.
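The ordering step GT describes can be sketched as a topological sort over per-script metadata. In the NetBSD rc.d system the metadata takes the form of `# PROVIDE:` and `# REQUIRE:` comment lines and the ordering program is rcorder(8); the sketch below is a simplified illustration of that idea, not its implementation. The script names and the `SCRIPTS`, `parse_tags`, and `boot_order` names are made up for the example.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical script bodies; only the metadata comment lines matter here.
SCRIPTS = {
    "network": "# PROVIDE: network\n# REQUIRE: mountfs\n",
    "mountfs": "# PROVIDE: mountfs\n",
    "sshd": "# PROVIDE: sshd\n# REQUIRE: network\n",
}


def parse_tags(text, tag):
    """Collect the names listed on '# TAG:' comment lines."""
    prefix = f"# {tag}:"
    names = []
    for line in text.splitlines():
        if line.startswith(prefix):
            names.extend(line[len(prefix):].split())
    return names


def boot_order(scripts):
    """Return script names ordered so that requirements run first."""
    # Map each provided name to the script that provides it.
    provider = {}
    for script, text in scripts.items():
        for name in parse_tags(text, "PROVIDE"):
            provider[name] = script
    sorter = TopologicalSorter()
    for script, text in scripts.items():
        # A missing provider raises KeyError; a real tool would report it.
        deps = [provider[req] for req in parse_tags(text, "REQUIRE")]
        sorter.add(script, *deps)
    return list(sorter.static_order())


order = boot_order(SCRIPTS)
print(order)
```

A dependency cycle would make `static_order()` raise `graphlib.CycleError`, which is a reasonable point at which to stop and report the offending scripts.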
+ +??? : How does this affect the rc +script for ports install?
+ +GT : We could make this available +to ports but won't on the first version.
+ +AP : Can I recommend that you +recommend this to ports?
+ +GT : Yes, the problem is that we +have so many ports.
+ +RW : The reason for this is for +rebundlers of FreeBSD in their environments. We don't have to have it +for DP2 but it should be an ultimate goal. We might need to have a +policy statement on this. That at date X all ports must use the new +system.
+ +SL : I've been working on hardware +crypto. I'm looking for consensus on getting hardware crypto in the +kernel. This will not happen in 5.0.
+ +MD : Two ways to go. Need to +create a new syscall vector. The other is to do a 1 off replacement. +Prefer the former.
+ +RW : Perhaps we need to create a +FreeBSD 5 syscall vector. Could be a new ABI.
+ +JE : Aren't there enough other numbers?
+ +RW : That's one way to look at it +and other platforms have done that? Is that too heavy weight?
+ +JE : It sounds that way to me. +You end up having to replicate the old ones into the new one.
+ +MD : The issue is about pollution.
+ +DR : Seems like too much work for 5.x
+ +JE : It's more work. There are +now two places. Why not talk to OpenBSD?
+ +RW : Should there be a BSD API? +Tough to do across projects.
+ +DO : Who here is going to see that +through? We have not talked to NetBSD about even SMP.
+ +AP : Does changing the syscall +table allow us to do clean up?
+ +RW : We could do that without +doing 64bit syscall table.
+ +RW : There are new functions in +5.x. At what point do we stop changing?
+ +DR : When people start really using it.
+ +RW : How do we tell? How did Solaris do it?
+ +Everyone : No one knows.
+ +DR : It's too hard to add a +syscall vector. Library issues are a problem.
+ +DO : We can use ELF to handle that.
+ +DR : Let's just add 20 new +syscalls instead of adding new work that we don't really really need.
+ +RW : Punt on lack of time to do +this.
+ +MD : I see DO's point with the +libraries but I have done this with time_t at 64 bits.
+ +RW : The devd stuff was to +integrate cardbus, newbus, etc.
+ +JE : To monitor requests to mount +or create new devices.
+ +RW : Is this a 5.0 requirement? +Is there anyone to do this?
+ +GT (from IRC) : PHK has patches +that make having devd unnecessary.
+ +BD : Need something that does what +pccardd did.
+ +JE : Need to be able to do this +through a file.
+ +WL : (from IRC): That's a 6.0 +feature.
+ +JE : It would not be a large step +to put something in the middle to handle this.
+ +JE : Sometime in the 5 lifetime we +need this.
+ +WL : There is no way to monitor +events in newbus but it would be easy to add.
+ +JE : I'm not sure I understood you +correctly.
+ +WL : What happens now in PCI is +that it makes a call to pci_get_devid() and the driver would say "yes, +I am" or "no, I'm not", so you'd have to change each of the busses to +do this, but that's not too tough because we have a small # of +busses.
+ +JB : Mike Smith gave us an +informal tour of OS/X. OS/X uses XML to do this. They have the DEVID +in XML.
+ +BD : I looked at some PCI drivers +and some work that way but some don't.
+ +JE : It seems to me we should not +have to modify every single driver. If you've got a device that's not +supported you ask all drivers. At the point when you run out you make +an outcall. The outcall returns and does a substitution.
+ +RW : Time up, time to wrap up.
+