ipfw(4) can be invoked by the pfil(9) framework several times for each packet. Each call uses an on-stack variable of type struct ip_fw_args to keep the state of ipfw(4) processing. Currently this variable is 240 bytes in size on amd64. On every call ipfw(4) runs bzero() on it and then initializes some of its fields. glebius@ has reported that they at Netflix discovered that initializing this variable adds significant overhead to packet processing. I investigated, and with my patch packet-processing performance on simple routing with ipfw(4) firewalling improved by about 11%.
What the patch contains:
- the size of struct ip_fw_args is reduced to 128 bytes. The dummypar field appears to be unused, so I removed it.
- next_hop* fields are grouped into one union. The Ethernet header pointer is also stored in this union, since forwarding doesn't work on layer 2.
- a new flags field is introduced; it holds flags that let the code skip accessing and initializing some fields of struct ip_fw_args.
- the rest of the code is modified to honor these flags.
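
A minimal sketch of the union-plus-flags idea from the list above. This is not the code from the patch: the names sk_args, SK_ARGS_* and the helper functions are hypothetical, and the real struct ip_fw_args has more fields. It only illustrates how a flags word can replace a full bzero() and guard access to union members that were never initialized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the names used in the real patch may differ. */
#define SK_ARGS_ETHER	0x0001	/* u.eh is valid (layer-2 packet) */
#define SK_ARGS_NEXTHOP	0x0002	/* u.next_hop is valid (layer-3 forward) */

/* Simplified stand-in for struct ip_fw_args. */
struct sk_args {
	uint32_t	 flags;		/* says which fields below are valid */
	void		*m;		/* the packet being processed */
	union {
		/* L2 header and next hop never coexist, so they share storage. */
		const void	*eh;
		const void	*next_hop;
	} u;
};

/* Layer-3 entry: only flags and m are written, the union is left alone. */
static void
sk_args_init_l3(struct sk_args *a, void *m)
{
	a->flags = 0;
	a->m = m;
}

/* Layer-2 entry: record the Ethernet header and mark it as valid. */
static void
sk_args_init_l2(struct sk_args *a, void *m, const void *eh)
{
	a->flags = SK_ARGS_ETHER;
	a->m = m;
	a->u.eh = eh;
}

/* Forwarding is layer-3 only, so it must not overwrite a valid u.eh. */
static void
sk_args_set_next_hop(struct sk_args *a, const void *nh)
{
	assert((a->flags & SK_ARGS_ETHER) == 0);
	a->u.next_hop = nh;
	a->flags |= SK_ARGS_NEXTHOP;
}

/* Readers check the flag first and never touch an uninitialized member. */
static const void *
sk_args_next_hop(const struct sk_args *a)
{
	return ((a->flags & SK_ARGS_NEXTHOP) ? a->u.next_hop : NULL);
}

static const void *
sk_args_eh(const struct sk_args *a)
{
	return ((a->flags & SK_ARGS_ETHER) ? a->u.eh : NULL);
}
```

In this sketch the per-call setup cost is two or three stores instead of clearing the whole structure, and every consumer has to test flags before reading the union, which is what "honor these flags" amounts to here.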