The pfil(9) framework can invoke ipfw(4) several times for each packet. Each call uses an on-stack variable of type `struct ip_fw_args` to keep the state of ipfw(4) processing. Currently this variable is 240 bytes in size on amd64. On every call ipfw(4) does bzero() on it and then initializes some of its fields. glebius@ has reported that at Netflix they discovered that initialization of this variable produces significant overhead on packet processing. I did some investigation, and with my patch I managed to increase packet processing performance by about 11% on simple routing with ipfw(4) firewalling.
What the patch contains:
* the size of `struct ip_fw_args` is reduced to 144 bytes. The `dummypar` field seems to be unused, so I removed it.
* the `next_hop*` fields are grouped into one union. The Ethernet header pointer is also stored in this union, since forwarding doesn't work on layer 2.
* a new `flags` field is introduced; it keeps flags that are used to avoid accessing and initializing some fields of `struct ip_fw_args`.
* the rest of the code is modified to honor these flags.
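
To illustrate the idea behind the union and the `flags` field, here is a minimal userland sketch. It is not the code from the patch: the `sk_` names, the flag values, and the simplified stand-in types are all hypothetical, and the real `struct ip_fw_args` has more fields. It only shows how a flag can mark which union member is valid, so that the caller does not need to bzero() the whole structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct sk_ether_header { uint8_t dhost[6]; uint8_t shost[6]; uint16_t type; };
struct sk_sockaddr_in { uint32_t addr; };

/* Hypothetical flag names; the patch may name them differently. */
#define SK_ARGS_LAYER2   0x0001u	/* u.eh is valid */
#define SK_ARGS_NEXT_HOP 0x0002u	/* u.next_hop is valid */

struct sk_fw_args {
	uint32_t flags;		/* says which optional fields were set */
	union {			/* layer 2 and forwarding never coexist */
		const struct sk_ether_header *eh;
		const struct sk_sockaddr_in *next_hop;
	} u;
	void *pkt;		/* always set by the caller */
};

/* Initialize only what every call needs; no bzero() of the whole struct. */
static void
sk_args_init(struct sk_fw_args *args, void *pkt)
{
	args->flags = 0;
	args->pkt = pkt;
	/* args->u is left untouched: flags == 0 marks it as unused. */
}

static void
sk_args_set_eh(struct sk_fw_args *args, const struct sk_ether_header *eh)
{
	assert((args->flags & SK_ARGS_NEXT_HOP) == 0);
	args->u.eh = eh;
	args->flags |= SK_ARGS_LAYER2;
}

static void
sk_args_set_next_hop(struct sk_fw_args *args, const struct sk_sockaddr_in *nh)
{
	assert((args->flags & SK_ARGS_LAYER2) == 0);
	args->u.next_hop = nh;
	args->flags |= SK_ARGS_NEXT_HOP;
}

/* Readers honor the flags and never touch an uninitialized member. */
static const struct sk_ether_header *
sk_args_eh(const struct sk_fw_args *args)
{
	return ((args->flags & SK_ARGS_LAYER2) ? args->u.eh : NULL);
}

static const struct sk_sockaddr_in *
sk_args_next_hop(const struct sk_fw_args *args)
{
	return ((args->flags & SK_ARGS_NEXT_HOP) ? args->u.next_hop : NULL);
}
```

In this sketch the per-call cost is two stores instead of clearing the whole structure, and the accessors return NULL whenever the corresponding flag is not set.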