Add a check in ip_output() to drop and free mbufs with a zero-length packet header (m->m_pkthdr.len == 0).
This prevents a kernel panic caused by dereferencing invalid pointers in the transmit path, as observed during ICMP ping operations.
The issue was triggered by rare cases where a zero-length mbuf was passed down the stack, resulting in a NULL dereference in device transmit routines.
Returning EINVAL for such packets ensures system stability and prevents crashes from malformed or unexpected input.
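The check described above can be sketched in userland as follows. This is a minimal illustration, not the patch under review: `mock_mbuf`, `mock_m_get()`, `mock_m_freem()`, and `mock_send()` are hypothetical stand-ins that only mimic the `m_pkthdr.len` field and the free-on-drop convention of the kernel's mbuf API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's struct mbuf; not the real layout. */
struct mock_pkthdr {
	int len;		/* total packet length, like m_pkthdr.len */
};

struct mock_mbuf {
	struct mock_pkthdr m_pkthdr;
	char *m_data;		/* payload, NULL when the packet is empty */
};

/* Counts frees so a caller can confirm that a dropped packet was released. */
static int mock_frees;

/* Release the buffer and its payload, loosely mirroring m_freem(). */
static void
mock_m_freem(struct mock_mbuf *m)
{
	free(m->m_data);
	free(m);
	mock_frees++;
}

/* Allocate a mock packet of the given length; returns NULL on failure. */
static struct mock_mbuf *
mock_m_get(int len)
{
	struct mock_mbuf *m = calloc(1, sizeof(*m));

	if (m == NULL)
		return (NULL);
	if (len > 0) {
		m->m_data = calloc(1, (size_t)len);
		if (m->m_data == NULL) {
			free(m);
			return (NULL);
		}
	}
	m->m_pkthdr.len = len;
	return (m);
}

/*
 * Hypothetical send routine. It always consumes the buffer: a zero-length
 * packet is freed and rejected with EINVAL before reaching the lower layer.
 */
static int
mock_send(struct mock_mbuf *m)
{
	if (m->m_pkthdr.len == 0) {
		mock_m_freem(m);
		return (EINVAL);
	}
	/* Stand-in for the lower layer, which takes ownership and frees. */
	mock_m_freem(m);
	return (0);
}
```

In this sketch the caller never frees the buffer itself: `mock_send()` owns it on every path, so the early return must free it to avoid a leak.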
Details
- arc lint: Passed, no lint issues.
- Build: Successfully built world and kernel.
- Boot: Created an image and verified it boots in a VM.
- kyua: Attempted to run the relevant tests, but the Python dependency could not be satisfied because of network restrictions in the secure environment.
Diff Detail
- Repository: rG FreeBSD src repository
- Lint: Lint Skipped
- Unit: Tests Skipped
Event Timeline
| sys/netinet/raw_ip.c | |
|---|---|
| 560 | Please follow the style(9) guide, use 2 tabs and 4 spaces to maintain its indentation level. |
IMHO, this case should not happen at all. Therefore, if there is a possible scenario where it does, it may be more appropriate to use KASSERT instead.
The description mentions ip_output(), but in reality the patch is against rip_send(), which is the send(2) method for a PF_INET/SOCK_RAW socket. It looks like a specially crafted send(2) can produce such a bogus mbuf. Instead of applying a band-aid, let's find out how this mbuf was produced and fix the bug. Note that mbufs are formed outside of the protocol implementation, so the problem may apply to other protocols beyond PF_INET/SOCK_RAW.
A reproduction recipe is needed. If there is no recipe, a kernel core would be useful to understand how to reproduce the problem.
This patch was added to our code back in 2022 by one of the Klara folks. I'll remove it and try to reproduce on our latest builds to see if I can come up with reproduction steps and/or a kernel core.
Perfect. Please make sure that a reproducer you find also triggers a panic on an unmodified FreeBSD kernel.
After removing the patch, all of our testing passed. I also did a manual check of the various code paths for the different protocols but I didn't see any that could cause an issue here. As a result, I'm abandoning the issue. Thanks for the help!
Just to clarify, after doing more digging, I found out that Klara was just the last one to touch this patch, not the first to add it. That was done by NetApp back in 2011. This is definitely a stale patch that I should have retested before posting it here.