Page MenuHomeFreeBSD

Fix memory corruption bug in vdev_read in loader ZFS support
ClosedPublic

Authored by pkelsey on Feb 11 2019, 5:06 AM.

Details

Summary

The modifications made to vdev_read in r325310 to support read offsets and sizes not aligned with sector boundaries contains at least one bug, which will corrupt memory when triggered. If the read is not aligned to an underlying sector boundary, or the extent of the read does not cover a multiple of the underlying sector size, a bounce buffer will be used. If a bounce buffer is used and the requested read size (bytes parameter) is greater than the bounce buffer size (which is set to the underlying sector size), the first call to read() will overrun the end of the bounce buffer, as rb_size is still set to the value of the bytes parameter.

This patch fixes the bug described above and also rewrites the routine to be much easier to verify step-by-step for all of the read alignment and size cases.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

pkelsey created this revision.Feb 11 2019, 5:06 AM

I have two questions:

  1. how was it tested - was there some corruption case?
  2. do you plan to review boot1 code too (as separate patch)?

I have two questions:

  1. how was it tested - was there some corruption case?

I did not see any corruption happen. I found this while reading code while root-causing an issue I was having with a machine I had upgraded to 11.2 (the issue described briefly in D19142). On the 11.2 machine, I did instrument vdev_read and found that on that machine, no reads were performed that required the bounce buffer. I imagine the bounce buffer only comes into use with 4K sector drives - none of my gear is 4K.

I tested this patch using a VM booting from a ZFS mirror.

  1. do you plan to review boot1 code too (as separate patch)?

I had not yet looked at it, but now I have, and it would appear to require the same fix. I don't see a reasonable way to abstract this new read logic into a reusable function - the abstract version would have to take pointers to alloc, free, and read routines, the read routines would have to be wrappers in both cases, and the free routine would have to be a wrapper in the efi/boot1 case. So I think that patch will largely be an exercise in copy/paste. After this passes review, I can prepare a patch for efi/boot1.

pkelsey added inline comments.Feb 11 2019, 2:32 PM
stand/libsa/zfs/zfs.c
438 ↗(On Diff #53754)

Clean this (and similar lines below) up with a local char * that let's us avoid all the casting.

I have two questions:

  1. how was it tested - was there some corruption case?

I did not see any corruption happen. I found this while reading code while root-causing an issue I was having with a machine I had upgraded to 11.2 (the issue described briefly in D19142). On the 11.2 machine, I did instrument vdev_read and found that on that machine, no reads were performed that required the bounce buffer. I imagine the bounce buffer only comes into use with 4K sector drives - none of my gear is 4K.
I tested this patch using a VM booting from a ZFS mirror.

The reason why I am asking is, I am trying to understand if we actually *can* get into the situation with overrun, I got the impression the zfs on disk format should keep things sector aligned, but it really is easy to get confused there... And it does feel safer if we do have proper checks in place.

  1. do you plan to review boot1 code too (as separate patch)?

I had not yet looked at it, but now I have, and it would appear to require the same fix. I don't see a reasonable way to abstract this new read logic into a reusable function - the abstract version would have to take pointers to alloc, free, and read routines, the read routines would have to be wrappers in both cases, and the free routine would have to be a wrapper in the efi/boot1 case. So I think that patch will largely be an exercise in copy/paste. After this passes review, I can prepare a patch for efi/boot1.

The reason why I am asking is, I am trying to understand if we actually *can* get into the situation with overrun, I got the impression the zfs on disk format should keep things sector aligned, but it really is easy to get confused there... And it does feel safer if we do have proper checks in place.

If the zfs on-disk format keeps things sector-aligned / sector-multiple, then why was vdev_read complicated with code to handle non-sector aligned / non-sector multiple reads to begin with?

The reason why I am asking is, I am trying to understand if we actually *can* get into the situation with overrun, I got the impression the zfs on disk format should keep things sector aligned, but it really is easy to get confused there... And it does feel safer if we do have proper checks in place.

If the zfs on-disk format keeps things sector-aligned / sector-multiple, then why was vdev_read complicated with code to handle non-sector aligned / non-sector multiple reads to begin with?

Yea, you are right:)

The reason why I am asking is, I am trying to understand if we actually *can* get into the situation with overrun, I got the impression the zfs on disk format should keep things sector aligned, but it really is easy to get confused there... And it does feel safer if we do have proper checks in place.

If the zfs on-disk format keeps things sector-aligned / sector-multiple, then why was vdev_read complicated with code to handle non-sector aligned / non-sector multiple reads to begin with?

Yea, you are right:)

I think one way to wind up in the unaligned / non-multiple case is with a vdev having ashift = 9 on a 512e 4K device (either created that way on the 512e 4K device originally, or perhaps created on a 512 device and disk-imaged to a 512e 4K device). I think in that case, on a BIOS system, you will have drives reporting a 4K sector size via int13h ah=48h, so vdev_read() will consider that the sector size, but the metadata layout driving the reads will have been created from the standpoint of having 512 byte sectors and thus may have non-4K sizes and alignments.

pkelsey updated this revision to Diff 53819.Feb 12 2019, 12:42 AM

Introduced local char * buffer pointer to avoid casts on output buffer pointer updates

allanjude accepted this revision.Feb 13 2019, 2:12 AM

Thanks for doing the cleanup and adding the ascii-art diagram, that is very helpful.

This revision is now accepted and ready to land.Feb 13 2019, 2:12 AM
tsoome accepted this revision.Feb 13 2019, 6:36 AM

Thanks for working on this!:)

This revision was automatically updated to reflect the committed changes.

Not sure that this works.
I try this on 11.2 and on Hyper-V gen1 system stick at ldelf loading stage on boot from zfs root.

Not sure that this works.
I try this on 11.2 and on Hyper-V gen1 system stick at ldelf loading stage on boot from zfs root.

Can you describe in more detail what you tried? Did you merge this patch into into 11.2 sources (if so, which ones - RELEASE, STABLE, ...), or did you build the bootcode from head and copy the results to your 11.2 system (if so, what did you copy)?

I merge it to stable.
Sorry for noise, problem was with random - not enough entropy on first boot to unblock it. It reproduced even on 11.2 from official site.