Page MenuHomeFreeBSD

[mmcsd] Remove bogus values in disk structure
ClosedPublic

Authored by marcel on May 25 2015, 4:03 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Nov 28, 8:49 PM
Unknown Object (File)
Thu, Nov 28, 7:47 PM
Unknown Object (File)
Mon, Nov 25, 5:09 AM
Unknown Object (File)
Fri, Nov 22, 10:10 AM
Unknown Object (File)
Mon, Nov 18, 8:37 PM
Unknown Object (File)
Mon, Nov 18, 7:32 PM
Unknown Object (File)
Sun, Nov 17, 11:32 PM
Unknown Object (File)
Sun, Nov 17, 2:37 PM
Subscribers

Details

Summary

Fix broken functionality by removing the bogus values in the disk structure for the stripesize, fwheads and fwsectors. On a Beaglebone Black these bogus values triggered annoying warnings (like partition not aligned to a 4M boundary due to a 4M stripe size) and worse, made the mmcsd unusable due to there being 8K sectors on a track.

There is no geometry dictated by the firmware or inherently present due to the hardware. As such, fwheads and fwsectors should never be set. These fields trigger a level of rigidity demanded by firmware and hardware that need an actual geometry. Such rigidity is inappropriate for synthesized values and are best synthesized in other places (e.g. geom_part) where they are a little less inappropriate.

On top of that, the fields in the disk structure have sectors as their unit and the mmcsd driver fills them with values that are clearly in bytes. It's the moral equivalent of being asked your height in meters and given your weight in pounds.

This change is a "back to square one" approach. One could argue that a stripe size is handy, but without due consideration of consequences, a lack of documentation and/or possibly a tunable/sysctl doing so would be ill advised. We have all the tools in place to align partitions and define appropriate file system block sizes that it's not clear why mmcsd needs to fake values.

Test Plan

mmcsd0 unusable before this change.
mmcsd0 functional after this change.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

marcel retitled this revision from to [mmcsd] Remove bogus values in disk structure.
marcel updated this object.
marcel edited the test plan for this revision. (Show Details)
marcel added reviewers: imp, ian, gonzo.
marcel set the repository for this revision to rS FreeBSD src repository - subversion.

I disagree with these changes. mmcsd devices have been working correctly on all my systems, including Beaglebone black, since these changes were made almost a year ago. The units in the fields are correct (bytes in size fields, sectors in fw*), as seen here (BBB, unit 0 is 8gb sdcard, unit 1 is 4gb onboard eMMC):

root@bb:~ # diskinfo -v /dev/mmcsd0
/dev/mmcsd0
      512             # sectorsize
      7909408768      # mediasize in bytes (7.4G)
      15448064        # mediasize in sectors
      4194304         # stripesize
      0               # stripeoffset
      20              # Cylinders according to firmware.
      94              # Heads according to firmware.
      8192            # Sectors according to firmware.
      91E014AD        # Disk ident.

root@bb:~ # diskinfo -v /dev/mmcsd1
/dev/mmcsd1
      512             # sectorsize
      3867148288      # mediasize in bytes (3.6G)
      7553024         # mediasize in sectors
      4194304         # stripesize
      0               # stripeoffset
      20              # Cylinders according to firmware.
      46              # Heads according to firmware.
      8192            # Sectors according to firmware.
      2887EE7B        # Disk ident.

Other than "annoying warnings" it isn't clear why you think the current setup is broken. The annoying warnings cannot be eliminated when a device is formatted/partitioned by different software than it's read by. This is especially true of an sdcard formatted in a usb adapter, where the adapter's firmware reports values for these parameters which we cannot control and which will almost surely mismatch anything we cook up in our code.

If there's a problem with some software somewhere assuming sectors can't exceed 63, then we can fix that by redoing the algebra so that it's heads that gets the big number.

1st: I have the hardest time believing it just works for you when it fails to perform a pre-scripted and tested operation to copy the FreeBSD image from SD to MMC:

root@beaglebone:~ # sh ./copy-to-emmc.sh 
Warning: This script erases /dev/mmcsd1
It then copies your FreeBSD system from /dev/mmcsd0

If you booted from SD:
   /dev/mmcsd0 will refer to the micro-SD card
   /dev/mmcsd1 will refer to the eMMC

If you booted from eMMC, it will be the other way around
(Check the boot messages to verify your situation.)

If you are certain you want this script to erase stuff,
edit the script, remove the "exit 1" command, and run it again.

Erasing /dev/mmcsd1!  (Hope you meant this!)
1+0 records in
1+0 records out
65536 bytes transferred in 0.018680 secs (3508336 bytes/sec)

Creating MSDOS FAT12 boot partition on eMMC
mmcsd1 created
mmcsd1s1 added, but partition is not aligned on 4194304 bytes
active set on mmcsd1s1
newfs_msdos: trim 4095 sectors to adjust to a multiple of 8192
newfs_msdos: meta data exceeds file system size

Copying boot files to eMMC boot partition
mount_msdosfs: /dev/mmcsd1s1: Invalid argument
umount: /mnt: not a file system root directory

Creating FreeBSD partition on eMMC
GEOM: mmcsd1s2: invalid disklabel.
GEOM: diskid/DISK-25874071s2: invalid disklabel.
mmcsd1s2 added
GEOM: diskid/DISK-25874071s2: invalid disklabel.
mmcsd1s2 created
GEOM: diskid/DISK-25874071s2: invalid disklabel.
mmcsd1s2a added
/dev/mmcsd1s2a: 3676.0MB (7528384 sectors) block size 32768, fragment size 4096
        using 6 cylinder groups of 626.09MB, 20035 blks, 80256 inodes.
super-block backups (for fsck_ffs -b #) at:
 192, 1282432, 2564672, 3846912, 5129152, 6411392
GEOM: diskid/DISK-25874071s2: invalid disklabel.
tunefs: soft updates journaling remains unchanged as disabled
tunefs: NFSv4 ACLs set
tunefs: issue TRIM to the disk set
GEOM: diskid/DISK-25874071s2: invalid disklabel.
    *snip*

The errors are real:

root@beaglebone:~ # fsck_msdosfs /dev/mmcsd1s1
** /dev/mmcsd1s1
Invalid signature in boot block: 0000

There's no file system. And dmesg(8) shows a slew of annoying messages:

GEOM: mmcsd1s2: invalid disklabel.
GEOM: diskid/DISK-25874071s2: invalid disklabel.
GEOM: diskid/DISK-25874071s2: invalid disklabel.
GEOM: diskid/DISK-25874071s2: invalid disklabel.
GEOM: diskid/DISK-25874071s2: invalid disklabel.
GEOM: diskid/DISK-25874071s2: invalid disklabel.

Also, fdisk gives a clear discrepancy:

root@beaglebone:~ # fdisk mmcsd1
******* Working on device /dev/mmcsd1 *******
parameters extracted from in-core disklabel are:
cylinders=20 heads=46 sectors/track=8192 (376832 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=20 heads=46 sectors/track=8192 (376832 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 12 (0x0c),(DOS or Windows 95 with 32 bit FAT (LBA))
    start 8253, size 4095 (1 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 62;
        end: cyl 0/ head 1/ sector 60
The data for partition 2 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 16384, size 7528448 (3676 Meg), flag 0
        beg: cyl 0/ head 2/ sector 1;
        end: cyl 20/ head 0/ sector 0
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

WTF: The first slice ends before it starts!?!

This brings me to the second point: 8192 is not a valid number of sectors per track. Valid numbers are between 1 and 63 inclusive. The PC BIOS only has 6 available bits in a register for it. Even the MBR only has 6 bits available for the number of sectors per track. You cannot possible write an MBR that records 8192 sectors per track.

The number of heads per cylinder must be within 1 and 255 and the number of cylinders is limited to 1023. There are only 8 bits for the heads and 10 bits for the cylinder. These are imitations set forth by the BIOS and the MBR. While these aren't being exceeded yet, there's no indication that there's any regard for these limitations so I thought it prudent to point it out now.

I have no problem with 16, 32 or 48 as the number of sectors per track. With 8192 bytes per erase block and 512 bytes per sector, we have 16 sectors per erase block and therefore a track that's a multiple of the erase block. 48 is the least attractive number as it isn't a power of 2. The number of heads can be anything, but a nice power of 2 is preferred. If the storage is big enough, 256 is a good number.

To bring the point home; a fixed kernel has none of those problems:

root@beaglebone:~ # sh ./copy-to-emmc.sh 
Warning: This script erases /dev/mmcsd1
It then copies your FreeBSD system from /dev/mmcsd0

If you booted from SD:
   /dev/mmcsd0 will refer to the micro-SD card
   /dev/mmcsd1 will refer to the eMMC

If you booted from eMMC, it will be the other way around
(Check the boot messages to verify your situation.)

If you are certain you want this script to erase stuff,
edit the script, remove the "exit 1" command, and run it again.

Erasing /dev/mmcsd1!  (Hope you meant this!)
1+0 records in
1+0 records out
65536 bytes transferred in 0.019602 secs (3343410 bytes/sec)

Creating MSDOS FAT12 boot partition on eMMC
mmcsd1 created
mmcsd1s1 added
active set on mmcsd1s1
/dev/mmcsd1s1: 4038 sectors in 4038 FAT12 clusters (512 bytes/cluster)
BytesPerSec=512 SecPerClust=1 ResSectors=1 FATs=2 RootDirEnts=512 Sectors=4095 Media=0xf0 FATsecs=12 SecPerTrack=63 Heads=255 HiddenSecs=0

Copying boot files to eMMC boot partition


Creating FreeBSD partition on eMMC
mmcsd1s2 added
mmcsd1s2 created
mmcsd1s2a added
/dev/mmcsd1s2a: 3685.9MB (7548800 sectors) block size 32768, fragment size 4096
        using 6 cylinder groups of 626.09MB, 20035 blks, 80256 inodes.
super-block backups (for fsck_ffs -b #) at:
 192, 1282432, 2564672, 3846912, 5129152, 6411392
tunefs: soft updates journaling remains unchanged as disabled
tunefs: NFSv4 ACLs set
tunefs: issue TRIM to the disk set

Copying the system from SD to eMMC
    *snip*

After investingating some more, I agree that setting fw_sectors and fw_heads to these values causes failures in things that should work. They could be set to values that don't cause problems, but I'm not sure that would give any benefit. (It might avoid aligning a partition on sector 63, but I'm not sure 128 is actually better for an sdcard.)

I think the real problem is in the bogus checks that geom does, but that's a subject for another day.

Is it necessary to delete stripeoffset/size? For me, just leaving fw* at zero made everything work again.

As mentioned, I have 2 issues with the current value for the stripe size:

  1. 4M is in my opinion a ridiculous value. Using the erase block size as the stripe size makes a whole lot more sense to me. But, I'm more than happy to be proven wrong.
  2. What's using this? How and in what cases does setting the stripe size change behavior?
  3. How are things affected when we layer geom_stripe or vinum on top of MMCs? Does it break it for no good reason, even if such a thing is probably not done (but who knows -- I've always found it to be a bad thing to make assumptions about how people use FreeBSD)?

In short: If someone would ask me why we set the stripe size to 4M, I have no answer. What's a meaningful end-user oriented answer to that question?

We set the stripesize to the erase block size in bytes, it's quite clear in the code. Why do you think 4M is not the erase block size?

The stripe size was set so that the file systems would begin at an erase block boundary. This can greatly improve performance in some limited tests I've done. Either scottl or phk recommended that this be set to that back in the day. I thought we'd been setting this for many years.

MBR hasn't needed rigid enforcement of fw things for two decades now (not since the mid 1990's). The fact that we're doing that is itself a bug. It's also a bug that the tools seem to want 0..63, but don't enforce that range with some warning. I'm not convinced that's a real limit.

I tend to agree the bogus values cause some problems. I have several systems here running just fine with them, however. The problem with the bogus values is that most flash images are created with a ATA or USB SD adapter, and if you go to the raw device there you get a different lie than you get from the MMC stack, but you also get different lies all over the place depending on which adapter you use. It should be trivial to say 'just ignore the fw* stuff, it's a lie' with gpart, but alas many hours of delving into the code has yet to reveal its secrets. so removing the lie from MMC is a nice start, but it is just a symptom of an overly rigid, impossible to relax enforcement of rules whose value has long since passed.

I think 4M isn't the erase sector size, because MMC_IVAR_SECTOR_SIZE returns the native sector size in and by itself. In my case this is 8K. Lines 426-431 show that we're rounding by the erase sector size without first multiplying with 512 (the logical sector size), also indicating that mmc_get_erase_sector() is in bytes and there is no reason to multiply with 512.

Thus, the code is inconsistent and I belief that the multiplication is a mistake. I may be wrong, but then on lines 426-431 we're not rounding to the erase sector size.

Correction: I misinterpreted the code for lines 426-431. We're working on block numbers there (call them LBAs), so the multiplication is not needed. This indicates that mmc_get_erase_sectors() is in block numbers and thus that stripe size is correctly set to the erase block size in bytes (i.e. 4M).

I'll update the patch to leave the stripe size as is.

marcel edited edge metadata.

New diff keeps the stripe size as before.