
growfs script: add swap partition as well as growing root
ClosedPublic

Authored by karels on Nov 22 2022, 7:02 PM.
Tags
None

Details

Summary

Add the ability to create a swap partition in the course of growing
the root file system on first boot, enabled by default. The default
rules are: add swap if the disk is at least 15 GB (decimal), and the
existing root is less than 40% of the disk. The default size is 10%
of the disk, but not more than double the memory size.
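The default rules above can be sketched in Python as follows. This is an illustration only, not the script's sh/awk code. The function name and byte-valued inputs are assumptions, and "15 GB" is taken as 15 * 10**9 bytes because the summary says decimal.

```python
# Illustrative sketch of the default sizing rules from the summary
# (not the rc.d/growfs implementation, which is sh + awk).
GB = 10**9  # decimal gigabyte, as the summary specifies


def default_swap_bytes(disk_bytes: int, root_bytes: int, mem_bytes: int) -> int:
    """Return the default swap size in bytes, or 0 if no swap is added."""
    # Only add swap on disks of at least 15 GB (decimal).
    if disk_bytes < 15 * GB:
        return 0
    # Only add swap if the existing root is less than 40% of the disk;
    # integer arithmetic avoids floating-point rounding at the boundary.
    if root_bytes * 10 >= disk_bytes * 4:
        return 0
    # Default size: 10% of the disk, but not more than double the memory.
    return min(disk_bytes // 10, 2 * mem_bytes)


# A 32 GB card with a 5 GB root image on a 4 GiB machine.
print(default_swap_bytes(32 * GB, 5 * GB, 4 * 2**30))  # prints 3200000000
```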

The default behavior can be overridden by setting growfs_swap_size in
/etc/rc.conf or in the kernel environment, with kenv taking priority.
A value of 0 inhibits the addition of swap, an empty value specifies
the default, and other values indicate a swap size in sectors.
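The override order can be sketched as below. This is a hypothetical helper, not part of the script. It assumes that None means "not set in that source" and that a kenv value that is defined but empty still takes priority, so it selects the default rules.

```python
from typing import Optional


def resolve_growfs_swap_size(kenv_value: Optional[str],
                             rcconf_value: Optional[str]) -> Optional[int]:
    """Resolve growfs_swap_size as described in the summary.

    None for either argument means "not set in that source" (an assumption).
    Returns None to apply the default rules, 0 to inhibit swap,
    or an explicit size (in sectors, per the summary's wording).
    """
    # The kernel environment takes priority over /etc/rc.conf.
    value = kenv_value if kenv_value is not None else rcconf_value
    if value is None or value.strip() == "":
        return None  # unset or empty: use the default sizing rules
    size = int(value)
    if size < 0:
        raise ValueError("growfs_swap_size must not be negative")
    return size
```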

Addition of swap is also inhibited if a swap partition is found in
the output of the sysctl kern.geom.conftxt before the current root
partition, usually meaning that there is another disk present.
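A rough sketch of the "swap before root" check follows. It assumes that PART lines in kern.geom.conftxt carry the depth, class, and name as the first three fields and the partition type after a `ty` key. The sample text is made up for illustration, although its SD card sizes are derived from the gpart output later in this review.

```python
def swap_before_root(conftxt: str, root_part: str) -> bool:
    """Return True if a freebsd-swap partition appears before root_part.

    Assumes PART lines look like
    '<depth> PART <name> <mediasize> <sectorsize> ... ty <type> ...'.
    """
    for line in conftxt.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[1] != "PART":
            continue  # skip DISK and other classes
        if fields[2] == root_part:
            return False  # reached the root partition without seeing swap
        if "ty" in fields:
            i = fields.index("ty")
            if i + 1 < len(fields) and fields[i + 1] == "freebsd-swap":
                return True
    return False


# Hypothetical conftxt: a USB disk with swap listed before the SD card root.
SAMPLE = """\
0 DISK da0 500107862016 512 hd 255 sc 63
1 PART da0p1 8589934592 512 i 1 o 20480 ty freebsd-swap xs GPT
0 DISK mmcsd0 31914983424 512 hd 255 sc 63
1 PART mmcsd0s2 31860981760 512 i 2 o 53477376 ty freebsd xs MBR
2 PART mmcsd0s2a 31860916224 512 i 1 o 65536 ty freebsd-ufs xs BSD
"""
print(swap_before_root(SAMPLE, "mmcsd0s2a"))  # prints True
```

As the author notes later in the review, conftxt is not always in the expected order, so "before" is only a heuristic.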

The root partition is read-only when growfs runs, so /etc/fstab cannot
be modified. That step is handled by a new growfs_fstab script,
added in a separate commit. Set the value "growfs_swap_added=1" in
kenv to indicate that this should be done.
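The deferred fstab step might look roughly like the sketch below, which would run once root is writable. The device name /dev/label/growfs_swap is taken from later in this review. The `kenv` dict is a hypothetical stand-in for the kernel environment, and the function is not the growfs_fstab script itself.

```python
SWAP_DEV = "/dev/label/growfs_swap"  # label name mentioned later in this review


def fstab_with_growfs_swap(fstab_text: str, kenv: dict) -> str:
    """Return fstab_text with a swap entry appended if growfs added swap.

    `kenv` is a hypothetical dict standing in for the kernel environment.
    """
    if kenv.get("growfs_swap_added") != "1":
        return fstab_text  # growfs did not add swap on this boot
    for line in fstab_text.splitlines():
        fields = line.split()
        if fields and fields[0] == SWAP_DEV:
            return fstab_text  # entry already present; stay idempotent
    if fstab_text and not fstab_text.endswith("\n"):
        fstab_text += "\n"
    # Standard fstab swap entry: device, none, swap, sw, dump 0, pass 0.
    return fstab_text + f"{SWAP_DEV}\tnone\tswap\tsw\t0\t0\n"
```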

There is optional verbose output meant for debugging; it can only be
enabled by modifying the script (in two places, for sh and awk).
This should be removed before release, after testing on -current.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 48627
Build 45513: arc lint + arc unit

Event Timeline

karels created this revision.

"other values indicate a swap size in sectors": so from media to media, with different sector sizes, the same figure means differing sizes?

Is the script used with the likes of:

QUOTE
Disk images may be downloaded from the following URL (or any of the
FreeBSD Project mirrors):

https://download.freebsd.org/snapshots/VM-IMAGES/

Images are available in the following disk image formats:

~ RAW
~ QCOW2 (qemu)
~ VMDK (qemu, VirtualBox, VMWare)
~ VHD (qemu, xen)

The partition layout is:

~ 512k - freebsd-boot GPT partition type (bootfs GPT label)
~ 1GB  - freebsd-swap GPT partition type (swapfs GPT label)
~ 24GB - freebsd-ufs GPT partition type (rootfs GPT label)

END QUOTE

If yes, has the handling been tested? The above already has a 1GB freebsd-swap before the freebsd-ufs.

What if someone had a similar example that instead had such a freebsd-swap after the freebsd-ufs?

This sounds like a good idea in principle. A couple things which come to mind:

  1. This breaks the scenario of "VM is initially booted with a 20 GB disk; at a later time, the disk is expanded to 30 GB and /etc/rc.d/growfs is run manually" since the root partition will no longer be at the end of the disk. I guess theoretically we could delete the swap partition and create a new one at the end of the disk?
  2. In EC2 I have code for automatically using ephemeral disks for swap space; this has higher performance than using the root disk (network disk vs. local disk). But not all EC2 instance types have ephemeral disks. I'm not sure what the ideal interaction between these two features would be.

". . . and the existing root is less than 40% of the disk" What if other partitions/slices than root's are taking up space as well? Do you need to use an available free space's size that is sufficiently large instead?

It may be that the script is intended to work in more contexts than just the FreeBSD-built images, and to preserve whatever other partitions/slices may be present. That might mean growing only into the free space that happens to be directly after the root partition.

Imagine someone deleting a partition just after the root partition for media that they had been using and then having the root partition grown despite yet other partitions being present.

Warning: I'm not an expert in the intended range of uses of the long standing script. So I may have wandered too far here.

> "other values indicate a swap size in sectors": so from media to media, with different sector sizes, the same figure means differing sizes?

Sector size is (nearly?) always 512. Using sectors ensures that the value is a multiple of the sector size. Although I suppose I could just divide and round down.
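The "divide and round down" alternative, and the reason a sector count is ambiguous across media, can be shown with a short arithmetic sketch. The helper name is hypothetical and the 512-byte default is an assumption.

```python
def round_to_sectors(size_bytes: int, sector_size: int = 512) -> int:
    """Round a byte count down to a whole number of sectors (result in bytes)."""
    return (size_bytes // sector_size) * sector_size


# The same sector count means different byte sizes on different media:
SECTORS = 2_097_152
print(SECTORS * 512, SECTORS * 4096)  # prints 1073741824 8589934592
```

Specifying bytes and rounding down keeps the meaning of the setting independent of the medium's sector size.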

> Is the script used with the likes of:

> [VM-IMAGES with pre-existing swap]

I don't know, I'll have to investigate. I had assumed that it was not used there.

> If yes, has the handling been tested? The above already has a 1GB freebsd-swap before the freebsd-ufs.

Not yet, but I'll investigate. "before" is relative to the kern.geom.conftxt, which is not always in the expected order.

> What if someone had a similar example that instead had such a freebsd-swap after the freebsd-ufs?

The easiest thing would be to put growfs_swap_size="0" in the default rc.conf. Or, if it made sense, to change the default to "off", and enable in arm images, etc.

> This sounds like a good idea in principle. A couple things which come to mind:

> 1. This breaks the scenario of "VM is initially booted with a 20 GB disk; at a later time, the disk is expanded to 30 GB and /etc/rc.d/growfs is run manually" since the root partition will no longer be at the end of the disk. I guess theoretically we could delete the swap partition and create a new one at the end of the disk?

Yes, deleting the automatic swap partition would allow this to work. This is only a problem if the initial root partition was smaller than the 20 GB disk and growfs was enabled.

> 2. In EC2 I have code for automatically using ephemeral disks for swap space; this has higher performance than using the root disk (network disk vs. local disk). But not all EC2 instance types have ephemeral disks. I'm not sure what the ideal interaction between these two features would be.

It probably makes the most sense to disable the growfs swap addition in that case. These scripts don't really handle that situation correctly, where swap could be prioritized. But the growfs swap will not be added if there is swap in the fstab already.

> ". . . and the existing root is less than 40% of the disk" What if other partitions/slices than root's are taking up space as well? Do you need to use an available free space's size that is sufficiently large instead?

This is designed for use with our existing images, where root is the only substantial partition, and the free space immediately follows it.

> It may be that the script is intended to work in more contexts than just the FreeBSD-built images, and to preserve whatever other partitions/slices may be present. That might mean growing only into the free space that happens to be directly after the root partition.

That is true in the current growfs as well.

> Imagine someone deleting a partition just after the root partition for media that they had been using and then having the root partition grown despite yet other partitions being present.

If they deleted the partition after first boot, growfs won't run automatically.

> Warning: I'm not an expert in the intended range of uses of the long standing script. So I may have wandered too far here.

It really isn't a general-purpose tool, it only needs to handle the images as we ship them. Some of the options may be useful for other embedded systems, e.g. setting the swap size where the target is known hardware.

About the VM images with pre-existing swap partitions: the swap partition is already listed in /etc/fstab, so I'll just check for that. Although we might want to consider switching them to use this mechanism... But that wouldn't provide any swap if the disk is too small.
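The fstab check can be sketched as follows. The helper is hypothetical, and the sample fstab uses the swapfs GPT label from the quoted VM image layout as an assumed device name.

```python
def fstab_has_swap(fstab_text: str) -> bool:
    """Return True if any non-comment fstab entry has file system type 'swap'."""
    for line in fstab_text.splitlines():
        entry = line.split("#", 1)[0]  # drop comments
        fields = entry.split()
        # fstab fields: device, mount point, type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "swap":
            return True
    return False


# Assumed VM-image style fstab with a swap entry on the swapfs GPT label.
VM_FSTAB = """\
# Device\tMountpoint\tFStype\tOptions\tDump\tPass#
/dev/gpt/rootfs\t/\tufs\trw\t1\t1
/dev/gpt/swapfs\tnone\tswap\tsw\t0\t0
"""
print(fstab_has_swap(VM_FSTAB))  # prints True
```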

Change units of growfs_swap_size to bytes; skip swap if swap is in fstab

> It probably makes the most sense to disable the growfs swap addition in that case. These scripts don't really handle that situation correctly, where swap could be prioritized. But the growfs swap will not be added if there is swap in the fstab already.

Thinking about this a bit more... EC2 instances can change their instance types, so an instance might have ephemeral disks (with swap space allocated on them) on some boots and not others. I'm inclined to say that the best solution here is "always allocate the swap partition but don't enable it if we have other swap already". Unfortunately "already" in this case happens after growfs, so we would need to create the swap partition in growfs and then conditionally enable it later...

How do you handle the fstab in that case? I see that my EC2 instance has neither swap, nor an fstab entry. The code that I just added will omit swap if it is already included in the fstab; that handles the VM images from the standard build, which already have a swap partition, and which is in the fstab. I'd be reluctant to add a second swap partition, which would end up on the same device, especially if it would never be enabled by default.

I can think of a couple of possibilities: it would be possible to set growfs_swap_size to a size in bytes in the EC2 rc.conf, which would force creation of a swap device of that size. Or, I could add a reserved value (e.g. 1 byte) that would force the creation using the default sizing rules. It would be added to the fstab, but as "/dev/label/growfs_swap", so it would be easy to recognize and manipulate.

If it is useful to add a way to force swap to be added using the default size, I would probably change the values for growfs_swap_size to symbolic ones, e.g. AUTO, NONE, or ALWAYS (or a value in bytes).
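A parser for the proposed symbolic values might look like the sketch below. It is entirely hypothetical, since the review later says this mode was not added. The mode names follow the proposal, and keeping "0" as "no swap" is an assumption made for backward compatibility.

```python
from typing import Optional, Tuple


def parse_symbolic_swap_size(value: str) -> Tuple[str, Optional[int]]:
    """Parse a hypothetical AUTO/NONE/ALWAYS-or-bytes growfs_swap_size value.

    Returns (mode, size_bytes); size_bytes is set only for explicit sizes.
    """
    v = value.strip().upper()
    if v in ("", "AUTO"):
        return ("auto", None)    # default rules; skip if swap already exists
    if v == "NONE":
        return ("none", None)    # never add swap
    if v == "ALWAYS":
        return ("always", None)  # default size even if swap already exists
    size = int(v)                # raises ValueError for unrecognized text
    if size == 0:
        return ("none", None)    # keep "0" meaning "no swap"
    if size < 0:
        raise ValueError("negative swap size")
    return ("size", size)
```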

Any thoughts on whether it would be useful to force addition of swap even if it is already present? I thought it would be easy, but it takes a bit of work to find the new swap partition if there was already one present.

My take: Already present -> operating outside the limited remit of this script... don't add it.

I'd avoid creating ANOTHER swap partition, though. If you want a 'force' it should mean "and throw away whatever swap you found there to create the one you'd have created were that swap partition not ever there" or some such.

In D37462#853345, @imp wrote:

> I'd avoid creating ANOTHER swap partition, though. If you want a 'force' it should mean "and throw away whatever swap you found there to create the one you'd have created were that swap partition not ever there" or some such.

There are two different situations though, a swap partition on the root/install disk, or on another disk. I have the latter situation if my USB SSD is plugged in. I don't want to modify other disks, that would violate POLA. I realized, though, that supplying a swap size already implements "force", and can label the wrong partition as "growfs_swap" currently. I'll need to fix that one way or the other.

@cperciva what would work best on EC2?

Sorry, I was at AWS re:Invent so I wasn't able to pay attention to this review for the past few days.

I think the best for EC2 is
(a) unconditionally *create* swap space on >15 GB disks, but
(b) only *use* that swap space if -- at a later stage in the boot process -- we don't have any swap configured.

This way if a system is booted on an EC2 instance with ephemeral disks it will use those, but if it's moved over to an instance without ephemeral disks later it will still have *some* swap space.
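The two-phase policy can be sketched as two predicates, one evaluated when growfs runs and one later in boot. The function names and the `active_swap` list are hypothetical stand-ins, and the device name in the usage is illustrative.

```python
GB = 10**9  # decimal gigabyte


def should_create_swap(disk_bytes: int) -> bool:
    """Phase 1 (growfs, early boot): always create swap on large disks."""
    return disk_bytes > 15 * GB


def should_enable_growfs_swap(created: bool, active_swap: list) -> bool:
    """Phase 2 (later in boot): enable it only if no other swap is active.

    `active_swap` stands in for whatever swap is already configured,
    e.g. ephemeral-disk swap set up earlier on EC2.
    """
    return created and not active_swap


created = should_create_swap(30 * GB)
print(should_enable_growfs_swap(created, ["/dev/nvd1"]))  # prints False
```

Separating creation from activation is what lets an instance that later moves to a type without ephemeral disks still fall back to the root-disk swap.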

Another question: Do you want to respect vm.swap_maxpages? Or just allocate the swap space and let the kernel potentially issue a warning about the device being too large and not being fully utilized?

> @cperciva what would work best on EC2?

> I think the best for EC2 is
> (a) unconditionally *create* swap space on >15 GB disks, but
> (b) only *use* that swap space if -- at a later stage in the boot process -- we don't have any swap configured.

Two questions (at least): is there a canonical way to test for EC2 for this purpose? And would ephemeral disks have swap partitions, or entries for swap in /etc/fstab, at the time growfs runs? I can easily create and label the partition, but skip the fstab and swapon steps if on EC2. Maybe testing ec2_ephemeralswap_enable would be appropriate here?

> This way if a system is booted on an EC2 instance with ephemeral disks it will use those, but if it's moved over to an instance without ephemeral disks later it will still have *some* swap space.

> Another question: Do you want to respect vm.swap_maxpages? Or just allocate the swap space and let the kernel potentially issue a warning about the device being too large and not being fully utilized?

Good question. I had assumed that a limit of 2*physmem would be small enough; I've been thinking mostly about embedded systems with <= 8 GB. Computing this limit would probably be appropriate. Looks like the warning is at vm.swap_maxpages/2. 2*physmem is probably excessive for larger systems too.

> Another question: Do you want to respect vm.swap_maxpages? Or just allocate the swap space and let the kernel potentially issue a warning about the device being too large and not being fully utilized?

My memory is that warnings about possible mistuning start when the total across active swap devices crosses roughly vm.swap_maxpages/2. As I remember, the figure also varies over time as the OS is upgraded, even if nothing else changes. Leaving some margin can be appropriate.

Looking at "man 8 loader_simp" and its kern.maxswzone material:

Note that swap metadata can be fragmented, which means that
the system can run out of space before it reaches the
theoretical limit.  Therefore, care should be taken to not
configure more swap than approximately half of the
theoretical maximum.

Unfortunately, some other details about kern.maxswzone may not apply to all architectures. (I'm guessing they presume amd64.) This passage:

If no value is provided, the system
allocates enough memory to handle an amount of swap that
corresponds to eight times the amount of physical memory
present in the system.

is suspect, for example. On aarch64 I get warnings starting somewhere between 3.5 and 3.8 times the RAM present. On armv7, it starts somewhere between 1.8 and 2.0 times the RAM present, as I remember.

Another thing about kern.maxswzone that may not be obvious, if I understand it right: increasing kern.maxswzone trades off against other kernel memory use rather than only increasing the amount of swap space that can be supported. The normal (rare) adjustment is to decrease it to allow more of other types of kernel memory use. See "man 8 loader_simp".

Change swap limit based on memory size (2x memory for up to 4 GB of memory,
8 GB for up to 8 GB of memory, 1x memory beyond that). Also limit according to
vm.swap_maxpages/2. Move code to label swap from growfs_fstab to growfs
to assist with EC2, and correct to get the device right if there was
another swap partition.
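The revised limits can be sketched as follows. Several points here are assumptions rather than facts from the review: the memory thresholds are read as binary GiB, the 10%-of-disk starting point is carried over from the summary, and the page size defaults to 4096 bytes (the real value would come from the running system). The functions are illustrative and are not the script's code.

```python
GiB = 2**30


def memory_swap_limit(mem_bytes: int) -> int:
    """Memory-based cap from the revised rule (binary units assumed)."""
    if mem_bytes <= 4 * GiB:
        return 2 * mem_bytes      # 2x memory for up to 4 GB of memory
    if mem_bytes <= 8 * GiB:
        return 8 * GiB            # 8 GB for 4-8 GB of memory
    return mem_bytes              # 1x memory beyond 8 GB


def revised_swap_bytes(disk_bytes: int, mem_bytes: int,
                       swap_maxpages: int, page_size: int = 4096) -> int:
    """10% of the disk, capped by memory and by vm.swap_maxpages/2."""
    kernel_cap = (swap_maxpages // 2) * page_size
    return min(disk_bytes // 10, memory_swap_limit(mem_bytes), kernel_cap)
```

The cap is continuous at the thresholds (4 GiB of memory gives 8 GiB either way, and 8 GiB gives 8 GiB either way), so a small change in memory size never causes a large jump in swap size.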

Additional notes after the last update:

  • I moved the swap partition labeling from growfs_fstab to growfs, which should allow EC2 to bypass growfs_fstab (e.g. by undefining growfs_swap_pdev in kenv) and do its own thing. That code has also been corrected so that it labels the correct partition in some cases where it got it wrong before.
  • I did not add a way to force swap addition using the default size, nor code to delete any existing swap partition. Note, though, that setting a specific swap size overrides the check for existing swap.

Any comments, changes, approvals?

My only comments are about documentation, which I'll leave to your discretion.
I like the concept and the code to implement it looks decent.
I don't see any weird edge cases that need to be handled or documented.
Embedded systems should handle this well.
Systems with tiny resources will be made no worse by this commit, so they need not be considered further.

So it ticks all the boxes for me. If you are at all unsure, maybe wait a couple of days for objections unless you get a ton of approvals :) I'm not seeing anything that my (sometimes overly) picky self has....

libexec/rc/rc.d/growfs, line 67:

Might want to mention the rule here (or in a comment below) for how you pick the size in more detail.

Line 199:

Where does this number come from?

This revision is now accepted and ready to land. Dec 8 2022, 6:56 PM

I'll update comments. I'll also wait a while for any additional comments.

libexec/rc/rc.d/growfs, line 67:

OK. Meanwhile, I just noticed that the Handbook recommends 2 * memory size with no limit. I think I like this rule better (1 * memory size over 8 GB). Does anyone know the history of this, or what bsdinstall uses? If I found the right bit of code, it doesn't refer to memory size, and it has a 4 GB upper bound.

Line 199:

:). I wanted to use swap on a 16 GB microSD, but figured much smaller would be too small.

Add comments on rule for swap size and disk size limit.

This revision now requires review to proceed. Dec 9 2022, 4:12 PM
This revision was not accepted when it landed; it landed in state Needs Review. Dec 10 2022, 7:42 PM
This revision was automatically updated to reflect the committed changes.

I tried the new main [so: 14] snapshot, dd'd to a USB3 SSD and booted:
snapshot: FreeBSD-14.0-CURRENT-arm64-aarch64-RPI-20221224-c89209c674f2-259842.img
so: FreeBSD 14.0-CURRENT #0 main-n259842-c89209c674f2: Sat Dec 24 05:52:28 UTC 2022
Result (from the serial console capture):

Starting file system checks:
/dev/ufs/rootfs: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/rootfs: clean, 599506 free (242 frags, 74908 blocks, 0.0% fragmentation)
/etc/rc.d/growfs: 203: Syntax error: "(" unexpected (expecting "}")

It looks to be the ' in "Don't" in a supposed #comment inside the awk program: that ' unintentionally closes the single quote that opened the awk program. Later in the line is "(decimal)", which supplies the reported "(".
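The failure mode can be illustrated with a simplified model of sh single quoting. A `#` inside a single-quoted awk program is an awk comment, not an sh comment, so an apostrophe there ends the sh quote. The fragment below is hypothetical, not the actual line 203, and the model ignores double quotes, backslashes, and here-documents.

```python
def unquoted_text(script: str) -> str:
    """Return the characters sh would see outside single quotes.

    Simplified model: only single quotes are tracked; double quotes,
    backslash escapes and here-documents are ignored.
    """
    out = []
    in_quote = False
    for ch in script:
        if ch == "'":
            in_quote = not in_quote  # each ' opens or closes a quoted run
        elif not in_quote:
            out.append(ch)
    return "".join(out)


# Hypothetical sh fragment: an awk program passed in single quotes.
BAD = "awk '\n# Don't add swap on small disks (decimal)\n{ print }\n'"
GOOD = BAD.replace("Don't", "Do not")
print("(" in unquoted_text(BAD), "(" in unquoted_text(GOOD))  # prints True False
```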

Given a "Don't" -> "Do not" workaround after dd but before booting in order to try to see if the rest worked . . .

I do not find any evidence of a growfs_fstab script, just a section 7 manual page for it:

# find / -name "growfs_fstab*" -print | more
/usr/share/man/man7/growfs_fstab.7.gz

I discovered this because setting up use of the swap did not happen automatically.
( /dev/label/growfs_swap was present and worked for a manual swapon command. )

Side note: dumping to the swap space was not set up automatically either. I'm not sure what the intent was for such.

I found these over the weekend too, testing the snapshot images. Fixed by 4c8a257810a6.

For:

FreeBSD 14.0-CURRENT #0 main-n259905-231d75568f16: Sun Jan 1 11:28:27 UTC 2023 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC

I ended up with:

# gpart show 
=>      63  62333889  mmcsd0  MBR  (30G)
        63      1985          - free -  (993K)
      2048    102400       1  fat32lba  [active]  (50M)
    104448  62228480       2  freebsd  (30G)
  62332928      1024          - free -  (512K)

=>       0  62228480  mmcsd0s2  BSD  (30G)
         0       128            - free -  (64K)
       128  62228352         1  freebsd-ufs  (30G)

So no swap. Looks like it got "gpart: Invalid start param: Invalid argument" . . .

Starting file system checks:
/dev/ufs/rootfs: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/rootfs: clean, 599451 free (483 frags, 74871 blocks, 0.0% fragmentation)
Growing root partition to fill device
random: randomdev_wait_until_seeded unblock wait
random: unblocking device.
Adding swap partition
GEOM_PART: mmcsd0s2 was automatically resized.
  Use `gpart commit mmcsd0s2` to save changes or `gpart undo mmcsd0s2` to revert them.
mmcsd0s2 resized
gpart: Invalid start param: Invalid argument
mmcsd0s2a resized
super-block backups (for fsck_ffs -b #) at:
 11524224, 12804672, 14085120, 15365568, 16646016, 17926464, 19206912,
 20487360, 21767808, 23048256, 24328704, 25609152, 26889600, 28170048,
 29450496, 30730944, 32011392, 33291840, 34572288, 35852736, 37133184,
 38413632, 39694080, 40974528, 42254976, 43535424, 44815872, 46096320,
 47376768, 48657216, 49937664, 51218112, 52498560, 53779008, 55059456,
 56339904, 57620352, 58900800, 60181248, 61461696
Swap partition not found on mmcsd0s2
dumpon: /dev/label/growfs_swap: No such file or directory