This patch contains several changes:
1. It changes locking on input to improve performance.
2. It fixes the bug where the `vt_flush()` timer is resumed even when the change is made to another (hidden) window's output buffer.
3. It fixes the VGA palette and color indexing.
The main part of the patch is the locking change. I compared syscons(4) and vt(4) input performance using the following command, run from a remote SSH session, with both syscons(4) and vt(4) configured to use text mode:
```
time cat find.txt >/dev/ttyv0
```
`find.txt` was created with:
```
find / > find.txt
```
In my benchmark, the file is about 26 MiB and 360000 lines.
Results:
* syscons(4): 700 ms
* vt(4): 2500 ms
In both cases, whether or not the window is the currently displayed one doesn't affect the time taken.
Another difference is the rendering: while the content was scrolling on the screen, frames looked good with syscons(4). With vt(4), however, they looked "split", as if a single line contained several letters, then spaces, then several other letters. Once the file is entirely written to the terminal, the content is correct for both implementations.
After studying the code for both implementations, it looks like this:
* syscons(4) acquires a single lock, does the whole input process (writes the character to the buffer, moves the cursor, possibly handles newlines and line wraps by scrolling the content) and releases the lock. The rendering thread uses the same lock to render the new content, and that lock is held for the entire drawing process.
* With vt(4), a lock is acquired by the upper layer, then vt(4) does the input process: it writes the character to the buffer, acquires and releases a lock to mark the region as out-of-date, moves the cursor, re-acquires the lock to mark the region as out-of-date, and possibly handles newlines/line wraps, where it acquires that lock twice more (copy screen, then fill the new line). Once vt(4) is finished, the upper layer releases its own lock. In addition, after each putchar/cursor move/copy/fill, there is an atomic cmpset to decide if the `vt_flush()` timer should be resumed. Last but not least, the upper layer re-acquires its lock a second time to call the `tc_done()` callback. The rendering thread acquires the vt(4) lock to read and reset the coordinates of the out-of-date region, releases it and does the drawing.
So in the end:
* syscons(4) acquires 1 lock to process an input.
* vt(4) acquires 4 locks and does 2 atomic cmpsets to process an input, or 5 locks and 3 atomic cmpsets if there is a newline.
The patch improves this situation by:
* Introducing a new `tc_prepare()` callback: it allows vt(4) to acquire its internal lock once instead of each time a callback is called. That lock is released in `tc_done()`.
* Moving the atomic cmpset to `tc_done()` so it's done only once.
* Changing the upper layer to acquire its own lock once, not twice.
After that, vt(4) is down to around 800 ms for the same test. That is still slightly slower than syscons(4), mainly because there are two locks, not one, but it's already a 3x improvement on my laptop.
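To make the batching idea concrete, here is a minimal userland sketch of the prepare/done pattern. It is not the code from the patch: `struct buf_sketch`, `buf_prepare()`, `buf_mark_dirty()` and `buf_done()` are hypothetical names, a toy C11 spinlock stands in for the kernel mutex, and a plain counter stands in for resuming the `vt_flush()` timer.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vt(4) buffer state; not the real structure. */
struct buf_sketch {
	atomic_flag	lock;		/* toy spinlock standing in for the kernel mutex */
	bool		held;		/* set while the lock is held, for assertions */
	int		dirty_min;	/* first dirty row */
	int		dirty_max;	/* last dirty row; min > max means clean */
	int		lock_count;	/* number of lock acquisitions */
	atomic_int	timer_idle;	/* 1 while the flush timer is suspended */
	int		timer_resumes;	/* number of times the timer was resumed */
};

static void
buf_init(struct buf_sketch *b)
{
	atomic_flag_clear(&b->lock);
	b->held = false;
	b->dirty_min = INT_MAX;
	b->dirty_max = -1;
	b->lock_count = 0;
	atomic_init(&b->timer_idle, 1);
	b->timer_resumes = 0;
}

/* Called once before a batch of output operations: take the lock. */
static void
buf_prepare(struct buf_sketch *b)
{
	while (atomic_flag_test_and_set(&b->lock))
		;	/* spin until the lock is free */
	b->held = true;
	b->lock_count++;
}

/* Per-operation helper (putchar, cursor, copy, fill): no locking, only a check. */
static void
buf_mark_dirty(struct buf_sketch *b, int row)
{
	assert(b->held);
	if (row < b->dirty_min)
		b->dirty_min = row;
	if (row > b->dirty_max)
		b->dirty_max = row;
}

/* Called once after the batch: one cmpset to resume the timer, then unlock. */
static void
buf_done(struct buf_sketch *b)
{
	int expected = 1;

	assert(b->held);
	if (b->dirty_min <= b->dirty_max &&
	    atomic_compare_exchange_strong(&b->timer_idle, &expected, 0))
		b->timer_resumes++;
	b->held = false;
	atomic_flag_clear(&b->lock);
}
```

With this shape, a putchar followed by a cursor move, a copy and a fill costs one lock acquisition and one cmpset for the whole batch, and the `assert(b->held)` checks are the userland analogue of asserting that the buffer lock is owned.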
For the rendering thread in vt(4), the lock is acquired earlier and released later, in particular after the drawing is finished. This fixes the weird frame rendering while scrolling because vt(4) won't write to the output buffer while the rendering thread is reading it.
I also ran several buildkernels to see if the locking impacted a real use case. I couldn't find any difference: regardless of the console driver, and with or without this patch, the buildkernel time was always the same.
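The rendering side can be sketched the same way. In this simplified userland model (hypothetical names throughout, a toy spinlock instead of the kernel mutex, and a second array standing in for the display), the flush routine holds the lock from the moment it reads the dirty region until the last row is copied, so a writer can never change the buffer halfway through a frame.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define ROWS	4
#define COLS	8

/* Hypothetical model of one terminal; not the real vt(4) structures. */
struct term_sketch {
	atomic_flag	lock;			/* toy spinlock */
	char		buf[ROWS][COLS];	/* written by the input path */
	char		screen[ROWS][COLS];	/* what the display shows */
	int		dirty_min;		/* first dirty row */
	int		dirty_max;		/* last dirty row; min > max means clean */
};

static void
term_lock(struct term_sketch *t)
{
	while (atomic_flag_test_and_set(&t->lock))
		;	/* spin until the lock is free */
}

static void
term_unlock(struct term_sketch *t)
{
	atomic_flag_clear(&t->lock);
}

static void
term_init(struct term_sketch *t)
{
	atomic_flag_clear(&t->lock);
	memset(t->buf, ' ', sizeof(t->buf));
	memset(t->screen, ' ', sizeof(t->screen));
	t->dirty_min = ROWS;
	t->dirty_max = -1;
}

/* Input path: replace one row and mark it dirty, all under the lock. */
static void
term_write_row(struct term_sketch *t, int row, const char *text)
{
	size_t len = strlen(text);

	assert(row >= 0 && row < ROWS);
	if (len > COLS)
		len = COLS;
	term_lock(t);
	memset(t->buf[row], ' ', COLS);
	memcpy(t->buf[row], text, len);
	if (row < t->dirty_min)
		t->dirty_min = row;
	if (row > t->dirty_max)
		t->dirty_max = row;
	term_unlock(t);
}

/* Render path: the lock covers reading the region AND the drawing itself. */
static int
term_flush(struct term_sketch *t)
{
	int drawn = 0;

	term_lock(t);
	for (int row = t->dirty_min; row <= t->dirty_max; row++) {
		memcpy(t->screen[row], t->buf[row], COLS);
		drawn++;
	}
	t->dirty_min = ROWS;	/* mark the region clean again */
	t->dirty_max = -1;
	term_unlock(t);
	return (drawn);
}
```

The trade-off is that the input path now waits for the whole draw instead of only for the region bookkeeping, which is the same choice syscons(4) makes.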
Now the two smaller changes:
1. When input is for a terminal which is not currently displayed, we don't try to resume the `vt_flush()` timer anymore. This was a waste of time and resources.
2. In the VGA renderer, the 16-color palette is now initialized with the default VGA colors: some colors are swapped compared to the console color palette (blue<->red for instance). Also, when we set the foreground or background color, we use a mapping to convert a console color index to a VGA index. The only user-visible change is when vt(4) is initialized early in boot and the loader menu is still displayed. Before the fix, Beastie would switch from red to blue, and likewise the "Booting..." message would switch from blue to red. This doesn't change colors after boot. It's therefore a really minor change, but the VGA renderer is now consistent with the other renderers (where Beastie's colors remain unmodified).
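As an illustration of the index conversion, here is a small sketch. It assumes the console palette follows the usual ANSI order (black, red, green, brown, blue, magenta, cyan, white) and that the VGA defaults follow the usual order (black, blue, green, cyan, red, magenta, brown, light gray), with the upper eight entries being the bright variants in both. The table and the name `cons_to_vga()` are hypothetical and the table in the actual patch may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Console (ANSI-order) index -> default VGA index for the 8 base colors.
 * Assumed orderings, see the text above; not copied from the patch.
 */
static const uint8_t cons_to_vga_base[8] = {
	0,	/* black   -> black */
	4,	/* red     -> VGA red */
	2,	/* green   -> green */
	6,	/* brown   -> VGA brown */
	1,	/* blue    -> VGA blue */
	5,	/* magenta -> magenta */
	3,	/* cyan    -> VGA cyan */
	7,	/* white   -> light gray */
};

static uint8_t
cons_to_vga(uint8_t color)
{
	assert(color < 16);
	/* Bit 3 selects the bright variant in both orderings; keep it as-is. */
	return ((color & 8) | cons_to_vga_base[color & 7]);
}
```

Under these assumptions the mapping swaps red with blue and brown with cyan and leaves the other base colors alone, so applying it twice returns the original index.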
There are still a few things I want to finish, like adding assertions that the `vtbuf` lock is locked/unlocked where appropriate.
In the end, I will commit those changes in several commits, not everything at once.