The block cache implementation in the loader has proven to be almost useless, and in the worst case it even slows down disk reads because of the insufficient cache size and the extra memory copy. The current cache implementation also does not work with ZFS built on top of multiple disks.
The goal of this project is to explore an alternative implementation of bcache: instead of LRU, this code uses a simple hash (to get O(1) reads from the cache), and instead of a single global cache, it creates a block cache per device.
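As a rough illustration of the hash-based, per-device idea, here is a minimal sketch of a direct-mapped cache. It is not the code from this diff: the names (struct bcache_dev, bc_hash, bc_insert, bc_lookup), the 512-byte block size, and the 32-block cache size are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BC_BLKSIZE 512u   /* assumed block size, for illustration only */
#define BC_NBLKS   32u    /* assumed cache size; must be a power of two */

/* One cache instance per device (hypothetical layout). */
struct bcache_dev {
    int64_t       blkno[BC_NBLKS];              /* block held in each slot, -1 if empty */
    unsigned char data[BC_NBLKS * BC_BLKSIZE];  /* cached block contents */
};

/* Hash: the low bits of the block number select the slot, so lookup is O(1). */
static unsigned bc_hash(int64_t blkno)
{
    return (unsigned)(blkno & (BC_NBLKS - 1));
}

static void bc_init(struct bcache_dev *bc)
{
    assert((BC_NBLKS & (BC_NBLKS - 1)) == 0);   /* power-of-two check */
    for (unsigned i = 0; i < BC_NBLKS; i++)
        bc->blkno[i] = -1;
}

/* Store one block; whatever was in the slot before is simply overwritten. */
static void bc_insert(struct bcache_dev *bc, int64_t blkno, const void *src)
{
    unsigned slot = bc_hash(blkno);

    bc->blkno[slot] = blkno;
    memcpy(bc->data + (size_t)slot * BC_BLKSIZE, src, BC_BLKSIZE);
}

/* Returns 1 and copies the block into dst on a hit, 0 on a miss. */
static int bc_lookup(const struct bcache_dev *bc, int64_t blkno, void *dst)
{
    unsigned slot = bc_hash(blkno);

    if (bc->blkno[slot] != blkno)
        return 0;
    memcpy(dst, bc->data + (size_t)slot * BC_BLKSIZE, BC_BLKSIZE);
    return 1;
}
```

With this layout there is no LRU list to maintain: a colliding block just evicts the previous occupant of its slot, which trades some hit rate for a constant-time lookup.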
If the requested data is not entirely in the cache, read_strategy() reads the missing blocks plus a possible number of read-ahead blocks. To simplify read-ahead management, the read ahead does not wrap past the end of the bcache, so in the worst case a single-block physical read is performed to fill the last block in the bcache.
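The no-wrap rule can be sketched as a small helper that limits one physical read to the slots between the starting block's slot and the end of the cache. This is an illustration under assumptions, not the code from this diff: ra_clamp() and its parameters are hypothetical names, and the slot is assumed to be the block number modulo a power-of-two cache size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of blocks to transfer in one physical read (hypothetical helper).
 *   blkno       first missing block
 *   need        number of missing blocks requested
 *   ra_want     number of read-ahead blocks we would like to add
 *   cache_nblks cache size in blocks, assumed to be a power of two
 * The result never extends past the last cache slot; the caller would
 * issue another read for any blocks that did not fit.
 */
static size_t ra_clamp(uint64_t blkno, size_t need, size_t ra_want,
    size_t cache_nblks)
{
    assert(cache_nblks != 0 && (cache_nblks & (cache_nblks - 1)) == 0);

    size_t slot = (size_t)(blkno & (cache_nblks - 1));
    size_t room = cache_nblks - slot;   /* slots left up to the end, always >= 1 */
    size_t total = need + ra_want;

    return (total > room) ? room : total;
}
```

For example, with a 32-block cache a request for one block that maps to slot 31 yields a single-block read, which is the worst case described above.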
The implementation is activated for biosdisk and bioscd devices. In tests the speed boost is quite noticeable, especially for dosfs (the update for dosfs is not included in this diff), but it depends on the amount of memory allocated for the cache. To simplify testing, this code sets the loader heap quite large.
Note1: the dv_strategy() call has an extra argument for the offset. It was added to support partial block reads from dosfs; I did not include the dosfs update in this diff, to keep the changes logically separated.
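To show what an offset argument buys a caller such as dosfs, here is a hedged sketch of a strategy-style read that starts at a byte offset inside a block. It is not the dv_strategy() signature from this diff: demo_strategy(), struct demo_disk, the in-memory disk image, and the 512-byte block size are all assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLKSIZE 512u   /* assumed block size */
#define DEMO_NBLKS   8u     /* assumed size of the in-memory "disk" */

/* Stand-in for a device: the whole disk lives in memory (hypothetical). */
struct demo_disk {
    unsigned char image[DEMO_NBLKS * DEMO_BLKSIZE];
};

/*
 * Read `size` bytes starting `offset` bytes into block `blk`.
 * Returns 0 on success and stores the byte count in *rsize,
 * or -1 if the request falls outside the device.
 */
static int demo_strategy(void *devdata, uint64_t blk, size_t offset,
    size_t size, char *buf, size_t *rsize)
{
    struct demo_disk *d = devdata;
    size_t total = sizeof(d->image);

    if (blk >= DEMO_NBLKS)
        return -1;
    size_t start = (size_t)blk * DEMO_BLKSIZE;
    if (offset > total - start)
        return -1;
    start += offset;
    if (size > total - start)
        return -1;

    memcpy(buf, d->image + start, size);
    if (rsize != NULL)
        *rsize = size;
    return 0;
}
```

A real driver would read whole sectors from the device and copy out only the requested part; the in-memory image here only keeps the example short.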
Note2: in the current version, writes do not update cached blocks, as I'm not sure that is needed at all. It is not that important for the scope of the experiment, but it is easy to fix.
Note3: the BOOT2 preprocessor guard is a leftover from the illumos code, as I have somewhat reworked stage2. Also, this diff is presented as an experiment rather than as a ready-to-commit patch; if the idea turns out to be worth considering for integration, I assume some cleanup and updates will be in order anyway.