Previously, gfxfb_blt flushed the framebuffer on every call. Since a
single drawing operation may invoke gfxfb_blt multiple times, this
resulted in unnecessary flushes.
Instead, write updates to the shadow buffer (when present) and mark the
affected area as dirty. Flushing is deferred so multiple gfxfb_blt calls
can be coalesced into a single update. As before, only the dirty region
is flushed.
This fixes the slow bootloader problem on some platforms.
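
For illustration, the deferred-flush scheme described above can be
sketched roughly as follows. This is a minimal standalone model, not the
actual loader code: the names (shadow, hw_fb, dirty_rect, mark_dirty,
blt_fill, flush) and the tiny fixed-size buffers are all hypothetical,
and the dirty area is assumed to be tracked as a single bounding box.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FB_WIDTH  8
#define FB_HEIGHT 4

/* Hypothetical dirty-region record; x1/y1 are exclusive bounds. */
struct dirty_rect {
	uint32_t x0, y0, x1, y1;
	bool valid;
};

static uint32_t shadow[FB_HEIGHT][FB_WIDTH];	/* stand-in shadow buffer */
static uint32_t hw_fb[FB_HEIGHT][FB_WIDTH];	/* stand-in for the real framebuffer */
static struct dirty_rect dirty;
static unsigned flush_count;

/* Grow the dirty bounding box so that it covers the given rectangle. */
static void
mark_dirty(uint32_t x, uint32_t y, uint32_t w, uint32_t h)
{
	if (w == 0 || h == 0)
		return;
	if (!dirty.valid) {
		dirty = (struct dirty_rect){ x, y, x + w, y + h, true };
		return;
	}
	if (x < dirty.x0)
		dirty.x0 = x;
	if (y < dirty.y0)
		dirty.y0 = y;
	if (x + w > dirty.x1)
		dirty.x1 = x + w;
	if (y + h > dirty.y1)
		dirty.y1 = y + h;
}

/* Blt-like fill: write to the shadow buffer only and record the area. */
static void
blt_fill(uint32_t color, uint32_t x, uint32_t y, uint32_t w, uint32_t h)
{
	if (x >= FB_WIDTH || y >= FB_HEIGHT)
		return;
	if (w > FB_WIDTH - x)		/* clip to the buffer */
		w = FB_WIDTH - x;
	if (h > FB_HEIGHT - y)
		h = FB_HEIGHT - y;
	for (uint32_t r = 0; r < h; r++)
		for (uint32_t c = 0; c < w; c++)
			shadow[y + r][x + c] = color;
	mark_dirty(x, y, w, h);
}

/* Copy only the dirty region to the framebuffer, then reset it. */
static void
flush(void)
{
	if (!dirty.valid)
		return;
	for (uint32_t r = dirty.y0; r < dirty.y1; r++)
		memcpy(&hw_fb[r][dirty.x0], &shadow[r][dirty.x0],
		    (dirty.x1 - dirty.x0) * sizeof(uint32_t));
	dirty.valid = false;
	flush_count++;
}
```

In this model a caller issues any number of blt_fill calls and then
calls flush once. A single bounding box keeps the bookkeeping trivial,
at the cost of sometimes copying pixels between two distant updates that
did not actually change.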