Separate callouts into two groups: high-precision and low-precision callouts. (In this context, "low precision" is defined as any callout with an execution window that is a superset of the time covered by a single callout bucket.)
Currently, every time callout_process() executes, it must walk every entry in some number of upcoming buckets to find the earliest event that needs to execute. By separating callouts into high-precision and low-precision callouts, we can keep it from walking the low-precision entries. Instead, if a bucket holds any low-precision entries, we can simply use the bucket's own time span as their execution window.
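The scan described above might look roughly like the following. This is a simplified userland sketch, not the kernel code: struct entry, struct bucket, struct window, BUCKET_SPAN and next_window() are all hypothetical names, and the bucket-to-time mapping (bucket i covers [i * BUCKET_SPAN, (i + 1) * BUCKET_SPAN)) is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types; not the kernel's real structures. */
struct entry {
	uint64_t deadline;	/* earliest time the entry may run */
	uint64_t precision;	/* allowed slack after the deadline */
	struct entry *next;
};

struct bucket {
	struct entry *hi;	/* high-precision entries: walked one by one */
	unsigned lo_count;	/* low-precision entries: only counted here */
};

struct window {
	uint64_t first;		/* earliest time something may run */
	uint64_t last;		/* latest time by which something must run */
};

/* Assumed time span of one bucket, in arbitrary time units. */
#define BUCKET_SPAN	UINT64_C(100)

/*
 * Compute the execution window over buckets[0..nbuckets).  Returns
 * { UINT64_MAX, UINT64_MAX } when nothing is pending.
 */
static struct window
next_window(const struct bucket *buckets, size_t nbuckets)
{
	struct window w = { UINT64_MAX, UINT64_MAX };

	assert(buckets != NULL);
	for (size_t i = 0; i < nbuckets; i++) {
		/* High-precision entries still need individual inspection. */
		for (const struct entry *e = buckets[i].hi; e != NULL;
		    e = e->next) {
			if (e->deadline < w.first)
				w.first = e->deadline;
			if (e->deadline + e->precision < w.last)
				w.last = e->deadline + e->precision;
		}
		/*
		 * Low-precision entries are not walked.  Their window
		 * covers the whole bucket, so the bucket's own span is
		 * used instead.
		 */
		if (buckets[i].lo_count > 0) {
			uint64_t start = (uint64_t)i * BUCKET_SPAN;
			uint64_t end = start + BUCKET_SPAN;

			if (start < w.first)
				w.first = start;
			if (end < w.last)
				w.last = end;
		}
	}
	return (w);
}
```

In this sketch the cost of the scan depends only on the number of high-precision entries and the number of buckets, not on how many low-precision entries are queued.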
In addition, we now pre-calculate the deadline rather than recalculating it each time callout_process() examines the entry to extract timing information.
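As an illustration of the pre-calculation, the deadline can be stored once when the callout is armed, so that later scans only read a field. The names below (struct callout_sketch, callout_arm_sketch()) are hypothetical, and the saturating addition is an assumption of this sketch rather than something the change is known to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified callout; not the kernel's struct callout. */
struct callout_sketch {
	uint64_t time;		/* requested expiry time */
	uint64_t precision;	/* allowed slack */
	uint64_t deadline;	/* time + precision, computed once */
};

/*
 * Arm the callout and cache its deadline.  The addition saturates at
 * UINT64_MAX so that a huge precision cannot wrap around (an
 * assumption of this sketch).
 */
static void
callout_arm_sketch(struct callout_sketch *c, uint64_t time,
    uint64_t precision)
{
	assert(c != NULL);
	c->time = time;
	c->precision = precision;
	c->deadline = (precision > UINT64_MAX - time) ?
	    UINT64_MAX : time + precision;
}
```

A scanning loop can then compare c->deadline directly instead of adding the two fields on every pass.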
Finally, when adding a new callout to the same bucket currently being executed, stop resetting cc_exec_next to point to the current callout. That action caused callout_process() to begin walking the bucket from the beginning again. Instead, simply feed the callout time information into the variables shared with callout_process().
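A rough sketch of the last point, under simplifying assumptions: struct scan_state stands in for the variables shared with callout_process(), its exec_next field plays the role of cc_exec_next, and insert_into_running_bucket() is a hypothetical helper. None of these are the kernel's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry; not the kernel's struct callout. */
struct entry {
	uint64_t time;		/* earliest time the entry may run */
	uint64_t deadline;	/* pre-calculated latest time */
	struct entry *next;
};

/* Hypothetical stand-in for the state shared with the bucket scan. */
struct scan_state {
	struct entry *exec_next;	/* next entry the scan will visit */
	uint64_t first;			/* earliest pending time seen */
	uint64_t last;			/* earliest pending deadline seen */
};

/*
 * Insert e at the head of a bucket that is currently being scanned.
 * The scan cursor is left alone, so the scan does not restart from
 * the head; the new entry's timing is folded into the shared
 * variables instead.
 */
static void
insert_into_running_bucket(struct scan_state *s, struct entry **head,
    struct entry *e)
{
	assert(s != NULL && head != NULL && e != NULL);
	e->next = *head;
	*head = e;
	/* Deliberately no "s->exec_next = e;" here. */
	if (e->time < s->first)
		s->first = e->time;
	if (e->deadline < s->last)
		s->last = e->deadline;
}
```

The scan in progress never visits the new entry in this sketch, but its timing is still accounted for because first and last already reflect it.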