Compare commits

1 Commits

Author: J. Nick Koston
SHA1: 8666f1b3ed
Message: [core] Outline Scheduler::call cleanup slow path into cold combined helper

Fold the to_remove_empty_() check, the cleanup_slow_path_() call, and the
MAX_LOGICALLY_DELETED_ITEMS threshold check into a single hot-path
branch plus a cold outlined helper.

Before this change, the cleanup block in Scheduler::call compiled to
two independent memw + l32i sequences (one for to_remove_empty_() inside
cleanup_(), one for the separate to_remove_count_() check) because GCC
cannot CSE across the memw barriers that std::atomic<uint32_t>::load
emits on Xtensa.

A previous attempt at collapsing these into a single inline branch
(#15985) produced the intended memw count but grew Scheduler::call by a
handful of bytes and rearranged the control flow enough to nudge sched
up ~0.5 us/iter on gatetrigger in practice.

This version goes further: cleanup_slow_combined_ is annotated
noinline + cold so the entire slow path (both reads, both calls) is
pulled out of Scheduler::call entirely. The hot path becomes:

    memw; l32i a8, [to_remove_]; beqz skip
    call8 cleanup_slow_combined_   ; unlikely, cold

and Scheduler::call's body shrinks 344 B -> 332 B (-12 B, below the dev
baseline). Adjacent code (feed_wdt_slow_, etc.) stays in the same flash
region, avoiding the cache-layout side-effects that made earlier
attempts a net loss on busier configs.

The [[unlikely]] attribute on the branch plus the cold attribute on the
helper give the compiler permission to keep the skip path straight and
push the call out-of-line.
Date: 2026-04-24 17:11:36 -05:00
2 changed files with 30 additions and 8 deletions
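
The hot/cold split described in the commit message can be sketched outside the scheduler. This is a minimal illustration, not the ESPHome code: ToyScheduler, cleanup_slow, slow_calls, ticks and the relaxed memory order are all assumptions made for the example. It only shows the shape of the change, which is one atomic load plus one [[unlikely]] branch in the caller, with everything else in a noinline, cold helper.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the scheduler; every name here is illustrative.
struct ToyScheduler {
  std::atomic<uint32_t> to_remove{0};  // count of logically deleted items
  uint32_t slow_calls{0};              // how often the cold helper ran
  uint32_t ticks{0};                   // how often call() ran

  // Cold, never-inlined helper: holds all work that is only needed when
  // something is pending, so none of it is emitted inside call().
  __attribute__((noinline, cold)) void cleanup_slow() {
    ++slow_calls;
    // Assumption: the real cleanup would remove items; here we just clear.
    to_remove.store(0, std::memory_order_relaxed);
  }

  // Hot path: a single atomic load and a single branch when idle.
  // [[unlikely]] is a C++20 attribute.
  void call() {
    if (to_remove.load(std::memory_order_relaxed) != 0) [[unlikely]] {
      cleanup_slow();
    }
    ++ticks;  // placeholder for the rest of the loop body
  }
};

// Usage check: the helper runs only on the iteration that sees pending work.
inline void toy_hot_cold_demo() {
  ToyScheduler s;
  s.call();  // nothing pending, helper skipped
  assert(s.slow_calls == 0);
  s.to_remove.store(3, std::memory_order_relaxed);
  s.call();  // pending, helper runs once and clears the count
  s.call();  // count is zero again, helper skipped
  assert(s.slow_calls == 1);
  assert(s.ticks == 3);
}
```

Whether the compiler actually places the helper in a separate section and keeps the skip path as straight-line code depends on the toolchain and flags; the attributes are hints, so the size and timing figures quoted in the commit message should be read as measurements of that specific build.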

@@ -601,14 +601,16 @@ uint32_t HOT Scheduler::call(uint32_t now) {
}
#endif /* ESPHOME_DEBUG_SCHEDULER */
-// Cleanup removed items before processing
-// First try to clean items from the top of the heap (fast path)
-this->cleanup_();
-// If we still have too many cancelled items, do a full cleanup
-// This only happens if cancelled items are stuck in the middle/bottom of the heap
-if (this->to_remove_count_() >= MAX_LOGICALLY_DELETED_ITEMS) {
-this->full_cleanup_removed_items_();
+// Cleanup removed items before processing. Fast path: one atomic load +
+// branch; when nothing is pending, skip both slow-path calls entirely.
+// Previously the equivalent logic was split across cleanup_() (inline,
+// loads to_remove_) and a separate to_remove_count_() check for the MAX
+// threshold, which produced two memw+l32i pairs on Xtensa (GCC can't CSE
+// across std::atomic's memw barriers). Folding the slow work into one
+// outlined cold function keeps Scheduler::call's hot-path footprint small
+// and the I-cache happy.
+if (this->to_remove_count_() != 0) [[unlikely]] {
+this->cleanup_slow_combined_();
}
// IMPORTANT: This loop uses index-based access (items_[0]), NOT iterators.
// This is intentional — fired intervals are pushed back into items_ via
@@ -761,6 +763,21 @@ bool HOT Scheduler::cleanup_slow_path_() {
}
return !this->items_.empty();
}
+// Combined cold path for Scheduler::call. Only called when to_remove_ is
+// non-zero. Noinline + cold keeps the two calls and the re-read of
+// to_remove_ out of the main loop's hot path; the attribute also lets the
+// compiler push this code out to a rarely-touched flash region so the
+// scheduler's hot instructions stay in cache.
+void Scheduler::cleanup_slow_combined_() {
+// First sweep: drop cancelled items from the heap top.
+this->cleanup_slow_path_();
+// Re-read to_remove_ because cleanup_slow_path_ may have decremented it.
+// If cancelled items remain stuck below the top and the count crossed
+// the threshold, do a full sweep.
+if (this->to_remove_count_() >= MAX_LOGICALLY_DELETED_ITEMS) {
+this->full_cleanup_removed_items_();
+}
+}
Scheduler::SchedulerItem *HOT Scheduler::pop_raw_locked_() {
std::pop_heap(this->items_.begin(), this->items_.end(), SchedulerItem::cmp);
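
The two-stage sweep in cleanup_slow_combined_ (pop cancelled items off the heap top, re-read the count, then fall back to a full sweep past a threshold) can be sketched with a toy lazy-deletion heap. Everything below is hypothetical and single-threaded: ToyHeap, ToyItem, kMaxDeleted and the sweep functions are illustration names, and the real scheduler's item type, locking and threshold value are not shown in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyItem {
  uint32_t when;  // firing time, used as the heap key
  bool removed;   // logical-delete flag
};

// std::*_heap builds a max-heap, so invert the comparison to get a min-heap.
inline bool toy_later(const ToyItem &a, const ToyItem &b) { return a.when > b.when; }

struct ToyHeap {
  static constexpr uint32_t kMaxDeleted = 2;  // hypothetical threshold
  std::vector<ToyItem> items;
  uint32_t to_remove = 0;  // number of items flagged but still stored

  void push(uint32_t when) {
    items.push_back({when, false});
    std::push_heap(items.begin(), items.end(), toy_later);
  }

  // Logical delete: flag the first live item with this key and count it.
  void cancel(uint32_t when) {
    for (auto &it : items) {
      if (!it.removed && it.when == when) {
        it.removed = true;
        ++to_remove;
        return;
      }
    }
  }

  // First sweep: pop cancelled items only while they sit at the heap top.
  void sweep_top() {
    while (!items.empty() && items.front().removed) {
      std::pop_heap(items.begin(), items.end(), toy_later);
      items.pop_back();
      --to_remove;
    }
  }

  // Full sweep: erase every cancelled item, then rebuild the heap.
  void sweep_all() {
    items.erase(std::remove_if(items.begin(), items.end(),
                               [](const ToyItem &it) { return it.removed; }),
                items.end());
    std::make_heap(items.begin(), items.end(), toy_later);
    to_remove = 0;
  }

  // Combined path: cheap sweep first, full sweep only if the count that
  // remains afterwards has reached the threshold.
  void cleanup_combined() {
    sweep_top();
    if (to_remove >= kMaxDeleted) {
      sweep_all();
    }
  }
};

// Usage check covering the cheap sweep and the threshold fallback.
inline void toy_heap_demo() {
  ToyHeap h;
  for (uint32_t t : {10u, 20u, 30u, 40u}) h.push(t);
  h.cancel(10);  // cancelled item is at the top
  h.cleanup_combined();
  assert(h.items.size() == 3 && h.to_remove == 0);
  h.cancel(30);
  h.cancel(40);  // two cancelled items stuck below the top
  h.cleanup_combined();
  assert(h.items.size() == 1 && h.items.front().when == 20);
}
```

The re-read of the count after the first sweep matters in this sketch for the same reason the diff comment gives: the cheap sweep may already have brought the count under the threshold, in which case the full rebuild is skipped.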

@@ -316,6 +316,11 @@ class Scheduler {
}
// Slow path for cleanup_() when there are items to remove - defined in scheduler.cpp
bool cleanup_slow_path_();
+// Combined slow path for Scheduler::call: runs cleanup_slow_path_ then, if
+// cancelled items are still stuck below the heap top, full_cleanup_removed_items_.
+// Outlined (noinline, cold) so the Scheduler::call fast path stays one atomic
+// load + branch — keeping the I-cache hot for the common zero-to-remove case.
+void __attribute__((noinline, cold)) cleanup_slow_combined_();
// Slow path for process_to_add() when there are items to merge - defined in scheduler.cpp
void process_to_add_slow_path_();
// Remove and return the front item from the heap as a raw pointer.