Compare commits

...

3 Commits

Author SHA1 Message Date
J. Nick Koston
abefd0a90a [scheduler] Revert cleanup_slow_path_ return type change
Put cleanup_slow_path_'s return type back to bool (true when items_ is
non-empty after cleanup) so it is a proper slow path for the cleanup_()
wrapper. The MAX_LOGICALLY_DELETED_ITEMS threshold gate in
Scheduler::call() reads to_remove_count_() directly: it is a plain
atomic/uint32_t load (cheap on all platforms) and matches the
pre-refactor semantics of gating on the fresh post-cleanup value.
2026-04-23 05:43:06 -05:00
J. Nick Koston
4d9e297a0c [scheduler] Drop redundant process_defer_queue_ wrapper
After the snapshot-counter refactor, Scheduler::call() only invokes the
defer-queue slow path when snap_defer > 0 — it never calls the
fast-path wrapper. The wrapper is dead code; drop it and rename
process_defer_queue_slow_path_ back to process_defer_queue_ since
there is nothing left to disambiguate.
2026-04-23 05:40:02 -05:00
J. Nick Koston
ecbc249d7a [scheduler] Snapshot cross-thread counters once per Scheduler::call()
On ESPHOME_THREAD_MULTI_NO_ATOMICS (BK72xx is the sole target — ARMv5TE,
no LDREX/STREX, so std::atomic is off), the per-helper _empty_() fast
paths fall back to "always take the lock". That means Scheduler::call()
paid a separate FreeRTOS mutex round-trip for each of:

  - process_defer_queue_   (defer_empty_)
  - process_to_add         (to_add_empty_)
  - cleanup_               (to_remove_empty_)

on every iteration, even on idle ticks where all three counters are
zero. Each round-trip is ~5-10us on BK72xx, adding ~75ms/min of
main-loop overhead at ~3100 iter/min. This matched the measured gap
between BK72xx (sched=125ms/min) and RTL87xx (sched=19ms/min) for
identical code.
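
As a rough sanity check on that estimate, the arithmetic can be sketched as below. The helper name `lock_overhead_ms_per_min` and the 8 us midpoint are illustrative assumptions, not measured values; only the 5-10 us range, the three lock round-trips per iteration, and the ~3100 iter/min rate come from the message above.

```cpp
#include <cassert>

// Hypothetical helper (not part of ESPHome): estimated mutex overhead in
// milliseconds per minute of main-loop time.
constexpr double lock_overhead_ms_per_min(double round_trip_us, int locks_per_iter, double iters_per_min) {
  return round_trip_us * locks_per_iter * iters_per_min / 1000.0;  // us -> ms
}

constexpr int kLocksPerIter = 3;         // defer, to_add and to_remove checks
constexpr double kItersPerMin = 3100.0;  // loop rate quoted in the message

// Bounds from the quoted 5-10 us round-trip range.
constexpr double kLowMs = lock_overhead_ms_per_min(5.0, kLocksPerIter, kItersPerMin);
constexpr double kHighMs = lock_overhead_ms_per_min(10.0, kLocksPerIter, kItersPerMin);
// Assumed ~8 us round-trip, chosen only because it lands near ~75 ms/min.
constexpr double kMidMs = lock_overhead_ms_per_min(8.0, kLocksPerIter, kItersPerMin);

static_assert(kLowMs < 75.0 && 75.0 < kHighMs, "75 ms/min sits inside the estimated range");

// Runtime restatement of the same check, for use from a test harness.
inline void check_estimate() { assert(kMidMs > 74.0 && kMidMs < 75.0); }
```

Under these assumptions the range works out to roughly 46-93 ms/min, so the ~75 ms/min figure corresponds to a round-trip of about 8 us.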

Snapshot all three counters once at the top of Scheduler::call():

  - NO_ATOMICS: single LockGuard, read three plain uint32_t fields.
  - ATOMICS:    three relaxed atomic loads (free, same memory order as
                the per-helper fast paths).
  - SINGLE:     untouched — the existing direct container checks are
                already cheap with no concurrent writers.

The three skip-work gates below then branch on the snapshot instead of
each calling its own _empty_() (and re-locking on NO_ATOMICS). When a
gate fires, the slow path is invoked directly so it still acquires the
lock fresh to read container state.
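
The difference in lock traffic on an idle tick can be illustrated with a minimal, single-threaded sketch. Everything here is a hypothetical stand-in: `CounterBank`, `Snapshot` and the `idle_tick_*` functions are not ESPHome names, `std::mutex` replaces the FreeRTOS-backed lock, and the acquisition counter exists only for instrumentation.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical snapshot of the three skip-work counters.
struct Snapshot {
  uint32_t defer;
  uint32_t add;
  uint32_t remove;
};

// Hypothetical stand-in for the scheduler state on a NO_ATOMICS build.
class CounterBank {
 public:
  // Old shape: each helper takes the lock just to test one counter.
  bool defer_empty() { return locked_read(defer_count_) == 0; }
  bool to_add_empty() { return locked_read(to_add_count_) == 0; }
  bool to_remove_empty() { return locked_read(to_remove_) == 0; }

  // New shape: one lock acquisition covers all three reads.
  Snapshot snapshot() {
    std::lock_guard<std::mutex> guard(lock_);
    ++lock_acquisitions_;
    return Snapshot{defer_count_, to_add_count_, to_remove_};
  }

  uint32_t lock_acquisitions() const { return lock_acquisitions_; }

 private:
  uint32_t locked_read(const uint32_t &field) {
    std::lock_guard<std::mutex> guard(lock_);
    ++lock_acquisitions_;
    return field;
  }

  std::mutex lock_;
  uint32_t defer_count_{0};
  uint32_t to_add_count_{0};
  uint32_t to_remove_{0};
  uint32_t lock_acquisitions_{0};  // instrumentation only
};

// Locks taken on an idle tick when every gate calls its own helper.
uint32_t idle_tick_per_helper(CounterBank &bank) {
  const uint32_t before = bank.lock_acquisitions();
  uint32_t slow_paths = 0;
  if (!bank.defer_empty()) ++slow_paths;
  if (!bank.to_add_empty()) ++slow_paths;
  if (!bank.to_remove_empty()) ++slow_paths;
  assert(slow_paths == 0);  // idle: nothing to do
  return bank.lock_acquisitions() - before;
}

// Locks taken on an idle tick when the gates branch on one snapshot.
uint32_t idle_tick_snapshot(CounterBank &bank) {
  const uint32_t before = bank.lock_acquisitions();
  const Snapshot snap = bank.snapshot();
  uint32_t slow_paths = 0;
  if (snap.defer > 0) ++slow_paths;
  if (snap.add > 0) ++slow_paths;
  if (snap.remove > 0) ++slow_paths;
  assert(slow_paths == 0);  // idle: nothing to do
  return bank.lock_acquisitions() - before;
}
```

With all counters at zero, `idle_tick_per_helper` takes the lock three times and `idle_tick_snapshot` takes it once, which is the saving described above.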

After process_defer_queue_slow_path_ runs, re-snapshot to_add_count_ and
to_remove_: callbacks dispatched by the defer queue can call
set_timeout/set_interval/cancel_*, which mutate those counters. Without
that refresh, the subsequent process_to_add / cleanup_ gates would skip
freshly queued work for one tick. The defer queue itself is drained
inside the slow path, so snap_defer is consumed by that single call.
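
Why the refresh matters can be shown with a toy, single-threaded model. `ToyScheduler` and its members are hypothetical names for illustration only; the real scheduler keeps its state behind a lock or atomics and has more gates than the one modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy model of the defer gate followed by the to_add gate.
struct ToyScheduler {
  std::vector<std::function<void()>> defer_queue;
  uint32_t to_add_count{0};    // work queued for the next process_to_add()
  uint32_t processed_adds{0};  // how many queued items have been picked up

  // Stand-in for set_timeout/set_interval: queues one item.
  void set_timeout() { ++to_add_count; }

  // Drains the defer queue; callbacks may queue new work.
  void process_defer_queue() {
    auto pending = std::move(defer_queue);
    defer_queue.clear();
    for (auto &cb : pending)
      cb();
  }

  void process_to_add() {
    processed_adds += to_add_count;
    to_add_count = 0;
  }

  // One scheduler tick. `resnapshot` toggles the refresh after the defer queue.
  void call(bool resnapshot) {
    const uint32_t snap_defer = static_cast<uint32_t>(defer_queue.size());
    uint32_t snap_add = to_add_count;
    if (snap_defer > 0) {
      process_defer_queue();
      if (resnapshot)
        snap_add = to_add_count;  // pick up work queued by defer callbacks
    }
    if (snap_add > 0)
      process_to_add();
  }
};
```

In this model, a deferred callback that calls `set_timeout()` is picked up in the same tick when `resnapshot` is true, and only on the following tick when it is false.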

Measured on BK7238/BK7231N while debugging overhead alongside
libretiny-eu/libretiny#360.
2026-04-23 05:38:10 -05:00
2 changed files with 54 additions and 24 deletions

View File

@@ -476,7 +476,7 @@ void Scheduler::compact_defer_queue_locked_() {
   // (saves ~156 bytes flash). Erasing from the end is O(1) - no shifting needed.
   this->defer_queue_.erase(this->defer_queue_.begin() + remaining, this->defer_queue_.end());
 }
-void HOT Scheduler::process_defer_queue_slow_path_(uint32_t &now) {
+void HOT Scheduler::process_defer_queue_(uint32_t &now) {
   // Process defer queue to guarantee FIFO execution order for deferred items.
   // Previously, defer() used the heap which gave undefined order for equal timestamps,
   // causing race conditions on multi-core systems (ESP32, BK7200).
@@ -534,13 +534,29 @@ void HOT Scheduler::process_defer_queue_slow_path_(uint32_t &now) {
 #endif /* not ESPHOME_THREAD_SINGLE */
 uint32_t HOT Scheduler::call(uint32_t now) {
+  // Snapshot the skip-work counters once up front so the gates below don't
+  // each re-lock on NO_ATOMICS. See snapshot_counters_ for per-platform cost.
+  uint32_t snap_add;
+  uint32_t snap_remove;
 #ifndef ESPHOME_THREAD_SINGLE
-  this->process_defer_queue_(now);
-#endif /* not ESPHOME_THREAD_SINGLE */
+  uint32_t snap_defer;
+  this->snapshot_counters_(snap_defer, snap_add, snap_remove);
+  if (snap_defer > 0) {
+    this->process_defer_queue_(now);
+    // Defer callbacks may set_timeout/set_interval/cancel_*, mutating the
+    // other two counters. Re-snapshot.
+    this->snapshot_counters_(snap_defer, snap_add, snap_remove);
+  }
+#else
+  this->snapshot_counters_(snap_add, snap_remove);
+#endif
   // Extend the caller's 32-bit timestamp to 64-bit for scheduler operations
   const auto now_64 = this->millis_64_from_(now);
-  this->process_to_add();
+  if (snap_add > 0)
+    this->process_to_add_slow_path_();
   // Track if any items were added to to_add_ during callbacks
   bool has_added_items = false;
@@ -582,14 +598,14 @@ uint32_t HOT Scheduler::call(uint32_t now) {
   }
 #endif /* ESPHOME_DEBUG_SCHEDULER */
-  // Cleanup removed items before processing
-  // First try to clean items from the top of the heap (fast path)
-  this->cleanup_();
-  // If we still have too many cancelled items, do a full cleanup
-  // This only happens if cancelled items are stuck in the middle/bottom of the heap
-  if (this->to_remove_count_() >= MAX_LOGICALLY_DELETED_ITEMS) {
-    this->full_cleanup_removed_items_();
+  // Cleanup removed items from the top of the heap, then escalate to a full
+  // walk if there are still too many cancelled items stuck in the middle.
+  // to_remove_count_ is a cheap plain read (atomic relaxed load on ATOMICS,
+  // direct field read on NO_ATOMICS/SINGLE).
+  if (snap_remove > 0) {
+    this->cleanup_slow_path_();
+    if (this->to_remove_count_() >= MAX_LOGICALLY_DELETED_ITEMS)
+      this->full_cleanup_removed_items_();
   }
// IMPORTANT: This loop uses index-based access (items_[0]), NOT iterators.
// This is intentional — fired intervals are pushed back into items_ via

View File

@@ -413,17 +413,9 @@ class Scheduler {
 #ifndef ESPHOME_THREAD_SINGLE
   // Process defer queue for FIFO execution of deferred items.
   // IMPORTANT: This method should only be called from the main thread (loop task).
-  // Inlined: the fast path (nothing deferred) is just an atomic load check.
-  inline void ESPHOME_ALWAYS_INLINE HOT process_defer_queue_(uint32_t &now) {
-    // Fast path: nothing to process, avoid lock entirely.
-    // Worst case is a one-loop-iteration delay before newly deferred items are processed.
-    if (this->defer_empty_())
-      return;
-    this->process_defer_queue_slow_path_(now);
-  }
-  // Slow path for process_defer_queue_() - defined in scheduler.cpp
-  void process_defer_queue_slow_path_(uint32_t &now);
+  // The fast path (nothing deferred) is handled by Scheduler::call()'s snapshot
+  // gate; this is only invoked when the snapshot saw defer_count_ > 0.
+  void process_defer_queue_(uint32_t &now);
   // Helper to cleanup defer_queue_ after processing.
   // Keeps the common clear() path inline, outlines the rare compaction to keep
@@ -446,7 +438,29 @@ class Scheduler {
   // IMPORTANT: Caller must hold the scheduler lock before calling this function.
   // IMPORTANT: Must not be inlined - rare path, outlined to keep it out of the hot instruction cache lines.
   void __attribute__((noinline)) compact_defer_queue_locked_();
-#endif /* not ESPHOME_THREAD_SINGLE */
+  // Snapshot skip-work counters for Scheduler::call(). NO_ATOMICS: one lock
+  // for all three reads. ATOMICS: three relaxed loads, free.
+  inline void ESPHOME_ALWAYS_INLINE HOT snapshot_counters_(uint32_t &defer, uint32_t &add, uint32_t &remove) {
+#ifdef ESPHOME_THREAD_MULTI_NO_ATOMICS
+    LockGuard guard{this->lock_};
+    defer = this->defer_count_;
+    add = this->to_add_count_;
+    remove = this->to_remove_;
+#else /* ESPHOME_THREAD_MULTI_ATOMICS */
+    defer = this->defer_count_.load(std::memory_order_relaxed);
+    add = this->to_add_count_.load(std::memory_order_relaxed);
+    remove = this->to_remove_.load(std::memory_order_relaxed);
+#endif
+  }
+#else /* ESPHOME_THREAD_SINGLE */
+  // SINGLE form — two direct reads, no defer queue. Lets call() use the
+  // same snap_add / snap_remove gates on every platform.
+  inline void ESPHOME_ALWAYS_INLINE HOT snapshot_counters_(uint32_t &add, uint32_t &remove) {
+    add = static_cast<uint32_t>(this->to_add_.size());
+    remove = this->to_remove_;
+  }
+#endif /* ESPHOME_THREAD_SINGLE */
   // Helper to check if item is marked for removal (platform-specific)
   // Returns true if item should be skipped, handles platform-specific synchronization