Compare commits

...

7 Commits

Author SHA1 Message Date
J. Nick Koston
4ea417966d [core] Poison brace-depth tracking once it goes negative
Address Copilot review: the prior comment promised that a negative depth would never re-enable splitting, but an arithmetically-balanced later { could bring depth back to 0 and resume flushing mid-stream. Track an explicit 'poisoned' flag set once depth < 0 that permanently disables further flushes for the rest of the input. Adds a regression test where a leading } and a later { would have re-enabled splitting without the poison flag.
2026-04-17 19:01:12 -05:00
J. Nick Koston
6446f309c1 [core] Replace mutable-list-flag pattern with _ComponentGroup dataclass
The mutable-list-of-bool trick for rebinding-safe flag mutation works but reads poorly. Replace with a _ComponentGroup dataclass carrying lines + unsafe + no_split fields. cpp_main_section now reads as a straight iteration over typed groups; no apologetic comments needed.
2026-04-17 18:56:59 -05:00
J. Nick Koston
093c34d4a4 [core] Review cleanup: docstring accuracy, rationale comments, unsafe+no_split test
- Fix the ComponentMarker docstring's incomplete 'either placement-news or mutates a global' claim — acknowledge that function-local patterns also exist and note the bare-local detection covers them.
- Document that _emits_bare_local's RawExpression detection is intentionally safety-biased: false negatives break compilation, false positives just keep a slightly larger IIFE. Note the CallExpression(..., RawExpression) negative case explicitly.
- Explain the mutable-list-flag pattern in cpp_main_section — dataclass would read cleaner but the pattern is localized.
- Add regression test for a group with BOTH IIFEUnsafeStatement and a bare-local: unsafe wins (flat emission) because a return inside any IIFE, even a single big one, only exits the lambda.
2026-04-17 18:55:39 -05:00
J. Nick Koston
3ab935bebb [core] Expose IIFE_MAX_STATEMENTS constant and derive test sizes from it
Tests were hardcoding 120 statements and expecting 3 sub-chunks from a 50-cap. Extract the cap as a named module constant and compute the test-input size from it, so bumping the cap doesn't silently invalidate the tests.
2026-04-17 18:50:41 -05:00
J. Nick Koston
bb0067f517 [core] Strengthen RawExpression-as-arg test
Exercise the actual CallExpression(..., RawExpression) pattern that components use for passing raw arguments. The previous test used RawStatement filler which didn't exercise the detection path we wanted to assert doesn't trigger.
2026-04-17 18:49:04 -05:00
J. Nick Koston
550f6e7c72 [core] Detect bare-local emission and disable sub-split for those groups
A component's group is wrapped in a single IIFE with no sub-splitting when its to_code emits any of: scope-brace RawStatement, direct RawExpression via cg.add (raw bare-local or field-assignment like 'tz.field = x'), or typed AssignmentExpression (cg.variable). Detection is content-aware so entity_helpers' inline-comment RawStatements and RawExpression-as-CallExpression-arg don't false-positive. Adds 4 regression tests covering each detection path and the non-triggering inverse cases.
2026-04-17 18:46:30 -05:00
J. Nick Koston
5f2582efcd [safe_mode] Fix setup()-exit return getting trapped in IIFE
safe_mode emits `if (should_enter_safe_mode(...)) return` via
cg.add(RawExpression(...)) to short-circuit the rest of setup() and
boot into safe mode. With setup() split into per-component IIFEs,
that `return` was only exiting the lambda, so the rest of setup() ran
anyway — breaking safe-mode recovery.

Add IIFEUnsafeStatement, a Statement wrapper that marks its containing
component's block for flat emission (no IIFE). safe_mode wraps its
return expression in it. cpp_main_section detects any such statement
in a group and emits that group flat so control-flow constructs like
`return` still affect setup() itself.

IIFEUnsafeStatement.__str__ routes its inner through statement() so
bare Expression subclasses pick up the terminating semicolon. Reported
by @swoboda1337.
2026-04-17 18:12:14 -05:00
5 changed files with 329 additions and 25 deletions

View File

@@ -13,6 +13,7 @@ from esphome.cpp_generator import ( # noqa: F401
ComponentMarker,
Expression,
FlashStringLiteral,
IIFEUnsafeStatement,
LineComment,
LogStringLiteral,
MockObj,

View File

@@ -87,7 +87,10 @@ async def to_code(config):
config[CONF_REBOOT_TIMEOUT],
config[CONF_BOOT_IS_GOOD_AFTER],
)
cg.add(RawExpression(f"if ({condition}) return"))
# Wrap in IIFEUnsafeStatement so cpp_main_section emits this
# component's block flat rather than inside an IIFE lambda —
# the `return` must exit setup() itself, not just the lambda.
cg.add(cg.IIFEUnsafeStatement(RawExpression(f"if ({condition}) return")))
CORE.data[CONF_SAFE_MODE] = {}
CORE.data[CONF_SAFE_MODE][KEY_PAST_SAFE_MODE] = True

View File

@@ -1,5 +1,6 @@
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field
import logging
import math
import os
@@ -531,17 +532,96 @@ class Library:
return self
def _wrap_in_iifes(lines: list[str], max_statements: int) -> list[str]:
# Cap on the number of statements in a single IIFE chunk when a
# component's to_code body is sub-split. Picks a frame-size sweet spot
# on esp32-s3 — large enough that most components fit in one chunk and
# small enough that heavy sensor platforms (many filter registrations)
# don't produce a chunk with a very large spill frame.
IIFE_MAX_STATEMENTS = 50
@dataclass
class _ComponentGroup:
"""A contiguous run of statements emitted by one component's to_code."""
lines: list[str] = field(default_factory=list)
# True when the group contains a statement that must affect setup()'s
# own control flow (e.g. safe_mode's `return`). Emit the group flat,
# bypassing IIFE wrapping entirely.
unsafe: bool = False
# True when the group contains a statement that may declare a
# function-local whose lifetime extends past the current statement
# (scope-brace RawStatement, direct RawExpression, typed
# AssignmentExpression). Wrap the group in a single IIFE without
# sub-splitting so the declaration and any later references stay
# in the same lambda.
no_split: bool = False
def _emits_bare_local(exp: "Statement") -> bool:
"""True if ``exp`` emits a scope brace or bare-raw construct that may
declare a function-local whose lifetime extends past the current
statement. Components that emit any such statement must not be
sub-split — later references within the same ``to_code`` would land
in a different IIFE and fail to compile.
The detection is intentionally safety-biased: false negatives cause
silent broken C++, false positives just keep a component in one
slightly larger IIFE. Any ``cg.add(RawExpression(...))`` disables
sub-splitting for its group regardless of whether the raw text
actually references a local, because the chunker can't introspect
arbitrary raw text."""
from esphome.cpp_generator import (
AssignmentExpression,
ExpressionStatement,
RawExpression,
RawStatement,
)
# Scope braces from cg.with_local_variable() or inline scope blocks
# (e.g. time's tz pattern). Content-aware so RawStatements emitted
# for "call(); // comment" (entity_helpers) don't false-positive.
if isinstance(exp, RawStatement) and str(exp).strip() in ("{", "}"):
return True
# cg.add(RawExpression(...)) — bare raw text, e.g.
# `time::ParsedTimezone tz{}` or `tz.field = ...`. CORE.add wraps
# a passed Expression in an ExpressionStatement; when the inner is
# a RawExpression the author is emitting uninterpreted text that
# may reference a local declared elsewhere in the same block. A
# RawExpression passed as a CallExpression argument does NOT land
# here (its ExpressionStatement's .expression is the CallExpression),
# so value-pass patterns like `var.set_program(RawExpression("&foo"))`
# continue to sub-split normally.
if isinstance(exp, ExpressionStatement) and isinstance(
exp.expression, RawExpression
):
return True
# cg.variable(id, rhs) — emits ``Type id = rhs;`` as a function-local.
return (
isinstance(exp, ExpressionStatement)
and isinstance(exp.expression, AssignmentExpression)
and exp.expression.type is not None
)
def _wrap_in_iifes(lines: list[str], max_statements: int | None) -> list[str]:
"""Wrap ``lines`` in ``[]() {...}();`` IIFEs of up to ``max_statements``
each. Never splits inside a brace-balanced block (e.g. the ``{`` / ``}``
pair from ``cg.with_local_variable()``), so an IIFE may exceed the cap
when a block straddles it. Comment-only chunks pass through verbatim.
each, or in a single IIFE when ``max_statements`` is ``None``. Never
splits inside a brace-balanced block (e.g. the ``{`` / ``}`` pair from
``cg.with_local_variable()``), so an IIFE may exceed the cap when a
block straddles it. Comment-only chunks pass through verbatim.
No ``noinline`` attribute — GCC's inliner re-folds small chunks freely,
keeping flash small without regressing peak stack."""
out: list[str] = []
chunk: list[str] = []
depth = 0
depth: int = 0
# Once depth goes negative we stop trusting the brace count and
# keep everything remaining in one final IIFE. A later ``{`` could
# arithmetically bring depth back to 0, but by that point the brace
# tracking is already unreliable — re-enabling mid-stream splits
# could land between a declaration and its use.
poisoned: bool = False
def flush() -> None:
if not chunk:
@@ -557,11 +637,16 @@ def _wrap_in_iifes(lines: list[str], max_statements: int) -> list[str]:
for line in lines:
chunk.append(line)
# Count { and } per line so inline control flow (e.g. `if (cond) {`)
# and balanced inline lambdas are tracked correctly. If depth ever
# goes negative (unbalanced input) we never return to 0 and the
# rest falls through into a single final IIFE — safe fallback.
# and balanced inline lambdas are tracked correctly.
depth += line.count("{") - line.count("}")
if depth == 0 and len(chunk) >= max_statements:
if depth < 0:
poisoned = True
if (
not poisoned
and max_statements is not None
and depth == 0
and len(chunk) >= max_statements
):
flush()
flush()
return out
@@ -1038,29 +1123,63 @@ class EsphomeCore:
self.data[KEY_CONTROLLER_REGISTRY_COUNT] = controller_count + 1
@property
def cpp_main_section(self):
from esphome.cpp_generator import ComponentMarker, statement
def cpp_main_section(self) -> str:
from esphome.cpp_generator import (
ComponentMarker,
IIFEUnsafeStatement,
statement,
)
# Split main_statements at ComponentMarker sentinels and wrap each
# component's group in an IIFE, sub-splitting at 50 statements so
# a single heavy component (e.g. a sensor platform with many
# filter registrations) can't blow the peak chunk frame.
#
# Two escape hatches control whether a component's group is safe
# to sub-split:
#
# - IIFEUnsafeStatement (e.g. safe_mode's setup-scope `return`):
# the whole group must stay at setup() scope so the statement
# affects setup()'s control flow, not the lambda's. Emit flat.
#
# - Any statement that may declare a function-local: a bare
# ``{`` / ``}`` RawStatement (from ``cg.with_local_variable``,
# time's inline tz block, etc.), a direct ``RawExpression``
# passed to ``cg.add`` (raw bare-local or field-assignment
# emission like ``time::ParsedTimezone tz`` followed by
# ``tz.field = ...``), or a typed ``AssignmentExpression``
# (``cg.variable`` emitting ``Type id = rhs;``). Each signals
# "this group's body may contain bare names whose scope is the
# enclosing IIFE"; wrap the whole group in one IIFE with no
# sub-split so the declaration and any later references stay
# together.
prefix: list[str] = []
components: list[list[str]] = []
current = prefix
components: list[_ComponentGroup] = []
current: list[str] = prefix
group: _ComponentGroup | None = None
for exp in self.main_statements:
if isinstance(exp, ComponentMarker):
current = []
components.append(current)
group = _ComponentGroup()
components.append(group)
current = group.lines
continue
if group is not None:
if isinstance(exp, IIFEUnsafeStatement):
group.unsafe = True
if _emits_bare_local(exp):
group.no_split = True
current.append(str(statement(exp)).rstrip())
if not components:
return "\n".join(prefix) + "\n\n"
pieces = list(prefix)
for body in components:
pieces.extend(_wrap_in_iifes(body, max_statements=50))
pieces: list[str] = list(prefix)
for g in components:
if g.unsafe:
pieces.extend(g.lines)
else:
cap = None if g.no_split else IIFE_MAX_STATEMENTS
pieces.extend(_wrap_in_iifes(g.lines, max_statements=cap))
return "\n".join(pieces) + "\n\n"
@property

View File

@@ -434,6 +434,25 @@ class LineComment(Statement):
return "\n".join(parts)
class IIFEUnsafeStatement(Statement):
"""Statement that must not be placed inside an IIFE lambda when
``cpp_main_section`` chunks ``setup()``. Causes the containing
component's block to be emitted flat (no IIFE), so constructs that
rely on exiting ``setup()`` directly — e.g. safe_mode's
``if (should_enter_safe_mode(...)) return;`` — still work.
Accepts either a ``Statement`` or a bare ``Expression``; bare
expressions are wrapped so they terminate with a semicolon."""
__slots__ = ("inner",)
def __init__(self, inner: Expression | Statement) -> None:
self.inner = inner
def __str__(self) -> str:
return str(statement(self.inner))
class ComponentMarker(Statement):
"""Chunking-boundary sentinel. ``cpp_main_section`` wraps the
statements between two markers in an IIFE to shorten temporary
@@ -441,15 +460,19 @@ class ComponentMarker(Statement):
Grouping is best-effort: ``flush_tasks`` can interleave coroutines
on ``await``, so a component's later statements may land in another
component's chunk. Safe because every statement either placement-
news into static storage or mutates a file-scope global."""
component's chunk. This is safe for the dominant codegen patterns
(placement-new into static storage, assignment to a file-scope
global); patterns that depend on function-local state within the
IIFE scope (cg.variable, with_local_variable, raw bare locals)
are kept together by the bare-local detection in cpp_main_section
so they aren't split across sibling lambdas."""
__slots__ = ("name",)
def __init__(self, name: str):
def __init__(self, name: str) -> None:
self.name = name
def __str__(self):
def __str__(self) -> str:
return f"// component-marker: {self.name}"
@@ -515,10 +538,15 @@ def literal(name: str) -> "MockObj":
def variable(
id_: ID, rhs: SafeExpType, type_: "MockObj" = None, register=True
id_: ID, rhs: SafeExpType, type_: "MockObj" = None, register: bool = True
) -> "MockObj":
"""Declare a new variable, not pointer type, in the code generation.
Emits a function-local declaration ``Type id = rhs;`` inside setup().
``cpp_main_section`` detects typed ``AssignmentExpression`` and
disables sub-chunking for the component's group, so later references
to the local within the same ``to_code`` stay visible.
:param id_: The ID used to declare the variable.
:param rhs: The expression to place on the right hand side of the assignment.
:param type_: Manually define a type for the variable, only use this when it's not possible

View File

@@ -7,7 +7,16 @@ import pytest
from strategies import mac_addr_strings
from esphome import const, core
from esphome.cpp_generator import ComponentMarker, RawStatement
from esphome.cpp_generator import (
AssignmentExpression,
CallExpression,
ComponentMarker,
ExpressionStatement,
IIFEUnsafeStatement,
MockObj,
RawExpression,
RawStatement,
)
class TestHexInt:
@@ -934,6 +943,20 @@ def test_wrap_in_iifes_unbalanced_braces_fall_through() -> None:
assert [line for line in result if line in lines] == lines
def test_wrap_in_iifes_negative_depth_stays_poisoned() -> None:
# Once depth goes negative the brace tracker is unreliable; a later
# arithmetic return to depth 0 must not re-enable splitting. Here
# "}" drives depth to -1 immediately, then "{" later brings depth
# back to 0 arithmetically. With max=1 every statement would flush
# if splitting were still enabled — assert everything emits as one
# IIFE because the poisoned flag stayed set.
lines = ["}", "a();", "{", "b();", "}", "c();"]
result = core._wrap_in_iifes(lines, max_statements=1)
assert sum(1 for line in result if line == "[]() {") == 1
assert result[0] == "[]() {"
assert result[-1] == "}();"
def test_wrap_in_iifes_never_splits_inline_brace_lines() -> None:
# Defensive: if codegen ever emits control flow with braces on the
# same line (if/else/for), the depth tracker should keep the whole
@@ -1011,6 +1034,136 @@ def test_cpp_main_section_component_marker_wraps_in_iife() -> None:
assert "component-marker" not in out
_CHUNK_COUNT = 3 # number of sub-chunks to force when testing splitting
# Statement count that produces exactly _CHUNK_COUNT IIFEs when sub-split
# at IIFE_MAX_STATEMENTS (full chunks at the cap, no partial).
_STATEMENTS_OVER_CAP = core.IIFE_MAX_STATEMENTS * _CHUNK_COUNT
def test_cpp_main_section_scope_brace_raw_disables_sub_split() -> None:
# A group containing scope-brace RawStatements (e.g. `{` / `}` from
# with_local_variable) must stay in one IIFE regardless of size so
# the scope bounds and any locals between them stay together.
target = core.EsphomeCore()
stmts: list = [ComponentMarker("wifi"), RawStatement("{")]
stmts.extend(RawStatement(f"s{i}();") for i in range(_STATEMENTS_OVER_CAP))
stmts.append(RawStatement("}"))
target.main_statements = stmts
out = target.cpp_main_section
assert out.count("[]() {") == 1
assert out.count("}();") == 1
def test_cpp_main_section_inline_comment_raw_still_sub_splits() -> None:
# entity_helpers emits `call(); // flags` as RawStatement for inline
# comments. Those shouldn't flag the group as scope-using — the
# content-aware check only triggers on bare `{` / `}`.
target = core.EsphomeCore()
stmts: list = [ComponentMarker("sensor")]
stmts.extend(
RawStatement(f"s{i}(); // flags") for i in range(_STATEMENTS_OVER_CAP)
)
target.main_statements = stmts
out = target.cpp_main_section
assert out.count("[]() {") == _CHUNK_COUNT
def test_cpp_main_section_raw_expression_disables_sub_split() -> None:
# cg.add(RawExpression(...)) — e.g. `time::ParsedTimezone tz` followed
# by `tz.field = ...` — is raw bare text that may reference a local
# declared elsewhere in the same group. Keep the group in one IIFE.
target = core.EsphomeCore()
stmts: list = [
ComponentMarker("time"),
ExpressionStatement(RawExpression("time::ParsedTimezone tz{}")),
]
stmts.extend(
ExpressionStatement(RawExpression(f"tz.field_{i} = {i}"))
for i in range(_STATEMENTS_OVER_CAP)
)
target.main_statements = stmts
out = target.cpp_main_section
assert out.count("[]() {") == 1
def test_cpp_main_section_raw_expression_as_call_arg_still_sub_splits() -> None:
# RawExpression passed as an argument to a method call (e.g.
# `var.set_program(RawExpression("&foo"))`) produces
# `ExpressionStatement(CallExpression(..., RawExpression))` — the
# outer expression is a CallExpression, not a RawExpression, so
# the group is still sub-splittable.
target = core.EsphomeCore()
stmts: list = [ComponentMarker("rp2040_pio_led_strip")]
stmts.extend(
ExpressionStatement(
CallExpression(MockObj(f"var_{i}.set_program"), RawExpression("&foo"))
)
for i in range(_STATEMENTS_OVER_CAP)
)
target.main_statements = stmts
out = target.cpp_main_section
assert out.count("[]() {") == _CHUNK_COUNT
def test_cpp_main_section_typed_assignment_disables_sub_split() -> None:
# cg.variable(id, rhs) emits `Type id = rhs;` via
# ExpressionStatement(AssignmentExpression(type=..., ...)). That's a
# function-local whose name must stay visible across all uses in
# the component — no sub-split.
target = core.EsphomeCore()
typed_assign = ExpressionStatement(
AssignmentExpression(MockObj("int"), "", MockObj("x"), MockObj("42"))
)
stmts: list = [ComponentMarker("custom"), typed_assign]
stmts.extend(RawStatement(f"use_x_{i}();") for i in range(_STATEMENTS_OVER_CAP))
target.main_statements = stmts
out = target.cpp_main_section
assert out.count("[]() {") == 1
def test_cpp_main_section_iife_unsafe_wins_over_no_split() -> None:
# A group that triggers BOTH flags (IIFEUnsafeStatement present AND
# a bare-local emission) must still be emitted flat — the unsafe
# flag wins because a `return` inside any IIFE, even a single big
# one, only exits the lambda.
target = core.EsphomeCore()
target.main_statements = [
ComponentMarker("safe_mode_with_local"),
RawStatement("{"), # would trigger no_split
RawStatement("new_foo();"),
RawStatement("}"),
IIFEUnsafeStatement(RawExpression("if (cond) return")),
]
out = target.cpp_main_section
assert "[]() {" not in out
assert "if (cond) return;" in out
def test_cpp_main_section_iife_unsafe_statement_emits_component_flat() -> None:
# A component that emits IIFEUnsafeStatement (e.g. safe_mode with
# `if (...) return;`) must be emitted flat — a `return` inside an
# IIFE would only exit the lambda, not setup().
target = core.EsphomeCore()
target.main_statements = [
ComponentMarker("logger"),
RawStatement("new_logger();"),
ComponentMarker("safe_mode"),
RawStatement("new_safe_mode();"),
IIFEUnsafeStatement(RawExpression("if (entering) return")),
ComponentMarker("sensor"),
RawStatement("new_sensor();"),
]
out = target.cpp_main_section
# logger and sensor wrapped; safe_mode flat.
assert out.count("[]() {") == 2
# safe_mode's statements appear at top level, not indented in a lambda.
assert "new_safe_mode();" in out
assert "if (entering) return;" in out
# The IIFEUnsafeStatement wrapper picks up the trailing semicolon
# via statement() when inner is a bare Expression.
assert "if (entering) return\n" not in out
def test_cpp_main_section_comment_only_component_omits_iife() -> None:
# A component that emits only a ComponentMarker (no statements) adds
# nothing to the generated output. A neighboring component with