Skip to content

Remove Status.SUCCESS guards everywhere#678

Draft
TheJulianJES wants to merge 4 commits into
zigpy:devfrom
TheJulianJES:tjj/remove_status_guards
Draft

Remove Status.SUCCESS guards everywhere#678
TheJulianJES wants to merge 4 commits into
zigpy:devfrom
TheJulianJES:tjj/remove_status_guards

Conversation

@TheJulianJES
Copy link
Copy Markdown
Contributor

DO NOT MERGE

Proposed change

This removes all(!) Status.SUCCESS checks for zigpy requests from ZHA.

ZCL errors like Status.READ_ONLY (or possibly just Status.FAILURE) when writing an attribute and errors like UNSUP_CLUSTER_COMMAND don't seem to be raised in zigpy, but instead be returned in the request result. Similar story for attribute reporting configuration and so on.

As this PR removes it for everything, it definitely breaks stuff if that happens. We may be able to selectively remove guards in the platforms for now, like light.
It might be good to adjust zigpy to always raise exceptions if it doesn't get a positive response from the device.

Additional information

Related PR:

Related discussion in:

Copy link
Copy Markdown
Collaborator

@zigpy-review-bot zigpy-review-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing this with the DO NOT MERGE caveat in mind, but flagging concrete failure modes per guard so the upstream-zigpy / selective-keep decision has full context.

Why these guards exist

In zigpy 1.4.1 (current dev pin), Cluster.command() -> Cluster.request() returns the raw response (typically a (command_id, status) tuple or Default_Response) and does not raise on READ_ONLY / FAILURE / UNSUP_CLUSTER_COMMAND. write_attributes() returns a list of WriteAttributesStatusRecords, again no raise on per-record failure. Device.update_firmware() likewise returns a non-SUCCESS foundation.Status for partial failures (zigpy/device.py line 1004).

So removing these guards today is a silent-failure change, not a dead-code change. Puddly's note on #672 ("thinking of having zigpy soon handle all ZCL default responses directly, throwing an error for any non-SUCCESS codes") is the only path that makes these guards truly redundant, and that hasn't shipped. I'd hold this PR until that zigpy change lands and is pinned in pyproject.toml -- otherwise we ship a regression window.

Per-platform failure mode after this PR (must address before merge)

The failing CI tests at https://github.com/zigpy/zha/actions/runs/22379645135/job/64777328461 are the canonical evidence for each of these -- they're not stale tests to update, they're documenting real semantics this PR loses:

  • cover/__init__.py (WindowCovering and Shade): on UNSUP_CLUSTER_COMMAND, the entity now silently advances to OPENING / CLOSING / CLOSED (Shade.async_close_cover sets self._is_open = False regardless of status). For WindowCovering specifically the transition timer (_set_lift_transition_target) stays armed for DEFAULT_MOVEMENT_TIMEOUT (~75s) before _clear_lift_transition fires, so HA sees a stuck OPENING state. test_cover_failures and test_shade catch this.
  • lock/__init__.py: previously logged "Error with lock_door: ..." and short-circuited. Now silently flips self._state = STATE_LOCKED. test_lock catches this -- the lock UI will show "locked" while the physical device rejected the command.
  • switch.py: same shape as lock -- silent self._state = True/False flip. Note the deletion also removes the isinstance(result, Exception) defensive branch; that one is dead code in current zigpy (errors raise rather than being returned), so that part is fine, but the Status.SUCCESS removal is the same silent-state bug as elsewhere. test_switch catches it.
  • update.py: device.update_firmware() returns foundation.Status for partial failures (see zigpy device.py:1004). Without the guard, a half-failed OTA reports success to HA. test_firmware_update_raises catches it.
  • light/__init__.py: most subtle. The set_transition_flag and not self._transition_listener: self.async_transition_complete() early returns are gone, so on a failed move_to_level_with_on_off the transition flag stays set indefinitely (next transition timer doesn't get armed, attribute reports get ignored as if a transition is in progress). The recent #672 fix for stuck transitioning flags is being partially undone by this. Even if we accept that the transitioning logic could be reworked, dropping these without replacement breaks one of the very paths #672 was added to handle.
  • cluster_handlers/__init__.py::write_attributes_safe: now just calls write_attributes and returns. The whole purpose of the _safe variant was to raise; if the variant doesn't raise, write_attributes_safe and write_attributes are the same function and call sites (e.g. climate.async_set_preset_mode) silently accept Status.FAILURE while still flipping _preset = "away". test_preset_setting and test_write_attributes_safe_key_error catch this. If write_attributes_safe is no longer meaningfully different from write_attributes, the right move is to remove the wrapper entirely and update call sites, not leave a misleadingly-named identity function.
  • cluster_handlers/__init__.py::_configure_reporting_status: this one is a behavior bug independent of the guards-vs-no-guards question. The new code first iterates for record in res and sets event_data[name]["status"] = record.status.name, but then immediately overwrites every entry with Status.SUCCESS.name in the unconditional for attr_name in attr_names loop. The mixed-response case (current_x = SUCCESS, current_y = UNSUPPORTED_ATTRIBUTE) now emits SUCCESS for both. test_configure_reporting_status[mixed_per_attribute] catches this. Even under the "zigpy will raise" future world, per-attribute partial failures from configure_reporting_multiple aren't "errors" -- they're legitimate per-attribute statuses the reporting-config event needs to surface.

Suggested path forward

  1. Drive the zigpy change Puddly mentioned (turn non-SUCCESS default responses into exceptions inside zigpy) first, get it released, pin it.
  2. Then this PR becomes the dead-code-removal it wants to be -- but it's still worth a per-method audit because some methods (update_firmware, write_attributes, configure_reporting) return statuses that aren't "default response from a unicast command" and won't be covered by that zigpy change.
  3. Until then, if there are specific guards causing pain (the linked #672 discussion was about light), narrow this PR to those rather than the everywhere variant.

Not blocking the draft itself -- it's marked DO NOT MERGE and the conversation is the point -- but capturing the audit per file so the next pass has it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants