Remove Status.SUCCESS guards everywhere#678
Conversation
zigpy-review-bot
left a comment
There was a problem hiding this comment.
Reviewing this with the DO NOT MERGE caveat in mind, but flagging concrete failure modes per guard so the upstream-zigpy / selective-keep decision has full context.
Why these guards exist
In zigpy 1.4.1 (current dev pin), Cluster.command() -> Cluster.request() returns the raw response (typically a (command_id, status) tuple or Default_Response) and does not raise on READ_ONLY / FAILURE / UNSUP_CLUSTER_COMMAND. write_attributes() returns a list of WriteAttributesStatusRecords, again no raise on per-record failure. Device.update_firmware() likewise returns a non-SUCCESS foundation.Status for partial failures (zigpy/device.py line 1004).
So removing these guards today is a silent-failure change, not a dead-code change. Puddly's note on #672 ("thinking of having zigpy soon handle all ZCL default responses directly, throwing an error for any non-SUCCESS codes") is the only path that makes these guards truly redundant, and that hasn't shipped. I'd hold this PR until that zigpy change lands and is pinned in pyproject.toml -- otherwise we ship a regression window.
Per-platform failure mode after this PR (must address before merge)
The failing CI tests at https://github.com/zigpy/zha/actions/runs/22379645135/job/64777328461 are the canonical evidence for each of these -- they're not stale tests to update, they're documenting real semantics this PR loses:
cover/__init__.py(WindowCoveringandShade): onUNSUP_CLUSTER_COMMAND, the entity now silently advances toOPENING/CLOSING/CLOSED(Shade.async_close_coversetsself._is_open = Falseregardless of status). ForWindowCoveringspecifically the transition timer (_set_lift_transition_target) stays armed forDEFAULT_MOVEMENT_TIMEOUT(~75s) before_clear_lift_transitionfires, so HA sees a stuckOPENINGstate.test_cover_failuresandtest_shadecatch this.lock/__init__.py: previously logged"Error with lock_door: ..."and short-circuited. Now silently flipsself._state = STATE_LOCKED.test_lockcatches this -- the lock UI will show "locked" while the physical device rejected the command.switch.py: same shape as lock -- silentself._state = True/Falseflip. Note the deletion also removes theisinstance(result, Exception)defensive branch; that one is dead code in current zigpy (errors raise rather than being returned), so that part is fine, but theStatus.SUCCESSremoval is the same silent-state bug as elsewhere.test_switchcatches it.update.py:device.update_firmware()returnsfoundation.Statusfor partial failures (see zigpy device.py:1004). Without the guard, a half-failed OTA reports success to HA.test_firmware_update_raisescatches it.light/__init__.py: most subtle. Theset_transition_flag and not self._transition_listener: self.async_transition_complete()early returns are gone, so on a failedmove_to_level_with_on_offthe transition flag stays set indefinitely (next transition timer doesn't get armed, attribute reports get ignored as if a transition is in progress). The recent #672 fix for stuck transitioning flags is being partially undone by this. Even if we accept that the transitioning logic could be reworked, dropping these without replacement breaks one of the very paths #672 was added to handle.cluster_handlers/__init__.py::write_attributes_safe: now just callswrite_attributesand returns. The whole purpose of the_safevariant was to raise; if the variant doesn't raise,write_attributes_safeandwrite_attributesare the same function and call sites (e.g.climate.async_set_preset_mode) silently acceptStatus.FAILUREwhile still flipping_preset = "away".test_preset_settingandtest_write_attributes_safe_key_errorcatch this. Ifwrite_attributes_safeis no longer meaningfully different fromwrite_attributes, the right move is to remove the wrapper entirely and update call sites, not leave a misleadingly-named identity function.cluster_handlers/__init__.py::_configure_reporting_status: this one is a behavior bug independent of the guards-vs-no-guards question. The new code first iteratesfor record in resand setsevent_data[name]["status"] = record.status.name, but then immediately overwrites every entry withStatus.SUCCESS.namein the unconditionalfor attr_name in attr_namesloop. The mixed-response case (current_x = SUCCESS, current_y = UNSUPPORTED_ATTRIBUTE) now emitsSUCCESSfor both.test_configure_reporting_status[mixed_per_attribute]catches this. Even under the "zigpy will raise" future world, per-attribute partial failures fromconfigure_reporting_multiplearen't "errors" -- they're legitimate per-attribute statuses the reporting-config event needs to surface.
Suggested path forward
- Drive the zigpy change Puddly mentioned (turn non-
SUCCESSdefault responses into exceptions inside zigpy) first, get it released, pin it. - Then this PR becomes the dead-code-removal it wants to be -- but it's still worth a per-method audit because some methods (
update_firmware,write_attributes,configure_reporting) return statuses that aren't "default response from a unicast command" and won't be covered by that zigpy change. - Until then, if there are specific guards causing pain (the linked #672 discussion was about light), narrow this PR to those rather than the everywhere variant.
Not blocking the draft itself -- it's marked DO NOT MERGE and the conversation is the point -- but capturing the audit per file so the next pass has it.
DO NOT MERGE
Proposed change
This removes all(!)
Status.SUCCESSchecks for zigpy requests from ZHA.ZCL errors like
Status.READ_ONLY(or possibly justStatus.FAILURE) when writing an attribute and errors likeUNSUP_CLUSTER_COMMANDdon't seem to be raised in zigpy, but instead be returned in the request result. Similar story for attribute reporting configuration and so on.As this PR removes it for everything, it definitely breaks stuff if that happens. We may be able to selectively remove guards in the platforms for now, like light.
It might be good to adjust zigpy to always raise exceptions if it doesn't get a positive response from the device.
Additional information
Related PR:
isinstanceExceptionchecks from switch #677 (<- safe)(this PR currently includes that commit)
Related discussion in: