Disable device availability checker during diagnostics regeneration #682

Draft
TheJulianJES wants to merge 1 commit into zigpy:dev from TheJulianJES:tjj/disable_polling_diagnostics_regen

Conversation

@TheJulianJES
Contributor

Proposed change

This disables the device availability checker during diagnostics regeneration.
At the moment, there's a rare issue where regenerated diagnostics can end up with available set to false everywhere, because the checker starts 30 to 45 seconds into a diagnostics regeneration run.

It's possible that the implementation can be cleaned up. Just wanted to put it into a PR for now.

@codecov

codecov Bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.51%. Comparing base (229df22) to head (96248f9).
⚠️ Report is 32 commits behind head on dev.

Additional details and impacted files
@@           Coverage Diff           @@
##              dev     #682   +/-   ##
=======================================
  Coverage   97.51%   97.51%           
=======================================
  Files          62       62           
  Lines       10949    10949           
=======================================
  Hits        10677    10677           
  Misses        272      272           

# asynchronously mutate device availability while fixtures are being written.
zha_gateway.global_updater.stop()
zha_gateway._device_availability_checker.stop() # noqa: SLF001
zha_gateway.config.allow_polling = False
Contributor Author

@TheJulianJES TheJulianJES Feb 27, 2026

zha_gateway.config.allow_polling will be set to True by fetch_updated_state:

self.config.allow_polling = True

With #654, we make sure that runs before regenerating diagnostics, but that also means it still re-enables allow_polling. That's why we also stop the global_updater and _device_availability_checker to fix the issue.

Maybe we should just patch out fetch_updated_state entirely for diagnostics. EDIT: Eh, also not possible in a nice way.
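To illustrate why stopping the periodic tasks makes the later allow_polling flip-back irrelevant, here is a minimal asyncio sketch. PollingConfig, periodic_checker and demo are hypothetical stand-ins, not zha code, and the sketch assumes that .stop() amounts to cancelling the task behind the periodic loop.

```python
import asyncio


class PollingConfig:
    """Hypothetical stand-in for the gateway config; not the real zha class."""

    def __init__(self) -> None:
        self.allow_polling = False


async def periodic_checker(config: PollingConfig, ticks: list, interval: float = 0.01) -> None:
    """Hypothetical periodic loop that only does work while polling is allowed."""
    while True:
        await asyncio.sleep(interval)
        if config.allow_polling:
            ticks.append("checked")


async def demo() -> int:
    config = PollingConfig()
    ticks: list = []
    task = asyncio.create_task(periodic_checker(config, ticks))

    # Assumed equivalent of calling .stop() on the checker: cancel its task.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    # A later flip-back (as fetch_updated_state does) has nobody left to observe it.
    config.allow_polling = True
    await asyncio.sleep(0.05)
    return len(ticks)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the cancelled loop never records a tick, even though the flag is set back to True afterwards.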

Contributor Author

@TheJulianJES TheJulianJES Feb 27, 2026

If we set allow_polling = False after this line, it should work without needing to stop the global_updater and _device_availability_checker:

await zha_gateway.async_block_till_done(wait_background_tasks=True)

Hmm...

Collaborator

@zigpy-review-bot zigpy-review-bot left a comment

Tooling-only change, scoped to tools/regenerate_diagnostics.py — no runtime leakage. The motivation (background availability checker firing 30-45 s into a regen run and flipping every device's available to false in the regenerated fixtures) is real, and stopping global_updater + _device_availability_checker does fix it.

One thing worth tightening before this comes out of draft — the allow_polling = False assignment is effectively dead. TestGateway.__aenter__ calls async_initialize_devices_and_entities(), which schedules fetch_updated_state as a background task (async_create_background_task, not awaited by async_block_till_done since wait_background_tasks=False is the default). That background task ends with self.config.allow_polling = True (gateway.py:389). So as soon as the event loop yields inside the first join_zigpy_device / async_block_till_done call, allow_polling will flip back to True regardless of what the new code sets here — which is exactly what you flagged in your own self-review threads on line 45.

In practice the two .stop() calls are doing all the work; the periodic loops never see the allow_polling = True flip-back because their tasks have been cancelled. So the line is harmless, just misleading. Two options:

  1. Drop the allow_polling = False line and let the .stop() calls stand on their own (cleanest given the comment already says "disable background tasks").
  2. Or, await the background task to settle first (await zha_gateway.async_block_till_done(wait_background_tasks=True) before the three lines), then set allow_polling = False, and you can drop the .stop() calls entirely — which is the alternative your second self-review thread sketched.
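The ordering problem behind both options can be reproduced with a small asyncio sketch. FakeGateway, FakeConfig, initialize and wait_background are hypothetical names standing in for the gateway, its config, the background scheduling of fetch_updated_state, and async_block_till_done(wait_background_tasks=True); none of this is the real zha implementation.

```python
import asyncio


class FakeConfig:
    """Hypothetical stand-in for the gateway config."""

    def __init__(self) -> None:
        self.allow_polling = False


class FakeGateway:
    """Hypothetical stand-in; only models the flag flip described above."""

    def __init__(self) -> None:
        self.config = FakeConfig()
        self._background_tasks: set = set()

    async def _fetch_updated_state(self) -> None:
        await asyncio.sleep(0.01)  # simulated device reads
        self.config.allow_polling = True  # the flip-back at the end of the task

    def initialize(self) -> None:
        # Scheduled as a background task, so callers do not wait for it.
        task = asyncio.get_running_loop().create_task(self._fetch_updated_state())
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)

    async def wait_background(self) -> None:
        # Assumed analogue of async_block_till_done(wait_background_tasks=True).
        pending = list(self._background_tasks)
        if pending:
            await asyncio.gather(*pending)


async def set_flag_too_early() -> bool:
    gw = FakeGateway()
    gw.initialize()
    gw.config.allow_polling = False  # overwritten once the background task finishes
    await asyncio.sleep(0.05)
    return gw.config.allow_polling


async def set_flag_after_settling() -> bool:
    gw = FakeGateway()
    gw.initialize()
    await gw.wait_background()  # let the background task finish first
    gw.config.allow_polling = False  # now the assignment sticks
    await asyncio.sleep(0.05)
    return gw.config.allow_polling
```

Under these assumptions, set_flag_too_early ends with the flag back at True, while set_flag_after_settling keeps it at False, which is the behaviour option 2 relies on.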

No tests needed — one-off tooling script. The # noqa: SLF001 on the private-attribute access is fine for tooling; if a follow-up wants a Gateway.stop_periodic_tasks() helper that mirrors what shutdown() already does (gateway.py:862-863), that'd let this script and the shutdown path share a code path, but it's not a blocker.
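A rough sketch of what such a shared helper could look like follows. Everything here is hypothetical: PeriodicTask and SketchGateway are illustrative stand-ins, stop_periodic_tasks() does not exist in the codebase as far as this thread shows, and the real shutdown() does more than what is modelled.

```python
class PeriodicTask:
    """Hypothetical stand-in for an updater or checker exposing stop()."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True

    def stop(self) -> None:
        self.running = False


class SketchGateway:
    """Hypothetical gateway showing one shared code path for stopping loops."""

    def __init__(self) -> None:
        self.global_updater = PeriodicTask("global_updater")
        self._device_availability_checker = PeriodicTask("device_availability_checker")

    def stop_periodic_tasks(self) -> None:
        # Public helper, so tooling does not need to reach into private attributes.
        self.global_updater.stop()
        self._device_availability_checker.stop()

    def shutdown(self) -> None:
        # Shutdown reuses the same helper; the rest of the teardown is omitted here.
        self.stop_periodic_tasks()
```

With a helper like this, the regeneration script would call zha_gateway.stop_periodic_tasks() and the # noqa: SLF001 suppression would no longer be needed.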
