
last_nonempty_at uses wall-clock (time.time()); suspend/resume + NTP backward jumps produce wrong-state #18

Description

@zzallirog

Follow-up on PR #17 (v0.5.18). The implementation uses time.time() for _collector_last_nonempty_at; the tile JS computes seconds_since_nonempty = (Date.now() / 1000) - last_nonempty_at and renders anything > 60s as stale.

Failure modes

  1. Suspend/resume on laptop: wall clock advances during suspend; time.monotonic() does not. Sample-at-14:00 with wall=14:00, suspend 4h, wake at 18:00. Tile correctly shows stale until next sample tick refreshes last_nonempty_at = 18:00:01. This case works correctly with wall clock.
  2. NTP backward correction: wall jumps from 18:05:30 to 18:05:00 (-30s). Existing last_nonempty_at = 18:05:25 is now "in the future" relative to current wall. Tile computes nowSec - 18:05:25 = -25s, which is not > 60, so it renders as fresh. The false-positive "fresh" classification persists until the next sample. Rare but real.
  3. Monotonic alternative: time.monotonic() would fix case 2 but break case 1. On Linux it reads CLOCK_MONOTONIC, which pauses during suspend, so a 4h-suspended collector would show seconds_since_nonempty ≈ 0 after wake and be classified as fresh. Worse than wall.
  4. Right answer: time.clock_gettime(time.CLOCK_BOOTTIME) on Linux. It does not jump (unlike wall) and it advances during suspend (unlike CLOCK_MONOTONIC). It is not available through the time.monotonic() wrapper; clock_gettime must be called directly.
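To make the comparison above concrete, here is a small simulation of the three clock domains. It is only a sketch: SimClocks, suspend_case and ntp_case are hypothetical names invented for illustration, no real clock is read, and the 60s threshold is taken from the tile logic described earlier.

```python
from dataclasses import dataclass

STALE_AFTER_S = 60.0  # staleness threshold used by the tile


@dataclass
class SimClocks:
    """Toy model of three clock domains, in seconds. No real clock is read."""
    wall: float       # like time.time(): counts suspend, can be stepped by NTP
    monotonic: float  # like CLOCK_MONOTONIC: never stepped, pauses in suspend
    boottime: float   # like CLOCK_BOOTTIME: never stepped, counts suspend

    def run(self, seconds: float) -> None:
        """Machine is awake: every clock advances."""
        self.wall += seconds
        self.monotonic += seconds
        self.boottime += seconds

    def suspend(self, seconds: float) -> None:
        """Machine is suspended: monotonic stands still."""
        self.wall += seconds
        self.boottime += seconds

    def ntp_step(self, delta: float) -> None:
        """NTP steps the wall clock only."""
        self.wall += delta

    def snapshot(self) -> dict:
        return {"wall": self.wall, "monotonic": self.monotonic,
                "boottime": self.boottime}


def ages(clocks: SimClocks, stamp: dict) -> dict:
    """seconds_since_nonempty as each clock domain would compute it."""
    now = clocks.snapshot()
    return {name: now[name] - stamp[name] for name in now}


def suspend_case() -> dict:
    clocks = SimClocks(wall=50_400.0, monotonic=100.0, boottime=100.0)  # 14:00
    stamp = clocks.snapshot()   # last non-empty sample
    clocks.suspend(4 * 3600)    # 4h suspend, no sample since
    return ages(clocks, stamp)


def ntp_case() -> dict:
    clocks = SimClocks(wall=65_125.0, monotonic=100.0, boottime=100.0)  # 18:05:25
    stamp = clocks.snapshot()   # last non-empty sample
    clocks.run(5)               # wall reaches 18:05:30
    clocks.ntp_step(-30)        # wall stepped back to 18:05:00
    return ages(clocks, stamp)


if __name__ == "__main__":
    for label, result in (("suspend", suspend_case()), ("ntp", ntp_case())):
        for name, age in result.items():
            verdict = "stale" if age > STALE_AFTER_S else "fresh"
            print(f"{label:8s} {name:10s} age={age:9.1f}s -> {verdict}")
```

In this model the suspend case yields an age of 14400s for wall and boottime but 0s for monotonic, and the NTP case yields -25s for wall but 5s for both monotonic and boottime. Only the boottime domain gives a sensible age in both scenarios.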

Proposal

Replace time.time() at the _collector_last_nonempty_at write site in coolstep/dashboard/server.py with time.clock_gettime(time.CLOCK_BOOTTIME). The tile-side computation has to stay symmetric with that clock. Either compute seconds_since_nonempty on the server at request time using the same clock and expose only the relative value, or keep last_nonempty_at as a monotonic-domain value and rename the field to make that obvious (last_nonempty_boottime).
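A minimal sketch of the first option (server computes the relative value), assuming Python 3.7+ on Linux. NonemptyTracker, boottime_now and payload are hypothetical names, not the actual structure of server.py, and treating "no sample yet" as stale is an assumption.

```python
import time

STALE_AFTER_S = 60.0  # same threshold the tile uses today


def boottime_now() -> float:
    """Seconds since boot, including time spent suspended (Linux only)."""
    return time.clock_gettime(time.CLOCK_BOOTTIME)


class NonemptyTracker:
    """Hypothetical stand-in for the _collector_last_nonempty_at bookkeeping."""

    def __init__(self, clock=boottime_now):
        self._clock = clock                  # injectable so tests can fake it
        self._last_nonempty_boottime = None  # boottime domain, not epoch seconds

    def mark_nonempty(self) -> None:
        """Call at the write site, whenever a non-empty sample arrives."""
        self._last_nonempty_boottime = self._clock()

    def seconds_since_nonempty(self):
        """Relative age computed with the same clock used at the write site."""
        if self._last_nonempty_boottime is None:
            return None
        return self._clock() - self._last_nonempty_boottime

    def payload(self) -> dict:
        """What the tile would receive: a relative value only."""
        age = self.seconds_since_nonempty()
        # Assumption: "no non-empty sample yet" is reported as stale.
        return {
            "seconds_since_nonempty": age,
            "stale": age is None or age > STALE_AFTER_S,
        }
```

With this shape the tile never subtracts Date.now() from a server timestamp, so the browser's clock and the server's wall clock drop out of the comparison entirely.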

Acceptance

Test: monkeypatch time.clock_gettime to simulate a suspend gap (advance the returned value by 4h), assert the tile renders stale. Test: monkeypatch to simulate an NTP backward jump, assert the tile remains fresh until the next sample and the computed age never goes negative.
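The two tests could look roughly like the sketch below. It uses unittest.mock from the standard library rather than the pytest monkeypatch fixture so that it runs on its own. FakeClock, seconds_since_nonempty and is_stale are hypothetical helpers standing in for the real server code, and the tests check the server-side classification only, not actual tile rendering.

```python
import time
from unittest import mock

STALE_AFTER_S = 60.0


class FakeClock:
    """Controllable replacement for time.clock_gettime and time.time."""

    def __init__(self, boottime: float, wall: float):
        self.boottime = boottime
        self.wall = wall

    def clock_gettime(self, clk_id):
        if clk_id != time.CLOCK_BOOTTIME:
            raise AssertionError("code under test read an unexpected clock")
        return self.boottime

    def time(self):
        return self.wall


def seconds_since_nonempty(last_nonempty_boottime: float) -> float:
    # Hypothetical code under test: age in the CLOCK_BOOTTIME domain.
    return time.clock_gettime(time.CLOCK_BOOTTIME) - last_nonempty_boottime


def is_stale(last_nonempty_boottime: float) -> bool:
    return seconds_since_nonempty(last_nonempty_boottime) > STALE_AFTER_S


def test_suspend_gap_is_stale():
    fake = FakeClock(boottime=500.0, wall=1_700_000_000.0)
    with mock.patch.object(time, "clock_gettime", fake.clock_gettime):
        last = time.clock_gettime(time.CLOCK_BOOTTIME)  # write site
        fake.boottime += 4 * 3600                       # 4h suspend gap
        assert is_stale(last)


def test_ntp_backward_jump_stays_fresh():
    fake = FakeClock(boottime=500.0, wall=1_700_000_000.0)
    with mock.patch.object(time, "clock_gettime", fake.clock_gettime), \
            mock.patch.object(time, "time", fake.time):
        last = time.clock_gettime(time.CLOCK_BOOTTIME)  # write site
        fake.boottime += 5                              # 5s of real time
        fake.wall += 5 - 30                             # wall stepped back 30s
        age = seconds_since_nonempty(last)
        assert age == 5.0        # unaffected by the wall-clock step
        assert age >= 0.0        # never negative
        assert not is_stale(last)
```

Patching time.time in the second test is there to catch a regression: if the code under test ever went back to the wall clock, the stepped value would show up in the computed age.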

Trade-off: CLOCK_BOOTTIME is Linux-specific. Project is Linux-only (per compat/), so acceptable.
