Stale-policy fail-closed

If the agent can't reach the cloud for an extended period, what should it do? Two defensible answers:

  • Trust the cache: keep enforcing whatever policy was last received.
  • Fail closed: ignore the cache, default-block everything until cloud connectivity returns.

v1 trusts the cache until a configurable staleness window has elapsed, then fails closed. The default window is 7 days.

How it works

The agent records last_successful_fetch after every 200 or 304 response from the cloud. On every device-plug event, it checks whether the time elapsed since last_successful_fetch exceeds the configured window. If it does, the engine returns a default-block decision regardless of what the cached policy says. The tray shows an amber "Policy stale" status, and a policy_stale tamper event fires (debounced to one per hour to avoid noise).
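The check described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the class name StalenessGate, its method names, and the string decisions are hypothetical, and treating a never-fetched agent as stale is an assumption the text does not confirm.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass
class StalenessGate:
    """Hypothetical sketch of the stale-policy check (names are illustrative)."""
    max_staleness: timedelta = timedelta(days=7)    # default window from the docs
    event_debounce: timedelta = timedelta(hours=1)  # one policy_stale event per hour
    last_successful_fetch: Optional[datetime] = None
    last_event_at: Optional[datetime] = None

    def record_fetch(self, status_code: int, now: datetime) -> None:
        # Only a 200 or 304 from the cloud counts as a successful fetch.
        if status_code in (200, 304):
            self.last_successful_fetch = now

    def is_stale(self, now: datetime) -> bool:
        # Assumption: an agent that has never fetched is treated as stale.
        if self.last_successful_fetch is None:
            return True
        return now - self.last_successful_fetch > self.max_staleness

    def decide(self, cached_decision: str, now: datetime) -> Tuple[str, bool]:
        """Return (decision, emit_policy_stale_event) for one device-plug event."""
        if not self.is_stale(now):
            return cached_decision, False
        # Stale: default-block regardless of what the cached policy says.
        emit = (
            self.last_event_at is None
            or now - self.last_event_at >= self.event_debounce
        )
        if emit:
            self.last_event_at = now
        return "block", emit
```

The caller would pass in whatever the cached policy decided for the device; the gate overrides that decision only once the window has been exceeded.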

When the cloud comes back

The first successful fetch resets the staleness clock. The cached policy becomes authoritative again immediately, the tray flips back to healthy, and no manual intervention is needed.
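A rough sketch of that recovery path, assuming a small state object: AgentState, tray_status, and on_fetch_result are hypothetical names used only for illustration. The point it shows is that a failed response leaves the clock running, while any 200 or 304 resets it without operator action.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AgentState:
    """Hypothetical agent state; field names are illustrative."""
    last_successful_fetch: Optional[datetime] = None
    tray_status: str = "stale"


def on_fetch_result(state: AgentState, status_code: int, now: datetime) -> bool:
    """Return True if this response reset the staleness clock."""
    if status_code not in (200, 304):
        # Failed fetch: the clock keeps running and the tray is unchanged.
        return False
    # Successful fetch: reset the clock and restore the healthy indicator.
    state.last_successful_fetch = now
    state.tray_status = "healthy"
    return True
```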

Tuning the window

MaxStalenessDays is a per-tenant setting (default 7). Lower values are stricter; higher values let road-warrior laptops survive longer offline trips without flipping into fail-closed.
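To see what a given value means in practice, the window can be turned into a concrete cutoff time. The helper below is a hypothetical illustration (fail_closed_after is not a documented function), and rejecting non-positive values is an assumption about validation that the text does not state.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_MAX_STALENESS_DAYS = 7  # default from the docs


def fail_closed_after(last_successful_fetch: datetime,
                      max_staleness_days: int = DEFAULT_MAX_STALENESS_DAYS) -> datetime:
    """Return the moment after which plug events would be default-blocked."""
    # Assumption: zero or negative windows are rejected rather than accepted.
    if max_staleness_days <= 0:
        raise ValueError("MaxStalenessDays must be positive")
    return last_successful_fetch + timedelta(days=max_staleness_days)


last_fetch = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
print(fail_closed_after(last_fetch))      # 2024-03-08 09:00:00+00:00
print(fail_closed_after(last_fetch, 14))  # 2024-03-15 09:00:00+00:00
```

With the default, a laptop that last synced on the morning of 1 March keeps enforcing its cached policy until the morning of 8 March; doubling the setting to 14 extends that by a week.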

Why fail-closed is the right default

A long-offline endpoint has had time for the situation to change in ways the cached policy can't anticipate: a thumb drive that was previously allow-listed by serial might have been revoked centrally for a reason. The trust model says "the cloud is authoritative"; fail-closed makes that real.
