Summary
During the radio CI pass, rtc_test.test_01_time_set set the hardware RTC to 2012-08-06 (Curiosity landing). When the set_now_time teardown reset the clock to current UTC via TIME_SET, AuthenticationRouter recorded command_loss_start using getTime() before RtcManager had processed the command, so it stamped 2012+Δ. Once the RTC jumped to 2026, the next run_handler tick computed elapsed ≈ 14 years, which exceeds COMM_LOSS_TIME (default 3 days). As a result, FSW fired CommandLossFound, entered safe mode, stopped the watchdog, and rebooted. recover_from_safe_mode got no response, and all subsequent radio tests inherited a dead link.
Observed in CI: CommandLossFound event "after 397545166 seconds without contact".
Root cause
AuthenticationRouter::update_command_loss_start uses this->getTime() (RTC wall clock). When the RTC is written to a past date and then reset to current UTC, the frame carrying the reset command reaches AuthenticationRouter before RtcManager processes it, so the snapshot captures the stale past time. The subsequent wall-clock jump then makes the computed elapsed time enormous.
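The ordering problem can be reproduced off-target with a small model. This is a sketch only, not the FSW code: FakeRtc, command_loss_detected, and the 2026-01-01 date are hypothetical stand-ins chosen for illustration, and the only values taken from the report are the 2012-08-06 test date and the 3-day COMM_LOSS_TIME default.

```python
from datetime import datetime, timedelta, timezone

COMM_LOSS_TIME = timedelta(days=3)  # default threshold quoted in this report


class FakeRtc:
    """Hypothetical stand-in for the hardware RTC (wall clock)."""

    def __init__(self, now):
        self.now = now

    def get_time(self):
        return self.now

    def set_time(self, new_time):
        self.now = new_time


def command_loss_detected(rtc, command_loss_start):
    """Model of the check: wall-clock 'now' minus the stored snapshot."""
    return rtc.get_time() - command_loss_start > COMM_LOSS_TIME


# The test has already set the RTC to the Curiosity landing date.
rtc = FakeRtc(datetime(2012, 8, 6, tzinfo=timezone.utc))

# The TIME_SET frame reaches the router first, so the snapshot is stale.
command_loss_start = rtc.get_time()

# Only afterwards is the new time applied (illustrative date, an assumption).
rtc.set_time(datetime(2026, 1, 1, tzinfo=timezone.utc))

print(command_loss_detected(rtc, command_loss_start))  # True
```

Swapping the two steps, so that the snapshot is taken after the clock is set, makes the check return False, which is why the failure depends on processing order rather than on the TIME_SET command itself.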
Short-term fix (done)
test_01_time_set now uses datetime.now(timezone.utc) - timedelta(hours=12) instead of the 2012 Curiosity landing date. This keeps elapsed < 3 days and avoids triggering CommandLossFound. Applied to both UART and radio modes for consistency.
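As a rough illustration of why the 12-hour offset is safe, the sketch below computes the test timestamp and checks the worst-case jump against the threshold. The helper name past_test_time is hypothetical and not taken from the test suite; only the datetime expression and the 3-day default come from this report.

```python
from datetime import datetime, timedelta, timezone

COMM_LOSS_TIME = timedelta(days=3)  # default threshold quoted in this report


def past_test_time(now=None, offset=timedelta(hours=12)):
    """Return a timestamp slightly in the past (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - offset


now = datetime.now(timezone.utc)
test_time = past_test_time(now)

# If the snapshot is taken at test_time and the RTC then jumps back to
# 'now', the computed elapsed time is the offset itself.
worst_case_elapsed = now - test_time
assert worst_case_elapsed < COMM_LOSS_TIME
```

In practice the elapsed value also includes however long the test takes between the two writes, which still leaves a wide margin below 3 days.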
Proper FSW fix (future work)
Change AuthenticationRouter::update_command_loss_start to use this->get_uptime() (Zephyr k_uptime_seconds()) instead of this->getTime(). Monotonic uptime is immune to RTC changes. The run_handler timebase branch already handles TB_PROC_TIME correctly — only one line in update_command_loss_start needs to change.
File: Components/AuthenticationRouter/AuthenticationRouter.cpp line 182
// Before:
Fw::Time current_time = this->getTime();
// After:
Fw::Time current_time = this->get_uptime();
This permanently prevents the entire class of "RTC jump → false command-loss" bugs without needing test workarounds.
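The effect of switching to uptime can be modelled the same way. This is a minimal sketch under stated assumptions: FakeBoard and its methods are hypothetical, the tick durations and the 2026-01-01 date are illustrative, and the model does not cover reboots, where uptime restarts from zero.

```python
from datetime import datetime, timedelta, timezone

COMM_LOSS_TIME_S = 3 * 24 * 3600  # 3-day default, in seconds


class FakeBoard:
    """Hypothetical model with a settable wall clock and monotonic uptime."""

    def __init__(self, wall):
        self.wall = wall
        self.uptime_s = 0

    def tick(self, seconds):
        # Real time passing advances both clocks.
        self.wall += timedelta(seconds=seconds)
        self.uptime_s += seconds

    def set_rtc(self, new_wall):
        # Writing the RTC moves the wall clock only; uptime is untouched.
        self.wall = new_wall


board = FakeBoard(datetime(2012, 8, 6, tzinfo=timezone.utc))
board.tick(30)

start_uptime = board.uptime_s  # snapshot taken from the monotonic clock
board.set_rtc(datetime(2026, 1, 1, tzinfo=timezone.utc))  # RTC jump
board.tick(1)  # next handler tick, one second later

elapsed = board.uptime_s - start_uptime
print(elapsed)                     # 1
print(elapsed > COMM_LOSS_TIME_S)  # False
```

Because the snapshot and the comparison both read the same monotonic counter, the order in which the router and the RTC manager see the TIME_SET command no longer matters.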