For sensors glink lpaicp crash#1408
Open
quic-vishsant wants to merge 3 commits into
Open
Conversation
The GLINK native driver has no formal channel state machine, making it prone to race conditions during concurrent open/close sequences. Consider a remote subsystem operating in island mode -- a low-power state where the subsystem suspends processing of its IPC message queue while continuing other execution. Incoming GLINK commands accumulate and are dispatched only when the subsystem exits island mode. The host processes GLINK_CMD_OPEN_ACK in interrupt context via qcom_glink_native_rx(), while GLINK_CMD_OPEN, GLINK_CMD_CLOSE, and GLINK_CMD_CLOSE_ACK are deferred to a worker thread. This asymmetry creates a window where two close acknowledgments for the same channel, but carrying different local channel IDs (lcids), are processed together. The first close acknowledgment clears the lcid to 0. Any subsequent close request then goes out with lcid 0, which the remote silently ignores. The host considers the channel closed while the remote keeps it open. On the next open attempt the remote sees a duplicate open and crashes. Introduce an explicit channel state machine to eliminate these races: - enum glink_channel_state covers local state: CLOSED, OPENING, OPENED, CLOSING. - remote_opened flag tracks the remote endpoint's state. - state_lock spinlock protects both state variables. - Helper functions validate and perform state transitions atomically. IDR entries (lcids, rcids) and the reference count are deferred until both local and remote sides are fully closed, preventing stale IDR lookups and use-after-free regardless of which close ordering occurs. Signed-off-by: Vishnu Santhosh <vishnu.santhosh@oss.qualcomm.com>
Debugging channel state issues during open/close sequences currently requires adding temporary instrumentation to the driver. Add two tracepoints that provide persistent, zero-overhead-when-disabled visibility into the state machine: - qcom_glink_channel_state: fires on every local state transition, recording old and new states and an error flag that indicates an invalid transition attempt. - qcom_glink_channel_info: fires on remote endpoint state changes, recording whether the remote side opened or closed. Together these allow a complete channel lifecycle to be reconstructed from a trace log without any driver modification. Signed-off-by: Vishnu Santhosh <vishnu.santhosh@oss.qualcomm.com>
Wire the qcom_glink_channel_state and qcom_glink_channel_info tracepoints into the channel state machine helpers. Signed-off-by: Vishnu Santhosh <vishnu.santhosh@oss.qualcomm.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
GLINK channels can crash a remote subsystem when open close sequences races. This occurred during island mode. Island mode is a low-power state where the remote suspends IPC queue processing while continuing other execution -- queued GLINK commands are dispatched in a burst when the subsystem exits island mode.
The root cause is the absence of a formal channel state machine. The host processes GLINK_CMD_OPEN_ACK in interrupt context while CLOSE and CLOSE_ACK are deferred to a worker thread. When two close acknowledgments for the same channel arrive back-to-back, the first clears lcid to 0; the second close request goes out with lcid 0, which the remote silently ignores. The host believes the channel is closed; the remote keeps it open. On the next open attempt the remote sees a duplicate open and asserts.