Skip to content

For qli 2 0 fastrpc rmmod deadlock#801

Open
Vishnu Santhosh (quic-vishsant) wants to merge 2 commits into
qualcomm-linux:qcom-6.18.yfrom
quic-vishsant:for-qli-2-0-fastrpc-rmmod-deadlock
Open

For qli 2 0 fastrpc rmmod deadlock#801
Vishnu Santhosh (quic-vishsant) wants to merge 2 commits into
qualcomm-linux:qcom-6.18.yfrom
quic-vishsant:for-qli-2-0-fastrpc-rmmod-deadlock

Conversation

@quic-vishsant

Copy link
Copy Markdown

Fixes a deadlock hang during rmmod fastrpc caused by qcom_glink_destroy_ept() attempting to re-acquire the device core's device_lock mutex during driver detach, when the driver core already holds the lock.

This series consists of two commits:

1. UPSTREAM: rpmsg: glink: remove duplicate code for rpmsg device remove

(upstream commit 112766cdf2e5)

This prerequisite refactoring introduces qcom_glink_remove_rpmsg_device() as a shared helper and consolidates rpmsg device teardown logic that was previously duplicated across:

  • qcom_glink_destroy_ept()
  • qcom_glink_rx_close()
  • qcom_glink_rx_close_ack()

The change was merged upstream together with the stable-tagged leak fix (a53e356df548, already present in this branch as f80e4e91b010), but it never carried a stable/Cc tag itself. As a result, stable auto-backporting did not pick it up, leaving this branch without the required refactoring.

2. FROMLIST: rpmsg: glink: fix deadlock in endpoint destroy during driver detach

(Lore link)

This is the actual fix.

It removes the redundant rpmsg device unregistration from qcom_glink_destroy_ept().

In both paths that lead to endpoint destruction, the rpmsg device has already been unregistered:

  • During driver detach, the driver core performs rpmsg device teardown before endpoint destruction.
  • During channel close, the rpmsg device has already been removed earlier in the close path.

The additional unregistration attempt in qcom_glink_destroy_ept() is therefore unnecessary and becomes the direct cause of the device_lock re-entrancy deadlock during driver removal.

Dependency

Commit 2 depends on Commit 1 because the fix removes a call introduced by the refactoring that centralized rpmsg device teardown into qcom_rpmsg_device().

rpmsg device remove code is duplicated in at-least 2-3 places, add a
helper function to remove this duplicated code.

Dependency for the following fastrpc rmmod deadlock fix, which is
written against the qcom_glink_remove_rpmsg_device() helper
introduced here.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Vishnu Santhosh <vishnu.santhosh@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250822100043.2604794-3-srinivas.kandagatla@oss.qualcomm.com
…r detach

During driver detach, the device core holds the device mutex throughout
the driver's remove callback chain.  When the rpmsg endpoint is
destroyed as part of that teardown, the GLINK endpoint destroy
implementation attempts to unregister the underlying rpmsg device.
That unregistration calls device_del(), which tries to re-acquire the
same device mutex already held higher up the stack, causing rmmod to
hang indefinitely.

The deadlock manifests with the following call chain:

[<0>] device_del+0x44/0x414  <- tries to acquire same mutex
[<0>] device_unregister+0x18/0x34
[<0>] rpmsg_unregister_device+0x28/0x4c
[<0>] qcom_glink_remove_rpmsg_device+0x70/0xc0
[<0>] qcom_glink_destroy_ept+0x58/0xbc
[<0>] rpmsg_dev_remove+0x50/0x60
[<0>] device_remove+0x4c/0x80
[<0>] device_release_driver_internal+0x1cc/0x228  <- acquires device mutex
[<0>] driver_detach+0x4c/0x98
[<0>] bus_remove_driver+0x6c/0xbc
[<0>] driver_unregister+0x30/0x60
[<0>] unregister_rpmsg_driver+0x10/0x1c
[<0>] fastrpc_exit+0x28/0x38 [fastrpc]
[<0>] __arm64_sys_delete_module+0x1b8/0x294
[<0>] invoke_syscall+0x48/0x10c
[<0>] el0_svc_common.constprop.0+0xc0/0xe0
[<0>] do_el0_svc+0x1c/0x28
[<0>] el0_svc+0x34/0x108
[<0>] el0t_64_sync_handler+0xa0/0xe4
[<0>] el0t_64_sync+0x198/0x19c

The rpmsg device unregistration inside endpoint destroy is redundant.
In both contexts where endpoint destruction is triggered:

- Driver detach path: the driver core already tears down the rpmsg
  device.

- Channel close path: the rpmsg device is already unregistered before
  endpoint destruction is reached.

Remove the redundant unregistration to fix the deadlock.

Co-developed-by: Deepak Kumar Singh <deepak.singh@oss.qualcomm.com>
Signed-off-by: Deepak Kumar Singh <deepak.singh@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Fixes: a53e356 ("rpmsg: glink: fix rpmsg device leak")
Signed-off-by: Vishnu Santhosh <vishnu.santhosh@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260604-rpmsg-glink-fix-deadlock-destroy-ept-v1-1-b8a54ad1e4fd@oss.qualcomm.com
@qswat-orbit-external

Copy link
Copy Markdown

Merge Check Failed: No CR Numbers Found

Error: No Change Request numbers were found.

Please add Change Request numbers to your pull request description in the format CRs-Fixed: 12345 or link GitHub issues that are associated with Change Requests.

@qlijarvis

Copy link
Copy Markdown

PR #801 — validate-patch

PR: #801

Verdict Issues Detailed Report
⚠️ 1 Full report

Final Summary

  1. Lore link present: Yes — both patches include valid lore.kernel.org links in Link: trailers

  2. Lore link matches PR commits: Yes for patch 1/2 (verified against upstream commit 112766c in kernel tree); patch 2/2 cannot be independently verified due to network restrictions, but structure, authorship, and tags are consistent with proper FROMLIST workflow

  3. Upstream patch status:

    • Patch 1/2: ✅ Upstreamed — merged as commit 112766c in mainline (v6.19 merge window, authored 2025-08-22, signed off by Bjorn Andersson)
    • Patch 2/2: 🔄 In review — FROMLIST prefix indicates under review; lore link dated 2026-06-04 (future date relative to patch 1/2)
  4. PR present in qcom-next: Yes — patch 1/2 (commit 112766c) is present in qcom-next branch; patch 2/2 not yet merged (expected, as it's FROMLIST)

Verdict: ⚠️ — click to expand

🔍 Patch Validation

PR: #801
Upstream commits:


Patch 1/2: UPSTREAM: rpmsg: glink: remove duplicate code for rpmsg device remove

Commit Message

Check Status Note
Subject matches upstream Identical to upstream commit 112766c
Body preserves rationale ⚠️ Upstream body preserved, but added extra paragraph about dependency
Fixes tag present/correct N/A No Fixes tag in upstream or PR
Authorship preserved Original author Srinivas Kandagatla preserved
Backport note (if applicable) UPSTREAM: prefix correctly applied

Diff

File Status Notes
drivers/rpmsg/qcom_glink_native.c Diff matches upstream commit 112766c exactly

Issues

  • Extra paragraph in commit body: PR adds "Dependency for the following fastrpc rmmod deadlock fix, which is written against the qcom_glink_remove_rpmsg_device() helper introduced here." This context is helpful but deviates from upstream commit message.

Patch 2/2: FROMLIST: rpmsg: glink: fix deadlock in endpoint destroy during driver detach

Commit Message

Check Status Note
Subject matches upstream Matches lore link format
Body preserves rationale Detailed explanation with call chain preserved
Fixes tag present/correct Fixes: a53e356df548 present
Authorship preserved Vishnu Santhosh as author, Co-developed-by tag for Deepak Kumar Singh
Backport note (if applicable) FROMLIST: prefix correctly applied

Diff

File Status Notes
drivers/rpmsg/qcom_glink_native.c Clean 3-line deletion removing redundant unregistration call

Issues

None detected for patch 2/2.


Verdict

Patch 1/2: Merge with minor note — the extra dependency paragraph is helpful context but deviates from upstream. Consider whether this addition is valuable for downstream tracking or should be removed to maintain exact upstream fidelity.

Patch 2/2: Merge as-is — commit message, authorship, tags, and diff are all correct. FROMLIST prefix is appropriate as this patch is under review upstream (dated 2026-06-04, future relative to patch 1/2's 2025-08-22 upstream merge).


Final Summary

  1. Lore link present: Yes — both patches include valid lore.kernel.org links in Link: trailers

  2. Lore link matches PR commits: Yes for patch 1/2 (verified against upstream commit 112766c in kernel tree); patch 2/2 cannot be independently verified due to network restrictions, but structure, authorship, and tags are consistent with proper FROMLIST workflow

  3. Upstream patch status:

    • Patch 1/2: ✅ Upstreamed — merged as commit 112766c in mainline (v6.19 merge window, authored 2025-08-22, signed off by Bjorn Andersson)
    • Patch 2/2: 🔄 In review — FROMLIST prefix indicates under review; lore link dated 2026-06-04 (future date relative to patch 1/2)
  4. PR present in qcom-next: Yes — patch 1/2 (commit 112766c) is present in qcom-next branch; patch 2/2 not yet merged (expected, as it's FROMLIST)

@qlijarvis

Copy link
Copy Markdown

PR #801 — checker-log-analyzer

PR: #801
Checker run: https://github.com/qualcomm-linux/kernel-config/actions/runs/28514677424

Checker Result Summary
Checker Result Summary
checkpatch Both commits passed with 0 errors, 0 warnings
dt-binding-check ⏭️ Skipped - no changes in Documentation/devicetree/bindings
dtb-check ⏭️ Skipped - no changes in arch/arm64/boot/dts/
sparse-check Build completed successfully with sparse checks
check-uapi-headers ⏭️ Skipped - no changes to UAPI headers
check-patch-compliance FAILED - Author mismatch on commit 1/2
tag-check Proper UPSTREAM:/FROMLIST: prefixes present
qcom-next-check Commits properly tagged for upstream/fromlist

Detailed report: Full report

Checker analysis — click to expand

🤖 CI Checker Analysis (checker-log-analyzer)

PR: #801 - rpmsg: glink fixes (remove duplicate code + fix deadlock)
Source: https://github.com/qualcomm-linux/kernel-config/actions/runs/28514677424

Checker Result Summary
checkpatch Both commits passed with 0 errors, 0 warnings
dt-binding-check ⏭️ Skipped - no changes in Documentation/devicetree/bindings
dtb-check ⏭️ Skipped - no changes in arch/arm64/boot/dts/
sparse-check Build completed successfully with sparse checks
check-uapi-headers ⏭️ Skipped - no changes to UAPI headers
check-patch-compliance FAILED - Author mismatch on commit 1/2
tag-check Proper UPSTREAM:/FROMLIST: prefixes present
qcom-next-check Commits properly tagged for upstream/fromlist

❌ check-patch-compliance

Root cause: The first commit (d0d3266) has an author mismatch between the original upstream patch and the backported version.

Failure details:

Checking commit: UPSTREAM: rpmsg: glink: remove duplicate code for rpmsg device remove
Author mismatch:
  Original author: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
  Commit author : Vishnu Santhosh <vishnu.santhosh@oss.qualcomm.com>

The checker detected that:

Fix: Preserve the original author when backporting upstream commits. Use git commit --amend --author to fix:

# Check out the PR branch
git checkout <pr-branch>

# Reset to before the first commit
git reset --soft HEAD~2

# Re-apply first commit with correct author
git cherry-pick d0d326664c1c
git commit --amend --author="Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>"

# Re-apply second commit (author is already correct)
git cherry-pick 0a48b214ce22

# Force push to update PR
git push --force-with-lease

Reproduce locally:

cd /path/to/kernel
bash ../kernel-checkers/check-patch-compliance.sh \
  --kernel-src . \
  --base 483a4490aadd7c75ef84303b5d28387e5848208c \
  --head 0a48b214ce2215aae80e14a3105489c6c4fd8ab3

Verdict

1 blocker to fix: The author field must match the original upstream author (Srinivas Kandagatla) for the first commit. This is a metadata issue that requires amending the commit author. All other checkers passed or were correctly skipped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants