For qli 2 0 fastrpc rmmod deadlock#801
The rpmsg device remove code is duplicated in at least two or three places; add a helper function to remove this duplication.

Dependency for the following fastrpc rmmod deadlock fix, which is written against the qcom_glink_remove_rpmsg_device() helper introduced here.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Vishnu Santhosh <vishnu.santhosh@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250822100043.2604794-3-srinivas.kandagatla@oss.qualcomm.com
…r detach

During driver detach, the device core holds the device mutex throughout the driver's remove callback chain. When the rpmsg endpoint is destroyed as part of that teardown, the GLINK endpoint destroy implementation attempts to unregister the underlying rpmsg device. That unregistration calls device_del(), which tries to re-acquire the same device mutex already held higher up the stack, causing rmmod to hang indefinitely.

The deadlock manifests with the following call chain:

    [<0>] device_del+0x44/0x414                        <- tries to acquire same mutex
    [<0>] device_unregister+0x18/0x34
    [<0>] rpmsg_unregister_device+0x28/0x4c
    [<0>] qcom_glink_remove_rpmsg_device+0x70/0xc0
    [<0>] qcom_glink_destroy_ept+0x58/0xbc
    [<0>] rpmsg_dev_remove+0x50/0x60
    [<0>] device_remove+0x4c/0x80
    [<0>] device_release_driver_internal+0x1cc/0x228   <- acquires device mutex
    [<0>] driver_detach+0x4c/0x98
    [<0>] bus_remove_driver+0x6c/0xbc
    [<0>] driver_unregister+0x30/0x60
    [<0>] unregister_rpmsg_driver+0x10/0x1c
    [<0>] fastrpc_exit+0x28/0x38 [fastrpc]
    [<0>] __arm64_sys_delete_module+0x1b8/0x294
    [<0>] invoke_syscall+0x48/0x10c
    [<0>] el0_svc_common.constprop.0+0xc0/0xe0
    [<0>] do_el0_svc+0x1c/0x28
    [<0>] el0_svc+0x34/0x108
    [<0>] el0t_64_sync_handler+0xa0/0xe4
    [<0>] el0t_64_sync+0x198/0x19c

The rpmsg device unregistration inside endpoint destroy is redundant. In both contexts where endpoint destruction is triggered:

- Driver detach path: the driver core already tears down the rpmsg device.
- Channel close path: the rpmsg device is already unregistered before endpoint destruction is reached.

Remove the redundant unregistration to fix the deadlock.

Co-developed-by: Deepak Kumar Singh <deepak.singh@oss.qualcomm.com>
Signed-off-by: Deepak Kumar Singh <deepak.singh@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Fixes: a53e356 ("rpmsg: glink: fix rpmsg device leak")
Signed-off-by: Vishnu Santhosh <vishnu.santhosh@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260604-rpmsg-glink-fix-deadlock-destroy-ept-v1-1-b8a54ad1e4fd@oss.qualcomm.com
Merge Check Failed: No CR Numbers Found

Error: No Change Request numbers were found. Please add Change Request numbers to your pull request description in the format CRs-Fixed: 12345 or link GitHub issues that are associated with Change Requests.
Fixes a deadlock during `rmmod fastrpc` caused by `qcom_glink_destroy_ept()` attempting to re-acquire the device core's `device_lock` mutex during driver detach, when the driver core already holds the lock.

This series consists of two commits:
1. UPSTREAM: rpmsg: glink: remove duplicate code for rpmsg device remove (upstream commit `112766cdf2e5`)

This prerequisite refactoring introduces `qcom_glink_remove_rpmsg_device()` as a shared helper and consolidates rpmsg device teardown logic that was previously duplicated across:

- `qcom_glink_destroy_ept()`
- `qcom_glink_rx_close()`
- `qcom_glink_rx_close_ack()`

The change was merged upstream together with the stable-tagged leak fix (`a53e356df548`, already present in this branch as `f80e4e91b010`), but it never carried a stable/Cc tag itself. As a result, stable auto-backporting did not pick it up, leaving this branch without the required refactoring.

2. FROMLIST: rpmsg: glink: fix deadlock in endpoint destroy during driver detach (Lore link)

This is the actual fix. It removes the redundant rpmsg device unregistration from `qcom_glink_destroy_ept()`.

In both paths that lead to endpoint destruction, the rpmsg device has already been handled:

- Driver detach path: the driver core already tears down the rpmsg device.
- Channel close path: the rpmsg device is already unregistered before endpoint destruction is reached.

The additional unregistration attempt in `qcom_glink_destroy_ept()` is therefore unnecessary and becomes the direct cause of the `device_lock` re-entrancy deadlock during driver removal.

Dependency

Commit 2 depends on Commit 1 because the fix removes a call to the helper introduced by the refactoring, which centralized rpmsg device teardown into `qcom_glink_remove_rpmsg_device()`.
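
For reviewers who want to see the lock re-entrancy in isolation, the following is a minimal userspace sketch, not kernel code. It assumes a POSIX error-checking mutex is an acceptable stand-in for the device mutex, so that a relock by the owner returns `EDEADLK` instead of hanging as the real non-recursive kernel mutex does. All `model_*` names are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; only its lock is modeled. */
struct model_device {
	pthread_mutex_t lock;	/* plays the role of the device mutex */
};

static int model_device_init(struct model_device *dev)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	/* ERRORCHECK: a relock by the owning thread fails with EDEADLK. */
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(&dev->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/* Models the unregistration step, which needs the device lock. */
static int model_device_del(struct model_device *dev)
{
	int ret = pthread_mutex_lock(&dev->lock);

	if (ret)
		return ret;	/* EDEADLK when the caller already owns the lock */
	pthread_mutex_unlock(&dev->lock);
	return 0;
}

/* Models endpoint destroy; unregister_dev selects old vs. fixed behaviour. */
static int model_destroy_ept(struct model_device *dev, bool unregister_dev)
{
	return unregister_dev ? model_device_del(dev) : 0;
}

/* Models driver detach: hold the lock across the remove callback. */
static int model_driver_detach(struct model_device *dev, bool unregister_dev)
{
	int ret = pthread_mutex_lock(&dev->lock);

	if (ret)
		return ret;
	ret = model_destroy_ept(dev, unregister_dev);
	pthread_mutex_unlock(&dev->lock);
	return ret;
}

/* Usage: run both variants against one modeled device. */
static void model_demo(void)
{
	struct model_device dev;

	assert(model_device_init(&dev) == 0);
	assert(model_driver_detach(&dev, true) == EDEADLK);	/* before the fix */
	assert(model_driver_detach(&dev, false) == 0);		/* after the fix */
	pthread_mutex_destroy(&dev.lock);
}
```

Calling `model_demo()` from a `main()` (built with `cc -pthread`) exercises both variants. The sketch only models lock ownership; it does not reproduce the GLINK channel state or the driver core's actual teardown sequence.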