GPU/TPC: Increase assumed cacheline size to 128 bytes in cluster finder #15545

Open: fweig wants to merge 1 commit into AliceO2Group:dev from fweig:cl-128byte
fweig (Contributor) commented Jun 22, 2026

Adjust the memory layout in the TPC cluster finder for the 128-byte cachelines found in modern GPU architectures.
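The description does not show the changed code, so the following is only a minimal sketch of one common way to tie a 2D layout to the cacheline size: store the (pad, time) grid in tiles whose byte size equals one cacheline, so that neighbouring cells land in the same line. All names here (`TiledLayout`, `kTileWidth`, `index`, `cacheline`) and the fixed tile width of 8 pads are hypothetical and not taken from the O2 sources.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tiled 2D layout (illustration only, not the O2 implementation).
// Each tile of (pad, time) cells occupies exactly CachelineBytes bytes.
template <typename T, std::size_t CachelineBytes>
struct TiledLayout {
  static_assert(CachelineBytes % sizeof(T) == 0, "cacheline must hold whole elements");

  static constexpr std::size_t kElemsPerTile = CachelineBytes / sizeof(T);
  static constexpr std::size_t kTileWidth = 8; // pads per tile row (assumed value)
  static constexpr std::size_t kTileHeight = kElemsPerTile / kTileWidth; // time bins per tile

  static_assert(kElemsPerTile % kTileWidth == 0, "tile must be a full rectangle");
  static_assert(kTileHeight > 0, "tile must contain at least one row");

  std::size_t widthInTiles; // number of tiles along the pad axis (rounded up)

  constexpr explicit TiledLayout(std::size_t numPads)
    : widthInTiles((numPads + kTileWidth - 1) / kTileWidth) {}

  // Linear element index of cell (pad, time): tiles are stored one after
  // another, and cells inside a tile are stored row by row.
  constexpr std::size_t index(std::size_t pad, std::size_t time) const
  {
    return ((time / kTileHeight) * widthInTiles + pad / kTileWidth) * kElemsPerTile +
           (time % kTileHeight) * kTileWidth + pad % kTileWidth;
  }

  // Index of the cacheline holding cell (pad, time), assuming the buffer
  // itself starts on a cacheline boundary.
  constexpr std::size_t cacheline(std::size_t pad, std::size_t time) const
  {
    return index(pad, time) * sizeof(T) / CachelineBytes;
  }
};
```

In this sketch, `TiledLayout<std::uint8_t, 64>` gives 8x8 tiles and `TiledLayout<std::uint8_t, 128>` gives 8x16 tiles, so changing the assumed cacheline size changes only the tile shape while the indexing code stays the same.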

On RTX 5080 this increases the throughput of the noisy pad filter from 221 GB/s to 379 GB/s.

On Radeon VII throughput increases from 58 GB/s to 81 GB/s.

Other kernels of the cluster finder either slightly increase in performance or stay the same.

In my test this makes the full cluster finder slightly slower on CPU (869 ms -> 891 ms, roughly 2.5%), but I don't think the difference is large enough to justify adding a separate path for 64-byte cachelines.

fweig requested a review from davidrohr as a code owner on June 22, 2026, 11:07