Skip to content

perf/Optimize neighbor search setup#7528

Open
Audrey-777 wants to merge 2 commits into
deepmodeling:developfrom
Audrey-777:perf/neighsearch-setup-openmp
Open

perf/Optimize neighbor search setup#7528
Audrey-777 wants to merge 2 commits into
deepmodeling:developfrom
Audrey-777:perf/neighsearch-setup-openmp

Conversation

@Audrey-777

Copy link
Copy Markdown

Summary

This PR optimizes the MD neighbor-search setup path in NeighborSearch.

The change keeps the existing ordering semantics while reducing setup overhead for larger systems:

  • Cache base-cell atoms once in type/local order.
  • Fill periodic images by deterministic image index.
  • Classify generated atoms into inside/ghost categories in parallel for large generated atom lists.
  • Preserve ordered inside/ghost output by sequential ordered write-back.

A conservative threshold is used so smaller systems keep the serial setup path.

Correctness

The implementation preserves:

  • all_atoms_ image-major / type / local-atom order.
  • atom_id as the linear all-atom index.
  • Inside/ghost ordering from the original all_atoms_ scan.
  • OpenMP OFF behavior.
  • C++11 compatibility.

Performance

Environment:

  • 8 physical cores / 16 logical CPUs
  • Intel Xeon Platinum 8163
  • np=1
  • OMP_PROC_BIND=spread
  • OMP_PLACES=cores

2048 atom LJ NVE, 200 steps

Threads Wall ABACUS total NeighborSearch::init set_member_variables
1 10.79 s 10.42 s 0.79 s 0.38 s
2 10.61 s 10.23 s 0.76 s 0.36 s
8 11.22 s 10.85 s 0.77 s 0.36 s

8192 atom LJ NVE, 100 steps

Threads Wall ABACUS total NeighborSearch::init set_member_variables
1 24.19 s 23.82 s 1.60 s 0.78 s
2 21.68 s 21.32 s 0.75 s 0.34 s
4 21.07 s 20.71 s 0.64 s 0.34 s
8 20.70 s 20.34 s 0.52 s 0.30 s
16 21.44 s 21.07 s 0.65 s 0.37 s

Best 8192 result

  • Wall time: 24.19 s -> 20.70 s, about 1.17x.
  • NeighborSearch::init: 1.60 s -> 0.52 s, about 3.08x.
  • set_member_variables: 0.78 s -> 0.30 s, about 2.60x.

The 2048-atom case is mostly noise-bound and does not benefit from high thread counts, so this PR should be read as a larger-system setup optimization.

Tests

  • git diff --check
  • OpenMP ON focused build
  • OpenMP ON focused ctest: passed
  • OpenMP OFF focused build
  • OpenMP OFF focused ctest: passed
  • GitHub check on c9a28da13: passed

Copilot AI review requested due to automatic review settings June 26, 2026 13:51

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the MD neighbor-search setup in NeighborSearch by making periodic image generation and inside/ghost classification more cache-friendly and optionally parallel for large workloads, while aiming to preserve the existing output ordering semantics.

Changes:

  • Precomputes “base cell” atoms once (type/local order) and reuses them to fill periodic images deterministically by image index.
  • Adds an OpenMP-enabled setup path for large generated-atom counts, and an OpenMP-enabled inside/ghost classification path with ordered sequential write-back.
  • Adds ModuleBase::timer instrumentation to set_member_variables, init, and build_neighbors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +209 to +215
const int atoms_per_image = static_cast<int>(base_atoms.size());
const int nimage_x = glayerX_ + glayerX_minus_;
const int nimage_y = glayerY_ + glayerY_minus_;
const int nimage_z = glayerZ_ + glayerZ_minus_;
const int nimages = nimage_x * nimage_y * nimage_z;
const int total_atoms = nimages * atoms_per_image;

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c40ee17. I updated the setup count calculations to use checked long long intermediates and explicitly validate that nimage_x/y/z, nimages, and total_atoms fit in int before they are used for indexing or vector sizing. If the generated neighbor search atom count exceeds the supported int indexing range, the code now exits via ModuleBase::WARNING_QUIT instead of overflowing silently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants