feat(api): make Attack Paths sink selectable between Neo4j and Neptune#11524
Conversation
…R-1276-porting-attack-paths-scan-target-database-from-neo-4-j-to-neo-4-j-neptune
…R-1276-porting-attack-paths-scan-target-database-from-neo-4-j-to-neo-4-j-neptune
…-paths sink inputs
…R-1276-porting-attack-paths-scan-target-database-from-neo-4-j-to-neo-4-j-neptune
…R-1276-porting-attack-paths-scan-target-database-from-neo-4-j-to-neo-4-j-neptune
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Plus Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (9)
📝 WalkthroughWalkthroughAdds a sink abstraction and implementations (Neo4j, Neptune), an ingest driver for per-scan Cartography temp DBs, a facade routing ingest vs sink by database name and scan migration state, sink-based sync/materialization with list normalization, graph-store health probing, and coordinated tests/docs/config. ChangesAttack Paths Neptune Graph Sink Support
🎯 4 (Complex) | ⏱️ ~60 minutes
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
|
✅ Conflict Markers Resolved All conflict markers have been successfully resolved in this pull request. |
|
✅ All necessary |
There was a problem hiding this comment.
Pull request overview
This PR introduces a configurable persistent “sink” for Attack Paths graphs, allowing deployments to choose between the existing Neo4j-backed store and an Amazon Neptune-backed store, while keeping Cartography’s per-scan staging DB on Neo4j. It also adapts the sync/query model for Neptune openCypher limitations (notably list-typed properties) and updates health/readiness behavior and tests accordingly.
Changes:
- Added a sink abstraction with Neo4j and Neptune backends (factory-selected via
ATTACK_PATHS_SINK_DATABASE) plus per-scan routing usingAttackPathsScan.is_migrated. - Updated the sync pipeline to normalize sink properties and materialize catalogued list properties as child “item” nodes connected via typed
HAS_*relationships. - Updated API read paths, readiness probe behavior, docs (README/skill docs), and expanded/adjusted unit tests; bumped
cartographyto0.136.0.
Reviewed changes
Copilot reviewed 43 out of 47 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| skills/prowler-attack-paths-query/SKILL.md | Updates query authoring guidance for Neo4j/Neptune compatibility and list-item node modeling. |
| README.md | Documents selecting Neo4j vs Neptune as the Attack Paths persistent sink and required env vars. |
| api/src/backend/tasks/tests/test_attack_paths_scan.py | Updates scan tests for refactors and sink-aware sync changes; adds coverage for property normalization. |
| api/src/backend/tasks/jobs/attack_paths/sync.py | Refactors sync to write via sink backend, explode catalogued list properties into child nodes, and normalize property types for Neptune compatibility. |
| api/src/backend/tasks/jobs/attack_paths/scan.py | Wires sync/provider-type catalog resolution, uses new index helpers, and triggers legacy Neo4j drain after Neptune writes. |
| api/src/backend/tasks/jobs/attack_paths/queries.py | Removes sink-side sync write templates; keeps fetch queries for temp Neo4j DB reads. |
| api/src/backend/tasks/jobs/attack_paths/provider_config.py | Introduces ProviderConfig + NormalizedList catalog (AWS) to drive list-property materialization. |
| api/src/backend/tasks/jobs/attack_paths/legacy_drain.py | Adds opportunistic cleanup of legacy Neo4j tenant DB subgraphs during Neptune cutover. |
| api/src/backend/tasks/jobs/attack_paths/indexes.py | Adds sink-aware index creation behavior (Neo4j-only) and wraps Cartography index creation. |
| api/src/backend/tasks/jobs/attack_paths/findings.py | Import ordering cleanup; no functional sink changes here. |
| api/src/backend/tasks/jobs/attack_paths/db_utils.py | Adds is_migrated initialization for new scans and routes recovery checks through per-scan backend selection. |
| api/src/backend/tasks/jobs/attack_paths/config.py | Re-exports provider config types/constants from new provider_config.py. |
| api/src/backend/tasks/jobs/attack_paths/cleanup.py | Renames internal helper to mark_scan_finished and updates imports accordingly. |
| api/src/backend/conftest.py | Adds sink_backend_stub fixture to isolate sink factory caching in tests. |
| api/src/backend/config/guniconf.py | Reinitializes attack-paths drivers post-fork to avoid dead driver IO threads under preload_app. |
| api/src/backend/config/django/production.py | Adds Neptune connection settings under DATABASES["neptune"]. |
| api/src/backend/config/django/devel.py | Adds Neptune connection settings for dev environment. |
| api/src/backend/config/django/base.py | Adds ATTACK_PATHS_SINK_DATABASE setting defaulting to neo4j. |
| api/src/backend/api/v1/views.py | Routes query catalog/lookup by is_migrated and passes scan into query execution helpers for backend routing. |
| api/src/backend/api/tests/test_views.py | Updates expectations for new is_migrated parameters and scan passing into execution helpers. |
| api/src/backend/api/tests/test_sink.py | Adds comprehensive tests for sink factory selection, Neo4j/Neptune sink behaviors, and cutover routing. |
| api/src/backend/api/tests/test_legacy_drain.py | Adds unit tests validating legacy Neo4j drain behavior and exception swallowing. |
| api/src/backend/api/tests/test_health.py | Updates readiness probe to “graphdb” abstraction and tests sink-reflecting componentId + probe timeout behavior. |
| api/src/backend/api/tests/test_attack_paths.py | Updates tests to use sink stubs and per-scan routing in query execution helpers. |
| api/src/backend/api/tests/test_attack_paths_database.py | Replaces legacy driver tests with facade routing tests (ingest vs sink) and exception hierarchy checks. |
| api/src/backend/api/models.py | Adds AttackPathsScan.is_migrated cutover flag for per-scan routing. |
| api/src/backend/api/migrations/0096_attack_paths_scan_is_migrated.py | Adds migration introducing is_migrated boolean field. |
| api/src/backend/api/health.py | Readiness now probes the configured graph sink (Neo4j/Neptune) with a dedicated timeout budget using a small executor. |
| api/src/backend/api/attack_paths/views_helpers.py | Routes reads via get_backend_for_scan(scan) and adds Neptune custom-query timeout hinting for migrated scans. |
| api/src/backend/api/attack_paths/sink/neptune.py | Implements Neptune sink: dual drivers (writer/reader), SigV4 auth, sync write path, and bounded subgraph deletes. |
| api/src/backend/api/attack_paths/sink/neo4j.py | Implements Neo4j sink: driver lifecycle, sessions, sync write path, and batched subgraph delete with DB-not-found handling. |
| api/src/backend/api/attack_paths/sink/factory.py | Adds sink factory with process-wide caching and cutover routing to secondary Neo4j backend for legacy scans. |
| api/src/backend/api/attack_paths/sink/base.py | Defines SinkDatabase protocol contract for sink implementations. |
| api/src/backend/api/attack_paths/sink/init.py | Exposes sink factory API as the public sink module surface. |
| api/src/backend/api/attack_paths/queries/registry.py | Adds dual query catalogs (current vs deprecated) selected by is_migrated. |
| api/src/backend/api/attack_paths/ingest/driver.py | Adds dedicated Neo4j driver for per-scan temp databases (always Neo4j), separate from sink. |
| api/src/backend/api/attack_paths/ingest/init.py | Exposes ingest driver functions as the public ingest module surface. |
| api/src/backend/api/attack_paths/database.py | Converts database module into a facade routing temp DB ops to ingest and everything else to the configured sink. |
| api/src/backend/api/attack_paths/cypher_sanitizer.py | Comment/doc formatting tweaks (no behavioral change shown in diff). |
| api/src/backend/api/apps.py | Removes outdated startup comment referencing old single-driver behavior. |
| api/pyproject.toml | Bumps cartography to 0.136.0 (and updates constraints); also bumps workos constraint. |
| api/CHANGELOG.md | Adds changelog entry documenting Neptune sink support and configuration. |
| .gitignore | Ignores docker-compose override files. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
🔒 Container Security ScanImage: 📊 Vulnerability Summary
16 package(s) affected
|
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/src/backend/tasks/tests/test_attack_paths_scan.py (1)
1216-1219:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPatch the actual backend check path in this recovery test.
Line 1217 mocks
graph_database.has_provider_data, butrecover_graph_data_readynow callsget_backend_for_scan(...).has_provider_data(...). This test currently passes without exercising the exception-swallowing branch it claims to cover.Proposed fix
- with patch( - "tasks.jobs.attack_paths.db_utils.graph_database.has_provider_data", - side_effect=Exception("Neo4j unreachable"), - ): + backend = MagicMock() + backend.has_provider_data.side_effect = Exception("Neo4j unreachable") + with patch( + "api.attack_paths.sink.get_backend_for_scan", + return_value=backend, + ): # Should not raise recover_graph_data_ready(attack_paths_scan)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/src/backend/tasks/tests/test_attack_paths_scan.py` around lines 1216 - 1219, The test is mocking the wrong symbol so it never triggers the exception path in recover_graph_data_ready; update the patch to target the backend call actually used: mock get_backend_for_scan to return an object whose has_provider_data raises Exception("Neo4j unreachable") (or patch tasks.jobs.attack_paths.get_backend_for_scan(...).has_provider_data directly) so that recover_graph_data_ready exercises the exception-swallowing branch; reference recover_graph_data_ready, get_backend_for_scan, and has_provider_data when locating where to change the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/src/backend/api/attack_paths/ingest/driver.py`:
- Line 166: The current session.run call in driver.py embeds the database
variable directly into the Cypher DDL (session.run(f"DROP DATABASE `{database}`
IF EXISTS DESTROY DATA")), which is risky if the source changes; add a short
inline comment near that call stating the trust boundary (database is produced
by get_database_name() and is UUID-based) and add a validation/assertion before
the call (e.g., assert that get_database_name() matches a UUID regex or only
contains allowed characters) to ensure future values cannot inject harmful
content; reference the session.run line and get_database_name() in your change.
- Around line 138-145: The exception message is incorrect for
WriteQueryNotAllowedException when default_access_mode == neo4j.READ_ACCESS;
change the raised message from "Read query not allowed" to "Write query not
allowed" in the raise call inside the block that checks default_access_mode and
READ_EXCEPTION_CODES (refer to the raise of WriteQueryNotAllowedException in
driver.py), make the same message change where WriteQueryNotAllowedException is
raised in sink/neo4j.py and sink/neptune.py, and update assertions in
api/src/backend/api/tests/test_attack_paths.py that expect the old message to
assert the new "Write query not allowed" text; keep the existing exception type,
code argument (READ_EXCEPTION_CODES[0]), and conditional logic unchanged.
In `@api/src/backend/api/attack_paths/sink/neo4j.py`:
- Around line 257-260: In the except block catching GraphDatabaseQueryException
(the "Failed to clear query cache for database" handler), replace the use of the
root logging module (logging.warning) with the module-level logger variable
named logger (defined near the top of the file) so the warning is emitted
through the same logger instance used elsewhere (i.e., change
logging.warning(...) to logger.warning(...), keeping the existing message and
exception interpolation).
- Around line 88-99: The comment above the connectivity check contains a typo
("Neo4" should be "Neo4j"); update the comment text near the try/except that
calls self._driver.verify_connectivity() (and the surrounding explanatory lines
referencing the Neo4j sink and lazy reconnection) to replace "Neo4" with "Neo4j"
so the comment accurately names the database in the block that logs via
logger.warning.
In `@api/src/backend/api/attack_paths/sink/neptune.py`:
- Around line 409-452: The SigV4 Host header in _NeptuneAuthToken is built from
urlsplit(url).hostname which drops non-default ports and can break Neptune IAM
signing; change the Host header construction to use the full authority
(urlsplit(url).netloc) when calling request.headers.add_header("Host", ...),
keep the rest of the SigV4Auth flow (AWSRequest, SigV4Auth, json-dumped
auth_obj, ExpiringAuth returned by neptune_auth_provider) unchanged, and add a
unit test that constructs a token via _NeptuneAuthToken (or invokes
neptune_auth_provider with a https_url including a port like :8182) and asserts
the generated auth headers contain the expected Host including the port.
In `@api/src/backend/tasks/jobs/attack_paths/legacy_drain.py`:
- Around line 51-81: Initialize sink = None before the try, then move the call
to _close_silently(sink) into a finally block so Neo4jSink is always closed even
when an exception is raised; ensure the finally guards against sink being None
(so _close_silently is only invoked when a Neo4jSink instance was created).
Locate the block using Neo4jSink, sink, _close_silently, _exists_and_empty,
get_database_name, tenant_id and provider_id and keep existing logging/exception
handling but remove the unconditional call to _close_silently from the try body
and place it in the finally.
In `@README.md`:
- Line 105: Update the README header "Neptune sink" to Title Case by changing
the heading text to "Neptune Sink"; locate the header string "Neptune sink" and
replace it, and also verify and update any internal references or
table-of-contents entries that reference that exact header to match the new
capitalization.
- Line 97: The header "Neo4j sink" must be converted to Title Case; find the
markdown header string "Neo4j sink" and change it to "Neo4j Sink" (update any
identical occurrences if present) so documentation complies with the Title-Case
header guideline.
In `@skills/prowler-attack-paths-query/SKILL.md`:
- Around line 449-455: The fenced code block containing the "Git pin
(prowler-cloud/cartography@<TAG>)" and "PyPI pin (cartography==<TAG>)" URLs is
missing a language identifier and triggers markdownlint MD040; update the
opening fence (the triple-backticks that precede those two URL lines) to include
a language specifier such as text (e.g., change ``` to ```text) so the block is
explicitly marked and the pipeline lint error is resolved.
- Around line 433-437: The fenced code block in SKILL.md (the directory listing
under api/src/backend/api/attack_paths/queries/) is missing a language specifier
which triggers markdownlint MD040; update that fenced block to include a
language identifier (e.g., use ```text) so the block becomes a valid fenced code
block and include the added final line for {provider}.py as shown in the
proposed fix, ensuring the opening fence matches the specified language.
---
Outside diff comments:
In `@api/src/backend/tasks/tests/test_attack_paths_scan.py`:
- Around line 1216-1219: The test is mocking the wrong symbol so it never
triggers the exception path in recover_graph_data_ready; update the patch to
target the backend call actually used: mock get_backend_for_scan to return an
object whose has_provider_data raises Exception("Neo4j unreachable") (or patch
tasks.jobs.attack_paths.get_backend_for_scan(...).has_provider_data directly) so
that recover_graph_data_ready exercises the exception-swallowing branch;
reference recover_graph_data_ready, get_backend_for_scan, and has_provider_data
when locating where to change the test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 41849498-a52e-43a3-a249-81428681f677
⛔ Files ignored due to path filters (1)
api/uv.lockis excluded by!**/*.lock,!**/uv.lock
📒 Files selected for processing (46)
.gitignoreREADME.mdapi/CHANGELOG.mdapi/pyproject.tomlapi/src/backend/api/apps.pyapi/src/backend/api/attack_paths/cypher_sanitizer.pyapi/src/backend/api/attack_paths/database.pyapi/src/backend/api/attack_paths/ingest/__init__.pyapi/src/backend/api/attack_paths/ingest/driver.pyapi/src/backend/api/attack_paths/queries/aws.pyapi/src/backend/api/attack_paths/queries/aws_deprecated.pyapi/src/backend/api/attack_paths/queries/registry.pyapi/src/backend/api/attack_paths/sink/__init__.pyapi/src/backend/api/attack_paths/sink/base.pyapi/src/backend/api/attack_paths/sink/factory.pyapi/src/backend/api/attack_paths/sink/neo4j.pyapi/src/backend/api/attack_paths/sink/neptune.pyapi/src/backend/api/attack_paths/views_helpers.pyapi/src/backend/api/health.pyapi/src/backend/api/migrations/0096_attack_paths_scan_is_migrated.pyapi/src/backend/api/models.pyapi/src/backend/api/tests/test_attack_paths.pyapi/src/backend/api/tests/test_attack_paths_database.pyapi/src/backend/api/tests/test_health.pyapi/src/backend/api/tests/test_legacy_drain.pyapi/src/backend/api/tests/test_sink.pyapi/src/backend/api/tests/test_views.pyapi/src/backend/api/v1/views.pyapi/src/backend/config/django/base.pyapi/src/backend/config/django/devel.pyapi/src/backend/config/django/production.pyapi/src/backend/config/guniconf.pyapi/src/backend/conftest.pyapi/src/backend/tasks/jobs/attack_paths/cleanup.pyapi/src/backend/tasks/jobs/attack_paths/config.pyapi/src/backend/tasks/jobs/attack_paths/db_utils.pyapi/src/backend/tasks/jobs/attack_paths/findings.pyapi/src/backend/tasks/jobs/attack_paths/indexes.pyapi/src/backend/tasks/jobs/attack_paths/legacy_drain.pyapi/src/backend/tasks/jobs/attack_paths/provider_config.pyapi/src/backend/tasks/jobs/attack_paths/queries.pyapi/src/backend/tasks/jobs/attack_paths/scan.pyapi/src/backend/tasks/jobs/attack_paths/sync.pyapi/src/backend/tasks/tests/test_attack_paths_scan.pydocs/user-guide/tutorials/prowler-app-attack-paths.mdxskills/prowler-attack-paths-query/SKILL.md
💤 Files with no reviewable changes (1)
- api/src/backend/api/apps.py
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #11524 +/- ##
==========================================
- Coverage 94.02% 93.63% -0.40%
==========================================
Files 241 254 +13
Lines 35705 36669 +964
==========================================
+ Hits 33573 34334 +761
- Misses 2132 2335 +203
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
…ort, close legacy drain sink
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/src/backend/api/attack_paths/sink/neo4j.py (1)
183-187:⚠️ Potential issue | 🟠 MajorFix Neo4j
CREATE DATABASEto not parameterize the database identifierIn
api/src/backend/api/attack_paths/sink/neo4j.py(lines 183-187),create_database()usessession.run("CREATE DATABASE $database IF NOT EXISTS", {"database": database}), but Neo4j does not allow Cypher parameters for the database name identifier, so this can fail at runtime.Same incorrect statement also exists in
api/src/backend/api/attack_paths/ingest/driver.py.🔧 Proposed fix
def create_database(self, database: str) -> None: with self.get_session() as session: - session.run( - "CREATE DATABASE $database IF NOT EXISTS", {"database": database} - ) + session.run(f"CREATE DATABASE `{database}` IF NOT EXISTS")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/src/backend/api/attack_paths/sink/neo4j.py` around lines 183 - 187, The CREATE DATABASE statement incorrectly uses a Cypher parameter for the database identifier (in create_database in api/src/backend/api/attack_paths/sink/neo4j.py and the identical call in api/src/backend/api/attack_paths/ingest/driver.py), which Neo4j rejects; fix by constructing the query with the database name directly after validating/escaping it (e.g., allow only safe characters like [A-Za-z0-9_], otherwise reject) and then interpolating into the query string such as "CREATE DATABASE <name> IF NOT EXISTS" (or with backticks around a validated name) instead of using a parameterized map. Ensure both create_database implementations perform the same validation and use the direct, interpolated query.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@api/src/backend/api/attack_paths/sink/neo4j.py`:
- Around line 183-187: The CREATE DATABASE statement incorrectly uses a Cypher
parameter for the database identifier (in create_database in
api/src/backend/api/attack_paths/sink/neo4j.py and the identical call in
api/src/backend/api/attack_paths/ingest/driver.py), which Neo4j rejects; fix by
constructing the query with the database name directly after validating/escaping
it (e.g., allow only safe characters like [A-Za-z0-9_], otherwise reject) and
then interpolating into the query string such as "CREATE DATABASE <name> IF NOT
EXISTS" (or with backticks around a validated name) instead of using a
parameterized map. Ensure both create_database implementations perform the same
validation and use the direct, interpolated query.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 2748dd9a-f869-4e17-9f6a-f377caebd468
⛔ Files ignored due to path filters (1)
api/uv.lockis excluded by!**/*.lock,!**/uv.lock
📒 Files selected for processing (7)
api/src/backend/api/attack_paths/sink/neo4j.pyapi/src/backend/api/attack_paths/sink/neptune.pyapi/src/backend/api/tests/test_sink.pyapi/src/backend/tasks/jobs/attack_paths/indexes.pyapi/src/backend/tasks/jobs/attack_paths/legacy_drain.pyapi/src/backend/tasks/jobs/attack_paths/scan.pyapi/src/backend/tasks/tests/test_attack_paths_scan.py
…R-1276-porting-attack-paths-scan-target-database-from-neo-4-j-to-neo-4-j-neptune
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/CHANGELOG.md`:
- Around line 41-45: Update the broken changelog link in the entry that
references "compliance catalog now warms..." by replacing the duplicated
repository URL "https://github.com/prowler-cloud/prowler-cloud/pull/4554" with
the correct PR URL (e.g., "https://github.com/prowler-cloud/prowler/pull/4554")
so the `(`#4554`)` link resolves; edit the markdown link target in the same line
that contains the text "`compliance-overviews/attributes` returns `503` while
warming" to fix the 404.
In `@api/src/backend/api/tests/test_views.py`:
- Around line 9770-9799: The tests mutate the global flags COMPLIANCE_WARMED and
COMPLIANCE_WARMING_STARTED without restoring their prior values which can leak
state between tests; capture each flag's original boolean state at the start of
the test, then set/clear them as the test requires, and finally restore them
(using set() if original was True, clear() if False) in the existing finally
block (first test) and by adding a try/finally around the second test's flag
changes so both tests restore the original states of COMPLIANCE_WARMED and
COMPLIANCE_WARMING_STARTED when they finish.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 7668eec1-157f-4937-aaec-a9611ea1bd7d
⛔ Files ignored due to path filters (1)
api/uv.lockis excluded by!**/*.lock,!**/uv.lock
📒 Files selected for processing (9)
README.mdapi/CHANGELOG.mdapi/pyproject.tomlapi/src/backend/api/attack_paths/sink/neo4j.pyapi/src/backend/api/attack_paths/sink/neptune.pyapi/src/backend/api/tests/test_sink.pyapi/src/backend/api/tests/test_views.pyapi/src/backend/api/v1/views.pyapi/src/backend/config/guniconf.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/CHANGELOG.md`:
- Around line 41-45: Update the broken changelog link in the entry that
references "compliance catalog now warms..." by replacing the duplicated
repository URL "https://github.com/prowler-cloud/prowler-cloud/pull/4554" with
the correct PR URL (e.g., "https://github.com/prowler-cloud/prowler/pull/4554")
so the `(`#4554`)` link resolves; edit the markdown link target in the same line
that contains the text "`compliance-overviews/attributes` returns `503` while
warming" to fix the 404.
In `@api/src/backend/api/tests/test_views.py`:
- Around line 9770-9799: The tests mutate the global flags COMPLIANCE_WARMED and
COMPLIANCE_WARMING_STARTED without restoring their prior values which can leak
state between tests; capture each flag's original boolean state at the start of
the test, then set/clear them as the test requires, and finally restore them
(using set() if original was True, clear() if False) in the existing finally
block (first test) and by adding a try/finally around the second test's flag
changes so both tests restore the original states of COMPLIANCE_WARMED and
COMPLIANCE_WARMING_STARTED when they finish.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 7668eec1-157f-4937-aaec-a9611ea1bd7d
⛔ Files ignored due to path filters (1)
api/uv.lockis excluded by!**/*.lock,!**/uv.lock
📒 Files selected for processing (9)
README.mdapi/CHANGELOG.mdapi/pyproject.tomlapi/src/backend/api/attack_paths/sink/neo4j.pyapi/src/backend/api/attack_paths/sink/neptune.pyapi/src/backend/api/tests/test_sink.pyapi/src/backend/api/tests/test_views.pyapi/src/backend/api/v1/views.pyapi/src/backend/config/guniconf.py
🛑 Comments failed to post (2)
api/CHANGELOG.md (1)
41-45:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix the broken changelog link.
Line 41 points at a duplicated repository path and will 404. Please update it to the correct repository before merging.
🛠 Proposed fix
-https://github.com/prowler-cloud/prowler-cloud/pull/4554 +https://github.com/prowler-cloud/prowler/pull/4554📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.- Compliance catalog now warms in background during startup. `compliance-overviews/attributes` returns `503` while warming, so the first request after a deploy no longer trips the API timeout [(`#4554`)](https://github.com/prowler-cloud/prowler/pull/4554) ### 🔐 Security - `dulwich` from 0.23.0 to 1.2.5 and `pyjwt` from 2.12.1 to 2.13.0, patching `GHSA-897w-fcg9-f6xj` (arbitrary file write) and `PYSEC-2026-179` (HMAC/JWK key confusion) [(`#11499`)](https://github.com/prowler-cloud/prowler/pull/11499)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/CHANGELOG.md` around lines 41 - 45, Update the broken changelog link in the entry that references "compliance catalog now warms..." by replacing the duplicated repository URL "https://github.com/prowler-cloud/prowler-cloud/pull/4554" with the correct PR URL (e.g., "https://github.com/prowler-cloud/prowler/pull/4554") so the `(`#4554`)` link resolves; edit the markdown link target in the same line that contains the text "`compliance-overviews/attributes` returns `503` while warming" to fix the 404.api/src/backend/api/tests/test_views.py (1)
9770-9799:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winRestore original warming flags to keep tests isolated.
Line 9772 and Line 9792 mutate process-global
COMPLIANCE_WARMING_STARTED/COMPLIANCE_WARMEDstate without restoring the prior values, which can cause order-dependent failures in later tests.Suggested fix
def test_compliance_overview_attributes_503_while_warming( self, authenticated_client ): from api.compliance import COMPLIANCE_WARMED, COMPLIANCE_WARMING_STARTED + prev_started = COMPLIANCE_WARMING_STARTED.is_set() + prev_warmed = COMPLIANCE_WARMED.is_set() COMPLIANCE_WARMING_STARTED.set() COMPLIANCE_WARMED.clear() try: response = authenticated_client.get( reverse("complianceoverview-attributes"), {"filter[compliance_id]": "aws_account_security_onboarding_aws"}, ) finally: - COMPLIANCE_WARMING_STARTED.clear() + if prev_started: + COMPLIANCE_WARMING_STARTED.set() + else: + COMPLIANCE_WARMING_STARTED.clear() + if prev_warmed: + COMPLIANCE_WARMED.set() + else: + COMPLIANCE_WARMED.clear() assert response.status_code == status.HTTP_503_SERVICE_UNAVAILABLE assert response.json()["errors"][0]["code"] == "compliance_warming" def test_compliance_overview_attributes_serves_when_warming_not_started( self, authenticated_client ): @@ # not refuse — the endpoint lazily loads and serves as before. from api.compliance import COMPLIANCE_WARMED, COMPLIANCE_WARMING_STARTED + prev_started = COMPLIANCE_WARMING_STARTED.is_set() + prev_warmed = COMPLIANCE_WARMED.is_set() COMPLIANCE_WARMING_STARTED.clear() COMPLIANCE_WARMED.clear() - response = authenticated_client.get( - reverse("complianceoverview-attributes"), - {"filter[compliance_id]": "aws_account_security_onboarding_aws"}, - ) + try: + response = authenticated_client.get( + reverse("complianceoverview-attributes"), + {"filter[compliance_id]": "aws_account_security_onboarding_aws"}, + ) + finally: + if prev_started: + COMPLIANCE_WARMING_STARTED.set() + else: + COMPLIANCE_WARMING_STARTED.clear() + if prev_warmed: + COMPLIANCE_WARMED.set() + else: + COMPLIANCE_WARMED.clear() assert response.status_code == status.HTTP_200_OK📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.from api.compliance import COMPLIANCE_WARMED, COMPLIANCE_WARMING_STARTED prev_started = COMPLIANCE_WARMING_STARTED.is_set() prev_warmed = COMPLIANCE_WARMED.is_set() COMPLIANCE_WARMING_STARTED.set() COMPLIANCE_WARMED.clear() try: response = authenticated_client.get( reverse("complianceoverview-attributes"), {"filter[compliance_id]": "aws_account_security_onboarding_aws"}, ) finally: if prev_started: COMPLIANCE_WARMING_STARTED.set() else: COMPLIANCE_WARMING_STARTED.clear() if prev_warmed: COMPLIANCE_WARMED.set() else: COMPLIANCE_WARMED.clear() assert response.status_code == status.HTTP_503_SERVICE_UNAVAILABLE assert response.json()["errors"][0]["code"] == "compliance_warming" def test_compliance_overview_attributes_serves_when_warming_not_started( self, authenticated_client ): # Dev fallback: under runserver warming never runs, so the guard must # not refuse — the endpoint lazily loads and serves as before. from api.compliance import COMPLIANCE_WARMED, COMPLIANCE_WARMING_STARTED prev_started = COMPLIANCE_WARMING_STARTED.is_set() prev_warmed = COMPLIANCE_WARMED.is_set() COMPLIANCE_WARMING_STARTED.clear() COMPLIANCE_WARMED.clear() try: response = authenticated_client.get( reverse("complianceoverview-attributes"), {"filter[compliance_id]": "aws_account_security_onboarding_aws"}, ) finally: if prev_started: COMPLIANCE_WARMING_STARTED.set() else: COMPLIANCE_WARMING_STARTED.clear() if prev_warmed: COMPLIANCE_WARMED.set() else: COMPLIANCE_WARMED.clear() assert response.status_code == status.HTTP_200_OK🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/src/backend/api/tests/test_views.py` around lines 9770 - 9799, The tests mutate the global flags COMPLIANCE_WARMED and COMPLIANCE_WARMING_STARTED without restoring their prior values which can leak state between tests; capture each flag's original boolean state at the start of the test, then set/clear them as the test requires, and finally restore them (using set() if original was True, clear() if False) in the existing finally block (first test) and by adding a try/finally around the second test's flag changes so both tests restore the original states of COMPLIANCE_WARMED and COMPLIANCE_WARMING_STARTED when they finish.
Context
The persistent Attack Paths graph was Neo4j-only. This makes the persistent store (the "sink") selectable between Neo4j and Amazon Neptune, so existing installs behave exactly as before.
Cartography ingestion is unaffected: every scan still writes its graph into a throw-away per-scan Neo4j database. Neptune is single-database and cannot host those temporary databases, so only the final sink is configurable. Neo4j env stays required on workers even when the sink is Neptune.
Description
Sink abstraction: Introduces a
SinkDatabaseprotocol (sink/base.py) with two backends behind a factory (sink/factory.py):sink/neo4j.py: multi-database Neo4j, physical tenant isolation.sink/neptune.py: single-graph Neptune with dual Bolt drivers (writer/reader) and SigV4 auth; label-based isolation.attack_paths/database.pybecomes a thin routing shim:db-tmp-scan-*names route to the ingest Neo4j driver (ingest/driver.py), everything else to the configured sink. The public API surface and exception hierarchy are unchanged.Neptune-compatible queries: Neptune openCypher does not support list properties on nodes, so list-typed Cartography properties (
AWSPolicyStatement.action/resource, KMS algorithms, ECS container-definition lists, CloudFront aliases, and others, catalogued inprovider_config.py::AWS_NORMALIZED_LISTS) are now materialised at sync time as child item nodes joined by typedHAS_*edges. The AWS query catalog (aws.py) was rewritten to read those values by traversal and reshaped to a multi-MATCHform that runs efficiently on both sinks. Pre-cutover scans keep their old comma-delimited shape viaaws_deprecated.py, selected by the registry; both files expose identical query IDs and parameters so the API stays uniform during the transition.Per-scan routing and cutover: A new
AttackPathsScan.is_migratedflag records whether a scan's data lives in the current sink schema. Reads route per scan: migrated scans hit the active sink, pre-cutover scans stay on the legacy Neo4j tenant DB regardless of the configured sink. After a successful Neptune write, an opportunistic drain (legacy_drain.py) removes the provider subgraph from Neo4j and drops the database once empty, so legacy data clears itself as scans re-run. These paths and the flag are marked for removal once the drain finishes.Reliability: The ingest driver is fully lazy; the sink driver is lazy too, except for a best-effort warm-up on each gunicorn worker fork (driver IO threads do not survive
fork()underpreload_app, so each worker rebuilds its own). All degrade instead of crashing: a graph database that is down at boot no longer blocks API startup, it logs a warning and reconnects lazily on first use. An unreachable backend surfaces as a500on the relevant Attack Paths request and a503on/health/ready, which now probes whichever sink is active and reports it by name. Each driver carries a bounded TCP connect timeout so a dead host can't pin a worker past the readiness budget.Dependency: Bumps
cartography0.135.0 → 0.136.0.New environment variables (all optional; defaults preserve current behavior):
ATTACK_PATHS_SINK_DATABASEneo4jneo4jorneptuneNEPTUNE_WRITER_ENDPOINTneptune)NEPTUNE_READER_ENDPOINTNEPTUNE_PORT8182AWS_REGIONNEPTUNE_CONNECTION_TIMEOUT10NEPTUNE_CONN_ACQUISITION_TIMEOUT60NEPTUNE_MAX_CONNECTION_LIFETIME3600NEPTUNE_MAX_CONNECTION_POOL_SIZE50NEO4J_CONNECTION_TIMEOUT5(Existing
NEO4J_*andATTACK_PATHS_*tuning variables are unchanged.) Neptune authenticates with SigV4 via the pod's AWS credentials, so no graph password is configured for that sink.Steps to review
ATTACK_PATHS_SINK_DATABASEunset, run an Attack Paths scan, and confirm queries return the same results as onmaster.ATTACK_PATHS_SINK_DATABASE=neptuneplus theNEPTUNE_*/AWS_REGIONvars, run a scan, and confirm the AWS queries return equivalent paths.GET /health/live→200,GET /health/ready→503reporting the active sink, other endpoints respond, and an Attack Paths query returns500. Recover the backend and confirm readiness returns200and queries succeed (lazy reconnect).is_migrated=False) still read from Neo4j and new scans read from Neptune. After re-running a scan, confirm its provider subgraph is drained from the legacy Neo4j tenant DB.Checklist
API
License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Summary by CodeRabbit
New Features
Refactor
Documentation
Chores
Tests