Skip to content

Fix/ssrf alerts target#1665

Open
ygndotgg wants to merge 5 commits into
parseablehq:mainfrom
ygndotgg:fix/ssrf-alerts-target
Open

Fix/ssrf alerts target#1665
ygndotgg wants to merge 5 commits into
parseablehq:mainfrom
ygndotgg:fix/ssrf-alerts-target

Conversation

@ygndotgg

@ygndotgg ygndotgg commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Fixes #XXXX.

Description

This PR hardens alert target delivery against server-side request forgery by adding an outbound policy for alert-target HTTP
requests.

Previously, authenticated users with alert target permissions could configure webhook, Slack, or AlertManager targets that
caused the Parseable server to send outbound HTTP requests to arbitrary URLs from the server’s network position. That meant
private/internal destinations, loopback services, link-local metadata endpoints, and other network-restricted HTTP services
could be reached if routable from the Parseable host.

The chosen fix is to centralize alert-target outbound networking behind a policy gate. Instead of validating only the submitted
URL string, Parseable now resolves the destination, validates the resolved IPs, applies admin-owned allow/deny policy, disables
redirects/proxies, pins the resolved DNS result into the HTTP client, and revalidates again immediately before alert delivery.

Key changes made in this patch:

  • Added alert target outbound policy configuration with:

    • allowPrivate
    • allowedDomains
    • allowedCidrs
    • deniedDomains
    • deniedCidrs
    • allowInvalidTls
  • Added super-admin APIs for reading, updating, and validating the alert target policy.

  • Applied outbound policy validation during target creation and update.

  • Revalidated outbound policy during alert delivery because stored targets and DNS answers can change after creation.

  • Blocked private, loopback, link-local, multicast, reserved, and other non-public destinations by default.

  • Allowed private/internal destinations only when explicitly enabled by admin policy and matched by an allowlist.

  • Made denied CIDRs/domains take precedence over allowlists.

  • Disabled redirects and proxy use for alert target delivery.

  • Pinned resolved DNS addresses into reqwest before sending.

  • Added focused unit tests for the important policy decision paths.


This PR has:

  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • been tested to ensure log ingestion and log query works.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • New Features
    • Introduced tenant-aware outbound HTTP policy controls for alert targets, including allow/deny rules for domains and CIDRs, private-IP gating, optional invalid-TLS handling, and stricter Slack HTTPS/host enforcement.
    • Added admin endpoints to view, validate, and activate alert target policies.
    • Added policy pre-validation on create/update and re-checks right before delivery, with per-call enforcement.
  • Bug Fixes
    • Outbound-policy denials now return consistent JSON error responses (400) with an admin hint.
  • Chores
    • Improved storage handling for policy data, including skipping legacy default-tenant stream directories.

@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR adds tenant-aware outbound policy storage and validation for alert targets, exposes admin endpoints to read and update the policy, and applies the policy during target create/update and alert delivery. It also adds supporting storage paths, metastore APIs, and routing.

Changes

Outbound Alert Policy System

Layer / File(s) Summary
Build and storage wiring
Cargo.toml, src/lib.rs, src/alerts/mod.rs, src/handlers/http/mod.rs, src/storage/object_storage.rs, src/storage/localfs.rs
tokio gains net, the new alert policy and HTTP handler modules are exported, and storage helpers now include the outbound policy path plus DEFAULT_TENANT directory filtering.
Policy config and metastore
src/alerts/outbound_http_policy.rs, src/metastore/metastore_traits.rs, src/metastore/metastores/object_store_metastore.rs, src/handlers/http/modal/mod.rs
Defines tenant policy config and lifecycle APIs, extends the metastore trait and object-store implementation to read/write policies, and loads all policies during server init.
Validation engine and request preparation
src/alerts/outbound_http_policy.rs, src/alerts/target.rs
Adds outbound target preparation, URL/TLS/DNS/IP/header checks, prepared client construction, and the target-side preflight and delivery updates that consume those results.
Error mapping and admin policy handlers
src/alerts/mod.rs, src/handlers/http/alert_target_policy.rs, src/handlers/http/modal/server.rs, src/handlers/http/modal/query_server.rs
AlertError gains outbound-policy handling, and the admin policy routes are mounted for get, put, validate, and SuperAdmin access.
Target delivery and CRUD enforcement
src/alerts/target.rs, src/handlers/http/targets.rs
Target create/update paths now validate outbound policy, and Slack, webhook, and AlertManager delivery paths re-check policy immediately before sending.

Estimated code review effort: 4 (Complex) | ~50 minutes

Poem

🐰 I hopped through rules from root to wire,
With tenant paths and policy fire.
If hosts look shady, hop away,
If all is well, the alerts may stay.
Hoppity-hop, the gate holds tight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and matches the main change: hardening alert-target delivery against SSRF.
Description check ✅ Passed The description follows the template and covers the goal, rationale, key changes, and checklist items.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
Cargo.toml (1)

84-91: 💤 Low value

ipnet addition not reflected in summary; otherwise dependency changes look correct.

The net feature on tokio ("^1.43") is valid and required for DNS resolution/socket operations in the policy engine. However, line 91 also adds ipnet = "2" (needed for CIDR parsing in the allow/deny policy), which contradicts the summary's claim that no other dependencies were altered.

Note that ipnet is placed under the "Async and Runtime" section though it's a networking/CIDR utility — consider moving it nearer url/networking deps for clarity, but this is purely cosmetic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Cargo.toml` around lines 84 - 91, Update the PR summary to explicitly mention
the new ipnet = "2" dependency (used for CIDR parsing in allow/deny policy) in
addition to the tokio feature change; in the Cargo.toml diff, ensure the ipnet
addition is clearly associated with networking/url deps rather than the "Async
and Runtime" block by moving the ipnet = "2" line next to other networking
dependencies (the tokio features block and url dependency) so reviewers can
easily see it's a networking/CIDR utility; keep the existing tokio feature
changes (sync, macros, fs, rt-multi-thread, net) unchanged.
src/alerts/outbound_http_policy.rs (1)

167-169: 💤 Low value

Consider propagating client build errors instead of panicking.

While the ClientBuilder configuration is deterministic and unlikely to fail in practice, using .expect() in production code paths is less defensive than propagating the error. If future changes introduce fallible configuration (e.g., loading certificates), this would need updating.

🛡️ Suggested fix

Add a new error variant and propagate:

+    #[error("failed to build HTTP client: {0}")]
+    ClientBuildFailed(String),
     let client = builder
         .build()
-        .expect("alert target HTTP client can be constructed");
+        .map_err(|e| OutboundPolicyError::ClientBuildFailed(e.to_string()))?;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/alerts/outbound_http_policy.rs` around lines 167 - 169, Replace the
panic-causing .expect("alert target HTTP client can be constructed") on
builder.build() with proper error propagation: add a new error variant (e.g.,
OutboundHttpClientBuildError or ClientBuildError) to the module's error enum,
change the containing function's return type to Result<..., ErrorType>, and
replace builder.build().expect(...) with builder.build().map_err(|e|
ErrorType::OutboundHttpClientBuildError(e.into()))? (or equivalent) so the
client construction failure is returned to callers; update any call sites and
tests to handle the propagated Result accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/alerts/outbound_http_policy.rs`:
- Around line 306-323: In denied_ipv6, add explicit checks to reject IPv6
tunneling prefixes 6to4 (2002::/16) and Teredo (2001::/32) before falling back
to mapped-IPv4 handling: inspect ip.segments()[0] and return true when it equals
0x2002 (6to4) or equals 0x2001 (Teredo /32), then continue with the existing
mapped-IPv4 call to denied_ipv4 and other checks; this ensures embedded IPv4
addresses in those tunneling ranges cannot bypass the private-IP blocking.

---

Nitpick comments:
In `@Cargo.toml`:
- Around line 84-91: Update the PR summary to explicitly mention the new ipnet =
"2" dependency (used for CIDR parsing in allow/deny policy) in addition to the
tokio feature change; in the Cargo.toml diff, ensure the ipnet addition is
clearly associated with networking/url deps rather than the "Async and Runtime"
block by moving the ipnet = "2" line next to other networking dependencies (the
tokio features block and url dependency) so reviewers can easily see it's a
networking/CIDR utility; keep the existing tokio feature changes (sync, macros,
fs, rt-multi-thread, net) unchanged.

In `@src/alerts/outbound_http_policy.rs`:
- Around line 167-169: Replace the panic-causing .expect("alert target HTTP
client can be constructed") on builder.build() with proper error propagation:
add a new error variant (e.g., OutboundHttpClientBuildError or ClientBuildError)
to the module's error enum, change the containing function's return type to
Result<..., ErrorType>, and replace builder.build().expect(...) with
builder.build().map_err(|e| ErrorType::OutboundHttpClientBuildError(e.into()))?
(or equivalent) so the client construction failure is returned to callers;
update any call sites and tests to handle the propagated Result accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 307e02c4-1433-45cc-90f4-a9ee10f8c67c

📥 Commits

Reviewing files that changed from the base of the PR and between cefe210 and fa45266.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • Cargo.toml
  • src/alerts/mod.rs
  • src/alerts/outbound_http_policy.rs
  • src/alerts/target.rs
  • src/handlers/http/alert_target_policy.rs
  • src/handlers/http/mod.rs
  • src/handlers/http/modal/query_server.rs
  • src/handlers/http/modal/server.rs
  • src/handlers/http/targets.rs

Comment thread src/alerts/outbound_http_policy.rs
@ygndotgg ygndotgg force-pushed the fix/ssrf-alerts-target branch from fa45266 to b8f9628 Compare June 2, 2026 13:08
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 2, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 27, 2026
@parmesant parmesant force-pushed the fix/ssrf-alerts-target branch from a5327c8 to a955914 Compare June 29, 2026 09:20
  - replace the single global policy manager with per-tenant policy state
  - persist outbound HTTP policy through metastore and reload it on startup
  - reject policies with overlapping allow/deny domain or CIDR entries
  - follow the existing tenant-keyed design used elsewhere in the codebase

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/alerts/target.rs (1)

662-665: 🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win

Avoid logging the full AlertManager endpoint.

self.endpoint can include userinfo, query tokens, or sensitive path data. Log only a sanitized host/scheme when credentials are blocked.

Proposed fix
                 error!(
-                    "Alertmanager credentials blocked by outbound policy for destination:{}",
-                    self.endpoint
+                    "Alertmanager credentials blocked by outbound policy for destination_host:{}",
+                    self.endpoint.host_str().unwrap_or("<missing-host>")
                 );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/alerts/target.rs` around lines 662 - 665, The blocked-credentials log in
AlertmanagerTarget::send is exposing the full endpoint via self.endpoint, which
may include sensitive userinfo or tokens. Update the error! call to log only a
sanitized Alertmanager destination, keeping just the scheme and host from
AlertmanagerTarget::endpoint and excluding credentials, query strings, and
sensitive path data. Use the existing endpoint handling in src/alerts/target.rs
to derive the sanitized value before logging.
src/storage/localfs.rs (1)

504-512: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Don't filter out DEFAULT_TENANT here. DEFAULT_TENANT is a literal string constant ("DEFAULT_TENANT"), and stream-name validation does not reserve it, so a stream with that name would disappear from both list_streams and list_old_streams. If the goal is to skip a stray root folder, narrow the check to the tenant directory layout instead of matching the stream name exactly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/storage/localfs.rs` around lines 504 - 512, The list_streams filtering
currently excludes DEFAULT_TENANT by name, which can hide a valid stream. Update
list_streams in localfs.rs so the ignore_dir logic no longer matches the
DEFAULT_TENANT constant directly; instead, only skip the tenant root folder
based on the directory layout used by the storage hierarchy. Keep the rest of
the exclusion list unchanged and ensure list_old_streams follows the same rule
if it shares the same filtering logic.

Source: Learnings

🧹 Nitpick comments (1)
src/handlers/http/targets.rs (1)

34-37: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Use get_tenant_id_from_request here. This handler only needs the normalized tenant header; re-resolving tenant from auth adds an unnecessary RBAC dependency and maps those failures to CustomError.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/handlers/http/targets.rs` around lines 34 - 37, The tenant lookup in
tenant_from_request is using get_user_and_tenant_from_request, which pulls
auth-derived tenant data and introduces an unnecessary RBAC dependency. Update
tenant_from_request to use get_tenant_id_from_request instead, keep the same
Result<Option<String>, AlertError> shape, and remove the CustomError mapping
path so this handler only reads the normalized tenant header.

Source: Learnings

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/alerts/outbound_http_policy.rs`:
- Around line 37-59: The active_policy() lookup in outbound_http_policy.rs only
returns AlertTargetPolicyConfig::default() for DEFAULT_TENANT, causing missing
tenant entries to fail with PolicyNotFound. Update active_policy() so a missing
policy for any tenant falls back to AlertTargetPolicyConfig::default() instead
of erroring, while keeping the existing read-from-ALERT_TARGET_POLICY behavior
and using OutboundPolicyError::PolicyNotFound only for truly invalid cases.

In `@src/handlers/http/modal/mod.rs`:
- Around line 237-240: The outbound policy bootstrap in load_on_init uses ?
after outbound_http_policy::load_all_policies, which makes a single bad tenant
policy abort startup unlike the other loaders. Update this block to match the
existing pattern used by correlations, filters, dashboards, alerts, and targets:
call outbound_http_policy::load_all_policies, log any error with error!
including context, and continue without returning early so one corrupt policy
does not take down the whole node.

In `@src/metastore/metastores/object_store_metastore.rs`:
- Around line 1258-1270: `get_outbound_policy` currently propagates a
missing-object storage error instead of treating it as “no policy configured,”
which makes single-tenant policy loads fail unexpectedly. Update
`ObjectStoreMetastore::get_outbound_policy` to mirror the missing-file handling
used in `get_outbound_policies` by detecting the optional-missing case from
`self.storage.get_object` and returning the appropriate default/absence behavior
instead of `?`. Keep the existing path/tenant logic intact, and make sure
callers like `load_policy_for_tenant` can resolve cleanly for tenants without a
stored policy.

---

Outside diff comments:
In `@src/alerts/target.rs`:
- Around line 662-665: The blocked-credentials log in AlertmanagerTarget::send
is exposing the full endpoint via self.endpoint, which may include sensitive
userinfo or tokens. Update the error! call to log only a sanitized Alertmanager
destination, keeping just the scheme and host from AlertmanagerTarget::endpoint
and excluding credentials, query strings, and sensitive path data. Use the
existing endpoint handling in src/alerts/target.rs to derive the sanitized value
before logging.

In `@src/storage/localfs.rs`:
- Around line 504-512: The list_streams filtering currently excludes
DEFAULT_TENANT by name, which can hide a valid stream. Update list_streams in
localfs.rs so the ignore_dir logic no longer matches the DEFAULT_TENANT constant
directly; instead, only skip the tenant root folder based on the directory
layout used by the storage hierarchy. Keep the rest of the exclusion list
unchanged and ensure list_old_streams follows the same rule if it shares the
same filtering logic.

---

Nitpick comments:
In `@src/handlers/http/targets.rs`:
- Around line 34-37: The tenant lookup in tenant_from_request is using
get_user_and_tenant_from_request, which pulls auth-derived tenant data and
introduces an unnecessary RBAC dependency. Update tenant_from_request to use
get_tenant_id_from_request instead, keep the same Result<Option<String>,
AlertError> shape, and remove the CustomError mapping path so this handler only
reads the normalized tenant header.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 688f6ea9-3f32-4f7c-8748-cd98cb9a3b6e

📥 Commits

Reviewing files that changed from the base of the PR and between a955914 and 534cce1.

📒 Files selected for processing (10)
  • src/alerts/alert_traits.rs
  • src/alerts/outbound_http_policy.rs
  • src/alerts/target.rs
  • src/handlers/http/alert_target_policy.rs
  • src/handlers/http/modal/mod.rs
  • src/handlers/http/targets.rs
  • src/metastore/metastore_traits.rs
  • src/metastore/metastores/object_store_metastore.rs
  • src/storage/localfs.rs
  • src/storage/object_storage.rs

Comment on lines +37 to +59
pub static ALERT_TARGET_POLICY: Lazy<RwLock<HashMap<String, AlertTargetPolicyConfig>>> =
Lazy::new(|| RwLock::new(HashMap::new()));

// Read one policy snapshot per validation
pub async fn active_policy(
tenant_id: &Option<String>,
) -> Result<AlertTargetPolicyConfig, OutboundPolicyError> {
let tenant = tenant_id
.as_deref()
.unwrap_or(crate::parseable::DEFAULT_TENANT);
let guard = ALERT_TARGET_POLICY.read().await;
guard
.get(tenant)
.cloned()
.or_else(|| {
if tenant == crate::parseable::DEFAULT_TENANT {
Some(AlertTargetPolicyConfig::default())
} else {
None
}
})
.ok_or_else(|| OutboundPolicyError::PolicyNotFound(tenant.to_string()))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look for any tenant-provisioning code that seeds outbound policy defaults
rg -n 'replace_policy|put_outbound_policy|load_policy_for_tenant' --type=rust -C3

Repository: parseablehq/parseable

Length of output: 159


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "ALERT_TARGET_POLICY|PolicyNotFound|active_policy|load_all_policies|load_policy_for_tenant|outbound policy|alert target policy|DEFAULT_TENANT" src -C 3

Repository: parseablehq/parseable

Length of output: 50378


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "get_outbound_policy|get_outbound_policies|put_outbound_policy|outbound_policy" src/metastore src/handlers src/tenants src -g'*.rs' -C 2 | head -n 200

Repository: parseablehq/parseable

Length of output: 9306


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1190,1295p' src/metastore/metastores/object_store_metastore.rs
printf '\n---\n'
rg -n "create tenant|create_tenant|new tenant|tenant provisioning|put_outbound_policy\\(|load_all_policies\\(" src -g'*.rs' -C 2 | head -n 200

Repository: parseablehq/parseable

Length of output: 6021


Default outbound policy should not fail for tenants without an entry

active_policy() only falls back to AlertTargetPolicyConfig::default() for DEFAULT_TENANT. Every other tenant missing a stored policy returns PolicyNotFound, so alert-target create/update/delivery fails until an admin manually seeds a policy. Load a default policy for missing tenant entries as well.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/alerts/outbound_http_policy.rs` around lines 37 - 59, The active_policy()
lookup in outbound_http_policy.rs only returns
AlertTargetPolicyConfig::default() for DEFAULT_TENANT, causing missing tenant
entries to fail with PolicyNotFound. Update active_policy() so a missing policy
for any tenant falls back to AlertTargetPolicyConfig::default() instead of
erroring, while keeping the existing read-from-ALERT_TARGET_POLICY behavior and
using OutboundPolicyError::PolicyNotFound only for truly invalid cases.

Comment on lines +237 to +240
outbound_http_policy::load_all_policies(PARSEABLE.metastore.as_ref())
.await
.context("Failed to load outbound policies")?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Outbound-policy load failure aborts startup unlike every other loader here.

All sibling loaders (correlations, filters, dashboards, alerts, targets) log failures via error!() and continue so one broken resource doesn't block startup. This new call instead uses ?, so a single tenant's invalid/corrupt stored policy (e.g. load_all_policies fails closed on the first bad tenant policy) will abort load_on_init entirely, taking down the whole node rather than just that tenant's alerting.

♻️ Suggested fix to match the existing loader pattern
-    outbound_http_policy::load_all_policies(PARSEABLE.metastore.as_ref())
-        .await
-        .context("Failed to load outbound policies")?;
-
+    if let Err(err) = outbound_http_policy::load_all_policies(PARSEABLE.metastore.as_ref()).await {
+        error!("Failed to load outbound policies: {err}");
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
outbound_http_policy::load_all_policies(PARSEABLE.metastore.as_ref())
.await
.context("Failed to load outbound policies")?;
if let Err(err) = outbound_http_policy::load_all_policies(PARSEABLE.metastore.as_ref()).await {
error!("Failed to load outbound policies: {err}");
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/handlers/http/modal/mod.rs` around lines 237 - 240, The outbound policy
bootstrap in load_on_init uses ? after outbound_http_policy::load_all_policies,
which makes a single bad tenant policy abort startup unlike the other loaders.
Update this block to match the existing pattern used by correlations, filters,
dashboards, alerts, and targets: call outbound_http_policy::load_all_policies,
log any error with error! including context, and continue without returning
early so one corrupt policy does not take down the whole node.

Comment on lines +1258 to +1270
async fn get_outbound_policy(
&self,
tenant_id: &str,
) -> Result<AlertTargetPolicyConfig, MetastoreError> {
let tenant = if tenant_id == DEFAULT_TENANT {
None
} else {
Some(tenant_id.to_string())
};
let path = outbound_http_policy_json_path(&tenant);
let bytes = self.storage.get_object(&path, &tenant).await?;
Ok(serde_json::from_slice::<AlertTargetPolicyConfig>(&bytes)?)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

get_outbound_policy doesn't tolerate a missing policy file, unlike get_outbound_policies.

get_outbound_policies treats a missing policy file per-tenant as "no policy configured" (skips via is_missing_optional_dir), but get_outbound_policy propagates the raw storage error via ? for the same case. Any caller of this single-tenant accessor (e.g. load_policy_for_tenant) will hard-fail for a tenant that has never explicitly stored a policy, instead of resolving to a default. This is worth aligning with the bulk method's semantics — see related comment on active_policy in src/alerts/outbound_http_policy.rs, which similarly only falls back to a default for DEFAULT_TENANT.

🛡️ Proposed fix to fall back gracefully on missing policy
     async fn get_outbound_policy(
         &self,
         tenant_id: &str,
     ) -> Result<AlertTargetPolicyConfig, MetastoreError> {
         let tenant = if tenant_id == DEFAULT_TENANT {
             None
         } else {
             Some(tenant_id.to_string())
         };
         let path = outbound_http_policy_json_path(&tenant);
-        let bytes = self.storage.get_object(&path, &tenant).await?;
-        Ok(serde_json::from_slice::<AlertTargetPolicyConfig>(&bytes)?)
+        match self.storage.get_object(&path, &tenant).await {
+            Ok(bytes) => Ok(serde_json::from_slice::<AlertTargetPolicyConfig>(&bytes)?),
+            Err(err) if is_missing_optional_dir(&err) => Ok(AlertTargetPolicyConfig::default()),
+            Err(err) => Err(MetastoreError::ObjectStorageError(err)),
+        }
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async fn get_outbound_policy(
&self,
tenant_id: &str,
) -> Result<AlertTargetPolicyConfig, MetastoreError> {
let tenant = if tenant_id == DEFAULT_TENANT {
None
} else {
Some(tenant_id.to_string())
};
let path = outbound_http_policy_json_path(&tenant);
let bytes = self.storage.get_object(&path, &tenant).await?;
Ok(serde_json::from_slice::<AlertTargetPolicyConfig>(&bytes)?)
}
async fn get_outbound_policy(
&self,
tenant_id: &str,
) -> Result<AlertTargetPolicyConfig, MetastoreError> {
let tenant = if tenant_id == DEFAULT_TENANT {
None
} else {
Some(tenant_id.to_string())
};
let path = outbound_http_policy_json_path(&tenant);
match self.storage.get_object(&path, &tenant).await {
Ok(bytes) => Ok(serde_json::from_slice::<AlertTargetPolicyConfig>(&bytes)?),
Err(err) if is_missing_optional_dir(&err) => Ok(AlertTargetPolicyConfig::default()),
Err(err) => Err(MetastoreError::ObjectStorageError(err)),
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/metastore/metastores/object_store_metastore.rs` around lines 1258 - 1270,
`get_outbound_policy` currently propagates a missing-object storage error
instead of treating it as “no policy configured,” which makes single-tenant
policy loads fail unexpectedly. Update
`ObjectStoreMetastore::get_outbound_policy` to mirror the missing-file handling
used in `get_outbound_policies` by detecting the optional-missing case from
`self.storage.get_object` and returning the appropriate default/absence behavior
instead of `?`. Keep the existing path/tenant logic intact, and make sure
callers like `load_policy_for_tenant` can resolve cleanly for tenants without a
stored policy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant