docs: add PostgreSQL KB how-to and troubleshooting guides (MIDDLEWARE-31526) #35

Open
SuJinpei wants to merge 2 commits into master from docs/add-pg-kb-howtos-31526

SuJinpei (Contributor) commented Jun 18, 2026
Summary

Consolidates historical PostgreSQL solutions from the internal Confluence KB into the product manual (pg-docs), as part of MIDDLEWARE-31526 (KB consolidation / benchmarking against OCP). Each guide was updated for the current acid.zalan.do/v1 postgresql CR and verified live on ACP 4.2 and 4.3 (PostgreSQL 16, operator v4.3.0).

how_to (5)

  • Install the pgvector extension
  • Install the zhparser extension
  • Configure the pg_hba client-auth whitelist
  • Run PostgreSQL internal processes as root (with OCP SCC note)
  • Disable NodePort exposure (LoadBalancer/MetalLB; OCP Route note)

trouble_shooting (4)

  • Connection fails with "SSL off"
  • Disk full due to pg_wal accumulation
  • Coredump caused by huge pages
  • Repair a broken streaming replica

Verification

All procedures were re-run on live clusters in the standing ACP 4.2 and 4.3 environments. Notes: the pgvector distance operators and the extension versions were confirmed (pgvector 0.8.2, zhparser 2.3); the huge_pages parameter replaces the obsolete per-version ConfigMap-mount hack; pg_hba rules are applied via spec.patroni.pg_hba and reloaded without a restart.
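
The huge_pages change mentioned in the notes can be expressed as a JSON merge patch against the postgresql CR. The sketch below is not taken from the guides. It assumes the setting lives under spec.postgresql.parameters, that pod `$CLUSTER_NAME-0` with container `postgres` is reachable (the pattern used in the review suggestion further down), and that NAMESPACE and CLUSTER_NAME are placeholders. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder names (assumptions): replace with your namespace and cluster.
NAMESPACE="${NAMESPACE:-my-namespace}"
CLUSTER_NAME="${CLUSTER_NAME:-my-pg}"

# Print a command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build a JSON merge patch that sets huge_pages to the given value.
# Assumption: the parameter belongs under spec.postgresql.parameters.
huge_pages_patch() {
  printf '{"spec":{"postgresql":{"parameters":{"huge_pages":"%s"}}}}' "$1"
}

# Apply the patch to the postgresql CR.
run kubectl patch postgresql "$CLUSTER_NAME" -n "$NAMESPACE" \
  --type merge -p "$(huge_pages_patch off)"

# Check the effective value once the change has been rolled out.
run kubectl exec -n "$NAMESPACE" "$CLUSTER_NAME-0" -c postgres -- \
  psql -U postgres -d postgres -c "SHOW huge_pages;"
```

A merge patch is used here because it merges map keys, so other entries under parameters are left untouched.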

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Added configuration guides for PostgreSQL settings, including pg_hba.conf rules and extension installation (pgvector, zhparser)
    • Added operational procedures for managing NodePort exposure and running PostgreSQL as root user
    • Added troubleshooting guides covering SSL connection issues, huge pages, streaming replication, and WAL disk space management

…-31526)

Consolidate historical internal KB solutions into the product manual,
updated for the current acid.zalan.do/v1 postgresql CR and verified
live on ACP 4.2/4.3:

how_to: install pgvector, install zhparser, configure pg_hba whitelist,
run as root, disable NodePort exposure.
trouble_shooting: connection SSL off, pg_wal disk full, coredump from
huge pages, repair streaming replica.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented Jun 18, 2026

Warning

Review limit reached

@SuJinpei, we couldn't start this review because you've reached your PR review rate limit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8616a221-4b9f-46fa-a7b6-543f1b0c06d0

📥 Commits

Reviewing files that changed from the base of the PR and between db4dc32 and 2580206.

📒 Files selected for processing (3)
  • docs/en/how_to/configure_pg_hba_whitelist.mdx
  • docs/en/how_to/install_zhparser_extension.mdx
  • docs/en/trouble_shooting/pg_wal_disk_full.mdx

Walkthrough

Nine new MDX documentation pages are added across two directories: five how-to guides covering pg_hba.conf whitelist configuration, NodePort disabling, pgvector extension, zhparser extension, and running PostgreSQL as root; and four troubleshooting guides covering SSL-off connection failures, coredump from huge pages, broken streaming replication, and pg_wal disk-full incidents.

Changes

How-To Guides

| Layer / File(s) | Summary |
|---|---|
| pg_hba whitelist configuration (docs/en/how_to/configure_pg_hba_whitelist.mdx) | Full guide explaining Patroni manages pg_hba.conf and rules must be set via spec.patroni.pg_hba in the postgresql CR, with YAML examples, pg_hba_file_rules verification, and warnings about hostssl preference and preserving +zalandos entries. |
| Disable NodePort exposure (docs/en/how_to/disable_nodeport_exposure.mdx) | Guide to switching Services to LoadBalancer and patching allocateLoadBalancerNodePorts: false with nulled nodePort on master and replica Services, with verification commands and a port-name matching note. |
| pgvector extension installation (docs/en/how_to/install_pgvector_extension.mdx) | Guide covering availability check, CREATE EXTENSION, smoke test with distance queries, operator reference table, IVFFlat and HNSW index creation with tuning params, upgrade snippet, and version verification. |
| zhparser extension installation (docs/en/how_to/install_zhparser_extension.mdx) | Guide covering extension creation, text-search configuration and mappings, tokenization examples, custom dictionary sync, parser option configuration via ALTER SYSTEM, upgrade, and verification. |
| Run PostgreSQL as root (docs/en/how_to/run_postgresql_as_root.mdx) | Guide with security warning, prerequisites including OpenShift privileged SCC, CR fields (spiloRunAsUser, spiloRunAsGroup, privileged flags), verification commands, and revert procedure. |
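
As a rough illustration of the pgvector flow summarized above (create the extension, run a distance query, build an index, check the version), the sketch below passes a few SQL statements to psql through kubectl exec. The temporary items table, its three-dimensional vectors, and the NAMESPACE and CLUSTER_NAME placeholders are invented for illustration, and the guide's own examples may differ. It also assumes pod `$CLUSTER_NAME-0` is currently the leader. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder names (assumptions): replace with your namespace and cluster.
NAMESPACE="${NAMESPACE:-my-namespace}"
CLUSTER_NAME="${CLUSTER_NAME:-my-pg}"

# Print a command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Enable the extension and create a throwaway table (hypothetical schema).
# The TEMP table disappears when the psql session ends.
SETUP_SQL="CREATE EXTENSION IF NOT EXISTS vector;
CREATE TEMP TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');"

# Nearest neighbours by L2 distance (<->). pgvector also provides
# <=> for cosine distance and <#> for negative inner product.
QUERY_SQL="SELECT id, embedding <-> '[3,1,2]' AS l2_distance
FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;"

# HNSW index using the m / ef_construction values quoted in the review.
INDEX_SQL="CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);"

# Installed extension version.
VERSION_SQL="SELECT extversion FROM pg_extension WHERE extname = 'vector';"

# Each -c runs as a separate command in the same psql session,
# so the TEMP table is visible to the later statements.
run kubectl exec -n "$NAMESPACE" "$CLUSTER_NAME-0" -c postgres -- \
  psql -U postgres -d postgres \
  -c "$SETUP_SQL" -c "$QUERY_SQL" -c "$INDEX_SQL" -c "$VERSION_SQL"
```

Separate -c options are used rather than one long string because psql shows only the last result of a multi-statement -c command.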

Troubleshooting Guides

| Layer / File(s) | Summary |
|---|---|
| SSL-off connection failure (docs/en/trouble_shooting/connection_ssl_off.mdx) | Describes the SSL off error pattern, root cause in missing host rules in pg_hba.conf, diagnosis via pg_hba_file_rules, recommended spec.patroni.pg_hba fix, and security warning about permissive host rules. |
| Coredump from huge pages (docs/en/trouble_shooting/coredump_huge_pages.mdx) | Explains SIGBUS coredump when huge pages are enabled on the host without pod allocation, fix via huge_pages: "off" in the CR, and verification via SHOW huge_pages. |
| Broken streaming replica repair (docs/en/trouble_shooting/fix_streaming_replication.mdx) | Step-by-step guide using patronictl list, leader-side pg_stat_replication and pg_replication_slots queries, patronictl reinit --force remediation, and post-reinit verification with a base-backup impact note. |
| pg_wal disk-full recovery (docs/en/trouble_shooting/pg_wal_disk_full.mdx) | Covers WAL accumulation root cause, diagnosis via patronictl and replication lag SQL, resolution steps including temporary single-instance reduction, and a danger admonition against manual pg_wal deletion. |
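
The replication and pg_wal rows above share the same leader-side checks. A minimal sketch, assuming pod `$CLUSTER_NAME-0` is the leader (confirm with patronictl list first) and that NAMESPACE and CLUSTER_NAME are placeholders: it prints the diagnosis commands and the temporary scale-down to one instance as a merge patch. The replay-lag query is an illustrative variant, not necessarily the one in the guide. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder names (assumptions): replace with your namespace and cluster.
NAMESPACE="${NAMESPACE:-my-namespace}"
CLUSTER_NAME="${CLUSTER_NAME:-my-pg}"

# Print a command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build a JSON merge patch that sets spec.numberOfInstances.
scale_patch() {
  printf '{"spec":{"numberOfInstances":%d}}' "$1"
}

# Replay lag per standby, in bytes, as seen from the leader.
LAG_SQL="SELECT application_name, state,
  pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;"

# An inactive slot with a stale restart_lsn keeps WAL from being recycled.
SLOT_SQL="SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"

# 1. Cluster topology as Patroni sees it.
run kubectl exec -n "$NAMESPACE" "$CLUSTER_NAME-0" -c postgres -- patronictl list

# 2. Replication state and slots on the leader.
run kubectl exec -n "$NAMESPACE" "$CLUSTER_NAME-0" -c postgres -- \
  psql -U postgres -d postgres -c "$LAG_SQL" -c "$SLOT_SQL"

# 3. Temporary reduction to a single instance.
run kubectl patch postgresql "$CLUSTER_NAME" -n "$NAMESPACE" \
  --type merge -p "$(scale_patch 1)"
```

Keeping the dry run as the default lets the commands be reviewed before anything is changed on a cluster that is already short on disk.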

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐇 Hippity-hop, nine pages appear,
Each doc a carrot, crisp and clear!
pg_hba rules? No longer a riddle,
WAL disk full? Solved in the middle.
Vectors and parsers, replicas too—
This bunny's proud of every review! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding PostgreSQL KB how-to and troubleshooting guides to documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (6)
docs/en/how_to/install_zhparser_extension.mdx (2)

67-90: ⚡ Quick win

Add zhparser.extra_dicts to the configuration table if it applies.

The text references zhparser.extra_dicts as an option that must be set before the backend starts (line 88), but this option is missing from the configuration table on lines 72–80. If this option is relevant to users, add it to the table for completeness. If it's an advanced/rarely-used option outside the scope of this guide, clarify that the table covers only the most common options.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/en/how_to/install_zhparser_extension.mdx` around lines 67 - 90, The
configuration table in the "Parser configuration" section lists options from
`zhparser.punctuation_ignore` through `zhparser.multi_zall`, but the text below
the table references `zhparser.extra_dicts` as an option that must be set before
the backend starts. Either add `zhparser.extra_dicts` to the options table with
its purpose (if it's a commonly-used option users should know about), or add a
clarifying note in the text explaining that the table covers the most common
options and `zhparser.extra_dicts` is an advanced configuration option covered
separately.

98-102: 💤 Low value

Consider adding version expectations to the verification section.

The PR objectives note that zhparser v2.3 has been tested. You may want to clarify the expected version in the verification section so users can confirm they have a compatible version installed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/en/how_to/install_zhparser_extension.mdx` around lines 98 - 102, The
Verification section contains a SQL query to check the zhparser extension but
lacks clarity on what version users should expect. Add explanatory text after
the verification SQL query that specifies the expected version (v2.3 as
mentioned in the PR objectives) and clarifies what the output should look like.
This will help users confirm they have installed a compatible version by
providing concrete expectations for the query result.
docs/en/how_to/install_pgvector_extension.mdx (3)

24-36: ⚡ Quick win

Add fallback or troubleshooting guidance if pgvector is not available.

Line 32 notes that "version may differ depending on the operand release," implying that users should expect variation. However, there is no guidance on what to do if the extension is not found at all (empty result set). Consider adding a brief troubleshooting note: e.g., "If the query returns no rows, ensure the PostgreSQL Operator version is v4.3.0 or later."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/en/how_to/install_pgvector_extension.mdx` around lines 24 - 36, Add
troubleshooting guidance after the expected output section of the pgvector
extension verification step. Include a note that explains what to do if the
query returns no rows (empty result set), such as advising the user to verify
that the PostgreSQL Operator version is v4.3.0 or later. This will help users
understand what to check when the vector extension is not found on their system,
rather than leaving them without guidance for this failure case.

70-97: ⚖️ Poor tradeoff

Add guidance on parameter tuning and trade-offs for production use.

The indexing sections provide example configurations but lack context on when and how to tune them:

  • IVFFlat (lines 72–82): The heuristic rows / 1000 is a reasonable starting point, but there is no guidance on post-deployment tuning. Should users run benchmarks to adjust lists or ivfflat.probes?
  • HNSW (lines 84–94): The claim "slower build time and higher memory usage than IVFFlat" is accurate, but concrete resource implications (e.g., CPU/memory during index creation) are not provided. What does "slower" mean—minutes, hours? How much additional memory is typical?
  • Parameter defaults (m=16, ef_construction=64): While these are reasonable pgvector defaults, there is no guidance on adjusting them for workloads with different cardinality, dimensionality, or latency requirements.

Consider adding a brief tuning section or link to pgvector documentation on parameter selection for different workload profiles.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/en/how_to/install_pgvector_extension.mdx` around lines 70 - 97, The
documentation sections for IVFFlat and HNSW index creation lack practical
guidance on parameter tuning for production use. Add content that explains when
and how to tune the `lists` and `ivfflat.probes` parameters for IVFFlat
(including guidance on benchmarking and post-deployment adjustments), quantifies
the performance and memory trade-offs mentioned for HNSW (e.g., typical build
time ranges and memory overhead compared to IVFFlat), and provides clear
guidance on adjusting the HNSW parameters (m and ef_construction) based on
workload characteristics like cardinality, dimensionality, and latency
requirements. Consider including a dedicated tuning section or linking to
relevant pgvector documentation that explains parameter selection strategies for
different production scenarios.

40-103: ⚡ Quick win

Clarify command execution context: kubectl exec vs direct psql.

The verification command (lines 27–30) uses kubectl exec to run psql within the pod, but the SQL examples (lines 40–103) for CREATE EXTENSION, smoke test, indexing, and verification do not show the execution context. Users may be unsure whether to:

  • Run these SQL commands directly via the kubectl exec wrapper shown above, or
  • Use a local psql client with appropriate connection flags.

For consistency, either:

  1. Show all SQL examples wrapped in kubectl exec (or reference the pattern from step 1), or
  2. Clarify that the SQL commands can be executed via any psql session (local, kubectl exec, or application client) connected to the database.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/en/how_to/install_pgvector_extension.mdx` around lines 40 - 103, The SQL
examples in the "Smoke test", "IVFFlat", "HNSW", and "Upgrading the extension"
sections lack clarity about their execution context, creating ambiguity about
whether users should run them via kubectl exec (as shown in the earlier
verification step) or via a local psql client. Either wrap all SQL code blocks
with the kubectl exec pattern that was introduced earlier in the document, or
add a clarifying statement above the SQL examples explaining that these commands
can be executed via any psql session connected to the database (local, kubectl
exec, or application client). This will provide consistent guidance throughout
the documentation.
docs/en/trouble_shooting/fix_streaming_replication.mdx (1)

27-34: ⚡ Quick win

Add explicit SQL execution context.

The SQL queries in the "Check replication state on the leader" section lack an explicit command for executing them (e.g., via psql in a kubectl exec wrapper). For operators unfamiliar with the Patroni/PostgreSQL troubleshooting workflow, this creates ambiguity about how to run the queries.

Suggested enhancement to show SQL execution:

````diff
 ### 2. Check replication state on the leader
 
+```bash
+kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- psql -U postgres -d postgres -c "
 ```sql
 -- On the leader: a healthy standby appears here in state 'streaming'
 SELECT application_name, state, sent_lsn, replay_lsn, sync_state
@@ -32,6 +33,9 @@ SELECT application_name, state, sent_lsn, replay_lsn, sync_state
 -- An inactive slot / stale restart_lsn indicates a stuck standby
 SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
 
+```
````

Alternatively, restructure to explicitly instruct running the SQL via `psql` on the leader container.

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @docs/en/trouble_shooting/fix_streaming_replication.mdx around lines 27 - 34,
The SQL queries in the replication state checking section (the SELECT queries
from pg_stat_replication and pg_replication_slots) lack explicit execution
instructions, creating ambiguity for users unfamiliar with the workflow. Wrap
these SQL queries with a clear execution command showing how to run them, either
by adding a kubectl exec wrapper around the psql invocation that demonstrates
connecting to the PostgreSQL container on the leader, or by adding explicit
step-by-step instructions immediately before the SQL block that explain how to
execute these queries via psql on the leader container. Ensure the execution
method is clearly visible and easy to follow.


</details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @docs/en/how_to/configure_pg_hba_whitelist.mdx:

  • Around line 48-49: The pg_hba configuration example at lines 48-49 with the
    catch-all rules for host all all 0.0.0.0/0 md5 and host all all ::0/0 md5
    presents highly permissive and insecure authentication settings without any
    security warning, creating a disconnect with the security warning already
    present in the SSL-off troubleshooting guide. Add an inline comment in the YAML
    example explaining that these permissive rules allow unencrypted password
    authentication from any network address and should only be used in development
    or controlled environments, or consider moving these specific lines to a
    separate clearly-marked example with a warning admonition, or omit them from the
    default example and add a reference directing readers to the troubleshooting
    guide when non-SSL access is required.

In @docs/en/trouble_shooting/pg_wal_disk_full.mdx:

  • Around line 51-54: The documentation snippet shows a kubectl get command
    followed by a comment about setting spec.numberOfInstances, which lacks clarity
    on the actual editing procedure. Replace the get command and comment with
    concrete editing instructions that demonstrate how to actually modify the
    PostgreSQL resource. Include a choice of methods such as using kubectl edit to
    open the resource in an editor and then modify the spec.numberOfInstances field,
    or alternatively show how to use kubectl patch to apply the change directly with
    the updated numberOfInstances value set to 1. Update the comment to clearly
    indicate that the edit happens within the editor rather than being a separate
    step.

</details>

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Organization UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `95764454-28c2-438a-a82e-42df0c44b031`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between d596803b1ab9902c7a6e85938ba110caf6a75e99 and db4dc3239ea33f222ce3706980ef60d0a652b373.

</details>

<details>
<summary>📒 Files selected for processing (9)</summary>

* `docs/en/how_to/configure_pg_hba_whitelist.mdx`
* `docs/en/how_to/disable_nodeport_exposure.mdx`
* `docs/en/how_to/install_pgvector_extension.mdx`
* `docs/en/how_to/install_zhparser_extension.mdx`
* `docs/en/how_to/run_postgresql_as_root.mdx`
* `docs/en/trouble_shooting/connection_ssl_off.mdx`
* `docs/en/trouble_shooting/coredump_huge_pages.mdx`
* `docs/en/trouble_shooting/fix_streaming_replication.mdx`
* `docs/en/trouble_shooting/pg_wal_disk_full.mdx`

</details>

</details>

Comment thread docs/en/how_to/configure_pg_hba_whitelist.mdx
Comment thread docs/en/trouble_shooting/pg_wal_disk_full.mdx
cloudflare-workers-and-pages Bot commented Jun 18, 2026

Deploying alauda-postgresql with Cloudflare Pages

Latest commit: 2580206
Status: ✅  Deploy successful!
Preview URL: https://bef1c9e6.alauda-postgresql.pages.dev
Branch Preview URL: https://docs-add-pg-kb-howtos-31526.alauda-postgresql.pages.dev

- pg_hba whitelist: warn about permissive catch-all 0.0.0.0/0 / ::0/0 rules
- pg_wal disk full: show concrete kubectl patch instead of a get + comment
- zhparser: add zhparser.extra_dicts to the options table

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>