Skip to content

Add GlobalSecondaryIndexDefinition for GSI container support#47468

Open
mbhaskar wants to merge 3 commits into
Azure:mainfrom
mbhaskar:mbhaskar/implement-gsi-containers
Open

Add GlobalSecondaryIndexDefinition for GSI container support#47468
mbhaskar wants to merge 3 commits into
Azure:mainfrom
mbhaskar:mbhaskar/implement-gsi-containers

Conversation

@mbhaskar

Copy link
Copy Markdown
Member

Summary

Implements Global Secondary Index (GSI) container support in the Python Cosmos SDK, mirroring the Java SDK implementation (Azure/azure-sdk-for-java#48480).

A GSI container is a derived container built from a source container using a SQL projection query. The service already supports this; this PR adds the client-side support.

Changes

New files

  • azure/cosmos/_global_secondary_index.pyGlobalSecondaryIndexDefinition class with constructor validation, serialization (_to_dict/_from_dict), and read-only server-populated properties (source_container_rid, status)
  • tests/test_global_secondary_index.py — 20 unit tests covering constructor validation, serialization, deserialization, forward-compatibility
  • tests/test_global_secondary_index_live.py — Live integration tests using dedicated GSI test account

Modified files

  • azure/cosmos/__init__.py — Export GlobalSecondaryIndexDefinition
  • azure/cosmos/database.py — Added global_secondary_index_definition kwarg to create_container, create_container_if_not_exists, and replace_container (all overloads) with dual-write pattern
  • azure/cosmos/aio/_database.py — Async mirror of all sync changes
  • CHANGELOG.md — Feature entry
  • pytest.ini — Added cosmosGSI marker for live tests
  • live-platform-matrix.json — Added GSITestConfig matrix entry
  • tests.yml — Mapped gsi-pipeline-uri/gsi-pipeline-key KV secrets to env vars

Design Decisions

Dual-Write Pattern

For backward compatibility with older service versions, the SDK writes the GSI definition under both:

  • "globalSecondaryIndexDefinition" (new key)
  • "materializedViewDefinition" (legacy key)

On read, the new key is preferred with fallback to the legacy key.

Status as String Constants (not enum)

Status values ("Initializing", "InitialBuildAfterCreate", "InitialBuildAfterRestore", "Active", "DeleteInProgress") are kept as plain strings for forward-compatibility with future service-defined values.

Pattern Consistency

Follows the same integration pattern as computed_properties, vector_embedding_policy, full_text_policy, and change_feed_policy — kwarg extraction via kwargs.pop(), added to definition dict if not None.

Testing

  • All 20 unit tests pass
  • Live tests configured to use gsi-pipeline-uri/gsi-pipeline-key from KV (same secrets used by Java SDK pipeline)

Copilot AI review requested due to automatic review settings June 12, 2026 00:19
@mbhaskar mbhaskar requested a review from a team as a code owner June 12, 2026 00:19
mbhaskar and others added 2 commits June 11, 2026 17:21
Implement client-side support for Global Secondary Index (GSI) containers
in the Python Cosmos SDK, mirroring the Java SDK implementation.

Changes:
- Add GlobalSecondaryIndexDefinition class with serialization/deserialization
- Add global_secondary_index_definition keyword to create_container,
  create_container_if_not_exists, and replace_container (sync and async)
- Implement dual-write pattern: writes to both globalSecondaryIndexDefinition
  and materializedViewDefinition keys for backward compatibility
- Add comprehensive unit tests (20 tests)
- Update CHANGELOG

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add 'cosmosGSI' pytest mark to pytest.ini
- Add GSITestConfig matrix entry in live-platform-matrix.json
- Add GSI_ACCOUNT_HOST/GSI_ACCOUNT_KEY env var mappings from Key Vault
  secrets (gsi-pipeline-uri, gsi-pipeline-key) in tests.yml
- Create test_global_secondary_index_live.py with live tests for:
  - Creating GSI container with GlobalSecondaryIndexDefinition class
  - Creating GSI container with raw dict
  - create_container_if_not_exists with GSI definition

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mbhaskar mbhaskar force-pushed the mbhaskar/implement-gsi-containers branch from fef6964 to 713f56a Compare June 12, 2026 00:21

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds client-side support for Cosmos DB Global Secondary Index (GSI) containers in azure-cosmos, including a new GlobalSecondaryIndexDefinition model, plumbing through container create/replace APIs (sync + async) with a dual-write wire format, and CI/live-test wiring for a dedicated GSI-enabled account.

Changes:

  • Added GlobalSecondaryIndexDefinition (validation + (de)serialization) and exported it from azure.cosmos.
  • Added global_secondary_index_definition kwarg support to create_container, create_container_if_not_exists, and replace_container for both sync and async database clients (writing both globalSecondaryIndexDefinition and materializedViewDefinition).
  • Added unit + live tests plus pipeline/matrix/marker updates to run GSI live coverage.
Show a summary per file
File Description
sdk/cosmos/tests.yml Injects GSI account secrets as env vars for pipeline live runs.
sdk/cosmos/live-platform-matrix.json Adds a dedicated matrix entry to run cosmosGSI marked live tests.
sdk/cosmos/azure-cosmos/tests/test_global_secondary_index.py Unit tests for constructor validation and dict round-tripping.
sdk/cosmos/azure-cosmos/tests/test_global_secondary_index_live.py Live tests that exercise creating GSI containers (class + raw dict).
sdk/cosmos/azure-cosmos/pytest.ini Registers the new cosmosGSI pytest marker.
sdk/cosmos/azure-cosmos/CHANGELOG.md Documents the new GSI feature in the Unreleased section.
sdk/cosmos/azure-cosmos/azure/cosmos/database.py Adds sync API kwarg plumbing + dual-write of GSI definition in create/replace.
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_database.py Async mirror of sync GSI kwarg plumbing + dual-write behavior.
sdk/cosmos/azure-cosmos/azure/cosmos/_global_secondary_index.py Introduces GlobalSecondaryIndexDefinition model implementation.
sdk/cosmos/azure-cosmos/azure/cosmos/init.py Exports GlobalSecondaryIndexDefinition from the public package namespace.

Copilot's findings

Comments suppressed due to low confidence (1)

sdk/cosmos/azure-cosmos/CHANGELOG.md:11

  • The new changelog entry under "Features Added" is missing the usual reference/link to the PR (most other bullets in this section end with "See PR ####" or similar). Keeping the same format makes it easier to trace changes back to discussions and reviews.

#### Bugs Fixed

#### Other Changes
  • Files reviewed: 10/10 changed files
  • Comments generated: 1

Comment on lines 114 to 120
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dibahlfi

Copy link
Copy Markdown
Member

@sdkReviewAgent-2

@dibahlfi

Copy link
Copy Markdown
Member

/azp run python - cosmos - tests

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

analytical_storage_ttl: Optional[int] = None,
computed_properties: Optional[list[dict[str, str]]] = None,
full_text_policy: Optional[dict[str, Any]] = None,
global_secondary_index_definition: Optional[Any] = None,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking — Sync/Async Parity: Missing parameter in second replace_container overload

The replace_container overload with return_properties: Literal[True] (line 976–993) is missing global_secondary_index_definition in its signature:

# Line 989-990 — gap between full_text_policy and return_properties
full_text_policy: Optional[dict[str, Any]] = None,
return_properties: Literal[True],  # <-- global_secondary_index_definition should be before this

The first sync overload (line 912, return_properties: Literal[False]) has it at line 926. The async second overload (aio/_database.py line 911–928) also has it at line 924. And this overload's own docstring (line 1018) documents the parameter.

At runtime this works because **kwargs catches it, but the type signature is wrong — mypy/pyright users calling db.replace_container(container, pk, global_secondary_index_definition=gsi_def, return_properties=True) will get a type error from this overload.

Fix: Add global_secondary_index_definition: Optional[Any] = None, between full_text_policy and return_properties on line 989–990, matching the first overload and the async mirror.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

* Added `get_response_headers()` and `get_last_response_headers()` methods to the `CosmosItemPaged` and `CosmosAsyncItemPaged` objects returned by `query_items()`, allowing access to response headers from query operations. See [PR 44593](https://github.com/Azure/azure-sdk-for-python/pull/44593)
* Added InferenceRequestTimeout property for HttpTimeout Policy to Reranking API. See [45469](https://github.com/Azure/azure-sdk-for-python/pull/45469)
* Added `full_text_score_scope` parameter to `query_items()` for controlling BM25 statistics scope in hybrid search queries. Supports "Local" and "Global" (default) scopes. See [45686](https://github.com/Azure/azure-sdk-for-python/pull/45686)
* Added `GlobalSecondaryIndexDefinition` class and `global_secondary_index_definition` keyword to `create_container`, `create_container_if_not_exists`, and `replace_container` methods for creating Global Secondary Index (GSI) containers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking — Correctness: CHANGELOG entry in wrong version section

This line is under the 4.16.0b1 (2026-03-21) heading (line 54), which is an already-released version. The entry should be under the 4.16.2 (Unreleased) section at the top of the file (line 3).

Shipping a changelog entry in an already-released version means the feature won't appear in the next release's notes, and it retroactively modifies a published version's history.

Fix: Move this line under ### 4.16.2 (Unreleased) > #### Features Added (around line 5).

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

gsi_dict = (global_secondary_index_definition._to_dict()
if hasattr(global_secondary_index_definition, '_to_dict')
else global_secondary_index_definition)
definition["globalSecondaryIndexDefinition"] = gsi_dict

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Testing: Dual-write pattern has no test coverage

The dual-write to both globalSecondaryIndexDefinition and materializedViewDefinition is the key backward-compatibility contract of this PR (matching Java SDK behavior), but no test verifies that both keys are present in the serialized output.

The live tests only assert properties["globalSecondaryIndexDefinition"] is present — they never check materializedViewDefinition. If someone accidentally removes line 462, backward compatibility with older service versions silently breaks with no test failure.

Suggestion: Add a unit test that mocks client_connection.CreateContainer and asserts the collection dict passed to it contains both keys with identical values. Alternatively, extract the dict-building logic and test it directly. This is the single most important behavioral contract that lacks test proof.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

self.test_db.delete_container(source_container.id)


if __name__ == "__main__":

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Testing: No test coverage for replace_container with GSI

The PR adds global_secondary_index_definition support to replace_container (both sync and async), but no test — unit or live — exercises this code path. The replace_container implementation builds the parameters dict differently from create_container (dict comprehension vs sequential if blocks), so a copy-paste error or refactoring mistake would go undetected.

Additionally, consider adding at least one async live test. The async mirror in aio/_database.py has its own kwargs.pop(), dict-building, and dual-write logic but has zero test coverage. Existing features like computed_properties and full_text_policy have paired sync/async test files.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

return result

@classmethod
def _from_dict(cls, data: Optional[dict]) -> Optional["GlobalSecondaryIndexDefinition"]:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Cross-SDK: _from_dict is unused in production; read-path lacks legacy key fallback

_from_dict is tested (6 unit tests) but never called from any production code path. When users call container.read(), the response returns raw dicts — the GSI definition is never hydrated into a GlobalSecondaryIndexDefinition instance. This is consistent with how other features work in this SDK, but creates a gap vs. the Java SDK:

The Java SDK's DocumentCollection.getGlobalSecondaryIndexDefinition() implements a read-path key fallback — it checks globalSecondaryIndexDefinition first, then falls back to materializedViewDefinition. If the Python SDK talks to a service version that only returns the legacy key, users accessing properties["globalSecondaryIndexDefinition"] will get KeyError.

The live test at line 66 (self.assertIn("globalSecondaryIndexDefinition", properties)) only passes because the test account returns the new key.

Suggestions:

  1. If _from_dict is intended as a public convenience, consider making it a public method (drop the underscore) and documenting its usage.
  2. Consider adding a note in the class docstring that users may need to check both globalSecondaryIndexDefinition and materializedViewDefinition keys when reading container properties during the service transition period.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

@xinlian12

Copy link
Copy Markdown
Member

Review complete (38:22)

Posted 5 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

gsi_dict = (global_secondary_index_definition._to_dict()
if hasattr(global_secondary_index_definition, '_to_dict')
else global_secondary_index_definition)
definition["globalSecondaryIndexDefinition"] = gsi_dict

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Cross-SDK: No Read-Path Fallback for Legacy Key

The dual-write here correctly sends the GSI definition under both globalSecondaryIndexDefinition (new) and materializedViewDefinition (legacy). But on the read path, there's no equivalent normalization.

If an older service version returns only materializedViewDefinition in the container properties, callers who look for properties["globalSecondaryIndexDefinition"] — as the live tests do — will get a KeyError.

The Java SDK handles this explicitly in DocumentCollection.getGlobalSecondaryIndexDefinition() (lines 475–489): it checks the new key first, then falls back to the legacy key.

Since the Python SDK returns raw dicts from container.read() (consistent with how computed_properties, vector_embedding_policy, etc. work), a full read-path normalization may be out of scope. But at minimum, consider:

  1. Documenting that users should check both keys on older service versions, or
  2. Adding a helper in ContainerProxy.read() that normalizes materializedViewDefinitionglobalSecondaryIndexDefinition in the response.

Also note: the _from_dict method exists and is unit-tested, but is never called from any production code path. If it's intended as a user-facing convenience, it should be documented. If it's for future SDK-internal use, the PR description's claim about read-path behavior ("On read, the new key is preferred with fallback to the legacy key") should be corrected, since that behavior is not implemented.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

vector_embedding_policy: Optional[dict[str, Any]] = None,
change_feed_policy: Optional[dict[str, Any]] = None,
full_text_policy: Optional[dict[str, Any]] = None,
global_secondary_index_definition: Optional[Any] = None,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Optional[Any] Type Hint Weaker Than Siblings

Every sibling container-definition parameter uses a specific type:

  • computed_properties: Optional[list[dict[str, str]]]
  • vector_embedding_policy: Optional[dict[str, Any]]
  • full_text_policy: Optional[dict[str, Any]]

And the codebase already has a Union example for dual-type params:

  • offer_throughput: Optional[Union[int, ThroughputProperties]]

But GSI uses Optional[Any], which gives type checkers and IDEs nothing to work with. The docstrings already state the correct type (~azure.cosmos.GlobalSecondaryIndexDefinition or dict[str, Any]), so the signature should match:

global_secondary_index_definition: Optional[Union["GlobalSecondaryIndexDefinition", dict[str, Any]]] = None,

This applies to all 9 overload/implementation signatures across database.py and aio/_database.py.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

* Added `get_response_headers()` and `get_last_response_headers()` methods to the `CosmosItemPaged` and `CosmosAsyncItemPaged` objects returned by `query_items()`, allowing access to response headers from query operations. See [PR 44593](https://github.com/Azure/azure-sdk-for-python/pull/44593)
* Added InferenceRequestTimeout property for HttpTimeout Policy to Reranking API. See [45469](https://github.com/Azure/azure-sdk-for-python/pull/45469)
* Added `full_text_score_scope` parameter to `query_items()` for controlling BM25 statistics scope in hybrid search queries. Supports "Local" and "Global" (default) scopes. See [45686](https://github.com/Azure/azure-sdk-for-python/pull/45686)
* Added `GlobalSecondaryIndexDefinition` class and `global_secondary_index_definition` keyword to `create_container`, `create_container_if_not_exists`, and `replace_container` methods for creating Global Secondary Index (GSI) containers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking — CHANGELOG Entry Under Wrong Version

This feature entry is placed under ### 4.16.0b1 (2026-03-21) — a version that was already released ~3 months ago. It should be under ### 4.16.2 (Unreleased) > #### Features Added (currently empty at lines 3–6).

Leaving it here would make the release history claim that GSI support shipped in 4.16.0b1, which is incorrect. Automated changelog consumers and customers checking version compatibility will be misled.

Fix: Move this line to under ### 4.16.2 (Unreleased) > #### Features Added.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

gsi_dict = (global_secondary_index_definition._to_dict()
if hasattr(global_secondary_index_definition, '_to_dict')
else global_secondary_index_definition)
parameters["globalSecondaryIndexDefinition"] = gsi_dict

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Suggestion — No Test Coverage for Dual-Write Behavior

The dual-write to both globalSecondaryIndexDefinition and materializedViewDefinition is the central backward-compatibility design decision of this PR, but no test verifies it:

  • The live tests only assert "globalSecondaryIndexDefinition" in properties on read-back — they never check that materializedViewDefinition was also sent.
  • The unit tests only cover GlobalSecondaryIndexDefinition in isolation — none exercise the database.py wiring.

If someone refactors this code and accidentally drops the materializedViewDefinition write, no test catches it — yet older service versions would silently fail to recognize the GSI definition.

Consider adding a unit test that mocks client_connection.CreateContainer, calls database.create_container(global_secondary_index_definition=...), and asserts the definition dict contains both keys with identical values.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:keyword global_secondary_index_definition: The global secondary index definition for the container.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking — Missing Overload Parameter

The sync replace_container overload with return_properties: Literal[True] (lines 976–993) is missing global_secondary_index_definition from its signature, while:

  • Sync overload 1 (return_properties=False, line 926) includes it ✅
  • Async overload 1 (line 862) includes it ✅
  • Async overload 2 (line 924) includes it ✅
  • Sync overload 2 (line 990) — missing

The docstring here correctly documents the parameter, but the @overload signature above it doesn't declare it. This means type checkers will reject:

db.replace_container(container, pk, global_secondary_index_definition=gsi, return_properties=True)

even though the implementation (kwargs.pop) handles it correctly at runtime.

Fix: Add global_secondary_index_definition: Optional[Any] = None, between full_text_policy and return_properties in the overload at line 989–990.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.


import test_config
from azure.cosmos import CosmosClient, PartitionKey
from azure.cosmos._global_secondary_index import GlobalSecondaryIndexDefinition

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Suggestion — Live Tests Import from Private Module

Both test_global_secondary_index.py and this file import from the private path azure.cosmos._global_secondary_index instead of the public API azure.cosmos:

# Current (private import)
from azure.cosmos._global_secondary_index import GlobalSecondaryIndexDefinition

# Preferred (public import)
from azure.cosmos import GlobalSecondaryIndexDefinition

The unit test at line 196 (test_import_from_azure_cosmos) verifies the public export works — but the live test itself doesn't use it. If the public re-export were accidentally dropped from __init__.py, the unit test would catch it, but these live tests would still pass, masking the breakage for real users.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

@xinlian12

Copy link
Copy Markdown
Member

Review complete (38:58)

Posted 6 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

gsi_dict = (global_secondary_index_definition._to_dict()
if hasattr(global_secondary_index_definition, '_to_dict')
else global_secondary_index_definition)
definition["globalSecondaryIndexDefinition"] = gsi_dict

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Cross-SDK: Missing Source Container RID Resolution

The Java SDK's CosmosAsyncDatabase.createContainerInternal() performs a pre-create step: it reads the source container to resolve its _rid, injects it as sourceCollectionRid into the GSI definition, and then sends the create request. This Python implementation does not do that — it sends the GSI definition as-is, with sourceCollectionRid absent.

// Java SDK — CosmosAsyncDatabase.java
ridResolution = this.getContainer(sourceContainerId).read()
    .flatMap(resp -> {
        String rid = resp.getProperties().getResourceId();
        setSourceCollectionRid(gsiDefinition, rid);
        snapshot.setGlobalSecondaryIndexDefinition(gsiDefinition);
        return Mono.empty();
    });

Why this matters:

  1. If the service requires sourceCollectionRid in the request body, GSI creation will fail.
  2. If a user specifies a non-existent source_container_id, Java gives a clear 404 during RID resolution; Python delegates to the service, which may return a less clear error.

Suggested action: Confirm via live tests whether the service accepts requests without sourceCollectionRid. If it does, this is acceptable (the service handles resolution). If not, add a pre-create read of the source container to resolve and inject the RID, similar to Java. At minimum, document this behavioral difference.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

analytical_storage_ttl: Optional[int] = None,
computed_properties: Optional[list[dict[str, str]]] = None,
full_text_policy: Optional[dict[str, Any]] = None,
global_secondary_index_definition: Optional[Any] = None,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking — Missing Overload Parameter

The sync replace_container overload with return_properties: Literal[True] (starting at line 976) is missing global_secondary_index_definition in its signature. Compare with the Literal[False] overload here at line 926 which correctly includes it, and both async overloads which also have it.

The parameter jumps from full_text_policy directly to return_properties: Literal[True] at line 990:

full_text_policy: Optional[dict[str, Any]] = None,
return_properties: Literal[True],  # ← missing global_secondary_index_definition before this

Why this matters: Type checkers (mypy/pyright) will reject db.replace_container(..., global_secondary_index_definition=gsi, return_properties=True) as invalid, even though it works at runtime. IDE autocomplete also won't suggest the parameter for this overload.

Fix: Add global_secondary_index_definition: Optional[Any] = None, between full_text_policy and return_properties in the second replace_container overload (line ~990), matching the pattern here.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:keyword global_secondary_index_definition: The global secondary index definition for the container.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Async Overload Docstrings Missing GSI Keyword

The global_secondary_index_definition parameter was added to all 6 async overload signatures (2 for create_container, 2 for create_container_if_not_exists, 2 for replace_container), but the :keyword global_secondary_index_definition: docstring entry was only added to the 3 implementation method docstrings — not to any of the 6 overload docstrings.

Compare with the sync database.py, where ALL 9 docstrings (6 overloads + 3 implementations) correctly include the keyword documentation. Also compare with existing keywords like full_text_policy and change_feed_policy, which ARE documented in the async overload docstrings.

Why this matters: Users relying on IDE hover docs will see the parameter in the signature but no documentation for it, inconsistently across sync vs. async.

Fix: Add the :keyword and :paramtype entries to all 6 async overload docstrings, matching the sync versions:

:keyword global_secondary_index_definition: The global secondary index definition for the container.
    Used to create a GSI container derived from a source container via a SQL projection query.
:paramtype global_secondary_index_definition: ~azure.cosmos.GlobalSecondaryIndexDefinition or dict[str, Any]

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

from typing import Optional


class GlobalSecondaryIndexDefinition:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Missing **provisional** Marker

The Java SDK marks all GSI public API with @Beta(warningText = PREVIEW_SUBJECT_TO_CHANGE_WARNING), signaling that the feature is subject to change. This Python class has no equivalent **provisional** marker.

Other preview features in this SDK use **provisional** in their docstrings — for example, full_text_policy is documented as :keyword dict[str, Any] full_text_policy: **provisional** The full text policy... in all method docstrings. The vector_embedding_policy's embeddingSource also uses **provisional**.

Why this matters: Users may build production code against this API without realizing it could change. The CHANGELOG entry also doesn't note the preview status.

Fix:

  1. Add **provisional** to the class docstring: """**provisional** Definition for a Global Secondary Index (GSI) container.
  2. Add **provisional** prefix to all :keyword global_secondary_index_definition: docstring entries (e.g., :keyword global_secondary_index_definition: **provisional** The global secondary index...)
  3. Update the CHANGELOG entry to note it's provisional

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

"sourceCollectionId": self._source_container_id,
"definition": self._definition,
}
if self._source_container_rid is not None:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Suggestion — _to_dict Re-serializes Server-Populated Read-Only Fields

When _to_dict() is called on a GlobalSecondaryIndexDefinition that was deserialized from a server response (via _from_dict), the server-populated sourceCollectionRid and status fields are included in the output. If this dict is then passed to replace_container, stale server-populated values would be sent back to the service.

if self._source_container_rid is not None:
    result["sourceCollectionRid"] = self._source_container_rid
if self._status is not None:
    result["status"] = self._status

Why this matters: While the service likely ignores read-only fields on writes, this creates a subtle correctness risk if the service behavior changes. The Java SDK handles this more carefully by re-emitting the definition through populatePropertyBag() which controls which fields are serialized.

Suggested alternative: Consider either (a) excluding server-populated fields from _to_dict (since they're read-only), or (b) adding a note in the docstring that _to_dict is intended for read/inspection and that create_container/replace_container should use freshly-constructed instances.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

self.test_db.delete_container(source_container.id)


if __name__ == "__main__":

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — No Test Coverage for replace_container with GSI

The PR adds global_secondary_index_definition support to replace_container (sync + async), with the same dual-write pattern as create_container. However, there are no live or unit tests exercising this path.

The Java SDK has a dedicated replaceGsiContainer test that verifies the GSI definition is preserved after a replace operation (when modifying other properties like indexing policy). Comparable Python features (full_text_policy, computed_properties) also have replace-specific test coverage.

Why this matters: The replace_container implementation constructs the parameters dict differently from create_container (using a dict comprehension, then appending GSI after). A bug in this integration would go undetected. The replace_container path also has a confirmed missing overload parameter (see separate comment on database.py), suggesting this area received less review attention.

Fix: Add at least one live test that calls replace_container on a GSI container and verifies the definition is preserved in the response.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

gsi_dict = (global_secondary_index_definition._to_dict()
if hasattr(global_secondary_index_definition, '_to_dict')
else global_secondary_index_definition)
parameters["globalSecondaryIndexDefinition"] = gsi_dict

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Dual-Write Aliasing Repeated in 4 Places Without Centralization

The backward-compatibility dual-write logic — writing to both globalSecondaryIndexDefinition and materializedViewDefinition — is duplicated in 4 call sites: sync create_container, sync replace_container, async create_container, async replace_container. This is unique to GSI; no other container property uses this pattern.

parameters["globalSecondaryIndexDefinition"] = gsi_dict
parameters["materializedViewDefinition"] = gsi_dict

Why this matters: If the aliasing story changes (e.g., the legacy key is deprecated), all 4 call sites must be updated in lockstep. A future change could easily update create_container but miss replace_container (or the async variant), introducing a silent regression. The hasattr-based serialization is also repeated in all 4 sites.

Suggested alternative: Extract a single helper function:

def _normalize_gsi_definition(value):
    """Normalize GSI definition and return (gsi_dict, key_pairs) for dual-write."""
    gsi_dict = value._to_dict() if hasattr(value, '_to_dict') else value
    return gsi_dict

Then call it once per site with a standardized dual-write block. This keeps the aliasing logic in one place.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

@xinlian12

Copy link
Copy Markdown
Member

Review complete (47:54)

Posted 7 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

analytical_storage_ttl: Optional[int] = None,
computed_properties: Optional[list[dict[str, str]]] = None,
full_text_policy: Optional[dict[str, Any]] = None,
global_secondary_index_definition: Optional[Any] = None,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking — Sync/Async Parity: Missing parameter in replace_container overload

The return_properties: Literal[True] overload of replace_container (line 976–993) is missing global_secondary_index_definition: Optional[Any] = None in its signature. The first overload (line 926) and both async overloads (aio/_database.py lines 862 and 924) include it correctly.

Why this matters: Type checkers (mypy, pyright) and IDE autocompletion will reject or hide the parameter when callers pass return_properties=True:

# Type error — parameter not in this overload's signature
db.replace_container(
    container, pk,
    global_secondary_index_definition=gsi,
    return_properties=True  # ← triggers the Literal[True] overload
)

At runtime it still works (falls into **kwargs), but the static contract is broken, and sync/async parity is violated.

Fix: Add global_secondary_index_definition: Optional[Any] = None, between full_text_policy and return_properties in the second overload at line 989:

        full_text_policy: Optional[dict[str, Any]] = None,
        global_secondary_index_definition: Optional[Any] = None,  # ← add this
        return_properties: Literal[True],

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

* Added `get_response_headers()` and `get_last_response_headers()` methods to the `CosmosItemPaged` and `CosmosAsyncItemPaged` objects returned by `query_items()`, allowing access to response headers from query operations. See [PR 44593](https://github.com/Azure/azure-sdk-for-python/pull/44593)
* Added InferenceRequestTimeout property for HttpTimeout Policy to Reranking API. See [45469](https://github.com/Azure/azure-sdk-for-python/pull/45469)
* Added `full_text_score_scope` parameter to `query_items()` for controlling BM25 statistics scope in hybrid search queries. Supports "Local" and "Global" (default) scopes. See [45686](https://github.com/Azure/azure-sdk-for-python/pull/45686)
* Added `GlobalSecondaryIndexDefinition` class and `global_secondary_index_definition` keyword to `create_container`, `create_container_if_not_exists`, and `replace_container` methods for creating Global Secondary Index (GSI) containers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Release: CHANGELOG entry under already-released version

This feature entry is placed under 4.16.0b1 (2026-03-21), which has already been released. It should be under the 4.16.2 (Unreleased) section at the top of the file (line 5, "Features Added").

If a release is cut from the unreleased section, the GSI feature won't appear in the correct release notes.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

"sourceCollectionId": self._source_container_id,
"definition": self._definition,
}
if self._source_container_rid is not None:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Design: _to_dict() serializes server-owned read-only fields back on the write path

_to_dict() includes sourceCollectionRid and status in the output dict whenever they are set. These are documented as read-only, server-populated fields, but they get echoed back into the CreateContainer / ReplaceContainer request payload if a user round-trips a hydrated model:

# Hypothetical round-trip scenario:
gsi = GlobalSecondaryIndexDefinition._from_dict(container_props["globalSecondaryIndexDefinition"])

# gsi now has status="Active", source_container_rid="abc123="
db.replace_container(c, pk, global_secondary_index_definition=gsi)

# → payload includes "status": "Active" and "sourceCollectionRid": "abc123=" as client-authored fields

The Java SDK avoids this by using separate internal/external serialization paths. Consider either:

  • Omitting server-populated fields from _to_dict() by default (only serialize sourceCollectionId and definition)
  • Or adding a flag: _to_dict(include_server_fields=False)

This also future-proofs against unknown status values: reading them is safe, but echoing them back on writes couples the client to undocumented server tolerance for response-only fields.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

gsi_dict = (global_secondary_index_definition._to_dict()
if hasattr(global_secondary_index_definition, '_to_dict')
else global_secondary_index_definition)
definition["globalSecondaryIndexDefinition"] = gsi_dict

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Correctness: Dual-write contract has no test coverage

The backward-compatibility dual-write — emitting the GSI payload under both "globalSecondaryIndexDefinition" and "materializedViewDefinition" — is the core compatibility mechanism in this PR, yet no test verifies that both keys are present in the outgoing request. The live tests only assert "globalSecondaryIndexDefinition" in the read-back response.

A regression that drops or misspells one key (e.g., a future refactor) would silently break compatibility with older service versions and would not be caught by the current test suite.

Suggested action: Add a mock/offline unit test (similar to test_vector_policy.py:705-722) that patches CreateContainer / ReplaceContainer and asserts:

  1. Both "globalSecondaryIndexDefinition" and "materializedViewDefinition" are present in the payload
  2. Both keys contain identical values
  3. This works for both GlobalSecondaryIndexDefinition input and raw dict input

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.


import test_config
from azure.cosmos import CosmosClient, PartitionKey
from azure.cosmos._global_secondary_index import GlobalSecondaryIndexDefinition

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Suggestion — Code Quality: Live test uses private import path

This imports from the private module azure.cosmos._global_secondary_index rather than the public export azure.cosmos. The unit test (test_global_secondary_index.py:193-194) correctly tests the public import path, but the live test doesn't use it.

If the public export in __init__.py were accidentally removed in a future change, only the unit test would catch it — the live tests would continue passing against the private module.

Fix:

from azure.cosmos import GlobalSecondaryIndexDefinition

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

partition_key=PartitionKey(path="/id"),
global_secondary_index_definition=gsi_definition
)
self.assertEqual(gsi_container_again.id, container_id)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Test Coverage: No replace_container or async live tests

The PR modifies replace_container in both sync and async paths (adding GSI kwarg extraction, serialization, and dual-write), but the live test suite only covers create_container and create_container_if_not_exists. By comparison, sibling features have dedicated replace_container tests:

  • test_computed_properties.py has test_replace_with_same_computed_properties, test_replace_without_computed_properties, etc.
  • test_full_text_policy.py has replace coverage
  • test_vector_policy.py has replace coverage

Additionally, there are no async live tests at all — the async mirror (aio/_database.py) receives identical GSI changes but has zero test coverage. Sibling features have async test files (e.g., test_computed_properties_async.py).

Suggested additions:

  1. A sync live replace_container test that verifies the GSI definition survives (or is properly rejected as immutable)
  2. At minimum one async live test for create_container with GSI

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

return result

@classmethod
def _from_dict(cls, data: Optional[dict]) -> Optional["GlobalSecondaryIndexDefinition"]:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Design: _from_dict has no production callers — PR description claims unimplemented behavior

The PR description states: "On read, the new key is preferred with fallback to the legacy key." The Java SDK implements this in DocumentCollection.getGlobalSecondaryIndexDefinition() (checks globalSecondaryIndexDefinition first, falls back to materializedViewDefinition).

However, _from_dict is never called from any SDK production code — only from unit tests. ContainerProxy.read() returns raw CosmosDict, and no code normalizes the top-level key. This means:

  1. If a server returns only the legacy materializedViewDefinition key, users checking properties["globalSecondaryIndexDefinition"] will get a KeyError
  2. The read-only properties (source_container_rid, status) advertised on the class are effectively unreachable through normal SDK usage
  3. The _from_dict method could silently drift from the actual wire format with no production usage to catch regressions

Suggested action: Either:

  • Implement the read-path fallback described in the PR description (normalize key names when container properties are returned), or
  • Remove _from_dict and the read-only properties for now, and update the PR description to accurately reflect the current behavior (write-only integration). Add them back when read-path integration is ready.

This is consistent with how sibling features work (all return raw dicts), so the second option may be the lower-risk path.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

@xinlian12

Copy link
Copy Markdown
Member

Review complete (05:21)

Posted 7 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants