Add GlobalSecondaryIndexDefinition for GSI container support#47468
Add GlobalSecondaryIndexDefinition for GSI container support#47468mbhaskar wants to merge 3 commits into
Conversation
Implement client-side support for Global Secondary Index (GSI) containers in the Python Cosmos SDK, mirroring the Java SDK implementation. Changes: - Add GlobalSecondaryIndexDefinition class with serialization/deserialization - Add global_secondary_index_definition keyword to create_container, create_container_if_not_exists, and replace_container (sync and async) - Implement dual-write pattern: writes to both globalSecondaryIndexDefinition and materializedViewDefinition keys for backward compatibility - Add comprehensive unit tests (20 tests) - Update CHANGELOG Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add 'cosmosGSI' pytest mark to pytest.ini - Add GSITestConfig matrix entry in live-platform-matrix.json - Add GSI_ACCOUNT_HOST/GSI_ACCOUNT_KEY env var mappings from Key Vault secrets (gsi-pipeline-uri, gsi-pipeline-key) in tests.yml - Create test_global_secondary_index_live.py with live tests for: - Creating GSI container with GlobalSecondaryIndexDefinition class - Creating GSI container with raw dict - create_container_if_not_exists with GSI definition Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fef6964 to
713f56a
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds client-side support for Cosmos DB Global Secondary Index (GSI) containers in azure-cosmos, including a new GlobalSecondaryIndexDefinition model, plumbing through container create/replace APIs (sync + async) with a dual-write wire format, and CI/live-test wiring for a dedicated GSI-enabled account.
Changes:
- Added
GlobalSecondaryIndexDefinition(validation + (de)serialization) and exported it fromazure.cosmos. - Added
global_secondary_index_definitionkwarg support tocreate_container,create_container_if_not_exists, andreplace_containerfor both sync and async database clients (writing bothglobalSecondaryIndexDefinitionandmaterializedViewDefinition). - Added unit + live tests plus pipeline/matrix/marker updates to run GSI live coverage.
Show a summary per file
| File | Description |
|---|---|
| sdk/cosmos/tests.yml | Injects GSI account secrets as env vars for pipeline live runs. |
| sdk/cosmos/live-platform-matrix.json | Adds a dedicated matrix entry to run cosmosGSI marked live tests. |
| sdk/cosmos/azure-cosmos/tests/test_global_secondary_index.py | Unit tests for constructor validation and dict round-tripping. |
| sdk/cosmos/azure-cosmos/tests/test_global_secondary_index_live.py | Live tests that exercise creating GSI containers (class + raw dict). |
| sdk/cosmos/azure-cosmos/pytest.ini | Registers the new cosmosGSI pytest marker. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Documents the new GSI feature in the Unreleased section. |
| sdk/cosmos/azure-cosmos/azure/cosmos/database.py | Adds sync API kwarg plumbing + dual-write of GSI definition in create/replace. |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_database.py | Async mirror of sync GSI kwarg plumbing + dual-write behavior. |
| sdk/cosmos/azure-cosmos/azure/cosmos/_global_secondary_index.py | Introduces GlobalSecondaryIndexDefinition model implementation. |
| sdk/cosmos/azure-cosmos/azure/cosmos/init.py | Exports GlobalSecondaryIndexDefinition from the public package namespace. |
Copilot's findings
Comments suppressed due to low confidence (1)
sdk/cosmos/azure-cosmos/CHANGELOG.md:11
- The new changelog entry under "Features Added" is missing the usual reference/link to the PR (most other bullets in this section end with "See PR ####" or similar). Keeping the same format makes it easier to trace changes back to discussions and reviews.
#### Bugs Fixed
#### Other Changes
- Files reviewed: 10/10 changed files
- Comments generated: 1
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@sdkReviewAgent-2 |
|
/azp run python - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| analytical_storage_ttl: Optional[int] = None, | ||
| computed_properties: Optional[list[dict[str, str]]] = None, | ||
| full_text_policy: Optional[dict[str, Any]] = None, | ||
| global_secondary_index_definition: Optional[Any] = None, |
There was a problem hiding this comment.
🔴 Blocking — Sync/Async Parity: Missing parameter in second replace_container overload
The replace_container overload with return_properties: Literal[True] (line 976–993) is missing global_secondary_index_definition in its signature:
# Line 989-990 — gap between full_text_policy and return_properties
full_text_policy: Optional[dict[str, Any]] = None,
return_properties: Literal[True], # <-- global_secondary_index_definition should be before thisThe first sync overload (line 912, return_properties: Literal[False]) has it at line 926. The async second overload (aio/_database.py line 911–928) also has it at line 924. And this overload's own docstring (line 1018) documents the parameter.
At runtime this works because **kwargs catches it, but the type signature is wrong — mypy/pyright users calling db.replace_container(container, pk, global_secondary_index_definition=gsi_def, return_properties=True) will get a type error from this overload.
Fix: Add global_secondary_index_definition: Optional[Any] = None, between full_text_policy and return_properties on line 989–990, matching the first overload and the async mirror.
| * Added `get_response_headers()` and `get_last_response_headers()` methods to the `CosmosItemPaged` and `CosmosAsyncItemPaged` objects returned by `query_items()`, allowing access to response headers from query operations. See [PR 44593](https://github.com/Azure/azure-sdk-for-python/pull/44593) | ||
| * Added InferenceRequestTimeout property for HttpTimeout Policy to Reranking API. See [45469](https://github.com/Azure/azure-sdk-for-python/pull/45469) | ||
| * Added `full_text_score_scope` parameter to `query_items()` for controlling BM25 statistics scope in hybrid search queries. Supports "Local" and "Global" (default) scopes. See [45686](https://github.com/Azure/azure-sdk-for-python/pull/45686) | ||
| * Added `GlobalSecondaryIndexDefinition` class and `global_secondary_index_definition` keyword to `create_container`, `create_container_if_not_exists`, and `replace_container` methods for creating Global Secondary Index (GSI) containers. |
There was a problem hiding this comment.
🔴 Blocking — Correctness: CHANGELOG entry in wrong version section
This line is under the 4.16.0b1 (2026-03-21) heading (line 54), which is an already-released version. The entry should be under the 4.16.2 (Unreleased) section at the top of the file (line 3).
Shipping a changelog entry in an already-released version means the feature won't appear in the next release's notes, and it retroactively modifies a published version's history.
Fix: Move this line under ### 4.16.2 (Unreleased) > #### Features Added (around line 5).
| gsi_dict = (global_secondary_index_definition._to_dict() | ||
| if hasattr(global_secondary_index_definition, '_to_dict') | ||
| else global_secondary_index_definition) | ||
| definition["globalSecondaryIndexDefinition"] = gsi_dict |
There was a problem hiding this comment.
🟡 Recommendation — Testing: Dual-write pattern has no test coverage
The dual-write to both globalSecondaryIndexDefinition and materializedViewDefinition is the key backward-compatibility contract of this PR (matching Java SDK behavior), but no test verifies that both keys are present in the serialized output.
The live tests only assert properties["globalSecondaryIndexDefinition"] is present — they never check materializedViewDefinition. If someone accidentally removes line 462, backward compatibility with older service versions silently breaks with no test failure.
Suggestion: Add a unit test that mocks client_connection.CreateContainer and asserts the collection dict passed to it contains both keys with identical values. Alternatively, extract the dict-building logic and test it directly. This is the single most important behavioral contract that lacks test proof.
| self.test_db.delete_container(source_container.id) | ||
|
|
||
|
|
||
| if __name__ == "__main__": |
There was a problem hiding this comment.
🟡 Recommendation — Testing: No test coverage for replace_container with GSI
The PR adds global_secondary_index_definition support to replace_container (both sync and async), but no test — unit or live — exercises this code path. The replace_container implementation builds the parameters dict differently from create_container (dict comprehension vs sequential if blocks), so a copy-paste error or refactoring mistake would go undetected.
Additionally, consider adding at least one async live test. The async mirror in aio/_database.py has its own kwargs.pop(), dict-building, and dual-write logic but has zero test coverage. Existing features like computed_properties and full_text_policy have paired sync/async test files.
| return result | ||
|
|
||
| @classmethod | ||
| def _from_dict(cls, data: Optional[dict]) -> Optional["GlobalSecondaryIndexDefinition"]: |
There was a problem hiding this comment.
🟡 Recommendation — Cross-SDK: _from_dict is unused in production; read-path lacks legacy key fallback
_from_dict is tested (6 unit tests) but never called from any production code path. When users call container.read(), the response returns raw dicts — the GSI definition is never hydrated into a GlobalSecondaryIndexDefinition instance. This is consistent with how other features work in this SDK, but creates a gap vs. the Java SDK:
The Java SDK's DocumentCollection.getGlobalSecondaryIndexDefinition() implements a read-path key fallback — it checks globalSecondaryIndexDefinition first, then falls back to materializedViewDefinition. If the Python SDK talks to a service version that only returns the legacy key, users accessing properties["globalSecondaryIndexDefinition"] will get KeyError.
The live test at line 66 (self.assertIn("globalSecondaryIndexDefinition", properties)) only passes because the test account returns the new key.
Suggestions:
- If
_from_dictis intended as a public convenience, consider making it a public method (drop the underscore) and documenting its usage. - Consider adding a note in the class docstring that users may need to check both
globalSecondaryIndexDefinitionandmaterializedViewDefinitionkeys when reading container properties during the service transition period.
|
✅ Review complete (38:22) Posted 5 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
| gsi_dict = (global_secondary_index_definition._to_dict() | ||
| if hasattr(global_secondary_index_definition, '_to_dict') | ||
| else global_secondary_index_definition) | ||
| definition["globalSecondaryIndexDefinition"] = gsi_dict |
There was a problem hiding this comment.
🟡 Recommendation — Cross-SDK: No Read-Path Fallback for Legacy Key
The dual-write here correctly sends the GSI definition under both globalSecondaryIndexDefinition (new) and materializedViewDefinition (legacy). But on the read path, there's no equivalent normalization.
If an older service version returns only materializedViewDefinition in the container properties, callers who look for properties["globalSecondaryIndexDefinition"] — as the live tests do — will get a KeyError.
The Java SDK handles this explicitly in DocumentCollection.getGlobalSecondaryIndexDefinition() (lines 475–489): it checks the new key first, then falls back to the legacy key.
Since the Python SDK returns raw dicts from container.read() (consistent with how computed_properties, vector_embedding_policy, etc. work), a full read-path normalization may be out of scope. But at minimum, consider:
- Documenting that users should check both keys on older service versions, or
- Adding a helper in
ContainerProxy.read()that normalizesmaterializedViewDefinition→globalSecondaryIndexDefinitionin the response.
Also note: the _from_dict method exists and is unit-tested, but is never called from any production code path. If it's intended as a user-facing convenience, it should be documented. If it's for future SDK-internal use, the PR description's claim about read-path behavior ("On read, the new key is preferred with fallback to the legacy key") should be corrected, since that behavior is not implemented.
| vector_embedding_policy: Optional[dict[str, Any]] = None, | ||
| change_feed_policy: Optional[dict[str, Any]] = None, | ||
| full_text_policy: Optional[dict[str, Any]] = None, | ||
| global_secondary_index_definition: Optional[Any] = None, |
There was a problem hiding this comment.
🟡 Recommendation — Optional[Any] Type Hint Weaker Than Siblings
Every sibling container-definition parameter uses a specific type:
computed_properties: Optional[list[dict[str, str]]]vector_embedding_policy: Optional[dict[str, Any]]full_text_policy: Optional[dict[str, Any]]
And the codebase already has a Union example for dual-type params:
offer_throughput: Optional[Union[int, ThroughputProperties]]
But GSI uses Optional[Any], which gives type checkers and IDEs nothing to work with. The docstrings already state the correct type (~azure.cosmos.GlobalSecondaryIndexDefinition or dict[str, Any]), so the signature should match:
global_secondary_index_definition: Optional[Union["GlobalSecondaryIndexDefinition", dict[str, Any]]] = None,This applies to all 9 overload/implementation signatures across database.py and aio/_database.py.
| * Added `get_response_headers()` and `get_last_response_headers()` methods to the `CosmosItemPaged` and `CosmosAsyncItemPaged` objects returned by `query_items()`, allowing access to response headers from query operations. See [PR 44593](https://github.com/Azure/azure-sdk-for-python/pull/44593) | ||
| * Added InferenceRequestTimeout property for HttpTimeout Policy to Reranking API. See [45469](https://github.com/Azure/azure-sdk-for-python/pull/45469) | ||
| * Added `full_text_score_scope` parameter to `query_items()` for controlling BM25 statistics scope in hybrid search queries. Supports "Local" and "Global" (default) scopes. See [45686](https://github.com/Azure/azure-sdk-for-python/pull/45686) | ||
| * Added `GlobalSecondaryIndexDefinition` class and `global_secondary_index_definition` keyword to `create_container`, `create_container_if_not_exists`, and `replace_container` methods for creating Global Secondary Index (GSI) containers. |
There was a problem hiding this comment.
🔴 Blocking — CHANGELOG Entry Under Wrong Version
This feature entry is placed under ### 4.16.0b1 (2026-03-21) — a version that was already released ~3 months ago. It should be under ### 4.16.2 (Unreleased) > #### Features Added (currently empty at lines 3–6).
Leaving it here would make the release history claim that GSI support shipped in 4.16.0b1, which is incorrect. Automated changelog consumers and customers checking version compatibility will be misled.
Fix: Move this line to under ### 4.16.2 (Unreleased) > #### Features Added.
| gsi_dict = (global_secondary_index_definition._to_dict() | ||
| if hasattr(global_secondary_index_definition, '_to_dict') | ||
| else global_secondary_index_definition) | ||
| parameters["globalSecondaryIndexDefinition"] = gsi_dict |
There was a problem hiding this comment.
🟢 Suggestion — No Test Coverage for Dual-Write Behavior
The dual-write to both globalSecondaryIndexDefinition and materializedViewDefinition is the central backward-compatibility design decision of this PR, but no test verifies it:
- The live tests only assert
"globalSecondaryIndexDefinition" in propertieson read-back — they never check thatmaterializedViewDefinitionwas also sent. - The unit tests only cover
GlobalSecondaryIndexDefinitionin isolation — none exercise thedatabase.pywiring.
If someone refactors this code and accidentally drops the materializedViewDefinition write, no test catches it — yet older service versions would silently fail to recognize the GSI definition.
Consider adding a unit test that mocks client_connection.CreateContainer, calls database.create_container(global_secondary_index_definition=...), and asserts the definition dict contains both keys with identical values.
| :keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container. | ||
| Used to denote the default language to be used for all full text indexes, or to individually | ||
| assign a language to each full text index path. | ||
| :keyword global_secondary_index_definition: The global secondary index definition for the container. |
There was a problem hiding this comment.
🔴 Blocking — Missing Overload Parameter
The sync replace_container overload with return_properties: Literal[True] (lines 976–993) is missing global_secondary_index_definition from its signature, while:
- Sync overload 1 (
return_properties=False, line 926) includes it ✅ - Async overload 1 (line 862) includes it ✅
- Async overload 2 (line 924) includes it ✅
- Sync overload 2 (line 990) — missing ❌
The docstring here correctly documents the parameter, but the @overload signature above it doesn't declare it. This means type checkers will reject:
db.replace_container(container, pk, global_secondary_index_definition=gsi, return_properties=True)even though the implementation (kwargs.pop) handles it correctly at runtime.
Fix: Add global_secondary_index_definition: Optional[Any] = None, between full_text_policy and return_properties in the overload at line 989–990.
|
|
||
| import test_config | ||
| from azure.cosmos import CosmosClient, PartitionKey | ||
| from azure.cosmos._global_secondary_index import GlobalSecondaryIndexDefinition |
There was a problem hiding this comment.
🟢 Suggestion — Live Tests Import from Private Module
Both test_global_secondary_index.py and this file import from the private path azure.cosmos._global_secondary_index instead of the public API azure.cosmos:
# Current (private import)
from azure.cosmos._global_secondary_index import GlobalSecondaryIndexDefinition
# Preferred (public import)
from azure.cosmos import GlobalSecondaryIndexDefinitionThe unit test at line 196 (test_import_from_azure_cosmos) verifies the public export works — but the live test itself doesn't use it. If the public re-export were accidentally dropped from __init__.py, the unit test would catch it, but these live tests would still pass, masking the breakage for real users.
|
✅ Review complete (38:58) Posted 6 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
| gsi_dict = (global_secondary_index_definition._to_dict() | ||
| if hasattr(global_secondary_index_definition, '_to_dict') | ||
| else global_secondary_index_definition) | ||
| definition["globalSecondaryIndexDefinition"] = gsi_dict |
There was a problem hiding this comment.
🟡 Recommendation — Cross-SDK: Missing Source Container RID Resolution
The Java SDK's CosmosAsyncDatabase.createContainerInternal() performs a pre-create step: it reads the source container to resolve its _rid, injects it as sourceCollectionRid into the GSI definition, and then sends the create request. This Python implementation does not do that — it sends the GSI definition as-is, with sourceCollectionRid absent.
// Java SDK — CosmosAsyncDatabase.java
ridResolution = this.getContainer(sourceContainerId).read()
.flatMap(resp -> {
String rid = resp.getProperties().getResourceId();
setSourceCollectionRid(gsiDefinition, rid);
snapshot.setGlobalSecondaryIndexDefinition(gsiDefinition);
return Mono.empty();
});Why this matters:
- If the service requires
sourceCollectionRidin the request body, GSI creation will fail. - If a user specifies a non-existent
source_container_id, Java gives a clear 404 during RID resolution; Python delegates to the service, which may return a less clear error.
Suggested action: Confirm via live tests whether the service accepts requests without sourceCollectionRid. If it does, this is acceptable (the service handles resolution). If not, add a pre-create read of the source container to resolve and inject the RID, similar to Java. At minimum, document this behavioral difference.
| analytical_storage_ttl: Optional[int] = None, | ||
| computed_properties: Optional[list[dict[str, str]]] = None, | ||
| full_text_policy: Optional[dict[str, Any]] = None, | ||
| global_secondary_index_definition: Optional[Any] = None, |
There was a problem hiding this comment.
🔴 Blocking — Missing Overload Parameter
The sync replace_container overload with return_properties: Literal[True] (starting at line 976) is missing global_secondary_index_definition in its signature. Compare with the Literal[False] overload here at line 926 which correctly includes it, and both async overloads which also have it.
The parameter jumps from full_text_policy directly to return_properties: Literal[True] at line 990:
full_text_policy: Optional[dict[str, Any]] = None,
return_properties: Literal[True], # ← missing global_secondary_index_definition before thisWhy this matters: Type checkers (mypy/pyright) will reject db.replace_container(..., global_secondary_index_definition=gsi, return_properties=True) as invalid, even though it works at runtime. IDE autocomplete also won't suggest the parameter for this overload.
Fix: Add global_secondary_index_definition: Optional[Any] = None, between full_text_policy and return_properties in the second replace_container overload (line ~990), matching the pattern here.
| :keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container. | ||
| Used to denote the default language to be used for all full text indexes, or to individually | ||
| assign a language to each full text index path. | ||
| :keyword global_secondary_index_definition: The global secondary index definition for the container. |
There was a problem hiding this comment.
🟡 Recommendation — Async Overload Docstrings Missing GSI Keyword
The global_secondary_index_definition parameter was added to all 6 async overload signatures (2 for create_container, 2 for create_container_if_not_exists, 2 for replace_container), but the :keyword global_secondary_index_definition: docstring entry was only added to the 3 implementation method docstrings — not to any of the 6 overload docstrings.
Compare with the sync database.py, where ALL 9 docstrings (6 overloads + 3 implementations) correctly include the keyword documentation. Also compare with existing keywords like full_text_policy and change_feed_policy, which ARE documented in the async overload docstrings.
Why this matters: Users relying on IDE hover docs will see the parameter in the signature but no documentation for it, inconsistently across sync vs. async.
Fix: Add the :keyword and :paramtype entries to all 6 async overload docstrings, matching the sync versions:
:keyword global_secondary_index_definition: The global secondary index definition for the container.
Used to create a GSI container derived from a source container via a SQL projection query.
:paramtype global_secondary_index_definition: ~azure.cosmos.GlobalSecondaryIndexDefinition or dict[str, Any]| from typing import Optional | ||
|
|
||
|
|
||
| class GlobalSecondaryIndexDefinition: |
There was a problem hiding this comment.
🟡 Recommendation — Missing **provisional** Marker
The Java SDK marks all GSI public API with @Beta(warningText = PREVIEW_SUBJECT_TO_CHANGE_WARNING), signaling that the feature is subject to change. This Python class has no equivalent **provisional** marker.
Other preview features in this SDK use **provisional** in their docstrings — for example, full_text_policy is documented as :keyword dict[str, Any] full_text_policy: **provisional** The full text policy... in all method docstrings. The vector_embedding_policy's embeddingSource also uses **provisional**.
Why this matters: Users may build production code against this API without realizing it could change. The CHANGELOG entry also doesn't note the preview status.
Fix:
- Add
**provisional**to the class docstring:"""**provisional** Definition for a Global Secondary Index (GSI) container. - Add
**provisional**prefix to all:keyword global_secondary_index_definition:docstring entries (e.g.,:keyword global_secondary_index_definition: **provisional** The global secondary index...) - Update the CHANGELOG entry to note it's provisional
| "sourceCollectionId": self._source_container_id, | ||
| "definition": self._definition, | ||
| } | ||
| if self._source_container_rid is not None: |
There was a problem hiding this comment.
🟢 Suggestion — _to_dict Re-serializes Server-Populated Read-Only Fields
When _to_dict() is called on a GlobalSecondaryIndexDefinition that was deserialized from a server response (via _from_dict), the server-populated sourceCollectionRid and status fields are included in the output. If this dict is then passed to replace_container, stale server-populated values would be sent back to the service.
if self._source_container_rid is not None:
result["sourceCollectionRid"] = self._source_container_rid
if self._status is not None:
result["status"] = self._statusWhy this matters: While the service likely ignores read-only fields on writes, this creates a subtle correctness risk if the service behavior changes. The Java SDK handles this more carefully by re-emitting the definition through populatePropertyBag() which controls which fields are serialized.
Suggested alternative: Consider either (a) excluding server-populated fields from _to_dict (since they're read-only), or (b) adding a note in the docstring that _to_dict is intended for read/inspection and that create_container/replace_container should use freshly-constructed instances.
| self.test_db.delete_container(source_container.id) | ||
|
|
||
|
|
||
| if __name__ == "__main__": |
There was a problem hiding this comment.
🟡 Recommendation — No Test Coverage for replace_container with GSI
The PR adds global_secondary_index_definition support to replace_container (sync + async), with the same dual-write pattern as create_container. However, there are no live or unit tests exercising this path.
The Java SDK has a dedicated replaceGsiContainer test that verifies the GSI definition is preserved after a replace operation (when modifying other properties like indexing policy). Comparable Python features (full_text_policy, computed_properties) also have replace-specific test coverage.
Why this matters: The replace_container implementation constructs the parameters dict differently from create_container (using a dict comprehension, then appending GSI after). A bug in this integration would go undetected. The replace_container path also has a confirmed missing overload parameter (see separate comment on database.py), suggesting this area received less review attention.
Fix: Add at least one live test that calls replace_container on a GSI container and verifies the definition is preserved in the response.
| gsi_dict = (global_secondary_index_definition._to_dict() | ||
| if hasattr(global_secondary_index_definition, '_to_dict') | ||
| else global_secondary_index_definition) | ||
| parameters["globalSecondaryIndexDefinition"] = gsi_dict |
There was a problem hiding this comment.
🟡 Recommendation — Dual-Write Aliasing Repeated in 4 Places Without Centralization
The backward-compatibility dual-write logic — writing to both globalSecondaryIndexDefinition and materializedViewDefinition — is duplicated in 4 call sites: sync create_container, sync replace_container, async create_container, async replace_container. This is unique to GSI; no other container property uses this pattern.
parameters["globalSecondaryIndexDefinition"] = gsi_dict
parameters["materializedViewDefinition"] = gsi_dictWhy this matters: If the aliasing story changes (e.g., the legacy key is deprecated), all 4 call sites must be updated in lockstep. A future change could easily update create_container but miss replace_container (or the async variant), introducing a silent regression. The hasattr-based serialization is also repeated in all 4 sites.
Suggested alternative: Extract a single helper function:
def _normalize_gsi_definition(value):
"""Normalize GSI definition and return (gsi_dict, key_pairs) for dual-write."""
gsi_dict = value._to_dict() if hasattr(value, '_to_dict') else value
return gsi_dictThen call it once per site with a standardized dual-write block. This keeps the aliasing logic in one place.
|
✅ Review complete (47:54) Posted 7 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
| analytical_storage_ttl: Optional[int] = None, | ||
| computed_properties: Optional[list[dict[str, str]]] = None, | ||
| full_text_policy: Optional[dict[str, Any]] = None, | ||
| global_secondary_index_definition: Optional[Any] = None, |
There was a problem hiding this comment.
🔴 Blocking — Sync/Async Parity: Missing parameter in replace_container overload
The return_properties: Literal[True] overload of replace_container (line 976–993) is missing global_secondary_index_definition: Optional[Any] = None in its signature. The first overload (line 926) and both async overloads (aio/_database.py lines 862 and 924) include it correctly.
Why this matters: Type checkers (mypy, pyright) and IDE autocompletion will reject or hide the parameter when callers pass return_properties=True:
# Type error — parameter not in this overload's signature
db.replace_container(
container, pk,
global_secondary_index_definition=gsi,
return_properties=True # ← triggers the Literal[True] overload
)At runtime it still works (falls into **kwargs), but the static contract is broken, and sync/async parity is violated.
Fix: Add global_secondary_index_definition: Optional[Any] = None, between full_text_policy and return_properties in the second overload at line 989:
full_text_policy: Optional[dict[str, Any]] = None,
global_secondary_index_definition: Optional[Any] = None, # ← add this
return_properties: Literal[True],| * Added `get_response_headers()` and `get_last_response_headers()` methods to the `CosmosItemPaged` and `CosmosAsyncItemPaged` objects returned by `query_items()`, allowing access to response headers from query operations. See [PR 44593](https://github.com/Azure/azure-sdk-for-python/pull/44593) | ||
| * Added InferenceRequestTimeout property for HttpTimeout Policy to Reranking API. See [45469](https://github.com/Azure/azure-sdk-for-python/pull/45469) | ||
| * Added `full_text_score_scope` parameter to `query_items()` for controlling BM25 statistics scope in hybrid search queries. Supports "Local" and "Global" (default) scopes. See [45686](https://github.com/Azure/azure-sdk-for-python/pull/45686) | ||
| * Added `GlobalSecondaryIndexDefinition` class and `global_secondary_index_definition` keyword to `create_container`, `create_container_if_not_exists`, and `replace_container` methods for creating Global Secondary Index (GSI) containers. |
There was a problem hiding this comment.
🟡 Recommendation — Release: CHANGELOG entry under already-released version
This feature entry is placed under 4.16.0b1 (2026-03-21), which has already been released. It should be under the 4.16.2 (Unreleased) section at the top of the file (line 5, "Features Added").
If a release is cut from the unreleased section, the GSI feature won't appear in the correct release notes.
| "sourceCollectionId": self._source_container_id, | ||
| "definition": self._definition, | ||
| } | ||
| if self._source_container_rid is not None: |
There was a problem hiding this comment.
🟡 Recommendation — Design: _to_dict() serializes server-owned read-only fields back on the write path
_to_dict() includes sourceCollectionRid and status in the output dict whenever they are set. These are documented as read-only, server-populated fields, but they get echoed back into the CreateContainer / ReplaceContainer request payload if a user round-trips a hydrated model:
# Hypothetical round-trip scenario:
gsi = GlobalSecondaryIndexDefinition._from_dict(container_props["globalSecondaryIndexDefinition"])
# gsi now has status="Active", source_container_rid="abc123="
db.replace_container(c, pk, global_secondary_index_definition=gsi)
# → payload includes "status": "Active" and "sourceCollectionRid": "abc123=" as client-authored fieldsThe Java SDK avoids this by using separate internal/external serialization paths. Consider either:
- Omitting server-populated fields from
_to_dict()by default (only serializesourceCollectionIdanddefinition) - Or adding a flag:
_to_dict(include_server_fields=False)
This also future-proofs against unknown status values: reading them is safe, but echoing them back on writes couples the client to undocumented server tolerance for response-only fields.
| gsi_dict = (global_secondary_index_definition._to_dict() | ||
| if hasattr(global_secondary_index_definition, '_to_dict') | ||
| else global_secondary_index_definition) | ||
| definition["globalSecondaryIndexDefinition"] = gsi_dict |
There was a problem hiding this comment.
🟡 Recommendation — Correctness: Dual-write contract has no test coverage
The backward-compatibility dual-write — emitting the GSI payload under both "globalSecondaryIndexDefinition" and "materializedViewDefinition" — is the core compatibility mechanism in this PR, yet no test verifies that both keys are present in the outgoing request. The live tests only assert "globalSecondaryIndexDefinition" in the read-back response.
A regression that drops or misspells one key (e.g., a future refactor) would silently break compatibility with older service versions and would not be caught by the current test suite.
Suggested action: Add a mock/offline unit test (similar to test_vector_policy.py:705-722) that patches CreateContainer / ReplaceContainer and asserts:
- Both
"globalSecondaryIndexDefinition"and"materializedViewDefinition"are present in the payload - Both keys contain identical values
- This works for both
GlobalSecondaryIndexDefinitioninput and rawdictinput
|
|
||
| import test_config | ||
| from azure.cosmos import CosmosClient, PartitionKey | ||
| from azure.cosmos._global_secondary_index import GlobalSecondaryIndexDefinition |
There was a problem hiding this comment.
🟢 Suggestion — Code Quality: Live test uses private import path
This imports from the private module azure.cosmos._global_secondary_index rather than the public export azure.cosmos. The unit test (test_global_secondary_index.py:193-194) correctly tests the public import path, but the live test doesn't use it.
If the public export in __init__.py were accidentally removed in a future change, only the unit test would catch it — the live tests would continue passing against the private module.
Fix:
from azure.cosmos import GlobalSecondaryIndexDefinition| partition_key=PartitionKey(path="/id"), | ||
| global_secondary_index_definition=gsi_definition | ||
| ) | ||
| self.assertEqual(gsi_container_again.id, container_id) |
There was a problem hiding this comment.
🟡 Recommendation — Test Coverage: No replace_container or async live tests
The PR modifies replace_container in both sync and async paths (adding GSI kwarg extraction, serialization, and dual-write), but the live test suite only covers create_container and create_container_if_not_exists. By comparison, sibling features have dedicated replace_container tests:
test_computed_properties.pyhastest_replace_with_same_computed_properties,test_replace_without_computed_properties, etc.test_full_text_policy.pyhas replace coveragetest_vector_policy.pyhas replace coverage
Additionally, there are no async live tests at all — the async mirror (aio/_database.py) receives identical GSI changes but has zero test coverage. Sibling features have async test files (e.g., test_computed_properties_async.py).
Suggested additions:
- A sync live
replace_containertest that verifies the GSI definition survives (or is properly rejected as immutable) - At minimum one async live test for
create_containerwith GSI
| return result | ||
|
|
||
| @classmethod | ||
| def _from_dict(cls, data: Optional[dict]) -> Optional["GlobalSecondaryIndexDefinition"]: |
There was a problem hiding this comment.
🟡 Recommendation — Design: _from_dict has no production callers — PR description claims unimplemented behavior
The PR description states: "On read, the new key is preferred with fallback to the legacy key." The Java SDK implements this in DocumentCollection.getGlobalSecondaryIndexDefinition() (checks globalSecondaryIndexDefinition first, falls back to materializedViewDefinition).
However, _from_dict is never called from any SDK production code — only from unit tests. ContainerProxy.read() returns raw CosmosDict, and no code normalizes the top-level key. This means:
- If a server returns only the legacy
materializedViewDefinitionkey, users checkingproperties["globalSecondaryIndexDefinition"]will get aKeyError - The read-only properties (
source_container_rid,status) advertised on the class are effectively unreachable through normal SDK usage - The
_from_dictmethod could silently drift from the actual wire format with no production usage to catch regressions
Suggested action: Either:
- Implement the read-path fallback described in the PR description (normalize key names when container properties are returned), or
- Remove
_from_dictand the read-only properties for now, and update the PR description to accurately reflect the current behavior (write-only integration). Add them back when read-path integration is ready.
This is consistent with how sibling features work (all return raw dicts), so the second option may be the lower-risk path.
|
✅ Review complete (05:21) Posted 7 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
Summary
Implements Global Secondary Index (GSI) container support in the Python Cosmos SDK, mirroring the Java SDK implementation (Azure/azure-sdk-for-java#48480).
A GSI container is a derived container built from a source container using a SQL projection query. The service already supports this; this PR adds the client-side support.
Changes
New files
azure/cosmos/_global_secondary_index.py—GlobalSecondaryIndexDefinitionclass with constructor validation, serialization (_to_dict/_from_dict), and read-only server-populated properties (source_container_rid,status)tests/test_global_secondary_index.py— 20 unit tests covering constructor validation, serialization, deserialization, forward-compatibilitytests/test_global_secondary_index_live.py— Live integration tests using dedicated GSI test accountModified files
azure/cosmos/__init__.py— ExportGlobalSecondaryIndexDefinitionazure/cosmos/database.py— Addedglobal_secondary_index_definitionkwarg tocreate_container,create_container_if_not_exists, andreplace_container(all overloads) with dual-write patternazure/cosmos/aio/_database.py— Async mirror of all sync changesCHANGELOG.md— Feature entrypytest.ini— AddedcosmosGSImarker for live testslive-platform-matrix.json— AddedGSITestConfigmatrix entrytests.yml— Mappedgsi-pipeline-uri/gsi-pipeline-keyKV secrets to env varsDesign Decisions
Dual-Write Pattern
For backward compatibility with older service versions, the SDK writes the GSI definition under both:
"globalSecondaryIndexDefinition"(new key)"materializedViewDefinition"(legacy key)On read, the new key is preferred with fallback to the legacy key.
Status as String Constants (not enum)
Status values (
"Initializing","InitialBuildAfterCreate","InitialBuildAfterRestore","Active","DeleteInProgress") are kept as plain strings for forward-compatibility with future service-defined values.Pattern Consistency
Follows the same integration pattern as
computed_properties,vector_embedding_policy,full_text_policy, andchange_feed_policy— kwarg extraction viakwargs.pop(), added to definition dict if not None.Testing
gsi-pipeline-uri/gsi-pipeline-keyfrom KV (same secrets used by Java SDK pipeline)