
Azure Cosmos DB API Modes Explained: NoSQL, MongoDB, Cassandra, PostgreSQL, Gremlin, and Table

What each API can and cannot do — and the compatibility gaps that break production migrations

Abstract Algorithms
25 min read

TLDR: Cosmos DB's six API modes are wire-protocol compatibility layers over one shared ARS storage engine — except PostgreSQL (Citus), which is genuinely different. Every API emulates its native database incompletely, and those gaps are structural, not bugs. API mode is permanent and cannot be changed after account creation. NoSQL API is the only one with full Cosmos DB feature access. MongoDB API fails silently on complex aggregation pipelines. Cassandra API lacks materialized views and cross-partition transactions. PostgreSQL API is a managed Citus cluster — a different product entirely. Choose the right API before writing the first byte.

📖 Cosmos DB's Deceptive Promise: Six APIs, One Underlying Engine

A backend team at a mid-sized SaaS company spent three months migrating their MongoDB application to Azure Cosmos DB using the MongoDB API. The pitch was compelling: keep your existing MongoDB drivers, point them at a new connection string, and get Azure-managed global distribution and SLAs without any rewrite. It worked perfectly in staging. The migration went live.

Six months later, the team hit a wall. Their nightly analytics job — a set of aggregation pipelines that used $lookup to join order documents with customer records across two collections — began producing incomplete results. No errors, no exceptions, no stack traces. Just silently truncated output. After a week of debugging, they traced the problem to a hard architectural limitation: Cosmos DB's MongoDB API does not support cross-collection $lookup operations that span partition boundaries. The operation ran, returned partial data from the local partition, and reported success. The native MongoDB server had been joining across shards with no problem. Cosmos DB's compatibility layer could not replicate that behavior because the underlying engine doesn't work the same way.

The team was mid-production with no clean exit. Rewriting the aggregation pipelines to move the join logic to the application tier cost three engineer-weeks. A second migration was considered and rejected — you cannot change an existing Cosmos DB account's API mode. They were locked in.

This is not an edge case. It is the defining characteristic of Cosmos DB's API model that every architecture document glosses over: the APIs look like the databases they emulate, but they run on a completely different engine. The compatibility gaps are structural, not accidental. They arise from what the engine can and cannot support, and they are invisible until the production workload hits them.

This post explains what that engine is, how each of the six API modes sits on top of it, and — critically — what each API cannot do versus its native counterpart. The goal is to give you the information you need to choose an API before you create a Cosmos DB account, because that choice is final.


🔍 Wire-Protocol Translation: How Cosmos DB's API Modes Actually Work

Before diving into individual APIs, it helps to understand the mechanism. Cosmos DB is not six separate databases that happen to share a cloud control plane. It is one storage engine with six protocol adapters bolted on top.

The storage engine is called ARS — Atom-Record-Sequence. It is a schema-agnostic, key-value store where every record is a self-describing JSON atom. ARS manages partitioning, replication, indexing, multi-region synchronization, and consistency levels entirely internally. It does not understand MongoDB queries, CQL statements, or Gremlin traversals. It understands JSON key-value operations over partition keys.

The six API modes are wire-protocol translators. When a MongoDB driver sends a find() or aggregate() command, Cosmos DB's MongoDB protocol adapter intercepts that wire-protocol message, translates the relevant parts into ARS operations, executes them against the storage engine, and translates the result back into a MongoDB-shaped response. The driver on the client side never knows it is not talking to a real MongoDB server.

This translation layer explains everything:

  • Features that map cleanly onto ARS operations (CRUD, basic filtering, partition-key lookups) work perfectly.
  • Features that require capabilities ARS does not have (cross-partition joins, native graph algorithms, Cassandra materialized views) either fail, return partial results, or are silently dropped.
  • Features that require a different storage model entirely (PostgreSQL foreign keys and JOINs) are simply not available — which is why Cosmos DB for PostgreSQL is a separate product running Citus rather than an ARS protocol adapter.

Understanding this one principle lets you predict the gaps before you hit them in production.
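As a rough illustration of that principle, the sketch below models a protocol adapter as a function that turns a MongoDB-style filter into a storage plan. The names Plan and translate_find are hypothetical, and the real adapter is far more involved. The only point is that an equality match on the partition key maps to a single-partition operation, while anything else has to fan out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    kind: str                      # "single_partition" or "fan_out"
    partition_value: object = None


def translate_find(filter_doc: dict, partition_key: str) -> Plan:
    """Map a MongoDB-style filter onto a toy storage plan (illustrative only)."""
    value = filter_doc.get(partition_key)
    # Only a plain equality match on the partition key pins the query to one partition.
    # Operator expressions such as {"$gt": ...} arrive as dicts and cannot be pinned.
    if partition_key in filter_doc and not isinstance(value, dict):
        return Plan("single_partition", value)
    return Plan("fan_out")


print(translate_find({"customerId": "c-42", "status": "pending"}, "customerId"))
print(translate_find({"status": "pending"}, "customerId"))
```

Features in the first category of the list above behave like the first call. Features in the second category have no plan the engine can execute natively, which is where partial results and silent drops come from.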


⚙️ The Six API Modes: Capabilities, Scope, and Migration Fit

Each API mode targets a different category of workload and migration scenario. Here is the ground-level map before the deep-dive on individual gaps.

| API Mode | Underlying Engine | Data Model | Query Language | Best Migration Source |
| --- | --- | --- | --- | --- |
| NoSQL API | ARS (native) | JSON document | SQL dialect | New projects, no migration |
| MongoDB API | ARS + MongoDB wire layer | BSON document | MQL / aggregation pipeline | MongoDB ≤ 4.0 |
| Cassandra API | ARS + CQL wire layer | Wide-column | CQL | Apache Cassandra |
| PostgreSQL API | Citus (separate engine) | Relational | Full PostgreSQL SQL | PostgreSQL + Citus workloads |
| Gremlin API | ARS + TinkerPop layer | Graph | Gremlin | TinkerPop applications |
| Table API | ARS + OData layer | Key-value | OData | Azure Table Storage |

NoSQL API (formerly Core/SQL API) is Cosmos DB's native interface. There is no translation layer — ARS speaks this dialect directly. Every Cosmos DB feature is available: all five consistency levels, change feed, TTL, stored procedures, triggers, user-defined functions, geospatial queries, and full control over partition key strategy. If you are building something new and have no migration requirement, this is the default choice.

MongoDB API gives existing MongoDB applications a near-zero-code migration path by speaking MongoDB's wire protocol. Drivers that target the MongoDB 3.6 or 4.0 wire protocol connect without modification. Basic CRUD, indexing, and simple aggregation work correctly. The gaps appear in complex aggregation operators, change stream semantics, and cross-collection transactions.

Cassandra API exposes the CQL wire protocol that Apache Cassandra uses. Existing Cassandra drivers reconnect to Cosmos DB by changing the contact point and credentials. The wide-column data model (partition keys, clustering keys, rows) maps onto ARS's partitioned key-value layout reasonably well. Gaps emerge around materialized views, lightweight transactions, and user-defined functions.

PostgreSQL API (Cosmos DB for PostgreSQL) is the outlier. It does not run on ARS at all. It is a managed deployment of Citus, an open-source PostgreSQL extension that adds horizontal sharding and distributed query execution to standard PostgreSQL. This means full ACID transactions, JOINs, foreign keys, PL/pgSQL, and PostgreSQL extensions are all available. The tradeoff is that it behaves like a different product because it is one.

Gremlin API targets graph workloads using Apache TinkerPop's Gremlin traversal language. Vertices and edges are stored as ARS records with graph-specific metadata. All five Cosmos DB consistency levels apply to traversals. The gaps are around TinkerPop's OLAP graph algorithms (PageRank, connected components) and bulk loading of large graphs.

Table API provides wire-protocol compatibility with the Azure Table Storage SDK. It is the simplest lift-and-shift path for existing Azure Table Storage workloads that need Cosmos DB's global distribution, higher throughput SLAs, and multi-region write capability. The data model is pure key-value: PartitionKey + RowKey + property bag.


🧠 Anatomy of the Compatibility Gaps: Internals and Performance Impact

Understanding where the gaps come from — not just what they are — prevents you from designing workarounds that will also fail.

Internals: How the ARS Engine Creates Structural Gaps

ARS manages data as a hierarchy: database → container → logical partition → physical partition. Every operation in ARS is scoped to a logical partition, identified by the partition key value. Within a single logical partition, ARS can provide ACID guarantees: reads see a consistent snapshot, writes are ordered, and multi-document transactions commit atomically.

Across logical partitions, ARS provides no native transaction boundary. Cross-partition operations are fan-out reads (scatter-gather) followed by a merge. There is no distributed lock manager, no two-phase commit across partition boundaries, and no native JOIN engine that can correlate records from two different partitions at the storage layer.

This single architectural fact explains a cascade of compatibility gaps:

  • MongoDB $lookup cross-partition fails because a join between collection A and collection B requires correlating records that may live in different logical partitions — and ARS has no cross-partition join primitive.
  • MongoDB multi-collection transactions are limited to single-partition scope because ACID at ARS level stops at the partition boundary.
  • Cassandra materialized views are not supported because materialized views require automatic synchronous updates when the base table changes — which requires cross-partition coordination that ARS doesn't expose at the CQL translation layer.
  • Cassandra cross-partition LOGGED batches silently degrade to best-effort because a LOGGED batch (which provides Cassandra's batch-level atomicity guarantee) requires a coordinator to track completion across partitions, and the ARS operations the batch is translated into offer no such cross-partition guarantee.
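
The partition-scoped transaction boundary behind all four gaps can be sketched with a toy in-memory store. PartitionedStore and transactional_batch are hypothetical names, not Cosmos DB APIs. The sketch only shows the rule: a batch is all-or-nothing inside one logical partition and is rejected outright when it spans more than one.

```python
class PartitionedStore:
    """Toy model: atomic batches are only allowed inside one logical partition."""

    def __init__(self):
        self._partitions = {}  # partition key value -> {item id: item}

    def transactional_batch(self, items, partition_key):
        keys = {item[partition_key] for item in items}
        if len(keys) != 1:
            raise ValueError(
                f"batch spans {len(keys)} logical partitions; atomic scope is exactly one"
            )
        (pk_value,) = keys
        # Stage on a copy so that a failure leaves the partition untouched.
        staged = dict(self._partitions.get(pk_value, {}))
        for item in items:
            if item["id"] in staged:
                raise KeyError(f"duplicate id {item['id']!r}")
            staged[item["id"]] = item
        self._partitions[pk_value] = staged  # commit every item at once

    def count(self, pk_value):
        return len(self._partitions.get(pk_value, {}))


store = PartitionedStore()
store.transactional_batch(
    [{"id": "o1", "tenant": "t1"}, {"id": "o2", "tenant": "t1"}], "tenant"
)
print(store.count("t1"))
```

A real compatibility layer that accepts a cross-partition batch and applies it piecemeal, rather than rejecting it as this sketch does, is how "silently degrades to best-effort" comes about.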

For the MongoDB change stream gap, the cause is slightly different. ARS has its own change feed mechanism with its own cursor token format and delivery semantics. The MongoDB API exposes this as "change streams," but the resume token format is different from MongoDB's native oplog token. Existing change stream consumers that checkpoint using MongoDB resume tokens must be rewritten to use Cosmos DB's token format.
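
One defensive pattern during such a rewrite is to treat resume tokens as opaque and tag each checkpoint with the system that issued it. The sketch below is a minimal illustration of that idea. CheckpointStore and the source labels are hypothetical and not part of any driver.

```python
class CheckpointStore:
    """Stores the last resume token per consumer, tagged with the issuing system."""

    def __init__(self):
        self._checkpoints = {}  # consumer name -> (source label, opaque token)

    def save(self, consumer: str, source: str, token: bytes) -> None:
        self._checkpoints[consumer] = (source, token)

    def load(self, consumer: str, source: str):
        """Return the saved token, or None if absent or issued by another system."""
        saved = self._checkpoints.get(consumer)
        if saved is None or saved[0] != source:
            # A token from a different system cannot be replayed; start fresh instead.
            return None
        return saved[1]


store = CheckpointStore()
store.save("billing-consumer", "native-mongodb", b"opaque-token-bytes")
print(store.load("billing-consumer", "native-mongodb"))
print(store.load("billing-consumer", "cosmos-mongodb-api"))
```

With this guard, a consumer cut over to the new endpoint starts from a defined position instead of presenting a token the new backend cannot interpret.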

Performance Analysis: What API Translation Costs at Runtime

The protocol translation layer adds latency, but it is not the primary performance story. The real performance implications come from two sources: indexing defaults and fan-out costs on non-partition-key queries.

By default, Cosmos DB indexes every property of every document in every container. This default is generous for query flexibility but expensive for write-heavy workloads. Native Cassandra uses explicit sparse indexing — only the columns you declare as indexes are indexed. When a team migrates a Cassandra write-heavy workload to Cosmos DB's Cassandra API without tuning the indexing policy, write throughput drops significantly because every write now triggers indexing across all properties.

Fan-out queries (queries that do not include the partition key in the predicate) require Cosmos DB to scatter the query to every physical partition and gather the results. A MongoDB query like db.orders.find({ status: "pending" }) where status is not the partition key becomes a fan-out across all partitions. On an unsharded native MongoDB deployment with an index on status, this is a single efficient index scan. In Cosmos DB, each physical partition can still consult its local index, but the request is dispatched to every partition and the results are merged at the coordinator, so the cost grows with the partition count. For containers with 50+ physical partitions, this can be roughly 10–50x more expensive than the equivalent query on native MongoDB, depending on selectivity and result size.

The RU (Request Unit) billing model amplifies this: a fan-out query that costs 5 RUs on a single-partition container may cost 500 RUs once the container has grown across many physical partitions.
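
To make the arithmetic concrete, here is a deliberately crude cost model. It assumes the per-partition charge is constant and that the total scales linearly with the number of physical partitions. Real RU charges depend on result size, index use, and query shape, so both the function name and the linear assumption are illustrative only.

```python
def estimate_fan_out_ru(per_partition_ru: float, physical_partitions: int) -> float:
    """Crude model: a fan-out query pays the per-partition charge on every partition."""
    if physical_partitions < 1:
        raise ValueError("need at least one physical partition")
    return per_partition_ru * physical_partitions


for partitions in (1, 10, 100):
    print(partitions, estimate_fan_out_ru(5, partitions))
```

Under this model the 5 RU query reaches 500 RUs at 100 physical partitions, which matches the order of magnitude described above.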


📊 The Full Architecture: From Client to Storage Across Every API

The diagram below shows how each API mode sits as a protocol translation layer between the client and the underlying ARS storage engine. The PostgreSQL API (Citus) is shown separately because it does not share the ARS engine.

flowchart TD
    subgraph Clients
        A[MongoDB Driver]
        B[Cassandra Driver / cqlsh]
        C[Azure SDK / NoSQL SDK]
        D[Gremlin Client / TinkerPop]
        E[Azure Table Storage SDK]
        F[PostgreSQL Client / psql]
    end

    subgraph CosmosDB_ARS[Cosmos DB — ARS Engine]
        direction TB
        M[MongoDB Protocol Adapter]
        K[Cassandra CQL Adapter]
        N[NoSQL Native Layer]
        G[Gremlin TinkerPop Adapter]
        T[Table OData Adapter]
        ARS[(ARS Storage Engine\nPartitioned JSON atoms\nMulti-region replication\n5 consistency levels)]

        M --> ARS
        K --> ARS
        N --> ARS
        G --> ARS
        T --> ARS
    end

    subgraph CosmosDB_PG[Cosmos DB for PostgreSQL]
        direction TB
        PG[PostgreSQL / Citus\nDistributed query executor\nFull ACID · JOINs · Extensions]
    end

    A --> M
    B --> K
    C --> N
    D --> G
    E --> T
    F --> PG

    style ARS fill:#1e3a5f,color:#fff
    style PG fill:#336791,color:#fff
    style CosmosDB_ARS fill:#0f2540,color:#ccc
    style CosmosDB_PG fill:#1a3a5c,color:#ccc

The diagram makes the structural split visible. Five of the six API modes (MongoDB, Cassandra, NoSQL, Gremlin, Table) are protocol adapters routing into the same ARS storage engine — which is why they share the same partition model, the same consistency levels, and the same cross-partition limitations. The PostgreSQL API (Cosmos DB for PostgreSQL) is a genuinely separate system running Citus with its own storage, its own query executor, and its own distributed transaction model. They share an Azure control plane and billing model but not a storage layer.


🌍 Where Each API Succeeds and Fails in Production Migrations

Real-world migration outcomes fall into three categories: clean migrations where the API covers all required functionality, conditional migrations where the API works but requires query pattern changes, and blocked migrations where critical features are structurally absent.

MongoDB API — Conditional for most, blocked for complex aggregation: Teams migrating MongoDB 3.6/4.0 apps that use CRUD and simple aggregation ($match, $group, $project, $sort, $limit) consistently report clean migrations. The driver reconnects, basic features work, and RU provisioning replaces ops overhead. The migration fails when the app uses $lookup across collections, $graphLookup, $facet, $bucket, text search ($text index), or change streams with precise resume-token semantics. A real-world pattern: e-commerce platforms with product catalog queries (partition-key-scoped) migrate cleanly; analytics platforms with cross-entity joins hit the $lookup wall.

Cassandra API — Clean for time-series, blocked for view-heavy patterns: IoT and telemetry workloads that write sensor data using partition-key-scoped CQL and read via time-range clustering key queries migrate with minimal friction. The wide-column model maps cleanly onto ARS. The migration fails for apps that rely on Cassandra materialized views to maintain pre-sorted denormalized read models — a common pattern for read-optimized Cassandra deployments. Those teams must manually maintain duplicate tables at write time, which increases application complexity and introduces the dual-write consistency problem.

NoSQL API — Best for new projects, awkward for migrations: The NoSQL API provides full Cosmos DB feature access but requires learning its SQL dialect and data modeling patterns. Teams building new services on Azure choose this API to access change feed (for event-driven architectures), stored procedures (for partition-scoped transactions), and full TTL control. It is not a migration target for any specific existing database — it is the native Cosmos DB surface.

PostgreSQL API (Citus) — Clean for relational workloads needing horizontal scale: Multi-tenant SaaS companies that hit single-node PostgreSQL limits are the clearest target for Cosmos DB for PostgreSQL. Citus's tenant isolation sharding model (distributing data by tenant ID) maps directly onto this use case. The migration from self-managed PostgreSQL + Citus to the managed Azure service is largely operational: the SQL semantics, JOINs, and ACID guarantees are identical. The product is not appropriate for document, graph, or key-value use cases.
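
The tenant-ID distribution idea can be sketched in a few lines. This is not Citus's actual hash function. SHA-256 is a stand-in, and shard_for and the shard count of 32 are assumptions for illustration. What matters is that every row carrying the same tenant ID lands on the same shard, so per-tenant JOINs stay local to one node.

```python
import hashlib


def shard_for(tenant_id: str, shard_count: int = 32) -> int:
    """Map a tenant ID to a shard index deterministically (stand-in hash)."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Rows from different tables that share a tenant ID map to the same shard.
order_row = {"tenant_id": "acme", "order_id": 1}
invoice_row = {"tenant_id": "acme", "invoice_id": 7}
print(shard_for(order_row["tenant_id"]) == shard_for(invoice_row["tenant_id"]))
```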


⚖️ The Compatibility Gap Tables: What Each API Cannot Do

These tables cover the gaps that most frequently block or complicate production migrations. The "Impact" column describes the real consequence teams encounter.

MongoDB API Gaps

| Feature | MongoDB (native) | Cosmos DB MongoDB API | Migration Impact |
| --- | --- | --- | --- |
| $lookup cross-partition joins | Full support | Cross-partition not supported; local partition only | Aggregation pipelines with collection joins fail or return partial results |
| Multi-collection ACID transactions | Across collections, any scope | Single logical partition only | Cross-collection atomic writes must move to app-tier logic |
| Change stream resume tokens | Oplog-based token format | Cosmos change feed token (different format) | Existing change stream consumers require token handling rewrite |
| $graphLookup, $facet, $bucket | Full pipeline operators | Limited or unsupported | Complex analytics pipelines must be rewritten or moved to Azure Synapse |
| $text index full-text search | Native text index | Not supported | Full-text search must be rearchitected using Azure AI Search |
| Capped collections | Supported | Not supported | Log and audit collection patterns must use TTL-based rotation instead |
| Geospatial $nearSphere | Full 2dsphere support | $near and $geoWithin work; $nearSphere limited | Location-based feature queries may require geometry conversion |
| Server-side JavaScript / MapReduce | Supported | Not supported | MapReduce jobs must be rewritten as application-layer aggregations |
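
Moving a $lookup to the application tier usually means two reads and an in-memory merge. The sketch below assumes a hypothetical fetch_customers_by_ids callable that performs one batched read against the customers collection. The field names customerId and _id are also assumptions. Orders whose customer cannot be found are reported rather than dropped, so the join cannot silently truncate.

```python
def join_orders_with_customers(orders, fetch_customers_by_ids):
    """Application-tier replacement for a $lookup from orders to customers."""
    customer_ids = sorted({order["customerId"] for order in orders})
    customers = {c["_id"]: c for c in fetch_customers_by_ids(customer_ids)}
    joined, missing = [], []
    for order in orders:
        customer = customers.get(order["customerId"])
        if customer is None:
            missing.append(order["_id"])  # surface the gap instead of hiding it
            continue
        joined.append({**order, "customer": customer})
    return joined, missing


# In-memory stand-in for the customers collection.
CUSTOMERS = {"c1": {"_id": "c1", "name": "Ada"}}
orders = [{"_id": "o1", "customerId": "c1"}, {"_id": "o2", "customerId": "c9"}]
joined, missing = join_orders_with_customers(
    orders, lambda ids: [CUSTOMERS[i] for i in ids if i in CUSTOMERS]
)
print(len(joined), missing)
```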

Cassandra API Gaps

| Feature | Cassandra (native) | Cosmos DB Cassandra API | Migration Impact |
| --- | --- | --- | --- |
| Materialized views | Supported (with known Cassandra bugs) | Not supported | Denormalized read models must be maintained manually via dual-write |
| Cross-partition LOGGED batch | Atomic within batch | Degrades silently to best-effort | Cross-partition batch atomicity must be enforced at app layer |
| User-Defined Functions (UDFs) | Supported | Not supported | Custom CQL functions must move to application code |
| Secondary indexes on non-PK columns | Supported (with ALLOW FILTERING) | Limited; ALLOW FILTERING triggers full ARS scan | Non-partition-key query patterns require manually maintained lookup tables or model redesign |
| Lightweight Transactions (LWT) with Paxos | Full Paxos-based | Supported but higher latency due to ARS overhead | Conditional writes work but at reduced throughput |
| Compaction strategy tuning | STCS / LCS / TWCS configurable | Managed internally; no user control | Time-series compaction optimization not available |
| nodetool operations tooling | Full cluster management | Not applicable; Azure Portal/CLI only | Operational runbooks must be rewritten for Azure-native tooling |
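
Replacing a materialized view with dual-write looks roughly like the sketch below, which keeps the latest N readings per device in a second structure. SensorReadings is a hypothetical in-memory stand-in for two tables. In a real system each write is a separate request, and that separation is exactly where the dual-write consistency problem lives.

```python
class SensorReadings:
    """Base table plus a manually maintained 'latest readings per device' read model."""

    def __init__(self, keep: int = 10):
        self.keep = keep
        self.base = []               # stands in for the base table
        self.latest_by_device = {}   # stands in for the denormalized table

    def write(self, device_id: str, ts: int, value: float) -> None:
        # Write 1: the base table.
        self.base.append((device_id, ts, value))
        # Write 2: the read model. Against a real database this is a second,
        # non-atomic request; if it fails, the tables diverge until repaired.
        rows = self.latest_by_device.setdefault(device_id, [])
        rows.append((ts, value))
        rows.sort(reverse=True)      # newest first
        del rows[self.keep:]         # trim to the newest `keep` readings


readings = SensorReadings(keep=10)
for ts in range(12):
    readings.write("device-1", ts, ts * 1.5)
print(len(readings.base), len(readings.latest_by_device["device-1"]))
```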

Gremlin API Gaps

| Feature | TinkerPop (native) | Cosmos DB Gremlin API | Migration Impact |
| --- | --- | --- | --- |
| OLAP graph algorithms (PageRank, BFS) | Via TinkerPop's OLAP API (Spark, Hadoop) | Not natively available | Large-scale graph analytics must run via Azure Synapse or external compute |
| Bulk vertex/edge loading | Bulk loader tools available | No native bulk API; REST or SDK only | Initial data ingestion at scale is slow and must be rate-limited |
| Cypher / SPARQL query language | Third-party plugins available | Only Gremlin; no Cypher, no SPARQL | Teams used to Neo4j's Cypher must learn Gremlin syntax |
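
Rate-limited ingestion typically means sending fixed-size batches and backing off when the service throttles. The sketch below is generic and does not use any Gremlin client. send_batch is a hypothetical callable supplied by the caller, and Throttled stands in for a request-rate-too-large (HTTP 429) response.

```python
import time


class Throttled(Exception):
    """Stands in for an HTTP 429 / request-rate-too-large response."""


def load_in_batches(items, send_batch, batch_size=100, max_retries=5,
                    base_delay=0.05, sleep=time.sleep):
    """Send items in batches, retrying throttled batches with exponential backoff."""
    sent = 0
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                send_batch(batch)
                sent += len(batch)
                break
            except Throttled:
                if attempt == max_retries:
                    raise  # give up after the final retry
                sleep(base_delay * (2 ** attempt))
    return sent


# Demo with a sender that never throttles.
print(load_in_batches(list(range(250)), lambda batch: None))
```

Passing sleep as a parameter keeps the function testable without real delays. A production loader would also want jitter and a cap on the delay.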

🧭 API Selection Flowchart: Choosing Before the First Write

The single most important operational constraint in Cosmos DB is that API mode is permanent. Once you create a Cosmos DB account with a given API, you cannot change it. Migrating to a different API requires creating a new Cosmos DB account and moving all data. Make this decision with complete information.

The flowchart below walks through the decision from first principles.

flowchart TD
    START([New project or migration?]) --> MIGRATE{Migrating an\nexisting system?}

    MIGRATE -->|Yes| WHICH_DB{Which database\nare you migrating from?}
    MIGRATE -->|No, new project| DATA_MODEL{What data model\ndoes your system need?}

    WHICH_DB -->|MongoDB ≤ 4.0| MONGO_AUDIT{Does your app use\n$lookup, $graphLookup\nor $text search?}
    WHICH_DB -->|Apache Cassandra| CASS_AUDIT{Does your app rely on\nmaterialized views or\ncross-partition batches?}
    WHICH_DB -->|PostgreSQL| PG_API[PostgreSQL API\nCosmos DB for PostgreSQL\nManaged Citus]
    WHICH_DB -->|Azure Table Storage| TABLE_API[Table API\nLift and shift with\nhigher SLAs]

    MONGO_AUDIT -->|No — basic CRUD + aggregation| MONGO_API[MongoDB API\nMinimal driver changes required]
    MONGO_AUDIT -->|Yes — complex pipelines| MONGO_RISK[MongoDB API with risk\nRewrite affected pipelines first\nor reconsider NoSQL API]

    CASS_AUDIT -->|No — time-series, partition-scoped| CASS_API[Cassandra API\nClean migration path]
    CASS_AUDIT -->|Yes — views or cross-partition batches| CASS_RISK[Cassandra API with redesign\nManually maintain dual-write tables\nor reconsider NoSQL API]

    DATA_MODEL -->|Documents, JSON-based| NEED_FULL{Need full Cosmos DB\nfeatures? Change feed,\nmulti-region writes, stored procs?}
    DATA_MODEL -->|Graph — vertices and edges| GREMLIN_API[Gremlin API\nApache TinkerPop compatible]
    DATA_MODEL -->|Relational with JOINs| PG_API2[PostgreSQL API\nCosmos DB for PostgreSQL]
    DATA_MODEL -->|Pure key-value lookups| NOSQL_KV[NoSQL API\nor Table API for simplest model]

    NEED_FULL -->|Yes| NOSQL_API[NoSQL API\nFull feature access\nAll 5 consistency levels\nChange feed, TTL, stored procs]
    NEED_FULL -->|No — just documents| NOSQL_API2[NoSQL API\nStill recommended for\nnew document workloads]

    style NOSQL_API fill:#1a6b3c,color:#fff
    style NOSQL_API2 fill:#1a6b3c,color:#fff
    style PG_API fill:#336791,color:#fff
    style PG_API2 fill:#336791,color:#fff
    style MONGO_API fill:#1a6b3c,color:#fff
    style CASS_API fill:#1a6b3c,color:#fff
    style GREMLIN_API fill:#1a6b3c,color:#fff
    style TABLE_API fill:#1a6b3c,color:#fff
    style MONGO_RISK fill:#a05a00,color:#fff
    style CASS_RISK fill:#a05a00,color:#fff

The flowchart leads with the most important question first: are you migrating, or building new? For migrations, the key gate is auditing whether your application uses the specific features that have known compatibility gaps. For new projects, the NoSQL API is almost always the right starting point because it gives access to every Cosmos DB capability without the overhead of a translation layer.
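
For teams that want this decision in a design-review checklist or a platform CLI, the flowchart's branches can be encoded as a small function. choose_api and its argument values are hypothetical. The returned strings simply mirror the flowchart's leaf nodes.

```python
def choose_api(migrating_from=None, uses_gap_features=False, data_model=None):
    """Encode the flowchart's branches and return a recommendation string."""
    if migrating_from is not None:
        source = migrating_from.lower()
        if source == "mongodb":
            if uses_gap_features:
                return "MongoDB API (rewrite affected pipelines first)"
            return "MongoDB API"
        if source == "cassandra":
            if uses_gap_features:
                return "Cassandra API (plan dual-write redesign)"
            return "Cassandra API"
        if source == "postgresql":
            return "PostgreSQL API (Cosmos DB for PostgreSQL)"
        if source == "table storage":
            return "Table API"
        raise ValueError(f"no branch for source {migrating_from!r}")

    by_model = {
        "document": "NoSQL API",
        "graph": "Gremlin API",
        "relational": "PostgreSQL API (Cosmos DB for PostgreSQL)",
        "key-value": "NoSQL API or Table API",
    }
    if data_model not in by_model:
        raise ValueError(f"no branch for data model {data_model!r}")
    return by_model[data_model]


print(choose_api(migrating_from="MongoDB", uses_gap_features=True))
print(choose_api(data_model="graph"))
```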


🧪 Pre-Migration Audit: Testing API Compatibility Before You Commit

The most reliable way to avoid the production compatibility failures described in the opening is to audit your application's database operations against the known gap list before creating a Cosmos DB account.

Case Study: MongoDB API Migration Readiness Audit

Before migrating a MongoDB application to Cosmos DB, run the following checks against your existing MongoDB operation logs or application code:

Check 1 — Scan aggregation pipeline stages for unsupported operators. Look for any use of $lookup, $graphLookup, $facet, $bucket, $text, or $where. These are the operators most commonly absent or limited in the Cosmos DB MongoDB API. If found, assess whether they can be rewritten as application-tier logic or whether they represent a fundamental incompatibility.

Check 2 — Identify change stream consumers. If your application uses MongoDB change streams for event-driven processing (CDC pipelines, real-time notifications), verify that the resume token handling can be updated to use Cosmos DB's change feed token format. This is a code change, but it is bounded and predictable.

Check 3 — Identify multi-collection transactions. MongoDB withSession blocks that span multiple collections are a hard wall. These operations cannot be replicated across partition boundaries in Cosmos DB. If your application relies on them for business-critical atomicity (e.g., deducting inventory AND creating an order in the same atomic write), the MongoDB API is not a clean migration target. Consider the NoSQL API with stored procedures scoped to a single partition, or redesign using the Outbox pattern.

Check 4 — Profile query patterns for cross-partition fan-out. Identify queries that do not include the partition key in the filter predicate. In Cosmos DB, these become cross-partition fan-out queries. Assess the frequency and latency sensitivity of these queries. For high-frequency, low-latency queries on non-partition-key attributes, the migration may require a data model redesign to align partition keys with query access patterns.

This audit should be completed in the pre-production phase. The output is a concrete list of changes required before the migration is production-safe — or a decision to reconsider the API choice.
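
Check 1 is easy to automate if your pipelines are available as data (for example, extracted from code or profiler logs). The sketch below walks a pipeline recursively and reports every occurrence of the operators named in Check 1. find_flagged_operators is a hypothetical helper, and the flagged set is only the list from this post, not an authoritative compatibility matrix.

```python
FLAGGED = {"$lookup", "$graphLookup", "$facet", "$bucket", "$text", "$where"}


def find_flagged_operators(node, path="pipeline"):
    """Recursively walk lists and dicts, yielding (path, operator) for flagged keys."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in FLAGGED:
                yield child, key
            yield from find_flagged_operators(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_flagged_operators(item, f"{path}[{index}]")


pipeline = [
    {"$match": {"status": "paid"}},
    {"$lookup": {"from": "customers", "localField": "customerId",
                 "foreignField": "_id", "as": "customer"}},
    {"$facet": {"byTotal": [{"$bucket": {"groupBy": "$total",
                                         "boundaries": [0, 100]}}]}},
]
for location, operator in find_flagged_operators(pipeline):
    print(operator, "at", location)
```

Because the walk is recursive, operators nested inside $facet sub-pipelines or $match expressions are caught as well as top-level stages.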


🛠️ Azure CLI: Provisioning Cosmos DB Accounts Across All Six API Modes

API mode is set at account creation time and is permanent. Here are the Azure CLI commands for each API mode, which serve as the canonical reference for CI/CD provisioning pipelines and infrastructure-as-code templates.

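# Placeholder names throughout: account names are globally unique, so each account needs its own --name.
# NoSQL API (default; Cosmos DB native, full feature access)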
az cosmosdb create \
  --name myaccount \
  --resource-group myrg \
  --kind GlobalDocumentDB \
  --locations regionName=eastus failoverPriority=0

# MongoDB API (wire-protocol compatibility with MongoDB 4.0)
az cosmosdb create \
  --name myaccount \
  --resource-group myrg \
  --kind MongoDB \
  --server-version 4.0 \
  --locations regionName=eastus failoverPriority=0

# Cassandra API (CQL wire-protocol compatibility)
az cosmosdb create \
  --name myaccount \
  --resource-group myrg \
  --kind GlobalDocumentDB \
  --capabilities EnableCassandra \
  --locations regionName=eastus failoverPriority=0

# Gremlin API (Apache TinkerPop Gremlin)
az cosmosdb create \
  --name myaccount \
  --resource-group myrg \
  --kind GlobalDocumentDB \
  --capabilities EnableGremlin \
  --locations regionName=eastus failoverPriority=0

# Table API (Azure Table Storage wire-protocol compatibility)
az cosmosdb create \
  --name myaccount \
  --resource-group myrg \
  --kind GlobalDocumentDB \
  --capabilities EnableTable \
  --locations regionName=eastus failoverPriority=0

# PostgreSQL API — provisioned separately as Cosmos DB for PostgreSQL (Citus)
az cosmosdb postgres cluster create \
  --name mypgcluster \
  --resource-group myrg \
  --coordinator-v-cores 4 \
  --coordinator-server-edition GeneralPurpose \
  --node-count 2 \
  --node-v-cores 4

Critical operational note: There is no az cosmosdb update command that changes the API mode of an existing account. This is not a missing feature — it is by design, because the data stored in the account is encoded against the specific API mode's data model. A MongoDB API account stores BSON-over-ARS records with MongoDB-specific metadata. A Cassandra account stores CQL-style rows. Converting between them would require reading every record, transforming it, and rewriting it — that operation is the data migration itself.

For infrastructure-as-code, the Cosmos DB account resource in Terraform (azurerm_cosmosdb_account) and Bicep (Microsoft.DocumentDB/databaseAccounts) both set API mode via the same kind and capabilities parameters shown above. Make API mode a first-class infrastructure decision in your platform templates.
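
One way to make the decision first-class is a pre-deployment guard that derives the API mode from kind and capabilities and refuses to proceed if a template would imply a different mode than the existing account. The sketch below uses only the kind and capability values from the CLI commands above. The function names and the dictionary shape are assumptions, not part of any Azure SDK.

```python
CAPABILITY_TO_MODE = {
    "EnableCassandra": "Cassandra",
    "EnableGremlin": "Gremlin",
    "EnableTable": "Table",
}


def api_mode(kind, capabilities=()):
    """Derive the API mode from the account's kind and capabilities."""
    if kind == "MongoDB":
        return "MongoDB"
    if kind == "GlobalDocumentDB":
        modes = [CAPABILITY_TO_MODE[c] for c in capabilities if c in CAPABILITY_TO_MODE]
        if len(modes) > 1:
            raise ValueError(f"conflicting API capabilities: {modes}")
        return modes[0] if modes else "NoSQL"
    raise ValueError(f"unrecognized kind {kind!r}")


def ensure_api_mode_unchanged(existing, desired):
    """Raise if the desired template implies a different API mode than the account has."""
    current = api_mode(existing["kind"], existing.get("capabilities", ()))
    wanted = api_mode(desired["kind"], desired.get("capabilities", ()))
    if current != wanted:
        raise RuntimeError(
            f"account uses {current} API but template requests {wanted}; "
            "this requires a new account and a data migration"
        )
    return current


print(api_mode("GlobalDocumentDB", ["EnableCassandra"]))
```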


📚 Lessons Learned: What Production Migrations Get Wrong About Cosmos DB

The "looks like MongoDB" assumption is the most dangerous. Teams that evaluate Cosmos DB's MongoDB API by running their basic test suite almost always get a green signal. Basic CRUD, indexing, and simple aggregation work. The gaps only appear under specific advanced operators that may not be exercised by a unit test or integration test against a staging dataset. Always run your full analytics workload in a pre-production Cosmos DB environment, not just your functional tests.

API mode permanence is understated in Microsoft's migration documentation. Most migration guides lead with the "just change your connection string" narrative and bury the compatibility gap tables in footnotes. In practice, API permanence means you are making a contract with the storage layer that you cannot renegotiate after the first write. Treat it with the same gravity as choosing between SQL and NoSQL for a new system.

The ARS partition boundary is not like MongoDB's shard boundary. MongoDB's $lookup across shards involves a query router that understands how to scatter-gather and merge results at the coordinator. ARS's architecture provides no equivalent join primitive at the storage layer. Teams that understand MongoDB sharding sometimes assume that cross-collection joins will work as well as cross-shard joins — they won't.

Cosmos DB for PostgreSQL (Citus) is a different product, not a different mode. The naming creates genuine confusion. "Cosmos DB for PostgreSQL" and "Cosmos DB for MongoDB" sound like parallel constructs — different APIs on the same engine. They are not. Cosmos DB for PostgreSQL is a managed Citus deployment with its own storage, its own networking tier, and its own pricing model. Choosing between them is not "which API protocol do I prefer?" — it is "do I need a document store or a relational database?"

Default indexing policy can destroy write performance in Cassandra API migrations. Cosmos DB indexes all properties by default. Cassandra applications are typically designed for high-throughput writes with minimal indexing. Without tuning the Cosmos DB indexing policy to match Cassandra's sparse index model, write-heavy workloads will consume far more RUs than expected and hit throughput limits immediately after migration.

Silently incorrect results are worse than errors. The $lookup failure in the opening story did not throw an exception. It returned data — incomplete data — and the aggregation pipeline completed with a success status. Silent partial failures are the most dangerous class of compatibility gap because they don't trigger alerting. Test your migration by verifying output correctness against known datasets, not just by checking that operations complete without errors.
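
A correctness check of this kind can be as simple as diffing the result set from the source database against the one from Cosmos DB for the same known dataset. compare_result_sets is a hypothetical helper. It assumes every row is a dict with a unique key field.

```python
def compare_result_sets(expected, actual, key="_id"):
    """Report rows that are missing, unexpected, or different between two result sets."""
    exp = {row[key]: row for row in expected}
    act = {row[key]: row for row in actual}
    return {
        "missing": sorted(exp.keys() - act.keys()),
        "unexpected": sorted(act.keys() - exp.keys()),
        "changed": sorted(k for k in exp.keys() & act.keys() if exp[k] != act[k]),
    }


baseline = [{"_id": "o1", "total": 10}, {"_id": "o2", "total": 20}, {"_id": "o3", "total": 30}]
migrated = [{"_id": "o1", "total": 10}, {"_id": "o3", "total": 31}]
print(compare_result_sets(baseline, migrated))
```

A truncated $lookup shows up here as entries under "missing" even though the pipeline itself reported success.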


📌 TLDR: The Six-Bullet Cheat Sheet

  • All APIs except PostgreSQL share one engine (ARS): compatibility gaps are structural — features ARS can't support cannot be perfectly emulated by any protocol adapter, no matter how sophisticated.
  • API mode is permanent: choose before account creation; changing it requires creating a new account and migrating all data.
  • NoSQL API is the only full-fidelity option: it is the only API with access to every Cosmos DB feature at GA quality — change feed, all five consistency levels, stored procedures, triggers, TTL, and geospatial.
  • MongoDB API covers most common use cases but fails on complex aggregation: audit for $lookup, $graphLookup, $facet, $text, and multi-collection transactions before committing to a MongoDB API migration.
  • Cassandra API's biggest gap is materialized views: teams that rely on Cassandra's read-model denormalization pattern via materialized views must redesign to manual dual-write maintenance.
  • Cosmos DB for PostgreSQL is a different product: it runs Citus, not ARS, and is the correct choice when you need true relational semantics — JOINs, foreign keys, multi-table ACID. Do not confuse it with the other five API modes.

📝 Practice Quiz

  1. A team is migrating a MongoDB application to Cosmos DB using the MongoDB API. Their analytics pipeline includes a $lookup that joins an orders collection with a customers collection, where the two collections use different partition keys. What will happen in production?

    • A) The $lookup will work identically to native MongoDB because the MongoDB API is wire-compatible
    • B) The $lookup will fail with a clear error message indicating cross-partition joins are unsupported
    • C) The $lookup will return partial results scoped to the local partition without throwing an error
    • D) The $lookup will be automatically rewritten by the MongoDB protocol adapter to use a scatter-gather join

    Correct Answer: C

  2. An engineering team has been running Cosmos DB with the Cassandra API for six months. They want to add a read-optimized view that shows the latest 10 sensor readings per device, maintained automatically on write. They plan to use a Cassandra materialized view. What should the team do?

    • A) Create the materialized view using standard CQL — it is fully supported by the Cassandra API
    • B) Redesign using a separate denormalized table maintained via dual-write in the application layer, since Cosmos DB Cassandra API does not support materialized views
    • C) Use Cosmos DB's change feed to trigger a serverless function that maintains the denormalized table
    • D) Both B and C are valid approaches; materialized views are not supported in the Cassandra API

    Correct Answer: D

  3. A team wants to build a new multi-tenant SaaS application on Azure. Each tenant has relational data with complex JOINs between 10+ tables, and they need horizontal scaling as tenant count grows. Which Cosmos DB API is the correct choice?

    • A) NoSQL API — it supports SQL-like queries and JSON documents with high scale
    • B) MongoDB API — it has the most compatibility with existing tooling
    • C) PostgreSQL API (Cosmos DB for PostgreSQL / Citus) — it is the only option that provides true relational semantics with horizontal sharding
    • D) Cassandra API: its wide-column model supports multi-tenant isolation

    Correct Answer: C

  4. (Open-ended) A team is building a real-time fraud detection system that needs to model relationships between users, devices, IP addresses, and transactions as a graph. They are evaluating between the Gremlin API and the NoSQL API with a document-based adjacency list model. What tradeoffs would you consider, and under what circumstances would you choose each approach? Consider query complexity, operational overhead, analytics requirements, and team expertise.

    Correct Answer: Open-ended; no single correct answer. Strong responses will address: Gremlin API provides native graph traversal semantics and is the natural fit for multi-hop relationship queries (e.g., "find all accounts within 3 hops of this compromised device"), but lacks OLAP graph algorithms and bulk loading support; the NoSQL API with adjacency lists is more operationally familiar, works with any SDK, and allows co-location of entity data with relationship data in the same document, but requires application-tier traversal logic for multi-hop queries which becomes expensive at scale. Team expertise with TinkerPop vs. SQL-like querying is a practical tiebreaker. If graph analytics (PageRank, community detection) are needed at scale, neither API is sufficient and Azure Synapse Analytics integration should be considered.



Written by

Abstract Algorithms

@abstractalgorithms