You Don't Need MongoDB

Chapter 7: Schema Validation: Discipline Without Rigidity

The Waiter of Gold Lapel · Published Apr 19, 2026 · 8 min

I would like to address a misconception that has followed MongoDB for most of its life, and one that has, I believe, done it a disservice. MongoDB is often called schemaless. It is not. It is schema-optional — a distinction that matters enormously once you begin storing data that other people depend on.

PostgreSQL is also schema-optional. JSONB accepts any valid JSON by default, with no declared schema required. The freedom is identical. The starting position is the same. Where the two databases differ is in what they offer when you decide — and you will decide, because every team eventually does — that some of your documents need more structure than none.

Both databases provide a spectrum from fully open to fully validated. This chapter walks PostgreSQL's version of that spectrum, from no enforcement at all to full JSON Schema validation at the database level. More importantly, it makes a case for the middle ground — the position where most collections ultimately belong, once the initial thrill of schemalessness meets the enduring reality of data that must be read, aggregated, and trusted by code that was not written by the person who stored it.

No Validation

The default. JSONB columns accept any valid JSON document. No key requirements. No type requirements. No structural constraints of any kind. Two documents in the same table can have entirely different shapes, and the database will store both without comment. This is exactly the freedom that makes MongoDB's onboarding experience so appealing, and it is available in PostgreSQL identically.
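As a concrete sketch of that open end of the spectrum (table name and event shapes are illustrative, not from the running example), the whole setup is a single JSONB column:

```sql
-- Hypothetical event log: no constraints of any kind on the document.
CREATE TABLE events (
  id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  data jsonb NOT NULL
);

-- Two entirely different shapes, stored side by side without comment.
INSERT INTO events (data) VALUES
  ('{"type": "login", "user_id": 42, "ip": "203.0.113.7"}'),
  ('{"type": "purchase", "sku": "A-100",
     "amount": {"cents": 1999, "currency": "USD"}}');
```

The only thing enforced here is that data is valid JSON and present at all, and even the NOT NULL is a choice, not a requirement.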

This is the right choice more often than schema purists would like to admit. Event logs, where every event type carries different fields and the structure is the consumer's problem, not the producer's. User preferences, where shape varies per user and per feature flag and per A/B test variant. Form submissions, where the fields depend on which version of the form the user filled out last Tuesday. Staging tables during migration, where data arrives in whatever shape the export tool produced and will be cleaned later.

The cost of no validation is not that bad data enters the system. Bad data enters every system. The cost is that bad data enters silently. A document missing its email field is stored without complaint. A document with age stored as a string instead of a number is accepted and filed alongside documents where age is numeric. The malformed document is discovered later — usually by the code that tries to read it, usually in production, usually at a moment when the person who wrote the import script is unavailable.

Here is how that plays out in practice: a bulk import script loads fifty thousand user profiles from a partner's CSV. The script converts each row to JSON and inserts it. One field mapping is wrong — email_address instead of email. Fifty thousand documents are stored with the wrong key name. No error. No warning. The import completes successfully. The problem surfaces three weeks later when the email notification service queries data->>'email' and gets NULL for fifty thousand users. The data was present. The key was wrong. The database accepted it because nobody asked the database to check.
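With no constraint to catch the mistake at write time, the only remedy is an audit query after the fact. A sketch, assuming the users table from this example:

```sql
-- How many documents are missing the key the application reads?
SELECT count(*) AS missing_email
FROM users
WHERE NOT data ? 'email';

-- Which keys actually occur, and how often? This is where the stray
-- email_address key would finally surface.
SELECT key, count(*)
FROM users, LATERAL jsonb_object_keys(data) AS key
GROUP BY key
ORDER BY count(*) DESC;
```

Queries like these are cheap to write. The expensive part is knowing, three weeks too late, that you needed to run them.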

No validation is not the absence of consequences. It is the deferral of them. And deferred consequences collect interest.

For the collections that deserve better — and most collections eventually do — the next step on the spectrum requires exactly one ALTER TABLE statement.

CHECK Constraints

CHECK constraints are the middle of the spectrum, and they are where I believe most collections belong. They add structure without ceremony — a few declarative rules that the database enforces on every write, from every source, without exception.

CHECK constraints
-- Require an email key
ALTER TABLE users ADD CONSTRAINT require_email
  CHECK (data ? 'email');

-- Require email AND name
ALTER TABLE users ADD CONSTRAINT require_fields
  CHECK (data ? 'email' AND data ? 'name');

-- Enforce that age, when present, is a number.
-- If the key is absent, data->'age' is NULL, the CHECK evaluates to
-- NULL, and the row passes; combine with data ? 'age' to require it.
ALTER TABLE users ADD CONSTRAINT age_type
  CHECK (jsonb_typeof(data->'age') = 'number');

-- Enforce that status, when present, is one of a set of values
ALTER TABLE users ADD CONSTRAINT valid_status
  CHECK (data->>'status' IN ('active', 'inactive', 'suspended'));

Each constraint is evaluated on every INSERT and UPDATE. When you add a constraint, PostgreSQL scans the existing rows once to confirm they satisfy it (a read, not a rewrite) and rejects the ALTER TABLE if any row fails. If scanning a table with a million documents up front is too disruptive, declare the constraint NOT VALID: it then applies only to new writes, and you can verify the backlog later with ALTER TABLE ... VALIDATE CONSTRAINT.
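When a table already holds rows that might not satisfy a new rule, PostgreSQL offers a two-step path: add the constraint NOT VALID so it binds new writes immediately, then validate the backlog separately, at a time of your choosing.

```sql
-- Bind the rule to new writes without checking existing rows yet.
ALTER TABLE users ADD CONSTRAINT require_email
  CHECK (data ? 'email') NOT VALID;

-- Later, once any stray rows are cleaned up, verify the whole table.
ALTER TABLE users VALIDATE CONSTRAINT require_email;
```

VALIDATE CONSTRAINT scans the table but takes a weaker lock than a validating ADD CONSTRAINT would, which is what makes this pattern practical on busy tables.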

What that looks like in practice: the import script from the previous section runs again. This time, the users table has CHECK (data ? 'email'). The script inserts a document with email_address instead of email. PostgreSQL returns:

Constraint violation
ERROR: new row for relation "users" violates check constraint "require_email"
DETAIL: Failing row contains (a1b2c3d4-..., {"email_address": "alice@...", ...}, 2026-04-14...).

The insert fails. The script stops. The developer fixes the field mapping. Fifty thousand documents are imported correctly. The email notification service works three weeks later. The constraint cost one line of SQL to create. The absence of the constraint cost three weeks of silent data corruption and a production incident. This is the arithmetic of database-level validation, and it favors the constraint every time.

The appeal of CHECK constraints is their proportionality. You enforce exactly the rules your application depends on. Require the keys your code reads. Enforce types where type mismatches cause bugs. Constrain values where invalid values cause business logic failures. Leave everything else open. The documents can still carry additional fields your application does not yet know about, fields from future features, fields from integrations that haven't been built yet. The guardrails protect the structure that matters. The flexibility remains for everything else.

There is a deeper principle at work here, and it is one I feel strongly about: enforcement at the database level is fundamentally stronger than enforcement at the application level. A CHECK constraint is evaluated by the database engine itself. Every write path — application code, migration scripts, admin queries run from a terminal at 3 AM, bulk imports from a partner's CSV — goes through the same constraint. There is no way to bypass it without explicitly dropping it. Compare this to validation in the application layer, where enforcement depends on every client, every script, and every ad-hoc query implementing the same rules. Application-level validation is a policy. Database-level validation is a law. Policies can be forgotten. Laws cannot.

MongoDB's schema validation is enforced by the server, but it can be bypassed: any write issued with the bypassDocumentValidation option, a privilege commonly exercised by restore and bulk-loading tools, skips the validator entirely. PostgreSQL's CHECK constraints have no per-operation escape hatch; the only way around one is to drop it. I mention this not as a criticism of MongoDB's design, since a privileged bypass is a reasonable choice for operational tooling, but as a factual difference that matters to teams whose data integrity requirements extend beyond application writes.

CHECK constraints handle the majority of collections. But some data carries obligations that "require these keys and enforce these types" does not fully express. When the structure of a document is a contract — with regulators, with downstream systems, with teams that did not write the code that produces the document — the next tier on the spectrum provides the full weight of JSON Schema.

JSON Schema Validation

For collections where the document structure is a contract — with downstream consumers, with regulatory frameworks, with reporting systems that will break if a required field is missing or mistyped — pg_jsonschema provides full JSON Schema enforcement at the database level.

pg_jsonschema validation
CREATE EXTENSION IF NOT EXISTS pg_jsonschema;

ALTER TABLE users ADD CONSTRAINT user_schema CHECK (
  jsonb_matches_schema('{
    "type": "object",
    "required": ["email", "name"],
    "properties": {
      "email": {"type": "string", "format": "email"},
      "name": {"type": "string", "minLength": 1},
      "age": {"type": "integer", "minimum": 0},
      "status": {"type": "string", "enum": ["active", "inactive", "suspended"]}
    },
    "additionalProperties": true
  }'::json, data)
);

pg_jsonschema supports JSON Schema drafts 4, 6, 7, 2019-09, and 2020-12. The schema is defined once, in the constraint, and every write is validated against it.
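pg_jsonschema also exposes a jsonschema_is_valid helper that checks whether the schema document itself is well-formed JSON Schema. A quick sanity check before wiring a schema into a constraint:

```sql
-- Returns true for a well-formed schema, false otherwise.
SELECT jsonschema_is_valid('{
  "type": "object",
  "required": ["email"],
  "properties": {"email": {"type": "string"}}
}'::json);
```

A malformed schema caught here is one less surprise at constraint-creation time.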

The setting that makes this practical for document workloads is additionalProperties: true. The schema enforces the required fields and their types but permits any additional fields. A document with email, name, and an unexpected nickname field passes validation. A document missing email does not. This is progressive enforcement — strict where the contract demands it, flexible where the data is still evolving. Rigidity where it protects. Freedom where it serves.
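Assuming the user_schema constraint above is in place, that behavior looks like this:

```sql
-- Passes: required fields present; the unknown nickname field is
-- tolerated because additionalProperties is true.
INSERT INTO users (data) VALUES
  ('{"email": "alice@example.com", "name": "Alice", "nickname": "Al"}');

-- Fails with a check constraint violation: required "email" is missing.
INSERT INTO users (data) VALUES
  ('{"name": "Bob", "nickname": "Bobby"}');
```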

Evolving the schema is one ALTER TABLE: drop the old constraint, add a new one. No data rewrite. Add the replacement constraint NOT VALID and existing documents are not retroactively checked; only future writes are validated against the updated schema. This allows schema evolution without downtime or data migration.

In practice, schema evolution looks like this: version 2 of your API adds a required phone field to user profiles.

Schema evolution
-- Drop the old schema constraint
ALTER TABLE users DROP CONSTRAINT user_schema;

-- Add the updated schema with the new required field.
-- NOT VALID skips the scan of existing rows, which would otherwise
-- fail for documents stored before phone became required.
ALTER TABLE users ADD CONSTRAINT user_schema CHECK (
  jsonb_matches_schema('{
    "type": "object",
    "required": ["email", "name", "phone"],
    "properties": {
      "email": {"type": "string", "format": "email"},
      "name": {"type": "string", "minLength": 1},
      "phone": {"type": "string"},
      "age": {"type": "integer", "minimum": 0},
      "status": {"type": "string", "enum": ["active", "inactive", "suspended"]}
    },
    "additionalProperties": true
  }'::json, data)
) NOT VALID;

Two statements. The constraint is updated. Existing documents without a phone field remain untouched — they were valid under the old schema and are not retroactively invalidated. New inserts and updates must include phone. The schema evolved. The data did not need to migrate. The application did not need to redeploy. If you need to backfill existing documents, that is a separate operation performed at your pace, not the database's.
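A backfill, when you choose to do one, is ordinary SQL. A sketch, assuming a placeholder value is acceptable for the legacy rows:

```sql
-- Give pre-existing documents a placeholder phone value.
UPDATE users
SET data = data || '{"phone": "unknown"}'::jsonb
WHERE NOT data ? 'phone';

-- If the evolved constraint was added NOT VALID, confirm the whole
-- table now conforms to the schema.
ALTER TABLE users VALIDATE CONSTRAINT user_schema;
```

Batching the UPDATE by id range, or running it over several nights, is entirely up to you; the constraint waits.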

When full JSON Schema is the right choice: financial records where regulators expect structural guarantees. API response caches where downstream consumers parse specific fields. Compliance data where a missing field is not a bug but an audit finding. Any collection where the cost of a malformed document exceeds the cost of defining and maintaining a schema.

MongoDB offers broadly similar capability — $jsonSchema in collection validation options, available since version 3.6. I mention this because the capability is not unique to PostgreSQL, and this book does not claim otherwise. What PostgreSQL adds is the enforcement point: a CHECK constraint evaluated by the database engine, covering every write path without exception. MongoDB's validation is effective and well-designed. PostgreSQL's is inescapable, and for certain categories of data, inescapable is the appropriate standard.

I have walked you through three tiers — no validation, CHECK constraints, full JSON Schema — as if they were separate tools. They are not. They are positions on a single spectrum, and the most useful thing I can show you is how different collections in the same database occupy different positions simultaneously. A team that understands this spectrum does not argue about whether to have schemas. They argue about which collections need which level of enforcement — and that is a much better argument to have.

The Spectrum in Practice

The power of this approach is that it is per-table, not per-database. A single PostgreSQL instance can have collections at every point on the spectrum, each with exactly the enforcement its data requires.

Event logs — no validation. Accept any valid JSON. The event producer defines the shape. The consumer parses what it needs. Structure is a downstream concern.

User profiles — CHECK constraints. Require email and name, because the application reads them on every request. Enforce status as an enum, because the authorization system branches on it. Allow additional fields for preferences, feature flags, and metadata that varies per user and per product iteration.

Financial records — full JSON Schema. Required fields, enforced types, constrained values, validated format. The schema is a contract with the reporting pipeline, the compliance team, and the audit trail. A malformed document in this collection is not a bug to fix. It is an incident to explain.

Content management — CHECK constraints plus a generated column. Require a title and a content type. Enforce the content type as an enum. Add a generated column on data->>'slug' with a unique index. The document stays flexible for body content, metadata, and tags. The fields the application routes on are guaranteed to exist and be correct.
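A sketch of that content-management shape, with hypothetical table and field names:

```sql
CREATE TABLE pages (
  id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  data jsonb NOT NULL,
  -- The routing field, extracted into a real column.
  slug text GENERATED ALWAYS AS (data->>'slug') STORED,
  -- Guardrails on the fields the application depends on.
  CONSTRAINT require_title CHECK (data ? 'title'),
  CONSTRAINT require_slug  CHECK (data ? 'slug'),
  CONSTRAINT valid_content_type
    CHECK (data->>'content_type' IN ('article', 'page', 'snippet'))
);

-- Routing demands uniqueness; the index enforces it.
CREATE UNIQUE INDEX pages_slug_idx ON pages (slug);
```

Everything else in the document, body, metadata, tags, remains unconstrained.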

The migration path through the spectrum is deliberate and incremental. Start with no validation during early development, when the schema changes with every sprint. Add CHECK constraints as the data model stabilizes and the team learns which fields the application truly depends on. Move to full JSON Schema when the collection's structure becomes a promise to systems beyond your immediate control. Each step is one ALTER TABLE statement. No data migration required. No application rewrite. The enforcement grows with your confidence in the schema, which is precisely as it should be. Confidence that outruns evidence is recklessness. Enforcement that outruns confidence is bureaucracy. The spectrum lets you match one to the other.


Four chapters now stand behind us. How documents are stored (Chapter 4), how they are queried (Chapter 5), how they are indexed (Chapter 6), and how much structure they carry (this chapter). You can now store documents in JSONB, query them in six patterns, make those queries fast with the right indexes, and enforce whatever discipline your data warrants — from none at all to the full weight of JSON Schema.

One chapter remains in Part II, and it turns to a different question — one that I find particularly interesting. Not how to find documents, but how to aggregate them. MongoDB's pipeline model is familiar to most readers and well-designed for the problems it addresses. SQL's aggregation model is older, less intuitive at first glance, and strictly more powerful. The comparison rewards careful attention, because it reveals something about the difference between a language that describes the steps and a language that describes the destination.