← How-To

PostGIS for Application Developers

A Practical Performance Guide

The Butler of Gold Lapel · March 26, 2026 · 28 min read
The illustrator was given precise coordinates. The artwork arrived on the wrong continent. We have asked him to check his SRID.

Good Evening. I Understand You Have Location Data.

PostGIS is the spatial extension for PostgreSQL, adding geometry and geography types, spatial indexing, and hundreds of functions for querying location data. It transforms PostgreSQL into a fully capable geographic information system. Most application developers, however, use only a fraction of what PostGIS offers — and often not the most efficient fraction. I should like to remedy that.

This guide covers the spatial query patterns that application developers encounter most often: nearest-neighbor search, radius queries, containment and geofencing, intersection tests, and bulk spatial joins. It then addresses GiST index internals, tuning parameters, ORM integration, and data loading. The goal is practical spatial competence — enough to build performant location-aware features without a GIS background.

This guide does not cover raster operations, topology, 3D geometry, or network routing (pgRouting). For PostGIS installation and core type overview, see the PostGIS extension page. For a focused deep dive on the "find nearby" optimization pattern (ST_DWithin vs ST_Distance, geometry vs geography, materialized views), see the PostGIS optimization guide.

What This Guide Covers — and What It Does Not

This guide is for application developers who store and query spatial data in PostgreSQL. I shall assume you have PostGIS installed and a working familiarity with points, polygons, and coordinate systems.

Covers:

  • The five spatial query patterns every application needs (KNN, radius, containment, intersection, bulk joins)
  • GiST index internals and tuning parameters
  • ORM integration (Django, Rails, SQLAlchemy)
  • Data loading from GeoJSON, shapefiles, and application-generated coordinates
  • Common pitfalls and their resolutions

Does not cover:

  • Raster operations (satellite imagery, elevation data)
  • Topology (network-aware spatial relationships)
  • 3D geometry (Z coordinates, volumetric analysis)
  • Network routing (pgRouting for shortest path, driving directions)

The Five Spatial Query Patterns Worth Knowing

Pattern 1 — Nearest Neighbor (KNN with <->)

The <-> distance operator combined with ORDER BY and LIMIT performs true K-nearest-neighbor search. The query planner uses the GiST index to traverse the spatial index tree, returning the K closest features without computing the distance to every row.

-- Find the 3 nearest hospitals to a given point
SELECT name, address,
       location <-> ST_SetSRID(ST_MakePoint(-73.985, 40.748), 4326) AS distance
FROM hospitals
ORDER BY location <-> ST_SetSRID(ST_MakePoint(-73.985, 40.748), 4326)
LIMIT 3;

This requires a GiST index on the location column. Without it, PostgreSQL computes the distance to every row, sorts the entire result, and returns the top 3. A full table scan plus sort — the kind of arrangement that does not age well.

-- GiST index for KNN queries
CREATE INDEX hospitals_location_idx ON hospitals USING GiST (location);

With the index, EXPLAIN ANALYZE shows an Index Scan with ordering:

EXPLAIN output
Limit  (cost=0.28..12.53 rows=3 width=72)
  ->  Index Scan using hospitals_location_idx on hospitals
        Order By: (location <-> '0101000020E6100000...'::geometry)

How KNN differs from ST_DWithin: KNN finds the K closest features regardless of how far away they are. ST_DWithin finds all features within a fixed radius. Use KNN for "find the 5 closest restaurants" and ST_DWithin for "find all restaurants within 2 km."

Pattern 2 — Within Radius (ST_DWithin)

ST_DWithin finds all features within a specified distance. I should be direct: it is the correct function for radius search — not ST_Distance in a WHERE clause.

-- Find all restaurants within 1km (geography type, distance in meters)
SELECT name, address
FROM restaurants
WHERE ST_DWithin(
  location::geography,
  ST_SetSRID(ST_MakePoint(-73.985, 40.748), 4326)::geography,
  1000  -- meters
);

ST_DWithin uses the GiST index to eliminate candidates by bounding box before computing exact distances. ST_Distance in a WHERE clause computes the distance to every row and then filters — bypassing the spatial index entirely. The distinction is not subtle, and the performance difference is not small.

For a detailed comparison of ST_DWithin vs ST_Distance, geometry vs geography casting patterns, and materialized view optimizations for radius search, see the PostGIS optimization guide.

Pattern 3 — Containment and Geofencing (ST_Contains, ST_Within)

Containment queries answer the questions that location-aware applications ask most frequently: "which delivery zone is this address in?" and "which city districts overlap this proposed development?"

Point-in-polygon: assigning an order to a delivery zone

-- Which delivery zone contains this order's location?
SELECT z.zone_name, z.delivery_fee
FROM delivery_zones z
WHERE ST_Contains(z.boundary, ST_SetSRID(ST_MakePoint(-73.985, 40.748), 4326));

Finding all zones that cover a given coordinate:

-- All zones covering a point (a point can be in multiple overlapping zones)
SELECT z.zone_name
FROM delivery_zones z
WHERE ST_Covers(z.boundary, ST_SetSRID(ST_MakePoint(-73.985, 40.748), 4326));
FunctionReturns true whenUse case
ST_Contains(A, B)A fully contains B, boundaries may touch"Which zone contains this point?"
ST_Within(A, B)A is fully inside BInverse of ST_Contains: ST_Within(point, zone)
ST_Covers(A, B)A covers B, including boundary pointsPreferred over ST_Contains for point-in-polygon (handles edge cases on polygon boundaries)

For point-in-polygon queries, I would recommend ST_Covers over ST_Contains. The reason is precise: ST_Covers handles the edge case of points exactly on the polygon boundary, while ST_Contains excludes them. A small distinction, but the kind that produces support tickets at the worst possible moment.

Index placement: Create the GiST index on the polygon column, not the point column. The query planner uses the index on the column in the first argument of ST_Contains/ST_Covers.

-- Index on the polygon column (delivery zone boundaries)
CREATE INDEX delivery_zones_boundary_idx ON delivery_zones USING GiST (boundary);

Pattern 4 — Intersection Queries (ST_Intersects, ST_Crosses)

Intersection queries determine whether two geometries share any space. They are the most general spatial predicates.

Does a delivery route cross a restricted zone?

-- Find restricted zones that a route crosses
SELECT rz.zone_name, rz.restriction_type
FROM restricted_zones rz
WHERE ST_Crosses(
  rz.boundary,
  ST_GeomFromGeoJSON('{"type":"LineString","coordinates":[[-73.99,40.74],[-73.97,40.75]]}')
);

Which parcels intersect a flood plain?

-- Find all parcels that intersect a given polygon boundary
SELECT p.parcel_id, p.address, p.area_sqft
FROM parcels p
JOIN flood_plains fp ON ST_Intersects(p.boundary, fp.boundary)
WHERE fp.risk_level = 'high';

ST_Intersects vs ST_Crosses:

  • ST_Intersects is the most general spatial predicate — it returns true if the geometries share any space at all (overlap, touch, cross, or one contains the other).
  • ST_Crosses is specific to linestring-polygon or linestring-linestring intersections where the geometries actually cross through each other (not just touch at a boundary point).

For most application queries, ST_Intersects is the right choice. Use ST_Crosses only when you need to distinguish crossing from touching.

Pattern 5 — Bulk Spatial Joins

Joining two spatial tables is a common analytics pattern: match every order to its nearest warehouse, assign all customers to their service areas, or find all points of interest near each store location.

The cross-join — an arrangement I would prefer you avoid:

-- Avoid: cross join scales poorly on large tables
SELECT o.id, w.name
FROM orders o, warehouses w
WHERE ST_DWithin(o.location, w.location, 50000)
ORDER BY o.id, ST_Distance(o.location, w.location)
LIMIT 1;

For 100,000 orders and 50 warehouses, this evaluates 5 million spatial predicates. One does not send the entire staff to answer one question.

LATERAL JOIN with KNN: The proper approach uses LATERAL to perform an indexed KNN lookup for each row in the driving table.

-- Assign each order to its nearest warehouse
SELECT o.id AS order_id, nearest.warehouse_name, nearest.distance
FROM orders o
CROSS JOIN LATERAL (
  SELECT w.name AS warehouse_name,
         o.location <-> w.location AS distance
  FROM warehouses w
  ORDER BY o.location <-> w.location
  LIMIT 1
) nearest;

With a GiST index on warehouses.location, each LATERAL subquery performs an indexed KNN lookup. The total cost is proportional to num_orders * log(num_warehouses) rather than num_orders * num_warehouses.

EXPLAIN output
Nested Loop  (cost=0.28..125000.00 rows=100000 width=52)
  ->  Seq Scan on orders o  (rows=100000)
  ->  Limit  (cost=0.28..0.53 rows=1 width=44)
        ->  Index Scan using warehouses_location_idx on warehouses w
              Order By: (o.location <-> w.location)

For 100,000 orders and 50 warehouses, this completes in seconds instead of minutes. Rather more presentable.

GiST Index Internals — What the Index Actually Does

If you'll indulge me for a moment, I think understanding what happens beneath the surface will serve you well. GiST (Generalized Search Tree) is PostgreSQL's extensible index framework. For spatial data, PostGIS implements an R-tree structure within the GiST framework.

Bounding box hierarchy: The R-tree organizes spatial data into a hierarchy of bounding boxes. Each internal node represents a bounding box that encloses all of its children. Leaf nodes contain the actual geometry references with their bounding boxes. When you query for features near a point, the tree is traversed top-down, pruning branches whose bounding boxes are too far away.

Two-phase filter: Spatial queries in PostGIS use a two-phase approach:

  1. Bounding box filter (index phase): The GiST index eliminates candidates whose bounding boxes do not satisfy the spatial predicate. This is fast — it compares rectangles, not complex geometries.
  2. Exact geometry check (recheck phase): The remaining candidates are tested against the actual geometry. This is computationally expensive for complex polygons.

The && operator is the bounding box overlap test that ST_DWithin, ST_Intersects, and other spatial functions use internally. When you see the && operator in an EXPLAIN plan, the GiST index is being used for the bounding box filter.

High candidate-to-result ratios: In some scenarios, the bounding box filter passes many candidates that the exact geometry check rejects. This happens with dense urban data, complex polygons with large bounding boxes relative to their actual area, and queries near boundaries between many features.

If EXPLAIN ANALYZE shows a high ratio of rows examined to rows returned (e.g., "Rows Removed by Filter: 5000" for a query returning 10 results), the bounding boxes are not providing good selectivity. Simplifying complex geometries or splitting them into smaller components can improve the situation considerably.

Tuning GiST Indexes for Spatial Workloads

Now, if you'll permit me, the tuning parameters that earn their keep.

Fillfactor

The fillfactor controls how full each index page is packed during initial index creation. The default is 90%.

-- Lower fillfactor for update-heavy tables
CREATE INDEX drivers_location_idx ON drivers USING GiST (location)
WITH (fillfactor = 70);

When to lower fillfactor (70–80%): Tables where spatial columns are frequently updated, such as delivery driver positions, vehicle tracking, or real-time asset locations. A lower fillfactor leaves space on each index page for updates without requiring page splits. Page splits are expensive — they lock the page and require reorganizing the tree.

When to keep at 90% (default): Static or infrequently updated spatial data, such as store locations, city boundaries, postal code polygons, and other reference data. A higher fillfactor produces a smaller, more compact index.

When to Consider SP-GiST

SP-GiST (Space-Partitioned GiST) is an alternative spatial index type that uses quadtree partitioning instead of the R-tree bounding box hierarchy.

-- SP-GiST index on point data
CREATE INDEX locations_spgist_idx ON locations USING SPGiST (coordinates);

Better for: Point data that is heavily clustered in small regions — such as urban ride-share pickups, where millions of points concentrate in a few city blocks. Quadtree partitioning handles this skew more efficiently than R-tree bounding boxes.

Not suitable for: Overlapping geometries (polygons). SP-GiST's partitioning scheme assumes non-overlapping regions, which is true for points but not for polygons that can overlap each other. Use GiST for polygon data.

Performance comparison: On 1 million heavily clustered urban points, SP-GiST can outperform GiST by 20–40% for nearest-neighbor queries because the quadtree more efficiently partitions the dense regions. On uniformly distributed points, the difference is negligible.

Partial Spatial Indexes

Create spatial indexes only on the subset of rows that your queries actually need:

-- Index only active delivery drivers
CREATE INDEX active_drivers_location_idx ON drivers USING GiST (location)
WHERE active = true;

-- Index only current-year orders
CREATE INDEX orders_2026_location_idx ON orders USING GiST (location)
WHERE created_at >= '2026-01-01';

If your table has 10 million rows but only 50,000 are active at any given time, a partial index is 200x smaller than a full index. Smaller indexes are faster to traverse and more likely to fit in memory. The difference is not theoretical — it is the difference between a query that touches disk and one that does not.

For more on partial index patterns, see the partial indexes guide.

ANALYZE After Bulk Loads

I mention this because I have seen it cause unnecessary consternation. PostGIS relies on the query planner's statistics to determine whether to use a GiST index. If the statistics are stale — because you just loaded 500,000 rows — the planner may not know the GiST index exists or may underestimate its usefulness.

-- Always run ANALYZE after bulk loads or index creation
ANALYZE locations;

Symptom of stale statistics: EXPLAIN shows a sequential scan despite a GiST index existing on the column. If you see this, run ANALYZE on the table and try again.

Autovacuum runs ANALYZE automatically, but its schedule may not keep up with large bulk loads. After a COPY or batch INSERT, run ANALYZE explicitly.

PostGIS with Your ORM

Django (GeoDjango)

Django's django.contrib.gis module provides native PostGIS support with spatial field types and query lookups.

Model
from django.contrib.gis.db import models

class Hospital(models.Model):
    name = models.CharField(max_length=255)
    location = models.PointField(srid=4326)

    class Meta:
        indexes = [
            models.Index(fields=['location'], name='hospital_location_idx'),
        ]
Queries
from django.contrib.gis.geos import Point
from django.contrib.gis.measure import D

# Find hospitals within 5km
point = Point(-73.985, 40.748, srid=4326)
nearby = Hospital.objects.filter(location__distance_lte=(point, D(km=5)))

# KNN - nearest 3 hospitals
from django.contrib.gis.db.models.functions import Distance
nearest = Hospital.objects.annotate(
    dist=Distance('location', point)
).order_by('dist')[:3]

Key detail: The distance_lte lookup generates ST_DWithin (uses the GiST index), not a ST_Distance < value filter (bypasses the index). This is the correct behavior.

Common pitfall: Forgetting to set the SRID on the lookup geometry. If your column is SRID 4326 and you create a Point without specifying srid=4326, Django may silently perform a coordinate transformation or produce incorrect results.

Rails (RGeo + ActiveRecord)

The activerecord-postgis-adapter gem adds PostGIS support to ActiveRecord.

Migration
# Migration
class CreateHospitals < ActiveRecord::Migration[7.0]
  def change
    create_table :hospitals do |t|
      t.string :name
      t.st_point :location, srid: 4326, geographic: true
      t.timestamps
    end

    add_index :hospitals, :location, using: :gist
  end
end
Query
# Find hospitals within 5km
factory = RGeo::Geographic.spherical_factory(srid: 4326)
point = factory.point(-73.985, 40.748)

Hospital.where(
  "ST_DWithin(location, ST_SetSRID(ST_MakePoint(?, ?), 4326)::geography, ?)",
  -73.985, 40.748, 5000
)

Common pitfall: N+1 queries on spatial associations. If each order loads its associated delivery zone via a spatial lookup, you get one query per order. Use includes with preloaded spatial joins or batch the lookups.

SQLAlchemy (GeoAlchemy2)

GeoAlchemy2 extends SQLAlchemy with PostGIS column types and spatial functions.

Model
from sqlalchemy import Column, Integer, String
from geoalchemy2 import Geometry
from sqlalchemy import func

class Hospital(Base):
    __tablename__ = 'hospitals'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    location = Column(Geometry('POINT', srid=4326))
Query
from geoalchemy2.elements import WKTElement

point = WKTElement('POINT(-73.985 40.748)', srid=4326)

# Find hospitals within 5km
nearby = session.query(Hospital).filter(
    func.ST_DWithin(
        Hospital.location,
        point,
        5000
    )
).all()

Common pitfall: Missing the func. prefix on spatial functions. Writing ST_DWithin(...) instead of func.ST_DWithin(...) calls a Python function (which does not exist) instead of generating the SQL function call. This produces a Python error, not a SQL query.

Raw SQL — When the ORM Has Done Its Best

ORMs add genuine convenience for CRUD operations, and I would not ask you to abandon them lightly. But for LATERAL joins with KNN, bulk spatial operations, and multi-table spatial analytics, raw SQL is often cleaner and more predictable.

# SQLAlchemy raw SQL for LATERAL KNN
result = session.execute(text("""
    SELECT o.id, nearest.warehouse_name, nearest.distance
    FROM orders o
    CROSS JOIN LATERAL (
        SELECT w.name AS warehouse_name,
               o.location <-> w.location AS distance
        FROM warehouses w
        ORDER BY o.location <-> w.location
        LIMIT 1
    ) nearest
"""))

My recommendation: Use the ORM for standard CRUD operations with spatial columns. Switch to raw SQL for spatial analytics, bulk joins, and any query involving LATERAL, window functions over spatial data, or complex multi-predicate spatial filters. The ORM and raw SQL are not rivals — they serve different occasions.

Loading Spatial Data into PostgreSQL

GeoJSON

For individual records, use ST_GeomFromGeoJSON:

INSERT INTO zones (name, boundary)
VALUES (
  'Downtown',
  ST_SetSRID(
    ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[-73.99,40.74],[-73.97,40.74],[-73.97,40.75],[-73.99,40.75],[-73.99,40.74]]]}'),
    4326
  )
);

For bulk loading large GeoJSON files, use ogr2ogr from the GDAL toolkit:

ogr2ogr
ogr2ogr -f "PostgreSQL" \
  PG:"host=localhost dbname=myapp user=myuser" \
  neighborhoods.geojson \
  -nln neighborhoods \
  -lco GEOMETRY_NAME=boundary \
  -lco FID=id

ogr2ogr handles coordinate system detection, type mapping, and batch insertion. It is significantly faster than parsing GeoJSON in application code and inserting row by row.

Shapefiles

Shapefiles are the traditional GIS interchange format. PostGIS ships with shp2pgsql for importing them:

shp2pgsql
# Convert shapefile to SQL and pipe to psql
shp2pgsql -s 4326 city_boundaries.shp public.city_boundaries | psql -d myapp

# Or use ogr2ogr for more control
ogr2ogr -f "PostgreSQL" \
  PG:"host=localhost dbname=myapp" \
  city_boundaries.shp \
  -nln city_boundaries \
  -s_srs EPSG:4326 \
  -t_srs EPSG:4326

Always verify the SRID after import. Shapefiles often lack explicit coordinate system metadata, and the import tool may default to SRID 0 (unspecified). Check with:

SELECT ST_SRID(boundary) FROM city_boundaries LIMIT 1;

If the SRID is 0, update it:

SELECT UpdateGeometrySRID('city_boundaries', 'boundary', 4326);

Application-Generated Coordinates

Geocoding APIs and GPS devices return longitude/latitude pairs. Construct PostGIS points with ST_MakePoint:

INSERT INTO locations (name, coordinates)
VALUES (
  'Office',
  ST_SetSRID(ST_MakePoint(-73.985428, 40.748817), 4326)
);

Longitude first, latitude second. This follows the ISO 6709 standard and matches the x/y convention used by PostGIS. It is the opposite of the "lat, lng" display order used by Google Maps and many geocoding APIs. I mention this with some emphasis because getting the order reversed is one of the most common PostGIS mistakes — your points end up in the ocean or on the wrong continent, and the query returns results without complaint.

Validate coordinates before insert:

-- Add a CHECK constraint
ALTER TABLE locations ADD CONSTRAINT valid_coordinates
CHECK (
  ST_X(coordinates) BETWEEN -180 AND 180
  AND ST_Y(coordinates) BETWEEN -90 AND 90
);

Or validate in application code before constructing the SQL. Invalid coordinates (longitude outside [-180, 180], latitude outside [-90, 90]) produce geometries that PostGIS accepts but that produce incorrect results for distance and containment queries.

Common Pitfalls — A Checklist for Your Reference

PitfallSymptomResolution
Missing GiST indexSeq Scan in EXPLAIN for spatial queriesCREATE INDEX USING GiST (column) — see GiST index guide
ST_Distance in WHERE clauseSlow radius queries, Seq Scan despite indexUse ST_DWithin instead — see PostGIS optimization guide
SRID mismatchIncorrect distances, wrong results, index bypassEnsure query point SRID matches column SRID
Geometry distances treated as metersDistances in degrees (0.01 ≈ 1.1 km at equator)Cast to geography for meter-based distances
Longitude/latitude reversalPoints appear on wrong continent or in oceanST_MakePoint(longitude, latitude) — longitude (x) first
ST_Transform in WHERE clauseSlow queries, per-row coordinate transformationTransform the query parameter to match the column SRID
Stale statistics after bulk loadSeq Scan despite GiST index existingRun ANALYZE tablename after bulk INSERT/COPY
Cross-join for bulk nearest-neighborQuery runs for minutes/hours on large tablesUse CROSS JOIN LATERAL with KNN operator and LIMIT

How Gold Lapel Attends to Spatial Queries

Gold Lapel detects spatial queries that are missing GiST indexes and recommends index creation. It identifies expensive spatial patterns — ST_Distance computations that would perform better as ST_DWithin, spatial results that would benefit from materialized views — and surfaces them as actionable recommendations. It also monitors spatial query performance over time, so you are informed when degradation occurs as your spatial data grows, rather than discovering it from a user complaint.

Frequently asked questions