PostgreSQL intarray: A Practical Guide to Integer Array Operations
One operator where there were five lines of SQL. That is the kind of improvement worth attending to.
What intarray adds to PostgreSQL
PostgreSQL includes native array support — integer[], text[], and operators like @>, <@, and && for containment and overlap checks. These work on any array type and can be indexed with GIN indexes. For many applications, native arrays are sufficient.
intarray extends PostgreSQL specifically for integer arrays. It adds operators that the native array system does not provide — intersection (&), union (|), and difference (-) — along with utility functions for sorting, deduplication, and element lookup. It also provides GiST index support for integer array columns, enabling fast containment and overlap queries through a different indexing strategy than GIN.
Why a dedicated extension for integer arrays? Performance and convenience. intarray's operators are implemented in C and optimized for sorted integer arrays. Operations that would require subqueries with unnest() and INTERSECT in standard SQL become single-operator expressions. One operator where there were five lines of SQL. That is the kind of improvement worth attending to.
Installation requires no restart and no shared_preload_libraries configuration:
CREATE EXTENSION intarray; intarray is part of PostgreSQL's contrib modules and is available on virtually every managed provider including Amazon RDS, Google Cloud SQL, Supabase, and Neon.
When to use intarray versus native array operators: install intarray when your workload involves integer arrays specifically and you need set operations (intersection, union, difference) or want GiST-based indexing for containment and overlap queries. If your arrays contain non-integer types, or if you only need basic containment checks with GIN indexing, native array operators will serve you well.
The core operators
Allow me to introduce the operators that make intarray worth installing.
Intersection — the & operator
The & operator returns elements present in both arrays:
SELECT ARRAY[1,2,3] & ARRAY[2,3,4];
-- Result: {2,3}
SELECT ARRAY[10,20,30,40] & ARRAY[20,40,60];
-- Result: {20,40}
SELECT ARRAY[1,2,3] & ARRAY[4,5,6];
-- Result: {} The primary use case is finding common elements between two sets — shared tags between two items, overlapping permissions between two roles, or categories that two products have in common.
Standard PostgreSQL has no single-operator intersection for arrays. Without intarray, you would need a subquery approach:
-- Without intarray: verbose and slower
SELECT ARRAY(
SELECT unnest(ARRAY[1,2,3])
INTERSECT
SELECT unnest(ARRAY[2,3,4])
); intarray's C implementation operates on sorted arrays in linear time, making it substantially faster than the subquery approach for repeated operations.
Union — the | operator
The | operator returns all unique elements from both arrays, deduplicated and sorted:
SELECT ARRAY[1,2,3] | ARRAY[3,4,5];
-- Result: {1,2,3,4,5}
SELECT ARRAY[10,20] | ARRAY[20,30] | ARRAY[30,40];
-- Result: {10,20,30,40} The result is always sorted and deduplicated — a useful property for consistent storage. If you store arrays normalized with |, you need never worry about duplicate entries or inconsistent ordering.
Standard PostgreSQL has no single-operator array union. Without intarray, you would need:
-- Without intarray
SELECT ARRAY(
SELECT DISTINCT unnest(ARRAY[1,2,3] || ARRAY[3,4,5])
ORDER BY 1
); Difference — the - operator
The - operator returns elements in the first array that are not in the second:
SELECT ARRAY[1,2,3,4,5] - ARRAY[2,4];
-- Result: {1,3,5}
SELECT ARRAY[10,20,30] - ARRAY[30,40,50];
-- Result: {10,20} This is an asymmetric operation — A - B is not the same as B - A:
SELECT ARRAY[1,2,3] - ARRAY[3,4,5];
-- Result: {1,2}
SELECT ARRAY[3,4,5] - ARRAY[1,2,3];
-- Result: {4,5} Use cases include removing revoked permissions, untagging items, and applying exclusion filters.
Containment — @> and <@ (enhanced)
The containment operators check whether one array contains all elements of another:
SELECT ARRAY[1,2,3,4,5] @> ARRAY[2,4];
-- true (left contains right)
SELECT ARRAY[2,4] <@ ARRAY[1,2,3,4,5];
-- true (left is contained by right) These operators exist in base PostgreSQL and work the same way. intarray does not change their behavior — it makes them indexable with GiST indexes. Without intarray, containment queries on integer arrays require either a GIN index or a sequential scan. With intarray, you gain an additional indexing option (GiST) that has different performance characteristics.
Overlap — && (enhanced)
The overlap operator checks whether two arrays share at least one element:
SELECT ARRAY[1,2,3] && ARRAY[3,4,5];
-- true (element 3 in common)
SELECT ARRAY[1,2,3] && ARRAY[4,5,6];
-- false (no elements in common) Like containment, the && operator exists in base PostgreSQL. intarray enhances it with GiST index support. The practical value is in queries like "find all products that share at least one tag with this product" — with a GiST index on the tag array column, this query uses an index scan instead of a sequential scan.
Useful functions
sort(), sort_asc(), sort_desc()
The sort() function sorts the elements of an integer array:
SELECT sort(ARRAY[3,1,4,1,5,9,2,6]);
-- Result: {1,1,2,3,4,5,6,9}
SELECT sort_asc(ARRAY[3,1,2]);
-- Result: {1,2,3}
SELECT sort_desc(ARRAY[3,1,2]);
-- Result: {3,2,1} Sorting matters because intarray's operators perform best on sorted input. The &, |, and - operators use sorted-array fast paths. Storing arrays pre-sorted — via a trigger, a generated column, or application code — ensures consistent performance.
uniq()
The uniq() function removes consecutive duplicate elements:
SELECT uniq(ARRAY[1,1,2,2,3,3]);
-- Result: {1,2,3}
SELECT uniq(ARRAY[1,2,1,3,2,3]);
-- Result: {1,2,1,3,2,3} -- only consecutive duplicates removed Because uniq() only removes consecutive duplicates, it requires sorted input to fully deduplicate an array. The proper normalization pattern is sort() followed by uniq():
SELECT uniq(sort(ARRAY[3,1,2,1,3,2]));
-- Result: {1,2,3} This uniq(sort(...)) combination is the standard way to normalize integer arrays before storage.
idx()
The idx() function returns the 1-based position of an element in an array, or 0 if the element is not found:
SELECT idx(ARRAY[10,20,30,40], 30);
-- Result: 3
SELECT idx(ARRAY[10,20,30,40], 50);
-- Result: 0 Use idx() to check whether a value exists in an array and to find its position. A return value of 0 means the element is absent.
subarray()
The subarray() function extracts a slice of the array, using 1-based indexing:
SELECT subarray(ARRAY[10,20,30,40,50], 2, 3);
-- Result: {20,30,40} (start at position 2, take 3 elements)
SELECT subarray(ARRAY[10,20,30,40,50], 3);
-- Result: {30,40,50} (start at position 3, take rest) icount()
The icount() function returns the number of elements in an integer array:
SELECT icount(ARRAY[10,20,30]);
-- Result: 3
SELECT icount(ARRAY[]::integer[]);
-- Result: 0 Functionally equivalent to array_length(arr, 1), but icount() reads more clearly in complex expressions and handles empty arrays without returning NULL.
GiST indexing for integer arrays
If the operators are the reason to consider intarray, the GiST index support is the reason to commit. Without intarray, array containment and overlap queries can use GIN indexes (native PostgreSQL) or sequential scans. With intarray, GiST becomes an additional option with different trade-offs.
intarray provides two GiST operator classes:
gist__int_ops — exact representation. Stores the actual set of integers in the index entry. Best for arrays where all values are small (under approximately 100,000). Produces no false positives.
CREATE INDEX idx_tags_exact ON products USING gist(tag_ids gist__int_ops); gist__intbig_ops — signature-based (lossy). Uses a fixed-size bitmap signature to represent the array. Works with any integer values. The trade-off is that the bitmap can produce false positives, requiring a recheck against the actual table data. In practice, the false positive rate is low for reasonably-sized arrays.
CREATE INDEX idx_tags_sig ON products USING gist(tag_ids gist__intbig_ops); Which to choose: gist__intbig_ops is the safe default — it works with any integer values and produces fixed-size index entries regardless of array content. Use gist__int_ops only when your arrays contain small values and you want to eliminate rechecks.
What the GiST index accelerates
With a GiST index in place, these queries use index scans instead of sequential scans:
Containment (@>) — find all products tagged with tags 5 and 12:
SELECT * FROM products WHERE tag_ids @> ARRAY[5, 12]; Contained by (<@) — find all products whose tags are a subset of {1, 2, 3, 4, 5}:
SELECT * FROM products WHERE tag_ids <@ ARRAY[1, 2, 3, 4, 5]; Overlap (&&) — find all products that share at least one tag with {5, 12}:
SELECT * FROM products WHERE tag_ids && ARRAY[5, 12]; GiST vs GIN for array indexing
Both GiST (via intarray) and GIN indexes support containment (@>, <@) and overlap (&&) queries on integer arrays. The choice between them depends on your workload:
| Factor | GIN | GiST (intarray) |
|---|---|---|
| Lookup speed | Generally faster | Slightly slower |
| False positives | None (exact) | Possible with gist__intbig_ops |
| Insert/update speed | Slower (pending list + merge) | Faster |
| Supports &, |, - | No | Yes (in index-aware contexts) |
| Supports = | Yes | No |
| Index size | Varies with vocabulary | Fixed with gist__intbig_ops |
General guidance: Use GIN for read-heavy workloads where lookup speed is the priority and you do not need intarray's extended operators. Use GiST via intarray when you need the intersection, union, and difference operators, when write performance matters, or when your workload is write-heavy.
For a broader comparison of PostgreSQL index types, see the index types overview.
Real-world patterns
Tagging system
A tag system using integer arrays stores tag IDs directly on the tagged entity:
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
tag_ids INTEGER[] NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_products_tags ON products USING gist(tag_ids gist__intbig_ops); -- Find products with all specified tags (containment):
SELECT * FROM products WHERE tag_ids @> ARRAY[5, 12];
-- Find products with any of the specified tags (overlap):
SELECT * FROM products WHERE tag_ids && ARRAY[5, 12, 20];
-- Find common tags between two products (intersection):
SELECT
p1.tag_ids & p2.tag_ids AS shared_tags
FROM products p1, products p2
WHERE p1.id = 101 AND p2.id = 202;
-- Add a tag (union -- deduplicates automatically):
UPDATE products SET tag_ids = tag_ids | ARRAY[15] WHERE id = 101;
-- Remove a tag (difference):
UPDATE products SET tag_ids = tag_ids - ARRAY[15] WHERE id = 101; The GiST index makes the WHERE clause queries fast. The set operators make the mutations clean — no need to unnest, filter, and re-aggregate. Each operation is a single, readable expression.
Permission bitmaps
User permissions as an integer array where each integer represents a permission ID:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
permissions INTEGER[] NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_users_perms ON users USING gist(permissions gist__intbig_ops);
-- Check if a user has all required permissions:
SELECT * FROM users WHERE permissions @> ARRAY[1, 4, 7];
-- Grant new permissions:
UPDATE users SET permissions = permissions | ARRAY[9, 10] WHERE id = 42;
-- Revoke a permission:
UPDATE users SET permissions = permissions - ARRAY[4] WHERE id = 42;
-- Find users who share at least one permission with a set:
SELECT * FROM users WHERE permissions && ARRAY[1, 2, 3]; Multi-category filtering
Products assigned to multiple categories using an integer array of category IDs:
CREATE TABLE catalog_items (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
category_ids INTEGER[] NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_catalog_cats ON catalog_items USING gist(category_ids gist__intbig_ops);
-- OR filter -- show products in categories 5, 12, or 20:
SELECT * FROM catalog_items WHERE category_ids && ARRAY[5, 12, 20];
-- AND filter -- show products in all of categories 5 and 12:
SELECT * FROM catalog_items WHERE category_ids @> ARRAY[5, 12];
-- Category difference -- which categories does item A have that item B does not?
SELECT a.category_ids - b.category_ids AS unique_to_a
FROM catalog_items a, catalog_items b
WHERE a.id = 101 AND b.id = 202; Performance considerations
I should note a few practical boundaries.
Array size limits. intarray works best with arrays of up to a few hundred elements. As arrays grow beyond that, the cost of set operations and GiST index maintenance increases. intarray is not designed for arrays of thousands of elements.
Pre-sort arrays on write. intarray's operators exploit sorted-array fast paths. Storing arrays already sorted avoids repeated sorting during query execution. Normalize arrays with uniq(sort(arr)) in a trigger or application code:
CREATE OR REPLACE FUNCTION normalize_int_array()
RETURNS TRIGGER AS $$
BEGIN
NEW.tag_ids := uniq(sort(NEW.tag_ids));
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_normalize_tags
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION normalize_int_array(); VACUUM and GiST indexes. GiST indexes can bloat after heavy update activity. Monitor GiST index size relative to the data size and REINDEX when bloat becomes significant. The pgstatindex function from the pgstattuple extension provides page-level statistics for index analysis.
When to use a junction table instead. I should be honest about the limits of the array approach. Integer arrays with intarray are not a universal replacement for junction tables. Consider a normalized junction table when:
- Arrays regularly exceed 500 elements
- You need foreign key constraints to enforce referential integrity
- You need to JOIN on attributes of the related entity (e.g., tag name, category description)
- You need complex aggregations across the relationship
The array approach trades normalization for simplicity and performance on read-heavy set operations. The junction table approach trades performance for flexibility and referential integrity. The right choice depends on your access patterns — and many applications benefit from both, each in its proper place.
For comprehensive guidance on PostgreSQL performance optimization, see the performance tuning guide. For an overview of PostgreSQL extensions that improve performance, see the extensions roundup.