tablefunc

Q: What is the difference between the one-parameter and two-parameter forms of crosstab()?

The single-parameter form fills columns left to right based on the order values appear in the source query. If a row is missing a category, subsequent values shift into the wrong columns — quietly and without apology. The two-parameter form takes a second query that defines the expected categories, and matches each value to the correct column by category name. It is more reliable and should be your default. The few extra characters of SQL are a small price for results you can trust.

Q: Does the crosstab() source query need to be ordered?

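Yes. The source query should always be ordered by the row name column, so that all values for a given row arrive together. The single-parameter form also needs a second sort on the category column, because it assigns values to output columns purely by position. The two-parameter form matches values by category name, so the order of categories within a row does not matter there. Neglect the ordering and values will end up in the wrong columns or rows, a disservice to anyone relying on the output.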

Q: Can crosstab() handle a dynamic number of columns?

I am afraid not. The output column list must be specified in the AS clause at query time, which means you need to know the categories in advance. If the number of categories changes, the query must be rewritten. For truly dynamic pivots, you will need to generate the SQL dynamically in your application or use a PL/pgSQL function that builds and executes the crosstab query. This is the one genuine limitation worth knowing about.

Q: Is connectby() still the best way to query hierarchical data?

For most use cases, recursive CTEs (WITH RECURSIVE) have been the preferred approach since PostgreSQL 8.4. They are part of the SQL standard, require no extension, and offer considerably more flexibility. connectby() remains simpler for basic tree traversals, but I would not recommend building new code around it. Use what the standard provides.

Q: Does tablefunc add any overhead when installed but not used?

None whatsoever. tablefunc is a lightweight contrib extension that registers a few functions. It does not run background processes, consume shared memory, or require shared_preload_libraries. If you never call its functions, it sits quietly and costs you nothing. There is no reason not to have it available.

Crosstab and pivot table functions — because data deserves to be presented properly.

The Waiter of Gold Lapel · Updated Mar 30, 2026 · Published Mar 21, 2026 · 5 min read

There is something rather undignified about making a report consumer mentally pivot rows into columns. That is the database's job. tablefunc is a PostgreSQL contrib extension that provides functions for creating pivot tables (crosstabs), generating normally distributed random numbers, and traversing hierarchical data. Its main function, crosstab(), transforms row-oriented query results into columnar output — the kind of row-to-column transformation that reporting and analytics queries frequently need.

What tablefunc does

tablefunc provides three families of functions. The most widely used is crosstab(), which takes a query returning row-name/category/value triples and pivots them into a table where each category becomes a column. This is the standard way to produce pivot tables directly in PostgreSQL without reshaping data in application code.

The extension also includes normal_rand() for generating sets of normally distributed random numbers, and connectby() for traversing parent-child relationships stored in a table. While crosstab() sees regular production use, the other two functions are more niche — normal_rand() is useful for test data generation, and connectby() has largely been superseded by recursive CTEs.

When to use tablefunc

Pivot tables for reporting — turn month-over-month, category-by-category, or any row-oriented data into columnar output
Dashboard queries — produce results already shaped for display, avoiding application-level pivoting
Data exports — reshape normalized data into the wide-format layout that spreadsheets and BI tools expect
Test data generation — use normal_rand() to create realistic distributions for performance testing

If your pivot categories are truly dynamic (unknown at query time), you will need to generate the SQL dynamically in application code — crosstab() requires the output columns to be declared in the AS clause.
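By way of illustration, here is a minimal sketch of that dynamic generation, done in SQL itself rather than in application code. It assumes the monthly_sales table from the crosstab() examples on this page, and the crosstab_sql alias is my own invention. The query does not run the pivot; it only returns the text of a crosstab statement whose AS clause matches whatever months are currently present.

```sql
-- Sketch only: builds the text of a crosstab query from the data.
-- Assumes monthly_sales(product text, month text, revenue numeric).
SELECT format(
  $f$SELECT * FROM crosstab(
  %L,
  %L
) AS ct(product text, %s);$f$,
  -- source query: (row_name, category, value), grouped by row name
  'SELECT product, month, revenue FROM monthly_sales ORDER BY 1',
  -- category query: one row per output column
  'SELECT DISTINCT month FROM monthly_sales ORDER BY 1',
  -- column list for the AS clause, in the same order as the category query
  string_agg(format('%I numeric', lower(month)), ', ' ORDER BY month)
) AS crosstab_sql
FROM (SELECT DISTINCT month FROM monthly_sales) AS m;
```

In psql, ending this query with \gexec instead of a semicolon should execute the generated statement directly. From application code, the returned text would be run as a second query.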

Installation and setup

tablefunc is a contrib module that ships with PostgreSQL, so there is no separate download or build, although some Linux distributions package the contrib modules separately (typically as postgresql-contrib). The extension does not need shared_preload_libraries, so no restart is necessary. The official PostgreSQL documentation covers all functions and their parameters.

SQL

-- tablefunc ships with PostgreSQL (contrib module)
-- No shared_preload_libraries needed — just create the extension
CREATE EXTENSION tablefunc;

-- Verify it's working
SELECT * FROM normal_rand(5, 0, 1);
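Should you wish to confirm the state of affairs before creating anything, the standard catalog views can be consulted. A small sketch, using only the built-in pg_available_extensions view:

```sql
-- Is tablefunc shipped with this server, and is it installed in this database?
-- installed_version is NULL when the extension has not been created yet.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'tablefunc';

-- Safe to re-run: raises a notice rather than an error if already installed
CREATE EXTENSION IF NOT EXISTS tablefunc;
```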

crosstab(): single-parameter form

The basic crosstab() takes a single SQL query that must return exactly three columns: a row name, a category, and a value. Each group of rows sharing a row name becomes one output row, and that group's values fill the output columns from left to right.

SQL

-- Sample data: monthly sales by product
CREATE TABLE monthly_sales (
  product text,
  month text,
  revenue numeric
);

INSERT INTO monthly_sales VALUES
  ('Widget', 'Jan', 1200),
  ('Widget', 'Feb', 1500),
  ('Widget', 'Mar', 980),
  ('Gadget', 'Jan', 3200),
  ('Gadget', 'Feb', 2800),
  ('Gadget', 'Mar', 3500);

-- Single-parameter crosstab: pivots rows into columns
-- The source query MUST return (row_name, category, value)
-- ordered by row_name, then category
-- month is text, so ORDER BY 1, 2 sorts it alphabetically (Feb, Jan, Mar);
-- the column list in AS must follow that same order
SELECT * FROM crosstab(
  'SELECT product, month, revenue
   FROM monthly_sales
   ORDER BY 1, 2'
) AS ct(product text, feb numeric, jan numeric, mar numeric);

-- Result:
-- product | feb  | jan  | mar
-- --------+------+------+------
-- Gadget  | 2800 | 3200 | 3500
-- Widget  | 1500 | 1200 |  980

The source query must be ordered by row name first, then by category. Values are assigned to output columns left to right in the order they appear. This form works correctly when every row has the same set of categories — but breaks when categories are missing. Silently placing March's revenue in February's column is the sort of behaviour that erodes trust in a report.
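To see the failure for oneself, here is a small sketch that assumes tablefunc is installed and monthly_sales is populated as above. The WHERE clause is my own addition; it hides Gadget's Feb row to simulate missing data without altering the table.

```sql
-- Single-parameter form with a missing category.
-- month is text, so the source rows arrive as Feb, Jan, Mar.
SELECT * FROM crosstab(
  $$SELECT product, month, revenue
    FROM monthly_sales
    WHERE NOT (product = 'Gadget' AND month = 'Feb')  -- simulate the gap
    ORDER BY 1, 2$$
) AS ct(product text, feb numeric, jan numeric, mar numeric);

-- Result:
-- product | feb  | jan  | mar
-- --------+------+------+-----
-- Gadget  | 3200 | 3500 |
-- Widget  | 1500 | 1200 | 980
```

Gadget's Jan value now sits under feb and its Mar value under jan, with no error or warning to mark the occasion.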

crosstab(): two-parameter form

The two-parameter form is more reliable. The second argument is a query that returns the list of category values in order. PostgreSQL matches each value to the correct output column by category name rather than by position, inserting NULL for any missing categories.

SQL

-- Two-parameter crosstab: handles missing values correctly
-- First arg: source query (row_name, category, value)
-- Second arg: query returning the category values in order

SELECT * FROM crosstab(
  'SELECT product, month, revenue
   FROM monthly_sales
   ORDER BY 1, 2',
  $$SELECT DISTINCT month FROM monthly_sales ORDER BY 1$$
) AS ct(product text, feb numeric, jan numeric, mar numeric);

-- Why the two-parameter form matters:
-- If Gadget has no Feb data, the single-parameter form
-- would shift Jan's value into the feb column and Mar's into jan.
-- The two-parameter form matches values to the correct
-- category column, leaving gaps as NULL.

The two-parameter form should be your default choice. It handles sparse data correctly and makes the intent of the query explicit. A NULL in the output is honest; a misplaced value is a lie. The PostgreSQL wiki maintains a detailed crosstab function guide with additional examples and edge cases.
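One refinement worth sketching: the category query need not read from the table at all. Assuming the same monthly_sales table, an explicit VALUES list puts the columns in calendar order rather than alphabetical order. The WHERE clause is again my own device for simulating a missing Feb row.

```sql
-- Two-parameter form with an explicit, chronologically ordered category list.
-- ORDER BY 1 is enough here: values are matched by category name.
SELECT * FROM crosstab(
  $$SELECT product, month, revenue
    FROM monthly_sales
    WHERE NOT (product = 'Gadget' AND month = 'Feb')  -- simulate the gap
    ORDER BY 1$$,
  $$VALUES ('Jan'), ('Feb'), ('Mar')$$
) AS ct(product text, jan numeric, feb numeric, mar numeric);

-- Result:
-- product | jan  | feb  | mar
-- --------+------+------+------
-- Gadget  | 3200 |      | 3500
-- Widget  | 1200 | 1500 |  980
```

The category values must match the data exactly, including case. A value in the source whose category is not in the list is simply left out of the output.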

normal_rand()

Generates a set of normally distributed (Gaussian) random numbers. Useful for populating test tables with realistic data distributions.

SQL

-- Generate 10 normally distributed random numbers
-- normal_rand(count, mean, standard_deviation)
SELECT * FROM normal_rand(10, 100, 15);

-- Useful for generating test data with realistic distributions
-- Example: generate 1000 simulated response times (mean 200ms, stddev 50ms)
SELECT round(val::numeric, 1) AS response_ms
FROM normal_rand(1000, 200, 50) AS val
WHERE val > 0;
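A quick way to check that the generated values behave as requested is to aggregate a larger sample. This sketch uses only built-in aggregates; the column aliases are mine.

```sql
-- Draw 10,000 values with mean 100 and standard deviation 15,
-- then compare the sample statistics against those parameters.
SELECT count(*)                             AS n,
       round(avg(val)::numeric, 1)          AS sample_mean,
       round(stddev_samp(val)::numeric, 1)  AS sample_stddev
FROM normal_rand(10000, 100, 15) AS val;
```

The figures vary from run to run, but they should land close to 100 and 15.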

connectby()

Traverses hierarchical data stored as parent-child references in a single table. While still functional, recursive CTEs (WITH RECURSIVE) are generally preferred for new code. I include it here for completeness — a proper inventory omits nothing.

SQL

-- connectby: traverse hierarchical data
-- connectby(relname, keyid_fld, parent_keyid_fld [, orderby_fld],
--           start_with, max_depth [, branch_delim])

CREATE TABLE employees (
  id integer PRIMARY KEY,
  name text,
  manager_id integer REFERENCES employees(id)
);

INSERT INTO employees VALUES
  (1, 'Alice', NULL),
  (2, 'Bob', 1),
  (3, 'Carol', 1),
  (4, 'Dave', 2),
  (5, 'Eve', 2);

-- Traverse the org chart starting from Alice (id=1), up to 10 levels deep
SELECT * FROM connectby(
  'employees', 'id', 'manager_id', '1', 10, '~'
) AS ct(id integer, manager_id integer, level integer, branch text);

-- Result (depth-first; sibling order is not guaranteed without orderby_fld):
-- id | manager_id | level | branch
-- ---+------------+-------+--------
--  1 |            |     0 | 1
--  2 |          1 |     1 | 1~2
--  4 |          2 |     2 | 1~2~4
--  5 |          2 |     2 | 1~2~5
--  3 |          1 |     1 | 1~3
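For comparison, the same traversal can be sketched as a recursive CTE. This assumes the employees table defined above; the CTE name org and the level limit of 10 are simply chosen to mirror the connectby() call.

```sql
-- Recursive CTE equivalent of the connectby() call, no extension required
WITH RECURSIVE org AS (
  -- Anchor member: the starting row (Alice, id = 1)
  SELECT id, manager_id, 0 AS level, id::text AS branch
  FROM employees
  WHERE id = 1

  UNION ALL

  -- Recursive member: direct reports of the rows found so far
  SELECT e.id, e.manager_id, o.level + 1, o.branch || '~' || e.id::text
  FROM employees e
  JOIN org o ON e.manager_id = o.id
  WHERE o.level < 10          -- stop descending after 10 levels
)
SELECT id, manager_id, level, branch
FROM org
ORDER BY branch;
```

Sorting by the branch text gives a depth-first listing for this small data set. With multi-digit ids a text sort would misorder siblings, so an array of ids would be the sturdier sort key.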

Cloud availability

Provider	Status
Amazon RDS / Aurora	Available — install with CREATE EXTENSION
Google Cloud SQL	Available — supported as a standard contrib extension
Azure Database for PostgreSQL	Available — add to the azure.extensions allowlist, then CREATE EXTENSION
Supabase	Available — enable from the Extensions dashboard or via SQL
Neon	Available — install with CREATE EXTENSION

As a contrib module that ships with PostgreSQL itself, tablefunc is available on virtually every managed PostgreSQL service.

How Gold Lapel relates

A well-formed crosstab query is a pleasure. A well-formed crosstab query that runs forty times an hour against the same underlying data is a concern. The pivot itself — scanning, grouping, redistributing — is honest work, but repeating it identically for every dashboard refresh is the kind of redundancy I find difficult to overlook.

Gold Lapel addresses this through materialized views. When the same crosstab structure appears repeatedly, Gold Lapel can cache the pivoted results and serve subsequent requests from the precomputed output. The full pivot still runs at refresh time, but your dashboard consumers see sub-millisecond responses rather than waiting for the operation to complete from scratch. The data is presented properly, and it is presented promptly. Both matter.
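Gold Lapel's own mechanism is not shown here. As a rough manual analogue in plain PostgreSQL, and assuming the monthly_sales table from earlier, a crosstab can be wrapped in a materialized view by hand. The view name monthly_sales_pivot is hypothetical.

```sql
-- Store the pivoted result once
CREATE MATERIALIZED VIEW monthly_sales_pivot AS
SELECT * FROM crosstab(
  $$SELECT product, month, revenue FROM monthly_sales ORDER BY 1$$,
  $$VALUES ('Jan'), ('Feb'), ('Mar')$$
) AS ct(product text, jan numeric, feb numeric, mar numeric);

-- Dashboard queries read the stored rows instead of re-running the pivot
SELECT * FROM monthly_sales_pivot;

-- Re-run the pivot when the underlying data has changed
REFRESH MATERIALIZED VIEW monthly_sales_pivot;
```

Because the source queries are passed to crosstab() as strings, PostgreSQL does not record a dependency on monthly_sales, so refreshing at the right moments is left to whoever maintains the view.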
