
Laravel Scout on PostgreSQL: Tuning tsvector, GIN Indexes, and Proper Full-Text Search

Scout's database driver sends LIKE '%term%' to PostgreSQL. This is what it should be sending instead.

The Waiter of Gold Lapel · Updated Mar 20, 2026 · Published Mar 5, 2026 · 25 min read
The search index was performing a sequential scan of the illustration archive. Apologies.

Good evening. Your search bar has a problem.

Laravel Scout is a fine abstraction for search. You add the Searchable trait to a model, call Product::search('wireless headphones'), and results appear. The API is clean. The documentation is clear. The ergonomics are excellent.

The SQL it generates is not.

When you use Scout's built-in database driver — the one that ships with Laravel, the one that requires no external services — it translates your search into LIKE '%term%' clauses. No stemming. No ranking. No index support. Every search triggers a full sequential scan of every row in the table.

I should be clear about what I mean by "no index support." It is not that someone forgot to create an index. It is that LIKE '%term%' — with the leading wildcard — is structurally incompatible with B-tree indexes. PostgreSQL's query planner does not choose to ignore your index. It cannot use it. The data structure does not support the operation. We shall examine exactly why in a moment.

PostgreSQL has had proper full-text search since version 8.3, released in 2008. It includes lexical parsing, language-aware stemming, stop word elimination, weighted ranking, and a dedicated index type (GIN) built specifically for this workload. None of it gets used when Scout's database driver is in charge.

This article covers the complete replacement: from LIKE to tsvector, from sequential scans to GIN indexes, from no ranking to weighted ts_rank. With benchmarks, migration code, trigger-based indexing for the cases where generated columns are not sufficient, and the honest counterpoints about when PostgreSQL full-text search is not enough and you genuinely need Elasticsearch or Meilisearch.

If you will permit me — and I do hope you will, as this is rather important — we shall begin with the query that prompted your visit.

What Scout's database driver actually sends to PostgreSQL

Before fixing anything, you should see the problem clearly. When you call Product::search('wireless headphones') with the database driver, this is what PostgreSQL receives:

Scout's database driver — EXPLAIN ANALYZE output
-- What Laravel Scout's database driver actually sends to PostgreSQL:
SELECT * FROM "products"
WHERE "name" LIKE '%wireless headphones%'
   OR "description" LIKE '%wireless headphones%'
ORDER BY "updated_at" DESC
LIMIT 20;

-- On a table with 500,000 products:
-- Seq Scan on products  (cost=0.00..28431.00 rows=2847 width=312)
--   Filter: ((name ~~ '%wireless headphones%') OR (description ~~ '%wireless headphones%'))
--   Rows Removed by Filter: 499982
--   Planning Time: 0.14 ms
--   Execution Time: 342.71 ms
--
-- Every row. Every time. 342ms per search.

342 milliseconds. Every search. On a 500,000-row table that is, by production standards, modest.

The sequential scan is not a pessimistic estimate. It is the only possible plan. PostgreSQL reads every row from disk (or shared buffers, if you are fortunate), evaluates the LIKE pattern against both columns, and discards everything that does not match. 499,982 rows examined and thrown away. 18 rows returned. That is a selectivity of 0.004%. The query planner has no choice — but we shall demonstrate that explicitly.

Why LIKE '%term%' cannot use an index

I encounter a common assumption: "I have a B-tree index on the name column, so LIKE queries should be fast." This assumption is understandable. It is also wrong, and the reason it is wrong illuminates exactly why tsvector exists.

B-tree index vs leading wildcard — EXPLAIN ANALYZE
-- "But I have a B-tree index on the name column!"
-- Let's see what PostgreSQL does with it:

CREATE INDEX idx_products_name ON products (name);

EXPLAIN ANALYZE
SELECT * FROM products WHERE name LIKE '%wireless headphones%';

-- Seq Scan on products  (cost=0.00..21934.00 rows=50 width=312)
--   Filter: ((name)::text ~~ '%wireless headphones%'::text)
--   Rows Removed by Filter: 499950
--   Planning Time: 0.08 ms
--   Execution Time: 287.44 ms

-- B-tree index: completely ignored.
-- PostgreSQL cannot binary-search for a pattern that starts with a wildcard.
-- A B-tree organizes data left-to-right — it can find 'wireless%' (prefix),
-- but '%wireless%' (infix) could appear anywhere in the string.

-- Now with a prefix match — no leading wildcard.
-- (Caveat: a plain B-tree serves prefix LIKE only under the C collation;
--  on other locales, create the index with text_pattern_ops.)
EXPLAIN ANALYZE
SELECT * FROM products WHERE name LIKE 'wireless%';

-- Index Scan using idx_products_name on products
--   (cost=0.42..8.44 rows=50 width=312)
--   (actual time=0.024..0.031 rows=12 loops=1)
--   Index Cond: (((name)::text >= 'wireless'::text)
--                AND ((name)::text < 'wirelest'::text))
--   Planning Time: 0.12 ms
--   Execution Time: 0.04 ms

-- 0.04ms with a prefix. 287ms with a leading wildcard.
-- That is the fundamental problem with LIKE '%term%'.

A B-tree organizes data in sorted order, left to right. It can efficiently answer "find all strings that start with 'wireless'" because it navigates to the 'wireless' position in the tree and scans forward. That is a prefix match — LIKE 'wireless%' — and it takes 0.04 milliseconds.

But LIKE '%wireless%' asks a different question entirely: "find all strings that contain 'wireless' anywhere." The substring could start at position 1 or position 47. The B-tree has no way to navigate to "all strings containing this substring" because the tree is sorted by the beginning of the string, not by every possible interior substring. PostgreSQL must fall back to a sequential scan.

This is not a PostgreSQL limitation. It is a fundamental property of B-tree data structures. MySQL, SQL Server, Oracle — they all behave identically with leading-wildcard LIKE patterns. The data structure simply does not support the operation.

There are two solutions. The first is pg_trgm, which creates a GIN index on trigrams (three-character substrings) and can accelerate LIKE patterns to around 8 milliseconds. The second is to stop using LIKE entirely and switch to PostgreSQL's native full-text search with tsvector and tsquery. The second approach is faster, uses less disk space, and provides relevance ranking. It is the approach we shall pursue.
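Before we leave pg_trgm behind, it deserves a brief showing. A minimal sketch, assuming the extension is available on your server (the index name is illustrative):

```sql
-- The trigram route: index three-character substrings of the column.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_products_name_trgm
    ON products USING gin (name gin_trgm_ops);

-- Unlike a B-tree, this index CAN serve a leading-wildcard LIKE,
-- because any '%term%' pattern decomposes into trigrams to look up:
EXPLAIN ANALYZE
SELECT * FROM products WHERE name LIKE '%wireless%';
-- Expect a Bitmap Heap Scan with a Recheck Cond, not a Seq Scan.
```

It solves the scan problem but not the relevance problem — there is no ranking, and no stemming. We shall meet pg_trgm again in the autocomplete discussion.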

PostgreSQL full-text search: the mental model

If you have not worked with PostgreSQL full-text search before — and the PostgreSQL documentation on the topic is, if you will permit me, remarkably thorough — here is the core mental model. It consists of three components that work together: a document representation, a query representation, and a matching operator.

A tsvector is a preprocessed document. PostgreSQL parses text into tokens, applies language-specific stemming (so "running," "runs," and "ran" all become "run"), removes stop words ("the," "and," "is"), and stores the result as a sorted list of lexemes with position information. The tsvector is not the text — it is a compact, searchable representation of the text's meaning.

A tsquery is a preprocessed search expression. It goes through the same stemming pipeline and supports boolean operators: & (AND), | (OR), ! (NOT), and <-> (FOLLOWED BY).

The @@ operator matches a tsvector against a tsquery. A GIN index on the tsvector column makes this match fast — sublinear in the number of rows, rather than linear.

tsvector + GIN — the complete setup
-- Step 1: Add a tsvector column
ALTER TABLE products ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B')
  ) STORED;

-- Step 2: Create a GIN index on it
CREATE INDEX idx_products_search ON products USING gin (search_vector);

-- Step 3: Query with tsquery
SELECT name,
       ts_rank(search_vector, query) AS rank
FROM products,
     to_tsquery('english', 'wireless & headphones') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

-- Limit  (actual time=0.29..0.31 rows=18 loops=1)
--   ->  Sort  (Sort Key: rank DESC)
--         ->  Bitmap Heap Scan on products
--               Recheck Cond: (search_vector @@ '''wireless'' & ''headphon'''::tsquery)
--               ->  Bitmap Index Scan on idx_products_search
--                     Index Cond: (search_vector @@ '''wireless'' & ''headphon'''::tsquery)
-- Planning Time: 0.21 ms
-- Execution Time: 0.31 ms
--
-- 0.31ms. That is 1,105x faster.
-- (Note the plan shape: GIN indexes are read via bitmap scans,
--  never plain index scans.)

0.31 milliseconds versus 342 milliseconds. Same table. Same data. Same question. The difference is that PostgreSQL now uses an inverted index to find matching rows by their lexemes, rather than scanning every row and pattern-matching against raw text.

The setweight calls assign relevance tiers to different columns. Weight 'A' is highest priority (product name), 'B' is next (description). When you call ts_rank, these weights influence the ranking score — a match in the product name ranks higher than the same match in the description.
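Should the built-in tier multipliers not suit your catalog, ts_rank accepts an optional weights array that overrides the defaults of {0.1, 0.2, 0.4, 1.0}. The array is ordered {D, C, B, A}; the multipliers below are illustrative:

```sql
-- ts_rank's optional first argument overrides the per-tier multipliers.
-- Array order is {D, C, B, A}, lowest tier first.
SELECT name,
       ts_rank('{0.05, 0.2, 0.6, 1.0}', search_vector, query) AS rank
FROM products,
     websearch_to_tsquery('english', 'wireless headphones') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

-- Here a weight-D (description) match contributes one-twentieth of a
-- weight-A (name) match, sharpening the dominance of the product name.
```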

Stemming is the key advantage over LIKE

The performance improvement alone justifies the switch. But stemming is what makes the results actually better, not merely faster.

Stemming in action — tsvector vs LIKE
-- Watch what PostgreSQL does with English text:
SELECT to_tsvector('english', 'The runners were running quickly toward the finish line');
-- Result: 'finish':8 'line':9 'quick':5 'run':4 'runner':2 'toward':6
--
-- "The" and "were" — stop words — gone, though word positions are preserved.
-- "running" — stemmed to "run".
-- "runners" — stemmed to "runner": the Snowball stemmer strips the
--   plural but does not reduce the agent noun all the way to "run".
-- "quickly" — stemmed to "quick".

-- Now watch what a tsquery does with the search term:
SELECT to_tsquery('english', 'runs & quickly');
-- Result: 'run' & 'quick'
--
-- Same stemming. "runs" becomes "run", "quickly" becomes "quick".
-- When tsvector 'run' matches tsquery 'run', the user who searched
-- for "runs" finds the document about "running". Correct behavior.

-- Compare with LIKE:
SELECT 'The runners were running quickly' LIKE '%running%';
-- true — but only for the exact substring "running".
-- Searching for "runs" would fail. "ran" would fail.
-- LIKE has no linguistic intelligence. It matches bytes, not meaning.

A user searching for "running shoes" expects to find products described as "lightweight running shoe," "shoes built for daily runs," and "a shoe made for running." LIKE matches none of them, because none contains the exact substring "running shoes". tsvector matches all of them, because "running," "runs," "run" and "shoe," "shoes" all reduce to the same lexemes, 'run' and 'shoe'. (Stemming is not magic, mind: an agent noun such as "runner" stems to 'runner', not 'run', so recall improves dramatically without becoming perfect.)

This is not artificial intelligence. It is a deterministic algorithm — the Snowball stemmer — applied consistently at both index time and query time. The same stemming applied to both sides of the match is what guarantees correct recall. It has been shipping with PostgreSQL since 2008 and processes text in microseconds.

"PostgreSQL provides tsvector for lexical full-text search, pgvector for semantic similarity search, pg_trgm for fuzzy matching, and fuzzystrmatch for phonetic matching. Four extensions. Zero additional services."

— from You Don't Need Redis, Chapter 19: The Case for Simplicity
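You can interrogate the stemmer directly. ts_lexize runs a single token through a named dictionary — handy for checking what a word becomes before you puzzle over a surprising search result:

```sql
-- Ask the Snowball dictionary what a single token becomes:
SELECT ts_lexize('english_stem', 'running');    -- {run}
SELECT ts_lexize('english_stem', 'runners');    -- {runner}
SELECT ts_lexize('english_stem', 'headphones'); -- {headphon}
SELECT ts_lexize('english_stem', 'a');          -- {} (stop word: empty array)
```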

The four tsquery constructors: choosing the right one for user input

This is where many Laravel implementations go wrong. PostgreSQL offers four functions for constructing a tsquery from text, and they behave very differently with user input. Choosing the wrong one either crashes on unexpected input or silently produces poor results.

to_tsquery vs plainto_tsquery vs phraseto_tsquery vs websearch_to_tsquery
-- PostgreSQL offers four tsquery constructors.
-- Each handles user input differently.

-- 1. to_tsquery: strict syntax. Requires explicit operators.
--    Use for programmatic queries, not raw user input.
SELECT to_tsquery('english', 'wireless & cancelling');
-- 'wireless' & 'cancel'

-- If the user types "wireless headphones" without &, this fails:
-- SELECT to_tsquery('english', 'wireless headphones');
-- ERROR: syntax error in tsquery

-- 2. plainto_tsquery: treats all words as AND-ed terms.
--    Safe for raw user input. No syntax to get wrong.
SELECT plainto_tsquery('english', 'wireless noise cancelling headphones');
-- 'wireless' & 'nois' & 'cancel' & 'headphon'

-- 3. phraseto_tsquery: words must appear in order, adjacent.
--    Like putting quotes around a search term.
SELECT phraseto_tsquery('english', 'noise cancelling');
-- 'nois' <-> 'cancel'

-- 4. websearch_to_tsquery (PostgreSQL 11+): Google-style syntax.
--    Supports quotes, OR, and negation with -.
--    This is what your users actually expect.
SELECT websearch_to_tsquery('english', '"noise cancelling" headphones -bluetooth');
-- 'nois' <-> 'cancel' & 'headphon' & !'bluetooth'

-- websearch_to_tsquery handles every input safely.
-- No syntax errors. No injection. Users can type naturally.

For most Laravel Scout implementations, websearch_to_tsquery is the correct choice. It handles arbitrary user input without syntax errors, supports the operators users expect from search boxes (quoted phrases, OR, negation with -), and requires PostgreSQL 11 or later — which is any version you should be running in production today.

If you are on PostgreSQL 10 or earlier — and I would gently suggest that upgrading should be a priority — use plainto_tsquery. It is safe with arbitrary input and produces reasonable AND queries, but it does not support quoted phrases or negation.

Reserve to_tsquery for programmatic use, where your code constructs the query with explicit operators. Never pass raw user input to to_tsquery. It will throw a syntax error the first time someone types a space.
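If you would like to convince yourself that websearch_to_tsquery is safe to feed raw input, try to break it. It degrades gracefully rather than erroring — with one edge case your application code should guard against:

```sql
-- Inputs that would crash to_tsquery are absorbed quietly:
SELECT websearch_to_tsquery('english', 'wireless & headphones');
-- the stray & is discarded; you get an AND of the real words

SELECT websearch_to_tsquery('english', '((()))');   -- punctuation ignored

-- The edge case: input consisting only of stop words or punctuation
-- produces an EMPTY tsquery — and an empty tsquery matches nothing.
SELECT numnode(websearch_to_tsquery('english', 'the'));  -- 0

-- Guard in application code: when numnode() is 0, show "no results"
-- (or fall back to a broader strategy) rather than a silently empty page.
```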

The benchmark table

Numbers settle arguments. Here are the four approaches measured on the same table — 1 million products, English text, querying for 'bluetooth & speaker & portable':

Method                | Index type    | Query time | Index size | Build time | Relevance ranking
LIKE (Scout default)  | None          | 342 ms     | 0 MB       | 0 s        | No
LIKE with trigram     | GIN (pg_trgm) | 8.4 ms     | 312 MB     | 22 s       | No
tsvector + tsquery    | GIN           | 1.4 ms     | 128 MB     | 14 s       | Yes (ts_rank)
tsvector + tsquery    | GiST          | 5.1 ms     | 214 MB     | 49 s       | Yes (ts_rank)

Three things worth noting. First, GIN is 3.6x faster than GiST for full-text search lookups, while using 40% less disk space. GiST has its uses — nearest-neighbor queries, range overlap checks — but for @@ matching, GIN is the correct choice. (The thorough comparison of all PostgreSQL index types is worth consulting if you would like to understand where GIN and GiST each excel.)

Second, the trigram approach (pg_trgm) is 40x faster than raw LIKE but still 6x slower than tsvector, and it produces no relevance ranking. It is useful for fuzzy matching and autocomplete, not for search results pages.

Third, the index sizes are meaningful. At 128 MB for GIN on a million-row table, this is a nontrivial data structure. It earns its keep by turning 342ms scans into 1.4ms lookups, but you should be aware of the disk and write-amplification costs. We shall discuss GIN's internal mechanics — including the pending list that controls write performance — in its own section.

Wiring Scout to use tsvector

Scout's architecture makes it straightforward to replace the database driver. Community packages like laravel-scout-postgres provide a driver that generates proper tsvector/tsquery SQL instead of LIKE clauses.

Installation and configuration
# Install a community package that provides tsvector support
composer require devnll/laravel-scout-postgres

# In config/scout.php:
return [
    'driver' => 'pgsql',

    'pgsql' => [
        'connection' => 'pgsql',
        'maintain_index' => true,
        'config' => 'english',
    ],
];

Your model keeps the Searchable trait. The toSearchableArray method defines which columns feed into the tsvector. The searchableOptions method maps columns to weight tiers.

app/Models/Product.php
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Product extends Model
{
    use Searchable;

    /**
     * Get the indexable data array for the model.
     * These columns will be converted to tsvector.
     */
    public function toSearchableArray()
    {
        return [
            'name' => $this->name,
            'description' => $this->description,
            'category' => $this->category?->name,
            'brand' => $this->brand,
        ];
    }

    /**
     * Weight configuration — 'A' is highest relevance, 'D' is lowest.
     */
    public function searchableOptions()
    {
        return [
            'column_weights' => [
                'name' => 'A',
                'brand' => 'B',
                'category' => 'C',
                'description' => 'D',
            ],
        ];
    }
}

Your controller code does not change at all. That is the value of Scout's driver abstraction — the search API stays identical:

app/Http/Controllers/SearchController.php
<?php

namespace App\Http\Controllers;

use App\Models\Product;
use Illuminate\Http\Request;

class SearchController extends Controller
{
    public function search(Request $request)
    {
        $query = $request->validate([
            'q' => 'required|string|min:2|max:200',
        ]);

        $results = Product::search($query['q'])
            ->take(25)
            ->get();

        return view('search.results', [
            'products' => $results,
            'query' => $query['q'],
        ]);
    }
}

Same Product::search() call. Same ->take(25)->get() chain. But now PostgreSQL receives a tsquery matched against a GIN-indexed tsvector instead of a pair of LIKE clauses matched against nothing.

The alternative: skip Scout entirely

I should mention the option that Scout's documentation, understandably, does not emphasize. If your application only searches PostgreSQL — no Algolia, no Meilisearch, no multi-engine strategy — you may not need Scout at all. A well-written Eloquent scope gives you the same result with less indirection.

app/Models/Product.php — native scope approach
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Builder;
use Illuminate\Database\Eloquent\Model;

class Product extends Model
{
    /**
     * Full-text search scope using native PostgreSQL tsvector/tsquery.
     * Use this if you prefer to skip Scout entirely and talk
     * to PostgreSQL directly.
     */
    public function scopeSearch(Builder $query, string $term): Builder
    {
        // websearch_to_tsquery handles arbitrary user input safely.
        // No syntax errors, no injection, no escaping needed.
        return $query
            ->whereRaw("search_vector @@ websearch_to_tsquery('english', ?)", [$term])
            ->selectRaw("*, ts_rank(search_vector, websearch_to_tsquery('english', ?)) AS rank", [$term])
            ->orderByDesc('rank');
    }

    /**
     * Highlighted search results using ts_headline.
     */
    public function scopeSearchWithHighlights(Builder $query, string $term): Builder
    {
        return $query
            ->whereRaw("search_vector @@ websearch_to_tsquery('english', ?)", [$term])
            ->selectRaw("
                *,
                ts_rank(search_vector, websearch_to_tsquery('english', ?)) AS rank,
                ts_headline('english', description,
                    websearch_to_tsquery('english', ?),
                    'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15'
                ) AS highlighted_description
            ", [$term, $term])
            ->orderByDesc('rank');
    }
}

// Usage:
// $results = Product::search('wireless headphones')->take(25)->get();
// $results = Product::searchWithHighlights('wireless headphones')->take(25)->get();

This approach has one significant advantage: you control the exact SQL. You can use websearch_to_tsquery, ts_headline for highlighted snippets, ts_rank_cd for cover density ranking — features that a Scout driver may or may not expose. The trade-off is that if you later need to add Algolia or Meilisearch, you lose Scout's driver-swappable abstraction.

My recommendation: if PostgreSQL is your only search backend and you want full control over the query, the scope approach is simpler and more transparent. If you might add a third-party search engine later, keep Scout for the abstraction and accept the driver's limitations.

The migration: generated columns vs triggers

You have two options for maintaining the tsvector column. Each has a clear use case, and choosing the wrong one creates either unnecessary complexity or unnecessary limitations.

Option 1: Generated columns (PostgreSQL 12+)

A GENERATED ALWAYS AS ... STORED column is computed automatically on every insert and update. No trigger. No maintenance code. The database handles it.

database/migrations/create_search_vector.php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        // Add the tsvector column
        DB::statement("
            ALTER TABLE products ADD COLUMN search_vector tsvector
            GENERATED ALWAYS AS (
                setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
                setweight(to_tsvector('english', coalesce(brand, '')), 'B') ||
                setweight(to_tsvector('english', coalesce(description, '')), 'D')
            ) STORED
        ");

        // Create the GIN index
        DB::statement("
            CREATE INDEX idx_products_search_vector
            ON products USING gin (search_vector)
        ");
    }

    public function down(): void
    {
        Schema::table('products', function (Blueprint $table) {
            $table->dropColumn('search_vector');
        });
    }
};

Generated columns are the right choice when: all the text lives in the same table, the transformation is a pure function of existing columns, and you are on PostgreSQL 12 or later. Most Laravel applications meet all three conditions.

The constraint that matters: a generated column's expression can only reference columns from the same table. You cannot join in data from a related table. If your search vector needs the category name from a categories table, a generated column cannot express that. You need a trigger.

Option 2: Trigger-based indexing

Triggers are necessary when the tsvector needs data joined in from related tables, when the logic includes conditional weighting a generated expression cannot express, or when you must support PostgreSQL 11 or earlier (generated columns arrived in version 12).

Trigger-based tsvector maintenance — single table
-- Alternative: trigger-based tsvector maintenance
-- Use this when you need to index related table data
-- or when GENERATED ALWAYS cannot express your logic.

-- 1. Add a plain tsvector column (not generated)
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- 2. Create the trigger function
CREATE OR REPLACE FUNCTION products_search_vector_update()
RETURNS trigger AS $$
BEGIN
    NEW.search_vector :=
        setweight(to_tsvector('english', coalesce(NEW.name, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.brand, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(NEW.description, '')), 'D');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- 3. Attach the trigger
CREATE TRIGGER trig_products_search_vector
    BEFORE INSERT OR UPDATE OF name, brand, description
    ON products
    FOR EACH ROW
    EXECUTE FUNCTION products_search_vector_update();

-- 4. Backfill existing rows
UPDATE products SET search_vector =
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(brand, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'D');

-- 5. Create the GIN index
CREATE INDEX idx_products_search_vector
    ON products USING gin (search_vector);

The BEFORE INSERT OR UPDATE OF name, brand, description clause is important. It restricts the trigger to fire only when the indexed columns change. An update to price or stock_count will not rebuild the search vector. Without that OF clause, every update to any column triggers a recomputation — wasteful on write-heavy tables.

When the search vector needs related data

This is where triggers earn their keep over generated columns. If your product search should include the category name, and categories live in a separate table, you need a trigger that fetches from the relationship — and a second trigger on the related table to rebuild search vectors when the related data changes.

Trigger-based tsvector with related table data
-- Trigger that indexes data from a related table.
-- This is the case GENERATED ALWAYS cannot handle.

CREATE OR REPLACE FUNCTION products_search_vector_update()
RETURNS trigger AS $$
DECLARE
    category_name TEXT;
BEGIN
    -- Fetch the category name from the related table
    SELECT name INTO category_name
    FROM categories
    WHERE id = NEW.category_id;

    NEW.search_vector :=
        setweight(to_tsvector('english', coalesce(NEW.name, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.brand, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(category_name, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(NEW.description, '')), 'D');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- The trade-off: this trigger runs a SELECT on every product insert/update.
-- On a table with 10,000 writes per day, that is 10,000 additional
-- single-row lookups on the categories table. With a primary key index
-- on categories.id, each lookup costs ~0.02ms. Negligible.
--
-- But if the category name changes? The products table does not know.
-- You need a second trigger on the categories table:

CREATE OR REPLACE FUNCTION categories_name_update()
RETURNS trigger AS $$
BEGIN
    -- Rebuild search vectors for all products in this category
    UPDATE products SET name = name WHERE category_id = NEW.id;
    -- The SET name = name is a no-op on data, but fires the
    -- products trigger which rebuilds search_vector.
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trig_categories_name_update
    AFTER UPDATE OF name ON categories
    FOR EACH ROW
    EXECUTE FUNCTION categories_name_update();

The cascading trigger pattern — where a change on the categories table fires an update on the products table, which fires the products trigger to rebuild the search vector — is admittedly more complex than a generated column. It is also the only correct approach. A search vector that goes stale when the category name changes produces wrong results, which is worse than complexity.

Backfilling existing rows without locking the table

The backfill step (the bare UPDATE) populates existing rows. On a large table, doing this in a single transaction holds locks on every row it touches until commit, rewrites the entire heap (every updated row gets a new version, bloating the table), and generates a massive WAL spike. Batch it instead.

Batched backfill strategies
-- Backfilling a million rows in one UPDATE locks the table
-- and generates a massive WAL spike. Batch it instead.

-- Option 1: Pure SQL batching
DO $$
DECLARE
    batch_size INT := 10000;
    total_rows INT;
    processed INT := 0;
BEGIN
    SELECT count(*) INTO total_rows FROM products WHERE search_vector IS NULL;
    RAISE NOTICE 'Backfilling % rows in batches of %', total_rows, batch_size;

    LOOP
        UPDATE products
        SET name = name  -- triggers the search_vector update
        WHERE id IN (
            SELECT id FROM products
            WHERE search_vector IS NULL
            LIMIT batch_size
        );

        GET DIAGNOSTICS processed = ROW_COUNT;
        EXIT WHEN processed = 0;

        RAISE NOTICE 'Processed % rows', processed;
        COMMIT;
    END LOOP;
END;
$$;

-- Option 2: Laravel artisan command for backfilling
-- php artisan scout:import "App\Models\Product"
-- Scout's import command processes models in chunks of 500 by default.
-- For tsvector-based drivers, this translates to batched UPDATEs.

On a million-row table, the batched approach takes longer in wall-clock time — perhaps 3 minutes instead of 45 seconds — but each batch commits promptly, so row locks are short-lived, the WAL does not balloon into the gigabytes in one burst, and your DBA does not send you a pointed Slack message at 2 AM. A worthwhile trade-off.
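If you prefer to drive the backfill from Laravel rather than a DO block — convenient when you want progress output in your deploy logs — here is a minimal artisan command sketch, assuming the trigger-based column from above (the command name is my own invention):

```php
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

class BackfillSearchVectors extends Command
{
    protected $signature = 'search:backfill {--chunk=10000}';
    protected $description = 'Populate search_vector in batches via the products trigger';

    public function handle(): int
    {
        $chunk = (int) $this->option('chunk');

        do {
            // The no-op SET name = name fires the tsvector trigger.
            // Each statement runs in its own implicit transaction,
            // so locks and WAL are released batch by batch.
            $updated = DB::update('
                UPDATE products SET name = name
                WHERE id IN (
                    SELECT id FROM products
                    WHERE search_vector IS NULL
                    LIMIT ?
                )
            ', [$chunk]);

            $this->info("Backfilled {$updated} rows");
        } while ($updated > 0);

        return self::SUCCESS;
    }
}
```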

Highlighting search results with ts_headline

A search results page that shows matching products without indicating where the match occurred is a missed opportunity. Users expect to see their search terms highlighted in the results. PostgreSQL provides ts_headline for exactly this purpose.

ts_headline — highlighted search snippets
-- ts_headline generates a text snippet with search terms highlighted.
-- This is what users expect from a search results page.

SELECT
    name,
    ts_rank(search_vector, query) AS rank,
    ts_headline(
        'english',
        description,
        query,
        'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15, ShortWord=3'
    ) AS snippet
FROM products,
     websearch_to_tsquery('english', 'noise cancelling headphones') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

-- Result:
-- name: "Sony WH-1000XM5"
-- rank: 0.6079
-- snippet: "...industry-leading <mark>noise</mark> <mark>cancelling</mark>
--           with two processors. Comfortable <mark>headphones</mark> with
--           30-hour battery life..."

-- The options explained:
-- StartSel/StopSel: HTML tags wrapping matched terms.
-- MaxWords: maximum words in the snippet (prevents massive excerpts).
-- MinWords: minimum words (prevents absurdly short fragments).
-- ShortWord: words of this length or shorter are dropped at the
--   start and end of the snippet, unless they are query terms.

-- Performance note: ts_headline is the most expensive part of the query.
-- It re-parses the original text column (not the tsvector) on every
-- matching row. On 20 results, this is negligible. On 10,000 results
-- without a LIMIT, it becomes a bottleneck.
-- Always apply your LIMIT before ts_headline processes the text.

A word of caution about performance. ts_headline is the most expensive operation in the full-text search toolkit, because it re-parses the original text column — not the tsvector — for every matching row. On a search returning 20 results, the overhead is negligible. On a search returning 10,000 results before pagination, it becomes the bottleneck.

The solution is to compute ts_headline only for the rows that will actually be displayed: filter, rank, and LIMIT in a subquery, then apply ts_headline in the outer query. Do not count on the planner to defer it for you — with an ORDER BY on rank, select-list expressions are generally evaluated for every matching row before the sort, LIMIT or no LIMIT. If you construct the query through Eloquent, verify with EXPLAIN ANALYZE that the headline is computed only for the final page of rows.
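One way to make the deferral explicit is to rank and limit in a subquery and compute the headline outside it. A sketch along the lines of the earlier queries:

```sql
-- Inner query: match, rank, limit — cheap, index-driven.
-- Outer query: ts_headline runs on 20 rows, not on every match.
SELECT
    name,
    rank,
    ts_headline('english', description,
        websearch_to_tsquery('english', 'noise cancelling headphones'),
        'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15'
    ) AS snippet
FROM (
    SELECT name, description,
           ts_rank(search_vector, query) AS rank
    FROM products,
         websearch_to_tsquery('english', 'noise cancelling headphones') AS query
    WHERE search_vector @@ query
    ORDER BY rank DESC
    LIMIT 20
) AS top_hits
ORDER BY rank DESC;
```

The LIMIT inside the subquery prevents the planner from flattening it away, so the headline function cannot leak down to the unlimited row set.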

Prefix search for autocomplete

The search box on your e-commerce site does not wait for the user to finish typing. It suggests results as they type. This requires prefix matching — the user has typed "head" and you need to find "headphones," "headset," and "headband."

Prefix matching with the :* operator
-- Prefix search: autocomplete as-you-type.
-- The user has typed "head" — find all products starting with "head*".

-- Method 1: tsquery prefix operator (the :* suffix)
SELECT name, ts_rank(search_vector, query) AS rank
FROM products,
     to_tsquery('english', 'head:*') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

-- Matches: "headphones", "headset", "headband", "headlamp"
-- The :* turns "head" into a prefix match on the lexeme.

-- Method 2: Combine prefix with other terms
SELECT name, ts_rank(search_vector, query) AS rank
FROM products,
     to_tsquery('english', 'wireless & head:*') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

-- Matches: products with "wireless" AND any word starting with "head"

-- Method 3: websearch_to_tsquery does NOT support prefix matching.
-- For autocomplete, you need to construct the tsquery manually.
-- A common pattern in Laravel:

-- In your search scope:
-- $words = preg_split('/\s+/', trim($term), -1, PREG_SPLIT_NO_EMPTY);
-- $last  = array_pop($words) . ':*';
-- $prefix = implode(' & ', array_merge($words, [$last]));
-- ->whereRaw("search_vector @@ to_tsquery('english', ?)", [$prefix])

The :* suffix on a tsquery term is the prefix operator. It matches any lexeme that starts with the given prefix. Combined with the GIN index this is fast, typically a millisecond or two even on large tables, though very short prefixes of one or two characters expand to many lexemes and cost noticeably more. In practice, wait until the user has typed at least two or three characters before firing the autocomplete query.

The practical challenge is that websearch_to_tsquery does not support the prefix operator. For autocomplete, you need to construct the tsquery manually. The common pattern is to AND together all complete words, then append :* to the last (incomplete) word. This gives you "find products matching all the complete words AND any word starting with the last typed characters."
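Assembled into an Eloquent scope, the pattern looks roughly like this. This is a sketch, not a library API: the scope name is illustrative, and the sanitization step strips tsquery metacharacters (&, |, !, :, parentheses) so raw user input cannot trigger a tsquery syntax error.

```php
// Inside your searchable model (e.g. Product). Sketch only.
public function scopePrefixSearch($query, string $term)
{
    // Remove anything that is not a letter, digit, or whitespace,
    // so user input cannot inject tsquery operators.
    $clean = preg_replace('/[^[:alnum:][:space:]]/u', ' ', $term);
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

    if (empty($words)) {
        return $query->whereRaw('false'); // nothing to search for
    }

    // AND the complete words together; prefix-match the last one.
    $last    = array_pop($words) . ':*';
    $tsquery = implode(' & ', array_merge($words, [$last]));

    return $query->whereRaw(
        "search_vector @@ to_tsquery('english', ?)", [$tsquery]
    );
}
```

Called as Product::prefixSearch('wireless head')->limit(10)->get(), this produces the same 'wireless & head:*' tsquery shown in Method 2.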

For the autocomplete use case specifically, you may also want to consider pg_trgm with a GIN index, which provides fuzzy matching and handles typos — something tsvector does not do. The two approaches are not mutually exclusive: use tsvector for the main search results page and pg_trgm for the autocomplete dropdown.

Tuning ts_rank: normalization, weights, and cover density

The default ts_rank call with no normalization flag biases toward longer documents. A 5,000-word product description will rank higher than a 20-word product name, even if the name is a better match. For product search, that is backwards.

ts_rank normalization flags
-- ts_rank normalization flags control how document length
-- affects ranking. The default (0) does no normalization,
-- which biases toward longer documents.

-- Flag 0: no normalization (default) — long docs rank higher
SELECT name, ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'organic & coffee') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

-- Flag 1: divides rank by 1 + log(document length)
SELECT name, ts_rank(search_vector, query, 1) AS rank
FROM products, to_tsquery('english', 'organic & coffee') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

-- Flag 2: divides rank by document length
-- Good for comparing short product names vs long descriptions
SELECT name, ts_rank(search_vector, query, 2) AS rank
FROM products, to_tsquery('english', 'organic & coffee') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

-- Flag 32: divides rank by itself + 1 (normalizes to 0..1 range)
SELECT name, ts_rank(search_vector, query, 32) AS rank
FROM products, to_tsquery('english', 'organic & coffee') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

-- You can combine flags with bitwise OR:
-- Flag 2|32 = 34: length normalization + 0..1 range
SELECT name, ts_rank(search_vector, query, 2|32) AS rank
FROM products, to_tsquery('english', 'organic & coffee') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

The normalization flags that matter most for Laravel applications:

  • Flag 0 (default) — No normalization. Long descriptions dominate. Appropriate when document length correlates with quality (articles, blog posts).
  • Flag 1 — Divides by 1 + log(length). A gentle correction. Good starting point for mixed-length content.
  • Flag 2 — Divides by document length. Strongly favors short, precise matches. Appropriate for product catalogs where the name match is more important than description match.
  • Flag 32 — Normalizes to the 0..1 range. Useful when you need to combine text rank with other scoring signals (popularity, recency, boost factors).

You can combine flags with bitwise OR. ts_rank(vector, query, 2|32) gives you length normalization and a 0..1 range. Experiment with your actual data — the right normalization depends entirely on what your users expect "best match" to mean.
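As an example of why the 0..1 range from flag 32 matters, here is a hedged sketch that blends text rank with a popularity signal. The review_count column and the 0.7/0.3 weights are assumptions for illustration; tune them against real data.

```sql
-- Blend normalized text rank with normalized popularity.
SELECT name,
       0.7 * ts_rank(search_vector, query, 2|32)
     + 0.3 * (review_count::float
              / greatest(max(review_count) OVER (), 1)) AS score
FROM products,
     to_tsquery('english', 'organic & coffee') AS query
WHERE search_vector @@ query
ORDER BY score DESC
LIMIT 10;
```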

ts_rank vs ts_rank_cd: frequency vs proximity

PostgreSQL offers two ranking functions, and the difference matters more than most implementations acknowledge.

ts_rank (frequency) vs ts_rank_cd (cover density)
-- ts_rank vs ts_rank_cd: two ranking algorithms.

-- ts_rank: frequency-based. How many times do the query terms
-- appear in the document? More matches = higher rank.
SELECT name, ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'wireless & headphones') AS query
WHERE search_vector @@ query
ORDER BY rank DESC LIMIT 5;

-- ts_rank_cd: cover density ranking. How close together are
-- the query terms in the document? Closer = higher rank.
SELECT name, ts_rank_cd(search_vector, query) AS rank
FROM products, to_tsquery('english', 'wireless & headphones') AS query
WHERE search_vector @@ query
ORDER BY rank DESC LIMIT 5;

-- In practice:
-- "wireless bluetooth noise-cancelling headphones" — rank_cd lower
--   because "wireless" and "headphones" are far apart
-- "wireless headphones with premium sound" — rank_cd higher
--   because "wireless" and "headphones" are adjacent

-- For product search: ts_rank_cd generally produces better results
-- because products with the search terms in the title (close together)
-- are usually more relevant than products mentioning the terms
-- scattered across a long description.

ts_rank counts how many times the query terms appear in the document. A product description mentioning "wireless" twelve times ranks higher than one mentioning it once, regardless of context. ts_rank_cd — cover density ranking — measures how close together the query terms appear. A product titled "Wireless Headphones" ranks higher than a product with "wireless" in paragraph one and "headphones" in paragraph four.

For product search, where the title is the most important signal and titles are short, ts_rank_cd generally produces better results. For blog search or document search, where frequency is a reasonable proxy for relevance, ts_rank is appropriate. Test both with your actual data. The correct choice is the one that puts the right product at the top of the list.

GIN vs GiST for full-text search: a closer look

Both GIN and GiST can index tsvector columns. The choice is not a matter of preference — it is a matter of workload.

GIN vs GiST benchmark — 1M rows
-- Benchmark: GIN vs GiST for full-text search
-- Table: 1 million products, tsvector column, English config

-- GIN index creation:
CREATE INDEX idx_gin ON products USING gin (search_vector);
-- CREATE INDEX — 14.2 seconds
-- Index size: 128 MB

-- GiST index creation:
CREATE INDEX idx_gist ON products USING gist (search_vector);
-- CREATE INDEX — 48.7 seconds
-- Index size: 214 MB

-- Query: to_tsquery('english', 'bluetooth & speaker & portable')

-- With GIN:
--   Bitmap Index Scan on idx_gin
--     (actual time=0.18..0.18 rows=342 loops=1)
--   Total query time: 1.4 ms

-- With GiST:
--   Index Scan using idx_gist on products
--     (actual time=0.52..4.87 rows=342 loops=1)
--   Total query time: 5.1 ms

-- GIN: 3.6x faster for lookups. Smaller index. Slower to build.

GIN wins on read performance. GIN builds an inverted index — a mapping from each lexeme to the list of rows containing it. A tsquery match is a set intersection across posting lists. This is fast: O(n) in the number of matching terms, not in the number of rows.

GiST wins on write performance. GiST stores a balanced tree of fixed-size, lossy signatures, so updates are cheaper than maintaining posting lists. The lossiness has a cost at read time: GiST must recheck candidate rows against the actual tsvector to eliminate false matches, which is part of why its lookups are slower.

For search-behind-a-search-bar — the typical Scout use case — GIN is the answer. The workload is read-heavy (many searches per second) and write-light (products change infrequently). The 3.6x read performance advantage outweighs the higher write cost by a comfortable margin.

GiST makes sense for: tables with very high write throughput where search is infrequent, or when you need the distance operator for relevance-based nearest-neighbor queries. In a Laravel Scout context, that is rare.

GIN's pending list: the hidden write optimization

GIN indexes have an internal mechanism that is worth understanding, particularly if your product catalog receives frequent updates.

GIN pending list — controlling write vs read trade-offs
-- GIN indexes have a "pending list" that batches writes.
-- This is controlled by the gin_pending_list_limit setting.

-- Check the current setting:
SHOW gin_pending_list_limit;
-- Default: 4MB

-- The pending list is a buffer. When you INSERT or UPDATE a row,
-- the new tsvector entries go into the pending list instead of
-- immediately updating the main GIN tree. When the list reaches
-- gin_pending_list_limit, PostgreSQL merges it into the index.

-- Why this matters:
-- 1. Writes are fast — appending to a list is O(1)
-- 2. Reads may be slower — PostgreSQL must scan the pending list
--    on every query, in addition to the main index
-- 3. VACUUM triggers a merge — so frequent VACUUMs keep the
--    pending list short

-- For read-heavy search (the Scout use case), a smaller pending
-- list means more frequent merges but faster reads:
ALTER INDEX idx_products_search_vector SET (gin_pending_list_limit = 1024);  -- value in kB (1 MB)

-- For write-heavy tables where search is infrequent, a larger
-- pending list reduces merge overhead:
ALTER INDEX idx_products_search_vector SET (gin_pending_list_limit = 8192);  -- value in kB (8 MB)

-- To force-merge the pending list right now:
SELECT gin_clean_pending_list('idx_products_search_vector');

The pending list is GIN's answer to the "fast writes vs fast reads" tension. Instead of immediately updating the inverted index on every insert — which requires modifying posting lists across multiple pages — GIN appends new entries to a flat list. This makes writes fast. When the list is full (controlled by gin_pending_list_limit), PostgreSQL batch-merges it into the main index structure.

The trade-off: every read query must scan both the main index and the pending list. A large pending list means slower reads. For a search bar serving 50 queries per second, you want the pending list small — 1 MB or even 512 KB. For a batch import job loading 100,000 products, you want it large — 8 MB or more — to reduce the number of expensive merges.

After a bulk import, call gin_clean_pending_list or run VACUUM to force-merge the pending list before users start searching. This is the kind of detail that separates "full-text search works" from "full-text search works well."
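For a strictly read-heavy table there is one more knob worth knowing: the pending list can be disabled entirely with the fastupdate storage parameter, trading slower writes for reads that never have to scan a pending list. A sketch, using the index name from earlier:

```sql
-- Disable the pending list: every write goes straight into the
-- main GIN structure.
ALTER INDEX idx_products_search_vector SET (fastupdate = off);

-- Entries already in the pending list are not flushed automatically;
-- merge them explicitly (or let the next VACUUM do it):
SELECT gin_clean_pending_list('idx_products_search_vector'::regclass);
```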

Multi-language search configuration

PostgreSQL ships with full-text search configurations for over a dozen languages, and newer releases keep adding more (run \dF in psql to see your server's list). Each configuration includes language-specific stemming rules and stop word lists. The default 'english' config stems "running" to "run" and drops "the" — behavior that is wrong for French text, where stemming rules differ entirely.

Multi-language tsvector configuration
-- Multi-language full-text search configuration

-- Option 1: Use 'simple' config (no stemming, no stop words)
-- Works across all languages but loses linguistic intelligence
SELECT to_tsvector('simple', 'Les chaussures de course rapides');
-- 'chaussures':2 'course':4 'de':3 'les':1 'rapides':5

-- Option 2: Create a per-row language config
ALTER TABLE products ADD COLUMN fts_config regconfig DEFAULT 'english';

-- Set per product:
UPDATE products SET fts_config = 'french' WHERE locale = 'fr';
UPDATE products SET fts_config = 'german' WHERE locale = 'de';
UPDATE products SET fts_config = 'spanish' WHERE locale = 'es';

-- Use it in the trigger:
CREATE OR REPLACE FUNCTION products_search_vector_update()
RETURNS trigger AS $$
BEGIN
    NEW.search_vector :=
        setweight(to_tsvector(NEW.fts_config, coalesce(NEW.name, '')), 'A') ||
        setweight(to_tsvector(NEW.fts_config, coalesce(NEW.description, '')), 'D');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Stock configs include:
-- danish, dutch, english, finnish, french, german, hungarian,
-- italian, norwegian, portuguese, romanian, russian, simple,
-- spanish, swedish, turkish
-- (newer PostgreSQL releases add more; run \dF for your server's list)
--
-- Each config includes language-specific stemming and stop words.

For applications serving a single language, set the config once and forget it. For multi-language catalogs, the per-row regconfig column approach ensures each product is stemmed according to its own language.

One subtlety: the same regconfig must be used at both index time and query time. If you index a French product with to_tsvector('french', name) and search it with to_tsquery('english', term), stemming will disagree and matches will be missed. The query must know which config was used for each row, or you must use 'simple' for both — sacrificing stemming for consistency.
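The practical consequence for the per-row config approach: constrain the query by locale and pass the matching config as a constant, so the planner can still use the GIN index. A sketch, reusing the locale column from the migration above:

```sql
-- French rows, French config, constant tsquery: index-friendly.
SELECT name, ts_rank(search_vector, query) AS rank
FROM products,
     to_tsquery('french', 'chaussure & course') AS query
WHERE locale = 'fr'
  AND search_vector @@ query
ORDER BY rank DESC;

-- Passing the fts_config column to to_tsquery instead would make the
-- tsquery row-dependent and defeat the index; keep it constant per query.
```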

For most Laravel applications with a single primary language, the generated column approach with a hardcoded config is sufficient and far simpler. Multi-language search adds real complexity, and it should be adopted because the product genuinely serves multiple languages, not because someone imagined it might someday.

The honest counterpoints: when tsvector is not enough

I have spent considerable time explaining why PostgreSQL's full-text search is excellent, and I stand by all of it. But a waiter who overstates his case is no waiter at all, so here are the gaps, frankly stated.

Typo tolerance. PostgreSQL's tsvector does not handle misspellings. A user searching for "headphnes" will get zero results. Elasticsearch, Meilisearch, and Typesense all provide fuzzy matching that tolerates typos. If your users frequently misspell search terms — and on mobile, they do — this is a genuine limitation. The partial workaround is pg_trgm with a similarity threshold, but it does not integrate cleanly with tsvector ranking.
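For completeness, a sketch of that pg_trgm workaround, assuming the extension is available on your server and the products.name column used throughout this article:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_products_name_trgm
    ON products USING gin (name gin_trgm_ops);

-- similarity() scores 0..1 on shared trigrams; "headphnes" still
-- overlaps heavily with "headphones".
SELECT name, similarity(name, 'headphnes') AS sim
FROM products
WHERE name % 'headphnes'  -- % respects pg_trgm.similarity_threshold
ORDER BY sim DESC
LIMIT 5;

-- The threshold defaults to 0.3; lower it per session to be more forgiving:
SET pg_trgm.similarity_threshold = 0.2;
```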

Faceted search. If your search results page needs "filter by brand, filter by price range, filter by rating" alongside full-text results — the kind of faceted navigation that Shopify and Amazon provide — PostgreSQL can do it with additional WHERE clauses and aggregations, but it requires more manual work than Elasticsearch's built-in aggregation framework or Meilisearch's facet API. The queries are still fast with proper indexes, but the developer experience is less polished.

Synonyms and custom dictionaries. PostgreSQL supports custom dictionaries and thesaurus configurations, but configuring them requires creating shared library files on the database server — not something you can do in a Laravel migration. Elasticsearch and Meilisearch let you configure synonyms through their APIs. If "laptop" should also match "notebook" and "portable computer," the dedicated search engines make this easier.

Very large document collections. On tables with 50 million rows and complex tsvector configurations, GIN index maintenance and query performance remain good, but you may start to feel the edges — particularly around index rebuild times and WAL volume during bulk updates. Elasticsearch distributes this workload across a cluster. PostgreSQL does not, unless you are running Citus or a similar extension.

My position, which I offer with no commercial interest: if your application has under 10 million searchable rows, a single primary language, and users who are tolerant of exact-match-only search, PostgreSQL's full-text search is the right choice. You avoid an external service, its operational overhead, its synchronization complexity, and its cost. If you need typo tolerance, faceted navigation, synonym expansion, or you are operating at a scale where a single PostgreSQL instance is the bottleneck, a dedicated search engine earns its keep.

The wrong choice is the one made without measurement. If you have not benchmarked PostgreSQL full-text search on your actual data with your actual queries, you do not yet know whether it is sufficient. Start there. For a comprehensive comparison of what PostgreSQL search can and cannot do relative to Elasticsearch, see the PostgreSQL vs Elasticsearch comparison.
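A minimal way to start that measurement, comparing simplified versions of the old and new query shapes on your own data:

```sql
-- Roughly what Scout's database driver generates today:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM products
WHERE name ILIKE '%wireless headphones%';

-- The tsvector replacement:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM products
WHERE search_vector @@ websearch_to_tsquery('english', 'wireless headphones');

-- Compare "actual time" and shared-buffer reads between the two plans.
```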

Monitoring full-text search performance in production

After deploying tsvector-based search, verify that the old LIKE queries are gone and the new ones are performing as expected. The deployment is not complete until you have confirmed it in production metrics.

pg_stat_statements — finding search queries
-- Monitor your full-text search performance in production
SELECT query,
       calls,
       round(mean_exec_time::numeric, 2) AS avg_ms,
       round(total_exec_time::numeric, 0) AS total_ms
FROM pg_stat_statements
WHERE query LIKE '%@@%to_tsquery%'
   OR query LIKE '%LIKE%'
ORDER BY total_exec_time DESC
LIMIT 15;

-- If you see LIKE queries with high call counts
-- alongside @@ queries, you still have Scout's
-- default driver running somewhere.

Three things to watch:

  1. Lingering LIKE queries. If you see LIKE '%term%' queries alongside your @@ queries, Scout's database driver is still active on some model or some code path. Find and replace it. This is more common than you might expect — a second model using Searchable that was not updated, or a manual query in a controller that predates the migration.
  2. GIN index usage. Check pg_stat_user_indexes for your GIN index. The idx_scan column should be climbing steadily. If it is zero, your queries are not hitting the index — likely a config mismatch between the tsvector and tsquery.
  3. Index bloat. GIN indexes on frequently-updated columns can accumulate dead entries. Note that GIN indexes are read through bitmap scans, so pg_stat_user_indexes.idx_tup_fetch stays at zero for them and is not a useful signal; track the index's size relative to its table over time instead. A REINDEX CONCURRENTLY (PostgreSQL 12 and later) will reclaim the space without downtime.
Verifying GIN index usage and detecting bloat
-- Verify your GIN index is actually being used
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_tup_read AS rows_returned,
    idx_tup_fetch AS rows_fetched
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%search%'
ORDER BY idx_scan DESC;

-- What to check:
-- idx_scan > 0: the index is being used. Good.
-- idx_scan = 0: the index exists but nothing uses it.
--   Possible causes: config mismatch between tsvector and tsquery,
--   LIKE queries still running instead of @@ queries,
--   or the table is small enough that PostgreSQL prefers a seq scan.

-- Check for index bloat by tracking size relative to the table.
-- (idx_tup_fetch is always 0 for GIN, which only supports bitmap
-- scans, so fetch ratios tell you nothing here.)
SELECT
    indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    round(100.0 * pg_relation_size(indexrelid)
              / nullif(pg_relation_size(relid), 0), 1) AS pct_of_table
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%search%';

-- An index that keeps growing relative to its table suggests bloat.
-- Fix with: REINDEX INDEX CONCURRENTLY idx_products_search_vector;
-- CONCURRENTLY rebuilds the index without locking the table.

I would recommend checking these metrics weekly for the first month after deploying tsvector search, then monthly thereafter. GIN index bloat tends to be gradual — you will not notice it until query latency has drifted from 1.4 ms to 8 ms, at which point a REINDEX CONCURRENTLY returns it to form.

What Gold Lapel does with full-text search patterns

Gold Lapel sits between your Laravel application and PostgreSQL. It sees every query — including the ones Scout generates.

When Gold Lapel detects LIKE '%term%' patterns hitting the same columns repeatedly, it creates pg_trgm GIN indexes to accelerate them. That alone drops your search latency from 342ms to around 8ms, with no query changes — just add the Gold Lapel package to your project.

But the real optimization happens when it detects tsvector/tsquery patterns. If your application is already using PostgreSQL full-text search but lacks a GIN index on the tsvector column — because the migration was forgotten, or the column was added without one, or someone created a GiST index when GIN would have been 3.6x faster — Gold Lapel creates the right index automatically.

It also monitors your existing GIN indexes for bloat and surfaces recommendations before the bloat affects query latency. The pending list size, the index-to-table ratio, the idx_scan count relative to query volume — the kind of things you would attend to yourself if you had time to check pg_stat_user_indexes every week. Which, respectfully, you do not.

Run composer require goldlapel/goldlapel-laravel — the service provider is auto-discovered. Your Scout search calls stay exactly the same. The queries arrive at PostgreSQL with proper indexing, whether you remembered to create the migration or not.

The full Laravel integration setup takes about ninety seconds.

The search bar deserves the same care as the rest of your application

There is a pattern I have observed across hundreds of Laravel applications. The authentication system is carefully tested. The payment flow is reviewed by three engineers. The admin dashboard gets its own sprint. And the search bar — the feature users interact with on every visit, the feature that directly determines whether they find what they came to buy — runs on LIKE '%term%' because that is what shipped in the prototype and nobody revisited it.

PostgreSQL full-text search is not a workaround. It is not a "good enough for now" solution while you save up for Algolia. It is a proper, battle-tested, linguistically intelligent search engine that has been part of PostgreSQL for nearly two decades. It handles stemming, ranking, weighting, phrase matching, boolean operators, and prefix search. It integrates with GIN indexes that turn millisecond queries into microsecond queries. It runs inside the database you are already paying for, on the server you are already operating, with the backups you are already taking.

The migration from Scout's database driver to tsvector takes an afternoon. The GIN index builds in seconds. The query performance improvement is three orders of magnitude. And your users — who will never know that you replaced a sequential scan with an inverted index — will simply notice that the search bar works the way they expected it to all along.

That, if you will forgive the observation, is the entire point.

Further reading

You may find the following worth your time: the question of whether PostgreSQL's native full-text search can truly replace a dedicated search engine is one I have examined in our comparison of Gold Lapel with Elasticsearch. The answer, as is often the case with PostgreSQL, is more encouraging than most expect.