Flyway and PostgreSQL: Why CREATE INDEX CONCURRENTLY Hangs Your Spring Boot Application at Startup
Your migration is six words long. It will never finish. The problem is not your SQL — it is the transaction Flyway wrapped around it without telling you.
Good evening. Your application did not start.
I should describe a morning that has ruined several engineering sprints. You have written a Flyway migration. It contains exactly one statement:
-- V42__add_orders_email_index.sql
-- Looks perfectly reasonable. Ships to production. Hangs forever.
CREATE INDEX CONCURRENTLY idx_orders_email
ON orders (email);
You chose CREATE INDEX CONCURRENTLY because you are responsible. A regular CREATE INDEX acquires an ACCESS EXCLUSIVE lock on the table, blocking every read and write for the duration of the build. On a 200-million-row orders table, that could be minutes. CONCURRENTLY avoids that lock. It is the right tool for production index creation.
You deploy. The Spring Boot application begins its startup sequence. Flyway picks up the migration. The logs show Migrating schema "public" to version 42. And then nothing. No error. No timeout. No progress. The application hangs indefinitely. Ctrl+C is the only exit.
The Kubernetes health check begins to fail. The deployment pipeline, which has been patiently waiting for the readiness probe, eventually times out and rolls back to the previous version. Or worse — it does not, and the pod sits there, consuming a database connection, doing nothing, until someone notices the alerts. Or much worse still — there are no alerts, and the pod quietly occupies a slot in the replica set while serving zero traffic, and the on-call engineer discovers this at 3 AM when CPU looks fine but response times have doubled because one-third of the pods behind the load balancer are catatonic.
This is not a bug in your SQL. The statement is valid. It is not a bug in PostgreSQL. The database is working exactly as designed. It is a fundamental incompatibility between how Flyway manages migration transactions and how CREATE INDEX CONCURRENTLY operates internally. And it has been an open issue since at least Flyway 9.1.2, documented across five GitHub issues, the most prominent being issue #3508 with 49 thumbs-up reactions and no official fix in Flyway Community edition.
I have attended to this particular situation more times than I care to admit. What follows is exactly why it happens, how to diagnose it when you are staring at frozen logs at an unhelpful hour, and five ways to work around it — ranked by reliability.
What your logs will not tell you
Before we examine the mechanism, I should prepare you for the most frustrating aspect of this issue: the silence.
When a database query fails, PostgreSQL tells you. When a connection times out, the JDBC driver tells you. When a migration syntax error occurs, Flyway tells you. When this hang occurs, nothing tells you anything. The last log line you will see is Flyway's migration start message. There is no subsequent line. Not an error, not a warning, not a progress indicator. The application process is alive — it is consuming CPU (barely) and holding a database connection (definitely) — but it has entered a state from which it will never emerge on its own.
The Spring Boot actuator health endpoint, if it is configured to start before Flyway completes, will report the application as unhealthy — but with no specific indication that Flyway is the cause. The /actuator/health response will show {"status":"DOWN"} and nothing more. If the actuator is configured to start after Flyway (which is the default), even the health endpoint will be unreachable.
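One small configuration tweak helps here. If the health endpoint is reachable at all, enabling detailed health output at least shows which component is DOWN. A sketch, assuming the stock Spring Boot actuator properties apply to your setup:

```yaml
# Assumption: standard Spring Boot actuator; adjust to your setup.
management:
  endpoint:
    health:
      show-details: always   # break "DOWN" out into per-component statuses
  endpoints:
    web:
      exposure:
        include: health
```

Even with details enabled, the db component will report DOWN without naming Flyway as the cause. It narrows the search to the database, no more.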
HikariCP, the default connection pool in Spring Boot, will eventually log a leak detection warning if you have leak-detection-threshold configured. This is one of the very few visible indicators that something is wrong:
# HikariCP connection pool settings that affect the hang behavior
spring:
datasource:
hikari:
maximum-pool-size: 10
connection-timeout: 30000 # 30 seconds — how long to wait for a connection
idle-timeout: 600000 # 10 minutes — how long idle connections live
max-lifetime: 1800000 # 30 minutes — maximum connection lifetime
leak-detection-threshold: 60000 # 1 minute — log warning for connections held too long
# The leak-detection-threshold is useful here: during the hang,
# HikariCP will eventually log a warning that a connection has been
# held for longer than the threshold. This is one of the few
# visible indicators that something is wrong.
#
# But it does NOT terminate the connection. It only logs a warning.
# The hang continues regardless.
But even that warning is ambiguous. A connection held for longer than the leak threshold could be many things — a slow report query, a forgotten transaction in application code, a long-running batch job. There is nothing in the HikariCP warning that says "Flyway is deadlocked with PostgreSQL's concurrent index builder." You have to know to look for it.
I mention this not to alarm you but to save you time. If your Spring Boot application hangs at startup with no error and the last log line mentions Flyway, you are almost certainly experiencing this issue. Do not spend an hour checking network connectivity, database credentials, or disk space. Connect to PostgreSQL from a separate session and look at pg_stat_activity. The answer is there.
The mechanism: two things that cannot coexist
To understand the hang, you need to understand two things separately, and then watch them collide.
How Flyway runs migrations
Since version 9.1.2, Flyway uses PostgreSQL advisory locks to prevent concurrent migration runs. When your Spring Boot application starts and Flyway begins migrating, it acquires an advisory lock inside a transaction. It then runs each migration SQL file within that same transaction (or a closely related one). The transaction remains open for the duration of the migration.
This is sensible design. Advisory locks prevent two application instances from running migrations simultaneously, which would corrupt the schema history table. The transactional wrapper ensures that a failed migration can be rolled back cleanly. If your migration adds a column and the subsequent UPDATE to populate it fails, the entire migration rolls back — the column is never partially created. This is exactly the behavior you want for most DDL operations.
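To make that rollback guarantee concrete, here is a hypothetical two-statement migration (the file name and column are invented for illustration):

```sql
-- V41__add_orders_status.sql (hypothetical example)
-- Flyway runs both statements in one transaction.
ALTER TABLE orders ADD COLUMN status text;

-- If this UPDATE fails for any reason, the ALTER TABLE above
-- rolls back with it: the column is never partially created.
UPDATE orders SET status = 'unknown' WHERE status IS NULL;
```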
The critical detail: the transaction holding the advisory lock is open before your migration SQL executes and stays open until after your migration SQL completes.
You can see this advisory lock directly when the hang is occurring:
-- See the advisory lock Flyway is holding:
SELECT
l.locktype,
l.classid,
l.objid,
l.granted,
a.pid,
a.state,
a.query,
age(clock_timestamp(), a.xact_start) AS txn_duration
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype = 'advisory'
ORDER BY a.xact_start;
-- Typical output during a hang:
--
-- locktype | classid | objid | granted | pid | state | query
-- ----------+---------+-----------+---------+------+-------------------+---------------------------
-- advisory | 357654 | 357654321 | t | 1842 | idle in transaction | SELECT ... schema_history
--
-- The advisory lock is granted (good — no contention there).
-- But the transaction holding it is "idle in transaction" — it is
-- waiting for the CREATE INDEX CONCURRENTLY to return.
-- And it will wait forever.
Prior to version 9.1.2, Flyway used a table-level lock on the flyway_schema_history table instead of advisory locks. The same fundamental problem existed — the lock was held inside a transaction that remained open during migration execution — but the specifics of the lock type were different. If you are on a Flyway version older than 9.1.2, the mechanism is similar but the diagnostic queries may show different lock types.
How CREATE INDEX CONCURRENTLY works
CREATE INDEX CONCURRENTLY cannot run inside a transaction. If you try, PostgreSQL rejects it immediately with ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block. Flyway knows this and, in some configurations, will execute the statement outside of the migration transaction.
But "outside the transaction" does not mean "independent of the transaction." The concurrent index build proceeds in multiple phases, and Phase 2 has an absolute requirement:
-- How CREATE INDEX CONCURRENTLY works internally:
--
-- Phase 1: Catalog update
-- PostgreSQL adds the index to pg_index with indisvalid = false.
-- This happens in its own transaction, which commits immediately.
--
-- Phase 2: First table scan
-- A new transaction begins. PostgreSQL scans the entire table
-- and builds index entries for all existing rows.
-- Before this scan can begin, PostgreSQL WAITS for all transactions
-- that started before Phase 1 to complete.
--
-- *** THIS IS WHERE FLYWAY CAUSES THE HANG ***
-- Flyway's advisory lock transaction started before Phase 1.
-- PostgreSQL is waiting for it to finish.
-- But Flyway is waiting for the CREATE INDEX to finish.
--
-- Phase 3: Second table scan
-- After Phase 2 completes, PostgreSQL does a second scan to
-- pick up any rows inserted/updated/deleted during Phase 2.
-- Again waits for all transactions from before Phase 2 to end.
--
-- Phase 4: Mark valid
-- Sets indisvalid = true in pg_index. The index is now usable.
--
-- The two-phase scan is what makes CONCURRENTLY non-blocking for
-- other queries. But Phase 2's requirement — "wait for all prior
-- transactions" — is absolute. No exceptions. No timeouts.
-- If ANY transaction that started before Phase 1 is still open,
-- Phase 2 will wait indefinitely.
Phase 2 waits for every transaction that was open when Phase 1 completed to finish. Every single one. No exceptions. No timeout. The Flyway advisory lock transaction was open before Phase 1. It is still open. Phase 2 waits for it.
But Flyway is waiting for the CREATE INDEX CONCURRENTLY statement to return before it commits its transaction and releases the advisory lock. The migration is waiting for Flyway. Flyway is waiting for the migration. Neither will ever yield.
This is not technically a deadlock in the PostgreSQL sense. The deadlock detector looks for cycles in the server's lock wait graph, and here the cycle closes outside the server: the Flyway backend is not waiting on any lock at all, it is idle in transaction while the Flyway client blocks on a statement running over a different connection. So there is no automatic detection, no timeout, no error. Just silence.
The reason PostgreSQL requires this wait is fundamental to how concurrent index building maintains consistency. A regular CREATE INDEX takes an ACCESS EXCLUSIVE lock on the table, which guarantees no other transaction can read or write the table during the build. The index sees a perfectly consistent snapshot. CONCURRENTLY avoids that lock by using a two-phase approach: build the index based on current data, then make a second pass to catch any changes that happened during the first pass. But for the first pass to produce a consistent starting point, PostgreSQL needs all prior transactions to finish first. Otherwise, rows visible to those transactions might not be visible to the index build, producing an inconsistent index.
This is not a design flaw. It is a correctness guarantee. The problem is that Flyway's transaction model creates a situation where that guarantee can never be satisfied.
The timeline, step by step
If the mechanism is still abstract, this timeline makes the circular dependency concrete:
-- Timeline of what happens when Flyway runs a CONCURRENTLY migration:
--
-- T0: Spring Boot starts. Flyway initializes.
--
-- T1: Flyway opens Connection A. Begins transaction.
-- Executes: SELECT pg_advisory_lock(hashcode)
-- The advisory lock is now held inside this transaction.
-- Transaction A is OPEN.
--
-- T2: Flyway reads flyway_schema_history table.
-- Determines V42 needs to run.
-- Transaction A is still OPEN.
--
-- T3: Flyway opens Connection B (for the actual migration).
-- Executes: CREATE INDEX CONCURRENTLY idx_orders_email ...
-- PostgreSQL begins Phase 1: adds index to pg_index.
-- Phase 1 commits in its own mini-transaction.
--
-- T4: PostgreSQL begins Phase 2.
-- Requirement: wait for ALL transactions that were open at T3.
-- Transaction A was open at T3.
-- Phase 2 WAITS for Transaction A to commit or rollback.
--
-- T5: ... waiting ...
-- Transaction A will not commit until the migration completes.
-- The migration will not complete until Phase 2 finishes.
-- Phase 2 will not finish until Transaction A commits.
--
-- T∞: Nothing happens. Ever. The application is hung.
The circular dependency is established at T4 and can never be broken by the participants. Transaction A will not commit until the migration (on Connection B) returns. Connection B will not return until Phase 2 completes. Phase 2 will not complete until Transaction A commits. This is a cycle, but it is invisible to PostgreSQL's deadlock detector, because one edge of it runs through the Flyway client rather than through the server's lock graph: Transaction A's backend is idle, not waiting on any lock the server can see.
Diagnosing the hang: what to look for
When your application is stuck, connect to PostgreSQL from a separate session and run the diagnostic query:
-- When your Spring Boot app hangs at startup, run this
-- from a separate psql session to see what is happening.
SELECT
pid,
state,
wait_event_type,
wait_event,
left(query, 80) AS query_snippet,
age(clock_timestamp(), xact_start) AS xact_duration,
age(clock_timestamp(), query_start) AS query_duration
FROM pg_stat_activity
WHERE datname = 'myapp'
AND pid != pg_backend_pid()
ORDER BY xact_start;
-- You will see something like this:
--
-- pid  | state               | wait_event_type | wait_event | query_snippet
-- -----+---------------------+-----------------+------------+---------------------------------
-- 1842 | idle in transaction | Client          | ClientRead | SELECT ... flyway_schema_history
-- 1843 | active              | Lock            | virtualxid | CREATE INDEX CONCURRENTLY ...
--
-- pid 1843 is your CREATE INDEX CONCURRENTLY, waiting on a virtual
-- transaction lock. pid 1842 is the Flyway advisory lock transaction,
-- idle while Flyway waits for the migration on the other connection.
-- 1843 is waiting for 1842's transaction to finish.
-- 1842 won't finish until 1843 finishes.
-- Neither will ever finish. This is your hang.
You are looking for two backends. One is idle in a transaction (that is Flyway, holding its advisory lock). The other is active with a CREATE INDEX CONCURRENTLY query, waiting on a virtualxid lock. The second backend is waiting for the first backend's transaction to end. The first backend is waiting for the second backend's statement to return.
For a more detailed view of the lock dependency chain:
-- Deeper lock analysis: see the actual lock dependency
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked
ON blocked.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking
ON blocking.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
-- In the Flyway + CONCURRENTLY case, this shows:
--
-- blocked_pid | blocked_query                            | blocking_pid | blocking_query
-- ------------+------------------------------------------+--------------+----------------------------
-- 1843        | CREATE INDEX CONCURRENTLY idx_orders...  | 1842         | SELECT ... (Flyway txn)
--
-- The CREATE INDEX CONCURRENTLY needs ALL concurrent transactions
-- to terminate. The Flyway advisory lock holds a transaction open.
-- A classic circular wait — but PostgreSQL's deadlock detector cannot
-- see it, because the blocking backend is not waiting on anything:
-- it is idle in transaction, blocked only on its own client.
The virtualxid lock type is the signature. Regular lock contention between queries produces relation or transactionid lock types. The virtualxid wait is specific to operations that need to observe a consistent snapshot across all backends — and CREATE INDEX CONCURRENTLY's Phase 2 is one of the few operations that does this.
-- The specific lock that causes the hang:
-- virtualxid (Virtual Transaction ID)
-- Every PostgreSQL backend has a virtualxid, even idle ones.
-- CREATE INDEX CONCURRENTLY must wait for all virtualxids that
-- were active when it started Phase 1 to be released.
-- Check which virtualxids are blocking:
SELECT
l.virtualxid AS blocking_vxid,
l.pid AS blocking_pid,
left(a.query, 60) AS query,
a.state,
age(clock_timestamp(), a.xact_start) AS txn_age
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype = 'virtualxid'
AND l.granted = true
AND l.pid != pg_backend_pid()
ORDER BY a.xact_start;
-- The Flyway connection will show as "idle in transaction"
-- with a long txn_age. That is your culprit.
Using pg_stat_progress_create_index
If you are running PostgreSQL 12 or later, there is a more direct way to see exactly what the index build is doing:
-- Monitor the progress of a CREATE INDEX CONCURRENTLY build
-- (PostgreSQL 12+ only — uses pg_stat_progress_create_index)
SELECT
p.pid,
p.datname,
p.command,
p.phase,
p.lockers_total,
p.lockers_done,
p.blocks_total,
p.blocks_done,
CASE WHEN p.blocks_total > 0
THEN round(100.0 * p.blocks_done / p.blocks_total, 1)
ELSE 0
END AS pct_complete,
a.query
FROM pg_stat_progress_create_index p
JOIN pg_stat_activity a ON a.pid = p.pid;
-- The 'phase' column tells you exactly where the build is:
-- 'initializing' — just started
-- 'waiting for writers before build' — Phase 2 wait (THIS IS THE HANG)
-- 'building index' — scanning the table
-- 'waiting for writers before validation' — Phase 3 wait
-- 'index validation: scanning index' — validating the build
-- 'waiting for old snapshots' — final wait
-- 'waiting for readers before marking dead' — cleanup
--
-- If phase = 'waiting for writers before build' and lockers_done
-- is stuck at a value less than lockers_total, you have the hang.
-- The lockers_total - lockers_done backends are the ones keeping
-- transactions open that the index build is waiting for.
The phase column is definitive. If it reads waiting for writers before build and the lockers_done count is stuck below lockers_total, you have the Flyway hang. The difference between those two numbers tells you how many backends are holding transactions that the index build is waiting for. In the Flyway case, it will typically be exactly one — the Flyway advisory lock transaction.
This view was added in PostgreSQL 12 specifically to provide visibility into long-running index builds. Before PostgreSQL 12, the only diagnostic was pg_stat_activity and pg_locks, which required more interpretation. If you are still on PostgreSQL 11 or earlier, I would gently encourage an upgrade — the observability improvements alone are worth it, and this is far from the only situation where pg_stat_progress_* views save hours of debugging.
Why PostgreSQL's deadlock detector cannot help
PostgreSQL has a deadlock detector that runs every deadlock_timeout interval (default: 1 second). It looks for cycles in the lock wait graph: A waits for B, B waits for A. When it finds one, it cancels the youngest transaction to break the cycle.
This detector does not help here, because there is no cycle in the server's lock wait graph for it to find. The CREATE INDEX backend is waiting on the Flyway transaction's virtualxid, but the Flyway backend is not waiting on any lock at all: from PostgreSQL's perspective it is simply idle in transaction, waiting for its client to send the next command. The edge that closes the cycle lives inside the Flyway process, where the migration thread is blocked on a statement running over a different connection. A wait that happens in client code is invisible to the server's deadlock detection.
The practical consequence: no error, no log message, no automatic resolution. Your application will hang until you kill it or until you manually terminate one of the PostgreSQL backends with pg_terminate_backend(). This is why the issue is so insidious — there is no indication that anything is wrong unless you know to look for it.
I should note that this is not a limitation unique to this scenario. The virtualxid wait pattern can cause similar invisible hangs in other contexts — for example, if a CREATE INDEX CONCURRENTLY runs while a long-running reporting query holds a transaction open, or while a forgotten BEGIN in a psql session has been sitting idle for hours. The Flyway case is simply the most common trigger because it happens automatically at application startup, affects every deployment, and occurs at precisely the moment when you are watching the logs most carefully and expecting them to tell you something.
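Those long-lived transactions are easy to spot ahead of time. A sketch using pg_stat_activity, with a 10-minute threshold chosen arbitrarily:

```sql
-- Any transaction open this long would stall a concurrent
-- index build's Phase 2, whatever opened it.
SELECT pid, usename, state,
       age(clock_timestamp(), xact_start) AS txn_age,
       left(query, 60) AS last_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND xact_start < clock_timestamp() - interval '10 minutes'
ORDER BY xact_start;
```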
The PostgreSQL documentation for CREATE INDEX CONCURRENTLY does mention this requirement: "the index build must wait for any existing transactions that could potentially modify or use the index to terminate." But it does not warn about migration tool advisory locks specifically, because from PostgreSQL's perspective, a transaction is a transaction. It does not care what opened it.
There is a reasonable argument that PostgreSQL could add a timeout parameter to CREATE INDEX CONCURRENTLY — something like CREATE INDEX CONCURRENTLY ... WITH (wait_timeout = '30s') — that would cause the build to fail rather than wait indefinitely. This has been discussed on the pgsql-hackers mailing list. As of PostgreSQL 17, no such parameter exists. The statement_timeout setting does apply to CREATE INDEX CONCURRENTLY, but setting it broadly affects all statements, which is rarely what you want. And if you set it per-migration, you need to calculate how long the index build should take, which varies with table size, I/O throughput, and concurrent load.
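If you prefer a bounded failure to an unbounded hang, the session-level form of statement_timeout can be set inside the migration script itself. A sketch, not a fix: under Flyway's open advisory lock transaction the build still cannot succeed — it merely errors out after the timeout instead of hanging forever, likely leaving an invalid index to drop.

```sql
-- V42__add_orders_email_index.sql (sketch: fail fast, don't hang)
-- Session-level setting; applies only to this migration's connection.
SET statement_timeout = '30min';

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_email
ON orders (email);
```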
Emergency recovery: your application is hung right now
If you have arrived at this article because your application is currently stuck and you need to get unstuck before understanding the root cause, here is the procedure:
-- Emergency recovery: your app is hung RIGHT NOW.
-- Here is how to get unstuck without restarting PostgreSQL.
-- Step 1: Find the stuck backends
SELECT pid, state, query, age(clock_timestamp(), xact_start) AS duration
FROM pg_stat_activity
WHERE datname = 'myapp'
AND state != 'idle'
ORDER BY xact_start;
-- Step 2: Identify the CREATE INDEX CONCURRENTLY backend
-- It will show state = 'active' with the CREATE INDEX query.
-- Note its pid.
-- Step 3: Cancel the index build (graceful)
SELECT pg_cancel_backend(1843); -- Replace 1843 with actual pid
-- pg_cancel_backend sends a SIGINT — it cancels the current query
-- but keeps the connection alive. The CREATE INDEX CONCURRENTLY
-- will fail, and Flyway will receive an error.
-- Step 4: If pg_cancel_backend does not work (rare), terminate it
SELECT pg_terminate_backend(1843); -- Kills the connection entirely
-- Step 5: Check for an invalid index left behind
SELECT c.relname AS indexname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE c.relname = 'idx_orders_email';
-- If indisvalid = false, drop the invalid index:
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_email;
-- Step 6: Clean up Flyway's state
-- If the migration was partially recorded, you may need to
-- remove the entry from flyway_schema_history:
DELETE FROM flyway_schema_history
WHERE version = '42'
AND success = false;
-- Now restart your application with the correct workaround in place.
After cancelling the index build, your Flyway migration will fail with an error, and the Spring Boot application will either report a startup failure or retry the migration depending on your configuration. That is fine — the immediate goal is to unblock the deployment. Apply one of the workarounds described below before deploying again.
If you are in a Kubernetes environment and the pod is stuck, you can also simply delete the pod. Kubernetes will schedule a new one, and the new pod will attempt the same migration and hang again — but at least the old pod is gone and its database connections are released. This buys you time to apply a fix, though it does not solve the problem.
I should be direct: if you have multiple application instances attempting to start simultaneously and all of them hit this migration, you may have multiple hung backends consuming database connections. In a connection-limited environment (say, a managed PostgreSQL instance with a 100-connection limit and 10 application pods each with a 10-connection HikariCP pool), those stuck connections can exhaust your pool and block even healthy application instances from connecting. In that scenario, terminating the stuck backends becomes urgent — not just to fix the migration but to restore database connectivity for the rest of your infrastructure.
Five workarounds, ranked
There is no configuration setting in Flyway Community edition that solves this. The executeInTransaction=false per-migration directive exists only in Flyway Teams and Enterprise (paid editions). For the free edition, you need a workaround.
| Approach | Flyway edition | Complexity | Idempotent | Risk | Notes |
|---|---|---|---|---|---|
| Flyway callback (canHandleInTransaction=false) | Community | Medium | Yes (IF NOT EXISTS) | Low | Recommended. Runs after migration, outside transaction. |
| executeInTransaction=false directive | Teams/Enterprise | Low | Yes (IF NOT EXISTS) | Low | Cleanest solution, but requires paid Flyway. |
| ApplicationRunner bean | Any | Low | Yes (IF NOT EXISTS) | Medium | Runs at app startup, after Flyway. No migration tracking. |
| External shell script | Any | Low | Yes (IF NOT EXISTS) | Low | Runs in deployment pipeline. Decoupled from app lifecycle. |
| Regular CREATE INDEX (without CONCURRENTLY) | Community | None | Yes (IF NOT EXISTS) | High | Acquires ACCESS EXCLUSIVE lock. Blocks all reads and writes. |
| Skip Flyway entirely for DDL | Any | High | Varies | Medium | Manage schema with Liquibase, pgAdmin, or manual SQL. |
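For completeness, the Teams/Enterprise directive in the second row is expressed as a per-script configuration file placed alongside the migration. This is my understanding of the paid feature; verify the syntax against your Flyway version's documentation:

```
# V42__add_orders_email_index.sql.conf (Flyway Teams/Enterprise)
executeInTransaction=false
```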
The recommended approach for most Spring Boot teams is the Flyway callback. It integrates with the migration lifecycle, runs after migration completion (so the advisory lock transaction is already committed), and requires no paid license.
Each one deserves a thorough examination.
Workaround 1: Flyway callback with canHandleInTransaction=false
Flyway's callback system lets you hook into migration lifecycle events. The AFTER_MIGRATE event fires after all versioned migrations have been applied and the Flyway transaction has committed. By returning false from canHandleInTransaction(), you tell Flyway to execute your callback outside of any transaction — exactly what CREATE INDEX CONCURRENTLY requires.
import org.flywaydb.core.api.callback.BaseCallback;
import org.flywaydb.core.api.callback.Context;
import org.flywaydb.core.api.callback.Event;
import java.sql.Connection;
import java.sql.Statement;
// Workaround: run CREATE INDEX CONCURRENTLY outside Flyway's transaction
// by using a Flyway callback with a separate connection.
public class ConcurrentIndexCallback extends BaseCallback {
@Override
public boolean supports(Event event, Context context) {
return event == Event.AFTER_MIGRATE;
}
@Override
public boolean canHandleInTransaction(Event event, Context context) {
// This is the critical line. Returning false tells Flyway
// to execute this callback outside of any transaction.
return false;
}
@Override
public void handle(Event event, Context context) {
try (Connection conn = context.getConfiguration()
.getDataSource().getConnection()) {
conn.setAutoCommit(true); // No transaction wrapping
try (Statement stmt = conn.createStatement()) {
stmt.execute(
"CREATE INDEX CONCURRENTLY IF NOT EXISTS "
+ "idx_orders_email ON orders (email)"
);
}
} catch (Exception e) {
throw new RuntimeException(
"Failed to create concurrent index", e
);
}
}
}
Register the callback in your Spring Boot configuration:
# Register the callback in application.yml
spring:
flyway:
callbacks: com.example.migration.ConcurrentIndexCallback
# Or register it programmatically in a @Configuration class:
# @Bean
# public FlywayMigrationStrategy flywayStrategy() {
# return flyway -> Flyway.configure()
# .configuration(flyway.getConfiguration())
# .callbacks(new ConcurrentIndexCallback())
# .load()
# .migrate();
# }
The IF NOT EXISTS clause makes this idempotent. The callback runs on every application startup, but if the index already exists, the statement is a no-op. No harm in running it repeatedly.
The downside: the index creation is not tracked in Flyway's schema history table. If you need to know whether the index was created successfully, you need to check pg_indexes directly. For most teams, this is an acceptable trade-off.
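That check is a short catalog query, consistent with the diagnostics earlier:

```sql
-- Did the callback's index land, and is it valid?
SELECT c.relname AS indexname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE c.relname = 'idx_orders_email';
```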
A production-ready version with logging and validation
The basic callback above works, but for production use I would recommend a version that logs timing, handles multiple indexes, and verifies the results:
import org.flywaydb.core.api.callback.BaseCallback;
import org.flywaydb.core.api.callback.Context;
import org.flywaydb.core.api.callback.Event;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
public class ConcurrentIndexCallback extends BaseCallback {
private static final Logger log =
LoggerFactory.getLogger(ConcurrentIndexCallback.class);
// Define all concurrent indexes here. Single source of truth.
private static final List<String> INDEX_STATEMENTS = List.of(
"CREATE INDEX CONCURRENTLY IF NOT EXISTS "
+ "idx_orders_email ON orders (email)",
"CREATE INDEX CONCURRENTLY IF NOT EXISTS "
+ "idx_orders_created_at ON orders (created_at)",
"CREATE INDEX CONCURRENTLY IF NOT EXISTS "
+ "idx_line_items_order_id ON line_items (order_id)"
);
@Override
public boolean supports(Event event, Context context) {
return event == Event.AFTER_MIGRATE;
}
@Override
public boolean canHandleInTransaction(Event event, Context context) {
return false; // Critical: run outside any transaction
}
@Override
public void handle(Event event, Context context) {
try (Connection conn = context.getConfiguration()
.getDataSource().getConnection()) {
conn.setAutoCommit(true);
for (String sql : INDEX_STATEMENTS) {
log.info("Creating concurrent index: {}", sql);
long start = System.currentTimeMillis();
try (Statement stmt = conn.createStatement()) {
stmt.execute(sql);
}
long elapsed = System.currentTimeMillis() - start;
log.info("Index created in {}ms", elapsed);
}
// Verify no invalid indexes were left behind
verifyIndexes(conn);
} catch (Exception e) {
log.error("Failed to create concurrent indexes", e);
throw new RuntimeException(
"Concurrent index creation failed", e
);
}
}
private void verifyIndexes(Connection conn) throws Exception {
String checkSql = "SELECT indexname FROM pg_indexes i "
+ "JOIN pg_index pi ON pi.indexrelid = "
+ "(SELECT oid FROM pg_class WHERE relname = i.indexname) "
+ "WHERE NOT pi.indisvalid";
try (Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(checkSql)) {
List<String> invalid = new ArrayList<>();
while (rs.next()) {
invalid.add(rs.getString("indexname"));
}
if (!invalid.isEmpty()) {
log.warn("Invalid indexes detected: {}", invalid);
}
}
}
}
The logging serves two purposes. First, it provides visibility into how long index creation takes, which is useful for capacity planning and deployment timing. Second, it creates an audit trail — if an index build fails or produces an invalid index, the log tells you exactly when and which statement was involved.
The verification step at the end checks for invalid indexes across the entire database, not just the indexes this callback created. This is deliberate. If another process left behind an invalid index — a previous failed deployment, a manual CREATE INDEX CONCURRENTLY that was interrupted, or a concurrent build that ran out of disk space — you want to know about it. Invalid indexes are one of those silent operational hazards that can persist for months without anyone noticing.
Why AFTER_MIGRATE and not BEFORE_MIGRATE
You might wonder whether using BEFORE_MIGRATE instead of AFTER_MIGRATE would sidestep the issue entirely — create the indexes before Flyway opens its advisory lock transaction. It would not. The BEFORE_MIGRATE callback fires after Flyway has already acquired the advisory lock. The transaction is already open. You would be in exactly the same situation.
The only Flyway event that fires before the advisory lock is acquired is BEFORE_CONNECT, and at that point you do not have a database connection to work with. AFTER_MIGRATE is the correct event because by the time it fires, the migration transaction has committed, the advisory lock has been released, and the database is in a clean state for your concurrent index build.
Workaround 2: ApplicationRunner bean
If you prefer keeping your index creation in Java without involving Flyway's callback system at all, a Spring Boot ApplicationRunner bean will execute after the entire application context — including Flyway — has finished initializing.
// Spring Boot @Configuration — manual concurrent index creation
// Runs after Flyway migration completes and the app context loads.
@Configuration
public class ConcurrentIndexConfig {
@Bean
public ApplicationRunner createConcurrentIndexes(DataSource dataSource) {
return args -> {
try (Connection conn = dataSource.getConnection()) {
conn.setAutoCommit(true);
try (Statement stmt = conn.createStatement()) {
// IF NOT EXISTS makes this idempotent
stmt.execute(
"CREATE INDEX CONCURRENTLY IF NOT EXISTS "
+ "idx_orders_email ON orders (email)"
);
stmt.execute(
"CREATE INDEX CONCURRENTLY IF NOT EXISTS "
+ "idx_orders_created_at ON orders (created_at)"
);
}
}
};
}
}

The key detail is conn.setAutoCommit(true). Without this, the JDBC connection may be in an implicit transaction, and you are back to the same problem. AutoCommit mode ensures each statement runs as its own top-level command, which is what PostgreSQL requires for CREATE INDEX CONCURRENTLY.
This approach has a timing subtlety: the indexes are created after the application is already accepting traffic. If your application receives requests that would benefit from the index during the brief window while it is being built, those queries run without it. For most applications, this window is short enough to be irrelevant. For high-traffic applications where the index is critical for query performance, consider the external script approach, sequenced in your pipeline so the index build completes before the new version starts receiving traffic.
Connection pool considerations
There is a subtlety with the ApplicationRunner approach that deserves attention. The connection obtained from dataSource.getConnection() comes from HikariCP's pool. CREATE INDEX CONCURRENTLY on a large table can take minutes. During that time, the connection is occupied and unavailable to serve application requests. If your pool is small (say, 5 connections) and the index build takes 3 minutes, you are running on 4 connections for those 3 minutes.
For most applications, this is fine. But if your application serves high-concurrency traffic and the pool is already right-sized to its workload, losing a connection for several minutes during startup may cause connection timeout errors for incoming requests. In that case, either increase the pool size by one to account for the index build, or use the external script approach to decouple the index creation from the application's connection pool entirely.
There is also the question of what happens if the ApplicationRunner bean fails. By default, a failed ApplicationRunner causes the Spring Boot application to fail to start. This is the behavior you want — if the index creation fails, you want to know immediately, not discover it in production when queries are slow. But if you prefer the application to start even when index creation fails (perhaps you have a retry mechanism elsewhere), wrap the runner body in a try-catch and log the failure rather than rethrowing.
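The non-fatal variant described above can be sketched as a small helper. The class and method names are illustrative, not Spring API:

```java
// Hypothetical helper: run the index-creation body non-fatally.
// A failure is reported (here: printed) instead of propagating,
// so the application context still finishes starting.
public class NonFatalRunner {

    /** Returns true if the body completed, false if it threw. */
    public static boolean runSafely(Runnable body) {
        try {
            body.run();
            return true;
        } catch (RuntimeException e) {
            // In a real application this would go through your logger.
            System.err.println("Index creation failed (non-fatal): " + e.getMessage());
            return false;
        }
    }
}
```

Inside the ApplicationRunner lambda, wrap each `stmt.execute(...)` call in `runSafely`, and the application starts regardless of the outcome; the log records whether the build succeeded.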
Workaround 3: external deployment script
The simplest approach, conceptually, is to keep CREATE INDEX CONCURRENTLY out of Flyway entirely and run it from your deployment pipeline.
#!/bin/bash
# run_concurrent_indexes.sh
# Execute concurrent index creation AFTER Flyway migration completes.
# Call this from your deployment pipeline, after the Spring Boot app starts.
set -euo pipefail
DB_URL="postgresql://app_user:${DB_PASSWORD}@localhost:5432/myapp"
echo "Creating concurrent indexes..."
psql "$DB_URL" -v ON_ERROR_STOP=1 <<'SQL'
-- These run outside any transaction, as bare statements.
-- psql's default autocommit mode is exactly what we need.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_email
ON orders (email);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_created_at
ON orders (created_at);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_line_items_order_id
ON line_items (order_id);
SQL
echo "Concurrent indexes created."

This runs after your application has started (and Flyway has completed), or as a separate deployment step. The psql session runs in autocommit mode by default, so each statement executes as its own transaction — no conflict with anything.
The advantage: complete decoupling from Flyway and the application lifecycle. The disadvantage: you now have index creation logic in two places (Flyway for regular indexes, shell scripts for concurrent ones). This is manageable for a handful of indexes but becomes a coordination problem at scale.
Integrating with CI/CD pipelines
In a typical deployment pipeline — GitHub Actions, GitLab CI, Jenkins, or ArgoCD — the index creation script runs as a post-deploy step. The sequence looks like this:
1. Build the application artifact (JAR, Docker image)
2. Deploy to the target environment
3. Wait for the health check to pass (Flyway migration is complete, application is serving traffic)
4. Run the concurrent index script against the production database
5. Verify indexes were created successfully
Step 3 is important. The index script must not run while Flyway is still migrating — that would recreate the very conflict we are trying to avoid. Waiting for the health check ensures Flyway has finished and released its advisory lock before the index build begins.
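The wait in step 3 can be a small standalone utility rather than a shell loop. A sketch using the JDK's built-in HTTP client, assuming Spring Boot's default /actuator/health endpoint (adjust the path for your setup):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;

// Poll the application's health endpoint until it answers 200 OK
// or the deadline passes. A 200 means startup (and therefore the
// Flyway migration, including advisory-lock release) is complete.
public class HealthGate {

    public static boolean waitForHealthy(URI healthUrl, Duration timeout)
            throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(healthUrl)
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            try {
                HttpResponse<Void> resp =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                if (resp.statusCode() == 200) {
                    return true; // safe to start the concurrent index build
                }
            } catch (Exception e) {
                // Connection refused while the app is still booting: keep polling.
            }
            Thread.sleep(1000);
        }
        return false; // deadline passed; do NOT run the index script
    }
}
```

Only when this returns true should the pipeline invoke the index script; a false return should fail the deploy step rather than let the script race the migration.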
Step 5 is also important, and often omitted. The IF NOT EXISTS guard means the script will succeed even if an index was partially created and left in an invalid state. A success exit code from psql does not mean the index is valid — only that the statement did not error. Always verify with the invalid index check query.
For teams that use database migration tooling like gh-ost or pt-online-schema-change for MySQL, the external script approach will feel natural. Those tools have always operated outside the application lifecycle, running as standalone processes against the database. The same pattern works well for PostgreSQL's concurrent index builds.
What about Flyway Teams' executeInTransaction=false?
If your organization has a Flyway Teams or Enterprise license, the cleanest solution is the per-migration executeInTransaction=false directive:
-- R__create_concurrent_indexes.sql
-- Repeatable migration (R__ prefix) — Flyway runs this every time
-- its checksum changes. But this STILL runs inside a transaction
-- by default. You need executeInTransaction=false in the config.
-- Option A: Flyway Teams/Enterprise (paid)
-- Add to the SQL file header:
-- flyway:executeInTransaction=false
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_email
ON orders (email);
-- Option B: Flyway Community (free)
-- This directive is ignored. You cannot disable per-migration
-- transactions in Flyway Community edition.
-- Use the callback approach or external script instead.

This tells Flyway to run the migration SQL outside of a transaction entirely. No advisory lock conflict. No virtualxid wait. The statement runs exactly as it would from a bare psql session.
The directive is parsed from a SQL comment in the migration file header. Flyway Community silently ignores it. This is the single most common source of confusion around this issue: developers add the directive, test in their local environment (which may have a Teams trial), and it works. Then it deploys to production with a Community license, the directive is ignored, and the application hangs.
Check your Flyway edition. Then check it again.
I should be specific about what "silently ignores" means. Flyway Community does not log a warning when it encounters executeInTransaction=false in a migration file. It does not throw an error. It does not print "this directive requires Flyway Teams." It parses the comment and discards it. The migration runs inside a transaction as if the directive were not there. This is, in my considered opinion, a design choice that has caused more operational incidents than any other single aspect of Flyway's behavior. A warning log at migration time — "executeInTransaction=false requires Flyway Teams; this migration will run inside a transaction" — would have prevented thousands of startup hangs across the industry. It remains absent.
Verifying your Flyway edition programmatically
If you want to be absolutely certain which edition of Flyway is active in your application, you can check at startup:
// Inject the Spring-managed Flyway bean (auto-configured by Spring
// Boot) and log the version it reports at startup:
@Autowired
private Flyway flyway;

@PostConstruct
void logFlywayVersion() {
    String version = flyway.info().getInfoResult().flywayVersion;
    log.info("Flyway version: {}", version);
}
// Community editions show version like "9.22.3"
// Teams/Enterprise show version like "9.22.3-teams" or include
// additional info in the Flyway banner at startup.
// Check for the presence of "teams" or "enterprise" in the banner
// output or the license information in Flyway's configuration.

This will not prevent the hang — but it will at least give you visibility into which edition is running, so you can avoid the "it worked in dev" surprise.
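If you log the version string, a small heuristic can flag the edition at the same time. The suffix format is an assumption on my part; confirm it against your own Flyway banner output:

```java
// Heuristic check on the reported Flyway version string. The exact
// Teams/Enterprise suffix format is an assumption; treat a "false"
// here as "probably Community" rather than a guarantee.
public class FlywayEdition {

    public static boolean looksLikePaidEdition(String version) {
        String v = version == null ? "" : version.toLowerCase();
        return v.contains("teams") || v.contains("enterprise");
    }
}
```

Paired with the version log above, this lets you emit a loud warning at startup if a migration relies on executeInTransaction=false while the edition looks like Community.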
Workaround 5: dropping CONCURRENTLY (and when it is defensible)
I have listed this as the highest-risk option in the comparison table, and I stand by that assessment for production databases. But I would be remiss if I did not acknowledge the situations where it is the correct choice.
A regular CREATE INDEX (without CONCURRENTLY) acquires an ACCESS EXCLUSIVE lock on the table. This blocks all reads and writes — every SELECT, every INSERT, every UPDATE, every DELETE — until the index build completes. On a table with millions of rows, that can be minutes of complete unavailability for that table.
But consider: during a deployment with a maintenance window, when the application is not serving traffic and no other process is reading the table, an ACCESS EXCLUSIVE lock is harmless. There is no one to block. The index builds faster without the overhead of concurrent snapshot management (no two-phase scan, no waiting for other transactions). And it runs perfectly inside Flyway's transaction, so if it fails, the entire migration rolls back cleanly.
For development and staging environments, regular CREATE INDEX is almost always the right choice. There is no traffic to block, and the transactional safety of Flyway's migration wrapper is genuinely valuable. Save CONCURRENTLY for production, where the non-blocking behavior is actually needed.
The danger is when teams deploy the same migration file to all environments. The migration that works flawlessly in staging (regular CREATE INDEX inside Flyway's transaction) hangs in production (where CONCURRENTLY is needed but conflicts with Flyway). Or the migration that uses the callback workaround for production (to support CONCURRENTLY) leaves staging without the index because nobody ran the callback manually. This is a coordination problem, and it has no perfect solution within Flyway's current architecture.
My recommendation: use the callback approach consistently across all environments. The overhead of IF NOT EXISTS checks on every startup is negligible. The index builds slightly slower in staging (because CONCURRENTLY has more overhead than a regular build). But the behavior is identical across environments, and you will never discover a discrepancy at deployment time.
The bigger problem: why migration tools and DDL do not mix well
This issue is not unique to Flyway. Every migration tool that wraps DDL in transactions creates potential conflicts with PostgreSQL operations that cannot run inside transactions.
-- How other migration tools handle CREATE INDEX CONCURRENTLY:
-- Rails (ActiveRecord)
-- Uses disable_ddl_transaction! to skip the wrapping transaction.
-- This actually works. Rails sets it at the migration level.
--
-- class AddEmailIndexToOrders < ActiveRecord::Migration[7.0]
-- disable_ddl_transaction!
--
-- def change
-- add_index :orders, :email, algorithm: :concurrently
-- end
-- end
-- Django
-- Uses atomic = False on the migration class.
-- Also works correctly — Django respects this flag.
--
-- class Migration(migrations.Migration):
-- atomic = False
-- operations = [
-- AddIndexConcurrently(
-- model_name='order',
-- index=models.Index(fields=['email'],
-- name='idx_orders_email'),
-- ),
-- ]
-- Alembic (SQLAlchemy)
-- Uses op.get_context().autocommit_block() to step outside the
-- migration transaction for the duration of the statement.
--
-- def upgrade():
--     with op.get_context().autocommit_block():
--         op.execute(
--             "CREATE INDEX CONCURRENTLY IF NOT EXISTS "
--             "idx_orders_email ON orders (email)"
--         )
-- Liquibase
-- Uses runInTransaction="false" on the changeSet.
-- Similar to Flyway Teams — the attribute exists, but Liquibase's
-- own lock mechanism can still interfere in some configurations.

Rails handles this most gracefully. The disable_ddl_transaction! directive actually disables the wrapping transaction for that migration. There is no silent ignoring, no edition-gating. Django's atomic = False is similarly explicit and reliable. Alembic's autocommit_block() is less discoverable than a migration-level flag, but it works correctly once you know it exists.
Liquibase occupies a similar position to Flyway. Its runInTransaction="false" attribute exists but interacts with Liquibase's own locking mechanism (DATABASECHANGELOGLOCK table) in ways that can produce the same kind of hang. The Liquibase community has documented this extensively, and the recommended workaround is similar: use a custom change type that manages its own connection.
The fundamental tension is this: migration tools want transactional safety. They want to roll back a failed migration cleanly. But CREATE INDEX CONCURRENTLY is explicitly designed to operate outside transactions. These two requirements are incompatible, and every migration tool handles the incompatibility differently — some gracefully, some not.
"Java and Spring Boot bring their own opinions about how to manage database interactions. Understanding where those opinions align with PostgreSQL's strengths — and where they quietly work against them — is the foundation of a properly attired application."
— from You Don't Need Redis, Chapter 10: Java & Spring Boot: Properly Attired
Beyond index creation: other DDL that conflicts
For PostgreSQL specifically, the lock contention patterns created by DDL statements inside migration transactions extend beyond index creation. Several other operations conflict with long-running migration transactions:
- REINDEX CONCURRENTLY — Introduced in PostgreSQL 12, this has the same Phase 2 wait requirement as CREATE INDEX CONCURRENTLY. If you are rebuilding a bloated index inside a Flyway migration using the CONCURRENTLY option, you will get the same hang.
- ALTER TABLE ... ALTER COLUMN TYPE — This rewrites the entire table and takes an ACCESS EXCLUSIVE lock. While it works inside a transaction (unlike CONCURRENTLY), the long hold time of the ACCESS EXCLUSIVE lock while the table is being rewritten blocks all other access. Combined with Flyway's already-open advisory lock transaction, you can end up with a cascade of blocked queries.
- VACUUM — Cannot run inside a transaction block at all. If you need to vacuum a table as part of a migration (rare, but it happens after large deletes), it must be done outside Flyway.
- CLUSTER — Rewrites the table in physical order according to an index. Takes an ACCESS EXCLUSIVE lock and cannot run inside a transaction with other statements.
The general principle: keep your migration transactions short. Run DDL that requires special transaction handling outside the migration framework. Track it with IF NOT EXISTS guards rather than migration version numbers. And always verify the results.
How large is your table? Estimating index build time
Before applying any workaround, it helps to know how long the index build itself will take — independent of the Flyway conflict. If you are building an index on a 500-row development table, the answer is "instantly" and the workaround is academic. If you are building an index on a 500-million-row production table, the answer is "a while" and the choice of workaround matters significantly.
-- Estimate how long a concurrent index build will take
-- by checking table size and row count.
SELECT
pg_size_pretty(pg_total_relation_size('orders')) AS total_size,
pg_size_pretty(pg_relation_size('orders')) AS table_size,
reltuples::bigint AS estimated_rows,
pg_size_pretty(
    pg_relation_size('orders') / NULLIF(reltuples::bigint, 0)
) AS avg_row_size
FROM pg_class
WHERE relname = 'orders';
-- Rough guidelines for B-tree index build time (SSD storage):
-- < 1M rows: seconds
-- 1M - 10M: 10-60 seconds
-- 10M - 100M: 1-10 minutes
-- 100M - 1B: 10-60 minutes
-- > 1B: hours (consider partitioning first)
--
-- These are for the build itself. The Flyway hang adds
-- infinite time on top of that.

These estimates assume SSD storage and a system that is not under heavy I/O pressure. On spinning disks, multiply by 3-5x. Under heavy concurrent write load, add 20-50% for the second scan phase, which must process all changes that occurred during the first scan. On a system with high shared_buffers and sufficient maintenance_work_mem, the build may be faster; on a constrained system, slower.
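If you want those guidelines available in code, say to log an expected duration before a callback starts a build, a rough lookup might read as follows. The numbers mirror the guideline table above; they are orders of magnitude on SSD, not measurements:

```java
import java.time.Duration;

// Back-of-envelope B-tree build-time estimate keyed on row count,
// transcribing the guideline bands above. Treat the result as a
// planning hint for logs and timeouts, never as a promise.
public class IndexBuildEstimate {

    public static Duration roughBtreeBuildTime(long rows) {
        if (rows < 1_000_000L) return Duration.ofSeconds(10);
        if (rows < 10_000_000L) return Duration.ofMinutes(1);
        if (rows < 100_000_000L) return Duration.ofMinutes(10);
        if (rows < 1_000_000_000L) return Duration.ofMinutes(60);
        return Duration.ofHours(3); // consider partitioning first
    }
}
```

Logging the estimate next to the actual elapsed time (as the callback above already measures) tells you quickly when a table has outgrown its band.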
The maintenance_work_mem setting is particularly relevant. PostgreSQL uses this much memory for the sort phase of index building. The default is 64MB. For large index builds, increasing it to 512MB or 1GB can significantly reduce build time. But set it per-session, not globally — you do not want every autovacuum worker consuming 1GB of RAM.
-- Set maintenance_work_mem for the index build session only
SET maintenance_work_mem = '512MB';
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_email
ON orders (email);
-- Reset to default
RESET maintenance_work_mem;

If you are using the external script approach, you can set maintenance_work_mem directly in the psql session. If you are using the callback approach, set it on the JDBC connection before executing the CREATE INDEX statement.
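For the callback approach, the same session scoping can be expressed as a statement sequence to run on a single autocommit JDBC connection. The helper name is illustrative; the important detail is that SET and RESET are session-local, so all three statements must execute on the same connection:

```java
import java.util.List;

// Wrap a DDL statement in session-scoped maintenance_work_mem
// adjustments. Execute the returned statements in order, on one
// autocommit connection, e.g. via stmt.execute(sql) in a loop.
public class SessionMem {

    public static List<String> withMaintenanceWorkMem(String mem, String ddl) {
        return List.of(
            "SET maintenance_work_mem = '" + mem + "'", // session-local
            ddl,
            "RESET maintenance_work_mem"                // back to default
        );
    }
}
```

Iterating this list inside the callback's existing `for (String sql : ...)` loop keeps the memory bump confined to the index-build session, so autovacuum workers and other backends are unaffected.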
Verifying the fix: confirming your index was built correctly
After applying any workaround, verify that the index was created successfully and is valid. A CREATE INDEX CONCURRENTLY that is interrupted (by a crash, a killed process, or a timeout) can leave behind an invalid index — one that exists in the catalog but is not used by the query planner.
-- Check for invalid indexes
SELECT
    n.nspname AS schemaname,
    t.relname AS tablename,
    c.relname AS indexname,
    pg_get_indexdef(pi.indexrelid) AS indexdef
FROM pg_index pi
JOIN pg_class c ON c.oid = pi.indexrelid
JOIN pg_class t ON t.oid = pi.indrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT pi.indisvalid;
-- If your index appears here, it was not built successfully.
-- Drop it and recreate:
-- DROP INDEX CONCURRENTLY idx_orders_email;
-- CREATE INDEX CONCURRENTLY idx_orders_email ON orders (email);
-- A valid index will NOT appear in this query.
-- You can also check directly:
SELECT indisvalid
FROM pg_index
WHERE indexrelid = 'idx_orders_email'::regclass;
-- Should return: true

Invalid indexes are a silent hazard. PostgreSQL does not use them for queries, but it does maintain them during writes — so they cost storage and write performance while providing zero query benefit. Always verify after a concurrent index build, especially one that involved workarounds for transaction issues.
The invalid index problem deserves more attention than it typically receives. When CREATE INDEX CONCURRENTLY fails partway through — because you killed the hung application, because the database ran out of disk space, because a network partition interrupted the connection — PostgreSQL does not clean up the partial index. It leaves it in pg_index with indisvalid = false. The index is not used for queries, but it is updated on every write to the table. If you have three invalid indexes on a heavily-written table, you are paying the write overhead of three indexes while receiving zero query benefit. I have seen production systems with dozens of invalid indexes accumulated over months of failed deployments, each one adding a small but measurable write overhead that nobody thought to investigate.
The correct response to an invalid index is always the same: drop it and recreate it. DROP INDEX CONCURRENTLY IF EXISTS idx_orders_email removes the invalid index without blocking table access. Then recreate it with the proper workaround in place.
For a complete guide to PostgreSQL index types and their behavioral differences, the index types reference covers B-tree, GIN, GiST, BRIN, and hash indexes with storage and performance characteristics for each.
Honest counterpoints: when this advice falls short
I should be forthcoming about the situations where none of these workarounds are entirely satisfying.
Schema version tracking. The central value proposition of Flyway is that your database schema is version-controlled. Every DDL change has a version number, a checksum, and a record in flyway_schema_history. When you move index creation out of Flyway — whether to a callback, an ApplicationRunner, or an external script — you lose that tracking. You can check whether an index exists by querying pg_indexes, but you cannot easily answer "was this index created by migration V42 or by the callback? When was it last rebuilt? Did it succeed on the first attempt or the third?" The IF NOT EXISTS guard makes the operation idempotent, but idempotent is not the same as tracked.
For most teams, this is an acceptable trade-off. For teams with strict compliance requirements — SOC 2 audits that demand a complete history of every schema change, or regulated industries where you must prove that the production schema matches the migration history exactly — the loss of tracking is a real cost. In those environments, Flyway Teams with executeInTransaction=false is the only workaround that maintains full version tracking, which means the compliance requirement has a direct dollar cost attached to it.
Multi-instance deployments. If you run 20 pods and all of them start simultaneously, 20 instances of the callback or ApplicationRunner will attempt to create the same index at the same time. The IF NOT EXISTS guard handles this gracefully — 19 of the 20 will be no-ops. But in the brief window before any of them succeed, you have 20 backends competing for the same work. PostgreSQL handles this correctly (only one will actually build the index; the others will wait on a ShareLock and then find the index already exists), but the connection consumption during that window can be significant on a small database instance.
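One way to shrink that window is to elect a single builder with a PostgreSQL session-level advisory lock keyed on the index name; the 19 losers skip the work immediately instead of queueing behind it. Taken on an autocommit connection, a session advisory lock holds no snapshot, so it does not itself block the concurrent build. The key derivation below is an illustrative convention, not a Flyway or PostgreSQL API:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Derive a stable bigint key for pg_try_advisory_lock(bigint) from
// the index name, so every pod computes the same key independently.
public class IndexBuildLock {

    /** Stable 64-bit key (CRC32 of the index name). */
    public static long lockKey(String indexName) {
        CRC32 crc = new CRC32();
        crc.update(indexName.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** SQL that returns true in exactly one session at a time. */
    public static String tryLockSql(String indexName) {
        return "SELECT pg_try_advisory_lock(" + lockKey(indexName) + ")";
    }
}
```

The pod that gets true from the lock query runs the CREATE INDEX statements and then releases the lock with pg_advisory_unlock on the same key (closing the connection also releases session-level advisory locks). Pods that get false simply skip; the IF NOT EXISTS guard covers them on the next startup if the builder failed.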
Index creation during zero-downtime deployments. The callback and ApplicationRunner approaches create indexes while the application is already serving traffic. If the index is critical for a query that handles a high-traffic endpoint, you have a window — potentially minutes on a large table — where that endpoint runs without the index. The external script approach avoids this by decoupling the index creation from the application lifecycle, but it introduces a different window: between when the application deploys and when the script runs. Neither approach provides a fully uninterrupted transition from "no index" to "index" with zero impact on live traffic. CREATE INDEX CONCURRENTLY itself is non-blocking for queries, which helps, but the queries that would benefit from the index still run without it until it is built.
Flyway's position on this issue. I should acknowledge that Flyway's maintainers have a defensible perspective. Transactional migration execution is a core safety feature. The advisory lock prevents corruption from concurrent migrations. Relaxing these guarantees in the Community edition would expose users to data integrity risks that most migration users are not equipped to handle. The executeInTransaction=false directive in the paid edition is a reasonable business decision — advanced transaction control for advanced users who are willing to pay for it. One can disagree with this decision while acknowledging that it is not unreasonable.
That said, silently ignoring the directive in the Community edition — rather than logging a warning — is a design choice I wish they would revisit. The silence converts a configuration issue into an operational incident. A warning in the logs would cost nothing and save a great deal of confusion. I raise this not to score a point, but because every team debugging a hung migration deserves to know why it hung.
Prevention: patterns that avoid this problem entirely
If you are setting up a new Spring Boot project with Flyway and PostgreSQL, or if you are establishing migration conventions for a team, here are patterns that prevent the Flyway + CONCURRENTLY conflict from ever occurring.
Convention: Flyway handles structure, callbacks handle indexes
Establish a clear separation of concerns: Flyway versioned migrations handle table creation, column additions, constraint changes, and data migrations. Concurrent index creation is always done in a Flyway callback. This means your db/migration/ directory contains no CREATE INDEX CONCURRENTLY statements, ever. Indexes that can be created inside a transaction (small tables, development environments) use regular CREATE INDEX inside versioned migrations. Indexes that need concurrent creation for production safety use the callback.
Document this convention. Put it in the project's CONTRIBUTING.md. Add a code review checklist item. The first time a developer adds CREATE INDEX CONCURRENTLY to a Flyway migration, it will work in their local development environment (because the table is small and the build is instant, so the hang resolves before anyone notices). It will fail in staging or production. The convention prevents this by making the correct approach the default approach.
Convention: always use IF NOT EXISTS
Every CREATE INDEX statement — whether in a migration, a callback, or an external script — should include IF NOT EXISTS. This makes every index creation statement idempotent. If the index already exists, the statement is a no-op. If it does not exist, it is created. There is no downside to this guard and significant upside: callbacks can run on every startup without harm, scripts can be re-run without error, and partial failures can be retried without manual cleanup.
The IF NOT EXISTS clause was added to CREATE INDEX in PostgreSQL 9.5. If you are on an older version — well, you have larger concerns than Flyway compatibility, and I would encourage an upgrade at your earliest convenience.
Convention: separate DDL and DML migrations
Keep DDL changes (table structure, columns, constraints, indexes) in separate migration files from DML changes (data backfills, updates, inserts). This is good practice regardless of the Flyway issue, but it specifically helps here because it keeps migration transactions short. A migration that creates a table, adds three columns, backfills 50 million rows, and then creates an index is a single long-running transaction that holds locks for the entire duration. A migration that creates the table and columns (fast) followed by a separate migration that backfills data (slow, but only holds row-level locks) followed by a callback that creates the index (outside a transaction) is three short operations with minimal lock contention.
How Gold Lapel handles this differently
Gold Lapel is a self-optimizing PostgreSQL proxy. When it identifies that a table would benefit from an index — based on query patterns, sequential scan frequency, and filter selectivity — it creates the index automatically.
It uses CREATE INDEX CONCURRENTLY, because that is the correct approach for a production database. But it does so outside of any transaction context. Gold Lapel's index creation runs as a standalone operation on its own connection, with autocommit semantics. There is no wrapping transaction. No advisory lock. No migration framework holding open a transaction that blocks Phase 2.
The Flyway deadlock cannot occur because the conditions that produce it do not exist. There is no migration transaction to conflict with. There is no advisory lock to hold. The index build proceeds through all four phases without waiting on anything except the table scan itself.
Gold Lapel also monitors the index build's progress using pg_stat_progress_create_index and handles the failure case — if CREATE INDEX CONCURRENTLY fails partway through and leaves an invalid index, Gold Lapel drops it and retries. The invalid index hazard described above is handled automatically. And because Gold Lapel observes your actual query patterns, it creates indexes that your queries actually need — not indexes that a developer guessed would be useful six months ago in a migration file that has never been revisited.
For Spring Boot applications, this means you can remove index management from your Flyway migrations entirely. Let Flyway handle table schemas, constraints, and data migrations. Let Gold Lapel handle index creation. Each tool does what it was designed for, and the transactional conflict disappears.
I should be clear about the boundary of this recommendation. Gold Lapel handles performance indexes — the indexes created to make queries faster. It does not handle constraint indexes — unique indexes that enforce business rules, primary key indexes, or foreign key indexes. Those belong in Flyway because they are part of your data model, not your performance tuning. The distinction is important: a missing performance index makes queries slow. A missing unique index allows duplicate data. These are different categories of concern, and they belong in different tools.
Six words. Infinite patience required.
Six words in a migration file. An indefinite hang at startup. Five workarounds, each with trade-offs. The core issue is a mismatch between two correct systems: Flyway is correct to use transactions for migration safety. PostgreSQL is correct to require all prior transactions to complete before building a concurrent index. Neither system is wrong. They are simply incompatible when combined, and neither one knows enough about the other to detect the conflict or warn you about it.
The callback approach resolves this for most teams. The external script approach resolves it for teams that want complete separation of concerns. The Flyway Teams directive resolves it for teams willing to pay for the cleanest solution. And Gold Lapel resolves it by removing the migration framework from the index creation process entirely.
Whichever approach you choose, verify the result. Check for invalid indexes. Log the timing. And if you are establishing conventions for a new team, make the correct approach the default one — because someone on your team will, with the best of intentions, put CREATE INDEX CONCURRENTLY in a Flyway migration, and it will work perfectly in development, and it will hang forever in production, and they will spend an hour checking network connectivity and database credentials before they find this article.
I shall leave a light on.
If you'll permit me a further observation — the lock contention that makes CREATE INDEX CONCURRENTLY hang is a symptom of a broader challenge. I have written a thorough guide to diagnosing and resolving PostgreSQL lock contention that covers the full taxonomy of locks you are likely to encounter, not merely the ones Flyway provokes.