1 min read

Optimizing JSONB Queries in Postgres with ALTER TABLE ... SET STATISTICS

Slow PostgreSQL queries on JSONB columns? Even with a GIN index, performance can degrade as datasets grow. Increasing the statistics target from 100 to 1000 improved query speed significantly. Learn how SET STATISTICS and ANALYZE can help optimize query planning for high-variance JSONB data.

Recently at Sunbeam, we encountered an interesting performance issue while working with a PostgreSQL table that stored tags as a JSONB column. The table held key-value pairs representing metadata tags applied to datasets. However, these tag keys weren’t fixed across the entire table—each dataset had a specific set of valid tags, leading to a high variety of key-value pairs.

The Problem: Slow Queries on JSONB Filtering

Initially, filtering queries that used the tags column performed reasonably well, but as the dataset grew, we noticed a sharp decline in performance. We had already added a GIN index on the column to speed up lookups, but despite that, query times continued to degrade. This made it clear that Postgres wasn't optimizing these queries effectively. Given that JSONB is inherently flexible but not as index-friendly as structured columns, we needed a way to help the query planner make better decisions.

The Fix: Increasing the Statistics Target

We discovered that the default statistics target for columns in Postgres is 100. This means that Postgres only samples a limited subset of values when estimating query plans, which can lead to suboptimal execution paths—especially for highly variable data like our JSONB column.

By increasing the statistics target for the tags column to 1000, we significantly improved the query planner’s ability to estimate the distribution of data. This adjustment resulted in a 10x-100x speedup in queries involving tag-based filtering.

ALTER TABLE datasets ALTER COLUMN tags SET STATISTICS 1000;

# Re-run the table analysis for this to actually be come into effect.
# This could be a slow one, depending on the size of your table so this
# may impact when you choose to do this.
ANALYZE datasets;

This change increased the amount of sampled data used for statistics collection, leading to much more accurate query planning.

Lessons Learned

  1. Postgres' default statistics target (100) may not be sufficient for high-variance columns, especially JSONB columns with diverse key-value pairs.
  2. Manually tuning the statistics target can help the query planner make better decisions, significantly improving query performance.
  3. Indexes on JSONB fields are useful but not always enough—query performance is also heavily influenced by how well the planner understands the data distribution.