Index Selectivity: A Key to High Performance SQL

Dongqing Hu
Flexport Engineering
9 min read · Jul 25, 2022


Introduction

It is generally known that indexes improve SQL performance, but this is not always true: only efficient indexes really improve SQL performance, and too many indexes actually slow down write operations. We’ve previously discussed how an inefficient index slowed down system performance, but that example used a GIN index, which is not very commonly used. This post presents examples of inefficient BTREE indexes, which are widely used. In both posts, the common key to high performance SQL is index selectivity.

As a reminder, an index is selective if it helps the queries using it filter out most of the rows during an index scan. An index that is not selective often downgrades the performance of index scans to the degree of table scans.
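As a rough illustration (this query is not from the original article), the selectivity of a single column can be estimated as the ratio of distinct values to total rows:

```sql
-- Approximate selectivity of the user_id column:
-- values near 1.0 mean highly selective (few rows per value),
-- values near 0 mean poorly selective (many rows per value).
SELECT COUNT(DISTINCT user_id)::float / COUNT(*) AS selectivity
FROM notifications;
```

This is only a heuristic; the planner’s own statistics (in pg_stats) give a more complete picture.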

Problem

The Flexport Platform lets Flexport operators (client and supply chain operations teams), clients and transportation partners send messages across our applications. When a message is sent, its recipients receive notifications. A Flexport operator can receive thousands of notifications per week, so we developed tools to help them manage notifications better, such as the “Notifications” dropdown list in the navigation bar, which shows unread notifications, and the “Inbox” tool, which lets a user query the notifications on certain shipments or from particular clients in a time range.

It is important to design the system to meet the future scale requirements without performance degradation. The performance challenges to these tools are tough: as the business grows rapidly, some users have received millions of notifications, and as a result their queries could take over a minute to complete.

Schema

The notifications table schema has the following columns:

  • id: the primary key
  • user_id: a foreign key referencing the recipient user
  • notification_type_id: a foreign key referencing a notification type object
  • subject_id: a foreign key referencing a user who acted on an object
  • object_id: a foreign key referencing an object which is acted on
  • created_at: notification creation time
  • updated_at: notification update time
  • read_at: a timestamp which is set with current time when a notification is read by the user
  • details: the text description of the notification
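A hypothetical DDL sketch of this schema, reconstructed from the column list above (the actual column types, constraints and foreign keys may differ):

```sql
CREATE TABLE notifications (
    id                   bigserial PRIMARY KEY,
    user_id              bigint NOT NULL,   -- recipient user
    notification_type_id bigint NOT NULL,   -- notification type object
    subject_id           bigint,            -- user who acted on an object
    object_id            bigint,            -- object which is acted on
    created_at           timestamp NOT NULL,
    updated_at           timestamp NOT NULL,
    read_at              timestamp,         -- NULL until the notification is read
    details              text               -- text description of the notification
);
```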

In the following, we introduce the slow SQL cases and solutions in detail (the mentioned SQL and query plans are modified to hide sensitive business information).

Case 1: Counting unread notifications

SQL

SELECT COUNT(*)
FROM notifications
WHERE notifications.user_id = ? AND read_at IS NULL

“read_at IS NULL” means unread, so this query counts the number of unread notifications for a user. There is a BTREE index on the user_id column to boost this query.

The number of notifications for a user varies from thousands to millions. For most users, this count is in the thousands so this query executes fast (in milliseconds). But for a few users in our client-facing operations teams, with notification counts in the millions, this query executes slowly (in seconds or even minutes) because the index scan takes much more time to go through the millions of index entries.

Solution

We found that less than 10% of notifications were unread, so the index scan could be faster if it skipped the read notifications. PostgreSQL supports such a feature named partial indexes. We replace the user_id index with a “user_id where read_at IS NULL” partial index. This index only includes the rows that match the “read_at IS NULL” condition, so that most of the rows are excluded and the index size is reduced by 90%. With this change, this query is always fast (in milliseconds).
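A sketch of this change (the index names here are assumed, not taken from our production schema):

```sql
-- Build the partial index first so the query is never left without an index.
-- CONCURRENTLY avoids blocking writes while the index is built.
CREATE INDEX CONCURRENTLY index_notifications_on_user_id_unread
    ON notifications (user_id)
    WHERE read_at IS NULL;

-- Then drop the old full index on user_id.
DROP INDEX CONCURRENTLY index_notifications_on_user_id;
```

The partial index can serve a query only when the query’s WHERE clause implies the index predicate, which our “read_at IS NULL” condition does.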

Case 2: Listing recent notifications

SQL

SELECT *
FROM notifications
WHERE notifications.user_id = ?
ORDER BY notifications.created_at DESC
LIMIT ? OFFSET ?

This is a paginated query that returns recent notifications for a user (ordered by the created_at timestamp, descending). As in the counting case, it is fast for most users but slow for a few.

Solution

We found that a multicolumn index on (user_id, created_at) performs better than the user_id index. Index entries are stored in the order of their values. A scan on the multicolumn index first locates the subset of entries with the given user_id value, then traverses that subset in created_at order and stops as soon as it finds enough rows for the LIMIT clause. Note that there was already a (user_id, object_id, created_at) index, which is less efficient than the (user_id, created_at) index because the object_id column is not only useless but harmful for this query: the subset filtered by user_id is ordered by the (object_id, created_at) combination rather than by created_at alone, so the traversal jumps around within the subset.
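The index described above can be created as follows (the index name is illustrative):

```sql
-- Entries are sorted by user_id, then by created_at within each user_id,
-- so an ORDER BY created_at DESC ... LIMIT scan can stop early.
CREATE INDEX index_notifications_on_user_id_and_created_at
    ON notifications (user_id, created_at);
```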

A multicolumn index (also known as a composite index, or a concatenated index) is composed of a list of columns. The ordering of the list matters. To get the index design right, let the intended search patterns for the index guide the left-to-right ordering of the columns listed. Generally, when a query filters by some columns and orders by other columns, you can create a multicolumn index by appending the “order-by” column names after the “filter-by” column names.

The rule of thumb for ordering is to list your primary lookup columns first (left-most), with subsequent columns narrowing or sorting within that lookup. A classic example is the very common “(last_name, first_name)” composite index.
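To make the rule concrete, here is that classic example on a hypothetical people table:

```sql
-- Supports lookups by last_name alone, and by last_name plus first_name.
-- It does NOT support an efficient lookup by first_name alone, because
-- first_name values are only ordered within each last_name.
CREATE INDEX index_people_on_last_name_and_first_name
    ON people (last_name, first_name);
```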

Case 3: Querying the notifications from certain shipments or clients in a time range

SQL

SELECT notifications.*
FROM notifications
INNER JOIN notification_types ON notification_types.id = notifications.notification_type_id
JOIN messages ON messages.id = notifications.object_id AND notification_types.object_type = 'message'
JOIN shipments ON messages.messageable_id = shipments.id AND messages.messageable_type = 'Shipment'
WHERE notifications.user_id = ?
AND notification_types.domain = ?
AND shipments.status = ?
AND shipments.client_id IN (?)
AND (messages.assignee_id = ? OR messages.assignee_id IS NULL)
AND messages.created_at >= ?
AND notifications.created_at >= ?
ORDER BY notifications.created_at DESC, notifications.id DESC

This query is very complex. Unlike the previous queries, the same query with the same arguments (and thus for the same user) is sometimes fast and sometimes slow.

Slow query plan

Below is a slow query plan:

The critical step of this query plan is:

-> Index Scan using index_messages_on_messageable_type_and_messageable_id on messages (cost=0.56..81.19 rows=1 width=12) (actual time=4.341..4.672 rows=0 loops=21142)
   Index Cond: (((messageable_type)::text = 'Shipment'::text) AND (messageable_id = shipments.id))
   Filter: (((assignee_id = ?) OR (assignee_id IS NULL)) AND (created_at >= ?::timestamp without time zone))
   Rows Removed by Filter: 8

This step is in a Nested Loop Join, and each loop performs an index scan on the target table. The average time of each loop is 4.5 ms (4.341..4.672). From “loops=21142” we know there are 21142 loops, so the total time of this step could be 4.5 * 21142 = 95139 ms, but the total time of query execution is only 34338 ms. Why?

We can see this information from the query plan:

Workers Planned: 2
Workers Launched: 2

It means the parallelism value is 3, composed of 1 leader process and 2 worker processes (current query execution process is the leader, and two additional workers are launched). So, the real total time of this step is 4.5 * 21142 / 3 = 31713 ms, matching the total time of query execution. This step consumes most of the total time, so we should improve it.

Fast query plan

When we executed the same query with the same arguments again, we got a very similar but much faster query plan:

The only difference is the time of each loop (0.012..0.012), which is 375 times faster (4.5 / 0.012 = 375). If we execute this query with different arguments, we would get a slow query plan again. In a nutshell, the first execution with certain arguments will be slow but the following executions will be fast. The fast query plan is almost the same as the slow one, with only the time of the critical step being different. What can we infer from this phenomenon? Yes, there should have been a cache.

Analysis

There are two main problems in the slow query:

Each loop of the Index Scan on the messages table is slow when the cache is not hit: 4.5 ms for a single-value index scan is not acceptable. We notice that the Index Scan on the notifications table is always fast (0.006 ms or 0.002 ms), which is a more reasonable time for such a case. So what makes them different?

The Index Scan on the messages table has both “Index Cond” and “Filter” conditions, while the Index Scan on the notifications table has only an “Index Cond” condition. The index on the messages table covers the messageable_type and messageable_id columns but not the assignee_id and created_at columns, so the Index Scan must first load the index to filter by messageable_type and messageable_id, and then load the table rows to filter by assignee_id and created_at. Loading table rows needs multiple disk reads and takes a lot of time, but it can benefit from caching (which is why subsequent executions of the same query with the same arguments become fast). You can include cache information in the query plan by executing “EXPLAIN (ANALYZE, BUFFERS) #{sql}”. The time cost of the Index Scan can be improved by improving index selectivity.
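For example (with a placeholder user_id value), BUFFERS annotates each plan node with cache behavior:

```sql
-- "shared hit" blocks came from PostgreSQL's buffer cache;
-- "read" blocks required actual disk I/O.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM notifications
WHERE user_id = 123 AND read_at IS NULL;
```

A node with many “read” blocks on the first run and mostly “hit” blocks on later runs is exactly the cache-dependent pattern described above.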

Another issue is that the query planner underestimated the number of shipments rows filtered by client_id. The following line shows that the estimated number of rows is 138 (rows=138), but the actual number is 21141 (rows=7047 over loops=3):

-> Parallel Bitmap Heap Scan on shipments (cost=2799.21..46196.17 rows=138 width=12) (actual time=1365.688..1385.022 rows=7047 loops=3)

Hash Join creates a hashmap which is faster for a lot of rows but consumes more space, while Nested Loop Join is faster for a few rows and saves space. The query planner would choose Hash Join for a lot of rows, but would choose Nested Loop Join for a few rows. In this case, it chose Nested Loop Join because it thought there were only 138 rows. The “Join” step could be improved by using Hash Join.

Solution

A SQL query should not rely on the mercy of caches. Caches are great! But your SQL query should be rock solid in terms of optimization before worrying about using them.

We first need to improve index selectivity. If the index on the messages table were more selective by covering more columns, less (or zero) table loading would be needed. As the “assignee_id IS NULL” condition is not index-friendly, we chose to append only the created_at column to the index. We also moved messageable_id to the first position in the index as a small optimization. The index is now on (messageable_id, messageable_type, created_at) of the messages table. The average time of each loop dropped from 4.5 ms to 0.338 ms, and the total query execution time dropped from 34339 ms to 3893 ms, 9 times faster.
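The resulting index looks like this (the index name is assumed):

```sql
-- Covers the join condition (messageable_id, messageable_type) and the
-- created_at filter, so the scan no longer has to fetch table rows
-- just to evaluate created_at.
CREATE INDEX index_messages_on_messageable_and_created_at
    ON messages (messageable_id, messageable_type, created_at);
```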

Now, what if we tell the query planner to choose Hash Join? PostgreSQL does not support SQL hints, so we have to set the “SET enable_nestloop = off” option. This option discourages PostgreSQL from using Nested Loop Join, so it uses Hash Join here. The total query execution time is reduced from 34339 ms to 5568 ms, 6 times faster. For comparison, MySQL 8.0.20 removed its Block Nested Loop algorithm and uses Hash Join in its place.
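If one did want to apply this setting, it can be scoped to a single transaction rather than changed globally (an illustrative pattern, not what we shipped):

```sql
-- SET LOCAL only affects the current transaction, so other queries
-- on the same connection keep the default planner behavior.
BEGIN;
SET LOCAL enable_nestloop = off;
-- ... run the slow query here ...
COMMIT;
```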

The index selectivity optimization is good enough. The Hash Join optimization requires a database configuration change which is not production-safe, and its effect is not that significant when used upon the index selectivity optimization. Thus, our decision was to adopt only the index selectivity optimization.

Real production improvement

The following diagrams illustrate the significant production improvements to the “Inbox” tool after optimization.

Due to the heavy load from business growth, the API endpoint error rate before optimization was 11.8%.
The API endpoint error rate after optimization was 0.8%, 15 times better. (We are refactoring code to further improve it.)

Conclusion

We discussed three cases of BTREE index performance, all of which benefited from improved index selectivity. An index that isn’t selective not only consumes more memory and storage space but also slows down SQL performance. As a principle, please be aware of, and improve, index selectivity when designing business-critical database queries.

Acknowledgement

Thank you to Joker Ye for contributing to the related engineering work, and to Dylan Irlbeck, Vinod Kumar, David Goussev and other Flexport colleagues for reviewing this article.
