logoTan Chia Chun

Database Indexes

An introduction to database indexes and how Hash Index, B-Tree Index, and LSM Tree Index improve query performance.

Introduction

As your application grows, so does your data. Without proper optimization, querying large datasets can become slow and inefficient.

This is where database indexes come in.

An index is a data structure that improves the speed of data retrieval operations on a database table, similar to how an index in a book helps you quickly find specific information without scanning every page.

However, indexes come with trade-offs in terms of storage and write performance, so understanding different types of indexes is crucial.


What is a Database Index?

A database index is a separate data structure that stores a mapping between keys (e.g., column values) and their corresponding rows in a table.

Instead of scanning the entire table (full table scan), the database can use the index to quickly locate the required data.


Hash Index

A Hash Index uses a hash function to map keys to specific locations (buckets).

How it works:

  • The key is passed through a hash function
  • The result determines where the data is stored
  • Lookup is done by computing the same hash

Example:

Searching for a user by email:

  • hash("jeremy@email.com") → bucket 42
  • Directly retrieve from bucket 42

Advantages:

  • Extremely fast for exact match queries (=)
  • Constant time complexity on average: O(1)

Limitations:

  • Not suitable for range queries (>, <, BETWEEN)
  • Hash collisions can occur (multiple keys mapping to same bucket)

B-Tree Index

A B-Tree Index is the most common type of index used in databases.

It organizes data in a balanced tree structure, keeping keys sorted for efficient searching.

How it works:

  • Data is stored in sorted order
  • Tree structure allows binary-like search
  • Each node contains multiple keys and children

Example:

Finding users with age between 20 and 30:

  • Traverse the tree
  • Efficiently locate the range

Advantages:

  • Supports range queries and sorting
  • Balanced tree ensures O(log n) performance
  • Works well for most general use cases

Limitations:

  • Slightly slower than hash index for exact matches
  • Requires rebalancing on inserts and deletes

LSM Tree Index (Log-Structured Merge Tree)

An LSM Tree Index is optimized for high write throughput, commonly used in modern databases and storage systems.

How it works:

  • Writes are first stored in memory (MemTable)
  • Periodically flushed to disk as immutable files (SSTables)
  • Background processes merge and compact these files

Example:

Logging millions of events:

  • Writes go to memory (fast)
  • Later flushed and merged on disk

Advantages:

  • Excellent for write-heavy workloads
  • Sequential disk writes (faster than random writes)
  • Efficient for large-scale systems

Limitations:

  • Read operations can be slower (may check multiple files)
  • Requires compaction, which adds complexity

When to Use Each Index

Index TypeBest Use Case
Hash IndexExact match queries (e.g., WHERE email = ?)
B-Tree IndexGeneral-purpose queries, ranges, sorting
LSM Tree IndexWrite-heavy systems (logs, analytics, big data)

Trade-offs of Indexing

While indexes improve read performance, they also introduce costs:

  • Storage overhead: Indexes take additional space
  • Slower writes: Inserts/updates must also update indexes
  • Maintenance complexity: Especially for large datasets

Why Indexes Matter

Without indexes:

  • Queries become slower as data grows
  • Full table scans consume more resources
  • User experience degrades significantly

Indexes are essential for building scalable and performant applications.


Final Thoughts

Choosing the right index depends on your workload:

  • Heavy reads → optimize for fast lookups (B-Tree, Hash)
  • Heavy writes → consider LSM Tree
  • Mixed workloads → B-Tree is usually a safe default

Understanding indexes allows you to design databases that scale efficiently while maintaining good performance.


References

Database Index on Wikipedia

LSM Tree on Wikipedia

On this page