Database Index

What is Database Index?

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.

Primary Index

Definition:

A primary index is automatically created when a table has a primary key.
It is typically a clustered index, meaning the actual row data is physically stored in the order of the primary key.

Characteristics:

Unique: Enforces uniqueness.
Sorted: Data is stored in sorted order of the primary key.
Efficient for range and point queries on the primary key.

Example:

CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR(100),
  age INT
);

Here, id is the primary index, and the table’s data is clustered/sorted by id.

Sparse Index

Definition:

A sparse index only holds entries for some of the data blocks, not every record.
Often points to the first record of each block.

How It Works:

If we have data like:

Block 1: id 1, 2, 3
Block 2: id 4, 5, 6
Block 3: id 7, 8, 9

Then the index would look like:

Index: [1 → Block1, 4 → Block2, 7 → Block3]

Searching for id = 8:

Find the closest index entry $\leq$ 8, which is 7.
Go to Block 3 and scan within the block to find 8.

Pros:

Smaller index size, less memory.
Suitable when data is accessed block-wise or in ranges.

Cons:

Not good for random access to single records, as scanning within a block is needed

Clustered Index

Definition:

A clustered index determines the physical order of data in storage.
Each table can have only one clustered index.
In many DBMS like InnoDB (MySQL), the primary key is the clustered index by default.

Characteristics:

The index contains the actual row data (not just a pointer).
Range queries are fast due to sorted physical layout.

Comparison with Sparse Index:

Clustered indexing means data is stored with the index.
Sparse indexing means index points to the data blocks, and data is stored separately.
Clustered index may look like a sparse index, but it’s integrated with the data.

Secondary Index

Definition:

Any index not on the primary key is a secondary index.
It stores the indexed column and a pointer to the primary key.

Structure:

Secondary index on name:

Index (name → id):
[Alice → 1, Bob → 2, Carl → 3]

To retrieve full row for name = 'Bob':

Use secondary index → Get id = 2.
Use primary index to locate full row with id = 2.

This is called an index back lookup.

Performance:

Slower than clustered index because it requires two lookups.
But faster than scanning the whole table.
Can be optimised using covering indexes.

Covering Index

CREATE INDEX idx_name_age ON users(name, age);

Then,

SELECT age FROM users WHERE name = 'Bob';

→ Fully answered by index. No back lookup needed.

Index Strategy in Production

When to Create Indexes:

Based on traffic patterns: Analyze slow queries and frequent filters.
Create indexes on:
- WHERE conditions
- JOIN columns
- ORDER BY fields (if sorting is frequent)
- GROUP BY fields
Use composite indexes when queries use multiple columns together.

Avoid Over-indexing:

Every index slows down INSERT/UPDATE/DELETE (because all indexes must be updated).
Use only necessary indexes.
Use monitoring tools (like EXPLAIN in MySQL/Postgres) to check query plans.

Example: If you have:

SELECT * FROM orders WHERE customer_id = ? AND status = 'PAID';

Consider:

CREATE INDEX idx_customer_status ON orders(customer_id, status);

Digital Garden 🌱

Recent Notes

Go Concurrency Notes

Bidirectional Protocols

HTTP Versions Overview

Explorer