Indexing In Relational Databases

Indexing In Relational Databases

Motivation

We have table Student{ID, First_name, Last_name, Age} that contains 1M records

Now, if we want to search about student that have ID = x, we will scan all blocks to search about it. (Linear search)

What if we have file that contains ID and record address? it's more optimal than Linear search and index file size is less than data file size

Note that index file must be sorted

Overview

Indexing improve search query and solve problems such as data file can't load it in the memory

Improve means try to minimize number of disk access

Indexing

  • Technique used to improve search query and may be consider it as auxiliary file

Index file structure

  • Search key value
  • Address (record pointer)

Indexes can be characterize

  • Sparse index

    • Has index for only some entries (record)
    • image.png
    • Is applicable if data file is sequential (ordered)
    • Less space as it's not store index for all entries
    • Less maintenance for insertion and deletion operation
  • Dense index

    • Has index for all entries (record)
    • image.png

    • Faster than sparse as it's has index for all entries

    • More Space
    • More maintenance for insertion and deletion operation

Type of index

  • Single level index

    • Primary index
    • Secondary index
  • Multi-level index

    • B tree
    • B+ tree

Constraints to build primary index file

  • Data file must be sorted on search key
  • Always search key is Primary key

Note We can't build more than one primary index file as data file must be sorted on search key (PK)

Constraints to build secondary index file

  • Data file not sorted on search key
  • search key is any attribute other than primary key and may be candidate key

Note We can build more than one secondary index file

Primary key Vs Candidate key

  • Primary key

    • Primary Key is a set of attributes (or attribute) which uniquely identify the tuples in table
  • Candidate key

    • Is a set of attributes (or attribute) which uniquely identify the tuples in relation or table
    • Candidate key’s attributes can contain a NULL value which opposes to the primary key
    • There is one or more candidate key in any table

Example

Student{ID, First_name, Last_name, Age}

Here only ID can be primary key because the name, age and address can be same, but ID can’t be same.

Note any Primary key is also candidate key and the opposite is not true

Problems with indexing

  • Insertion, deletion and update operation not optimal as after each operation tree must be balanced tree

Finally, what about if index file can't load it in the memory?

See You Later