Indexing In Relational Databases

Suppose we have a table Student{ID, First_name, Last_name, Age} that contains 1M records.

Now, if we want to find the student with ID = x, we have to scan the blocks of the data file one by one until we find it (linear search).
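To make the cost of a linear search concrete, here is a minimal sketch. The data file is modeled as an in-memory list of blocks, and the sizes (1,000 records, 10 records per block) and the name linear_search are assumptions made only for illustration.

```python
def linear_search(blocks, target_id):
    """Scan blocks one by one; return (record, blocks_read)."""
    blocks_read = 0
    for block in blocks:
        blocks_read += 1  # each block visited models one disk access
        for record in block:
            if record[0] == target_id:
                return record, blocks_read
    return None, blocks_read


# Toy data file (assumed sizes): 1,000 Student records, 10 records per block.
records = [(i, f"First{i}", f"Last{i}", 18 + i % 10) for i in range(1, 1001)]
blocks = [records[i:i + 10] for i in range(0, len(records), 10)]

record, blocks_read = linear_search(blocks, 995)
print(record[0], blocks_read)  # 995 100
```

A student near the end of the file costs almost as many block reads as there are blocks, which is the problem an index tries to solve.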

What if we keep a separate file that contains only the ID and the record address of each student? Searching that file is cheaper than a linear search of the data file, because the index file is smaller than the data file.

Note that the index file must be sorted on the search key (here, ID), so that it can be searched with binary search.
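A sorted index file can be sketched as a sorted list of (ID, record address) pairs searched with binary search. In this sketch the "address" is a hypothetical (block number, slot) pair, and the names index_entries and index_lookup are assumptions, not part of any real database API.

```python
from bisect import bisect_left

# Hypothetical index file: (ID, address) pairs sorted by ID.
# The address is modeled as (block_number, slot), 10 records per block.
index_entries = [
    (student_id, (pos // 10, pos % 10))
    for pos, student_id in enumerate(range(100, 2100, 2))
]
index_keys = [key for key, _ in index_entries]


def index_lookup(target_id):
    """Binary-search the sorted keys; return the record address or None."""
    i = bisect_left(index_keys, target_id)
    if i < len(index_keys) and index_keys[i] == target_id:
        return index_entries[i][1]
    return None


print(index_lookup(120))  # (1, 0)
print(index_lookup(121))  # None
```

Binary search only works because the keys are sorted, which is why the index file has to be kept in order.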

Indexing improves search queries and helps when the data file is too large to load into memory.

Here, "improve" means minimizing the number of disk accesses.
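As a rough back-of-the-envelope estimate of how many disk accesses an index can save, the sketch below compares a full scan with a binary search over the index. Every size in it (4096-byte blocks, 100-byte records, 16-byte index entries) is an assumption picked for illustration, not a figure from a real system.

```python
import math

# All sizes below are assumptions chosen for illustration.
NUM_RECORDS = 1_000_000
BLOCK_SIZE = 4096        # bytes per disk block
RECORD_SIZE = 100        # bytes per Student record
INDEX_ENTRY_SIZE = 16    # bytes per (ID, address) index entry

records_per_block = BLOCK_SIZE // RECORD_SIZE
data_blocks = math.ceil(NUM_RECORDS / records_per_block)

entries_per_block = BLOCK_SIZE // INDEX_ENTRY_SIZE
index_blocks = math.ceil(NUM_RECORDS / entries_per_block)

# Worst case for a linear scan: read every data block.
linear_worst_case = data_blocks
# Binary search over the index blocks, plus one read for the data block.
indexed_accesses = math.ceil(math.log2(index_blocks)) + 1

print(data_blocks, index_blocks)            # 25000 3907
print(linear_worst_case, indexed_accesses)  # 25000 13
```

Under these assumptions the index is several times smaller than the data file, and a lookup touches about 13 blocks instead of up to 25,000.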

Sparse index
- Has index entries for only some of the records
- Is applicable only if the data file is sequential (ordered on the search key)
- Takes less space, as it does not store an entry for every record
- Needs less maintenance on insertion and deletion

Dense index
- Has an index entry for every record
- Is faster to search than a sparse index, as every record has its own entry
- Takes more space
- Needs more maintenance on insertion and deletion
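The difference between the two kinds of index can be sketched on a toy sorted data file. Here the dense index stores one entry per record, while the sparse index stores only the first key of each block; the block size, the IDs, and the names dense, sparse_keys, and sparse_lookup are all assumptions for illustration.

```python
from bisect import bisect_right

# Toy sorted data file (assumed sizes): 20 even IDs, 5 records per block.
ids = list(range(2, 42, 2))
blocks = [ids[i:i + 5] for i in range(0, len(ids), 5)]

# Dense index: one entry per record, mapping ID -> (block, slot).
dense = {
    sid: (b, s)
    for b, block in enumerate(blocks)
    for s, sid in enumerate(block)
}

# Sparse index: only the first ID of each block.
sparse_keys = [block[0] for block in blocks]


def sparse_lookup(target_id):
    """Find the last block whose first ID <= target, then scan that block."""
    b = bisect_right(sparse_keys, target_id) - 1
    if b < 0:
        return None  # target is smaller than every ID in the file
    for s, sid in enumerate(blocks[b]):
        if sid == target_id:
            return (b, s)
    return None


print(len(dense), len(sparse_keys))  # 20 4
print(dense[26], sparse_lookup(26))  # (2, 2) (2, 2)
```

The sparse index holds 4 entries instead of 20, but each lookup still has to scan inside one block, and it only works because the data file itself is sorted.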

Note: we can't build more than one primary index file, because the data file must be sorted on the search key (PK), and a file can be physically sorted in only one order.

Note: we can build more than one secondary index file.

Primary key
- A set of attributes (or a single attribute) that uniquely identifies the tuples in a table

Candidate key
- A minimal set of attributes (or a single attribute) that uniquely identifies the tuples in a relation or table
- In SQL, a candidate key that is not chosen as the primary key (for example, one declared UNIQUE) may contain NULL values, whereas primary key attributes cannot
- Every table has one or more candidate keys

Student{ID, First_name, Last_name, Age}

Here only ID can be the primary key, because first name, last name, and age can repeat across students, but ID can't.

Note: every primary key is also a candidate key, but not every candidate key is the primary key.
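The uniqueness requirement for a key can be checked on sample data with a short sketch. The helper name is_unique_key and the sample rows are hypothetical, and passing this check on a few rows does not prove that the attributes form a key in general, since a key is a constraint on every possible state of the table.

```python
def is_unique_key(rows, attributes):
    """Return True if no two rows share the same values for `attributes`."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key in seen:
            return False  # duplicate found, so this cannot be a key
        seen.add(key)
    return True


# Hypothetical sample rows for Student{ID, First_name, Last_name, Age}.
students = [
    {"ID": 1, "First_name": "Ali", "Last_name": "Hassan", "Age": 20},
    {"ID": 2, "First_name": "Ali", "Last_name": "Hassan", "Age": 20},
    {"ID": 3, "First_name": "Mona", "Last_name": "Adel", "Age": 21},
]

print(is_unique_key(students, ["ID"]))  # True
print(is_unique_key(students, ["First_name", "Last_name", "Age"]))  # False
```

Two students can share the same name and age, so only ID separates every row in this sample.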

Insertion, deletion, and update operations become more expensive with an index: when the index is organized as a balanced tree, the tree must be kept balanced after each operation.

Finally, what if the index file itself can't be loaded into memory?

See You Later