Introduction

What is a join?

  • Take tow relations , and create one new relation out of their matches on the join condition.
  • That is, for each Record in , find all records in that match the join condition and write as a new row in the output. All the fields of followed by all the fields of .

Notation Conventions

  • is the number of pages in table
  • is the number of records on each Page of
  • is the total number of records in table

Join Methods

Hash Join vs. Sort-Merge Join

Sorting pros:

  • Good if input already sorted, or need output sorted
  • Not sensitive to data skew or bad hash functions

Hashing pros:

  • For join: number of passes depends on size of smaller relation
    • E.g. if smaller relation is , naive/hybrid hashing is great
  • Good if input already hashed, or need output hashed