Kinds of Query Parallelism

Inter-Query

Multiple SQL queries runs in parallel.

Intra-Query

  • Inter-Operator
    • Pipeline
    • Bushy (Tree): Sub-tree uses pipeline. Between sub-trees runs in parallel.
PipelineBushy (Tree)
  • Intra-Operator
    • Single query runs in parallel by partitioning.

Intra-Operator

Data Partition

Assume we have multiple machines, how can we partition data into these individual machines? Introduce several partitioning methods:

  • Range: Good for equijoins, range queries and group-by
  • Hash: Good for equijoins, group-by
  • Round-Robin: Good for spreading load

Parallel Scans

We can do parallel scans on multiple machines:

  • Simply scan in parallel and merge result as the ouput.
  • : If we use range or hash partitioning, we can skip entire sites that have no tuples satisfying .
  • We can build indexes at each partition

Lookup by key

  • If data partitioned on function of key (range and hash for example), we can only lookup the relevant node.
  • Otherwise we have to broadcast lookup to all nodes.

Insert a key

  • Similarly, if data partitioned on function of key, we route the insert to relevant node.
  • Else, route insert to any node is ok.

Insert a unique key

  • Data partitioned, route the insert to the relevant node;
  • Else, broadcast lookup and collect results. If not exists, insert anywhere.

Hash Join

Naive Parallel Hash Join

  • Phase 1: shuffle each table across machines (using ).
  • Phase 2: receivers proceed with naive hashing in a pipeline as probe data streams in.

Grace Parallel Hash Join

  • Pass 1: parallel streaming
  • Pass 2: local Grace Hash Join per node
  • Every node waits for Pass 1 to end, and works at its top speed in Pass 2.

Sorting

  • Pass 0: shuffle data across machines
  • Pass 1-n: independently run as single-node sorting

Sort-Merge Join

  • Pass 0 to n-1: like parallel sorting above, but do it twice: once for each relation with same ranges
  • Pass n: merge join partitions locally on each node

Note: this picture is a 2-pass sort

Parallel Aggregates

Hierarchical aggregation