Query Processing Outline .fr

Normalize query predicates expressions into conjunctive or disjunctive normal form. ○ Analysis. – Detect and rejects incorrect queries: ○ Type errors,.
628KB taille 7 téléchargements 374 vues
Query Processing

Fabio Porto LBD IBD – été 2005

Fábio Porto

Outline

z z z

Introduction Query Decomposition Query Optimization – –

Rule based Cost based z

z

Cost model

Conclusion

Page 1

Query Processing Main Problem

Fábio Porto

High level query language (eg. Relational calculus) Schema information

Statistics

Query Processor Rules

Low-level operations (eg. Relational algebra)

Fábio Porto

A view of the problem Πename,dept (Depto dno (EMP

Πename(EMP) D’=D

ENO

(σ resp=“manager” (Alloc)))

Πename(EMP)

DNOE’

E’=E

E’ E’=E

D’

ENOA’

D’=D

A’

DNOA’

A’

A’=σresp=manager(A)

(P1)

DNOD’

A’=σresp=manager(A)

?????

Page 2

(P2)

Fábio Porto

Cost of Strategies z

Assumptions: – size(E) = 400, size(Alloc) = 1000, size(Depto)=200, 50 mngrs in Alloc; z Direct access to Alloc tuples given resp attribute; z Uniform distribution ≈ 2,5 Allcs/Emp

z

– Cost of accessing 1 tuple = 10 unit; – Cost of joining 1 tuple = 3 units; Strategy 1 • • • •

produce A': (50)∗tupleaccesscost Join A' to E: (50 x 400)∗joinCost Join E’ to D: (50x200) ∗ joinCost Project end result

= 500 = 60000 = 30000 = 0

Total Cost = 90500

Fábio Porto

Cost of Strategy 2 z

Strategy 2 • Exercise

Page 3

Fábio Porto

Stages of Query Processing Query over relations Query Query Analysis Analysis

Schema Schema

Algebraic query

Global Global Optimization Optimization

Statistics Statisticson on relations relations

Query Execution Plan (Algebraic operators + Control operators) Query QueryExecution Execution Engine Engine

Local Local Schema Schema

Query answer

Query Decomposition

Page 4

Fábio Porto

Step1 – Query Analysis Input: Calculus query over relations z

Normalization – Normalize query predicates expressions into conjunctive or

disjunctive normal form z

Analysis – Detect and rejects incorrect queries: z Type errors, z Semantically incorrect (disconnected query graph)

z

Simplification – Elimination of redundant predicates (eg. Resulted from view

unfolding)

z

rewriting – Rewrite query in relational algebra

Fábio Porto

Normalization z

Syntactic and semantic analysis – Like compiler – Check types of attributes and relations in operations

z Transform to normal form – Conjunctive normal form

(p11∨p12∨…∨p1n) ∧…∧ (pm1∨pm2∨…∨pmn) – Disjunctive normal form (p11∧p12 ∧…∧p1n) ∨…∨ (pm1 ∧pm2∧…∧ pmn) z conjunctions linked by unions z Max produce replicated joins – p1∧ (p2 ∨ p3) ≡ (p1 ∧ p2) ∨ (p1 ∧ p3)

Page 5

Fábio Porto

Analysis – Example SELECT FROM WHERE AND AND AND AND

ENAME,RESP E, G, J E.ENO = G.ENO G.JNO = J.JNO JNAME = "CAD/CAM" DUR > 36 TITLE = "Programmer"

Query Graph

Join Graph

DUR>36 E.ENO=G.ENO TITLE = "Programmer"

G

E.ENO=G.ENO

G.JNO=J.JNO

J RESP JNAME="CAD/CAM"

E ENAME

G

E

G.JNO=J.JNO

J

RESULT

Fábio Porto

Analysis – Example Non connected graph – query semantically incorrect . SELECT FROM WHERE AND AND AND

ENAME,RESP E, G, J E.ENO = G.ENO JNAME = "CAD/CAM" DUR > 36 TITLE = "Programmer" DUR 36

E.ENO=G.ENO TITLE = "Programador"

E ENAME

G

JNAME="CAD/CAM" RESP

RESULT

Page 6

J

Fábio Porto

Simplification – example SELECT TITLE FROM E WHERE E.ENAME = "J. Doe" OR (NOT(E.TITLE = "Programmer") AND (E.TITLE = "Programmer") OR E.TITLE = "Elect. Eng.") AND NOT(E.TITLE = "Elect. Eng."))

⇓ SELECT TITLE FROM E WHERE E.ENAME = "J. Doe"

Fábio Porto

Rewriting - example ΠENAME SELECT ENAME FROM E, G, J WHERE E.ENO = G.ENO AND G.JNO = J.JNO AND ENAME "J. Doe" AND JNAME = "CAD/CAM" AND (DUR = 12 OR DUR = 24)

Projection

σDUR=12 OR DUR=24 σJNAME=“CAD/CAM”

Selection

σENAME “J. DOE”

JNO

Join ENO

J

Page 7

G

E

Rewriting – Transformation rules z

Fábio Porto

Commutativity of binary operators – R×S⇔S×R – R S⇔S R – R∪S⇔S∪R

z

Associativity of binary operators – ( R × S ) × T ⇔ R × (S × T) – ( R S ) T ⇔ R (S T )

z

Idempotence of unary operators ƒ Π A’ (Π A’’(R)) ⇔ Π A’(R) where R[A] and A' ⊆ A, A" ⊆ A e A' ⊆ A" ƒ σp

(σp2(A2)(R)) ⇔ σp1(A1) ∧ p2(A2)(R)

1(A1)

Cascading projections: example

Fábio Porto

z

π [nom, prénom] (π [âge] Etudiant) false

z

π [nom, prénom] (π [nom, prénom, âge] Etudiant)

z

Condition : {A1,…,An} ⊆ {B1,…,Bm} ⇒ π A1,…,An(π B1,…,Bm(E)) ≡ π A1,…,An(E)

Page 8

Selection – Projection- inversion: example z

z

Fábio Porto

If F1 expressed uniquely over Ai: π [A1,…, An] ( σF1 (E)) ≡ σF1 (π [A1,…, An] E) Example: π [nom, prénom] σ prénom = "Jean" Etud ≡ ≡ σ prénom = "Jean" π [nom, prénom] Etud But π [nom, prénom] σ âge > 20 Etud ≡ σ âge = 20 π [nom, prénom, âge] Etud

A1, ..,An ∪ Attributes referenced in F

Fábio Porto

Rewriting– Transformation rules z

Commuting selection with binary operators –

σp1(A1)(R × S) ⇔ (σp1(A1)R) × S Commuting selection and join z



σp1(A1)(R

p(Aj,Bk) S)

⇔ (σp1(A1)R)

p(Aj,Bk) S

Commuting selection and union z σp

( 1(A1)

R ∪ S) ⇔ σp1(A1) (R) ∪ σp1(A1) (S)

Page 9

Rewriting – Transformation rules z

Fábio Porto

Commuting projection with binary operation ƒ Π C(R × S) ⇔ Π A’(R) × Π B’(S) ƒ Π C(R

(Aj,Bk)

S) ⇔ Π A’(R)

(Aj,Bk)

Π B’(S)

ƒ Π C(R ∪ S) ⇔ Π C (R) ∪ Π C (S)

where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B' ⊆ B

Fábio Porto

Example .

ΠENAME SELECT ENAME FROM J, G, E σDUR=12 OR DUR=24 WHERE G.ENO=E.ENO σJNAME=“CAD/CAM” AND G.JNO=J.JNO AND ENAME =“J. Doe” σENAME=“J. DOE” AND J.NAME=“CAD/CAM” AND(DUR=12 OR DUR=24)

Projection

Selection

JNO

Join ENO

J

Page 10

G

E

Fábio Porto

Equivalent expression ΠENAME

σJNAME=“CAD/CAM” ∧(DUR=12 ∨ DUR=24) ∧ ENAME=“J. DOE”

JNO ∧ENO

× G

J

E

Fábio Porto

Rewriting ΠENAME

JNO

ΠJNO,ENAME

ENO

ΠJNO

ΠJNO,ENO

σJNAME = "CAD/CAM"

σDUR =12 ∧ DUR=24

J

G

Page 11

ΠJNO,ENAME σENAME = E

"J. Doe"

Fábio Porto

View Problems

z

Consider the schema below: – –

z

FilmStar (title,year,star) Film (title,year,duration,studio)

And a view CREATE VIEW FilmOf1996 as select * from Film where year=1996 Which were the actors and the studios where the former played the star role in 1996 films?



Fábio Porto

Query and operators tree Select star, studio from FilmOf1996 f2, FilmStar fs where f2.title= fs.title Π(star, studio)

FilmeStar

σ (year=1996) Film

Page 12

Fábio Porto

Query and operators tree Select star, studio from FilmOf1996 f2, FilmStar fs where f2.title= fs.title In this case, it is interesting to pull-up and Then push-down selections

Π(star, studio)

FilmStar

σ (year=1996) Film

Fábio Porto

Expensive predicates

z

Selections over user-function results: –

z

Push-down selection may be inefficient if the selection cost is comparable to that of joins:

Example: –

select nmregion from humidity h, pollution p where h.idregion=p.idregion and indhumidity(h.blobsatellite) > 5.3

Page 13

Pull-Up – Expensive Predicates

Fábio Porto

Π(nmregion)

Π(nmregion)

σ (indhumidity(h.blobsatellite) > 5.3) Pollution

σ (indhumidity(h.blobsatellite) > 5.3) Pollution

Humidity

Optimization

Page 14

Humidity

Query optimization based on rules

z z

z

Heuristic based approach; Not very efficient for large (number of relations) and complex (not only: select, project, join) queries; Different QEP are equivalent – –

z

Fábio Porto

Produce the same results; Transformations using equivalent expressions;

Ingres, Oracle6

Optimization principles for algebraic expressions

Fábio Porto

1. Exécuter les sélections aussitôt que possible ⇒ objectif : réduire la taille des relations à manipuler 2. Combiner les sélections avec un produit cartésien pour aboutir à une jointure 3. Combiner des séquences d’opérations unaires (π, σ) ⇒ cascade d’opérations appliquée groupée à chaque tuple 4. Sous-expressions communes à une expression (>1) ⇒ si le résultat n’est pas une grande relation, et peut être lu du disque en moins de temps qu’il ne faut pour le calculer ⇒ mémoriser la sous-expression commune.

Page 15

Sketch of an optimization algorithm 1. 2. 3. 4. 5. 6.

Fábio Porto

Decompose each conjunctive expression in a select operation into a cascade of operations; Apply commutativity rules to push-down selections the deepest possible; Rearrange binary operations so smaller relations get joined first and no cartesian product is produced; Substitute Products with joins Push-down projections using commutativity of projections with other operations; Identify group of operations that can be evaluated by a single algorithm;

Algebraic optimization: example

Fábio Porto

Livre (ISBN, titre, auteur, éditeur) Prêt (n°carteP, ISBNP, date) n°carteP référence Lecteur. n°carte ISBNP référence Livre.ISBN Lecteur (n°carte, nom, adresse) z

List readers’ names and book title for all rents before 15.6.04 π [nom, titre] σ [date < 15.06.04] σ [n°carte = n°carteP ∧ ISBN = ISBNP] (Prêt x Lecteur x Livre)

Page 16

After selection decomposition

Fábio Porto

π [nom, titre] σ [date 5(Dep) OP3: σ n0dep = 5(Emp) OP4: σ n0dep = 5 ∧ sal > 3000 ∧ sexo = ‘F’ (Emp) OP5: σid = 12345 ∧ n0proj=10 (Trab)

Fábio Porto

Busca: métodos de pesquisa

z z z z

z

z

S1: Pesquisa linear: recupera cada registro, testando se os valores dos atributos atendem à condição S2: Pesquisa binária: aplicado quando a seleção envolve um atributo chave correspondente a ordenação do arquivo (OP1) S3: Índice primário ou chave organizada por Hash: recuperação de um único registro (OP1) S4: Uso de índice primário para recuperação de registros múltiplos (>,30000

Fábio Porto

Condições disjuntivas (OR)

z

OP4’: σ n0dep = 5 ∨ sal > 3000 ∨ sexo = ‘F’ (Emp) Resultado: união de registros satisfazem às condições individuais

Page 42

Arquitetura – Sincronização e Transferência de dados

Fábio Porto

MultiThread

MonoThread

centralizado

P

paralelo

P TP

Q

distribuido

fila

Q TQ

TPQ •Cada OP produz apenas 1 item de cada vez •Não precisam de buffer

Thread

Fluxo de dados

Arquitetura – Centralização x Distribuição Distribuído

Centralizado

P

P TP

Fábio Porto

TP

in

site1

fila

site2

Q

out

TQ

Q TQ

Page 43

Fábio Porto

Arquitetura – Paralelismo INTERoperator horizontal

INTERoperator vertical

(paralelismo bushy)

(paralelismo pipelined)

INTRAoperator (Paraleismo baseado em particionamento)

P TP

P TP

in

fila

in

R

Q TR

TQ



P TP´

TP

Q

Q

TQ

fila

TQ

Problema: Load Balance

Fábio Porto

Arquitetura – Load balance Balancear:

Problema:

Problema:

Taxa de produção x

Q produz muito

Particionar a fila

Taxa de consumo

P consome pouco

na quantidade Certa!

P TP

P TP

in

R

Q TQ

fila

in

TR

Q TQ

TP´

Q TQ

Problema: Load Balance

Page 44



P TP

fila