Query Processing
Fabio Porto LBD IBD – été 2005
Fábio Porto
Outline
z z z
Introduction Query Decomposition Query Optimization – –
Rule based Cost based z
z
Cost model
Conclusion
Page 1
Query Processing Main Problem
Fábio Porto
High level query language (eg. Relational calculus) Schema information
Statistics
Query Processor Rules
Low-level operations (eg. Relational algebra)
Fábio Porto
A view of the problem Πename,dept (Depto dno (EMP
Πename(EMP) D’=D
ENO
(σ resp=“manager” (Alloc)))
Πename(EMP)
DNOE’
E’=E
E’ E’=E
D’
ENOA’
D’=D
A’
DNOA’
A’
A’=σresp=manager(A)
(P1)
DNOD’
A’=σresp=manager(A)
?????
Page 2
(P2)
Fábio Porto
Cost of Strategies z
Assumptions: – size(E) = 400, size(Alloc) = 1000, size(Depto)=200, 50 mngrs in Alloc; z Direct access to Alloc tuples given resp attribute; z Uniform distribution ≈ 2,5 Allcs/Emp
z
– Cost of accessing 1 tuple = 10 unit; – Cost of joining 1 tuple = 3 units; Strategy 1 • • • •
produce A': (50)∗tupleaccesscost Join A' to E: (50 x 400)∗joinCost Join E’ to D: (50x200) ∗ joinCost Project end result
= 500 = 60000 = 30000 = 0
Total Cost = 90500
Fábio Porto
Cost of Strategy 2 z
Strategy 2 • Exercise
Page 3
Fábio Porto
Stages of Query Processing Query over relations Query Query Analysis Analysis
Schema Schema
Algebraic query
Global Global Optimization Optimization
Statistics Statisticson on relations relations
Query Execution Plan (Algebraic operators + Control operators) Query QueryExecution Execution Engine Engine
Local Local Schema Schema
Query answer
Query Decomposition
Page 4
Fábio Porto
Step1 – Query Analysis Input: Calculus query over relations z
Normalization – Normalize query predicates expressions into conjunctive or
disjunctive normal form z
Analysis – Detect and rejects incorrect queries: z Type errors, z Semantically incorrect (disconnected query graph)
z
Simplification – Elimination of redundant predicates (eg. Resulted from view
unfolding)
z
rewriting – Rewrite query in relational algebra
Fábio Porto
Normalization z
Syntactic and semantic analysis – Like compiler – Check types of attributes and relations in operations
z Transform to normal form – Conjunctive normal form
(p11∨p12∨…∨p1n) ∧…∧ (pm1∨pm2∨…∨pmn) – Disjunctive normal form (p11∧p12 ∧…∧p1n) ∨…∨ (pm1 ∧pm2∧…∧ pmn) z conjunctions linked by unions z Max produce replicated joins – p1∧ (p2 ∨ p3) ≡ (p1 ∧ p2) ∨ (p1 ∧ p3)
Page 5
Fábio Porto
Analysis – Example SELECT FROM WHERE AND AND AND AND
ENAME,RESP E, G, J E.ENO = G.ENO G.JNO = J.JNO JNAME = "CAD/CAM" DUR > 36 TITLE = "Programmer"
Query Graph
Join Graph
DUR>36 E.ENO=G.ENO TITLE = "Programmer"
G
E.ENO=G.ENO
G.JNO=J.JNO
J RESP JNAME="CAD/CAM"
E ENAME
G
E
G.JNO=J.JNO
J
RESULT
Fábio Porto
Analysis – Example Non connected graph – query semantically incorrect . SELECT FROM WHERE AND AND AND
ENAME,RESP E, G, J E.ENO = G.ENO JNAME = "CAD/CAM" DUR > 36 TITLE = "Programmer" DUR 36
E.ENO=G.ENO TITLE = "Programador"
E ENAME
G
JNAME="CAD/CAM" RESP
RESULT
Page 6
J
Fábio Porto
Simplification – example SELECT TITLE FROM E WHERE E.ENAME = "J. Doe" OR (NOT(E.TITLE = "Programmer") AND (E.TITLE = "Programmer") OR E.TITLE = "Elect. Eng.") AND NOT(E.TITLE = "Elect. Eng."))
⇓ SELECT TITLE FROM E WHERE E.ENAME = "J. Doe"
Fábio Porto
Rewriting - example ΠENAME SELECT ENAME FROM E, G, J WHERE E.ENO = G.ENO AND G.JNO = J.JNO AND ENAME "J. Doe" AND JNAME = "CAD/CAM" AND (DUR = 12 OR DUR = 24)
Projection
σDUR=12 OR DUR=24 σJNAME=“CAD/CAM”
Selection
σENAME “J. DOE”
JNO
Join ENO
J
Page 7
G
E
Rewriting – Transformation rules z
Fábio Porto
Commutativity of binary operators – R×S⇔S×R – R S⇔S R – R∪S⇔S∪R
z
Associativity of binary operators – ( R × S ) × T ⇔ R × (S × T) – ( R S ) T ⇔ R (S T )
z
Idempotence of unary operators Π A’ (Π A’’(R)) ⇔ Π A’(R) where R[A] and A' ⊆ A, A" ⊆ A e A' ⊆ A" σp
(σp2(A2)(R)) ⇔ σp1(A1) ∧ p2(A2)(R)
1(A1)
Cascading projections: example
Fábio Porto
z
π [nom, prénom] (π [âge] Etudiant) false
z
π [nom, prénom] (π [nom, prénom, âge] Etudiant)
z
Condition : {A1,…,An} ⊆ {B1,…,Bm} ⇒ π A1,…,An(π B1,…,Bm(E)) ≡ π A1,…,An(E)
Page 8
Selection – Projection- inversion: example z
z
Fábio Porto
If F1 expressed uniquely over Ai: π [A1,…, An] ( σF1 (E)) ≡ σF1 (π [A1,…, An] E) Example: π [nom, prénom] σ prénom = "Jean" Etud ≡ ≡ σ prénom = "Jean" π [nom, prénom] Etud But π [nom, prénom] σ âge > 20 Etud ≡ σ âge = 20 π [nom, prénom, âge] Etud
A1, ..,An ∪ Attributes referenced in F
Fábio Porto
Rewriting– Transformation rules z
Commuting selection with binary operators –
σp1(A1)(R × S) ⇔ (σp1(A1)R) × S Commuting selection and join z
–
σp1(A1)(R
p(Aj,Bk) S)
⇔ (σp1(A1)R)
p(Aj,Bk) S
Commuting selection and union z σp
( 1(A1)
R ∪ S) ⇔ σp1(A1) (R) ∪ σp1(A1) (S)
Page 9
Rewriting – Transformation rules z
Fábio Porto
Commuting projection with binary operation Π C(R × S) ⇔ Π A’(R) × Π B’(S) Π C(R
(Aj,Bk)
S) ⇔ Π A’(R)
(Aj,Bk)
Π B’(S)
Π C(R ∪ S) ⇔ Π C (R) ∪ Π C (S)
where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B' ⊆ B
Fábio Porto
Example .
ΠENAME SELECT ENAME FROM J, G, E σDUR=12 OR DUR=24 WHERE G.ENO=E.ENO σJNAME=“CAD/CAM” AND G.JNO=J.JNO AND ENAME =“J. Doe” σENAME=“J. DOE” AND J.NAME=“CAD/CAM” AND(DUR=12 OR DUR=24)
Projection
Selection
JNO
Join ENO
J
Page 10
G
E
Fábio Porto
Equivalent expression ΠENAME
σJNAME=“CAD/CAM” ∧(DUR=12 ∨ DUR=24) ∧ ENAME=“J. DOE”
JNO ∧ENO
× G
J
E
Fábio Porto
Rewriting ΠENAME
JNO
ΠJNO,ENAME
ENO
ΠJNO
ΠJNO,ENO
σJNAME = "CAD/CAM"
σDUR =12 ∧ DUR=24
J
G
Page 11
ΠJNO,ENAME σENAME = E
"J. Doe"
Fábio Porto
View Problems
z
Consider the schema below: – –
z
FilmStar (title,year,star) Film (title,year,duration,studio)
And a view CREATE VIEW FilmOf1996 as select * from Film where year=1996 Which were the actors and the studios where the former played the star role in 1996 films?
–
Fábio Porto
Query and operators tree Select star, studio from FilmOf1996 f2, FilmStar fs where f2.title= fs.title Π(star, studio)
FilmeStar
σ (year=1996) Film
Page 12
Fábio Porto
Query and operators tree Select star, studio from FilmOf1996 f2, FilmStar fs where f2.title= fs.title In this case, it is interesting to pull-up and Then push-down selections
Π(star, studio)
FilmStar
σ (year=1996) Film
Fábio Porto
Expensive predicates
z
Selections over user-function results: –
z
Push-down selection may be inefficient if the selection cost is comparable to that of joins:
Example: –
select nmregion from humidity h, pollution p where h.idregion=p.idregion and indhumidity(h.blobsatellite) > 5.3
Page 13
Pull-Up – Expensive Predicates
Fábio Porto
Π(nmregion)
Π(nmregion)
σ (indhumidity(h.blobsatellite) > 5.3) Pollution
σ (indhumidity(h.blobsatellite) > 5.3) Pollution
Humidity
Optimization
Page 14
Humidity
Query optimization based on rules
z z
z
Heuristic based approach; Not very efficient for large (number of relations) and complex (not only: select, project, join) queries; Different QEP are equivalent – –
z
Fábio Porto
Produce the same results; Transformations using equivalent expressions;
Ingres, Oracle6
Optimization principles for algebraic expressions
Fábio Porto
1. Exécuter les sélections aussitôt que possible ⇒ objectif : réduire la taille des relations à manipuler 2. Combiner les sélections avec un produit cartésien pour aboutir à une jointure 3. Combiner des séquences d’opérations unaires (π, σ) ⇒ cascade d’opérations appliquée groupée à chaque tuple 4. Sous-expressions communes à une expression (>1) ⇒ si le résultat n’est pas une grande relation, et peut être lu du disque en moins de temps qu’il ne faut pour le calculer ⇒ mémoriser la sous-expression commune.
Page 15
Sketch of an optimization algorithm 1. 2. 3. 4. 5. 6.
Fábio Porto
Decompose each conjunctive expression in a select operation into a cascade of operations; Apply commutativity rules to push-down selections the deepest possible; Rearrange binary operations so smaller relations get joined first and no cartesian product is produced; Substitute Products with joins Push-down projections using commutativity of projections with other operations; Identify group of operations that can be evaluated by a single algorithm;
Algebraic optimization: example
Fábio Porto
Livre (ISBN, titre, auteur, éditeur) Prêt (n°carteP, ISBNP, date) n°carteP référence Lecteur. n°carte ISBNP référence Livre.ISBN Lecteur (n°carte, nom, adresse) z
List readers’ names and book title for all rents before 15.6.04 π [nom, titre] σ [date < 15.06.04] σ [n°carte = n°carteP ∧ ISBN = ISBNP] (Prêt x Lecteur x Livre)
Page 16
After selection decomposition
Fábio Porto
π [nom, titre] σ [date 5(Dep) OP3: σ n0dep = 5(Emp) OP4: σ n0dep = 5 ∧ sal > 3000 ∧ sexo = ‘F’ (Emp) OP5: σid = 12345 ∧ n0proj=10 (Trab)
Fábio Porto
Busca: métodos de pesquisa
z z z z
z
z
S1: Pesquisa linear: recupera cada registro, testando se os valores dos atributos atendem à condição S2: Pesquisa binária: aplicado quando a seleção envolve um atributo chave correspondente a ordenação do arquivo (OP1) S3: Índice primário ou chave organizada por Hash: recuperação de um único registro (OP1) S4: Uso de índice primário para recuperação de registros múltiplos (>,30000
Fábio Porto
Condições disjuntivas (OR)
z
OP4’: σ n0dep = 5 ∨ sal > 3000 ∨ sexo = ‘F’ (Emp) Resultado: união de registros satisfazem às condições individuais
Page 42
Arquitetura – Sincronização e Transferência de dados
Fábio Porto
MultiThread
MonoThread
centralizado
P
paralelo
P TP
Q
distribuido
fila
Q TQ
TPQ •Cada OP produz apenas 1 item de cada vez •Não precisam de buffer
Thread
Fluxo de dados
Arquitetura – Centralização x Distribuição Distribuído
Centralizado
P
P TP
Fábio Porto
TP
in
site1
fila
site2
Q
out
TQ
Q TQ
Page 43
Fábio Porto
Arquitetura – Paralelismo INTERoperator horizontal
INTERoperator vertical
(paralelismo bushy)
(paralelismo pipelined)
INTRAoperator (Paraleismo baseado em particionamento)
P TP
P TP
in
fila
in
R
Q TR
TQ
P´
P TP´
TP
Q
Q
TQ
fila
TQ
Problema: Load Balance
Fábio Porto
Arquitetura – Load balance Balancear:
Problema:
Problema:
Taxa de produção x
Q produz muito
Particionar a fila
Taxa de consumo
P consome pouco
na quantidade Certa!
P TP
P TP
in
R
Q TQ
fila
in
TR
Q TQ
TP´
Q TQ
Problema: Load Balance
Page 44
P´
P TP
fila