Introduction OLS : Original method and modifications Application
Using the OLS algorithm to build interpretable rule bases: an application to a depollution problem
S. Destercke (1), S. Guillaume (2) and B. Charnomordic (3)
(1) IRSN (Institute of Radioprotection and Nuclear Safety), DPAM, SEMIC, LIMSI, Cadarache, France
(2) Cemagref (Agricultural and Environmental Engineering Research), TEMO, Montpellier, France
(3) INRA (French National Institute for Agricultural Research), LASB, Montpellier, France
FUZZ’IEEE 2007 S. Destercke, S. Guillaume, B. Charnomordic
OLS and interpretability : application to depoll. problem
Why and how?
Why? (motivations)
Among fuzzy learning methods, it is difficult to find one that, at the same time:
- treats regression problems,
- is numerically efficient (good predictive ability without requiring too many resources),
- builds an interpretable rule base (RB).
(Most efficient algorithms giving interpretable RBs are designed for classification problems.)
How? (our proposition)
Take a numerically efficient algorithm designed for regression problems, OLS (Orthogonal Least Squares), and make it interpretable.
Interpretability criteria: a reminder
Retained interpretability criteria:
- Interpretable input fuzzy partitions (domain coverage, reasonable number of fuzzy sets, distinguishable sets); here, we take standardized fuzzy partitions with triangular membership functions.
- Reasonable number of rules in the RB.
- Limited number of distinct rule conclusions.
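To make the first criterion concrete, here is a minimal sketch of a triangular partition in Python. It assumes that "standardized" means that the membership degrees of a point sum to 1 over the partition; the function name and the `centers` parameter are hypothetical, and the pH breakpoints in the usage note are made up for illustration.

```python
import numpy as np

def triangular_partition(x, centers):
    """Membership degrees of x in a triangular partition.

    centers: sorted breakpoints, one per fuzzy set (hypothetical parameter).
    Returns an array of shape (len(x), len(centers)) whose rows sum to 1.
    Values outside [centers[0], centers[-1]] are clipped to the domain bounds.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    centers = np.asarray(centers, dtype=float)
    x = np.clip(x, centers[0], centers[-1])
    mu = np.zeros((x.size, centers.size))
    for j, c in enumerate(centers):
        if j > 0:
            # Rising edge between the previous center and this one.
            left = centers[j - 1]
            rising = (x >= left) & (x <= c)
            mu[rising, j] = (x[rising] - left) / (c - left)
        if j < centers.size - 1:
            # Falling edge between this center and the next one.
            right = centers[j + 1]
            falling = (x >= c) & (x <= right)
            mu[falling, j] = (right - x[falling]) / (right - c)
    return mu
```

For example, `triangular_partition([6.0], [5.0, 7.0, 9.0])` gives degrees 0.5, 0.5 and 0 for three sets on a pH-like axis: each point belongs to at most two neighbouring sets, which keeps the sets distinguishable.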
Building a fuzzy system
Given N samples, optimizing a zero-order Sugeno fuzzy system by least squares comes down to the problem
\[
\min \, (\hat{y} - y)^2 \equiv \min \sum_{k=1}^{N} \left( \frac{\sum_{r} \left( \bigwedge_{i=1}^{p} \mu(x_{ik}) \right) \theta_i}{\sum_{r} \bigwedge_{i=1}^{p} \mu(x_{ik})} - y_k \right)^2
\]
where p is the number of premises (input space dimension).
Solving it requires optimizing r (the rule base), µ(xik) (the membership functions) and θi (the rule conclusions) → a difficult, non-linear problem!
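The inference behind this criterion can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the array layout, the function name and the choice between product and minimum for the conjunction are all hypothetical.

```python
import numpy as np

def sugeno_zero_order(memberships, conclusions, conjunction="prod"):
    """Output of a zero-order Sugeno system for N samples.

    memberships: array (N, r, p), degree of sample k in premise j of rule i
                 (assumed layout).
    conclusions: array (r,), one constant conclusion theta_i per rule.
    conjunction: "prod" or "min" (assumption: the slide only states a
                 generic conjunction of the p premises).
    """
    memberships = np.asarray(memberships, dtype=float)
    conclusions = np.asarray(conclusions, dtype=float)
    if conjunction == "prod":
        firing = memberships.prod(axis=2)
    elif conjunction == "min":
        firing = memberships.min(axis=2)
    else:
        raise ValueError("conjunction must be 'prod' or 'min'")
    total = firing.sum(axis=1)
    y_hat = np.full(firing.shape[0], np.nan)  # NaN when no rule fires
    active = total > 0
    # Weighted average of the rule conclusions by firing strength.
    y_hat[active] = (firing[active] @ conclusions) / total[active]
    return y_hat

def squared_error(y_hat, y):
    """Sum of squared errors, ignoring samples that fire no rule."""
    return float(np.nansum((np.asarray(y_hat) - np.asarray(y)) ** 2))
```

Once the memberships are fixed, `y_hat` is linear in `conclusions`, which is what makes the least-squares step of OLS possible.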
OLS: how it works
Original algorithm:
1. Linearize by fixing the membership functions (µ(xik)).
2. Select the most important rules by orthogonal variance decomposition (r).
3. Optimize the conclusions by least-squares fitting (θi).
Result: rule base with optimized conclusions.
OLS: how it works (original algorithm in detail)
1. Linearize: one Gaussian membership function per sample.
   [Figure: Gaussian membership functions, membership degree from 0 to 1 over an input range of 1 to 6.]
2. Select rules: keep the rules that explain the most variance, using Gram-Schmidt decomposition.
3. Optimize conclusions: after least-squares optimization, each rule has a distinct conclusion.
Result: rule base with optimized conclusions.
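The selection and fitting steps can be sketched in Python as a forward selection using classical Gram-Schmidt orthogonalization and an error reduction ratio. This is the textbook form of OLS, not necessarily the authors' exact implementation; the matrix `P` of rule firing strengths (N samples by candidate rules) and all function and variable names are assumptions.

```python
import numpy as np

def ols_forward_selection(P, y, n_rules):
    """Greedy OLS rule selection followed by a least-squares fit.

    P: array (N, M), column j holds the firing strength of candidate
       rule j on each sample (hypothetical input).
    y: array (N,), observed outputs.
    Returns the selected column indices, their error reduction ratios,
    and the least-squares conclusions of the selected rules.
    """
    P = np.asarray(P, dtype=float)
    y = np.asarray(y, dtype=float)
    yy = y @ y
    selected, basis, ratios = [], [], []
    for _ in range(n_rules):
        best = None
        for j in range(P.shape[1]):
            if j in selected:
                continue
            # Orthogonalize candidate j against the vectors already chosen.
            w = P[:, j].copy()
            for q in basis:
                w -= (q @ P[:, j]) / (q @ q) * q
            ww = w @ w
            if ww < 1e-12:  # numerically dependent on the selected rules
                continue
            # Share of the output energy explained by this direction.
            ratio = (w @ y) ** 2 / (ww * yy)
            if best is None or ratio > best[0]:
                best = (ratio, j, w)
        if best is None:
            break
        ratios.append(best[0])
        selected.append(best[1])
        basis.append(best[2])
    # Unconstrained least-squares conclusions for the selected rules.
    theta, *_ = np.linalg.lstsq(P[:, selected], y, rcond=None)
    return selected, np.array(ratios), theta
```

Because each ratio measures explained output energy, samples with unusual outputs tend to produce rules that are selected early.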
OLS: how it works (interpretability of the original algorithm)
1. One Gaussian membership function per sample: not interpretable.
2. Rules selected by Gram-Schmidt variance decomposition: few rules.
3. A distinct conclusion for each rule: poorly interpretable.
Result: the rule base with optimized conclusions is not interpretable.
OLS: how it works (modified algorithm)
1. Linearize: build interpretable partitions fitting the data by a hierarchical process.
2. Select rules: if needed, restrict the number of selected rules.
3. Optimize conclusions: after least-squares optimization, reduce the number of distinct conclusions with the k-means algorithm.
Result: an interpretable rule base with optimized conclusions.
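The last step, reducing the number of distinct conclusions, can be sketched in Python with a small one-dimensional k-means. The initialization by quantiles, the function name and its parameters are assumptions made for this sketch; the slides do not specify these details.

```python
import numpy as np

def reduce_conclusions(theta, n_values, n_iter=100):
    """Replace rule conclusions by n_values cluster centers (1-D k-means).

    theta: array (r,), least-squares conclusions of the rules.
    n_values: number of distinct conclusions to keep (hypothetical name).
    Returns the reduced conclusions (same length as theta) and the centers.
    """
    theta = np.asarray(theta, dtype=float)
    # Deterministic start: centers spread over the quantiles of theta.
    centers = np.quantile(theta, np.linspace(0.0, 1.0, n_values))
    for _ in range(n_iter):
        # Assign each conclusion to its nearest center.
        labels = np.abs(theta[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = centers.copy()
        for c in range(n_values):
            members = theta[labels == c]
            if members.size:
                new_centers[c] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.abs(theta[:, None] - centers[None, :]).argmin(axis=1)
    return centers[labels], centers
```

Each rule then carries one of a few shared conclusion values, which gives the rule base a small output vocabulary.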
How do we evaluate the final fuzzy system?
Coverage index: CI_α = (# active samples) / (# samples), a sample being active if it fires at least one rule over the threshold α.
[Figure: 100 samples (x1, x2,...,50, x51,...,98, x99, x100) plotted in the Input 1 x Input 2 plane, each input carrying fuzzy sets 1, 2 and 3, with the two rules "IF input 1 IS 2 AND input 2 IS 1" and "IF input 1 IS 1 AND input 2 IS 2". No threshold (MR): CI0 = 0.99. 0.1 threshold (MR0.1): CI0.1 = 0.02.]
Numerical accuracy is measured by the classical RMSE on active data:
\[
PI = \sqrt{\frac{1}{n} \sum (\hat{y} - y)^2}
\]
(n = number of active samples for the given α).
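Both indices can be computed with a few lines of Python. This is a sketch under assumptions: `firing` is a hypothetical (N, r) array of rule firing strengths, and "over the threshold" is read here as strictly greater than α.

```python
import numpy as np

def coverage_index(firing, alpha=0.0):
    """Fraction of active samples and the boolean mask of active samples.

    firing: array (N, r) of rule firing strengths (hypothetical input).
    A sample is active if its strongest rule fires strictly above alpha
    (assumption on how the threshold is applied).
    """
    firing = np.asarray(firing, dtype=float)
    active = firing.max(axis=1) > alpha
    return float(active.mean()), active

def performance_index(y_hat, y, active):
    """RMSE restricted to the active samples (NaN if none is active)."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    n = int(active.sum())
    if n == 0:
        return float("nan")
    return float(np.sqrt(np.sum((y_hat[active] - y[active]) ** 2) / n))
```

Reporting PI together with CI_α matters: a low error computed on a small fraction of active samples says little about the system as a whole.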
Application: what and why?
What?
- Water depollution process by anaerobic digestion.
- 589 samples coming from a 1 m3 fixed-bed reactor.
- Input: 7 variables.
- Output: expert value between 0 and 1, characterizing the current state as acidogenic or not.
Why?
- The process requires little energy and produces renewable energy.
- But the bacteria population grows slowly and is sensitive to environmental changes.
- Need to build a system that quickly detects unstable states threatening the population (i.e. fault detection).
- The acidogenic state is particularly critical.
- Using OLS to analyze data with expert help and improve detection systems.
Application: building input partitions
Applying the hierarchical process to the data gives partitions for four variables (pH, volatile fatty acids, input flow rate Qin, CH4 concentration).
[Figure: fuzzy partitions (membership functions labeled A1, A2, ...) for the four variables: pH (axis from 5 to 9), vfa (axis from 0 to 8000), input flow rate Qin (axis from 0 to 50) and CH4 (axis from 40 to 90).]
Application: analysis of first results
OLS on 589 samples: the final RB has 53 rules and PI = 0.046.
Remarks on these first results:
- Rule ordering: of the 589 samples, only 35 have an output > 0.5, whereas 8 of the first 10 rules have an output > 0.5 (6 of them close to 1) → selecting rules by variance tends to favor "faulty" samples, which contribute more to the variance.
- Out-of-range conclusions: some computed rule conclusions are outside [0, 1], due to the unconstrained least-squares optimization.
- Detection and treatment of outliers: two of the first rules,
    If pH is "high" (A3) and . . . , then output is 0.999
    If pH is "very high" (A4) and . . . , then output is 1
  were inconsistent with expert knowledge (an acidogenic state is incoherent with a basic pH). The 2 samples corresponding to these rules were removed.
Application: distinct conclusion reduction
In the final system, the number of distinct conclusions was reduced from 49 to 6.
[Figure: two histograms of the rule conclusion distribution (number of rules vs. conclusion value), one with the non-reduced vocabulary and one with the reduced vocabulary.]
Application: final system summary

               Rules   PI(α=0)   CI0     CI0.1
Modified OLS   51      0.054     100 %   100 %
Original OLS   51      0.074     100 %   30 %

[Figure: inferred value vs. observed value (both from 0 to 1), with regions labeled "High risk", "Non-negligible risk", "Needs further investigation" and "Very low risk".]
Application: summary
Applying the modified OLS algorithm allowed us to:
- remove erroneous data from the sample base,
- extract rules corresponding to critical situations,
- point out interesting experimental points for experts,
- build a final interpretable system with good qualitative predictive quality (and whose numerical efficiency competes with that of the original method).
Moreover, the OLS algorithm (by its principle) seems particularly well suited to problems where important samples are also rare, such as fault detection problems.
Modified OLS: advantages, disadvantages, perspectives
Advantages
- Provides robust and interpretable rule bases for regression problems.
- Focuses on rare samples and on the most important rules.
- Can be used for knowledge extraction as well as for system modeling.
Disadvantages
- Computational cost in high-dimensional problems.
- Variables have to be selected before applying OLS.
- Learned rules are complete (i.e. contain all inputs).
Perspectives
- Robustness study; remedies to the disadvantages cited above.
- Extend to other methods based on orthogonalisation (e.g. TLS).
- Refine rule selection, e.g. by using backward-forward regression techniques.