Tree Based Models

Ugo Jardonnet

July 13, 2012


Table of Contents

1. Classification and Regression Tree: Introduction to CARTs; Estimate Impurity
2. Committee Methods: Bagging; Boosting
3. Building CARTs: Splits; Construction Parameters
4. Conclusion


CART

Introduction to CARTs

Classification tree

[figure: example classification tree with nodes a, b, c]


Classification and Regression trees

CARTs:
- Binary trees
- Efficient for classification AND regression
- Expert friendly

Estimate Impurity

Estimate Node Impurity

CARTs evaluate a split with an impurity measure:
- Classification: Gini index, ...
- Regression: variance, Var(X) = E[(X − E[X])²], ...
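The two impurity measures can be sketched directly. This is a minimal illustration, not code from the slides; the function names `gini` and `variance` are hypothetical:

```cpp
#include <cassert>
#include <vector>

// Gini index of a node from its per-class counts: 1 - sum_k p_k^2.
// 0 means the node is pure; larger values mean a more mixed node.
double gini(const std::vector<int>& counts) {
    int n = 0;
    for (int c : counts) n += c;
    double g = 1.0;
    for (int c : counts) {
        double p = static_cast<double>(c) / n;
        g -= p * p;
    }
    return g;
}

// Variance impurity for regression: E[(X - E[X])^2].
double variance(const std::vector<double>& values) {
    double sum = 0.0;
    for (double v : values) sum += v;
    double mean = sum / values.size();
    double var = 0.0;
    for (double v : values) var += (v - mean) * (v - mean);
    return var / values.size();
}
```

A split is then scored by the impurity of the two child nodes it produces.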


Committee

Bagging

Random Forest

[figure: several trees of a random forest, each grown on a different bootstrap sample]

Pro

Random forest:
- Excellent accuracy
- Fast and efficient on large datasets
- Estimates which variables are important
- Methods for unbalanced datasets
- Do not overfit
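Bagging itself is simple to sketch: draw bootstrap samples, grow one tree per sample, and vote the predictions. A minimal illustration of the bootstrap-and-vote step, with the tree training left out (the helper names are hypothetical, not from the slides):

```cpp
#include <cassert>
#include <random>
#include <vector>

// Draw a bootstrap sample: n indices drawn uniformly with replacement.
std::vector<int> bootstrap(int n, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, n - 1);
    std::vector<int> sample(n);
    for (int& idx : sample) idx = pick(rng);
    return sample;
}

// Majority vote over the {0,1} predictions of an ensemble of trees.
int vote(const std::vector<int>& predictions) {
    int ones = 0;
    for (int p : predictions) ones += p;
    return 2 * ones > static_cast<int>(predictions.size()) ? 1 : 0;
}
```

Because each tree sees a different resample, their errors are partly decorrelated and the vote averages them out.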

Boosting

Boosted Tree

Introduced by Freund and Schapire in 1995. A general method for improving the accuracy of any given classifier/learner that performs better than random. Given a weak learner model h, it generates a strong learner of the form

    H(x) = Σ_t α_t h_t(x)

AdaBoost
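One AdaBoost round can be sketched as follows. This is the standard Freund–Schapire update written out for illustration, not the exact code behind the slide; labels and predictions are assumed to be in {−1, +1}:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One AdaBoost round, given the weak learner's predictions on the
// training set. Returns the learner weight alpha_t and rescales the
// sample weights in place.
double adaboost_round(const std::vector<int>& labels,
                      const std::vector<int>& predictions,
                      std::vector<double>& weights) {
    // Weighted error of the weak learner.
    double eps = 0.0;
    for (std::size_t i = 0; i < labels.size(); i++)
        if (predictions[i] != labels[i]) eps += weights[i];
    // Learner weight: accurate learners (small eps) get large alpha.
    double alpha = 0.5 * std::log((1.0 - eps) / eps);
    // Re-weight: mistakes gain weight, correct samples lose weight.
    double z = 0.0;
    for (std::size_t i = 0; i < labels.size(); i++) {
        weights[i] *= std::exp(-alpha * labels[i] * predictions[i]);
        z += weights[i];
    }
    for (double& w : weights) w /= z;  // renormalize to a distribution
    return alpha;
}
```

The returned alphas are exactly the coefficients of the strong learner H(x) = Σ_t α_t h_t(x).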

Pro

Boosting:
- Over-fits very slowly
- Allows feature selection
- Standard for a large variety of detection and recognition applications:
  - Face detection [Viola&Jones01]
  - Face recognition [Lu06]
  - Learning from Ambiguously Labeled Images [Cour08]
  - ...

Building CARTs

Splits

Building CARTs: Naive Split

for (std::size_t i = 0; i < features.size(); i++) {
    for (std::size_t j = 0; j < observations.size(); j++) {
        int threshold = observations[j][i];
        for (std::size_t k = 0; k < observations.size(); k++) {
            if (observations[k][i] < threshold)
                ...
            else
                ...
        }
    }
}

Listing 1: Scan the entire dataset for each splitting value

Building CARTs: Standard Split

for (std::size_t dim = 0; dim < features.size(); dim++) {
    std::sort(observations.begin(), observations.end(),
              [dim](const Obs& a, const Obs& b) { return a[dim] > b[dim]; });
    for (auto obs : observations) {
        ...
    }
}

Listing 2: Quick sort on each feature

Bucketed

Building CARTs: Bucket Split

for (std::size_t dim = 0; dim < nb_features; dim++) {
    for (auto obs : observations) {
        int bucket = static_cast<int>((obs[dim] - min[dim])
                     / double(max[dim] - min[dim]) * (slices.size() - 1));
        slices[bucket] += {y, y * y, 1};  // accumulate (sum, sum of squares, count)
    }
    for (auto current_slice : slices) {
        left_sum, left_sum2, nb_left += current_slice;  // fold slice into left-side statistics
        double vleft = variance(left_sum, left_sum2, nb_left);
        ...
        double gain = vleft + vright;
    }
}

Listing 3: Bucket sorting features

Building CARTs: Bucket Split

Possible if the splitting criterion is a direct function of additive sub-variables:

    Var(X) = E[X²] − (E[X])²
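This identity is what makes the per-bucket statistics (sum, sum of squares, count) sufficient: the variance of any union of buckets follows from adding those three numbers. A minimal sketch of the variance helper that Listing 3 assumes (the exact original helper is not shown in the slides):

```cpp
#include <cassert>

// Variance from additive statistics, via Var(X) = E[X^2] - (E[X])^2.
// sum, sum2 and n can each be accumulated bucket by bucket,
// so the left/right variance of a candidate split is O(1) per bucket.
double variance(double sum, double sum2, int n) {
    double mean = sum / n;
    return sum2 / n - mean * mean;
}
```

Note that this one-pass formula can lose precision when the mean is large relative to the spread; the slides' complexity argument is unaffected.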


Conclusion

Let N be the number of observations. Complexities of a split:
- Naive: nb_features × N × N
- Standard: nb_features × (N log N + N)
- Bucketed: nb_features × (N + nb_slices)

Committee methods have very good properties. They rely on the fact that weak learners are indeed weak, and they are a good match with CARTs, which are fast to construct.