Arthur Charpentier, ESCP - November 2018
Big Data and Artificial Intelligence
A. Charpentier (Université du Québec à Montréal)
École Supérieure de Commerce de Paris, 2018.
@freakonometrics
freakonometrics
freakonometrics.hypotheses.org
A. Charpentier (Université du Québec à Montréal)
Professor, Mathematics Department, Université du Québec à Montréal
previously Econ. Dept, Université de Rennes & ENSAE Paristech; actuary in Hong Kong, IT & Stats, FFA
director, Data Science for Actuaries Program, Institute of Actuaries
PhD in Statistics (KU Leuven), Fellow of the Institute of Actuaries
MSc in Financial Mathematics (Paris Dauphine) & ENSAE
Research Chair: ACTINFO (valorisation et nouveaux usages actuariels de l'information)
Editor of the freakonometrics.hypotheses.org blog
Editor of Computational Actuarial Science, CRC
Author of Mathématiques de l'Assurance Non-Vie (2 vol.), Economica
Karmali (2017, Spam Classifier in Python from scratch)
Spam filter, The Guardian's tech diary column (2018, Tired of texting? Google tests robot to chat with friends for you)
Castelvecchi (2016, Deep learning boosts Google Translate tool) and Korbut (2017, Machine Learning Translation and the Google Translate Algorithm)
Silver et al. (2017, Mastering the game of Go without human knowledge)
Chen (2014, Deep Learning for Self-driving Car)
O'Neil (2016, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy) and Eubanks (2018, Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor)
Starr (2018, Evidence-Based Sentencing and Scientific Rationalization of Discrimination)
Backer (2018, And an Algorithm to Bind Them All? Social Credit, Data Driven Governance, and the Emergence of Operating System for Global Normative Orders)
Data ? Exhaustive statistics ?
Historically, a political instrument, with the use of aggregates to describe society, see Martin (2016, chiffrer pour évaluer). Growing importance for public policy evaluation, and importance of censuses (exhaustiveness)
Data ? Sampling Techniques
1936, US Presidential Election: "the more, the better"? Neyman (1934, On the two different aspects of the representative method), Cochran (1953, Sampling Techniques) or Deming (1966, Some Theory of Sampling). Importance of mathematical statistics (and asymptotic properties)
Data ? The New Oil ?
Rotella (2012, Data: The New Oil?) or Toonders (2014, Data Is the New Oil of the Digital Economy). Unlike oil: not rare, an (almost) infinite deposit, a non-rival good, no intrinsic value, difficult to protect, and easy to (re)produce; flows matter more than stocks
Data ? What For ?
"People use statistics as the drunken man uses lamp posts - for support rather than illumination" (Andrew Lang, or not). Notion of Data Driven
Big Data : What Data Say About Us (when we think no one’s watching)
Donnelly (2014, Why OkCupid Users Don’t Mind Being Lab Rats) about Rudder (2014, Dataclysm) and http://okcupid.com/
Big Data & Curse of Dimensionality
"The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance. In order to obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality"
Rose (2016, The End of Average)
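A minimal Monte Carlo sketch of that sparsity (hypothetical sample size, not from the slides): the fraction of the cube [-1, 1]^d occupied by the inscribed unit ball collapses as d grows, so uniformly drawn points end up isolated in the corners.

```python
import random

def inball_fraction(d, n=20000, seed=1):
    """Monte Carlo estimate of the volume fraction of the unit ball
    inscribed in the cube [-1, 1]^d."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n)
        if sum(rng.uniform(-1, 1) ** 2 for _ in range(d)) <= 1
    )
    return hits / n

for d in (2, 5, 10):
    print(d, round(inball_fraction(d), 3))
```

In d = 2 the fraction is pi/4, roughly 0.785; by d = 10 it is already below one percent.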
Big Data & Curse of Dimensionality : the average does not exist...
Norma & Normann, Cleveland (1943), by artist Abram Belskie and obstetrician Robert Dickinson, based on the measurements of 15,000 men and women between the ages of 21 and 25, compiled from a variety of sources within white racial groups; see Stephens (2004, The Object of Normality: The 'Search for Norma' Competition) or Cambers (2004, The Law of Averages: Norman and Norma)
Big Data / New Data ? Text Based Data
• text analytics, web crawling and graph mining, e.g. the yelp.com review corpus (see Lee & Mimno, 2014, Low-dimensional Embeddings for Interpretable Anchor-based Topic Inference): index i is a review; variable x_k indicates whether review i contains the k-th word (e.g. yoga, dog or bbq)
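The indicator encoding above can be sketched on a toy (hypothetical) corpus:

```python
reviews = [
    "great yoga studio friendly dog",   # hypothetical review corpus
    "best bbq in town",
    "the dog loved the bbq",
]
vocab = ["yoga", "dog", "bbq"]

# X[i][k] = 1 if review i contains the k-th word of the vocabulary, else 0
X = [[int(w in r.split()) for w in vocab] for r in reviews]
print(X)   # → [[1, 1, 0], [0, 0, 1], [0, 1, 1]]
```

Real pipelines add tokenization, stemming and counts rather than 0/1 indicators, but the design matrix has exactly this shape.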
Big Data / New Data ? Text Based Data
Seminar, noun, /ˈsem.ɪ.nɑːr/
• co-clustering and text mining: simultaneous clustering of the rows and columns of a matrix
"an occasion when a teacher or expert and a group of people meet to study and discuss something" via dictionary.cambridge.org
"a small group of students, as in a university, engaged in advanced study and original research under a member of the faculty and meeting regularly to exchange information and hold discussions" via dictionary.reference.com
Big Data / New Data ? Text Based Data
• Recommendation systems: what should you get when you search "black iphone 5", and in which order should items be sorted? See also Santini & Jain (2005, Similarity matching)
Big Data / New Data ? Text Based Data
• Internet browser searches: see the Google Flu project
Ginsberg et al. (2009, Detecting influenza epidemics using search engine query data) Butler (2013, When Google got flu wrong)
Big Data / New Data ? Network Data
Classical (individual-based) data in econometrics, (y_i, x_i), (y_j, x_j), etc., are supposed to be independent. Individuals are nodes of a network, v_i, v_j, etc., that can be connected (e_{i,j} = 1) or not (e_{i,j} = 0). See Easley & Kleinberg (2010, Networks, Crowds, and Markets), Jackson (2008, Social and Economic Networks), Wasserman & Faust (1994, Social Network Analysis), Christakis & Fowler (2009, Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives) or Can & Alatas (2017, Big Social Network Data and Sustainable Economic Development). Working with networks can be complicated, and counter-intuitive
Big Data / New Data ? Network Data
• Friendship paradox: people on average have fewer friends than their friends do (popular people are over-represented in others' friend lists). See the Game of Thrones network
See Feld (1991) and Zuckerman & Jost (2001)
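The paradox is easy to reproduce on a toy (hypothetical) four-person network: the average degree is below the average, over everyone, of their friends' mean degree.

```python
# toy undirected friendship network (hypothetical): adjacency lists
friends = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c"],
}

# average number of friends per person
avg_friends = sum(len(v) for v in friends.values()) / len(friends)

# average, over everyone, of the mean number of friends their friends have
avg_friends_of_friends = sum(
    sum(len(friends[f]) for f in v) / len(v) for v in friends.values()
) / len(friends)

print(avg_friends, avg_friends_of_friends)   # → 2.0 2.4166666666666665
```

The popular node "b" appears in three people's friend lists, which is exactly the over-representation Feld (1991) describes.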
Big Data / New Data ? Network Data • Homophily, “birds of a feather flock together”
from Moody (2001, Race, School Integration and Friendship Segregation in America); see also Being 'wasted' on Facebook may damage your credit score
Big Data / New Data ? Network Data
• Peer effect: see Angrist (2014, The perils of peer effects)
Source : Perkins, Haines & Rice (2005, Misperceiving the college drinking norm and related problems)
Big Data / New Data ? Pictures Data
• image processing: a picture is an array of pixel intensities, e.g. (excerpt)

          [,454]    [,455]    [,456]    [,457]    [,458]    [,459]    [,460]    [,461]    [,462]
[8,]   0.5137255 0.5176471 0.5411765 0.4666667 0.5921569 0.5529412 0.5960784 0.5764706 0.5529412
[9,]   0.5019608 0.5254902 0.5137255 0.4823529 0.5294118 0.6117647 0.5450980 0.6000000 0.5607843
Big Data / New Data ? Pictures Data
Lenna, Playboy Magazine, November 1972, with an RGB decomposition (wikipedia); can be used for feature detection (edges, SIFT - scale-invariant feature transform)
Big Data / New Data ? Pictures Data
Learning is based on tagged training samples; see Goedegebuure (2016, You Are Helping Google AI Image Recognition) and O'Malley (2018, How you've been training AI for years without realising it)
Big Data / New Data ? Pictures Data
Sometimes, things can be complicated...
Labradoodle or fried chicken ? Sheepdog or mop ? Barn owl or apple ?
Big Data / New Data ? Pictures Data • face or emotion recognition
Big Data / New Data ? Pictures Data - Automatic Labeling
Big Data / New Data ? Pictures Data
can be used to locate a place from a picture;
can it be used to locate someone in a crowd?
Big Data / Big Brother ?
Botsman (2017, Big data meets Big Brother as China moves to rate its citizens)
Big Data / Open Data ?
One can use open data to create an app, e.g. based on oil price, prixdescarburants.info and data.gouv.fr
Big Data / Open Data ?
See Citymapper, public transit app and mapping service, and Cohen (2018, The Guy Making Public Transit Smarter). In 2011, Azmat Yusuf arrived in London to work for Google; lost with London's public-transport maps, he created "Busmapper" (which would become "Citymapper")
Big Data / Open Data ? Based on open data, see the blog post Getting from A to (Series) B
In 2016, Citymapper arrived in Paris; the RATP did not want to open its data (bad buzz...). The app collects billions of trajectories, used to compute real-time optimal journeys. See also Building a city without open data: introducing Project Istanbul
Big Data / Open Data ?
Open data with very low granularity: Rankin (2009) with census data. On privacy breaches, see Sweeney (2002, k-anonymity)
Privacy : A New Old Problem ?
Boeth (1970, Is Privacy Dead?) - see also GDPR in Europe
Open - But Grouped - data ? In order to avoid privacy issues, we do not access individual data, x_i = (x_{1,i}, x_{2,i}, ...), but aggregated data, e.g. per spatial region, x̄_j = (x̄_{1,j}, x̄_{2,j}, ...), where x̄_{1,j} is the average over individuals i living in region j. Problem: the ecological fallacy - see wikipedia
see Gelman (2010, Red State, Blue State, Rich State, Poor State)
Open - But Grouped - data ? See also Simpson’s paradox - from Blyth (1972)
Non-open data ? Use scraping techniques, e.g. web-browser automation with selenium; R code to scrape the Prométhée database on forest fires in France, see Scraper la base d'incendies de forêts, or Scraper pour avoir des infos sur les médecins sur Paris, on the ameli database, on doctors
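Whatever the tool (selenium, an R scraper), the pipeline boils down to fetching a page and walking its markup. A minimal, stdlib-only sketch on an inline (hypothetical) HTML fragment standing in for a downloaded results table:

```python
from html.parser import HTMLParser

# hypothetical page fragment, standing in for a scraped fire-database table
html = """
<table>
  <tr><td>2016-07-12</td><td>3.2 ha</td></tr>
  <tr><td>2016-07-19</td><td>0.5 ha</td></tr>
</table>
"""

class CellCollector(HTMLParser):
    """Collect the text content of every <td> cell."""
    def __init__(self):
        super().__init__()
        self.in_td, self.cells = False, []
    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False
    def handle_data(self, data):
        if self.in_td:
            self.cells.append(data)

p = CellCollector()
p.feed(html)
print(p.cells)   # → ['2016-07-12', '3.2 ha', '2016-07-19', '0.5 ha']
```

In practice the fragment would come from an HTTP request (or a driven browser, when the page is rendered by JavaScript), and the cells would be reshaped into rows.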
Non-open data ? see A quelle distance d'une banque habite-t-on ? (how far from a bank do we live?), based on scraping cbanque
Non-open data ? see Acheter un billet de train (pas trop cher)
based on casperjs, a browser emulator written in JavaScript.
Non-open data ? Consider the case where datasets are located on various servers, and cannot be downloaded (e.g. hospitals), but one can run functions and obtain outputs. See Wolfson et al. (2010, DataShield) or http://www.datashield.ac.uk/
Consider a regression model y = Xβ + ε. Recall that β̂ = (X^T X)^{-1} X^T y is the OLS estimator. Is it possible to use parallel computations? [spoiler: yes]
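The [spoiler: yes] can be sketched: since X^T X = Σ_j X_j^T X_j and X^T y = Σ_j X_j^T y_j, each server only ships sufficient statistics, never the raw rows. A minimal sketch for a simple (intercept + slope) regression on hypothetical data:

```python
# Each "server" j holds (x, y) pairs it cannot share; it only returns
# the sufficient statistics needed for the normal equations X'X b = X'y.
def local_stats(xs, ys):
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs), sum(x * y for x, y in zip(xs, ys)))

def combine(stats):
    # sum the per-server statistics, then solve the 2x2 system for
    # (intercept a, slope b) of y = a + b x
    n, sx, sy, sxx, sxy = (sum(t) for t in zip(*stats))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# hypothetical data split across two servers, generated from y = 1 + 2x
s1 = local_stats([0, 1, 2], [1, 3, 5])
s2 = local_stats([3, 4], [7, 9])
print(combine([s1, s2]))   # → (1.0, 2.0)
```

The combined estimate is exactly the OLS fit on the pooled data, which is the core idea behind DataShield-style federated computation.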
Big Data and Black Box Models
Black box models solve a prediction problem:
• given an input x
• predict an appropriate output y
E.g. spam detection: x is an incoming email, y ∈ {spam, not spam} (binary classification).
E.g. medical diagnosis: x is the list of symptoms, y is the diagnosis (classification).
E.g. finance: x is the history of a stock's prices, y is the prediction of the stock price for the next day/week/month (regression).
A prediction, or a model, is a function m : X → Y.
Big Data and Black Box Models
Historically, models were based on a rule-based approach (labor intensive to build, and hardly extendable to other situations)
Big Data and Black Box Models
Deep Blue, February 1996: ~10^47 board positions in chess, Campbell et al. (2002)
AlphaGo, March 2016: ~10^170 board positions in Go (19 × 19), Silver et al. (2017)
Artificial Intelligence : the End of Models ?
Chris Anderson (2008, The Data Deluge Makes the Scientific Method Obsolete)
“Models” (or “Algorithms”) can go wrong See Amazon’s prices, Eisen (2011, Amazon’s $23,698,655.93 book about flies)
"Models" (or "Algorithms") can go wrong
See the 2010 Flash Crash (see wikipedia)
(Mathematical) Statistics
Consider observations {y_1, ..., y_n} from iid random variables Y_i ∼ F_θ (with "density" f_θ). The likelihood is
L(θ; y) = ∏_{i=1}^n f_θ(y_i)
The maximum likelihood estimate is
θ̂_n^{mle} ∈ argmax_{θ∈Θ} L(θ; y)
and, as n → ∞,
√n (θ̂_n^{mle} − θ) →_L N(0, I^{−1}(θ))
Fisher (1912, On an absolute criterion for fitting frequency curves).
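A minimal numeric illustration (hypothetical Poisson counts, not from the slides): the Poisson MLE has the closed form λ̂ = ȳ, which a grid search over the log-likelihood confirms.

```python
import math

y = [2, 3, 1, 4, 2, 3, 0, 2]   # hypothetical iid Poisson counts

def loglik(lam):
    # Poisson log-likelihood: sum of y_i log(lam) - lam - log(y_i!)
    return sum(yi * math.log(lam) - lam - math.lgamma(yi + 1) for yi in y)

# closed form: the sample mean
mle_closed = sum(y) / len(y)

# numeric check: maximize over a fine grid
grid = [k / 1000 for k in range(1, 8000)]
mle_grid = max(grid, key=loglik)
print(mle_closed, mle_grid)   # → 2.125 2.125
```

The grid maximizer lands exactly on ȳ because the log-likelihood is strictly concave in λ.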
Can we use statistical models in practice ? Robust inference is important in real-life applications (see Hubert, Rousseeuw & van Aelst (2004, Robustness)). How robust is that estimator? See Martin (2014) on financial time series:
the MLE (classical) correlation estimator gives θ̂ ≈ 30% while the robust MCD estimator gives θ̂_mcd ≈ 65%
(Linear) Regression ?
Adrien-Marie Legendre (1752-1833): least squares
Charles Darwin (1809-1882)
Francis Galton (1822-1911): regression
Karl Pearson (1857-1936): correlation
Predictive Models & Algorithms
Galton (1870, Hereditary Genius; 1886, Regression towards mediocrity in hereditary stature) and Pearson & Lee (1896, On Telegony in Man; 1903, On the Laws of Inheritance in Man) studied the genetic transmission of characteristics, e.g. height. On average the child of tall parents is taller than other children, but shorter than his parents. "I have called this peculiarity by the name of regression", Francis Galton, 1886.
Predictive Models & Algorithms, y ∈ R
y_i = x_i^T β + ε_i with ε_i ∼ N(0, σ²), so that E[Y_i] = μ_i = x_i^T β.
β̂_n^{ols} = argmin_β { ∑_{i=1}^n (y_i − x_i^T β)² }
and prediction ŷ_i = x_i^T β̂_n^{ols}. Observe that β̂_n^{mle} = β̂_n^{ols}.
Predictive Models & Algorithms, y ∈ {0, 1}
Reed & Berkson (1929, The Application of the Logistic Function to Experimental Data)
Assume that P(Y_i = 1) = π_i, where
logit(π_i) = log( π_i / (1 − π_i) ) = x_i^T β,
or
π_i = logit^{−1}(x_i^T β) = exp[x_i^T β] / (1 + exp[x_i^T β]).
[figure: fitted logistic curve through 0/1 observations]
see Classification from scratch, logistic regression
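A minimal from-scratch sketch in the spirit of the Classification from scratch posts (hypothetical 1-d data; plain gradient ascent on the log-likelihood, rather than the IRLS iterations most software uses):

```python
import math

# hypothetical 1-d training data: class mostly flips from 0 to 1 as x grows
data = [(-2, 0), (-1, 0), (-0.5, 1), (0.5, 0), (1, 1), (2, 1)]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

b0, b1, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    # gradient of the log-likelihood: sum of (y_i - pi_i) * x_i
    g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in data)
    g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in data)
    b0, b1 = b0 + lr * g0, b1 + lr * g1

probs = [sigmoid(b0 + b1 * x) for x, _ in data]
print(round(b1, 2), [round(p, 2) for p in probs])
```

The fitted scores increase with x, as the positive slope b1 dictates; note the classes overlap deliberately, since on perfectly separable data the logistic MLE diverges.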
Predictive Models & Algorithms, y ∈ {0, 1}
Bliss (1934, The method of probits) suggested P(Y = 1 | X = x) = H(x^T β), where H(·) = Φ(·), the c.d.f. of the N(0, 1) distribution: this is the probit model. It yields a latent model, y_i = 1(y_i* > 0), where y_i* = x_i^T β + ε_i is a non-observable score.
In the logistic regression, we model the odds ratio,
P(Y = 1 | X = x) / P(Y ≠ 1 | X = x) = exp[x^T β]
i.e. P(Y = 1 | X = x) = H(x^T β) where H(·) = exp[·] / (1 + exp[·]),
the c.d.f. of the logistic distribution, see Verhulst (1845)
From Score Functions to 0/1 Classifier
x ↦ logit^{−1}(x^T β) and x ↦ Φ(x^T β) are called score functions. The score is interpreted as the probability that y takes value +1. To go from a score to a class: if s(x) > s, then Ŷ(x) = 1; if s(x) ≤ s, then Ŷ(x) = 0.
The ROC curve plots TP(s) = P[Ŷ = 1 | Y = 1] against FP(s) = P[Ŷ = 1 | Y = 0].
[figures: predicted scores against observed classes (Y = 0, Y = 1), and the ROC curve, True Positive Rate vs. False Positive Rate]
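One point of the ROC curve per threshold can be sketched directly from its definition (hypothetical scores and labels):

```python
# hypothetical scores s(x) with true labels y
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.4, 0), (0.2, 0)]

def roc_point(threshold):
    tp = sum(1 for s, y in scored if s > threshold and y == 1)
    fp = sum(1 for s, y in scored if s > threshold and y == 0)
    pos = sum(1 for _, y in scored if y == 1)
    neg = sum(1 for _, y in scored if y == 0)
    return fp / neg, tp / pos     # (false positive rate, true positive rate)

for t in (0.0, 0.5, 1.0):
    print(t, roc_point(t))
```

Sweeping the threshold from 1 down to 0 traces the curve from (0, 0) to (1, 1); a classifier no better than chance stays on the diagonal.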
Predictive Models & Algorithms : The Process
Predictive Models & Algorithms : Machine Learning
Models: classification (binary or multi-class) and regression.
Training data (y_i, x_i): text documents, time series, image files, sound recordings, DNA sequences, etc., transformed into inputs in R^d.
Feature extraction: (arbitrary) mapping from raw input to inputs in R^d, e.g. one-hot encoding (dummy variables).
To "estimate" and evaluate a prediction function, use a loss function ℓ : Y × Y → R_+:
m* = argmin_{m∈M} { ∑_{i=1}^n ℓ(m(x_i), y_i) }
E.g. in classification, the 0/1 loss (1 if the prediction is wrong, 0 if correct); in regression, the squared (ℓ2) loss, (predicted − target)².
Importance of Loss Function
What do you want to predict? Least squares = "on average"; one can instead consider least absolute deviations = "the median".
Error is evaluated on a new dataset (the test or validation dataset): split data into train and test.
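A quick numeric check of the claim above, on a hypothetical sample with one outlier: the squared loss is minimized by the mean, the absolute loss by the median, and the two answers differ wildly.

```python
# the squared loss is minimized by the mean, the absolute loss by the median
y = [1, 2, 2, 3, 100]          # hypothetical sample with one outlier

grid = [g / 10 for g in range(0, 1100)]
argmin_sq = min(grid, key=lambda m: sum((v - m) ** 2 for v in y))
argmin_abs = min(grid, key=lambda m: sum(abs(v - m) for v in y))
print(argmin_sq, argmin_abs)   # → 21.6 2.0
```

Which loss to use is a modeling choice: the mean answer (21.6) is dragged by the outlier, the median answer (2.0) is not.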
Complexity and overfit
Without any explanatory variable,
ŷ = ȳ = argmin_{m∈R} { ∑_{i=1}^n (y_i − m)² }
Complexity and overfit
With one explanatory variable, ŷ = β̂_0 + β̂_1 x,
the classical "linear regression" (from Sir Francis Galton in 1886)
Complexity and overfit
With one explanatory variable, ŷ = m(x):
kernel-based regression is local (averaging over a neighborhood of x), while splines and polynomials use bases of functions (to approximate m)
Complexity and overfit
With one explanatory variable, ŷ = m(x): overfitting,
i.e. a good (great) fit on the training dataset, but hard to imagine it generalizing to new data...
Cross Validation
To avoid overfit, use leave-one-out or k-fold cross-validation: randomly partition the data into k "folds" (of equal size), train the model on k − 1 folds and evaluate it on the remaining one
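A minimal sketch of the k-fold split (hypothetical sample size, fold count and seed):

```python
import random

def kfold_indices(n, k, seed=42):
    """Randomly partition {0, ..., n-1} into k folds of (almost) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

folds = kfold_indices(10, 5)
for f, test in enumerate(folds):
    train = [i for fold in folds for i in fold if fold is not test]
    # fit the model on `train`, evaluate it on `test`, then average the k scores
    print(f, sorted(test))
```

Each observation lands in exactly one test fold, so the averaged error is computed entirely on data the model never saw during fitting.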
Ensemble Methods
"Ensemble learning" (see wikipedia), "aggregation methods" or "stacking": combining predictive methods,
m(x) = (1/k) ∑_{j=1}^k m̂_j(x)
See random forests, or bagging (bootstrap + aggregation), and Classification from scratch, trees and forests
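A bagging sketch with a deliberately simple base learner (a slope-through-the-origin fit instead of the trees used by random forests; hypothetical data): fit the learner on bootstrap resamples, then average.

```python
import random

x = [1, 2, 3, 4, 5, 6]
y = [2.2, 3.9, 6.1, 8.0, 9.8, 12.3]   # hypothetical data, roughly y = 2x

def fit_slope(pairs):
    """Base learner: least-squares slope through the origin."""
    return sum(a * b for a, b in pairs) / sum(a * a for a, _ in pairs)

def bagged_slope(k=200, seed=0):
    rng = random.Random(seed)
    slopes = []
    for _ in range(k):
        boot = [rng.choice(list(zip(x, y))) for _ in x]   # bootstrap resample
        slopes.append(fit_slope(boot))
    return sum(slopes) / k                                # aggregate: average

print(round(bagged_slope(), 2))
```

Averaging over resamples reduces the variance of the base learner, which is the whole point of bootstrap aggregation.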
Neural Networks
Rosenblatt (1958, The Perceptron); see Classification from scratch, neural nets
Incremental Algorithms and Reinforcement Learning
Incremental algorithms (see wikipedia) allow a model to be updated using new observations as they arrive, without having to reprocess old ones. This is "data stream analysis", or "online" learning (see wikipedia), as opposed to "batch" or "offline" learning. Useful for data streams and for data too massive to be processed in their entirety.
Reinforcement learning (see wikipedia): AlphaGo learned from a base of tens of thousands of games and 30 million moves, then by playing against itself (reinforcement learning)
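The simplest online algorithm is the incremental update of a running mean: each new observation updates the estimate in O(1), without revisiting the old ones.

```python
class RunningMean:
    def __init__(self):
        self.n, self.mean = 0, 0.0
    def update(self, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n   # Welford-style incremental step
        return self.mean

m = RunningMean()
for y in [4, 8, 6, 2]:
    m.update(y)
print(m.mean)   # → 5.0
```

Stochastic gradient descent generalizes exactly this pattern: replace "mean" with model parameters and "(y − mean)/n" with a gradient step on the newest observation.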
Data & Models : Selection Bias
We cannot differentiate data and model that easily. "After an operation, should I stay at the hospital, or go back home?" As in Angrist & Pischke (2008, Mostly Harmless Econometrics), the observed difference
(health | hospital) − (health | stayed home)   [observed]
should be written
(health | hospital) − (health | had stayed home)   [treatment effect]
+ (health | had stayed home) − (health | stayed home)   [selection bias]
Need randomization to solve selection bias. see also Ioannidis (2005, Why Most Published Research Findings Are False).
Data & Models : Selection Bias
also called "survivor bias" (see wikipedia): how to minimize bomber losses to enemy fire? A study of the damage done to aircraft that had returned from missions: which areas should the Navy reinforce?
see Mangel & Samaniego (1984, Abraham Wald’s Work on Aircraft Survivability).
Probabilistic Forecasts
So far we used a loss ℓ on Y × Y: ℓ(ŷ, y) measures the difference between the prediction ŷ and the realization y. Consider instead a score (as in meteorology) s(F̂, y), where F̂ is the forecast distribution of y. For time series, the score s(_{t−1}F̂_t, y_t) compares the realization y_t with the forecast distribution obtained at time t − 1,
_{t−1}F̂_t = P[Y_t | y_{t−1}]
see Gigerenzer et al. (2005, “A 30% chance of rain tomorrow”: How does the public understand probabilistic weather forecasts?)
Probabilistic Predictions with CitySense
Two use cases:
• I'm new to the city: where does everybody hang out at night?
• I know the city: is there anything special going on tonight?
Idea: use taxi GPS data in San Francisco & New York City (see Rosenberg (2017)). Intuition: taxi destinations are a proxy for where people are going. Model the "typical" behavior of each area of the city, then detect the most unusual activities
Probabilistic Predictions with CitySense
At some given location, at a given time, how many taxi pickups should you expect?
y: taxi pickups (per hour); x: time and location
Probabilistic Predictions with CitySense
Consider x (time, location) such that E[Y | x] = 30, with, say, P[Y ∈ [18, 42]] = 90%. Given the conditional distribution of Y | x:
what if we observe 90 pickups? How unusual is that?
Probabilistic Predictions with CitySense
x = (x_1, x_2), where x_1 is the time (in the week) and x_2 the grid location: detection of outliers in a spatio-temporal problem
Conclusion
"Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning" Winston Churchill
To go further:
References
• Shalev-Shwartz & Ben-David (2014) Understanding Machine Learning: From Theory to Algorithms
• Hastie, Tibshirani & Friedman (2017) The Elements of Statistical Learning
• L'intelligence artificielle dilue-t-elle la responsabilité ? (2018)
• Les modèles prédictifs peuvent-ils être loyaux et justes ? (2017)
• L'éthique de la modélisation dans un monde où la normalité n'existe plus (2017)
• Les dérives du principe de précaution (2016)
• Segmentation et mutualisation, les deux faces d'une même pièce (2015)
• La tarification par genre en assurance, corrélation ou causalité ? (2016)
• Big data : passer d'une analyse de corrélation à une interprétation causale (2015)
https://bloomberg.github.io/foml/lectures