LEAPS IN VISUAL COMPUTING JEN-HSUN HUANG, CO-FOUNDER & CEO | GTC 2015
FOUR ANNOUNCEMENTS
A New GPU and Deep Learning
A Very Fast Box and Deep Learning
Roadmap Reveal and Deep Learning
Self-Driving Cars and Deep Learning
AMAZING YEAR IN VISUAL COMPUTING
© 2015 Industrial Light & Magic. All Rights Reserved.
10X GROWTH IN GPU COMPUTING
                            2008       2015
CUDA Downloads              150,000    3 Million
CUDA Apps                   27         319
Universities Teaching       60         800
Academic Papers             4,000      60,000
Tesla GPUs                  6,000      450,000
Supercomputing Teraflops    77         54,000
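The headline of the growth slide can be checked against its own numbers. The following is a minimal sketch, assuming the 2008 and 2015 figures are read as paired values per metric; the names `growth` and `growth_factors` are illustrative and do not come from the talk.

```python
# Figures copied from the "10X GROWTH IN GPU COMPUTING" slide: (2008, 2015).
growth = {
    "CUDA Downloads": (150_000, 3_000_000),
    "CUDA Apps": (27, 319),
    "Universities Teaching": (60, 800),
    "Academic Papers": (4_000, 60_000),
    "Tesla GPUs": (6_000, 450_000),
    "Supercomputing Teraflops": (77, 54_000),
}


def growth_factors(table):
    """Return the 2015 / 2008 ratio for every metric in the table."""
    return {name: y2015 / y2008 for name, (y2008, y2015) in table.items()}


factors = growth_factors(growth)
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x")
```

Every ratio comes out above 10, so "10X" reads as a floor across the six metrics, not an average.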
TITAN X
THE WORLD’S FASTEST GPU
8 Billion Transistors
3,072 CUDA Cores
7 TFLOPS SP / 0.2 TFLOPS DP
12GB Memory
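As a rough sanity check on the TITAN X figures, the sketch below derives the single-to-double precision ratio and the clock rate that the quoted single-precision figure would imply. The clock rate is not stated on the slide. The calculation assumes each CUDA core completes one fused multiply-add (counted as 2 floating-point operations) per cycle, which is an assumption made here for illustration, and all names in the code are hypothetical.

```python
# Figures quoted on the TITAN X slide.
CUDA_CORES = 3072
SP_TFLOPS = 7.0
DP_TFLOPS = 0.2

# Assumption (not from the slide): one fused multiply-add per core per cycle,
# counted as 2 floating-point operations.
FLOPS_PER_CORE_PER_CYCLE = 2


def implied_clock_ghz(tflops, cores, flops_per_cycle):
    """Clock rate in GHz at which `cores` would deliver `tflops`."""
    flops = tflops * 1e12
    return flops / (cores * flops_per_cycle) / 1e9


sp_to_dp_ratio = SP_TFLOPS / DP_TFLOPS
clock_ghz = implied_clock_ghz(SP_TFLOPS, CUDA_CORES, FLOPS_PER_CORE_PER_CYCLE)

print(f"SP : DP throughput ratio is about {sp_to_dp_ratio:.0f} : 1")
print(f"Implied clock under the FMA assumption: {clock_ghz:.2f} GHz")
```

The large single-to-double precision gap fits the slide's positioning of the card for deep learning, where the training runs shown rely on single precision.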
TITAN X FOR DEEP LEARNING
Training AlexNet, in days
[Chart: bars for 16-core Xeon CPU, TITAN, TITAN Black cuDNN, TITAN X cuDNN; axis 0 to 7 days, broken above 7; the tallest bar is labeled 43 Days]
TITAN X
$999
A SHORT HISTORY OF DEEP LEARNING
Timeline, 1995 to 2015:
Convolutional Neural Networks for Handwritten Digit Recognition (LECUN, BOTTOU, BENGIO, HAFFNER, 1998)
ImageNet Classification with NVIDIA GPUs (KRIZHEVSKY, HINTON, ET AL., 2012)
Accuracy %, 2010 to 2014: CV 72% (2010), 74%; DNN 84%
IMAGENET CHALLENGE
Accuracy %, 2010 to 2014: CV 72% (2010), 74%; DNN 84%
“Deep Image: Scaling up Image Recognition” (Baidu: 5.98%, Jan. 13, 2015)
“Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” (Microsoft: 4.94%, Feb. 6, 2015)
“Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” (Google: 4.82%, Feb. 11, 2015)
THE BIG BANG
DEEP LEARNING VISUALIZED
GPU-ACCELERATED DEEP LEARNING START-UPS
DEEP LEARNING REVOLUTIONIZING MEDICAL RESEARCH
Detecting Mitosis in Breast Cancer Cells (IDSIA)
Predicting the Toxicity of New Drugs (Johannes Kepler University)
Understanding Gene Mutation to Prevent Disease (University of Toronto)
“Automated Image Captioning with ConvNets and Recurrent Nets” —Andrej Karpathy, Fei-Fei Li
DIGITS
DEEP GPU TRAINING SYSTEM FOR DATA SCIENTISTS
Design DNNs
Visualize activations
Manage multiple trainings
[Diagram: layered stack]
USER INTERFACE: Process Data, Configure DNN, Monitor Progress, Visualize Layers
Theano, Torch, Caffe
cuDNN, cuBLAS
CUDA
GPU HW: GPU, Multi-GPU, GPU Cluster, Cloud
DIGITS
Process Data, Configure DNN, Monitor Progress, Visualize Layers, Test Image
DIGITS DEVBOX
World’s fastest GPU
Max GPU out of a plug
Multi-GPU training & inference
DIGITS DEVBOX: EARLY RESULTS
“DIGITS makes it way easier to design the best network for the job” (Simon Osindero, A.I. Architect)
“I’ve never seen AlexNet run this fast…TitanX is a monster, Crazy Fast” (Soumith Chintala, Research Engineer)
[Chart: Multi-GPU scaling on Torch for AlexNet and VGG; vertical axis 0x to 4x, horizontal axis ticks 1, 2, 4]
DIGITS DEVBOX
Available May 2015
$15,000
GPU ROADMAP
Architectures plotted from 2008 to 2018: Tesla, Fermi, Kepler, Maxwell, Pascal, Volta
Pascal: Mixed Precision, 3D Memory, NVLink
Pascal 2x SGEMM/W [chart axis: SGEMM / W, 0 to 72]
Pascal 2.7x Memory Capacity [chart axis: Frame Buffer Capacity (GB), 0 to 60]
Pascal 4x Mixed Precision [chart axis: HGEMM / W, 0 to 144]
Pascal 3x Bandwidth [chart axis: STREAM GB/s, 0 to 900]
PASCAL 10X MAXWELL
forward:  CONVOLUTION (compute)         4x (FP16)   Mixed Precision
forward:  FULLY CONNECTED (bandwidth)   6x          3D Memory
backward: FULLY CONNECTED (bandwidth)   6x          3D Memory
backward: CONVOLUTION (compute)         4x          Mixed Precision
WEIGHT UPDATE (interconnect), NVLINK
Further figures on the slide: 5x, 10x, 2x
* Very rough estimates
TODAY’S ADAS
SENSE: FPGA, CV ASIC
PLAN: CPU
ACT: WARN, BRAKE
NEXT-GENERATION ADAS
SENSE: FPGA, CV ASIC
PLAN: CPU
ACT: WARN, BRAKE, STEER, ACCELERATE
NVIDIA DRIVE PX SELF-DRIVING CAR COMPUTER
SENSE, PLAN, ACT
Components: FPGA, CV ASIC, CPU, DNN
Actions: WARN, BRAKE, STEER, ACCELERATE
IMAGENET CHALLENGE Accuracy %, 2010 to 2014: CV 72%, 74%; DNN 84%
PROJECT DAVE (DARPA AUTONOMOUS VEHICLE)
DNN-based self-driving robot
Training data by human driver
No hand-coded CV algorithms
PROJECT LEADS
Urs Muller: Chief Architect, Autonomous Driving, NVIDIA
Yann LeCun: Director, AI Research, Facebook
DAVE IN ACTION
TRAINING DATA: 225K Images
TEST DRIVE: No Training
TEST DRIVE: Partially Trained (52K images)
TEST DRIVE: Fully Trained (225K images)
                       Number of Connections   Frames / Second   Connections / Second
DAVE                   3.1 Million             12                38 Million
AlexNet on DRIVE PX    630 Million             184               116 Billion
3,000x Faster
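The "3,000x Faster" figure follows from the other numbers in the comparison: connections per second is the number of connections multiplied by frames per second. The following is a minimal sketch of that arithmetic, using the rounded values shown on the slide; the function and variable names are illustrative only.

```python
def connections_per_second(connections, frames_per_second):
    """Connections evaluated per second when every frame uses every connection."""
    return connections * frames_per_second


# Rounded figures from the slide.
dave = connections_per_second(3_100_000, 12)
alexnet_on_drive_px = connections_per_second(630_000_000, 184)

speedup = alexnet_on_drive_px / dave

print(f"DAVE: {dave / 1e6:.1f} million connections/s")
print(f"AlexNet on DRIVE PX: {alexnet_on_drive_px / 1e9:.1f} billion connections/s")
print(f"Ratio: about {speedup:,.0f}x")
```

The products land close to the slide's 38 Million and 116 Billion (the inputs are themselves rounded), and the ratio is a little over 3,000, consistent with the headline.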
NVIDIA DRIVE PX™
SELF-DRIVING CAR COMPUTER
Available May 2015
$10,000
ELON MUSK
LEAPS IN VISUAL COMPUTING
TITAN X: The World’s Fastest GPU
DIGITS DevBox: GPU Deep Learning Platform
Pascal: 10x Maxwell For Deep Learning
NVIDIA DRIVE PX: Deep Learning Platform for Self-Driving Cars