LEAPS IN VISUAL COMPUTING JEN-HSUN HUANG, CO-FOUNDER & CEO | GTC 2015

FOUR ANNOUNCEMENTS

A New GPU and Deep Learning

A Very Fast Box and Deep Learning

Roadmap Reveal and Deep Learning

Self-Driving Cars and Deep Learning

AMAZING YEAR IN VISUAL COMPUTING

© 2015 Industrial Light & Magic. All Rights Reserved.

10X GROWTH IN GPU COMPUTING

                            2008        2015
CUDA Downloads              150,000     3 Million
CUDA Apps                   27          319
Universities Teaching       60          800
Academic Papers             4,000       60,000
Tesla GPUs                  6,000       450,000
Supercomputing Teraflops    77          54,000
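As a rough check on the slide's "10X" headline, the 2008 and 2015 figures above can be turned into growth factors. The sketch below uses only the numbers printed on the slide; the dictionary layout and variable names are illustrative and not part of the presentation.

```python
# 2008 and 2015 values as printed on the slide: (value_2008, value_2015).
metrics = {
    "CUDA downloads": (150_000, 3_000_000),
    "CUDA apps": (27, 319),
    "Universities teaching": (60, 800),
    "Academic papers": (4_000, 60_000),
    "Tesla GPUs": (6_000, 450_000),
    "Supercomputing teraflops": (77, 54_000),
}

# Growth factor = 2015 value divided by 2008 value.
growth = {name: y2015 / y2008 for name, (y2008, y2015) in metrics.items()}

for name, factor in growth.items():
    print(f"{name}: {factor:.1f}x")

# The smallest factor shows whether every metric grew by at least 10x.
smallest = min(growth, key=growth.get)
print(f"Smallest growth: {smallest} ({growth[smallest]:.1f}x)")
```

By these numbers every metric grew by more than 10x, so the headline reads as a lower bound rather than an average.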

TITAN X

THE WORLD’S FASTEST GPU
8 Billion Transistors
3,072 CUDA Cores
7 TFLOPS SP / 0.2 TFLOPS DP
12GB Memory
$999

TITAN X FOR DEEP LEARNING
[Chart: Training AlexNet, time in days (axis 0 to 7, with a break for the off-scale 43-day value). Bars: 16-core Xeon CPU, TITAN, TITAN Black cuDNN, TITAN X cuDNN.]
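The TITAN X peak-throughput figures can be sanity-checked with a little arithmetic. The sketch below works backwards from the slide's 7 TFLOPS and 3,072 cores to an implied clock rate. It assumes each CUDA core retires one fused multiply-add per clock, counted as 2 floating-point operations. That assumption and the variable names are not on the slide, and the slide's TFLOPS values are rounded, so the result is only approximate.

```python
# Figures as printed on the TITAN X slide.
cuda_cores = 3072
sp_tflops = 7.0   # single precision, rounded on the slide
dp_tflops = 0.2   # double precision, rounded on the slide

# Assumption (not on the slide): one fused multiply-add per core per clock,
# counted as 2 floating-point operations.
flops_per_core_per_clock = 2

# Peak FLOPS = cores * FLOPs per core per clock * clock rate, solved for clock.
implied_clock_hz = sp_tflops * 1e12 / (cuda_cores * flops_per_core_per_clock)

# Ratio of single- to double-precision peak throughput.
sp_to_dp_ratio = sp_tflops / dp_tflops

print(f"Implied clock: {implied_clock_hz / 1e9:.2f} GHz")
print(f"SP:DP ratio: about {sp_to_dp_ratio:.0f}:1")
```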

A SHORT HISTORY OF DEEP LEARNING

1998: Convolutional Neural Networks for Handwritten Digit Recognition (LeCun, Bottou, Bengio, Haffner)
2012: ImageNet Classification with NVIDIA GPUs (Krizhevsky, Hinton, et al.)

[Timeline axis: 1995 to 2015. Inset chart: ImageNet accuracy %, 2010 to 2014. CV: 72% (2010), 74% (2011). DNN: 84% (2012).]

IMAGENET CHALLENGE

[Chart: accuracy % by year, 2010 to 2014. CV: 72% (2010), 74% (2011). DNN: 84% (2012).]

Reported top-5 error rates:
“Deep Image: Scaling up Image Recognition”, Baidu: 5.98%, Jan. 13, 2015
“Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”, Microsoft: 4.94%, Feb. 6, 2015
“Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, Google: 4.82%, Feb. 11, 2015

THE BIG BANG

DEEP LEARNING VISUALIZED

GPU-ACCELERATED DEEP LEARNING START-UPS

DEEP LEARNING REVOLUTIONIZING MEDICAL RESEARCH
Detecting Mitosis in Breast Cancer Cells (IDSIA)
Predicting the Toxicity of New Drugs (Johannes Kepler University)
Understanding Gene Mutation to Prevent Disease (University of Toronto)

“Automated Image Captioning with ConvNets and Recurrent Nets” —Andrej Karpathy, Fei-Fei Li

DIGITS: DEEP GPU TRAINING SYSTEM FOR DATA SCIENTISTS

Capabilities: Design DNNs, Visualize activations, Manage multiple trainings

Layers shown on the slide:
USER INTERFACE: Process Data, Configure DNN, Monitor Progress, Visualize Layers
DIGITS
Theano, Torch, Caffe
cuDNN, cuBLAS
CUDA
GPU HW: GPU, Multi-GPU, GPU Cluster, Cloud

DIGITS workflow: Process Data, Configure DNN, Monitor Progress, Visualize Layers, Test Image

DIGITS DEVBOX
World’s fastest GPU
Max GPU out of a plug
Multi-GPU training & inference

DIGITS DEVBOX: EARLY RESULTS

“DIGITS makes it way easier to design the best network for the job”
Simon Osindero, A.I. Architect

“I’ve never seen AlexNet run this fast…TitanX is a monster, Crazy Fast”
Soumith Chintala, Research Engineer

[Chart: Multi-GPU scaling on Torch for AlexNet and VGG. Speedup (y-axis, 0x to 4x) versus number of GPUs (x-axis: 1, 2, 4).]

DIGITS DEVBOX Available May 2015 $15,000

GPU ROADMAP: Pascal 2x SGEMM/W
[Chart: SGEMM/W (y-axis, 0 to 72) by year (x-axis, 2008 to 2018) for Tesla, Fermi, Kepler, Maxwell, Pascal, and Volta. Pascal callouts: Mixed Precision, 3D Memory, NVLink.]

GPU ROADMAP: Pascal 2.7x Memory Capacity
[Chart: Frame Buffer Capacity in GB (y-axis, 0 to 60) by year (x-axis, 2008 to 2018) for Tesla, Fermi, Kepler, Maxwell, Pascal, and Volta. Pascal callouts: Mixed Precision, 3D Memory, NVLink.]

GPU ROADMAP: Pascal 4x Mixed Precision
[Chart: HGEMM/W (y-axis, 0 to 144) by year (x-axis, 2008 to 2018) for Tesla, Fermi, Kepler, Maxwell, Pascal, and Volta. Pascal callouts: Mixed Precision, 3D Memory, NVLink.]

GPU ROADMAP: Pascal 3x Bandwidth
[Chart: STREAM GB/s (y-axis, 0 to 900) by year (x-axis, 2008 to 2018) for Tesla, Fermi, Kepler, Maxwell, Pascal, and Volta. Pascal callouts: Mixed Precision, 3D Memory, NVLink.]

PASCAL 10X MAXWELL

Forward:
  Convolution (compute): 4x (FP16), Mixed Precision
  Fully connected (bandwidth): 6x, 3D Memory
Backward:
  Fully connected (bandwidth): 6x, 3D Memory
  Convolution (compute): 4x, Mixed Precision
Weight update (interconnect): 2x, NVLink
Overall figures shown: 5x, 10x

* Very rough estimates

TODAY’S ADAS
SENSE (FPGA, CV ASIC) -> PLAN (CPU) -> ACT (WARN, BRAKE)

NEXT-GENERATION ADAS
SENSE (FPGA, CV ASIC) -> PLAN (CPU) -> ACT (WARN, BRAKE, STEER, ACCELERATE)

NVIDIA DRIVE PX: SELF-DRIVING CAR COMPUTER
[Diagram: SENSE, PLAN, ACT pipeline. Components shown: FPGA, CV ASIC, CPU, DNN. Actions: WARN, BRAKE, STEER, ACCELERATE.]
[Inset chart: ImageNet Challenge accuracy %, 2010 to 2014. CV: 72%, 74%. DNN: 84%.]

PROJECT DAVE: DARPA AUTONOMOUS VEHICLE
DNN-based self-driving robot
Training data by human driver
No hand-coded CV algorithms

PROJECT LEADS
Urs Muller: Chief Architect, Autonomous Driving, NVIDIA
Yann LeCun: Director, AI Research, Facebook

DAVE IN ACTION
TRAINING DATA: 225K Images
TEST DRIVE: No Training
TEST DRIVE: Partially Trained (52K images)
TEST DRIVE: Fully Trained (225K images)

                          DAVE           AlexNet on DRIVE PX
Number of Connections     3.1 Million    630 Million
Frames / Second           12             184
Connections / Second      38 Million     116 Billion

3,000x Faster
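The "3,000x Faster" figure follows from the comparison above: connections per second is the number of connections multiplied by frames per second. The sketch below recomputes it from the slide's rounded inputs, so the products differ slightly from the printed 38 Million and 116 Billion. The function and variable names are illustrative.

```python
# Figures as printed on the slide (rounded there, so results are approximate).
dave = {"connections": 3.1e6, "frames_per_second": 12}
alexnet_drive_px = {"connections": 630e6, "frames_per_second": 184}


def connections_per_second(net):
    """Throughput = connections evaluated per frame * frames per second."""
    return net["connections"] * net["frames_per_second"]


dave_cps = connections_per_second(dave)
alexnet_cps = connections_per_second(alexnet_drive_px)
speedup = alexnet_cps / dave_cps

print(f"DAVE: {dave_cps / 1e6:.1f} million connections/s")
print(f"AlexNet on DRIVE PX: {alexnet_cps / 1e9:.1f} billion connections/s")
print(f"Speedup: about {speedup:,.0f}x")
```

Using the slide's own rounded throughputs instead (116 Billion / 38 Million) gives roughly the same ratio, which the slide rounds to 3,000x.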

NVIDIA DRIVE PX™: SELF-DRIVING CAR COMPUTER
Available May 2015
$10,000

ELON MUSK

LEAPS IN VISUAL COMPUTING
TITAN X: The World’s Fastest GPU
DIGITS DevBox: GPU Deep Learning Platform
Pascal: 10x Maxwell for Deep Learning
NVIDIA DRIVE PX: Deep Learning Platform for Self-Driving Cars