http://www.anandtech.com/show/11749/hot ... t-10pm-utc


Info on the Volta GV100 from Hot Chips:
https://www.servethehome.com/nvidia-v10 ... hips-2017/
Highlights:
- GV100 SM:
  - Twice the schedulers
  - Large, fast L1 cache
  - Improved SIMT model
  - Tensor acceleration
  - +50% energy efficiency vs the GP100 SM
SM Microarchitecture:
- Shared L1 cache
- 4 independently scheduled sub-cores
- Shared MIO
Sub-Core:
- Warp Scheduler: 1 warp instruction/clock, L0 instruction cache, branch unit
- Math Dispatch Unit: keeps 2+ datapaths busy
- MIO instruction queue
- Two 4x4x4 Tensor Cores
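A back-of-envelope check on what two 4x4x4 Tensor Cores per sub-core add up to across the whole chip. This is just a sketch; the SM count and boost clock are assumptions about the full V100 part, not figures from the slides:

```python
# One 4x4x4 tensor op = 64 fused multiply-adds; each FMA counts as 2 FLOPs.
FMA_PER_TENSOR_OP = 4 * 4 * 4
FLOP_PER_FMA = 2

tensor_cores_per_sub_core = 2   # from the slides
sub_cores_per_sm = 4            # from the slides
sms = 80                        # assumption: SM count of the full V100
clock_hz = 1.53e9               # assumption: V100 boost clock

flops = (FMA_PER_TENSOR_OP * FLOP_PER_FMA
         * tensor_cores_per_sub_core * sub_cores_per_sm
         * sms * clock_hz)
print(f"{flops / 1e12:.0f} TFLOPS")  # ~125 TFLOPS of mixed-precision throughput
```

Which lands on NVIDIA's headline ~125 TFLOPS Tensor Core number for V100.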
L1 and Shared Memory:
- Streaming L1 cache: 4x bandwidth and 4x capacity vs GP100
- Shared memory: unified storage with the L1 cache, configurable up to 96 KB
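The unified storage means one physical array per SM is split between shared memory and L1. A minimal sketch of the trade-off; the 128 KB combined size is an assumption taken from the Volta whitepaper, not stated in the post:

```python
# Assumption: 128 KB of combined L1/shared storage per Volta SM.
UNIFIED_KB = 128

def l1_size_kb(shared_kb):
    """L1 capacity left over for a given shared-memory carveout."""
    # The post says shared memory is configurable only up to 96 KB.
    assert 0 <= shared_kb <= 96, "shared memory carveout is at most 96 KB"
    return UNIFIED_KB - shared_kb

print(l1_size_kb(96))  # max shared memory leaves 32 KB of L1
print(l1_size_kb(0))   # no shared memory lets the full array act as L1
```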
Tensor Core:
- Mixed-precision matrix math on 4x4 matrices
- With improved scheduling, GV100 can do 16x16 matrix math
- V100 (CUDA 9 + Tensor Cores) is 9.3x faster for cuBLAS mixed precision than P100 (CUDA 8)
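What "mixed precision" means for one tensor-core step, sketched in NumPy: FP16 input tiles, but products accumulated in FP32. The matrices here are random illustrative data, not anything from the talk:

```python
import numpy as np

# One tensor-core-style step: D = A @ B + C on 4x4 tiles.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)).astype(np.float16)  # FP16 input
B = rng.standard_normal((4, 4)).astype(np.float16)  # FP16 input
C = rng.standard_normal((4, 4)).astype(np.float32)  # FP32 accumulator

# The multiply-accumulate runs at FP32 (the "mixed precision" part),
# so long dot products lose far less accuracy than pure FP16 math.
D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D.dtype)  # float32
```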
NVLink Updates:
- The new GV100 NVLink offers 1.9x more bandwidth than GP100
- 6 NVLink connections
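Where the 1.9x figure likely comes from. The per-link rates below are assumptions taken from NVIDIA's published specs for NVLink 2.0 (GV100) and NVLink 1.0 (GP100), not from the post:

```python
# Total bidirectional bandwidth = per-direction rate * 2 directions * links.
gv100_gbps = 25 * 2 * 6   # 6 links at 25 GB/s per direction -> 300 GB/s
gp100_gbps = 20 * 2 * 4   # 4 links at 20 GB/s per direction -> 160 GB/s
print(gv100_gbps / gp100_gbps)  # 1.875, rounded to the quoted "1.9x"
```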





