In lieu of the multi-day extravaganza that is normally Nvidia’s flagship GTC in San Jose, the company has been rolling out a series of talks and announcements online. Even the keynote has gone virtual, with Jensen’s popular and traditionally rambling talk being shifted to YouTube. To be honest, it’s actually easier to cover keynotes from a livestream in an office anyway, although I do miss all the hands-on demos and socializing that go with the in-person conference.
In any case, this year’s event featured an impressive suite of announcements around Nvidia’s new Ampere architecture for both the data center and AI on the edge, beginning with the A100 Ampere-architecture GPU.
Nvidia A100: World’s Largest 7nm Chip Features 54 Billion Transistors
The A100, Nvidia’s first Ampere-based GPU, is also the world’s largest and most complex 7nm chip, featuring a staggering 54 billion transistors. Nvidia claims performance gains of up to 20x over previous Volta models. The A100 isn’t just for AI: Nvidia believes it is an ideal GPGPU device for applications including data analytics, scientific computing, and cloud graphics. For lighter-weight tasks like inferencing, a single A100 can be partitioned into up to seven slices to run multiple loads in parallel. Conversely, NVLink allows multiple A100s to be tightly coupled.
All the top cloud vendors, including Google, Amazon, Microsoft, and Baidu, have said they plan to support the A100. Microsoft is already planning to push the envelope of its Turing Natural Language Generation model by moving to A100s for training.
Innovative TF32 Aims to Optimize AI Performance
Along with the A100, Nvidia is rolling out a new floating-point format, TF32, for the A100’s Tensor cores. It is a hybrid of FP16 and FP32: it keeps FP32’s 8-bit exponent, and therefore its numeric range, while using the same 10-bit mantissa as FP16. The aim is to capture much of the performance benefit of moving to FP16 without giving up FP32’s range. The A100’s new cores will also directly support FP64, making them increasingly useful for a variety of HPC applications. Along with the new data format, the A100 also supports sparse matrices, so that AI networks that contain many unimportant nodes can be represented more efficiently.
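To make the format concrete: TF32 keeps FP32’s sign bit and 8-bit exponent but only 10 mantissa bits, the same mantissa width as FP16. The Python sketch below emulates that by clearing the low 13 mantissa bits of a 32-bit float. It is an illustration only. The helper name `to_tf32` is hypothetical, and simple truncation is an assumption, since the rounding the hardware actually applies may differ.

```python
import struct

FP32_MANTISSA_BITS = 23
TF32_MANTISSA_BITS = 10
DROPPED_BITS = FP32_MANTISSA_BITS - TF32_MANTISSA_BITS  # 13 low bits discarded


def to_tf32(x: float) -> float:
    """Emulate TF32 precision by truncating an FP32 value's mantissa.

    Assumption: plain truncation (round toward zero). Real hardware
    may round differently; this only illustrates the bit layout.
    """
    # Reinterpret the value as the raw 32 bits of an IEEE 754 single.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign (1 bit), exponent (8 bits), and the top 10 mantissa bits.
    mask = 0xFFFFFFFF ^ ((1 << DROPPED_BITS) - 1)
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


if __name__ == "__main__":
    for value in (1.0, 1.0 + 2**-10, 1.0 + 2**-11, 3.14159265, 1e30):
        print(value, "->", to_tf32(value))
```

A value such as 1e30, which would overflow FP16, stays representable here because the exponent width is unchanged. Only the fine detail of the mantissa is lost.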
Nvidia DGX A100: 5 PetaFLOPS in a Single Node
In addition to its own DGX A100, Nvidia expects a number of its traditional partners, including Atos, Supermicro, and Dell, to build the A100 into their own servers. To assist in that effort, Nvidia is also selling the HGX A100 data center accelerator.
Nvidia HGX A100 Hyperscale Data Center Accelerator
DGX A100 SuperPOD
Of course, if you’re a hyperscale compute center, you can never have enough processor power. So Nvidia has created a SuperPOD from 140 DGX A100 systems, 170 InfiniBand switches, a 280 TB/s network fabric (using 15 km of optical cable), and 4PB of flash storage. Nvidia claims that all that hardware delivers over 700 petaflops of AI performance, and says it built the system in under three weeks for its own internal research. If you have the space and the money, Nvidia has released the reference architecture for its SuperPOD, so you can build your own. Joel and I think it sounds like the makings of a great DIY article. It should be able to run his Deep Space Nine upscaling project in about a minute.
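The 700-petaflop figure lines up with the per-node number in the DGX A100 heading above: 140 systems at 5 petaflops each. A quick check in Python, using only figures quoted in this article (the variable names are illustrative, and these are Nvidia’s claimed AI-performance numbers, not measured results):

```python
# Figures as quoted in the article (Nvidia's claims, not benchmarks).
DGX_A100_PFLOPS = 5          # "5 PetaFLOPS in a Single Node"
SYSTEMS_PER_SUPERPOD = 140   # DGX A100 systems in one SuperPOD

superpod_pflops = DGX_A100_PFLOPS * SYSTEMS_PER_SUPERPOD
print(f"One SuperPOD: {superpod_pflops} petaflops of claimed AI performance")
```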
Nvidia Expands Its SaturnV Supercomputer
Nvidia has also greatly expanded its SaturnV supercomputer to take advantage of Ampere. SaturnV was composed of 1,800 DGX-1 systems, but Nvidia has now added four DGX A100 SuperPODs, bringing SaturnV to a claimed total capacity of 4.6 exaflops. According to Nvidia, that makes it the fastest AI supercomputer in the world.
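Taking Nvidia’s figures at face value, the SaturnV total can be broken down: four SuperPODs at roughly 700 petaflops each contribute about 2.8 exaflops, which leaves about 1.8 exaflops for the rest of the machine. The sketch below does that subtraction in whole petaflops. The remainder is inferred from the numbers in this article, not a figure Nvidia is quoted as stating, and the variable names are illustrative.

```python
# All values in petaflops, taken from Nvidia's claims quoted in the article.
SUPERPOD_PFLOPS = 700         # "over 700 petaflops", treated here as exactly 700
NEW_SUPERPODS = 4
SATURNV_TOTAL_PFLOPS = 4600   # 4.6 exaflops claimed total

new_pflops = NEW_SUPERPODS * SUPERPOD_PFLOPS
remaining_pflops = SATURNV_TOTAL_PFLOPS - new_pflops  # inferred, not stated

print(f"Added by the SuperPODs: {new_pflops / 1000} exaflops")
print(f"Implied for the rest of SaturnV: {remaining_pflops / 1000} exaflops")
```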
Jetson EGX A100 Takes the A100 to the Edge
Ampere and the A100 aren’t confined to the data center. Nvidia also announced a high-powered, purpose-built GPU for edge computing. The Jetson EGX A100 is built around an A100, but also includes Mellanox CX6 DX high-performance connectivity that’s secured using a line-speed crypto engine. The GPU also includes support for encrypted models to help protect an OEM’s intellectual property. Updates to Nvidia’s Jetson-based toolkits for various industries (including Clara, Jarvis, Aerial, Isaac, and Metropolis) will help OEMs build robots, medical devices, and a variety of other high-end products using the EGX A100.