What’s new with Kepler?
With the introduction of Kepler (official Kepler page), mobile graphics hardware will reach feature parity with desktop GPUs. Every feature that “PC” Kepler supports, mobile Kepler supports as well. It is the exact same feature set, and the only difference lies in the number of parallel computing units embedded in the final product.
The fastest NVIDIA desktop GPU today has 12 streaming multiprocessors (SMX) with 192 cores each, while mobile Kepler has only one SMX. Of course, if you take into account that mobile Kepler is a 2W part while desktop Kepler is a 250W part, you will see that the difference in raw performance is largely explained by practical power-consumption constraints.
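To make the scale of that comparison concrete, here is a minimal back-of-the-envelope sketch in Python. It uses only the figures quoted above (12 vs. 1 SMX, 192 cores per SMX, 250W vs. 2W); the function names are hypothetical, and "cores per watt" is only a rough proxy, since real performance also depends on clock speeds and memory bandwidth, which are not given here.

```python
# Figures quoted in the article; everything below is derived arithmetic.
DESKTOP_SMX = 12
MOBILE_SMX = 1
CORES_PER_SMX = 192
DESKTOP_WATTS = 250
MOBILE_WATTS = 2


def total_cores(smx_count, cores_per_smx=CORES_PER_SMX):
    """Total core count for a GPU built from identical SMX units."""
    return smx_count * cores_per_smx


def cores_per_watt(smx_count, watts):
    """Rough density proxy: cores available per watt of rated power."""
    return total_cores(smx_count) / watts


desktop_cores = total_cores(DESKTOP_SMX)    # 12 * 192
mobile_cores = total_cores(MOBILE_SMX)      # 1 * 192
core_ratio = desktop_cores / mobile_cores   # how many times more cores
power_ratio = DESKTOP_WATTS / MOBILE_WATTS  # how many times more power

print(f"desktop: {desktop_cores} cores, mobile: {mobile_cores} cores")
print(f"core ratio: {core_ratio:.0f}x, power ratio: {power_ratio:.0f}x")
```

With these numbers, the desktop part has 12 times the cores but 125 times the power budget, which suggests the mobile part must run at much lower clocks and voltages to fit its envelope.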
“Only” 1 SMX?
A single SMX may not sound like a lot, but to put this in perspective, NVIDIA was quick to remind us that it is in fact more graphics muscle than the PS3 (for which NVIDIA built the “RSX” GPU). If you compare it with recent smartphones and tablets, you get the ridiculous difference seen in the above graph. Of course, Tegra 5 handsets won’t appear for another year or so, but today this is a shocking difference.
Back when Kepler was launched on the PC platform, the most important feature was the new SMX units that were designed to be much more computationally dense (3X the performance per Watt) and power-efficient when compared to NVIDIA’s previous “Fermi” generation of GPUs. Looking back, it is now obvious that Kepler has always been intended to be utilized on mobile devices.
Since graphics performance can easily scale by adding more processing units to a GPU design, it is quite logical to wonder how much power Kepler would draw from the system. The “2W” number provided by NVIDIA does shed some light, but it is even more interesting to compare it with existing hardware. According to NVIDIA, Kepler uses one third of the power that the iPad 4 GPU needs to render the same graphics.
Since games are the most power-hungry apps on mobiles, it’s not hard to see what kind of an impact this could have. Additionally, if NVIDIA were to provide an option for users to control the power consumption or temperature (like it does on PC), this would extend the “gaming” battery life very significantly.
Unified Shader Architecture
Kepler marks the first time that NVIDIA uses a unified shader architecture in a Tegra chip. In classic graphics architectures, vertices and pixels are processed by different types of compute units. With a unified architecture, all the compute units are the same and can be allocated to work on any task. With this setup, the hardware utilization level is higher because more units can be put to work at any given time, regardless of the nature of the scene (wireframe vs. flat-shaded vs. heavily textured). This translates into higher performance and efficiency.
In prior discussions with NVIDIA, their engineers had mentioned that they chose not to use a unified architecture up until now because it wasn’t necessarily the best setup for their mobile chips. We will probably never know how true that actually was, but what we do know for sure is that with DirectX 11-level graphics, such an architecture is critical since there are several types of shaders (vertex, geometry and pixel, among others) that could appear at unpredictable ratios. I’m very curious to see if this new environment will allow NVIDIA to distance itself in a major way. Up until now, competitors have been able to catch up with, and often overtake, any Tegra lead in a matter of months.
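The utilization argument can be illustrated with a small toy model in Python. This is only a sketch under strong assumptions, not a description of how Kepler schedules work: the workload numbers, the unit counts and the function names are all made up for illustration, every unit is assumed to complete one work item per time step, and the dependencies between vertex and pixel stages are ignored.

```python
import math


def fixed_split_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Time steps when vertex and pixel work run on dedicated unit types.

    The frame is done only when the slower of the two groups finishes.
    """
    return max(math.ceil(vertex_work / vertex_units),
               math.ceil(pixel_work / pixel_units))


def unified_time(vertex_work, pixel_work, total_units):
    """Time steps when any unit can take any kind of work."""
    return math.ceil((vertex_work + pixel_work) / total_units)


def utilization(total_work, units, time_steps):
    """Fraction of unit-time slots that actually did useful work."""
    return total_work / (units * time_steps)


VERTEX_UNITS, PIXEL_UNITS = 2, 6           # hypothetical fixed design
TOTAL_UNITS = VERTEX_UNITS + PIXEL_UNITS   # same silicon budget, unified

# Hypothetical scenes: (vertex work items, pixel work items)
scenes = {"wireframe-like": (80, 16), "heavily textured": (8, 120)}

for name, (v, p) in scenes.items():
    t_fixed = fixed_split_time(v, p, VERTEX_UNITS, PIXEL_UNITS)
    t_unified = unified_time(v, p, TOTAL_UNITS)
    print(f"{name}: fixed {t_fixed} steps "
          f"({utilization(v + p, TOTAL_UNITS, t_fixed):.0%} busy), "
          f"unified {t_unified} steps "
          f"({utilization(v + p, TOTAL_UNITS, t_unified):.0%} busy)")
```

In this idealized model, the fixed design leaves most units idle whenever the scene's vertex-to-pixel ratio does not match the hardware split, while the unified pool stays busy in both cases.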
Software benefits from Kepler for mobile
For developers, reaching a point where the feature sets of mobile, PC and console are the same is going to bring tremendous benefits. The most important one is that Kepler supports modern desktop graphics APIs like OpenGL 4.4 and DirectX 11. For developers and publishers, this would mean that porting existing games/apps may take less time than it once did. This also means that future ARM-based Windows 8 tablets/phones could use the regular DirectX 11 API, making PC apps/games even easier to port. More importantly, developers will have less code to build/maintain and could focus on gameplay and eye-candy instead.
The NVIDIA team has adapted two desktop graphics demos that were shown recently during the GTC conference. They now work on mobile with a similar look and feel (although I don’t doubt that they have been optimized to accommodate “mobile performance” levels):
The FaceWorks “Ira” demo was shown earlier this year during the GTC conference in San Jose. Back then, it was running on the GeForce Titan, a huge GPU. This adaptation of the demo shows off that it is possible to achieve something that is visually comparable within a 2W power envelope. The demo features super-realistic skin rendering and facial animations.
The Island demo is also derived from a high-end desktop demo. It is meant to show off tessellation (adding geometric details on the fly), a feature that modern graphics APIs have supported for some time. This will now become a first-class citizen on mobile.
Unreal Engine 4 demo
Demos are great, but production-ready game engines are even better: Epic is also demonstrating its UE4 Engine on mobile Kepler, and I bet that Tim Sweeney couldn’t be happier for all the reasons mentioned above. This will make his life that much easier.
Increased performance and power-efficiency through massive parallelism
Since this next-gen mobile chip will support CUDA (and OpenCL, I assume), great power-efficiency gains can be achieved when handling massively parallel tasks such as video, imaging, physics and other workloads. If you are unfamiliar with CUDA or OpenCL, they are two application programming interfaces (APIs) that allow software developers to execute non-graphics tasks on the processor array of the graphics processor.
Parallelism works best when the task is “parallel in nature”. Not every task can be parallelized easily, but when you have a lot of computations that are not related to one another, having hundreds of small processors working in parallel is faster and more power-efficient than having a few general-purpose cores churning out the same results. This has been proven on desktop computers, and it is still a valid concept on mobile hardware. Particle physics is a great example of a parallel task: each particle can be processed independently, and therefore the workload is easily distributed among a vast array of compute units.
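The particle example can be sketched in a few lines of Python. This is a minimal illustration of why the work distributes so easily, not GPU code: the particle layout, the simple gravity step and the function names are assumptions made for the example. On a GPU, each particle would typically map to one thread of a CUDA or OpenCL kernel; Python threads are used here only to show that the chunks can be handed to separate workers, and they will not reproduce the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

GRAVITY = -9.81  # m/s^2 on the y axis (assumed for the example)


def step_particle(particle, dt):
    """Advance one particle by dt seconds.

    The result depends only on this particle's own state, so particles
    can be updated in any order, or all at once.
    """
    x, y, vx, vy = particle
    vy_new = vy + GRAVITY * dt
    return (x + vx * dt, y + vy_new * dt, vx, vy_new)


def step_serial(particles, dt):
    """Update every particle one after another."""
    return [step_particle(p, dt) for p in particles]


def step_chunked(particles, dt, workers=4):
    """Split the particles into chunks and give each chunk to a worker."""
    chunk_size = max(1, -(-len(particles) // workers))  # ceiling division
    chunks = [particles[i:i + chunk_size]
              for i in range(0, len(particles), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks
        results = pool.map(lambda chunk: step_serial(chunk, dt), chunks)
        return [p for chunk in results for p in chunk]


# Hypothetical particles: (x, y, vx, vy)
particles = [(float(i), 10.0, 1.0, 0.0) for i in range(1000)]

# Because no particle reads another particle's state, splitting the work
# gives exactly the same answer as the serial loop.
assert step_chunked(particles, 0.016) == step_serial(particles, 0.016)
```

A workload where particles interact (collisions, for instance) would not split this cleanly, which is the "not every task can be parallelized easily" caveat in practice.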
For now, NVIDIA says that the first handsets should appear sometime in the first half of 2014 and that the Logan (Tegra 5) chips came back to its labs a few weeks ago (in early July). The company has added that it is “sampling” chips to customers right now, which means that it is producing units in low volume for testing and initial engineering ramp-up by its partners.
Once the NVIDIA engineers are satisfied with the design verifications and are reasonably sure that no fixes will be needed, high-volume production will start. We think that you should expect an appearance at CES 2014 and Mobile World Congress 2014, where NVIDIA has demonstrated previous Tegra processors before.
Integrated ICERA LTE Modem
Previously, NVIDIA had hinted that its next-gen chip would integrate an LTE modem. However, the company is not yet prepared to discuss it, and in the SIGGRAPH 2013 timeframe, the Tegra team was only ready to discuss graphics architecture and theoretical performance.
It is pretty safe to believe that the ICERA modem integration will happen as scheduled since the rest of the roadmap was pretty much on target. The addition of an LTE modem (which has already been certified by AT&T) is hugely important for NVIDIA since it is a frequent customer requirement that may have prevented NVIDIA from getting more contracts in the past. The integration improves both cost and power consumption, so it is a critical piece of the smartphone SoC business.
Kepler will power future SHIELD console updates
When asked if the Kepler architecture will power future SHIELD upgrades, NVIDIA said “yes” without any hesitation. First, this confirms that a follow-up is on NVIDIA’s mind, and most likely already planned. However, keep in mind that this may (or may not) happen with an eventual “Tegra 6”, so don’t jump to conclusions quite yet. In theory, it is very possible that SHIELD could see a yearly update, but other cycles would be just as valid.
We did ask, but unfortunately, NVIDIA was not prepared to talk about benchmark numbers yet. It’s very understandable since the chip has been back from the “tape out” for only a few weeks. Hardware engineers and driver engineers must be working quite furiously to verify the different code paths, and performance tuning will come well after functionality has been verified.
Licensing to 3rd party SoC
Since many handset makers have become hell-bent on building their own chips (starting with Samsung and Apple), it is not easy for NVIDIA to convince them to buy one from a 3rd party vendor. To work around this, NVIDIA has decided that it would license its mobile Kepler design to others. Since the likes of Apple and Samsung already license GPU IP blocks from ARM or PowerVR, it could be much easier to get a design win that way.
This is a very exciting development, and when we look back one year from now, we will realize that this is when mobile graphics, PC graphics and console graphics were unified for the first time with nearly the same feature set. What Kepler means for Tegra 5 is huge, but what it means for the future of Tegra and the future of the mobile graphics industry is even bigger: from now on, Tegra GPUs are no longer a side effort that may benefit from technology trickling from the GeForce research. Instead, Tegra will completely leverage the hundreds of millions of dollars that are invested by NVIDIA in each new generation of GPUs.