GC Nano - User Interface (UI) Acceleration
by Benson W. Tao, Vivante Corporation

1- Background and Overview

A crisp, clear, and responsive user interface, or HMI (human machine interface), has become as important to the user experience as the content or the device form factor. A beautifully crafted smartphone that uses a combination of brushed titanium and smudge-proof glass may look great in the hand, but the user will quickly opt for another product if the user interface stutters or the screen is hard to read because of aliased and inconsistent fonts. The same scenario applies to HMIs in wearables and IoT devices, which are the focus of this white paper.

The goal of a well-designed wearable/IoT HMI is to make reading or glancing at the screen intuitive and natural, yet engaging. In other words, it is about a consistent, seamless interaction between user and device. Since device screens are smaller, information needs to be displayed in a simplified, uncluttered way, with only relevant data (text, images, icons, video, etc.) rendered and composed onscreen. Smaller device screens do not directly translate into a device with less processing capability. The opposite can be true, since upcoming devices need to perform real-time processing (UI display composition, communications, sensor processing, analytics, etc.) as part of a single IoT node or a network of them. Some wearable/IoT designs take technologies found in low/mid-range smartphone application processors and customize parts of the IP specifically for wearables. One important IP block that device OEMs need is the graphics processing unit (GPU), which accelerates HMI screen composition at ultra-low power.

In addition, a couple of hot new trends in these emerging markets are personalized screen UIs and a unified UI that spans all devices - from cars and 4K TVs to smartphones, wearables and embedded IoT screens - to give users a consistent, ubiquitous screen experience across a given operating system (OS) platform, regardless of the underlying hardware (i.e. SoC/MCU). This will enable cross-vendor solutions where vendor A's smartwatch works correctly with vendor B's TV and vendor C's smartphone. Google and Microsoft have recently announced support for these features in their Android Material Design and Windows 9 releases, respectively. Support for this requires an OpenGL ES 2.0 capable GPU at minimum, with optional/advanced features using OpenGL ES 3.x. Google also has its lightweight wearables OS, Android Wear, which requires a GPU to give the UI a similar look-and-feel to the standard smartphone/tablet/TV Android OS.

Figure 1: Evolving Wearable and IoT Devices Requiring GPUs

2- Vivante GPU Product Overview

The underlying technology that accelerates the HMI user experience is the graphics processing unit (GPU). GPUs natively handle screen/UI composition, including multi-layer blending from multiple sources (ISP/camera, video, etc.), image filtering, font rendering/acceleration, 3D effects (transitions, perspective views, etc.) and much more. Vivante has a complete top-to-bottom product line of GPU technologies that includes the GC Vega and GC Nano Series:
Figure 2: Vivante GPU Product Line and Target Markets

3- GC Nano Overview

GC Nano Series consists of the following products, starting with the GC Nano Lite (entry), GC Nano (mainstream) and GC Nano Ultra (mid/high).
Figure 3: GC Nano Product Line

GC Nano Series benefits include:
Vivante's software driver stack, SDK and toolkit support the NanoUI API, which brings close-to-the-metal GPU acceleration for no-OS / no-DDR options on GC Nano Lite, and the OpenGL ES 2.0 API (optional 3.x) for more advanced solutions that include proprietary or high-level operating systems like embedded Linux, Tizen, Android, Android Wear and other RTOSes that require OpenGL ES 2.0+ in the smallest memory footprint. These various OS / non-OS platforms will form the base of next-generation wearables and IoT devices that bring personalized, unique and optimized experiences to each person. The GC Nano drivers include aggressive power savings, intelligent composition and rendering, and bandwidth modulation that allow OEMs and developers to build rich visual experiences on wearables and IoT devices using an ultralight UI/composition or 3D graphics driver. Many of the GC Nano innovations create a complete "visual" wearables MCU/SoC platform that optimizes PPA and software efficiency to improve overall device performance and BOM cost, with the most compact UI graphics hardware and software footprint that does not diminish or restrict the onscreen user experience. These new GPUs are making their way into exciting products that will appear all around you as wearables and IoT get integrated into our lives.

Figure 4: GC Nano Series Features and Specifications

Figure 5: Example GC Nano Series SoC/MCU Implementation

4- Trends and Importance of 3D User Interface Rendering

As shown in the UI sample of a smart home device in Figure 6, next-generation products will take some of the well-thought-out UI design elements from smartphones, tablets and TVs and incorporate them into IoT devices (and wearables) to keep a consistent interface between products. The similar UI look-and-feel will reduce the usage learning curve and accelerate device adoption. As a side note, since different devices have different levels of processing/performance capability, a minimum level will be used for smaller screens (baseline performance), with additional features and higher performance added as device capabilities move up into a higher tier segmented by the OS vendor.

Figure 6: Sample HMI user interface on a smart home device

A few examples of updated UIs include the following:
GC Nano's architecture excels at HMI UI composition by bringing out 3D UI effects, reducing bandwidth and lowering latency, as discussed below.

5- GC Nano Bandwidth Calculation

In this section we step through examples of various user interface scenarios and calculate system bandwidth for both 30 and 60 FPS HMI UI rendering through the GC Nano GPU. All calculation assumptions are stated in section 5.2.

5.1- Methods of Composition

There are two options for screen display composition that we will evaluate: first, where the GPU does the entire screen composition of all layers (or surfaces), including video, and the display controller simply outputs the already composited HMI UI onscreen; and second, where the display controller takes composited layers from both the GPU and the video decoder (VPU) and does the final UI composition blend and merge before displaying. The top-level diagrams below do not show DDR memory transactions; those are shown in section 5.2 when describing the UI steps.

Figure 7: GC Nano Full Composition: All UI layers are processed by GC Nano before sending the final output frame to the display controller

Figure 8: Display Controller Composition: Final output frame is composited by the display controller using input layers from GC Nano and the video processor

5.2- UI Bandwidth Calculations

Calculation assumptions:
5.2.1- GC Nano Full UI Composition

The following images describe the flow of data to/from DDR memory when GC Nano performs the entire UI composition. Major benefits of this method include using the GPU to perform pre- and post-processing on images or videos, filtering, adding standard 3D effects to images/videos (video carousel, warping/dewarping, etc.) and augmented reality, where GC Nano overlays rendered 3D content on top of a video stream. This method is the most flexible since GC Nano can be programmed to perform any image/UI related task.
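Since the original assumption table and calculation figures are not reproduced here, the following C sketch illustrates how such a bandwidth estimate can be made. The resolution (320x320), bytes per pixel (ARGB8888 = 4 bytes) and layer counts are illustrative assumptions for this sketch only, not published Vivante numbers; it covers both the full GPU composition described here and the display controller composition described in section 5.2.2.

#include <stdio.h>

/*
 * Illustrative bandwidth estimate for HMI UI composition.
 * Assumed values (not from the white paper): a 320x320 wearable
 * screen, 4 bytes per pixel (ARGB8888), and a UI built from a
 * background layer, an icon/text layer and a video layer.
 */
#define WIDTH            320
#define HEIGHT           320
#define BYTES_PER_PIXEL    4   /* ARGB8888 */

/* Bytes moved per frame when the GPU composites 'layers_read' input
 * surfaces and writes one full output frame back to DDR. */
static double frame_bytes(int layers_read)
{
    double pixels = (double)WIDTH * HEIGHT;          /* total screen pixels = W x H */
    double read   = layers_read * pixels * BYTES_PER_PIXEL;
    double write  = pixels * BYTES_PER_PIXEL;        /* composited frame written once */
    return read + write;
}

int main(void)
{
    const double MB = 1024.0 * 1024.0;
    int fps_list[] = { 30, 60 };

    for (int i = 0; i < 2; i++) {
        int fps = fps_list[i];

        /* Method 1: GC Nano full composition - the GPU reads all three
         * layers (background, UI, video) and writes the final frame. */
        double full = frame_bytes(3) * fps / MB;

        /* Method 2: display controller composition - the GPU reads only
         * the two UI layers and writes the UI surface; the display
         * controller merges in the video layer via its own DMA. */
        double partial = frame_bytes(2) * fps / MB;

        printf("%d FPS: full GPU composition ~%.1f MB/s, "
               "display-controller composition ~%.1f MB/s (GPU portion)\n",
               fps, full, partial);
    }
    return 0;
}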
Figure 9: GC Nano Full Composition memory access and UI rendering steps (steps 1 - 4)

Bandwidth calculation is as follows:

Notes: a) Total screen pixels = resolution W x H

5.2.2- Display Controller UI Composition

This section describes the flow of data to/from DDR memory when the display controller does the final merging/composition of layers from GC Nano and the video processor. This method partially reduces bandwidth consumption: since the GPU does not perform final frame composition, it does not need to read in the video surface. The GPU only composes the UI part of the frame, minus any additional layers from other IP blocks inside the SoC/MCU. The benefit of this method is lower overall system bandwidth, but at the cost of less flexibility in the UI. If the video (or image) stream only needs to be merged with the rest of the UI, then this is a good solution. If the incoming video (or image) stream needs to be processed in any way - adding 3D effects, filtering, augmented reality, etc. - then this method has limitations and it is better to use the GPU for full frame UI composition.

Figure 10: Display controller performing final frame composition from two incoming layers from GC Nano and the video processor (VPU)

The display controller has a DMA engine that can read data from system memory directly. The supported data formats are flexible and include various ARGB, RGB and YUV 444/422/420 formats, plus their swizzled variants. Bandwidth calculation for UI composition only is straightforward and is based only on the screen resolution, as follows:

Notes: a) Total screen pixels = resolution W x H

5.2.3- Summary of Bandwidth Calculations

The table below summarizes the calculations above. Adding Vivante's DEC compression technology will further reduce bandwidth by about 2x - 3x from these numbers.

6- GC Nano Architecture Advantage for UIs

There are two main architectures for GPU rendering: tile based rendering (TBR) and immediate mode rendering (IMR). TBR breaks a screen image into tiles and renders only once all the relevant information is available for a full frame. In IMR, graphics commands are issued directly to the GPU and executed immediately. Techniques inside Vivante's architecture allow culling of hidden or unseen parts of the frame so that execution, bandwidth, power, etc. are not wasted on rendering parts of the scene that will eventually be removed. Vivante's IMR also has significant advantages when rendering photorealistic 3D images for the latest AAA rated games that take advantage of full hardware acceleration for fine geometries and PC level graphics quality, including support for advanced geometry/tessellation (GS/TS) shaders in its high end GC Vega cores (DirectX 11.x, OpenGL ES 3.1 and Android Extension Pack - AEP). Note: some of the more advanced features like GS/TS are not applicable to the GC Nano Series.

6.1- Tile Based Rendering (TBR) Architecture for UIs

The following images explain the process TBR architectures use for rendering UIs.

6.1.1- Breaking a scene into tiles...

6.1.2- But... before rendering a frame, all UI surfaces need to go through a pre-tile pass before proceeding...

6.1.3- Combining the pre-processing step and tiling step gives us the following...

6.1.4- If the UI is dynamic, then parts of the frame need to be re-processed...
6.1.5- Here are the "dirty" blocks inside the UI

6.1.6- TBR UI Rendering Summary

From the steps shown above, TBR based GPUs have additional overhead that increases UI rendering latency, since the pre-processed UI triangles need to be stored in memory first and then read back when used. This affects the overall frame rate. TBR GPUs also require large amounts of on-chip L2$ memory to store the entire frame (tile) database; as UI complexity grows, either the on-chip L2$ cache size (die area) has to grow in conjunction or the TBR core has to continuously overflow to DDR memory, which increases latency, bandwidth and power. TBRs have mechanisms to identify and track which parts of the UI (tiles) and which surfaces have changed to minimize pre-processing, but for newer UIs with many moving parts this continues to be a limitation. In addition, as screen sizes/resolutions and content complexity increase, this latency becomes even more apparent, especially on Google, Microsoft and other operating system platforms that will use unified UIs across all screens.

6.2- Immediate Mode Rendering (IMR) Architecture for UIs

The most advanced GPUs use IMR technology, the object-based rendering found everywhere from PC (desktop/notebook) graphics cards to Vivante's GC Series product lines. IMR allows the GPU to render photorealistic images and draw the latest complex, dynamic and interactive content onscreen. In this architecture, graphics API calls are sent directly to the GPU and object rendering happens as soon as commands and data are received. This significantly speeds up 3D rendering performance. In the case of UIs, the pre-pass processing is not required, which eliminates the TBR-related latency seen in section 6.1. In addition, there are many intelligent mechanisms that perform transaction elimination so hidden (unseen) parts of the frame are not even sent through the GPU pipeline; if the hidden portions are already in flight (e.g. a change in a UI surface), they can be discarded immediately so the pipeline can continue executing useful work. Composition processing is performed in the shaders for flexibility, and the Vivante GPU can automatically add a rectangle primitive that covers the whole screen to achieve 100% efficiency (versus 50% efficiency using two triangles); a sketch of this shader-based composition approach is shown after section 6.2.4. Memory bandwidth is equivalent to TBR architectures for simple UIs and 3D frames, but for more advanced UIs and 3D scenes, TBR designs need to access external memory much more than IMR since TBRs cannot hold large amounts of complex scene data in their on-chip caches.

The following images explain the process Vivante's IMR architecture uses for rendering dynamic UIs. The process is significantly simpler compared to TBRs, and dynamic changes in the UI or graphics are straightforward.

6.2.1- IMR Object Based UI Rendering

6.2.2- Additional UI Content is Considered New Objects

6.2.3- IMR GPUs are Ideal for Dynamic and Next Gen UIs

6.2.4- IMR UI Rendering Summary

For dynamic 3D UIs, complex 3D graphics, mapping applications, etc., IMRs are more efficient in terms of latency, bandwidth and power. Memory consumption and memory I/O is another area where IMR has advantages: for upcoming dynamic real-time 3D UIs, IMR is the best choice, and for standard UIs, IMRs and TBRs are equivalent, but IMRs give the SoC/MCU flexibility and future-proofing.
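To make the shader-based composition idea in section 6.2 concrete, below is a minimal, generic OpenGL ES 2.0 sketch in C that blends a UI layer over a video layer using a single full-screen rectangle drawn as a triangle strip. It assumes an EGL context is already current and that the two input layers are already uploaded as textures (video_tex, ui_tex, u_video and u_ui are illustrative names); it is not Vivante driver code, just a sketch of the technique, and in real code the program would be built once rather than per frame.

/* Minimal OpenGL ES 2.0 composition pass: blend a premultiplied-alpha
 * UI texture over a video texture with one full-screen rectangle.
 * Assumes an EGL context is already current. Illustrative only. */
#include <GLES2/gl2.h>

static const char *vs_src =
    "attribute vec2 a_pos;\n"
    "varying vec2 v_uv;\n"
    "void main() {\n"
    "  v_uv = a_pos * 0.5 + 0.5;\n"                 /* map clip space to texture coords */
    "  gl_Position = vec4(a_pos, 0.0, 1.0);\n"
    "}\n";

static const char *fs_src =
    "precision mediump float;\n"
    "uniform sampler2D u_video;\n"
    "uniform sampler2D u_ui;\n"
    "varying vec2 v_uv;\n"
    "void main() {\n"
    "  vec4 video = texture2D(u_video, v_uv);\n"
    "  vec4 ui    = texture2D(u_ui, v_uv);\n"
    "  gl_FragColor = ui + video * (1.0 - ui.a);\n" /* premultiplied src-over blend */
    "}\n";

static GLuint compile(GLenum type, const char *src)
{
    GLuint s = glCreateShader(type);
    glShaderSource(s, 1, &src, NULL);
    glCompileShader(s);
    return s;
}

/* Composite one frame: ui_tex over video_tex onto the current framebuffer. */
void compose_frame(GLuint video_tex, GLuint ui_tex)
{
    /* Full-screen rectangle as a 4-vertex triangle strip. */
    static const GLfloat quad[] = { -1.f, -1.f,  1.f, -1.f,  -1.f, 1.f,  1.f, 1.f };

    GLuint prog = glCreateProgram();
    glAttachShader(prog, compile(GL_VERTEX_SHADER, vs_src));
    glAttachShader(prog, compile(GL_FRAGMENT_SHADER, fs_src));
    glBindAttribLocation(prog, 0, "a_pos");
    glLinkProgram(prog);
    glUseProgram(prog);

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, video_tex);
    glUniform1i(glGetUniformLocation(prog, "u_video"), 0);
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, ui_tex);
    glUniform1i(glGetUniformLocation(prog, "u_ui"), 1);

    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, quad);
    glEnableVertexAttribArray(0);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}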
Note: historically, TBRs were better for simpler UIs and simple 3D games (low triangle/polygon count, low complexity) since TBRs could keep the full frame tile database on chip (L2$ cache), but advances in UI technologies brought about by leading smartphones, tablets and TVs have tipped things in favor of IMR technology.

7- Summary

GC Nano provides flexibility and advanced graphics and UI composition capabilities to SoCs/MCUs targeting IoT and wearables. With demand for high quality UIs that mirror other consumer devices, from mobile to home entertainment and cars, a consistent, configurable interface is possible across all screens as the trend towards a unified platform is mandated by Google, Microsoft and others. GC Nano is also architected so OEMs and developers can take advantage of IMR technologies to create clean, amazing UIs that help product differentiation. The tiny core packs enough horsepower to take on the most demanding UIs at 60+ FPS with the smallest die area and power/thermal footprint. GC Nano also reduces system bandwidth, latency, memory footprint and system/CPU overhead so resource constrained wearable and IoT SoCs/MCUs can use GPUs for next generation designs.