Dual SoC Approach, Integrated 5G Modem

For the last three years, Huawei has announced its next-generation SoC at the IFA technology show here in Berlin. On every occasion, the company promotes its hardware, using the latest process technologies, the latest core designs, and its latest connectivity options. The flagship Kirin processor it announces ends up in every major Huawei and Honor smartphone for the next year, and the Kirin 990 family announced today is no different. With the Mate 30 launch happening on September 19th, Huawei lifted the lid on its new flagship chipset, with a couple of twists.

Dual SoC Approach: Kirin 990 (4G) and Kirin 990 5G

As we move into the era of 5G, we have a bifurcated market. On one side we have regions that are not ready for 5G, and consumers there do not want to accept the extra cost, power draw, or potential compromises in a device in order to support 5G. Other regions are riding the 5G wave and are on the leading edge, and so might pay the premium. Rather than offer a single solution to both markets, Huawei is for the first time splitting its strategy, with two versions of the Kirin 990.

These versions will officially be known as the Kirin 990 and the Kirin 990 5G. The (4G) I’ve put here is simply a differentiator to tell them apart. The two Kirin chipsets are, at a standard base level, pretty much the same: same core configuration, same camera support, same memory, same storage. However, in a few key areas beyond the modem there are differences, such as NPU performance and core frequencies. We’ll go into these in a bit. But it is worth highlighting how the Kirin 990 5G version is a vision of the future.

The Kirin 990 5G: SoC of the Future

We bang on consistently about 5G, because that is where a lot of mobile infrastructure and investment is going. Back at Mobile World Congress in February, we covered every company that had announced its own discrete 5G modem – a chip that was added to a device in order to enable 5G. This typically meant that we had a standard processing chip with 4G, and then an extra 5G support chip on top. Ultimately to get the best performance, the 5G chip should be integrated on the same silicon, enabling better efficiency in 5G mode in exchange for die area and design complexity.

True to form, Huawei (and its design arm, HiSilicon), are the first to do it for the smartphone market.

The Kirin 990 5G is a true unified design, supporting Sub-6 GHz 5G networks on both SA and NSA architectures. In order to keep the die size in check, Huawei is using TSMC’s latest 7+ manufacturing process with EUV, which helps enable a smaller die size for the sorts of devices this chip will be going into.

To date, neither Qualcomm nor Samsung (nor Apple) has a unified flagship chip design that is near commercialization. We do expect them to release the hardware as they generationally update, but as of today, Huawei is the first to announce it.

So despite having a single smartphone SoC that can do 4G and 5G without additional hardware, Huawei still believed it prudent to produce a separate chip without 5G in it, especially as 5G adoption is still going on globally, and still a few years out for some markets in which Huawei competes. It also helps Huawei split some of its features, saving the best for the 5G hardware.

The Kirin 990 Series: Details

As mentioned, one of the key elements of the Kirin 990 5G is its use of TSMC’s 7FF+ with EUV, which enables the chip to have a small(er) die size. We are told the chip is over 100 mm2, which is up from 74.13 mm2 on the Kirin 980 (TSMC 7nm) and 96.72 mm2 on the Kirin 970 (TSMC 10nm), possibly making it Huawei’s largest smartphone SoC to date. By comparison, the Kirin 990 4G version is around 90 mm2, but is built on the same 7nm process as the Kirin 980, making it a little bigger. Transistor counts for the two chips put the 990 5G at 10.3 billion, while the 990 4G is ~8 billion.

CPU

The core configuration on both SoCs is the same: two high-frequency A76 cores, two medium-frequency A76 cores, and four more efficient A55 cores. These are split into their own power and frequency domains, allowing better flexibility based on workload. However, the 990 5G and 990 4G will have slightly different frequencies, based on the differences between the 7 and 7+ processes.

Huawei Kirin 990 Family

AnandTech    Kirin 990 5G     Kirin 990 (4G)   Kirin 980        Kirin 970
CPU          2xA76 @2.86G     2xA76 @2.86G     2xA76 @2.60G     4xA73 @2.36G
             2xA76 @2.36G     2xA76 @2.09G     2xA76 @1.92G     4xA53 @1.80G
             4xA55 @1.95G     4xA55 @1.86G     4xA55 @1.80G
GPU          G76MP16          G76MP16          G76MP10          G72MP12
             700 MHz          600 MHz          720 MHz          850 MHz
NPU          2 + 1            1 + 1            2                1
             Da Vinci         Da Vinci         Cambricon        Cambricon
Modem        Balong 5G        4G               4G               4G
DRAM         LPDDR4-4266      LPDDR4-4266      LPDDR4X-4266     LPDDR4X-3733
             + LLC            + LLC
Die Size     >100 mm2         ~90 mm2          74.13 mm2        96.72 mm2
Transistors  10.3b            ~8.0b            6.9b             5.5b
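
As a rough sanity check on the die size and transistor figures in the table, the sketch below computes an approximate transistor density for each chip. It treats ">100 mm2" as exactly 100 and "~90 mm2" as exactly 90, which is an assumption, so the Kirin 990 5G result should be read as an upper bound. The names used are purely illustrative.

```python
# Approximate transistor density from the figures quoted in the table.
# Assumption: ">100 mm2" is taken as 100 and "~90 mm2" as 90, so the
# Kirin 990 5G value is an upper bound rather than a measured density.
CHIPS = {
    "Kirin 990 5G": (10.3e9, 100.0),
    "Kirin 990 (4G)": (8.0e9, 90.0),
    "Kirin 980": (6.9e9, 74.13),
    "Kirin 970": (5.5e9, 96.72),
}


def density_mtr_per_mm2(transistors: float, die_mm2: float) -> float:
    """Return millions of transistors per square millimetre."""
    return transistors / die_mm2 / 1e6


DENSITY = {name: density_mtr_per_mm2(t, a) for name, (t, a) in CHIPS.items()}

for name, value in DENSITY.items():
    print(f"{name:15s} {value:6.1f} MTr/mm2")
```

Under these assumptions the two 7nm-class parts (Kirin 980 and Kirin 990 4G) land in the same general range, while the 10nm Kirin 970 sits well below them.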

For caches, all four A76 cores have 512 kB of L2, while the A55 cores have 128 kB each.

Technically, Huawei calls the A76 cores ‘A76-based’, because certain enhancements have been made to the cache system to improve memory latency. Huawei wouldn’t disclose anything more than saying that its ‘SmartCache’ implementation, which helps the GPU, also helps the CPU and NPU. We believe this is essentially a next-level cache above the DynamIQ DSU, similar to Qualcomm’s and Samsung’s implementations.

A side note here: we had expected Huawei to launch the new Kirin with Arm’s latest A77 core, as it was announced earlier this year. Despite Huawei being a priority Arm partner, the company’s technical team explained two things to us: firstly, the core decisions for this chip were made almost two years ago, and secondly, they were not seeing the expected frequency from the A77 on TSMC’s 7nm processes.

Huawei stated that even though the A77 hits higher peak performance, the power efficiency of the A77 and A76 on 7nm is practically identical. However, due to better experience with the A76 on 7nm, they were able to push the frequencies of the core much higher. It was cited that other companies with announced A77 products were only achieving 2.2 GHz on similar process technologies at other fabs. It was stated that the A77 will likely come in a future product, most likely when 5nm becomes more widely available.

On the topic of LPDDR5 support, we were told that LPDDR5 is still an expensive technology, and Huawei is looking at it for future products.

Graphics

For the graphics, the Kirin 990 parts will both have a 16-core Mali-G76 implementation, up from a 10-core Mali-G76 in the Kirin 980. This is partly the reason for the increased die size: Huawei believes that a lower voltage, lower frequency, but wider GPU will offer a better chip overall.

The performance of the GPU has increased, as we move from a 10-core 720 MHz design to a 16-core 600 MHz* design.

*This was initially reported as 700 MHz, but HiSilicon have since been in contact to correct the value to 600 MHz.
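
To see why a wider, slower GPU can still come out ahead, a first-order estimate is to multiply core count by clock. The sketch below does that with the 720 MHz Kirin 980 clock from the specification table and the corrected 600 MHz figure. It assumes throughput scales linearly with cores and frequency, which ignores memory bandwidth, thermal limits, and any voltage differences.

```python
def relative_throughput(cores: int, mhz: float) -> float:
    """First-order GPU throughput proxy: cores multiplied by clock (MHz)."""
    return cores * mhz


kirin_980 = relative_throughput(cores=10, mhz=720)  # Mali-G76MP10
kirin_990 = relative_throughput(cores=16, mhz=600)  # Mali-G76MP16

uplift = kirin_990 / kirin_980
print(f"Estimated uplift: {uplift:.2f}x")
```

This is only a ceiling on paper; measured gains depend on how well a workload spreads across the extra cores.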

NPU

Aside from implementing the 5G modem, the biggest change in the Kirin 990 is going to be the NPU, or Neural Processing Unit. We won’t see Huawei as a company promoting this change that much, because ultimately it will be transparent to consumers, but on the technical side of things it’s a big step.

In the Kirin 970 and Kirin 980 hardware, Huawei sub-licensed a machine learning hardware design from Cambricon Technologies, which was spun out from a university research project in China. Huawei ultimately invested in the company, and although the hardware license wasn’t exclusive, Huawei got access to the leading-edge design and was afforded customization options. With the Kirin 990, that partnership with Cambricon disappears, and the company is implementing its internal Da Vinci architecture.

We covered the Da Vinci architecture at Hot Chips a couple of weeks ago, where the company lifted the lid on a number of technical details behind the design. Huawei has announced that this architecture will be found in everything from 300W add-in server cards all the way down to credit-card-sized embedded devices. The first smartphone chip from Huawei with a Da Vinci-based NPU was the Kirin 810, but now it comes to the flagship SoC for the 2019/2020 generation.

What exactly does Da Vinci bring? Two elements, both of which are important when it comes to applying machine learning algorithms.

First, the ‘big’ Da Vinci cores support both INT8 and FP16 quantization of networks. In the Kirin 980 with the Cambricon design, the dual NPU was split, with both cores supporting FP16, but only one supporting INT8 for technical reasons. That restriction disappears, and all of the big Da Vinci cores support both. Quantization support becomes important for offering faster and lower-power solutions to ML inference problems.
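
To make the INT8 versus FP16 distinction concrete, here is a minimal sketch of symmetric INT8 quantization applied to a small list of weights. It is a generic illustration of the technique, not Huawei's or Cambricon's implementation; the function names and sample weights are made up for the example.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


weights = [0.82, -0.41, 0.05, -1.27, 0.33]  # hypothetical sample weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"scale={scale:.4f}", f"max error={max_error:.4f}")
```

Each weight is stored in one byte instead of the two needed for FP16, at the cost of a rounding error of at most half a quantization step, which is the trade-off that makes INT8 attractive for low-power inference.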

The second change is the addition of a new ‘Tiny Core’ NPU. Both the 4G and 5G models will have one: a smaller version of the Da Vinci architecture focused on power efficiency (Huawei cites 24x better efficiency), connected over the AXI bus. The performance of the Tiny Core is naturally lower, but it’s a place where non-critical or low-polling ML can take place, such as wake-on-voice or charging characteristics. It can even process individual photos, but isn’t fast enough to pattern match on live video. For that, you need the big cores.

One key feature about Da Vinci worth noting is that Huawei has stated that it has already optimized the software stack for 90% of the most popular computer-vision based neural networks on the market. One of the benefits of the Da Vinci design over the Cambricon design is that Da Vinci is fully NNAPI compliant, whereas the older version was a mix of applicable features.

That’s the NPU change, but there’s also a difference between the 990 5G and 990 4G. One of the contributions to the die size difference, apart from the modem, the GPU, and the manufacturing process, is that the 990 5G has double the number of NPU cores. The 990 5G will have two ‘big’ NPU cores, supporting dual ML processes concurrently, along with a Tiny Core NPU. The 990 4G by comparison will only have one ‘big’ NPU core, plus the Tiny Core.

This means we are likely to see certain features come to the Kirin 990 5G devices that might not be possible on Kirin 990 4G devices. It is going to be interesting to see how Huawei as a company manages that messaging, especially if it ends up offering its flagship device in both a 4G-only and a 5G flavor.

The Balong Modem

Aside from it being the first integrated smartphone 5G design, Huawei did not give many details about the new 5G modem, or any updates to the 4G design. It was cited that the Kirin 990 5G is the first full-band frequency modem that supports both NSA and SA architectures (though the Exynos Modem 5100 technically holds this title).

They cited peak speeds with the modem of up to 2.3 Gbps download and 1.25 Gbps upload, with additional ML-based beamforming technology that helps support faster speeds during high-speed travel. The design will also allow for connection to 5G and 4G simultaneously, for weak signal areas. We confirmed that the company is still using Tensilica DSPs, with the technical team stating that despite international concerns, the license for Tensilica is still valid.
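
For a sense of scale, the quoted peak rates can be converted from gigabits per second to megabytes per second. The sketch assumes decimal units (1 Gbps = 1000 Mbps, 8 bits per byte) and an illustrative 1 GB file; real-world throughput will sit below these peak figures.

```python
PEAK_DOWN_GBPS = 2.3   # quoted peak download rate
PEAK_UP_GBPS = 1.25    # quoted peak upload rate


def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


def seconds_to_transfer(size_gb: float, gbps: float) -> float:
    """Time to move size_gb gigabytes at a sustained rate of gbps."""
    return size_gb * 8 / gbps


down_mb_s = gbps_to_mb_per_s(PEAK_DOWN_GBPS)
up_mb_s = gbps_to_mb_per_s(PEAK_UP_GBPS)
download_time = seconds_to_transfer(1.0, PEAK_DOWN_GBPS)  # hypothetical 1 GB file
print(f"{down_mb_s:.1f} MB/s down, {up_mb_s:.2f} MB/s up, {download_time:.2f} s per GB")
```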

Huawei’s Performance Claims on Kirin 990 5G

As was perhaps to be expected, Huawei was keen to showcase the performance of the bigger SoC against the current rivals in the market. The 4G model wasn’t in a lot of the graphs we were shown.

Headline numbers were for a +9% increase in single threaded performance from Kirin 980 to Kirin 990 5G, mostly driven through the higher frequency. Multithreaded performance overall was listed as being up 10%. However, power efficiency was pushed up 35% on the middle A76 cores compared to last year, and Huawei expects most non-demanding performance related workloads to be run on these middle cores. (For completeness, Huawei states the high-performance cores are +12% more efficient over the previous generation, and the smaller cores are +15% more efficient.)
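
The claim that the single-threaded gain is mostly frequency-driven can be checked against the clocks in the specification table. The sketch below computes the per-cluster clock uplift of the Kirin 990 5G over the Kirin 980; the cluster labels are illustrative, and clock speed alone says nothing about the cache changes.

```python
# Cluster clocks in GHz, taken from the specification table.
KIRIN_980 = {"big": 2.60, "middle": 1.92, "little": 1.80}
KIRIN_990_5G = {"big": 2.86, "middle": 2.36, "little": 1.95}


def clock_uplift(new: dict, old: dict) -> dict:
    """Return the percentage clock increase for each CPU cluster."""
    return {name: (new[name] / old[name] - 1) * 100 for name in old}


UPLIFT = clock_uplift(KIRIN_990_5G, KIRIN_980)

for cluster, pct in UPLIFT.items():
    print(f"{cluster:6s} +{pct:.1f}%")
```

The big-core clock rises by about 10%, which is in line with the quoted +9% single-threaded figure, while the middle cores see the largest clock increase.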

On the GPU side, we expect a performance increase; however, Huawei stayed away from quoting numbers in a reasonable time frame for us to note them down in our briefing (in the keynote, they showed +6% GPU performance against the S855). We were able to get details on how the ‘Smart Cache’ improves performance: in this case the Kirin 990 (both versions) sees a 15% reduction in GPU-to-DDR bandwidth, and a 12% reduction in DDR power consumption (because it gets used less in the same workload).

The headline performance claim is for AI, although the numbers here are going to be split between the 990 4G with a 1+1 NPU core design, and the 990 5G with a 2+1 NPU core design. Huawei puts the performance of the Kirin 990 5G at 2.5x over the Kirin 980, a similar amount over the Snapdragon S855 and Exynos 9825, and just under 2x compared to the Apple A12. Power efficiency is also improved by similar amounts. All this was comparing inference scores at both FP16 and INT8 quantization.

The Kirin 990 and Kirin 990 5G: Time Frame

These new chips are expected to be spread liberally across both Huawei and Honor flagships for the rest of 2019 and into 2020. Huawei has a press event in Munich on the 19th of September, where we are expecting the Huawei Mate 30 and Mate 30 Pro flagship phones to be announced – and likely a 5G model as well, which perhaps might be the Pro 5G only.

We have been told that the Kirin 990 4G chipset is ready and available. Due to other factors, likely related to market segmentation and strategy, the Kirin 990 5G devices might be a little later.
