Aunt Zhang's Technical Lecture, Part One: Overcoming Difficulties - An Analysis of the Huawei HiSilicon Kirin 960 SoC

On October 19, 2016, Huawei announced the Kirin 960, the latest flagship SoC in the Kirin line. This article tries to steer clear of deep computer architecture and specific design optimizations. Before any device carrying the Kirin 960 reaches the market, let's talk about the SoC itself.

First of all, every manufacturer's launch event is a marketing exercise; the only question is how much. The real picture has to be examined rigorously from several angles.

This article was typed entirely by hand. Substantive criticism is welcome, but I ask readers to refrain from both mindless hype and mindless bashing.

I. Parameter overview

II. Specifications

1. CPU: Cortex-A73 (Artemis)

The stock ARM A72 design used in the Kirin 950 delivered better performance and energy efficiency than the custom cores from Qualcomm and Samsung.

The A73 is the architecture ARM announced earlier this year. Structurally it is not a descendant of the A72; it is an evolution of the long-quiet A17 line. The A9-A12-A17 product line was positioned as mainstream. The A73 grew out of that line, and ARM stressed its design philosophy throughout the launch: let the CPU sustain its high-performance output.

The biggest change from the A72 to the A73 is that the issue width drops from 3 to 2 and the pipeline becomes 11 stages deep (the same as the A17). Issue width has a considerable effect on IPC, so ARM compensated with improvements in the following areas:

A. The front end is in-order, with a shorter instruction fetch stage and more efficient decoding.

B. The AMBA 5 interface and L1 cache ECC, redundant in phones, are removed to focus on consumer mobile applications.

C. L1 cache capacity grows from 48KB to 64KB, and cache throughput and latency improve.

D. The back end is fully out-of-order, and branch prediction is improved.

ARM says the A73 can match the A72's per-clock performance. In my view the A73's architecture resembles the A17's (and the A17's performance is nothing special by today's standards). Whether instruction set extensions and local optimizations alone can produce a generational leap is doubtful.

In its marketing ARM used a bit of sleight of hand: it talked about architectural progress while quietly comparing different process nodes and different clock speeds. Its claim is that a 10nm A73 at 2.8GHz delivers 30% more sustained performance than a 16nm A72 at 2.5GHz.

According to Huawei, the Kirin 960's A73 cores are clocked 100MHz higher, with single-threaded performance up 10% and multi-threaded performance up 18%. Provided ARM is not exaggerating, Huawei's CPU claims for the Kirin 960 are fairly credible.
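The ARM and Huawei figures above mix clock-speed gains with everything else. A rough way to separate the two is to divide the claimed speedup by the frequency ratio. The sketch below does only that; the function name is made up for illustration, and the 2.3GHz baseline for the Kirin 950's A72 is an assumption that this article does not state. What remains still includes process and thermal effects, so it should not be read as a pure architecture number.

```python
def per_clock_gain(perf_ratio, new_ghz, old_ghz):
    """Return the part of a speedup that the clock increase does not explain."""
    return perf_ratio / (new_ghz / old_ghz)

# ARM's claim: a 10nm A73 at 2.8GHz is 30% faster (sustained) than a 16nm A72 at 2.5GHz.
arm_residual = per_clock_gain(1.30, 2.8, 2.5)

# Huawei's claim: +100MHz and +10% single-threaded performance.
# ASSUMPTION: the Kirin 950 baseline clock is 2.3GHz (not given in the article).
huawei_residual = per_clock_gain(1.10, 2.3 + 0.1, 2.3)

print(f"ARM residual gain:    {arm_residual - 1:.1%}")
print(f"Huawei residual gain: {huawei_residual - 1:.1%}")
```

Under these assumptions, roughly 16 points of ARM's 30% and about 5 points of Huawei's 10% remain once the clock increase is divided out.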

2. Internal bus: CCI-550

If the Kirin 950 had one major flaw, it was the internal bus: a CCI-400 paired with an in-house memory controller (because the stock DMC-400 does not support LPDDR4 at all). The bus imposed two limits: 1. even with LPDDR4 memory, results were no better than with LPDDR3; 2. internal throughput was also capped.

CCI-400 interface

CCI-550 interface

The Kirin 960 fixes this weakness by adopting the latest CCI-550, whose peak bandwidth of 50GB/s improves memory performance. The GPU can connect directly to the interconnect, and the snoop filter introduced with the CCI-500 has been optimized, making scheduling across different cores more efficient.

Fortunately the bus was replaced this time; otherwise the chip would have been dragged down by the ancient inherited bus!

3. GPU: G71 (Bifrost architecture)

The T880MP4 in the Kirin 950 had a peak performance of only about 1/3 of the Adreno 530. ARM's previous-generation GPU architecture had been in service for 5 years.

The Bifrost architecture changes a great deal; otherwise it would not have been given a new name. In summary, the design moves from instruction-level-parallel vector processing to thread-level-parallel scalar processing. The goal of the transition is to raise ALU utilization in everyday workloads.

The above figure shows the efficiency differences between scalar and vector architectures when processing different data streams.
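The utilization argument can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of how Mali hardware actually schedules work: the 4-wide ALU, the lockstep group of 4 threads, and the operation mix are all assumptions chosen for illustration, and effects such as divergence and latency are ignored.

```python
import math

def vector_utilization(op_widths, simd_width=4):
    """Lane utilization when each thread issues its ops on a SIMD vector ALU.

    An operation on w components occupies ceil(w / simd_width) full-width
    issue slots, so narrow operations leave lanes idle.
    """
    used = sum(op_widths)
    issued = sum(math.ceil(w / simd_width) * simd_width for w in op_widths)
    return used / issued

def scalar_utilization(op_widths, threads, group_size=4):
    """Lane utilization when group_size threads run in lockstep, with one
    scalar component per lane per cycle. Only a partly filled last group
    wastes lanes.
    """
    groups = math.ceil(threads / group_size)
    used = threads * sum(op_widths)
    issued = groups * group_size * sum(op_widths)
    return used / issued

# Hypothetical shader mix: one vec4, one vec3, one vec2, and one scalar operation.
mix = [4, 3, 2, 1]
print(vector_utilization(mix))             # 10 useful lanes out of 16 issued
print(scalar_utilization(mix, threads=8))  # two full groups of four threads
```

In this model the vector design wastes lanes whenever an operation is narrower than the ALU, while the scalar design loses efficiency only when there are too few threads to fill a group.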

The Vulkan API improves graphics rendering efficiency by optimizing the interface between software and hardware. The ARM T880 in fact already supports it natively; the G71 in the Kirin 960 merely fills a gap that was left open earlier.

According to the official specifications, the most obvious changes in the G71 are a lower triangle rate and a higher pixel fill rate. ARM claims a 40% improvement in performance per unit of die area and a 20% improvement in energy efficiency.

According to HiSilicon's official statement, the Kirin 960's G71MP8 reaches nearly 2.8 times the performance at the same frequency. On that basis I estimate that GPU power consumption rises to roughly 2 times the previous level.
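The power estimate above follows from simple arithmetic: power equals performance divided by performance per watt. The sketch below pairs the 2.8x performance figure with ARM's 20% energy-efficiency claim. It assumes both numbers apply to the same workload and operating point, which the article does not confirm, and the function name is hypothetical.

```python
def power_ratio(perf_ratio, efficiency_ratio):
    """Power = performance / (performance per watt)."""
    return perf_ratio / efficiency_ratio

# 2.8x performance (Huawei) with 1.2x energy efficiency (ARM).
gpu_power = power_ratio(perf_ratio=2.8, efficiency_ratio=1.2)
print(f"Estimated GPU power vs. previous generation: {gpu_power:.2f}x")
```

Under these assumptions the result is a little over 2.3x, in the same range as the rough estimate of 2 times.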

4. UFS 2.1

The flash paired with the Kirin 950 was a random mix of eMMC 5.0 and eMMC 5.1; see my in-depth Mate 8 review: http://post.smzdm.com/p/475656/

This time the Kirin 960 uses the UFS 2.1 flash specification. UFS 2.0 is a flash standard published by JEDEC in 2013. Compared with eMMC, UFS moves from half-duplex to full-duplex transfers and raises the transfer bandwidth.

Flash memory evolution

UFS 2.1 has no clear definition at present. The most reasonable reading is that it refers to the high-speed variant of UFS 2.0 (HS-G3), whose theoretical bandwidth reaches 12Gbps. I therefore think the UFS 2.1 name is partly marketing. In actual phones, UFS 2.0 flash chips from different suppliers will inevitably perform differently, and consumers have no control over that.
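To relate the 12Gbps figure to the MB/s numbers that storage benchmarks report, the line rate has to be converted to bytes and adjusted for line-coding overhead. The sketch below is a unit conversion only. The 8b/10b coding efficiency is an assumption that this article does not mention, and real-world throughput also depends on the flash chips and the controller.

```python
def gbps_to_mb_per_s(gbps, encoding_efficiency=1.0):
    """Convert a line rate in Gbit/s to payload MB/s (decimal units)."""
    return gbps * 1000 / 8 * encoding_efficiency

raw = gbps_to_mb_per_s(12)
# ASSUMPTION: 8b/10b line coding, i.e. 8 payload bits per 10 line bits.
payload = gbps_to_mb_per_s(12, encoding_efficiency=0.8)
print(f"raw: {raw:.0f} MB/s, after line coding: {payload:.0f} MB/s")
```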

5. Baseband

The previous-generation Kirin 950 used a Balong 720 plus a VIA CDMA baseband, with dual-carrier aggregation and a peak downlink rate of 300Mbps (Cat.6).

The Kirin 960's baseband, by comparison, is a leap forward.

1. Technical specifications: the Kirin 960 baseband is a new series (name unknown, but certainly not Balong). It applies 4-carrier aggregation and MIMO multi-antenna transmit and receive technology, and the downlink rate reaches 600Mbps.

2. According to the presenter at the launch event, the integrated CDMA baseband was developed in-house and works around Qualcomm's patents!

On paper at least, this all-network baseband matches the X12 modem in Qualcomm's Snapdragon 820, leaving aside the X16 and the still-conceptual X50. Qualcomm's advantage in mobile basebands is therefore greatly reduced!

On the other hand, a high peak rate only marks the technical limit (the best case). The real network experience also depends on operator support. Holding a stable 100Mbps, achieving full coverage, and cutting base-station handover latency and network request latency are what would actually improve the network experience!
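The jump from 300Mbps to 600Mbps can be reproduced with a rule-of-thumb peak-rate model. This is a back-of-the-envelope sketch: the figure of about 75Mbps per spatial layer per 20MHz carrier at 64QAM is a nominal assumption, the function name is hypothetical, and the article does not say which combination of carriers, MIMO layers, and modulation the Kirin 960 actually uses to reach its peak.

```python
def lte_peak_mbps(carriers, mimo_layers=2, bits_per_symbol=6):
    """Nominal LTE downlink peak for 20MHz carriers.

    ASSUMPTION: about 75 Mbps per spatial layer per 20MHz carrier at 64QAM
    (6 bits per symbol), the usual rule of thumb behind Cat.4's 150 Mbps.
    """
    return carriers * mimo_layers * 75 * bits_per_symbol / 6

print(lte_peak_mbps(carriers=2))                     # two carriers, 2x2 MIMO, 64QAM
print(lte_peak_mbps(carriers=4))                     # four carriers, 2x2 MIMO, 64QAM
print(lte_peak_mbps(carriers=3, bits_per_symbol=8))  # three carriers with 256QAM
```

The last two configurations give the same nominal peak, which is one reason a headline rate alone says little about how a modem is built.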

6. Other

The voice call section hints at the design of the next Mate: 4-microphone capture. Given the excellent call and recording quality the Mate 8 achieved with 3 microphones, the Mate 9 is clearly worth waiting for.

The ISP is presumably still developed by HiSilicon's team in Nice. Its parameters were not announced at the event (so it has probably not improved much). The hardware supports depth sampling, super resolution, and video stabilization.

The audio chip is named Hi6403, with slightly improved acoustic specifications. Be wary of Huawei's "HiFi" gimmick here; I think it is a bit of an exaggeration. The hardware now supports sampling at up to 24bit/192kHz, but actually using it requires software support. I expect the real-world quality to fall well short of a discrete DAC design.
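For a sense of what the 24bit/192kHz figure means on paper, the sketch below computes the uncompressed PCM data rate and the ideal dynamic range of a 24-bit quantizer. Stereo output is an assumption, and the function names are illustrative. The dynamic-range formula describes the format, not any particular chip.

```python
def pcm_bitrate_mbps(bit_depth, sample_rate_hz, channels=2):
    """Uncompressed PCM data rate in Mbit/s (stereo assumed by default)."""
    return bit_depth * sample_rate_hz * channels / 1e6

def ideal_dynamic_range_db(bit_depth):
    """Ideal quantizer SNR for a full-scale sine: 6.02 * N + 1.76 dB."""
    return 6.02 * bit_depth + 1.76

rate = pcm_bitrate_mbps(24, 192_000)
dynamic_range = ideal_dynamic_range_db(24)
print(f"{rate:.3f} Mbit/s, {dynamic_range:.2f} dB ideal dynamic range")
```

The ideal figure for 24 bits is far beyond what practical analog output stages achieve, so the supported format by itself says little about audible quality.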

Security: ARM itself provides a security solution called "TrustZone". Huawei's inSE is an in-house, customized security chip with a higher certification level and its own encryption algorithms.

Process: the same as last year, and without the InFO packaging used on Apple's A10 Fusion (InFO thins the package substrate, which helps routing and heat dissipation; the InFO process may be monopolized by Apple). Process improvement is a problem for the whole industry. Going by history, getting a new process node every two years is already something to be thankful for.


III. Competitiveness

First, two gripes. The closing slogan of the Kirin 960 presentation was translated into English as the inexplicable "Refuse me too".

Then there is the grinning Kirin logo. Isn't it cut from the same cloth as the tacky design of the Mate series?

HiSilicon's marketing department must be staffed by the auntie who used to draw mascots for the neighborhood committee. When promoting technical strength, the aesthetics also need to keep up with the times! Just look at Qualcomm's image in its TV ads and posters!

On ARM CPU IP licensing: every ARM-compatible processor pays ARM. The in-house designs from Qualcomm, Samsung, and Apple are like licensing an engine patent and building the car yourself; using the stock cores is like buying the chassis as well. Autonomy is relative, though it does reflect R&D strength. GPUs leave somewhat more freedom. In terms of technology, I still rank them Imagination > Qualcomm > ARM.

As with desktop chips, process iteration matters more and more in the performance race, while squeezing extra efficiency out of architecture alone grows harder. The A73's direction is the right one: improving energy efficiency so that high clocks can be sustained genuinely improves the user experience. The baseband is the pain point for every mobile IC design company; Nvidia and Texas Instruments withdrew from the mobile SoC race long ago because of it. The Kirin 960 deserves praise, thanks in large part to the strength of its baseband.

For handset makers, the Kirin 960 is a landmark. With it, Huawei becomes the only device maker in the world able to vertically integrate a flagship mobile SoC with a high-quality all-network baseband! Other device makers choosing technology for the high-end market end up, without exception, at Qualcomm because of the CDMA baseband. Huawei's strength in in-house chips and basebands greatly increases its design freedom, and its software team can work directly with the hardware team to achieve better low-level optimization. For comparison, Samsung's flagships use a dual-platform strategy because Samsung lacks an all-network baseband, which forces its team to optimize for two platforms. As a result, Samsung's Snapdragon 820 models are noticeably less well optimized than its own Exynos 8890 models!

In addition, past experience shows that ARM's own GPUs are a real weakness, with architectures less efficient than Adreno and PowerVR. The big problem facing the Kirin 960 is GPU power consumption. In short, either the Kirin 960's GPU falls short of the advertised performance, or the GPU becomes the biggest power consumer in the SoC.

By releasing the Kirin 960 at this point, Huawei gains a timing advantage over the next generation of mobile SoCs. On performance (assuming ARM is not exaggerating), I believe this SoC can largely live up to its claims. Doubts remain in three areas, however:

1. Graphics performance. Peak performance will probably be slightly below the advertised figure and roughly on par with the Adreno 530, and the GPU may not be able to sustain high clocks.

2. Camera performance. Judging by the promises made at last year's Kirin 950 launch and the actual photo quality of Huawei phones this year, if the algorithms are still not up to scratch the results will disappoint.

3. Fast charging. This depends on the power management chip, and the situation is not yet clear.

I believe the most noticeable improvement will be this: the Kirin 960's high-performance CPU + new bus + new flash specification + in-house optimization = possibly the fastest (Android) phone.

IV. Conclusion

HiSilicon's progress is plain to see. The Kirin 960 is certainly not perfect technically, but it is far from worthless. In fact, many tech enthusiasts who dislike Huawei do so mostly because of certain overzealous fans and Huawei's occasionally "cheap" promotional tactics. Competition in the mobile market is fierce. Apple still leads the field in everything except the baseband. Samsung, badly battered this year, may well come up with a "big move" next year. The old leader, Qualcomm, still stands tall on the strength of its baseband.

So far the Kirin 960 looks like an SoC with almost no weaknesses. At the very least, this kind of progress gives real hope, and the actual performance is worth looking forward to.

Next up: the principles behind benchmarking. Please stop treating AnTuTu-style benchmark scores as gospel.


Main reference sources for this article:

Anandtech: http://

ARM official website: http://

Imagination Community: http://imgtec.eetrend.com/

JEDEC official website: http://
