The road for speech recognition is bumpy; speed and private cloud become key factors

The industry regards speech recognition as the next breakout point for search engines, but because recognition accuracy in practical applications still needs improvement, commercializing speech technology is not easy.

Last week Yunzhisheng, which was established just over a year ago, officially announced that it had received RMB 100 million in Series A financing. Yunzhisheng co-founder and CEO Liang Jiaen told a Nandu reporter that, compared with traditional 2B speech recognition companies, Yunzhisheng's genes lean more toward the Internet, and that using a free public cloud voice platform to drive customized private cloud services also lays the groundwork for further commercialization.

Less data, but fast execution

Liang Jiaen, who came out of the Chinese Academy of Sciences, has been engaged in speech recognition research for more than ten years. His entrepreneurial team also has a deep technical background, with more than ten years of accumulated expertise in speech recognition and semantic understanding. In his view, the demand for voice interaction is becoming more and more urgent: as the mobile Internet grows, smartphones, smart TVs, wearable devices and the like all need a good interactive experience, and voice, as the most direct way of interacting, is promising. Users, for their part, are also more willing to choose simple and natural voice interaction. "This is an industry trend, and intelligent interaction methods will become mainstream in the future," Liang Jiaen said.

With current technology, speech recognition can achieve very high accuracy under laboratory conditions, but practical applications often run into problems such as environmental noise, dialects and accents, and specialized topics, all of which ultimately affect the user experience. The stability and maturity of the technology is therefore the threshold for starting a business in speech recognition. Liang Jiaen believes that doing speech recognition well requires a lot of data in addition to powerful algorithms. He admitted that Yunzhisheng has far less data than the industry leader Keda Xunfei, but said that by establishing a public cloud platform, data can be accumulated continuously to optimize the system.

Specifically, the public cloud platform provides online large-vocabulary continuous speech recognition. Developers, regardless of size, can call the public cloud services directly through an API. In fact, more than 80% of Yunzhisheng's customers are SMEs and individual developers, which sets it apart from, and complements, Keda Xunfei, which concentrates on serving major customers. This has not stopped large enterprises from favoring Yunzhisheng, however. Liang Jiaen said frankly that LeTV, Hammer ROM and others had in fact approached Keda Xunfei as well. What finally won them over to Yunzhisheng, besides technology that had reached a certain level, was fast execution, its biggest advantage. "Taking the cooperation with Sogou as an example, it took only two weeks from the first contact to the release of Sogou voice assistant, and generally it takes several months to negotiate." These large enterprises have large user bases of their own, which in turn brings a lot of data to the public cloud platform.

Pushing private cloud customization

With public cloud as the foundation, Yunzhisheng further explores the path of private cloud.

The so-called private cloud provides enterprises with customized intelligent interaction solutions, covering speech recognition, semantic understanding, speech synthesis and other areas. Liang Jiaen explained that the public cloud platform provides only basic voice technology services, whereas voice interaction is in fact closely tied to an enterprise's business. For enterprises with a strong need for voice, the public cloud alone cannot fully meet the demand; the recognition model also has to be optimized for the enterprise's particular application environment. For example, in its cooperation with LeTV, Yunzhisheng deeply customized and integrated the voice assistant for the TV field, so that it performs better in actual smart TV use. "The only ones truly willing to pay are those who really need it. The public cloud platform of Yunzhisheng is free, and the 2B private cloud platform is the main source of revenue," Liang Jiaen said.

However, compared with the thousands of developers accumulated on the public cloud platform, only a dozen or so companies have commissioned customized private cloud services. How can private cloud customization be expanded to increase revenue? Liang Jiaen pointed out that once the public cloud platform grows large enough, some of its users will convert into private cloud users. That is why the public cloud is free: being free attracts a large number of developers to the platform to understand and experience speech recognition. If voice increases user activity and stickiness in the developers' applications, they will recognize its value and may even be willing to pay for better services. The public cloud therefore serves both as brand promotion and as a way to cultivate users.

As for which fields to customize for, Liang Jiaen said that for now the company will not box itself in and will get involved in mobile phones, TVs, vehicles, smart watches, call centers and more. "Try to understand different industries to know which markets are big enough. However, we will eventually focus on two or three areas and grow bigger."

No competing with its own platform developers

Technical service fees alone may not be sustainable, and Yunzhisheng has a longer-term plan for its profit model.

Liang Jiaen predicts that the public cloud platform may have tens of thousands of developers in the future, and that once enough users have gathered, back-end monetization becomes possible. He envisages a chain made up of advertisers, the platform, and front-end developers: a single developer's app may have only tens to millions of users, which carries little advertising value; thousands of developers and tens of thousands of applications together accumulate a large number of users, which has both advertising value and recommendation value; and the revenue received from advertisers is divided between the platform and the developers. However, for this chain to really connect, Liang Jiaen believes the platform must reach at least hundreds of millions of users. At Yunzhisheng's current scale, there is still a long way to go.

In addition, Liang Jiaen said that the company focuses only on platform development and does not plan to develop its own voice app. He believes that if Yunzhisheng also pushed a consumer-facing (C-end) app, it would be competing with developers, who would then not feel at ease using its platform. "Through the developers to make the platform more valuable, developers can not only use our platform for free, but also share the benefits. Under the Internet environment in China, only this business model can go far."

Yunzhisheng gathers developers through its voice cloud platform in order to explore further business value in the future. This is the right "big cycle" thinking for the Internet era: focus on the platform rather than simply making money from voice, and mobilize developers to build the entire ecosystem.

For a technology cloud platform, a clear model and strong marketing are pluses at the outset, but in the long run everything depends on whether the technology is competitive. Besides Yunzhisheng, several engineering companies with technical backgrounds, such as Keda Xunfei and Spitz, have already launched voice clouds, giving developers more choices. The market is large, and there are opportunities. The key is whether a platform can gain an advantage over the others in recognition and understanding.

As far as the speech industry is concerned, the focus has gradually shifted to human-machine dialogue technology that can understand natural language in natural environments. This includes how to use multi-round dialogue to understand the user's intentions in noisy in-car and TV settings where the recognition rate is low, and how to complete complex information searches, bookings, transactions and other tasks through dialogue. There is still much room for development here. Traditional speech companies, especially those that first convert speech to text and then perform semantic recognition on the text, will face challenges, while the few companies with platform technology potential, such as Yunzhisheng, are very valuable.
