AI Transcription Expands into Japan: Achieving Milestone of Tens of Millions in Annual Revenue

Filling gaps in the consumer market while strategically expanding into the enterprise service sector.

With 60% of its companies founded within the past 4 years, is conference transcription, which attracted a funding frenzy during the pandemic, "fragile"?

Transcription is an ancient profession that emerged almost as soon as written language was invented, and it has long played an important role in business, legal, and religious communication as well as in recording history. Even today, dedicated professionals do this work in fields such as healthcare, law, and government affairs. In the United States, for example, there is one specialized "medical transcriptionist" for every 10 doctors. They help doctors record patients' medical history, treatment process, allergy history, and other information to support later medical decisions. Because transcription is a necessity in these industries, it is understandable that so many startups are entering the field, specializing in categories such as general healthcare, mental health, and veterinary medicine. The conference voice transcription we mainly discuss today, by contrast, is a relatively "emerging" niche that flourished during the pandemic.

According to statistics, of the 15 conference transcription startups covered in the chart above, 60% were established in 2020 or later, and most of them completed their latest financing round during the pandemic. In those years, remote work became a trend and major platforms like Zoom reaped growth dividends, while a batch of products serving as "supporting facilities" grew rapidly alongside them. It's not hard to imagine that Notta was one of the products that "rode the wave". In 2022, it raised $10 million in a round led by Hillhouse Capital, and by spring 2023 its cumulative users in Japan had reached 1 million.

To be fair, conference transcription doesn't sound like a high-barrier business at first glance, and "such products are too easy to displace" is probably the first impression of some practitioners when they see similar products. Honestly, when I first saw Notta, I had similar doubts: how can a conference transcription tool attract such high monthly visits? Yet Notta's website traffic was still growing steadily through May, even though Japan, the market it chose, doesn't seem to be a popular one for conference transcription startups.

Why Japan?

Choosing Japan is actually related to Zhang Yan's early entrepreneurial experience.

Zhang Yan is a serial entrepreneur who previously worked at an internet advertising company (reportedly Allyes Ad Network), and later at Youku, Baidu, DiDi, and iFlytek. However, he is still best known as a co-founder of Mobike. Before founding Notta, Zhang Yan created an overseas translation device brand called Langogo, which was originally crowdfunded in the US market but unexpectedly sold well in Japan. It was this experience that led Zhang Yan to pay extra attention to the nearby Japanese market when he turned to software entrepreneurship.

At that time, Zhang Yan had a view on the Japanese-language transcription market: the transcription accuracy of ToC products on the market was low, and their user experience and UI design were also poor. If his team could make up for these shortcomings, they would have a chance. But Zhang Yan soon understood the deeper factors behind the "shortage of good products". Firstly, Japanese local enterprises have a strong ToB gene. According to a "Voice Technology Map (2020 version)" compiled by Epic Base, all 8 local companies doing conference voice transcription that year were focused on ToB business, and only half also offered services to individuals, which seemed more like a side business. Secondly, few overseas companies focused on the Japanese market, and those that did localized poorly. For example, for a long time after entering the Japanese market, otter.ai offered transcription only in English... With these factors combined, Notta seized a precious window of opportunity.

Looking back, what Notta figured out is a development path from ToC to ToB: start with a single-point tool to accumulate C-end users and reputation, then gradually shift towards B-end customers to expand revenue. This is all the more striking because Japan's SaaS industry has always been known for its low customer churn rate, with an industry average of about 1.2%. Low churn means stable customer relationships, but it also makes the market unfriendly to latecomers. Even so, Notta's "bottom-up" strategy worked, not only because of its solid product but also because of the team's deep understanding of growth and localization.

Capturing the ToC gap, "smoothly" entering the ToB field

Notta launched on app stores in December 2019, positioned as a tool for "helping users record audio and conversations". Although the core function is simple, the team must have invested a lot in user experience. The earliest version of Notta that Point Data can currently find dates from September 2020; it supported quick file import, multi-device data synchronization, real-time transcription in 104 languages, keyword search, real-time editing, sound classification, and other functions. These features may seem ordinary now, but another angle shows how much of an improvement Notta was for Japanese users: in the middle of that year, a report by Asahi TV News in Japan sent Notta's downloads surging without any active promotion, and the "unexpected fortune" came almost without warning.

After several searches, the author could not find the original video of the report, but it's not hard to speculate that the coverage was partly due to Notta's strong capabilities and partly because the product unlocked pent-up C-end demand. In July 2021, user composition data disclosed by Notta showed that business and news were the two main high-frequency scenarios for its users, with classroom use and conversation assistance for hearing-impaired people also accounting for over 20%.