Can Virtual Streamers Gain a Firm Foothold in E‑commerce Live Streams?
In 2016, e‑commerce live streaming swept across the market and grew at full throttle. By the end of 2021, a series of events (the "bird’s nest" scandal involving Xinba, Viya’s tax evasion case, Austin Li’s long suspension, and Luo Yonghao’s gradual exit) had cooled the sector and marked the start of the second half of e‑commerce live streaming.
At the same time, the metaverse concept exploded and the virtual digital human sector heated up. Although the Web3‑style metaverse has not fully arrived, virtual humans have already entered the public eye, appearing everywhere from New Year’s Eve galas to brand commercials, and from co‑hosting live shows to selling products directly.
During Double 11 in 2022, JD.com used “Youguang”, a consumer‑facing virtual live product from Morpho Technology. The virtual streamer participated in seven live sessions, streaming for a total of 1,380 minutes, showcasing more than 500 products and generating nearly three million RMB in GMV. Late at night, many brand live rooms also featured digital humans. So, can virtual streamers replace real hosts? This article looks at the challenges MCN agencies and brands face in live commerce, the value and issues of “operator‑driven” and AI‑driven virtual streamers, and the future of the “people–goods–scenario” model in intelligent live rooms.
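For a rough sense of scale, the JD.com figures quoted above can be turned into simple averages. The sketch below rounds "nearly three million RMB" to 3,000,000 and "more than 500 products" to 500. Both are assumptions, so every result is approximate, and the variable names are purely illustrative.

```python
# Figures quoted for the JD.com Double 11 virtual streamer (2022).
# Assumption: GMV is rounded up to 3,000,000 RMB and the product count
# is rounded down to 500, so all averages below are only approximate.
sessions = 7
total_minutes = 1_380
products = 500
gmv_rmb = 3_000_000

minutes_per_session = total_minutes / sessions
gmv_per_minute = gmv_rmb / total_minutes
gmv_per_session = gmv_rmb / sessions
minutes_per_product = total_minutes / products

print(f"Average session length: {minutes_per_session:.0f} min")
print(f"GMV per streamed minute: about {gmv_per_minute:,.0f} RMB")
print(f"GMV per session: about {gmv_per_session:,.0f} RMB")
print(f"Air time per product: about {minutes_per_product:.2f} min")
```

Under these assumptions each session ran for a little over three hours, and each product received under three minutes of air time on average.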
E‑commerce Live Streaming Enters Its Second Half
According to the 50th Statistical Report on China’s Internet Development in 2022, the number of users of e‑commerce live streaming reached 469 million, an increase of 5.33 million compared with December 2021, accounting for 44.6% of all internet users. Data from NetEconomics shows that the GMV of e‑commerce live streaming reached 2.36151 trillion RMB in 2021, with a penetration rate of 17.97%. Despite the industry’s setbacks, the popularity of e‑commerce live streaming remains high.
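The percentages in this paragraph imply a few totals that are not stated directly. The sketch below derives them, assuming the penetration rate means live‑streaming GMV divided by total online retail sales; that definition is not given in the source and is an assumption here, and the variable names are illustrative.

```python
# Figures quoted from the report and from NetEconomics.
live_users_m = 469.0           # million e-commerce live-streaming users
user_increase_m = 5.33         # million, versus December 2021
share_of_internet_users = 0.446
live_gmv_trillion = 2.36151    # RMB, 2021
penetration = 0.1797

# Assumption: penetration = live-streaming GMV / total online retail sales.
users_dec_2021_m = live_users_m - user_increase_m
internet_users_m = live_users_m / share_of_internet_users
online_retail_trillion = live_gmv_trillion / penetration
growth_pct = user_increase_m / users_dec_2021_m * 100

print(f"Users in December 2021: {users_dec_2021_m:.2f} million")
print(f"Implied internet users: {internet_users_m:.0f} million")
print(f"Implied online retail sales in 2021: {online_retail_trillion:.2f} trillion RMB")
print(f"User growth since December 2021: {growth_pct:.2f}%")
```

The implied user growth of roughly 1% over the period suggests that the audience was approaching saturation even while absolute numbers remained high.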
Today, e‑commerce live streaming is gradually moving toward maturity and standardization. Product categories have expanded from the early focus on beauty, apparel, and food & beverages to home goods, mother‑and‑baby products, sports & outdoor, and many other verticals. Operations have also evolved from simple discount promotions to diversified content formats such as knowledge‑sharing, hype‑style selling, and scripted performances. Platforms are increasing their regulatory oversight of merchants, and competition is shifting toward comprehensive operational capabilities.
MCN Agencies: The Talent Management Dilemma
By 2021, the number of MCN agencies in China had exceeded 20,000, and growth had slowed as overall traffic hit a ceiling. In a fiercely competitive market, MCNs use multi‑platform and multi‑account strategies to fight for traffic and audience segments. For example, "Make a Friend" leverages Luo Yonghao’s influence on Douyin to expand its audience, operating a matrix of 17 accounts so that consumers can choose different live rooms based on their needs.
Talent management is the core challenge for MCN agencies. On one hand, it is hard to hire enough labor and personnel costs keep rising; on the other, once a host becomes popular, they may jump ship, causing the agency to lose traffic. To retain talent, agencies must offer better compensation. Managing super hosts is even more difficult: incidents such as Viya’s tax case and the dispute between Li Ziqi and her agency have sounded the alarm for the industry. When a super host leaves or triggers a scandal, the agency suffers huge losses. As a result, platforms and MCNs continually expand their talent matrices and diversify business lines to reduce over‑reliance on a few traffic stars.
E‑commerce Hosts: Caught in Relentless Competition
The rise of e‑commerce live streaming stems from delivering two key differentiated values to consumers: “better” and “cheaper.” In the live room, consumers can directly see the appearance and functions of products, while hosts provide services such as trying on clothes or demonstrating makeup. Consumers can also interact with the host in real time to get more product information; this is the “information value.” Long‑term interaction builds emotional bonds between host and viewers, and this trust and connection form the “credit value.” Together, the information value and the credit value are what “better” really means. “Cheaper” comes from the aggregation of demand: only top‑tier hosts can fully leverage economies of scale and continuously negotiate lower prices.
To realize these values, hosts must have strong presentation, interaction, and stress‑resilience skills. Presentation means vividly conveying product value with rich expression, solid product knowledge, and personal charisma. Interaction means communicating effectively with viewers, answering questions, and boosting purchase confidence. Stress‑resilience means staying composed under high pressure and avoiding bringing negative emotions into the live room.
According to the 2021–2022 China MCN Industry Development Report, the number of e‑commerce live streaming hosts in China was expected to reach 1.234 million in 2022. In addition to full‑time hosts, celebrities and vertical KOLs have also joined the live‑shopping battlefield. The Matthew effect for top hosts is already clear: "Make a Friend" runs a round‑the‑clock (24/7) live schedule, while Oriental Selection broadcasts about 17.5 hours per day. Brands like L’Oréal, Florasis, and Li‑Ning also use virtual streamers late at night to achieve 24‑hour non‑stop live streaming. So how well do virtual streamers actually perform?
Virtual Streamers: Can They Replace Real Hosts?
At present, virtual digital humans are mainly driven in two ways: “operator‑driven” and AI‑driven.
An “operator” is the person who controls the virtual streamer during a live show. Using motion‑capture and facial‑capture technologies, they enable the virtual character to interact with the real world. A full‑body motion‑capture hardware set costs around 29,000 RMB, with about 800 RMB per year for software services; facial‑capture devices cost roughly 6,000 RMB. The operator is the “soul” of the virtual character, while the virtual avatar is only the “shell.” AI‑driven models, by contrast, use AI technology to create, drive, and generate content for virtual humans, giving them perception and expression abilities so they can intelligently parse external inputs and respond with corresponding speech and movements.
Operator‑Driven: More About Streaming Than Selling
Operator‑driven virtual streamers attract audiences with their novel content formats and are favored by MCN agencies as a way to gain more traffic in the live market. However, they cannot test products in real time like human hosts, so a real assistant is usually needed to display product selling points such as how makeup looks on skin or how clothes fit.
After Viya’s “fall from grace,” many people began to worry whether operator‑driven virtual streamers would face similar problems. The Japanese virtual idol Kizuna AI lost popularity after changing operators, and the virtual idol group A‑SOUL sparked fan backlash over the poor treatment of its operators. These cases show that fans care more about the “soul” behind a virtual idol than about the avatar itself. This challenges the assumption that virtual idols are low‑cost and low‑risk; the ethics and professionalism of operators are just as important.
E‑commerce hosts inherently carry an idol attribute. Austin Li is loved by “all the girls” because fans find him genuine and full of positive energy. Trust and emotional connection are built on the real person. Therefore, running operator‑driven virtual e‑commerce hosts is far from risk‑free for MCN agencies.
AI‑Driven: Still in Its Infancy
Brands such as L’Oréal, YSL, and Lancôme use AI‑driven virtual streamers for self‑broadcasting, but because their performance still falls short of real human hosts, they are usually scheduled only for late‑night slots. Alibaba Cloud’s basic intelligent live‑room solution costs 99,000 RMB per year per channel and covers services such as AI‑generated scripts, multimodal intelligent interaction, and smart integration with marketing platforms.
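To put this price next to the operator‑driven equipment prices quoted earlier, the sketch below totals the tooling cost of each approach over several years. It deliberately excludes operator wages, avatar production, and live‑room assistants, since the article gives no figures for them. The function names are hypothetical.

```python
# Prices quoted in the article, in RMB.
MOCAP_HARDWARE = 29_000          # full-body motion capture set, one-time
FACE_CAPTURE = 6_000             # facial-capture device, one-time
MOCAP_SOFTWARE_PER_YEAR = 800    # software services, yearly
AI_ROOM_PER_YEAR = 99_000        # basic intelligent live room, per channel


def operator_tooling_cost(years: int) -> int:
    """Cumulative tooling cost of an operator-driven setup (labor excluded)."""
    return MOCAP_HARDWARE + FACE_CAPTURE + MOCAP_SOFTWARE_PER_YEAR * years


def ai_room_cost(years: int, channels: int = 1) -> int:
    """Cumulative subscription cost of the AI-driven live room."""
    return AI_ROOM_PER_YEAR * years * channels


for years in (1, 2, 3):
    op = operator_tooling_cost(years)
    ai = ai_room_cost(years)
    print(f"Year {years}: operator tooling {op:,} RMB, "
          f"AI room {ai:,} RMB, gap {ai - op:,} RMB")
```

The gap is the budget left for the operator's pay and other labor before the operator‑driven route becomes the more expensive one. Because those labor costs are not given, no break‑even point is computed here.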
On Taobao, most intelligent live‑room avatars are 3D cartoon‑style characters with rich preset motion libraries and realistic synthesized voices, giving them a lively selling style. Brands can customize the virtual host’s outfits; for example, The North Face’s flagship store dresses its virtual host in the brand’s own clothing. Product display is mainly image‑based, with text effects appearing on screen when the host explains key selling points. The live‑room scene has a sense of 3D space, creating an interactive virtual background; while explaining products, the host appears in front of a blue screen where product images are composited. The interaction flow is fairly fixed, including entrance greetings, prompts to follow and place orders, and Q&A segments.
L’Oréal’s JD self‑operated flagship store uses a 2D semi‑realistic avatar, which can be customized by training on just a two‑minute video. Product display follows the human‑host pattern, lining up product images in front of the avatar. However, because the virtual host cannot physically handle the products, its reviews lack authenticity and can easily turn viewers off. For now, AI‑driven virtual streamers are more like decorative “vases”: they can attract curious viewers and handle basic product introductions and Q&A, but little more.
Intelligent Live Rooms: People, Goods, and Scenarios
Virtual Avatars: Beautiful Shells and Interesting Souls
Virtual digital humans come in diverse visual styles, from 2D to 3D and from cartoonish to hyper‑realistic. The higher the fidelity, the higher the production cost. In 2022, hyper‑realistic virtual human videos cost 8,000–15,000 RMB per second, and producing a four‑minute video of the virtual influencer Liu Yexi exceeded one million RMB. As modeling technology advances, production thresholds, costs, and timelines are expected to decrease.
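As a quick check on these figures, the per‑second price range can be applied to a four‑minute video. The helper name below is illustrative, and the calculation assumes the quoted rate applies uniformly to every second of footage.

```python
# Per-second price range quoted for hyper-realistic virtual human video (2022).
LOW_RMB_PER_SECOND = 8_000
HIGH_RMB_PER_SECOND = 15_000


def video_cost_range(minutes: float) -> tuple:
    """Return the (low, high) production cost in RMB for a video of this length."""
    seconds = minutes * 60
    return seconds * LOW_RMB_PER_SECOND, seconds * HIGH_RMB_PER_SECOND


low, high = video_cost_range(4)
print(f"Four-minute video: {low:,.0f} to {high:,.0f} RMB")
```

At these rates a four‑minute video would cost between 1.92 and 3.6 million RMB, which is consistent with the statement that the Liu Yexi video exceeded one million RMB.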
Hyper‑realistic figures like Liu Yexi and AYAYI attract a lot of attention, but the “girl‑next‑door” virtual influencer Angie wins fans with a healing, comforting vibe. As the saying goes, “Pretty faces are common; interesting souls are one in a million.” While the capabilities of virtual humans can be replicated, standing out still requires unique positioning, high‑quality content, and long‑term cultivation. AIGC lowers the cost of content creation, but truly outstanding content still depends on human imagination.
The greatest value of live streaming lies in interaction. Enhancing interactivity deepens the sense of presence and significantly influences purchasing decisions. The comment section is the primary channel for interaction between host and viewers, so boosting a virtual human’s ability to analyze comments is critical. Virtual humans must be able to judge the validity and importance of comments, detect user emotions, understand the atmosphere in the room, and then generate appropriate responses. Combined with models like ChatGPT and knowledge graphs, AI‑driven virtual streamers will make more intelligent decisions and generate richer, more insightful replies, improving user experience and encouraging viewers to stay longer.
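The comment‑analysis steps described above (judging validity, ranking importance, and detecting emotion before replying) can be sketched as a small triage routine. This is a toy, rule‑based stand‑in: the keyword lists, the scoring scale, and all names are assumptions made for illustration, and a real system would rely on a trained language model rather than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Toy keyword lists (assumed for illustration only).
QUESTION_WORDS = {"how", "what", "which", "does", "price", "size"}
NEGATIVE_WORDS = {"fake", "bad", "expensive", "refund", "broken"}
POSITIVE_WORDS = {"love", "great", "nice", "bought", "good"}


@dataclass
class Triage:
    valid: bool
    importance: int   # 0 = ignore, 1 = chatter, 2 = question, 3 = complaint
    emotion: str      # "positive", "negative", or "neutral"


def triage_comment(text: str) -> Triage:
    cleaned = text.strip().lower()
    tokens = set(re.findall(r"[a-z]+", cleaned))
    # Validity: drop empty text and one-character spam such as "666666".
    if len(cleaned) < 2 or len(set(cleaned)) == 1:
        return Triage(False, 0, "neutral")
    # Emotion: negative keywords take priority over positive ones.
    if tokens & NEGATIVE_WORDS:
        emotion = "negative"
    elif tokens & POSITIVE_WORDS:
        emotion = "positive"
    else:
        emotion = "neutral"
    # Importance: complaints first, then questions, then general chatter.
    if emotion == "negative":
        importance = 3
    elif "?" in cleaned or tokens & QUESTION_WORDS:
        importance = 2
    else:
        importance = 1
    return Triage(True, importance, emotion)


def pick_comment_to_answer(comments: List[str]) -> Optional[str]:
    """Return the most important valid comment; the earliest wins on ties."""
    best, best_score = None, 0
    for comment in comments:
        result = triage_comment(comment)
        if result.valid and result.importance > best_score:
            best, best_score = comment, result.importance
    return best


room = ["666666", "Love this color", "Does it come in size M?", "Mine arrived broken"]
print(pick_comment_to_answer(room))
```

In this sketch the selected comment and its emotion label would then be passed to the reply generator, so that a complaint receives a reassuring answer before routine questions are handled.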
Product Display: Striving for Authenticity
The core of a live room is authenticity in product display, trials, and reviews, but this is precisely the biggest challenge for virtual streamers. The cosmetics live‑stream “fail” of virtual influencer Ling, who promoted beauty products, showed that a virtual character without any skin concerns struggles to evoke empathy. Even though 3D interaction technologies such as SLAM (simultaneous localization and mapping) can improve how digital humans interact with real‑world spaces, enabling realistic interaction with physical products is still difficult, and users continue to demand more detailed product information.
According to the 2022 Virtual Digital Human Comprehensive Evaluation Index Report, virtual digital human development can be divided into three stages: the “human‑like” stage, where movements, appearance, and voice closely match a real person and basic real‑time communication is possible using AI; the “same‑as‑human” stage, where interaction progresses from appearance imitation to emotional engagement, enabling high‑quality emotional exchanges; and the “super‑human” stage, where virtual humans surpass natural human capabilities and become “virtual entities” in their own right. Perhaps only when virtual streamers possess real, physical bodies in a decade or so will they truly gain a solid foothold in live e‑commerce.
Scenes in Flux, Infinite Possibilities
While scenes are not as fundamental as “people” and “goods,” strong visual effects can still persuade viewers to stay. Green‑screen virtual live‑room setups are relatively low‑cost: with simple keying technology, you can quickly create virtual backgrounds, and even mobile apps for green‑screen keying can be bought with a one‑time membership fee of just 288 RMB. However, the cost of building virtual live‑room scenes varies with scale and complexity: larger and more complex rooms require higher investment. In the future, as MR devices become more widespread, they will drive further technical iteration, allowing users to immerse themselves in the environment, interact with scenes, hosts, and other viewers, and enjoy a truly immersive, interactive shopping experience.
