Breakthroughs in Multimodal Video Generation: What Opportunities Does Web3 AI Have?

7/9/2025, 10:21:18 AM
Intermediate
AI, Technology
This article analyzes recent breakthroughs in multimodal video generation (such as ByteDance's EX-4D and Google Veo) and discusses their impact on the creator economy and Web3 AI.

Apart from the “sinking” of AI toward local deployment, the biggest recent change in the AI sector is the breakthrough in multimodal video generation, which has evolved from text-only video generation to fully integrated generation that combines text, images, and audio.

Here are a few examples of these breakthroughs:

1) ByteDance open-sources the EX-4D framework: it instantly turns a monocular video into free-viewpoint 4D content, with a user acceptance rate of 70.7%. In other words, given an ordinary video, AI can automatically generate views from any angle, something that previously required a professional 3D modeling team.

2) Baidu “Hui Xiang” platform: generates a 10-second video from a single image and claims “movie-level” quality. Whether that is marketing exaggeration remains to be seen until the Pro version update in August.

3) Google DeepMind Veo: generates 4K video with synchronized ambient sound. The key highlight is the “synchronization” itself: previously, video and audio came from two separate systems that were spliced together. True semantic-level matching is hard to achieve; in a complex scene, for example, a walking motion in the video must line up with the corresponding footstep sounds.

4) Douyin ContentV: 8 billion parameters, 1080p video generated in 2.3 seconds, at a cost of 3.67 yuan per 5 seconds. To be honest, the cost control is quite good, but the generation quality still falls short in complex scenes.

Why do these cases matter so much in terms of video quality, production cost, and application scenarios?

1. In terms of technological value, the complexity of generating a multimodal video is often exponential. A single frame consists of about 10^6 pixels; a video must stay temporally coherent across at least 100 frames and keep its audio synchronized (10^4 samples per second), all while maintaining 3D spatial consistency.
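To get a feel for the scale of these figures, here is a rough back-of-envelope count in Python. It only counts raw values to be produced and says nothing about how hard generation actually is. The 25 fps frame rate is an assumption added for illustration; the article does not state one.

```python
# Back-of-envelope count of raw values, using the figures quoted in the text.
PIXELS_PER_FRAME = 10**6       # from the text: ~10^6 pixels per frame
MIN_FRAMES = 100               # from the text: at least 100 frames
AUDIO_SAMPLES_PER_SEC = 10**4  # from the text: 10^4 samples per second
ASSUMED_FPS = 25               # ASSUMPTION: not given in the article


def raw_value_counts(frames: int = MIN_FRAMES, fps: int = ASSUMED_FPS) -> dict:
    """Count pixel values and audio samples for a clip of `frames` frames."""
    duration_s = frames / fps
    return {
        "duration_s": duration_s,
        "pixel_values": PIXELS_PER_FRAME * frames,
        "audio_samples": int(AUDIO_SAMPLES_PER_SEC * duration_s),
    }


counts = raw_value_counts()
print(counts)
```

Under these assumptions, even a clip of a few seconds involves on the order of 10^8 pixel values that must agree with each other across frames, plus tens of thousands of audio samples that must line up with them.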

In short, the technical complexity is considerable. Originally, one very large model tackled the whole task head-on; Sora is said to have burned through tens of thousands of H100s to achieve its video generation capability. Now the same result can be reached through modular decomposition and collaboration among large models. ByteDance’s EX-4D, for example, breaks the complex task down into a depth estimation module, a viewpoint transformation module, a temporal interpolation module, a rendering optimization module, and so on. Each module specializes in one task, and a coordination mechanism ties them together.
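The modular idea can be illustrated with a minimal Python sketch. This is not EX-4D's actual code or API: the `Clip` type, the function bodies, and `run_pipeline` are hypothetical placeholders that only show how single-purpose stages can be chained by a simple coordinator.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Clip:
    """Toy stand-in for the video data passed between modules."""
    frames: int
    log: List[str] = field(default_factory=list)


Module = Callable[[Clip], Clip]


def depth_estimation(clip: Clip) -> Clip:
    clip.log.append("depth_estimation")  # placeholder for per-frame depth maps
    return clip


def viewpoint_transformation(clip: Clip) -> Clip:
    clip.log.append("viewpoint_transformation")  # placeholder for re-projection
    return clip


def temporal_interpolation(clip: Clip) -> Clip:
    # Toy behavior: insert one frame between each pair of neighboring frames.
    clip.frames = clip.frames * 2 - 1
    clip.log.append("temporal_interpolation")
    return clip


def rendering_optimization(clip: Clip) -> Clip:
    clip.log.append("rendering_optimization")  # placeholder for final cleanup
    return clip


def run_pipeline(clip: Clip, modules: List[Module]) -> Clip:
    """Hypothetical coordinator: run each specialized module in order."""
    for module in modules:
        clip = module(clip)
    return clip


PIPELINE: List[Module] = [
    depth_estimation,
    viewpoint_transformation,
    temporal_interpolation,
    rendering_optimization,
]

result = run_pipeline(Clip(frames=100), PIPELINE)
print(result.frames, result.log)
```

The point of the design is that each stage can be developed, replaced, or scaled on its own, which is what makes the approach cheaper than one monolithic model.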

2. In terms of cost reduction, the gains come from optimizing the inference architecture itself: a layered generation strategy, which first generates a low-resolution skeleton and then enhances it into high-resolution content; a cache reuse mechanism, which reuses results for similar scenes; and dynamic resource allocation, which adjusts model depth to the complexity of the specific content.

This set of optimizations is what produces the figure of 3.67 yuan per 5 seconds for Douyin ContentV.

3. In terms of application impact, traditional video production is a capital-intensive game of equipment, venues, actors, and post-production; it is normal for a 30-second advertisement to cost hundreds of thousands. AI now compresses this entire process into a prompt plus a few minutes of waiting, and it can achieve perspectives and special effects that are hard to get in traditional shooting.

This shifts the barrier to video production from technology and capital to creativity and aesthetics, which may trigger a reshuffling of the entire creator economy.

So the question arises: how do these demand-side changes in Web2 AI relate to Web3 AI?

1. First, the structure of computing power demand is changing. AI used to be a competition of scale: whoever had the larger homogeneous GPU cluster won. Multimodal video generation, however, needs a diverse mix of computing power, which could create demand for distributed idle computing power, as well as for distributed fine-tuning models, algorithms, and inference platforms.

2. Second, demand for data labeling will also grow. Generating a professional-grade video requires precise scene descriptions, reference images, audio styles, camera movement trajectories, lighting conditions, and so on, all of which become new professional data labeling requirements. Web3 incentive mechanisms can encourage photographers, sound engineers, 3D artists, and others to contribute professional data, strengthening AI video generation with specialized, vertical labeled data.
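As a rough illustration of what such a labeling record might look like, here is a minimal Python sketch. The field names follow the elements listed above; the `VideoLabelRecord` type, the `contributor` field, and the completeness check are hypothetical and not taken from any existing platform.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# The five elements named in the text, used for a simple completeness check.
REQUIRED_FIELDS = (
    "scene_description",
    "reference_images",
    "audio_style",
    "camera_trajectory",
    "lighting_conditions",
)


@dataclass
class VideoLabelRecord:
    """Hypothetical schema for one professional labeling contribution."""
    scene_description: str = ""
    reference_images: List[str] = field(default_factory=list)  # file names or ids
    audio_style: str = ""
    camera_trajectory: List[Tuple[float, float, float]] = field(default_factory=list)
    lighting_conditions: str = ""
    contributor: str = ""  # ASSUMPTION: identity an incentive scheme would reward

    def missing_fields(self) -> List[str]:
        """Names of required elements that are still empty."""
        return [name for name in REQUIRED_FIELDS if not getattr(self, name)]


record = VideoLabelRecord(
    scene_description="Rainy street at night, slow dolly forward",
    contributor="photographer_01",
)
print(record.missing_fields())
```

A check like this is one way an incentive mechanism could decide whether a contribution is complete enough to reward, though how rewards are actually computed is outside the scope of this sketch.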

3. Finally, as AI gradually shifts from centralized, large-scale resource allocation to modular collaboration, that shift itself creates new demand for decentralized platforms. Computing power, data, models, incentives, and so on will then form a self-reinforcing flywheel, which in turn will drive the integration of Web3 AI and Web2 AI scenarios.

Statement:

  1. This article is reprinted from [tmel0211]. Copyright belongs to the original author [tmel0211]. If you have any objections to the reprint, please contact the Gate Learn team, and the team will process it as quickly as possible according to the relevant procedures.
  2. Disclaimer: The views and opinions expressed in this article are solely those of the author and do not constitute any investment advice.
  3. Other language versions of the article are translated by the Gate Learn team. Unless otherwise mentioned, translated articles may not be copied, disseminated, or plagiarized.

