Alibaba has introduced Wan2.7-Video, a video generation model designed to enhance delivery quality and creative efficiency for individual creators and industry users. The release follows the debut of the Wan2.7-Image model and represents an expansion of the company’s multimedia AI capabilities.
The Wan2.7-Video model is designed to move beyond asset generation toward full video creation workflows. It enables users to manage narrative development and post-production processes through a unified system, with tools aimed at improving control over editing and storytelling.
Wan2.7-Video includes four models: the text-to-video model Wan2.7-t2v, image-to-video Wan2.7-i2v, reference-to-video Wan2.7-r2v, and a video editing model. The system integrates text, image, video, and audio inputs into a single workflow that supports generation, editing, replication, reshaping, continuation, and referencing.
The model generates videos from two to 15 seconds in length, with output available at 720p and 1080p resolution. Enterprise APIs are available to support batch processing and customised workflows.
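As an illustration only, the sketch below shows how such a text-to-video request might be submitted over HTTP. The endpoint, authentication scheme, parameter names, and response fields are assumptions made for the example, not documented details of Alibaba's enterprise API.

```python
import requests

# Hypothetical endpoint and credential; the real enterprise API may differ.
API_URL = "https://example.alibabacloud.com/v1/wan/video-generation"  # placeholder
API_KEY = "your-api-key"  # placeholder

payload = {
    "model": "wan2.7-t2v",  # text-to-video variant named in the announcement
    "prompt": "A fishing boat leaves a foggy harbour at dawn",
    "duration_seconds": 10,  # announced range is two to 15 seconds
    "resolution": "1080p",   # announced options are 720p and 1080p
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
response.raise_for_status()
# Assumed response shape: a job identifier to poll for the finished video.
print("Submitted generation job:", response.json().get("task_id"))
```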
According to Alibaba, the system is intended to address issues related to narrative coherence and multi-shot consistency. It allows users to manage workflows from script input through to visual output, including control over multiple elements of a production.
Wan2.7-Video also introduces natural-language editing, enabling users to modify aspects such as character actions, dialogue, appearance, scenes, styles, and camera movements. The system supports a range of camera techniques while maintaining lighting consistency.
Dialogue editing includes automatic lip-syncing and preservation of vocal characteristics when scripts are modified. The model also allows multimodal inputs, such as using audio to influence environmental conditions or image references to define composition and character settings.
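A rough sketch of how a natural-language edit with optional image and audio references might be expressed as a single request follows; the endpoint and field names are illustrative assumptions rather than Alibaba's documented interface.

```python
import requests

# Hypothetical editing endpoint and field names, shown only to illustrate the
# kind of plain-language edit described above; the real API may differ.
API_URL = "https://example.alibabacloud.com/v1/wan/video-edit"  # placeholder
API_KEY = "your-api-key"  # placeholder

payload = {
    "model": "wan2.7-video-edit",
    "source_video_url": "https://example.com/draft_scene.mp4",
    # Plain-language instruction describing the change to apply.
    "instruction": (
        "Change the dialogue in shot 2 to a whispered warning and slowly "
        "dolly the camera in while keeping the lighting unchanged"
    ),
    # Optional multimodal references, as described in the article.
    "reference_image_url": "https://example.com/costume_reference.png",
    "reference_audio_url": "https://example.com/rain_ambience.wav",
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
response.raise_for_status()
print("Edit job accepted:", response.json().get("task_id"))
```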
The system is designed to maintain consistency across multiple outputs, supporting up to five distinct characters with customised visual identities and voice characteristics. It also supports a range of stylistic variations and emotional expressions.
The model includes a narrative engine that generates structured storylines from prompts. It can produce multi-shot sequences with defined transitions, camera movements, and lighting conditions. A video continuation feature allows users to define ending frames to support smoother transitions between scenes.
The Wan2.7-Image model, released shortly before Wan2.7-Video, is designed to improve personalisation and colour accuracy in image generation. It includes features that allow users to adjust character traits and match specific colour codes, as well as improvements in text rendering across multiple languages.
Wan2.7-Image supports batch processing of up to nine reference images and can generate multiple outputs simultaneously. It also includes a “click-to-edit” interface for adjusting visual elements. Alibaba has additionally introduced Wan2.7-Image-Pro, which provides enhanced prompt interpretation, composition stability, and 4K output.
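The hypothetical request below sketches how multiple reference images and several outputs might be supplied in one image-generation call; the endpoint, field names, and response shape are assumptions made for illustration only.

```python
import requests

# Hypothetical request shape for Wan2.7-Image; the limits mirror the
# announcement (up to nine reference images, multiple outputs) but the
# names below are assumed, not documented.
API_URL = "https://example.alibabacloud.com/v1/wan/image-generation"  # placeholder
API_KEY = "your-api-key"  # placeholder

payload = {
    "model": "wan2.7-image",
    "prompt": "Product shot of a ceramic mug on a walnut desk, brand colour #1A6FB0",
    # Up to nine reference images, per the announcement.
    "reference_images": [
        "https://example.com/ref_01.png",
        "https://example.com/ref_02.png",
    ],
    "num_outputs": 4,  # generate several candidates in a single call
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
response.raise_for_status()
print("Generated image URLs:", response.json().get("images"))
```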
Both Wan2.7-Video and Wan2.7-Image are available through Alibaba Cloud’s Model Studio and the Wan website, with integration into the Qwen App.