# Image Digital Human Video Synthesis

Create Image Digital Human Video Synthesis Task

# Feature Introduction

Image Digital Human: One image, one audio clip, instantly transformed into a professional talking-head video.

Ultra-Simple Creation: Upload an image and an audio clip, and the AI automatically generates a dynamic video with precise lip-sync and synchronized audio and visuals. No special skills are required.

Professional Audio-Visual Quality: Supports generating 720P HD videos up to 3 minutes long, with lip-sync that covers multiple languages and complex scenarios such as rap.

Vivid Performance: Built on proprietary lip-sync driving technology. Expressions, eye movements, and body gestures can be controlled flexibly via prompts for a more engaging performance.

Broad Compatibility: Drives real-human, cartoon, animal, and other types of avatars, covering e-commerce, education, media, marketing, and other scenarios. Future versions will support multi-character lip-sync matching.

# API Reference

# Create Image Digital Human Video Synthesis Task

# API Description

Synthesizes a video from the content uploaded by the user and returns an MP4 video file for download. The PaaS platform stores generated content online for 7 days. Transfer the file promptly, because it can no longer be downloaded after 7 days.
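A minimal sketch of how a client might track the 7-day retention window. It assumes the window starts when the task completes, which this page does not state; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Retention window stated in the API description.
RETENTION = timedelta(days=7)


def download_deadline(completed_at: datetime) -> datetime:
    """Return the time after which the MP4 is assumed to be unavailable.

    Assumption: the 7 days are counted from task completion.
    """
    return completed_at + RETENTION


def is_expired(completed_at: datetime, now: datetime) -> bool:
    """True once the assumed retention window has passed."""
    return now >= download_deadline(completed_at)


completed = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(download_deadline(completed).isoformat())
```

Scheduling the transfer well before this deadline leaves room for retries.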

# Request URL

POST /api/2dvh/v1/material/single/image/video/create

# Request Headers

Content-Type: application/json
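The request URL and header above could be assembled as follows, using only the Python standard library. This is a sketch: `BASE_URL` is a placeholder host, and any authentication headers the platform requires are not covered on this page and are omitted here.

```python
import json
import urllib.request

# Placeholder host: replace with the actual PaaS endpoint (assumption).
BASE_URL = "https://paas.example.com"
CREATE_PATH = "/api/2dvh/v1/material/single/image/video/create"


def build_create_request(body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for task creation."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + CREATE_PATH,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_request({"videoName": "demo", "param": "{}"})
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access.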

# Request Parameters

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `videoName` | String | True | Video name |
| `thumbnailUrl` | String | False | Thumbnail URL |
| `param` | String | True | Video synthesis parameters for the single-image digital human task, passed as a JSON-escaped string. The parameters must be valid for the task to be created. |

# Request Example

{
  "videoName": "xxx",
  "param": "{\"imageUrl\":\"https://cdn.example.com/photo.png\",\"audioUrl\":\"https://cdn.example.com/sound.mp3\",\"duration\":10,\"prompt\":\"夏日海滩场景 | Summer beach scene\",\"isNewAvatar\":false,\"avatarId\":15342352346}"

}
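Because `param` is a JSON-escaped string rather than a nested object, it is easy to get the escaping wrong by hand. One way to produce it is to serialize the inner object first and place the resulting string in the outer body. The helper name `build_request_body` is hypothetical.

```python
import json


def build_request_body(video_name, param, thumbnail_url=None):
    """Return the outer request body with `param` serialized to a string."""
    body = {
        "videoName": video_name,
        # Inner object becomes a JSON string; the outer dumps() escapes it.
        "param": json.dumps(param, ensure_ascii=False),
    }
    if thumbnail_url is not None:
        body["thumbnailUrl"] = thumbnail_url
    return body


param = {
    "imageUrl": "https://cdn.example.com/photo.png",
    "audioUrl": "https://cdn.example.com/sound.mp3",
    "duration": 10,
    "prompt": "Summer beach scene",
}
body = build_request_body("demo", param)
payload = json.dumps(body, ensure_ascii=False)
print(payload)
```

Decoding `payload` and then decoding its `param` field again should return the original inner object, which is a convenient round-trip check.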

# Response Elements

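| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `code` | Integer | True | 0: success; any other value: error |
| `message` | String | True | Status message; contains the error details on failure |
| `data` | Integer | False | Task ID |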

# Response Example

{
    "code": 0,
    "message": "success",
    "data": 1
}
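A client would typically check `code` before reading `data`. The sketch below follows the response table above; the exception class and function name are hypothetical.

```python
import json


class TaskCreationError(Exception):
    """Raised when the create call does not return a usable task ID."""


def parse_create_response(raw: str) -> int:
    """Return the task ID from a response body, or raise on error."""
    resp = json.loads(raw)
    if resp.get("code") != 0:
        raise TaskCreationError(
            f"code={resp.get('code')}: {resp.get('message')}"
        )
    task_id = resp.get("data")
    if task_id is None:
        # `data` is optional in the response table, so guard against it.
        raise TaskCreationError("success response did not include a task ID")
    return task_id


task_id = parse_create_response('{"code": 0, "message": "success", "data": 1}')
print(task_id)
```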

# JSON Parameter Description

| Name | Type | Example | Required | Description |
| --- | --- | --- | --- | --- |
| `imageUrl` | String | "" | Yes | Image download URL. A 16:9 image is recommended; otherwise, the default parameters crop the image accordingly. To output at the original image resolution, adjust the `resizeMode` parameter. |
| `audioUrl` | String | "" | Yes | Audio file download URL |
| `duration` | Number | 10 | Yes | Video duration, in seconds |
| `resizeMode` | String | "adaptive" | No | `adaptive`: images with a non-standard aspect ratio are cropped by default before the video is output. `fixedMinSide`: the video keeps the original image aspect ratio (note: in this mode, the input image aspect ratio must be less than 1:3). |
| `prompt` | String | "Summer beach scene" | No | Prompt text |
| `watermark` | Object | | No | Video watermark |
| `show` | Boolean | true | Yes | Field of `watermark`. Whether to display the video watermark. |
| `content` | String | "Test" | No | Field of `watermark`. Video watermark content. If the watermark is enabled but no content is provided, the content is filled in automatically. |

{
  "imageUrl": "https://cdn.example.com/photo.png",
  "audioUrl": "https://cdn.example.com/sound.mp3",
  "duration": 10,
  "prompt": "夏日海滩场景 | Summer beach scene"
}
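The optional fields can be added to the same object. The sketch below builds the inner `param` object with a few client-side checks. The checks are assumptions drawn from this page (the 3-minute maximum from the feature introduction, the two documented `resizeMode` values), not behavior the server is known to enforce, and the class name is hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional

MAX_DURATION_SECONDS = 180  # "up to 3-minute" videos, per the feature list
RESIZE_MODES = {"adaptive", "fixedMinSide"}


@dataclass
class SynthesisParam:
    image_url: str
    audio_url: str
    duration: float
    resize_mode: Optional[str] = None
    prompt: Optional[str] = None
    watermark_show: Optional[bool] = None
    watermark_content: Optional[str] = None

    def to_dict(self) -> dict:
        """Validate locally and return the object using the API field names."""
        if not self.image_url or not self.audio_url:
            raise ValueError("imageUrl and audioUrl are required")
        if not 0 < self.duration <= MAX_DURATION_SECONDS:
            raise ValueError("duration must be in (0, 180] seconds")
        out = {
            "imageUrl": self.image_url,
            "audioUrl": self.audio_url,
            "duration": self.duration,
        }
        if self.resize_mode is not None:
            if self.resize_mode not in RESIZE_MODES:
                raise ValueError(f"unknown resizeMode: {self.resize_mode}")
            out["resizeMode"] = self.resize_mode
        if self.prompt is not None:
            out["prompt"] = self.prompt
        if self.watermark_show is not None:
            # `show` is required whenever a watermark object is sent.
            watermark = {"show": self.watermark_show}
            if self.watermark_content is not None:
                watermark["content"] = self.watermark_content
            out["watermark"] = watermark
        return out


p = SynthesisParam(
    "https://cdn.example.com/photo.png",
    "https://cdn.example.com/sound.mp3",
    10,
    resize_mode="fixedMinSide",
    watermark_show=True,
    watermark_content="Test",
)
print(json.dumps(p.to_dict(), ensure_ascii=False))
```

Serializing the result with `json.dumps` yields the string to pass as the request's `param` field.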

The above covers all video synthesis capabilities provided by the platform.

# Using Image Digital Human via the RuYing PaaS Console

(1) Log in to the console, navigate to the "Video Synthesis" - "Image Digital Human" page, and click "Image Digital Human Video Synthesis".

(2) Enter the task name, upload an audio clip and an image, select the desired image digital human duration, and click "Confirm" to create the image digital human task.

(3) On the current task page, you can view the task status. Once the task is completed, you can check the output video.

# Generating Audio from Text

Generating an image digital human video directly from text is not yet supported. Instead, first generate audio from the text, then create the image digital human from that audio. To generate the audio, navigate to the "Voice Synthesis" - "Voice Synthesis" page, select the speaker ID, enter the text, and click "Synthesize" to output the corresponding audio.
