# Platform Capabilities
The platform offers enterprise accounts a range of algorithmic capabilities, including video synthesis, character image model creation, TTS personal voice model generation, character image model updating, and video face swapping.
# Feature Introduction
# Video Synthesis
The 2D digital human video synthesis service lets you select a 2D digital human model, supply text or audio, synthesize a 2D virtual digital human video in MP4/WebM format, and download the result via the returned video link.
- Image Configuration
- Supports specifying the 2D digital human image for a synthesis job through parameters. The system provides several default 2D digital human image models to choose from; details are available after contacting the operations team to open an account.
- Voice Configuration
- The system supports two forms of voice configuration:
- Upload a recording file: supports online recording upload or selecting an existing audio file. The audio is denoised before the original voice is used in the final video content.
- Upload text + select a voice: supports specifying the speaker's voice and adjusting its speed, pitch, and volume through parameters for this video synthesis. The system provides several default TTS personal voice models to choose from; the selected voice reads the text content, and the resulting audio is used for video synthesis.
- Digital Human Driving
- Supports digital human expression and mouth shape driving.
- Video Encoding Information
- Encoding format: H264
- Frame rate: 25FPS
- Video Format
- Currently supports MP4/WebM formats. The video length is determined by the content selected during video synthesis.
- Video Resolution
- Supports specifying the output video resolution when creating a video synthesis task. Recommended options: 480p, 720p, 1080p
- Subtitles
- Supports generating subtitle files that match the text or voice content entered by the user.
- Custom Foreground/Background/Title Text
- Supports specifying the video background image through a URL, with jpg and png formats supported.
- Supports specifying the video foreground image through a URL, with jpg and png formats supported.
- Supports specifying the font, font size, and position of the title text content in the video through parameters.
- Custom Human Beautification Effects
- Supports adjusting more than ten human beautification parameters, including whitening, skin smoothing, face shape, eye shape, hairline, cheekbones, nose, chin, mouth, philtrum, head shrinking, contrast, saturation, clarity, and sharpness. For usage rules, please refer to the Parameter Description (opens new window).
- Maximum Storage Time
- The PaaS platform stores generated content online for 7 days. Transfer the data promptly; it is no longer downloadable after 7 days.
# Video Synthesis Sequence Diagram
# TTS Personal Voice Model Generation
The TTS personal voice model generation service trains on user-uploaded recordings of a real person's voice to produce a digital human TTS voice model matching that person's voice. Please follow the SenseTime digital human voice copy collection production specifications during collection, which cover environmental requirements, equipment requirements, pronunciation requirements, authorization requirements, and reading scripts. For details, refer to: Collection Specifications (opens new window). The PaaS platform stores generated content online for 7 days; transfer it promptly, as it is no longer downloadable after 7 days.
# Character Image Model Generation
The character image model generation service trains on user-uploaded video of a real person to produce an AI-driven digital human character image model almost indistinguishable from that person. For faithful cloning of the character image, please follow the SenseTime digital human collection production specifications during filming, which cover the video and voice requirements for training and testing 2D digital humans. For details, refer to: Collection Specifications (opens new window). The PaaS platform stores generated content online for 7 days; transfer it promptly, as it is no longer downloadable after 7 days.
# Character Image Model Generation Sequence Diagram
# Character Image Model Update
The 2D digital human character image model update service can update the already completed character image models, supporting modifications to the digital human training action clips. The PaaS platform supports 7 days of online storage, which requires timely transfer, as content generated will not be downloadable after 7 days.
# Green Screen Segmentation Effect Preview
The platform supports previewing the green screen segmentation effect on images and videos, so you can confirm the segmentation parameters before submitting a character model generation task, or verify that the filming environment meets the shooting requirements before official shooting.
# Video Character Face Swap (Not Supported Yet)
The video character face swap task can process video character face swapping based on the user-uploaded video content and template images using algorithmic capabilities, and finally return the processed video file and thumbnail for the user to download. The PaaS platform supports 7 days of online storage, which requires timely transfer, as content generated will not be downloadable after 7 days.
# API Description
To call any of the platform's API services, send requests to the service entry point aigc.softsugar.com and include token information in the request header.
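The entry point and header requirement can be captured in a small helper. This is a minimal sketch; the exact token header field name ("Authorization" below) is an assumption, so confirm the required field name when your account is opened:

```python
# Minimal sketch of preparing requests to the platform entry point.
# NOTE: the token header field name is an assumption; confirm the
# actual field name with the operations team during account setup.
BASE_URL = "https://aigc.softsugar.com"

def build_headers(token: str) -> dict:
    """Headers every platform API call needs: JSON content type plus token."""
    return {
        "Content-Type": "application/json",
        "Authorization": token,  # assumed header name
    }
```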
# Creating a Video Synthesis Task
# Interface Description
Calls the algorithm capability based on the content uploaded by the user to perform video synthesis, ultimately returning an mp4/webm format video file for the user to download. The PaaS platform supports 7 days of online storage, which requires timely transfer, as content generated will not be downloadable after 7 days.
# Request Address
POST
/api/2dvh/v1/material/video/create
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
param | String | True | The video synthesis parameters for the task, passed as a JSON string (a JSON object serialized and escaped into a string). Please refer to the parameter description and json example (opens new window), example effect (opens new window). |
videoName | String | True | Video name |
thumbnailUrl | String | False | Thumbnail URL |
# Request Example
{
"videoName": "xxx",
"param": "{\"version\":\"0.0.4\",\"resolution\":[1080,1920],\"bit_rate\":16,\"frame_rate\":25,\"watermark\":{\"show\":true,\"content\":\"Example Video\"},\"digital_role\":{\"id\":3964,\"face_feature_id\":\"0401_chenying_s1\",\"name\":\"0401_chenying_s1\",\"url\":\"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/materials/77/0401_chenying_s1_20230427133135306.zip\",\"position\":{\"x\":0,\"y\":0},\"scale\":1.0},\"tts_config\":{\"id\":\"nina\",\"name\":\"Nina\",\"vendor_id\":3,\"language\":\"zh-CN\",\"pitch_offset\":0.0,\"speed_ratio\":1,\"volume\":100},\"tts_query\":{\"content\":\"The Silk Road is an ancient trade route connecting the East and the West. On this road, the East and the West promoted the continuous integration of different civilizations through trade and cultural exchanges. Historically, Zhang Qian's mission to the Western Regions opened the earliest Silk Road, since then, merchants on the Silk Road have crossed deserts and mountains for trade exchanges time and again. Chinese silk, porcelain, tea, as well as Indian Buddhism, Greek philosophy, and more were fully inherited and developed on this road.\",\"ssml\":false},\"backgrounds\":[{\"type\":0,\"name\":\"Background\",\"url\":\"http://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/test/background.png\",\"rect\":[0,0,1080,1920],\"cycle\":false,\"start\":0,\"duration\":-1}],\"foregrounds\":[{\"type\":0,\"name\":\"Foreground\",\"url\":\"http://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/test/frontgroud.png\",\"rect\":[0,1359,1092,561],\"cycle\":false,\"start\":0,\"duration\":-1}],\"foreground-texts\":[{\"text\":\"Introduction to the Silk Road\",\"font_size\":20,\"font_family\":\"Noto Sans S Chinese Black\",\"position\":{\"x\":100,\"y\":200},\"rgba\":[100,200,100,100]}]}"
}
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Exception |
message | String | True | Detailed information of the exception |
data | Integer | False | Task ID |
# Response Example
{
"code": 0,
"message": "success",
"data": 1
}
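A successful response carries the task ID in `data`; any non-zero `code` signals an exception described by `message`. A small sketch of handling it (the helper name is illustrative):

```python
import json

def parse_create_response(raw: str) -> int:
    """Return the task ID from a create-task response; raise on error codes.
    Per the response table: code 0 means success, anything else is an
    exception explained by the message field."""
    resp = json.loads(raw)
    if resp.get("code") != 0:
        raise RuntimeError(f"video synthesis task failed: {resp.get('message')}")
    return resp["data"]
```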
# JSON Parameter Description
Name | Type | Example Value | Required | Description |
---|---|---|---|---|
version | String | "0.0.17" | Yes | The latest version number of the video synthesis JSON configuration file |
video_format | String | "mp4" | No | Video output formats can be MP4, WEBM, or MOV. If this field is not specified, the default is MP4. Among these, WEBM and MOV formats support alpha transparency. |
resolution | Int Array | [1080,1920] | Yes | Video resolution. Choose one of the three vertical formats [480,854], [720,1280], [1080,1920]. Human models come in 2K (1080 * 1920) and 4K (2160 * 3840); different output resolutions require adjusting the digital human figure scale for best results, e.g. at [1080,1920] resolution a 4K digital human's scale parameter should be set to about 0.5 |
bit_rate | Float | 8 | No | Video bitrate (Mbps), maximum value 16, minimum value 1 |
frame_rate | Integer | 25 | Yes | Video frame rate, currently only supports 25fps |
watermark | Object | | No | Video watermark |
show | Boolean | True | Yes | Whether to display video watermark |
content | String | "Testing Test " | Yes | Video watermark content |
invisible-watermark | Object | | No | Invisible video watermark, only supports mp4 |
show | Boolean | True | Yes | Whether to enable invisible video watermark |
content | String | "1234567890123456 " | No | Invisible watermark text. Chinese characters are not allowed; only English letters and digits, 16 characters total. Shorter input is padded with '0'; longer input is truncated to the first 16 characters. If omitted, the watermark "AI Synthesis" is used. The watermark sits at the bottom right corner of the image and disappears after three seconds |
digital_role | Object | Content as below | ||
id | Integer | 1 | No | Digital human id |
face_feature_id | String | "1" | Yes | Digital human face feature id |
name | String | "Xiao Li" | No | Digital human name |
url | String | "https://xxx/role.zip" | Yes | Digital human figure zip package |
position | Object | Content as below | Yes | The starting pixel position of the digital human figure image, with the top left corner of the 1080*1920 resolution canvas as the origin, to the right as the x direction, and down as the y direction |
x | Integer | 0 | Yes | x-direction coordinate value |
y | Integer | 0 | Yes | y-direction coordinate value |
scale | Float | 1.0 | Yes | Digital human figure ratio |
rotation | Float | 0.0 | Yes | Rotation angle, value range [0.0,360.0], the opposite direction of the Y-axis of the canvas coordinate system is 0 degrees, the angle of the clockwise direction is the rotation angle, rotating with the center point of the image as the anchor |
volume | Integer | 0 | No | Digital human broadcast volume, range 0~100. Note: minimum version 0.0.13. |
z_position | Integer | 0 | Yes | Layer order; each z_position must be unique, and larger values are displayed further forward. Note: minimum version 0.0.6 (mandatory field). |
start_frame_index | Integer | 0 | No | Starting frame of the composite video, range [1, N]; values outside the range return an error. Note: minimum version 0.0.14. Not recommended for premium digital humans |
tts_config | Object | Content as below | Yes | TTS configuration. Either tts_query or audios must exist, when both tts_query and audios are present, tts_query takes precedence |
qid | String | 8wfZav:AEA_Z10Mqp9GCwDGMrz8xIzi3VScxNzUtLCh | No | Filling in this field will override the voiceid , language , and vendor_id fields. |
id | String | "zh-CN-XiaoxuanNeural" | Yes | Speaker id, same as voiceID |
name | String | "Xiaoxuan" | Yes | Speaker name |
vendor_id | Integer | 4 | Yes | Vendor id; must be consistent with the TTS voice model in use, otherwise errors result. May be omitted when qid is used. |
language | String | "zh-CN" | Yes | Language code |
pitch_offset | Float | 0.0 | Yes | Pitch, the higher the value, the sharper the sound, the lower the value, the deeper the sound, support range [-60, 60] |
speed_ratio | Float | 1 | Yes | Speech rate, the higher the value, the slower the speech rate, support range [0.5, 2] |
volume | Integer | 100 | Yes | Volume, the higher the value, the louder the sound, support range [1, 400] |
tts_query | Object | Content as below | No | TTS voice synthesis. Either tts_query or audios must exist, when both tts_query and audios are present, tts_query takes precedence |
content | String | "Dear audience, hello! I am very honored to be able to gather with everyone at this beautiful moment, welcome to watch today's program." | Yes | The text to synthesize; at least 10 characters. All speakers can synthesize English queries and queries in their own language; Cantonese, Shanghainese, and other Chinese dialect speakers can synthesize Chinese queries |
use_action | Boolean | false | No | Whether action editing is supported in the TTS text. To insert an action, place {action index:N} at the corresponding position in the text, e.g. {action index:0}; note the space between "action" and "index". Action numbers can be obtained from the digital human's result JSON. If the TTS text itself must output {action}, escape it as ^{action } so it is not parsed as an action. |
ssml | Boolean | false | No | Whether to use SSML; when enabled, the query may use USSML and SSML (USSML is recommended) |
audios | Object Array | Content as below | No | Audio drive. Either tts_query or audios must exist, when both tts_query and audios are present, tts_query takes precedence |
url | Object | {"url":"https://xxx/audio.mp3"} | Yes | Driving audio entry; audios is an array of such objects, supporting multiple mp3 format driving audio files |
subtitle | Object | Content as below | No | Subtitles |
url | String | "https://xxx/subtitle.srt" | Yes | Subtitle file. Versions 0.0.13 and earlier parse only this field |
urls | String Array | ["https://xxx/subtitle.srt","https://xxx/subtitle.srt"] | No | Versions 0.0.14 and later parse this field first, falling back to url if it is absent. Special case: if the version is 0.0.14 or later and audios contains multiple entries, the url field is still parsed and only one subtitle is displayed; this is expected behavior. |
scale | Float | 1.0 | Yes | Text scaling ratio, value range 0~+∞, default is 1, the original reference size is font_size. |
position | Object | Content as below | No | The starting position of the subtitles, with the top left corner of the 1080*1920 resolution canvas as the origin, right as the x direction and down as the y direction; the default position is below the video with the subtitles centered. Note: minimum version 0.0.13. |
x | Integer | 0 | No | x-direction coordinate value |
y | Integer | 0 | No | y-direction coordinate value |
rgba | Int Array | [100,100,100,100] | Yes | Subtitle color, entered in rgba format, value range 0~255 [alpha channel not supported] |
font_size | Integer | 20 | Yes | Subtitle font size setting |
font_family | String | "Noto Sans S Chinese Black" | Yes | Font name, supported fonts see JSON supported font list |
stroke_width | Float | 2 | No | Stroke width, range 0~+∞; default 0 means no stroke. Note: minimum version 0.0.10. |
stroke_rgba | Int Array | [100,100,100,100] | No | Subtitle stroke color, entered in rgba format, value range 0~255 [alpha channel not supported]. Note: minimum version 0.0.10. |
background_rgba | Int Array | [100,100,100,100] | Yes | Subtitle background (font base) color, value range 0~255. An alpha value of 0 means fully transparent. Note: minimum version 0.0.10. |
opacity | Float | 0.5 | No | Subtitle layer opacity, range 0~1; 0 is fully transparent, 1 is opaque. Note: minimum version 0.0.10. |
subtitle_max_len | Integer | 10 | No | Maximum subtitle split length; default 0 means no limit. When no limit is set, subtitles occupy at most 80% of the canvas width and wrap automatically beyond that. Note: minimum version 0.0.10. |
subtitle_cut_by_punc | Boolean | True | No | Whether to split subtitles at punctuation. Note: minimum version 0.0.10. |
rotation | Float | 0.0 | Yes | Rotation angle, range [0.0,360.0]; 0 degrees points opposite the canvas Y-axis, increasing clockwise, rotating around the image center. Note: minimum version 0.0.14. |
auto_font_size | Boolean | True | No | Defaults to True: the final display font size is computed by formula, so with the same font size setting the subtitle renders differently from foreground text and titles. When False, the subtitle uses the same font size rule as foreground text and titles. |
sub_to_canvas_width_ratio | Float | 1.0 | No | If not filled in, the default is 1.0. This field indicates the proportion of the subtitle's width to the canvas width, value range (0, 2], if the input parameter <=0 or >2, then the value is reset to 1.0 by default. If it cannot be displayed in a single line, it will wrap. |
backgrounds | Object Array | Content as below | No | Background |
type | Integer | 0 | Yes | 0: image, supports jpg and png; 1: video, supports mp4 with a frame rate of 25 or above and no resolution requirement; videos of other resolutions are scaled proportionally so the short side fills the canvas |
name | String | "Background" | Yes | Background name |
url | String | "https://xxx/bg.png" | Yes | Background file url, if no background image or video is set, Webm format shows black background; Mp4 format shows the background effect of the default frame |
rect | Int Array | [0,0,1080,1920] | Yes | [Not supported yet] Background starting position and size, referring to the 1080*1920 resolution canvas with the top left corner at (0,0); currently not customizable, by default the short side fills and the long side scales proportionally |
cycle | Boolean | false | Yes | For videos, false: play once, true: loop |
start | Integer | 0 | Yes | Background start time, in ms |
duration | Integer | -1 | Yes | Background duration, in ms, -1 is the default value, indicating it exists throughout the video |
volume | Integer | 0 | No | Background video volume; larger values are louder, range [0, 100], standard volume. Note: minimum version 0.0.13. |
background-musics | Object Array | Content as below | No | Background music |
url | String | "https://xxx/bgm.mp3" | Yes | Background music url |
volume | Integer | 100 | Yes | Volume, the larger the value, the louder the sound, support range [0, 100], standard volume of 100 |
duration | Integer | -1 | No | Duration, in milliseconds. -1 is the default value, indicating that it will persist for the entire duration of the video. Once the duration time is reached, it will stop/disappear regardless of whether it is set to loop or not. |
start | Integer | 0 | No | Start time, in milliseconds. 0 is the default value, indicating that the background music starts playing from the 0th millisecond of the video. |
cycle | Boolean | True | No | false: play once, true: loop |
foregrounds | Object Array | Content as below | No | |
type | Integer | 0 | Yes | 0: Image, supports jpg, png formats; 1: Video, supports mp4 format |
name | String | "Foreground" | Yes | |
url | String | "https://xxx/fg.png" | Yes | Foreground file url, images support png or jpg, videos support mp4 format |
rect | Int Array | [0,0,1080,1920] | Yes | Starting position and size, referring to the 1080*1920 resolution canvas |
rotation | Float | 0.0 | Yes | Rotation angle, value range [0.0,360.0], the opposite direction of the Y-axis of the canvas coordinate system is 0 degrees, the angle of the clockwise direction is the rotation angle, rotating with the center point of the image as the anchor |
cycle | Boolean | False | No | For videos, false: play once, true: loop, after the foreground video plays once, if it has not reached the specified duration, the foreground video stays on the last frame |
z_position | Integer | 2 | Yes | Layer order; each z_position must be unique, and larger values are displayed further forward. Note: minimum version 0.0.6 (mandatory field). |
start | Integer | 0 | Yes | Foreground start time, in ms |
duration | Integer | -1 | Yes | Foreground duration, in ms, -1 is the default value, indicating it exists throughout the video |
volume | Integer | 0 | No | Foreground video volume; larger values are louder, range [0, 100], standard volume. Note: minimum version 0.0.13. |
foreground-texts | Object Array | As described below | No | Foreground Texts |
text | String | "Foreground Text" | Yes | Foreground text content |
scale | Float | 1.0 | Yes | Text scaling ratio, range 0~+∞, default is 1, original reference size is font_size. |
duration | Integer | -1 | No | Duration, in milliseconds. -1 is the default value, indicating that it will persist for the entire duration of the video. Once the duration time is reached, it will stop/disappear regardless of whether it is set to loop or not. |
start | Integer | 0 | No | Start time, in milliseconds. 0 is the default value, indicating that the foreground text starts playing from the 0th millisecond of the video. |
position | Object | As described below | Yes | Foreground text start position, with the top left corner of the 1080*1920 resolution canvas as the origin, right as the x direction, down as the y direction |
x | Integer | 0 | Yes | x-direction coordinate value |
y | Integer | 0 | Yes | y-direction coordinate value |
rgba | Int Array | [100,100,100,100] | Yes | Foreground text color, entered in rgba format, range 0~255 [alpha channel not supported] |
font_size | Integer | 20 | Yes | Foreground text font size setting |
font_family | String | "Noto Sans S Chinese Black" | Yes | Font name, see json for supported fonts list |
stroke_width | Float | 2 | No | Stroke width, range 0~+∞; default 0 means no stroke |
stroke_rgba | Int Array | [100,100,100,100] | No | Foreground text stroke color, entered in rgba format, range 0~255 [alpha channel not supported] |
background_rgba | Int Array | [100,100,100,100] | Yes | Foreground text background (font bottom) color, range 0~255. Alpha channel of 0 means fully transparent. Note: Minimum version requirement: 0.0.10. |
opacity | Float | 0.5 | No | Foreground text layer transparency, range 0~1. 0 means fully transparent, 1 means opaque. Note: Minimum version requirement: 0.0.10. |
z_position | Integer | 2 | Yes | Layer order, each z_position must be unique, the larger the number, the more forward it is displayed. Note: Minimum version requirement: 0.0.8 (mandatory field). |
rotation | Float | 0.0 | Yes | Rotation angle, range [0.0,360.0]; 0 degrees points opposite the canvas Y-axis, increasing clockwise, rotating around the image center. Note: minimum version 0.0.14. |
title | Object Array | As described below | No | Title text, its layer is above digital humans, background, and foreground text. Note: Minimum version requirement: 0.0.10. |
text | String | "Title Text" | Yes | Title text content |
scale | Float | 1.0 | Yes | Text scaling ratio, range 0~+∞, default is 1, original reference size is font_size. |
position | Object | As described below | Yes | Title text start position, with the top left corner of the 1080*1920 resolution canvas as the origin, right as the x direction, down as the y direction |
x | Integer | 0 | Yes | x-direction coordinate value |
y | Integer | 0 | Yes | y-direction coordinate value |
rgba | Int Array | [100,100,100,100] | Yes | Title text color, entered in rgba format, range 0~255 [alpha channel not supported] |
font_size | Integer | 20 | Yes | Title text font size setting. Unit is px. |
font_family | String | "Noto Sans S Chinese Black" | Yes | Font name, see json for supported fonts list |
stroke_rgba | Int Array | [100,100,100,100] | No | Title text stroke color, entered in rgba format, range 0~255 [alpha channel not supported] |
stroke_width | Float | 2 | Yes | Stroke width, range 0~+∞; default 0 means no stroke |
background_rgba | Int Array | [100,100,100,100] | Yes | Title text background (font base) color, range 0~255. An alpha value of 0 means fully transparent [alpha channel not supported] |
opacity | Float | 0.5 | No | Title text layer transparency, range 0~1. 0 means fully transparent, 1 means opaque |
rotation | Float | 0.0 | Yes | Rotation angle, range [0.0,360.0]; 0 degrees points opposite the canvas Y-axis, increasing clockwise, rotating around the image center. Note: minimum version 0.0.14. |
effects | Object | As described below | No | |
version | String | "1.0" | Yes | Effects engine version |
beautify | Object | As described below | No | Beautification |
whitenStrength | Float | 0.3 | No | [0,1.0] Whitening, default value 0.30, 0.0 means no whitening |
whiten_mode | Integer | 0 | No | Whitening mode: 0 (pinkish white), 1 (natural white), 2 (natural white only in skin areas) |
reddenStrength | Float | 0.36 | No | [0,1.0] Rosiness, default value 0.36, 0.0 means no rosiness |
smoothStrength | Float | 0.74 | No | [0,1.0] Smoothing, default value 0.74, 0.0 means no smoothing |
smooth_mode | Integer | 0 | No | Smoothing mode: 0 (face area smoothing), 1 (whole picture smoothing), 2 (detailed face area smoothing) |
shrinkRatio | Float | 0.11 | No | [0,1.0] Slim face, default value 0.11, 0.0 means no slim face effect |
enlargeRatio | Float | 0.13 | No | [0,1.0] Big eyes, default value 0.13, 0.0 means no big eyes effect |
smallRatio | Float | 0.10 | No | [0,1.0] Small face, default value 0.10, 0.0 means no small face effect |
narrowFace | Float | 0.0 | No | [0,1.0] Narrow face, default value 0.0, 0.0 means no narrow face |
roundEyesRatio | Float | 0.0 | No | [0,1.0] Round eyes, default value 0.0, 0.0 means no round eyes |
thinFaceShapeRatio | Float | 0.0 | No | [0,1.0] Thin face shape, default value 0.0, 0.0 means no thin face shape effect |
chinLength | Float | 0.0 | No | [-1, 1] Chin length, default value 0.0, [-1, 0] for short chin, [0, 1] for long chin |
hairlineHeightRatio | Float | 0.0 | No | [-1, 1] Hairline, default value 0.0, [-1, 0] for low hairline, [0, 1] for high hairline |
appleMusle | Float | 0.0 | No | [0, 1.0] Apple muscle, default value 0.0, 0.0 means no apple muscle |
narrowNoseRatio | Float | 0.0 | No | [0, 1.0] Slim nose, slim nostrils, default value 0.0, 0.0 means no slim nose |
noseLengthRatio | Float | 0.0 | No | [-1, 1] Nose length, default value 0.0, [-1, 0] for short nose, [0, 1] for long nose |
profileRhinoplasty | Float | 0.0 | No | [0, 1.0] Side profile nose lift, default value 0.0, 0.0 means no side profile nose lift effect |
mouthSize | Float | 0.0 | No | [-1, 1] Mouth size, default value 0.0, [-1, 0] for enlarging mouth, [0, 1] for shrinking mouth |
philtrumLengthRatio | Float | 0.0 | No | [-1, 1] Philtrum length, default value 0.0, [-1, 0] for long philtrum, [0, 1] for short philtrum |
eyeDistanceRatio | Float | 0.0 | No | [-1, 1] Adjusting eye distance, default value 0.0, [-1, 0] for decreasing eye distance, [0, 1] for increasing eye distance |
eyeAngleRatio | Float | 0.0 | No | [-1, 1] Eye angle, default value 0.0, [-1, 0] for counterclockwise rotation of the left eye, [0, 1] for clockwise rotation of the left eye, the right eye rotates oppositely |
openCanthus | Float | 0.0 | No | [0, 1.0] Open canthus, default value 0.0, 0.0 means no open canthus |
shrinkJawbone | Float | 0.0 | No | [0, 1.0] Slim jawbone ratio, default value 0.0, 0.0 means no jawbone slimming |
shrinkRoundFace | Float | 0.0 | No | [0, 1.0] Slim round face, default value 0.0, 0.0 means no slim face |
shrinkLongFace | Float | 0.0 | No | [0, 1.0] Slim long face, default value 0.0, 0.0 means no slim face |
shrinkGoddessFace | Float | 0.0 | No | [0, 1.0] Goddess slim face, default value 0.0, 0.0 means no slim face |
shrinkNaturalFace | Float | 0.0 | No | [0, 1.0] Natural slim face, default value 0.0, 0.0 means no slim face |
shrinkWholeHead | Float | 0.0 | No | [0, 1.0] Overall small head scaling, default value 0.0, 0.0 means no overall small head scaling effect |
contrastStrength | Float | 0.05 | No | [0,1.0] Contrast, default value 0.05, 0.0 means no contrast adjustment |
saturationStrength | Float | 0.1 | No | [0,1.0] Saturation, default value 0.10, 0.0 means no saturation adjustment |
sharpen | Float | 0.0 | No | [0, 1.0] Sharpening, default value 0.0, 0.0 means no sharpening |
clear | Float | 0.0 | No | [0, 1.0] Clarity strength, default value 0.0, 0.0 means no clarity |
bokehStrength | Float | 0.0 | No | [0, 1.0] Background blur strength, default value 0.0, 0.0 means no background blur |
eyeHeight | Float | 0.0 | No | [-1, 1] Eye position ratio, default value 0.0, [-1, 0] for moving eyes down, [0, 1] for moving eyes up |
mouthCorner | Float | 0.0 | No | [0, 1.0] Mouth corner lifting ratio, default value 0.0, 0.0 means no mouth corner adjustment |
hairline | Float | 0.0 | No | [-1, 1] New hairline height ratio, default value 0.0, [-1, 0] for low hairline, [0, 1] for high hairline |
packages | Object Array | As described below | No | Makeup parameters |
url | String | "https://xxx/res.zip" | Yes | Makeup resource URL, please contact customer service for makeup resource pack |
strength | Float | 0.3 | Yes | Makeup strength |
filter | Object | As described below | No | Filter parameters |
onlyFigure | Boolean | false | Yes | Whether the filter effect is only applied to the digital human, true for digital human only filter, false for global filter |
url | String | "https://xxx/res.zip" | Yes | Filter resource URL; please contact customer service for the filter resource pack |
strength | Float | 0.3 | Yes | Filter strength |
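Two of the rules in the table above are easy to get wrong in client code: the invisible watermark text must be exactly 16 English letters or digits (padded with '0' or truncated), and every layer's z_position must be unique. A minimal sketch of both checks (helper names are illustrative, not part of the API):

```python
def pad_invisible_watermark(text: str) -> str:
    """Per the invisible-watermark rule: English letters and digits only,
    exactly 16 characters; shorter input is right-padded with '0',
    longer input is truncated to the first 16 characters."""
    cleaned = "".join(ch for ch in text if ch.isascii() and ch.isalnum())
    return (cleaned + "0" * 16)[:16]

def z_positions_unique(param: dict) -> bool:
    """z_position must not repeat across digital_role, foregrounds, and
    foreground-texts; larger values render further forward."""
    zs = [param.get("digital_role", {}).get("z_position")]
    for layer in param.get("foregrounds", []) + param.get("foreground-texts", []):
        zs.append(layer.get("z_position"))
    zs = [z for z in zs if z is not None]
    return len(zs) == len(set(zs))
```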
# Fonts Supported in the JSON Configuration
Language | Font Name |
---|---|
Chinese | Noto Sans S Chinese Black |
Chinese | Noto Sans S Chinese Bold |
Chinese | Noto Sans S Chinese DemiLight |
Chinese | Noto Sans S Chinese Light |
Chinese | Noto Sans S Chinese Medium |
Chinese | Noto Sans S Chinese Regular |
Chinese | Noto Sans S Chinese Thin |
Chinese | 仓耳渔阳体 W03 |
Chinese | 站酷酷黑 |
Chinese | 站酷快乐体2016修订版 |
Chinese | 站酷庆科黄油体 |
Chinese | 站酷文艺体 |
Chinese | 站酷小薇LOGO体 |
Chinese | 得意黑 |
Chinese | 钉钉进步体 |
Chinese | 阿里妈妈东方大楷 |
Chinese | 阿里妈妈数黑体 |
Chinese | 字魂扁桃体 |
Chinese | 包图小白体 |
Chinese | 庞门正道粗书体 |
Chinese | 杨任东竹石体-Bold |
Chinese | 优设标题黑 |
Chinese | Gen Jyuu Gothic Normal |
Chinese | 字制区喜脉体 |
Chinese | 文道潮黑 |
Chinese | Alibaba-PuHuiTi-Bold |
Chinese | Alibaba-PuHuiTi-Heavy |
Chinese | Alibaba-PuHuiTi-Light |
Chinese | Alibaba-PuHuiTi-Medium |
Chinese | Alibaba-PuHuiTi-Regular |
Arabic | mastollehregular-2oaxk |
Korean | HANDotumLVT |
Korean | HANDotumLVT-bold |
Japanese | SourceHanSansJP-Bold |
Japanese | SourceHanSansJP-ExtraLight |
Japanese | SourceHanSansJP-Heavy |
Japanese | SourceHanSansJP-Light |
Japanese | SourceHanSansJP-Medium |
Japanese | SourceHanSansJP-Normal |
Japanese | SourceHanSansJP-Regular |
# JSON Example
{
"version": "0.0.13",
"video_format": "MP4",
"resolution": [1080, 1920],
"bit_rate": 8,
"frame_rate": 25,
"watermark": {
"show": true,
"content": "内部测试"
},
"digital_role": {
"id": 4051,
"face_feature_id": "0325_nina_s3_beauty",
"name": "Nina",
"url": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/materials/77/0325_nina_s3_beauty_20230523213912566.zip",
"position": {
"x": 0,
"y": 0
},
"scale": 1.0,
"z_position": 1,
"rotation": 0.0
},
"tts_config": {
"id": "xiaoyue",
"name": "晓月",
"vendor_id": 3,
"language": "zh-CN",
"pitch_offset": 0.0,
"speed_ratio": 1,
"volume": 100
},
"tts_query": {
"content": "您好,尊贵的客户",
"ssml": false
},
"audios": [{
"url": "https://dhpoc.softsugar.com/adapter/static/9b158cc9-8e42-4d09-b928-49dd9941d922.mp3"
}, {
"url": "https://dhpoc.softsugar.com/adapter/static/9b158cc9-8e42-4d09-b928-49dd9941d922.mp3"
}],
"subtitle": {
"url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-srt/823v6j88s1k7aobpe7wmqm83q_de347214-96f2-4246-b283-17f40fe6abba.srt",
"position": {
"x": 100,
"y": 300
},
"rgba": [100, 200, 100, 100],
"font_size": 20,
"stroke_width": 5.0,
"stroke_rgba": [255, 0, 0, 0],
"opacity": 0.5,
"background_rgba": [0, 255, 0, 200],
"subtitle_max_len": 8,
"subtitle_cut_by_punc": "True",
"font_family": "Noto Sans S Chinese Black"
},
"backgrounds": [{
"type": 0,
"name": "背景",
"url": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/test/background.png",
"rect": [0, 0, 1080, 1920],
"cycle": false,
"start": 0,
"duration": -1
}],
"background-musics": [{
"url": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/mayahui/%E7%BE%A4%E6%98%9F%20-%20%E5%96%9C%E6%B4%8B%E6%B4%8B.mp3",
"volume": 100,
"cycle": false
}],
"foregrounds": [{
"type": 0,
"name": "前景",
"url": "http://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/test/frontgroud.png",
"rect": [0, 0, 1080, 1920],
"rotation": 0.0,
"z_position": 0,
"cycle": false,
"start": 0,
"duration": -1
}],
"foreground-texts": [{
"text": "前景",
"font_size": 20,
"font_family": "Noto Sans S Chinese Black",
"z_position": 10,
"stroke_width": 5.0,
"stroke_rgba": [255, 0, 0, 0],
"opacity": 0.5,
"position": {
"x": 0,
"y": 0
},
"background_rgba": [0, 255, 0, 200],
"rgba": [100, 200, 100, 100]
}],
"title": {
"text": "这是标题",
"rgba": [100, 255, 255, 255],
"position": {
"x": 540,
"y": 200
},
"font_size": 50,
"font_family": "Noto Sans S Chinese Black",
"stroke_width": 5.0,
"stroke_rgba": [255, 0, 0, 0],
"scale": 1.0,
"opacity": 0.5,
"background_rgba": [0, 255, 0, 200]
},
"effects": {
"version": "1.0",
"beautify": {
"whitenStrength": 0.30,
"whiten_mode": 0,
"reddenStrength": 0.36,
"smoothStrength": 0.74,
"smooth_mode": 0,
"shrinkRatio": 0.11,
"enlargeRatio": 0.13,
"smallRatio": 0.10,
"narrowFace": 0.0,
"roundEyesRatio": 0.0,
"thinFaceShapeRatio": 0.0,
"chinLength": 0.0,
"hairlineHeightRatio": 0.0,
"appleMusle": 0.0,
"narrowNoseRatio": 0.0,
"noseLengthRatio": 0.0,
"profileRhinoplasty": 0.0,
"mouthSize": 0.0,
"philtrumLengthRatio": 0.0,
"eyeDistanceRatio": 0.0,
"eyeAngleRatio": 0.0,
"openCanthus": 0.0,
"brightEyeStrength": 0.0,
"removeDarkCircleStrength": 0.0,
"removeNasolabialFoldsStrength": 0.0,
"whiteTeeth": 0.0,
"shrinkCheekbone": 0.0,
"thinnerHead": 0.0,
"openExternalCanthus": 0.0,
"shrinkJawbone": 0.0,
"shrinkRoundFace": 0.0,
"shrinkLongFace": 0.0,
"shrinkGoddessFace": 0.0,
"shrinkNaturalFace": 0.0,
"shrinkWholeHead": 0.0,
"contrastStrength": 0.05,
"saturationStrength": 0.10,
"sharpen": 0.0,
"clear": 0.0,
"eyeHeight": 0.0,
"mouthCorner": 0.05,
"hairline": 0.10,
"bokehStrength": 0.0
},
"packages": [{
"url": "https://xxx/xxx.zip",
"strength": 0.3
}, {
"url": "https://xxx/xxx.model",
"strength": 0.5
}],
"filter": {
"onlyFigure": false,
"url": "https://xxx/xxx.model",
"strength": 0.5
}
}
}
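Before submitting a configuration like the one above, it can help to validate the beautify strengths against the documented ranges (most fields are [0, 1.0]; eyeHeight and hairline are [-1, 1]). A minimal Python sketch, where the helper name and field subset are illustrative and not part of the API:

```python
import json

# Fields documented with a [-1, 1] range; all other strength fields are [0, 1].
SIGNED_FIELDS = {"eyeHeight", "hairline"}

def check_beautify(beautify: dict) -> list:
    """Return the names of beautify fields whose values fall outside the documented range."""
    problems = []
    for field, value in beautify.items():
        if not isinstance(value, (int, float)):
            continue
        lo = -1.0 if field in SIGNED_FIELDS else 0.0
        if not (lo <= value <= 1.0):
            problems.append(field)
    return problems

config = json.loads('{"beautify": {"sharpen": 0.2, "hairline": -0.3, "clear": 1.5}}')
print(check_beautify(config["beautify"]))  # ['clear'] -- 1.5 exceeds [0, 1.0]
```

Running such a check locally catches out-of-range values before the synthesis task is created and rejected.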
# Batch Video Synthesis Task Creation
# Interface Description
Based on content the user uploads and specifies, this interface invokes the algorithm to synthesize videos in batch and finally returns a list of MP4 video files for the user to download. The PaaS platform stores generated content online for 7 days; transfer the files promptly, as they will no longer be downloadable after 7 days.
# Request URL
POST
/api/2dvh/v1/material/video/batchCreate
# Request Header
Content-Type:
application/json
# Request Parameters
The request body is a JSON array. The fields of each object in the array are defined as follows:
Field | Type | Required | Description |
---|---|---|---|
param | String | True | Video generation parameters (the video-configuration JSON serialized and escaped as a string) |
videoRequestId | String | True | Video synthesis ID, needs to be unique |
videoName | String | True | Video name |
thumbnailUrl | String | False | Thumbnail URL |
# Request Example
[
{
"param": "video config",
"videoName": "name",
"videoRequestId": "aaa"
},
{
"param": "video config",
"videoName": "name",
"videoRequestId": "bbb"
}
]
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Exception |
message | String | True | Detailed information of exception |
data | Object | False | Data object, usually empty in case of exceptions |
- videoRequestId | String | True | Video synthesis ID, as provided in the request |
- taskId | Long | True | Task ID |
- description | String | True | Description of task dispatch result |
# Response Example
{
"code": 0,
"message": "success",
"data": [
{
"videoRequestId": "aaa",
"taskId": 26,
"description": "Queue waiting"
},
{
"videoRequestId": "bbb",
"taskId": 27,
"description": "Queue waiting"
}
]
}
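Because `param` must be the video-configuration JSON serialized into a string, it is easy to accidentally send a nested object instead. A minimal Python sketch of building the batch request body; the config content, names, and IDs are placeholders:

```python
import json

# A trimmed-down video configuration (see the full JSON example above).
video_config = {"version": "0.0.13", "video_format": "MP4", "resolution": [1080, 1920]}

# Each array element carries the config as an escaped JSON *string*, not a nested object.
batch = [
    {
        "param": json.dumps(video_config, ensure_ascii=False),
        "videoName": "demo-1",
        "videoRequestId": "req-001",
    },
]
body = json.dumps(batch, ensure_ascii=False)  # POST body for /material/video/batchCreate
print(isinstance(batch[0]["param"], str))  # True -- param is a string, as required
```

Serializing with `json.dumps` performs the escaping automatically, so the inner configuration round-trips cleanly when the platform parses it.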
# Creation of TTS Personal Voice Model Generation Task (QID)
# Interface Description
The TTS personal voice model generation (QID) service trains a digital human TTS voice model that matches the pronunciation of the voice material provider, based on voice material files collected or recorded from a real speaker and the voice-cloning consent files uploaded by the user. To ensure training quality, follow the SenseTime digital human voice cloning collection and production standards, which cover environment, equipment, pronunciation, authorization, and reading scripts; for details, see: Collection Standards (opens new window). The PaaS platform stores generated content online for 7 days; transfer the files promptly, as they will no longer be downloadable after 7 days.
# Request URL
POST
/api/2dvh/v1/material/voice/clone/qid/create
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
audioUrl | String | True | URL of the training audio file. Supported formats: wav, mp3, m4a, mp4, mov, aac |
audioLanguage | String | True | The primary language used in the audio file. zh-CN for Mandarin Chinese, en-US for American English. Following BCP 47 standard |
consent | Object | True | User consent information |
- audioUrl | String | True | URL of the consent audio file. The consent file should be recorded in the same environment and in the same language as the audio file. Chinese: "我(发音人姓名)确认我的声音将会被(公司名称)使用于创建合成版本语音。" English: "I [state your first and last name] am aware that recordings of my voice will be used by [state the name of the company] to create and use a synthetic version of my voice." Japanese: "私(姓名を記入)は自身の音声を(会社名を記入)が使用し、合成音声を作り使用されることに同意します。" Korean: "나는 [본인의 이름을 말씀하세요] 내 목소리의 녹음을 이용해 합성 버전을 만들어 사용된다는 것을 [회사 이름을 말씀하세요] 알고 있습니다." Supported formats: wav, mp3, m4a, mp4, mov, aac |
- speakerName | String | True | The name of the speaker in the consent audio file, must be consistent with the name of the speaker in the audio file. Length limit is no more than 64 characters |
- companyName | String | True | The company name used in the consent file, must be consistent with the company name in the audio file. Length limit is no more than 64 characters |
taskType | String | True | Training algorithm type. TTS3, TTS6, TTS7, TTS8, TTS101. TTS3 is the default. For more requirements, please consult technical support |
voice | Object | True | Information about the speaker |
- name | String | True | Name of the speaker. Length limit is no more than 64 characters |
- gender | Integer | True | Gender of the speaker (1: Male, 2: Female) |
musicSep | Boolean | False | Whether to remove background music from the audio (source separation) |
trainMode | String | False | Training mode; only effective for TTS3. common: regular training mode (default); backend_only: fast training mode, which significantly shortens training time at some cost to output quality |
# Request Example
{
"audioUrl": "http://oss.com/abc/object.mp3",
"audioLanguage": "zh-CN",
"consent": {
"audioUrl":"http://oss.com/abc/xx.mp3",
"speakerName": "xiaowang",
"companyName": "XXXX"
},
"taskType": "TTS3",
"voice": {
"name": "xiaotang0",
"gender": 2
},
"musicSep": false,
"trainMode": "common"
}
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Exception |
message | String | True | Detailed information of exception |
data | Object | False | Task ID |
# Response Example
{
"code": 0,
"message": "success",
"data": 11890
}
# TTS Voice Training Audio Time Requirements
Training Algorithm Type | Time Requirements |
---|---|
TTS3 | At least 5 minutes, better if more than 20 minutes |
TTS6 | 30-90 seconds |
TTS7 | 30-300 seconds |
TTS8 | 30-300 seconds |
TTS101 | At least 5 minutes, better if more than 20 minutes |
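The duration limits above can be encoded as a simple pre-flight check before uploading training audio. A minimal Python sketch; the table's "better if more than 20 minutes" guidance is advisory and is not enforced here:

```python
# Minimum/maximum training-audio durations in seconds, per the table above.
# None means no documented upper bound.
DURATION_LIMITS = {
    "TTS3": (300, None),
    "TTS6": (30, 90),
    "TTS7": (30, 300),
    "TTS8": (30, 300),
    "TTS101": (300, None),
}

def audio_duration_ok(task_type: str, seconds: float) -> bool:
    """Check a training audio's duration against the limits for the given taskType."""
    lo, hi = DURATION_LIMITS[task_type]
    return seconds >= lo and (hi is None or seconds <= hi)

print(audio_duration_ok("TTS6", 60))   # True: within the 30-90 s window
print(audio_duration_ok("TTS3", 120))  # False: under the 5-minute minimum
```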
# TTS Language Standards (BCP 47 Standard)
Code | Language (Region) |
---|---|
en-US | English (United States) |
zh-CN | Chinese (China) |
af-ZA | Afrikaans (South Africa) |
am-ET | Amharic (Ethiopia) |
ar-EG | Arabic (Egypt) |
ar-SA | Arabic (Saudi Arabia) |
az-AZ | Azerbaijani (Azerbaijan) |
bg-BG | Bulgarian (Bulgaria) |
bn-BD | Bengali (Bangladesh) |
bn-IN | Bengali (India) |
bs-BA | Bosnian (Bosnia and Herzegovina) |
ca-ES | Catalan (Spain) |
cs-CZ | Czech (Czech Republic) |
cy-GB | Welsh (United Kingdom) |
da-DK | Danish (Denmark) |
de-AT | German (Austria) |
de-CH | German (Switzerland) |
de-DE | German (Germany) |
el-GR | Greek (Greece) |
en-AU | English (Australia) |
en-CA | English (Canada) |
en-GB | English (United Kingdom) |
en-IE | English (Ireland) |
en-IN | English (India) |
es-ES | Spanish (Spain) |
es-MX | Spanish (Mexico) |
et-EE | Estonian (Estonia) |
eu-ES | Basque (Spain) |
fa-IR | Persian (Iran) |
fi-FI | Finnish (Finland) |
fil-PH | Filipino (Philippines) |
fr-BE | French (Belgium) |
fr-CA | French (Canada) |
fr-CH | French (Switzerland) |
fr-FR | French (France) |
ga-IE | Irish (Ireland) |
gl-ES | Galician (Spain) |
he-IL | Hebrew (Israel) |
hi-IN | Hindi (India) |
hr-HR | Croatian (Croatia) |
hu-HU | Hungarian (Hungary) |
hy-AM | Armenian (Armenia) |
id-ID | Indonesian (Indonesia) |
is-IS | Icelandic (Iceland) |
it-IT | Italian (Italy) |
ja-JP | Japanese (Japan) |
jv-ID | Javanese (Indonesia) |
ka-GE | Georgian (Georgia) |
kk-KZ | Kazakh (Kazakhstan) |
km-KH | Khmer (Cambodia) |
kn-IN | Kannada (India) |
ko-KR | Korean (South Korea) |
lo-LA | Lao (Laos) |
lt-LT | Lithuanian (Lithuania) |
lv-LV | Latvian (Latvia) |
mk-MK | Macedonian (North Macedonia) |
ml-IN | Malayalam (India) |
mn-MN | Mongolian (Mongolia) |
ms-MY | Malay (Malaysia) |
mt-MT | Maltese (Malta) |
my-MM | Burmese (Myanmar) |
nb-NO | Norwegian Bokmål (Norway) |
ne-NP | Nepali (Nepal) |
nl-BE | Dutch (Belgium) |
nl-NL | Dutch (Netherlands) |
pl-PL | Polish (Poland) |
ps-AF | Pashto (Afghanistan) |
pt-BR | Portuguese (Brazil) |
pt-PT | Portuguese (Portugal) |
ro-RO | Romanian (Romania) |
ru-RU | Russian (Russia) |
si-LK | Sinhala (Sri Lanka) |
sk-SK | Slovak (Slovakia) |
sl-SI | Slovenian (Slovenia) |
so-SO | Somali (Somalia) |
sq-AL | Albanian (Albania) |
sr-RS | Serbian (Serbia) |
su-ID | Sundanese (Indonesia) |
sv-SE | Swedish (Sweden) |
sw-KE | Swahili (Kenya) |
ta-IN | Tamil (India) |
te-IN | Telugu (India) |
th-TH | Thai (Thailand) |
tr-TR | Turkish (Turkey) |
uk-UA | Ukrainian (Ukraine) |
ur-PK | Urdu (Pakistan) |
uz-UZ | Uzbek (Uzbekistan) |
vi-VN | Vietnamese (Vietnam) |
zh-HK | Chinese (Hong Kong) |
zh-TW | Chinese (Taiwan) |
zu-ZA | Zulu (South Africa) |
# Create TTS Personal Voice Model Generation Task (Old Interface, Not Recommended)
# Interface Description
The TTS personal voice model generation service trains a digital human TTS voice model that matches the pronunciation of the voice material provider, based on real human voice material files uploaded by the user. To ensure training quality, the training audio must be at least 5 minutes long. Follow the SenseTime digital human voice cloning collection and production specification, which covers environment, equipment, pronunciation, authorization, and reading scripts; for details, see: Collection Specification (opens new window). The PaaS platform stores generated content online for 7 days; transfer the files promptly, as they will no longer be downloadable after 7 days.
# Request URL
POST
/api/2dvh/v1/material/voice/clone/create
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
url | String | True | Training audio file URL, duration not less than 5 minutes |
voice | Object | True | Voice parameters |
- name | String | True | Name of the speaker |
- gender | Integer | True | Gender of the speaker (1: Male, 2: Female) |
- language | String | True | Language of the speaker (currently only supports zh-CN: Mandarin Chinese) |
musicSep | Boolean | False | Whether to remove background music from the audio |
sampleAudioMsg | String | False | Sample audio content text. By default, no sample audio is generated. Not more than 500 characters. |
trainMode | String | False | Training mode. common: regular training mode (default); backend_only: fast training mode, which significantly shortens training time at some cost to output quality |
# Request Example
{
"url": "http://oss.com/abc/object.zip",
"voice": {
"name": "xiaotang0",
"gender": 2,
"language": "zh-CN"
},
"sampleAudioMsg": "I am SenseTime Digital Human!",
"musicSep": true,
"trainMode": "common"
}
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Error |
message | String | True | Error detailed information |
data | Object | False | Task id |
# Response Example
{
"code": 0,
"message": "success",
"data": 11890
}
# Create Character Image Model Generation Task
# Interface Description
Generate character image models from one or more user-uploaded videos and specified content by calling the algorithm capabilities. A single training session can produce one or multiple model files; the algorithm finally returns a compressed package and thumbnail files of the character image model for the user to download. For requirements on uploaded video content, refer to the Collection Specification (opens new window). The PaaS platform stores generated content online for 7 days; transfer the files promptly, as they will no longer be downloadable after 7 days. If the generated character model is unsatisfactory, refer to the document's case solutions for adjusting training parameters.
Supports ordinary digital human training and premium digital human training.
# Request Address
POST
/api/2dvh/v1/material/2davatar/model/multi/create
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
materialName | String | True | Name of the character model material, only one name is supported per training task |
videoUrl | String | True | Download address for the base video material; the base video must be longer than 6 minutes |
param | String | True | Valid param information must be provided to create a multi-video character model generation task. It contains the parameters below (the configuration JSON serialized and escaped as a string); see the parameter description and JSON example that follow |
# Parameter Description for param
Field | Type | Required | Description |
---|---|---|---|
personal | Object | True | Basic video parameters; can be overridden by auxiliary videos. If no auxiliary video is provided, the basic video parameters are used. |
- segmentStyle | Integer | True | Background segmentation method: 0: no segmentation, 1: green screen segmentation, 2: general segmentation, 3: SDK green screen segmentation post-processing (GPU post-processing during video synthesis) |
- removeGreenEdge | Boolean | False | Effective when segmentStyle=2, default is false, removes green edges around the character |
- greenParamsRefinethHBgr | Integer | False | Effective when segmentStyle=1 or 3, default is 160, range 0-255; refine alpha high threshold (for red-green-blue background), used to adjust the degree of background retention, the larger the value, the more the background is retained |
- greenParamsRefinethLBgr | Integer | False | Effective when segmentStyle=1 or 3, default is 40, range 0-255; refine alpha low threshold (for red-green-blue background), used to adjust the edge width of the human/body, the larger the value, the more is retained |
- greenParamsBlurKs | Integer | False | Effective when segmentStyle=1 or 3, default is 3, smoothness; blur coefficient for de-noising, greater or equal to 0, the larger the smoothness, the smoother, affects the edges. If black edges appear, increase this value, if internal erosion occurs, decrease this value |
- greenParamsColorbalance | Integer | False | Effective when segmentStyle=1 or 3, default is 100, green degree, range 0-100, the larger the value, the higher the green degree |
- greenParamsSpillByalpha | Double | False | Effective when segmentStyle=1 or 3, default is 0.5, color balance for removing green, range [-1.0 ~ 1.0], 0 ~ 1 reduces color cast, -1 ~ 0 enhances color, less than 0.5 may cause yellow color cast, greater than 0.5 may cause cyan-blue color cast. If using blue screen segmentation, the default value should be changed to 0.0 |
- greenParamsSamplePointBgr | int[] | False | Effective when segmentStyle=1 or 3, sample colors, composed of three values, each with a range of 0-255, e.g., [0, 255, 0]. If using blue screen segmentation, the default value should be changed to [255, 0, 0] |
- assetStart | Float | False | Video material clipping start time (seconds) (not valid for premium digital human) |
- assetEnd | Float | False | Video material clipping end time (seconds) (not valid for premium digital human) |
- assetScale | Float | False | Video material scale ratio (default 1.0) |
- actionChange | Object | False | Premium digital human parameters. This group takes effect when support=true, indicating the trained digital human is a premium digital human. actionChange and actionEdit are mutually exclusive; do not set support=true in both groups. |
- - support | Boolean | True | Whether premium digital human is supported, true for premium digital human. |
- - staticRangeStart | Float | True | Static material start time (seconds) (only supports premium digital human) |
- - staticRangeEnd | Float | True | Static material end time (seconds) (only supports premium digital human) |
- - dynamicRangeStart | Float | True | Dynamic material start time (seconds) (only supports premium digital human) |
- - dynamicRangeEnd | Float | True | Dynamic material end time (seconds) (only supports premium digital human) |
- - gap | Integer | False | Maximum interval frame number for clipping points (default 75) |
- actionEdit | Object | False | Action-editing digital human parameters. This group takes effect when support=true, indicating the trained digital human is an action-editing digital human. actionChange and actionEdit are mutually exclusive; do not set support=true in both groups. |
- - support | Boolean | True | Whether action-editing is supported, true for support. |
- - videoPath | String | True | Dynamic material file address |
- - gap | Integer | False | Maximum interval frame number for clipping points (default 25) |
- - actionList | Array | True | Action list |
- - - name | String | True | Action name |
- - - clipRangeStart | Float | True | Start time (seconds) |
- - - clipRangeEnd | Float | True | End time (seconds) |
- - - description | String | False | Text description of the action |
persistent | Object | False | Global parameters of the model. Cannot be overridden by auxiliary video parameters. |
- avatarType | Integer | False | Digital human type, default is 0. (0: standard digital human, 1: dynamic digital human, 2: action-editing digital human, 3: fast digital human) |
- videoCrfQuality | Integer | False | Video coding quality parameter crf, the smaller the parameter, the better the quality but the larger the file size, default is 23, allowing a range of 0-51, recommended 14-28 |
- stage1Config | Array | False | Character model mouth shape training configuration, default is 0 indicating original mouth shape model; 1 indicating universal mouth shape model, users can manually switch between the two mouth shape models based on actual effects |
- dev | Object | False | Video material model training configuration |
- - stage2 | Object | False | Video material model training configuration |
- - - config | Integer | False | Video material model training configuration, model size, default is 0 indicating 2k resolution model; 1 indicating 4k resolution model |
override | Array | False | Auxiliary video information. (not valid for premium digital human parameter group and action-editing digital human parameter group) |
- videoUrl | String | True | Auxiliary video URL. If auxiliary video is not configured, parameters in personal will be used |
- segmentStyle | Integer | False | Background segmentation method: 0: no segmentation, 1: green screen segmentation, 2: general segmentation, 3: SDK green screen segmentation post-processing (GPU post-processing during video synthesis) |
- removeGreenEdge | Boolean | False | Effective when segmentStyle=2, default is false, removes green edges around the character |
- greenParamsRefinethHBgr | Integer | False | Effective when segmentStyle=1 or 3, default is 160, range 0-255; refine alpha high threshold (for red-green-blue background), used to adjust the degree of background retention, the larger the value, the more the background is retained |
- greenParamsRefinethLBgr | Integer | False | Effective when segmentStyle=1 or 3, default is 40, range 0-255; refine alpha low threshold (for red-green-blue background), used to adjust the edge width of the human/body, the larger the value, the more is retained |
- greenParamsBlurKs | Integer | False | Effective when segmentStyle=1 or 3, default is 3, smoothness; blur coefficient for de-noising, greater or equal to 0, the larger the smoothness, the smoother, affects the edges. If black edges appear, increase this value, if internal erosion occurs, decrease this value |
- greenParamsColorbalance | Integer | False | Effective when segmentStyle=1 or 3, default is 100, green degree, range 0-100, the larger the value, the higher the green degree |
- greenParamsSpillByalpha | Double | False | Effective when segmentStyle=1 or 3, default is 0.5, color balance for removing green, range [-1.0 ~ 1.0], 0 ~ 1 reduces color cast, -1 ~ 0 enhances color, less than 0.5 may cause yellow color cast, greater than 0.5 may cause cyan-blue color cast. If using blue screen segmentation, the default value should be changed to 0.0 |
- greenParamsSamplePointBgr | int[] | False | Effective when segmentStyle=1 or 3, sample colors, composed of three values, each with a range of 0-255, e.g., [0, 255, 0]. If using blue screen segmentation, the default value should be changed to [255, 0, 0] |
- assetStart | Float | False | Video material clipping start time (seconds) |
- assetEnd | Float | False | Video material clipping end time (seconds) |
- assetScale | Float | False | Video material scale ratio (default 1.0) |
# Request Example
{
"materialName": "534",
"videoUrl": "https://xxx/materials/33/demo_20230228104258028_20230720185601860.mp4",
"param": "{\"personal\":{\"segmentStyle\":1,\"removeGreenEdge\":false,\"greenParamsRefinethHBgr\":180,\"greenParamsRefinethLBgr\":50,\"greenParamsBlurKs\":3,\"greenParamsColorbalance\":90,\"greenParamsSpillByalpha\":0.4,\"greenParamsSamplePointBgr\":[0,255,0],\"assetStart\":0.1,\"assetEnd\":0.6,\"assetScale\":1},\"persistent\":{\"videoCrfQuality\":23,\"stage1Config\":[0,1],\"dev\":{\"stage2\":{\"config\":1}}},\"override\":[{\"videoUrl\":\"https://aigc-video-saas.oss-cn-hangzhou.aliyuncs.com/AIGC/online/vendor/24/customization/1700120490581/package_1700120490581.mp4\",\"segmentStyle\":1,\"removeGreenEdge\":false,\"greenParamsRefinethHBgr\":180,\"greenParamsRefinethLBgr\":50,\"greenParamsBlurKs\":3,\"greenParamsColorbalance\":90,\"greenParamsSpillByalpha\":0.4,\"greenParamsSamplePointBgr\":[0,255,0],\"assetStart\":0.1,\"assetEnd\":0.6,\"assetScale\":1},{\"videoUrl\":\"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/demo.mp4\",\"segmentStyle\":1,\"removeGreenEdge\":false,\"greenParamsRefinethHBgr\":180,\"greenParamsRefinethLBgr\":50,\"greenParamsBlurKs\":3,\"greenParamsColorbalance\":90,\"greenParamsSpillByalpha\":0.4,\"greenParamsSamplePointBgr\":[0,255,0],\"assetStart\":0.1,\"assetEnd\":0.6,\"assetScale\":1}]}"
}
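Rather than hand-escaping the `param` string as in the example above, it is safer to build the structure as a dict and let a JSON serializer do the escaping. A minimal Python sketch; the URL and parameter values are placeholders:

```python
import json

# Build the structured param, then serialize it once; the platform expects
# "param" to be a JSON string embedded inside the outer JSON request body.
param = {
    "personal": {"segmentStyle": 1, "greenParamsSamplePointBgr": [0, 255, 0]},
    "persistent": {"videoCrfQuality": 23, "stage1Config": [0, 1]},
    "override": [],
}
request_body = {
    "materialName": "534",
    "videoUrl": "https://example.com/demo.mp4",  # placeholder URL
    "param": json.dumps(param),  # json.dumps performs the string escaping
}
payload = json.dumps(request_body)  # POST body for .../model/multi/create

# Round-trip check: the inner string decodes back to the original structure.
print(json.loads(json.loads(payload)["param"]) == param)  # True
```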
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Error |
message | String | True | Detailed error information |
data | Object | False | Task id |
# Response Example
{
"code": 0,
"message": "success",
"data": 1
}
# Create Character Image Model Generation Task (Old Interface)
# Interface Description
Note: This interface only supports ordinary digital human model generation tasks and will no longer receive updates. Use the newer Create Character Image Model Generation Task interface instead.
Generate character image models from user-uploaded specified content by calling the algorithm capabilities; the algorithm finally returns a compressed package and a thumbnail file of the character image model for the user to download. For requirements on uploaded content, refer to the Collection Standard (opens new window). The PaaS platform stores generated content online for 7 days; transfer the files promptly, as they will no longer be downloadable after 7 days.
# Request URL
POST
/api/2dvh/v1/material/2davatar/model/create
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
materialName | String | True | Name of the character model material |
videoUrl | String | True | Video material download address |
segmentStyle | Integer | True | Background segmentation method: 0: No segmentation, 1: Green screen segmentation, 2: Normal segmentation, 3: Green screen segmentation post-processing with SDK (GPU post-processing during video synthesis) |
removeGreenEdge | Boolean | False | Effective when segmentStyle=2, default is false, function to remove green edges around the character |
greenParamsRefinethHBgr | Integer | False | Effective when segmentStyle=1 or 3, default 160, range 0-255; refine alpha high threshold (for red/green/blue background), used to adjust the degree of background retention, the larger the value, the greater the degree of background retention |
greenParamsRefinethLBgr | Integer | False | Effective when segmentStyle=1 or 3, default 40, range 0-255; refine alpha low threshold (for red/green/blue background), used to adjust the width of the edge retention of the body/object, the larger the value, the more retention |
greenParamsBlurKs | Integer | False | Effective when segmentStyle=1 or 3, default 3, smoothness; blur coefficient for noise reduction, greater than or equal to 0, the greater the smoothness, the smoother it is, affecting edges, if black edges or color aberration appear on the edges, this value can be increased, if internal erosion appears on the edges, this value can be appropriately reduced |
greenParamsColorbalance | Integer | False | Effective when segmentStyle=1 or 3, default 100, degree of green removal, range 0-100, the larger the value, the higher the degree of green removal |
greenParamsSpillByalpha | Double | False | Effective when segmentStyle=1 or 3, default 0.5, green color balance adjustment, range [-1.0 ~ 1.0], 0 ~ 1 reduces color bias, -1 ~ 0 enhances color; below 0.5 the image tends toward yellow, above 0.5 toward cyan/blue. If using blue screen segmentation, change the default value to 0.0 |
greenParamsSamplePointBgr | int[] | False | Effective when segmentStyle=1 or 3, sampling color, consisting of three values, each ranging from 0-255, for example, [0, 255, 0], if using blue screen segmentation, the default value needs to be changed to [255, 0, 0] |
videoCrfQuality | Integer | False | Video encoding quality parameter crf, the smaller the parameter, the better the quality but the larger the file, default 23, allowable range 0-51, recommended 14-28 |
assetStart | Float | False | Start time for trimming video material (seconds) |
assetEnd | Float | False | End time for trimming video material (seconds) |
assetScale | Float | False | Video material scaling ratio (default 1.0) |
devStage2Config | Integer | False | Video material model training configuration, model size, default is 0, indicating 2k precision model; 1 indicates 4k precision model |
stage1Template | Integer | False | Character model lip shape training configuration, default is 0 indicating the generation of the original lip shape model; 1 indicates the generation of the universal lip shape model, users can choose to manually switch between the two lip shape models according to the actual effect |
# Request Example
Example when segmentStyle=0
{
"materialName": "534",
"segmentStyle": 1,
"assetScale": 1,
"videoCrfQuality": 21,
"stage1Template": 0,
"devStage2Config": 0,
"videoUrl": "https://xxx/materials/33/demo_20230228104258028_20230720185601860.mp4"
}
Example when segmentStyle=1
{
"materialName": "534",
"segmentStyle": 1,
"assetScale": 1,
"devStage2Config": 0,
"greenParamsRefinethHBgr": 167,
"greenParamsRefinethLBgr": 17,
"greenParamsBlurKs": 7,
"greenParamsColorbalance": 97,
"greenParamsSpillByalpha": 0.3,
"greenParamsSamplePointBgr": [
7,
255,
7
],
"videoCrfQuality": 21,
"stage1Template": 0,
"videoUrl": "https://xxx/materials/33/demo_20230228104258028_20230720185601860.mp4"
}
Example when segmentStyle=2; only greenParamsSpillByalpha may be changed, all other parameters use their defaults
{
"materialName": "534",
"segmentStyle": 2,
"devStage2Config": 0,
"stage1Template": 0,
"removeGreenEdge" : true,
"assetScale": 1,
"greenParamsSpillByalpha": 0.3,
"videoCrfQuality": 21,
"videoUrl": "https://xxx/materials/33/demo_20230228104258028_20230720185601860.mp4"
}
Example when segmentStyle=3
{
"materialName": "534",
"segmentStyle": 3,
"assetScale": 1,
"devStage2Config": 0,
"stage1Template": 0,
"greenParamsRefinethHBgr": 167,
"greenParamsRefinethLBgr": 17,
"greenParamsBlurKs": 7,
"greenParamsColorbalance": 97,
"greenParamsSpillByalpha": 0.3,
"greenParamsSamplePointBgr": [
7,
255,
7
],
"videoCrfQuality": 21,
"videoUrl": "https://xxx/materials/33/demo_20230228104258028_20230720185601860.mp4"
}
# Parameter Explanation
In general, the default parameters suit most scenarios; adjust them only when scenario-specific issues arise. Below are parameter suggestions for typical scenarios.
- General Scenario Parameters (default)
The default values provided above suit most scenarios.
- Adjusting parameters for unclear digital human images
Method 1: Lower the video encoding quality parameter (videoCrfQuality). Setting the value to 14 brings the digital human material's clarity in line with the original material; this may slightly increase the material size.
Method 2: Add a sharpening value in the video synthesis or live creation request; refer to the sharpen value under the beautify object in the JSON definition.
Method 3: Choose 4K-version digital human training.
- Adjusting parameters when black edges and slight green hue appear around the character (frequent occurrence, particularly in scenes with white clothing)
Update the character model (rebuilding) with the following parameters, lowering the background retention degree and the character edge retention width. This method mainly suits green screen segmentation scenarios. Reference values:
{
"materialName": "534",
"segmentStyle": 1,
"removeGreenEdge": false,
"assetScale": 1,
"devStage2Config": 0,
"stage1Template": 0,
"greenParamsRefinethHBgr": 90,
"greenParamsRefinethLBgr": 10,
"greenParamsBlurKs": 3,
"greenParamsColorbalance": 100,
"greenParamsSpillByalpha": -0.3,
"greenParamsSamplePointBgr": [
0,
255,
0
],
"videoCrfQuality": 21,
"videoUrl": "https://xxx/materials/33/demo_20230228104258028_20230720185601860.mp4"
}
- Adjusting parameters for green edges or overall green hue around the character
Decrease the green balance removal parameter (greenParamsSpillByalpha); the smaller the value, the stronger the green removal, which may also distort colors (for example, lemon yellow may turn orange once its green component is removed). A minimum of -0.3 is recommended. This method enhances color and suits images with no yellow in them; it supports both green screen segmentation and normal segmentation. Reference values:
{
"materialName": "534",
"segmentStyle": 2,
"removeGreenEdge": true,
"devStage2Config": 0,
"stage1Template": 0,
"greenParamsSpillByalpha": -0.3,
"videoCrfQuality": 21,
"videoUrl": "https://xxx/materials/33/demo_20230228104258028_20230720185601860.mp4"
}
- Adjusting parameters when large cheek movements result in gray edges on the cheek or neck
This occurs because the initial material's segmentation result does not match the driven digital human's cheek edge. Choose green screen segmentation post-processing (segmentStyle=3) for training. This method mainly suits green-screen-segmented digital humans.
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Exception |
message | String | True | Exception detailed information |
data | Object | False | Task id |
# Response Example
{
"code": 0,
"message": "success",
"data": 1
}
# Create Character Image Model Update Task
# Interface Description
The motion clips displayed by the 2D digital human are extracted from the training video; by default, from the first second to the 3.5-minute mark. If you are not satisfied with the 2D digital human's motion clips, use this interface to modify them, adjusting the length and content of the displayed motions. When using the character image model update function, use the same background segmentation method as when the original model file was generated, to avoid abnormal effects caused by changing the segmentation method. The PaaS platform provides 7 days of online storage; transfer generated content promptly, as it will no longer be downloadable after 7 days.
# Request Address
POST
/api/2dvh/v1/material/2davatar/model/rebuilding/video
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
materialName | String | True | Name of the character model material |
videoUrl | String | True | Video material download address |
modelUrl | String | True | Download address of the original model file generated |
segmentStyle | Integer | True | Background segmentation method: 0: No segmentation, 1: Green screen segmentation, 2: Normal segmentation, 3: Green screen segmentation post-processing with SDK (GPU post-processing during video synthesis) |
removeGreenEdge | Boolean | False | Effective when segmentStyle=2, default is false, function to remove green edges around the character |
greenParamsRefinethHBgr | Integer | False | Effective when segmentStyle=1 or 3, default 160, range 0-255; refine alpha high threshold (for red/green/blue background), used to adjust the degree of background retention, the larger the value, the greater the degree of background retention |
greenParamsRefinethLBgr | Integer | False | Effective when segmentStyle=1 or 3, default 40, range 0-255; refine alpha low threshold (for red/green/blue background), used to adjust the width of the edge retention of the body/object, the larger the value, the more retention |
greenParamsBlurKs | Integer | False | Effective when segmentStyle=1 or 3, default 3. Smoothness: blur coefficient for noise reduction, greater than or equal to 0; the larger the value, the smoother the result, affecting edges. Increase this value if black edges or color aberration appear on the edges; reduce it appropriately if the edges appear eroded |
greenParamsColorbalance | Integer | False | Effective when segmentStyle=1 or 3, default 100, degree of green removal, range 0-100, the larger the value, the higher the degree of green removal |
greenParamsSpillByalpha | Double | False | Effective when segmentStyle=1 or 3, default 0.5. Green color balance adjustment, range [-1.0 ~ 1.0]: 0 ~ 1 reduces color bias, -1 ~ 0 enhances color. Values below 0.5 shift the image toward yellow; values above 0.5 shift it toward cyan/blue. If using blue screen segmentation, change the default to 0.0 |
greenParamsSamplePointBgr | int[] | False | Effective when segmentStyle=1 or 3. Sampling color, consisting of three values, each in the range 0-255, for example [0, 255, 0]. If using blue screen segmentation, change the default to [255, 0, 0] |
videoCrfQuality | Integer | False | Video encoding quality parameter (CRF); the smaller the value, the better the quality but the larger the file. Default 23, allowed range 0-51, recommended 14-28 |
assetStart | Float | False | Start time for trimming video material (seconds) |
assetEnd | Float | False | End time for trimming video material (seconds) |
assetScale | Float | False | Video material scaling ratio (default 1.0) |
actionChange | Object | False | Switch between dynamic and static material parameters |
- support | Boolean | True | Whether to support material action switching |
- staticRangeStart | Float | True | Start time of static material (seconds) |
- staticRangeEnd | Float | True | End time of static material (seconds) |
- dynamicRangeStart | Float | True | Start time of dynamic material (seconds) |
- dynamicRangeEnd | Float | True | End time of dynamic material (seconds) |
- gap | Integer | False | Maximum frame gap for cut-out points (default 75) |
actionEdit | Object | False | Parameters related to action lists, effective when support=true |
- support | Boolean | True | Whether to support action editing, true to support. |
- videoPath | String | True | URL of the dynamic material file |
- gap | Integer | False | Maximum interval of frames between cut points (default 25) |
- actionList | Array | True | List of actions |
- - name | String | True | Name of the action |
- - clipRangeStart | Float | True | Start time (in seconds) |
- - clipRangeEnd | Float | True | End time (in seconds) |
- - description | String | False | Text description of the action |
# Request Example
{
"materialName": "2D task A",
"videoUrl": "https://xxx.oss-cn-hangzhou.aliyuncs.com/xxx/audio1.mp4",
"modelUrl": "https://xxx.oss-cn-hangzhou.aliyuncs.com/xxx/model1.zip",
"assetStart": 0.0,
"assetEnd": 120.0,
"assetScale": 1.0,
"segmentStyle": 1,
"devStage2Config": 0,
"stage1Template": 0,
"greenParamsRefinethHBgr": 167,
"greenParamsRefinethLBgr": 17,
"greenParamsBlurKs": 7,
"segmentGreenUseGpu": false,
"greenParamsColorbalance": 97,
"greenParamsSpillByalpha": 0.3,
"greenParamsSamplePointBgr": [
7,
255,
7
],
"videoCrfQuality": 21
}
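The rebuilding request above can be submitted with any HTTP client. A sketch using only Python's standard library; BASE_URL is a placeholder for your actual PaaS host, and the helper names are illustrative:

```python
import json
import urllib.request

BASE_URL = "https://your-paas-host"  # placeholder; substitute your actual host


def build_rebuilding_body(material_name, video_url, model_url,
                          segment_style=1, **optional):
    """Assemble the JSON body for the model rebuilding task.

    Optional keyword arguments (e.g. videoCrfQuality=21) are passed through
    unchanged, matching the optional fields in the table above.
    """
    body = {
        "materialName": material_name,
        "videoUrl": video_url,
        "modelUrl": model_url,
        "segmentStyle": segment_style,
    }
    body.update(optional)
    return body


def submit_rebuilding(body):
    """POST the body to the rebuilding endpoint and return the parsed response."""
    req = urllib.request.Request(
        BASE_URL + "/api/2dvh/v1/material/2davatar/model/rebuilding/video",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Expected shape: {"code": 0, "message": "success", "data": <task id>}
        return json.load(resp)
```

A successful submission returns the task ID in `data`, which can then be polled via the task-information interface.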
# Parameter Description
In most cases, the default parameters are sufficient; adjust them only when scene-specific issues arise. Below are recommended parameters for typical scenarios.
- General Scene Parameters (Default)
Most cases can use the default values provided above.
- Adjustment parameters for a blurred digital human image
Method 1: Lower the video encoding quality parameter (videoCrfQuality). When set to 14, the clarity of the digital human matches the original human material; this may slightly increase the material size.
Method 2: Increase the sharpen value in the video synthesis or live broadcast creation request. Refer to the beautify object's sharpen value in the JSON definition.
Method 3: Choose the 4K version to train the digital human; note that the resolution cannot be modified after the update.
- Adjustment parameters for black edges with a slight green hue around the character (common, especially in scenes with white clothes)
Update the character model (rebuilding) with the parameters below, also lowering the background retention and edge retention width. This method mainly fits green screen segmentation. Suggested values below:
{
"materialName": "534",
"segmentStyle": 1,
"removeGreenEdge": false,
"assetScale": 1,
"greenParamsRefinethHBgr": 90,
"greenParamsRefinethLBgr": 10,
"greenParamsBlurKs": 3,
"greenParamsColorbalance": 100,
"greenParamsSpillByalpha": -0.3,
"greenParamsSamplePointBgr": [
0,
275,
0
],
"videoCrfQuality": 21,
"videoUrl": "https://xxx/materials/33/demo_20230228104258028_20230720185601860.mp4"
}
- Adjustment parameters for green edges or an overall green hue around the character
Decrease the green balance parameter (greenParamsSpillByalpha) for stronger green removal; over-reduction may cause color shift (for example, lemon yellow turning orange once its green component is removed). A minimum of -0.3 is recommended. This method enhances color and suits images with no yellow in them, supporting both green screen and normal segmentation. Suggested values below:
{
"materialName": "534",
"segmentStyle": 2,
"removeGreenEdge": true,
"greenParamsSpillByalpha": -0.3,
"videoCrfQuality": 21,
"videoUrl": "https://xxx/materials/33/demo_20230228104258028_20230720185601860.mp4"
}
- Adjustment parameters for gray edges on the cheek or neck during large cheek movements while talking
This usually occurs when the initial material's segmentation does not match the driven edge of the digital human's cheek. Use green screen segmentation post-processing (segmentStyle=3) for training; this suits green-screen-segmented digital humans.
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Exception |
message | String | True | Exception detailed information |
data | Object | False | Task id |
# Response Example
{
"code": 0,
"message": "success",
"data": 1
}
# Create Image Green Screen Segmentation Preview Task
# Interface Description
Image green screen segmentation effect preview
# Request Address
POST
/api/2dvh/v1/material/2davatar/model/green/segment/image/create
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
materialName | String | True | Name of the image green screen segmentation effect preview task |
url | String | True | Image material download address |
param | String | True | Segmentation parameters for the image green screen segmentation preview task, passed as a JSON-escaped string; see the parameter description and JSON example below |
# param Parameter Description
Field | Type | Required | Description |
---|---|---|---|
greenParamsRefinethHBgr | Integer | False | Default 160, range 70-220. Refine alpha high threshold (for red/green/blue background), used to adjust the degree of background retention, the larger the value, the greater the degree of background retention |
greenParamsRefinethLBgr | Integer | False | Default 40, range 10-80. Refine alpha low threshold (for red/green/blue background), used to adjust the width of the edge retention of the body/object, the larger the value, the more retention |
greenParamsBlurKs | Integer | False | Default 3, range 1-24. Smoothness: blur coefficient for noise reduction, greater than or equal to 0; the larger the value, the smoother the result, affecting edges. Increase this value if black edges or color aberration appear on the edges; reduce it appropriately if the edges appear eroded |
greenParamsColorbalance | Integer | False | Default 100, degree of green removal, range 0-100, the larger the value, the higher the degree of green removal |
greenParamsSpillByalpha | Double | False | Default 0.5. Green color balance adjustment, range [-1.0 ~ 1.0]: 0 ~ 1 reduces color bias, -1 ~ 0 enhances color. Values below 0.5 shift the image toward yellow; values above 0.5 shift it toward cyan/blue. If using blue screen segmentation, change the default to 0.0 |
greenParamsSamplePointBgr | int[] | False | Sampling color, consisting of three values, each in the range 0-255, for example [0, 255, 0]. If using blue screen segmentation, change the default to [255, 0, 0] |
greenParamsSampleBackground | object | False | Background parameters, please refer to the parameter description and json example below |
# Explanation of the greenParamsSampleBackground Parameter
Field | Type | Required | Description |
---|---|---|---|
color | int[] | False | Default [0,255,0], RGB color value, range 0-255 |
# Request Example
{
"materialName": "534",
"url": "https://xxx/materials/33/demo_20230228104258028_20230720185601860.mp4",
"param": "{\"green_params_refineth_h_bgr\":230,\"green_params_refineth_l_bgr\":70,\"green_params_blur_ks\":3,\"green_params_colorbalance\":100,\"green_params_spill_byalpha\":0,\"green_params_sample_point_bgr\":[0,255,0],\"green_params_sample_background\":{\"color\":[0,100,255]}}"
}
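Note that `param` is a nested JSON document serialized as a string. Rather than hand-escaping quotes as in the example above, it can be produced with `json.dumps`; a sketch (the image URL is a placeholder):

```python
import json

# Build the nested segmentation parameters as an ordinary dict.
green_params = {
    "green_params_refineth_h_bgr": 230,
    "green_params_refineth_l_bgr": 70,
    "green_params_blur_ks": 3,
    "green_params_colorbalance": 100,
    "green_params_spill_byalpha": 0,
    "green_params_sample_point_bgr": [0, 255, 0],
    "green_params_sample_background": {"color": [0, 100, 255]},
}

request_body = {
    "materialName": "534",
    "url": "https://xxx/materials/33/demo.png",  # placeholder image URL
    "param": json.dumps(green_params),  # nested JSON serialized to a string
}

# Serializing the outer body escapes the inner quotes automatically.
payload = json.dumps(request_body)
```

Building the string this way avoids escaping mistakes that would make `param` unparseable on the server side.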
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, others - Exception |
message | String | True | Detailed exception information |
data | Object | False | Task information |
- id | Long | True | Task ID |
- url | String | True | Image URL |
# Response Example
{
"code": 0,
"message": "success",
"data": {
"id": 1,
"url": "https://xxx/materials/33/preview.png"
}
}
# Video Green Screen Segmentation Effect Preview
# Interface Description
Video green screen segmentation effect preview
# Request URL
POST
/api/2dvh/v1/material/2davatar/model/green/segment/video/create
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
materialName | String | True | The name of the video green screen segmentation effect preview task |
url | String | True | Base video material download address; the base video must be longer than 6 minutes |
param | String | True | Segmentation parameters for the video green screen segmentation preview task, passed as a JSON-escaped string; see the parameter description and JSON example below |
# Param Parameter Description
Field | Type | Required | Description |
---|---|---|---|
greenParamsRefinethHBgr | Integer | False | Default 160, range 70-220. Refine alpha high threshold (for red, green, blue background), used to adjust the degree of background retention, the larger the value, the more background is retained |
greenParamsRefinethLBgr | Integer | False | Default 40, range 10-80. Refine alpha low threshold (for red, green, blue background), used to adjust the width of retention on the edges of humans/objects, the larger the value, the more is retained |
greenParamsBlurKs | Integer | False | Default 3, range 1-24. Smoothness: blur coefficient for noise reduction, greater than or equal to 0; the larger the value, the smoother the result, affecting edges. Increase this value if black edges or color aberration appear on the edges; reduce it appropriately if the edges appear eroded |
greenParamsColorbalance | Integer | False | Default 100, degree of green removal, range 0-100, the higher the value, the higher the degree of green removal |
greenParamsSpillByalpha | Double | False | Default 0.5. Green color balance adjustment, range [-1.0 ~ 1.0]: 0 ~ 1 reduces color bias, -1 ~ 0 enhances color. Values below 0.5 shift the image toward yellow; values above 0.5 shift it toward cyan. If using blue screen segmentation, change the default to 0.0 |
greenParamsSamplePointBgr | int[] | False | Sampling color, consisting of three values, each in the range 0-255, for example [0, 255, 0]. If using blue screen segmentation, change the default to [255, 0, 0] |
greenParamsSampleBackground | object | False | Background parameter, please refer to the parameter description and JSON example below |
# Explanation of the greenParamsSampleBackground Parameter
Field | Type | Required | Description |
---|---|---|---|
color | int[] | False | Default [0,255,0], RGB color value, range 0-255 |
# Request Example
{
"materialName": "534",
"url": "https://xxx/materials/33/demo_20230228104258028_20230720185601860.mp4",
"param": "{\"green_params_refineth_h_bgr\":230,\"green_params_refineth_l_bgr\":70,\"green_params_blur_ks\":3,\"green_params_colorbalance\":100,\"green_params_spill_byalpha\":0,\"green_params_sample_point_bgr\":[0,255,0],\"green_params_sample_background\":{\"color\":[0,100,255]}}"
}
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, others - Exception |
message | String | True | Detailed exception information |
data | Object | False | Task ID |
# Response Example
{
"code": 0,
"message": "success",
"data": 1
}
# Create Video Character Face Swap Task (Internal Test)
# Interface Description
Based on the video content and template image uploaded by the user, the algorithm performs a character face swap in the video and returns the processed video file and a thumbnail for the user to download.
# Request URL
POST
/api/2dvh/v1/material/face/swap/create
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
facePhotoUrl | String | True | Template face photo for face swap |
videoUrl | String | True | Original video file for face swap |
materialName | String | True | Name of the face swap task |
# Request Example
{
"facePhotoUrl": "facePhotoUrl",
"videoUrl": "videoUrl",
"materialName": "materialName"
}
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Error |
message | String | True | Detailed error message |
data | Object | False | Task ID |
# Response Example
{
"code": 0,
"message": "success",
"data": 1
}
# Get Information on a Specific Task
# Interface Description
Query the corresponding information and current status of the task based on the task ID provided by the user.
# Request URL
POST
/api/2dvh/v1/task/info
# Request Header
Content-Type:
application/json
# Request Parameters
The request body is a JSON object with the following fields:
Field | Type | Required | Description |
---|---|---|---|
ids | Long[] | True | List of task IDs |
# Request Example
{
"ids": [7,27]
}
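Task creation interfaces return a task ID, so a client typically polls this endpoint until the task reaches a terminal state. A polling sketch; BASE_URL is a placeholder, and treating status 5 as "completed" follows the return examples later in this document (adjust for your deployment):

```python
import json
import time
import urllib.request

BASE_URL = "https://your-paas-host"  # placeholder; substitute your actual host


def query_tasks(ids):
    """POST a list of task IDs to /api/2dvh/v1/task/info and return the response."""
    req = urllib.request.Request(
        BASE_URL + "/api/2dvh/v1/task/info",
        data=json.dumps({"ids": ids}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def is_finished(task):
    """Status 5 appears as the completed state in this document's return examples."""
    return task.get("status") == 5


def wait_for(task_id, interval_s=10, timeout_s=3600):
    """Poll until the task finishes, with a simple timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        data = query_tasks([task_id]).get("data") or []
        if isinstance(data, dict):  # some responses return a single object
            data = [data]
        for task in data:
            if is_finished(task):
                return task
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")
```

Long-running jobs such as 4K model training can take hours (see the start/end times in the return examples), so choose the timeout accordingly.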
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Error |
message | String | True | Detailed error message |
data | Object | False | Task information |
- id | Long | True | Task ID |
- materialId | Integer | True | Material ID |
- materialName | String | True | Material name |
- algoType | Integer | True | Task type |
- algoSubType1 | String | True | Character model specification (2K/4K); for video synthesis tasks, the specification of the character model used |
- algoSubType2 | String | False | Video synthesis: result format: webm/mp4 |
- algoSubType3 | String | False | Video synthesis: result frame rate |
- status | Integer | True | Task status |
- extendParam | String | False | Extended parameters |
- productParam | String | True | Task result JSON string, different formats for different tasks. |
- startTime | String | True | Algorithm start time (yyyy-MM-dd HH:mm:ss) |
- endTime | String | True | Algorithm end time (yyyy-MM-dd HH:mm:ss) |
# Response Example
{
"code": 0,
"message": "success",
"data": {
"id": 7,
"materialId": 27,
"materialName": "Sample Material",
"algoType": 14,
"algoSubType1": "2K",
"algoSubType2": "mp4",
"algoSubType3": "30fps",
"status": 2,
"extendParam": "",
"productParam": "{}",
"startTime": "2023-01-01 00:00:00",
"endTime": "2023-01-01 00:10:00"
}
}
# Case 1: Video Synthesis
Field | Type | Required | Description |
---|---|---|---|
duration | Integer | True | Video synthesis duration (in milliseconds) |
lastFrameIndex | Integer | True | Video end frame |
algoSubType1 | String | True | Video synthesis: character model specification: 2K/4K |
algoSubType2 | String | True | Video synthesis: result format: webm/mp4 |
algoSubType3 | String | True | Video synthesis: result frame rate |
thumbPath | String | True | Thumbnail download URL (valid for 7 days) |
videoPath | String | True | Video download URL (valid for 7 days) |
# Case 1 Return Example:
{
"code": 0,
"message": "success",
"data": [
{
"id": 913318,
"materialId": 854513,
"materialName": "913288",
"productParam": "{\"duration\": 880, \"thumbPath\": \"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/116/xxxx/thumb.png\", \"videoPath\": \"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/116/xxxx/video.mp4\"}",
"extendParam": null,
"startTime": "2024-05-27 16:38:54",
"endTime": "2024-05-27 16:39:03",
"status": 5,
"message": "{\"time_info\": {\"parse_json\": {\"avg\": 15, \"end\": \"2024-05-27 16:38:52.913\", \"sum\": 15, \"start\": \"2024-05-27 16:38:52.897\"}, \"preprocess\": {\"avg\": 342, \"end\": \"2024-05-27 16:38:56.134\", \"sum\": 342, \"start\": \"2024-05-27 16:38:55.792\"}, \"postprocess\": {\"avg\": 380, \"end\": \"2024-05-27 16:39:02.351\", \"sum\": 380, \"start\": \"2024-05-27 16:39:01.971\"}, \"main_process\": {\"avg\": 5837, \"end\": \"2024-05-27 16:39:01.971\", \"sum\": 5837, \"start\": \"2024-05-27 16:38:56.134\"}, \"audio_process\": {\"avg\": 123.91666412353516, \"sum\": 2974}, \"video_process\": {\"avg\": 9.47826099395752, \"sum\": 218}, \"wait_srt_stream\": {\"avg\": 0, \"sum\": 0}, \"send_task_response\": {\"start\": \"2024-05-27 16:39:02.775\"}, \"receive_task_from_agent\": {\"start\": \"2024-05-27 16:38:52.897\"}, \"st_mobile_change_package\": {\"avg\": 1832, \"end\": \"2024-05-27 16:38:54.745\", \"sum\": 1832, \"start\": \"2024-05-27 16:38:52.913\"}}, \"video_info\": {\"fps\": 25, \"format\": \"mp4\", \"digital_type\": \"2K\", \"last_frame_index\": 22}}",
"algoType": 14,
"algoId": "8216eaea6xxxxxx2e0798d21",
"algoSubType1": "2K",
"algoSubType2": "mp4",
"algoSubType3": "25",
"isDelete": 0
}
]
}
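Because `productParam` is itself a JSON string inside the response, it must be decoded a second time to reach the download URLs. A sketch:

```python
import json


def extract_video_outputs(task):
    """Decode the nested productParam JSON and return (videoPath, thumbPath).

    Both URLs are only valid for 7 days, per the field descriptions above.
    """
    product = json.loads(task["productParam"])
    return product.get("videoPath"), product.get("thumbPath")
```

The same double-decoding applies to the other task types below whose results are carried in `productParam`.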
# Case 2: Character Model Generation
Field | Type | Required | Description |
---|---|---|---|
thumbPath | String | True | Download URL for the thumbnail of the character model generated from the base video (valid for 7 days) |
multi | Array | True | Model results |
- width | String | True | Width |
- height | String | True | Height |
- pkgPath | String | True | Download URL for the character model (valid for 7 days); not returned when generating models from multiple videos |
- thumbPath | String | True | Download URL for the thumbnail of the character model generated from the video (valid for 7 days) |
- faceFeatureId | String | True | Face feature ID |
- userJson | String | True | Training parameters |
- avatarResultJson | String | True | Training results |
# Case 2 Return Example:
{
"code": 0,
"message": "success",
"data": [
{
"id": 908438,
"materialId": 850297,
"materialName": "蘇xxx",
"productParam": "{\"multi\": [{\"common\": {\"pkgPath\": \"https://dwg-aigc-paas.oss-cn-hangzhou.x.com/x/116/xxxx/input_source/2/xx.zip\", \"userJson\": \"https://dwg-aigc-paas.oss-cnxxxx/input_source/2/xxx.json\", \"thumbPath\": \"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/116/xxx/input_source/2/xxx.png\", \"faceFeatureId\": \"xxxxx\"}, \"origin\": {\"pkgPath\": \"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/116/2bd5e869e94d4995967428fa7ad7cf49_s1/input_source/0/xxx.zip\", \"userJson\": \"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/116/xxx/input_source/0/xxx.json\", \"thumbPath\": \"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/116/xxx/input_source/0/xxx.png\", \"faceFeatureId\": \"2bd5e869e94dxxxxxa7ad7cf49_s1_0\"}, \"videoUrl\": \"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/xxxxB.mp4\"}], \"thumbPath\": \"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/116/xxx/input_source/0/xxx.png\"}",
"extendParam": null,
"startTime": "2024-05-22 18:14:21",
"endTime": "2024-05-22 21:57:19",
"status": 5,
"message": "{}",
"algoType": 12,
"algoId": "2bd5exxxxxa7ad7cf49_s1",
"algoSubType1": "4K",
"algoSubType2": "multi",
"algoSubType3": "normal",
"isDelete": 0
}
]
}
# Case 3: (Multiple Videos) Character Model Generation (/model/multi/create Interface Result)
Field | Type | Required | Description |
---|---|---|---|
thumbPath | String | True | Download URL for the thumbnail of the character model generated from the base video (valid for 7 days) |
multi | Array | False | Results of the character model for multiple videos, in array form |
- videoUrl | String | True | URL of the original video file |
- origin | Object | True | Original lip-sync character model object (when the stage1Template parameter is 0) |
- - thumbPath | String | True | Download URL for the thumbnail of the character model (valid for 7 days) |
- - pkgPath | String | True | Download URL for the character model (valid for 7 days) |
- - faceFeatureId | String | True | Face Feature Id |
- common | Object | True | Common lip-sync character model object (when stage1Template parameter is 1) |
- - thumbPath | String | True | Download URL for the thumbnail of the character model (valid for 7 days) |
- - pkgPath | String | True | Download URL for the character model (valid for 7 days) |
- - faceFeatureId | String | True | Face Feature Id |
# Case 3 productParam Return Example:
{
"multi": [{
"common": {
"pkgPath": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/xxx47_s1_input_source_2_result.zip",
"userJson": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/ss_user.json",
"thumbPath": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/xxlt.png",
"faceFeatureId": "8c19c600a75addd9e666eca06413f47_s1_1",
"avatarResultJson": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/xxlt.Json"
},
"origin": {
"pkgPath": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/aa_0_result.zip",
"userJson": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/bbuser.json",
"thumbPath": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/xx0_result.png",
"faceFeatureId": "8c19c600a75a4f323e666eca06413f47_s1_0",
"avatarResultJson": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/xxlt.Json"
},
"videoUrl": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/xx.mp4"
}],
"thumbPath": "https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/xx.png"
}
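Each entry in `multi` carries both an `origin` (original lip shape) and a `common` (universal lip shape) model, mirroring the stage1Template training options. A sketch that selects one model per source video; the helper name is illustrative:

```python
import json


def select_models(product_param, use_common=False):
    """Return (videoUrl, pkgPath) pairs, choosing the origin or common model.

    origin corresponds to stage1Template=0, common to stage1Template=1.
    """
    product = json.loads(product_param)
    key = "common" if use_common else "origin"
    return [
        (item["videoUrl"], item[key]["pkgPath"])
        for item in product.get("multi", [])
    ]
```

This lets a client switch lip shape models per video after training, as recommended in the stage1Template parameter description.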
# Case 4: Character Model Update
Field | Type | Required | Description |
---|---|---|---|
thumbPath | String | True | Download URL for the thumbnail of the character model (valid for 7 days) |
pkgPath | String | True | Download URL for the character model (valid for 7 days) |
modelInfo | String | True | Character model: Model specification: 2K/4K |
# Case 4 Return Example:
{
"code": 0,
"message": "success",
"data": [
{
"id": 890212,
"materialId": 833961,
"materialName": "KURUMI",
"productParam": "{\"pkgPath\": \"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/116/xxxx/xxx.zip\", \"userJson\": \"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/116/xxxx/xxxx.json\", \"modelInfo\": \"2K\", \"thumbPath\": \"https://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/download/116/xxx/xxxx.png\"}",
"extendParam": null,
"startTime": "2024-05-10 10:51:35",
"endTime": "2024-05-10 11:50:24",
"status": 5,
"message": "{}",
"algoType": 18,
"algoId": "cut_bf9c19046exxxxxb9af791587_s1",
"algoSubType1": "2K",
"algoSubType2": null,
"algoSubType3": "normal",
"isDelete": 0
}
]
}
# Case 5: TTS Voice Model Generation (Old, Not Recommended)
Field | Type | Required | Description |
---|---|---|---|
taskId | String | True | Corresponding task ID |
voice | Object | True | Voice information |
- id | String | True | Voice ID |
- name | String | True | Speaker's name |
- gender | Integer | True | Speaker's gender. 0 = Not known, 1 = Male, 2 = Female, 9 = Not applicable |
- language | String | True | Speaker's language. zh-CN for Mandarin Chinese, en-US for American English |
- vendor_id | Integer | True | Voice vendor ID |
taskStatus | Integer | True | Task status. 1 = Queuing, 2 = In progress, 3 = Cancelled, 5 = Completed, 9 = Exception |
msg | String | True | Task status description |
stage | String | True | Task sub-step. preprocess: Data preprocessing, label: Data labeling, training: Model training, deployment: Deployment phase |
stageStatus | Integer | True | Stage status. 1 = Queuing, 2 = In progress, 5 = Completed, 9 = Exception |
sampleAudioUrl | String | False | Sample audio URL (valid for 7 days) |
tenant | String | True | Tenant owning the task |
updatedTime | String | True | Task information update time in RFC3339 format. |
modelUrl | String | False | Model download URL upon successful task, internal network only. Not publicly available |
# Case 6: Video Character Face Swap
Field | Type | Required | Description |
---|---|---|---|
thumbPath | String | True | Thumbnail download URL (valid for 7 days) |
pkgPath | String | True | Character image download URL (valid for 7 days) |
# Case 7: TTS-Qid Voice Model Generation
Field | Type | Required | Description |
---|---|---|---|
taskId | String | True | Corresponding task ID |
voice | Object | True | Voice information |
- qid | String | True | Voice QID |
- name | String | True | Speaker's name |
- gender | Integer | True | Speaker's gender. 0 = Not known, 1 = Male, 2 = Female, 9 = Not applicable |
- languages | Array | True | List of languages supported by the speaker; returned only upon task completion. zh-CN for Mandarin Chinese, en-US for American English |
taskType | String | True | Voice training algorithm type |
taskStatus | Integer | True | Task status. 1 = Queuing, 2 = In progress, 3 = Cancelled, 5 = Completed, 9 = Exception |
msg | String | True | Task status description |
stage | String | True | Task sub-step. preprocess: Data preprocessing, label: Data labeling, training: Model training, deployment: Deployment phase |
stageStatus | Integer | True | Stage status. 1 = Queuing, 2 = In progress, 5 = Completed, 9 = Exception |
sampleAudioUrl | String | False | Sample audio URL (valid for 7 days) |
tenant | String | True | Tenant owning the task |
updatedTime | String | True | Task information update time in RFC3339 format. |
extendParam: character model parameter information
Field | Type | Required | Description |
---|---|---|---|
faceFeatureId | String | True | Face Feature Id |
# case7 Return Example:
{
"code": 0,
"message": "success",
"data": [
{
"id": 206216,
"materialId": 1990411,
"materialName": "TTS_yunynsent",
"productParam": "{\"msg\":\"task is finished\",\"stage\":\"deployment\",\"voice\":{\"qid\":\"VQ1fQv:AEAygt1ixxxxxxdRPNLE11kg1TLXWSzMxNExLTksK\",\"name\":\"TTS6_yunyxxxxxen_consent\",\"gender\":1,\"languages\":[\"en-US\",\"zh-CN\",\"af-ZA\",\"am-ET\",\"ar-EG\",\"ar-SA\",\"az-AZ\",\"bg-BG\",\"bn-BD\",\"bn-IN\",\"bs-BA\",\"ca-ES\",\"cs-CZ\",\"cy-GB\",\"da-DK\",\"de-AT\",\"de-CH\",\"de-DE\",\"el-GR\",\"en-AU\",\"en-CA\",\"en-GB\",\"en-IE\",\"en-IN\",\"es-ES\",\"es-MX\",\"et-EE\",\"eu-ES\",\"fa-IR\",\"fi-FI\",\"fil-PH\",\"fr-BE\",\"fr-CA\",\"fr-CH\",\"fr-FR\",\"ga-IE\",\"gl-ES\",\"he-IL\",\"hi-IN\",\"hr-HR\",\"hu-HU\",\"hy-AM\",\"id-ID\",\"is-IS\",\"it-IT\",\"ja-JP\",\"jv-ID\",\"ka-GE\",\"kk-KZ\",\"km-KH\",\"kn-IN\",\"ko-KR\",\"lo-LA\",\"lt-LT\",\"lv-LV\",\"mk-MK\",\"ml-IN\",\"mn-MN\",\"ms-MY\",\"mt-MT\",\"my-MM\",\"nb-NO\",\"ne-NP\",\"nl-BE\",\"nl-NL\",\"pl-PL\",\"ps-AF\",\"pt-BR\",\"pt-PT\",\"ro-RO\",\"ru-RU\",\"si-LK\",\"sk-SK\",\"sl-SI\",\"so-SO\",\"sq-AL\",\"sr-RS\",\"su-ID\",\"sv-SE\",\"sw-KE\",\"ta-IN\",\"te-IN\",\"th-TH\",\"tr-TR\",\"uk-UA\",\"ur-PK\",\"uz-UZ\",\"vi-VN\",\"zh-HK\",\"zh-TW\",\"zu-ZA\"]},\"taskId\":\"tts6-xxx-xxxx-xxx-xx-789308\",\"tenant\":\"0\",\"taskType\":\"TTS6\",\"taskStatus\":5,\"stageStatus\":5,\"updatedTime\":\"2024-05-29T09:41:51.373802578Z\",\"sampleAudioUrl\":\"\"}",
"extendParam": null,
"startTime": "2024-05-29 17:38:31",
"endTime": "2024-05-29 17:41:51",
"status": 5,
"message": "{\"tts resp msg\": \"task is finished\"}",
"algoType": 41,
"algoId": "f627-a980-78aba9c20308",
"algoSubType1": null,
"algoSubType2": null,
"algoSubType3": null,
"isDelete": 0
}
]
}
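Note that productParam in the case7 response is itself a JSON-encoded string, so it needs a second decode before the voice fields can be read. A minimal Python sketch (`record` stands in for one element of the `data` array above; the qid and name values are placeholders):

```python
import json

# One trimmed element of the `data` array; productParam is a JSON string.
record = {
    "productParam": json.dumps({
        "voice": {"qid": "VQ1fQv:xxxx", "name": "TTS6_demo", "gender": 1,
                  "languages": ["en-US", "zh-CN"]},
        "taskStatus": 5,
        "stage": "deployment",
    })
}

# Second decode: turn the embedded JSON string into a dict.
param = json.loads(record["productParam"])
voice = param["voice"]
print(voice["qid"], voice["languages"])
```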
# Interface Description
Queries all tasks of a given algorithm type under a user's account (identified by account ID), along with each task's current status. The task list supports pagination.
# Request URL
POST
/api/2dvh/v1/task/listByAccount
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
algoType | Integer | True | Task type (11: TTS voice model generation, 12: character image model generation, 14: video synthesis, 20: video character face swap, 18: character image model update, 25: voice conversion, 32: image green screen preview, 33: video green screen preview, 41: TTS V3 voice model generation) |
pageNo | Integer | False | Current page number (default 1) |
pageSize | Integer | False | Number of items per page (default 10) |
sortName | String | False | Sort field name |
sortValue | String | False | Sort order: asc, desc |
# Request Example
{
"algoType": 12,
"pageSize": 10,
"pageNo": 1
}
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Number | True | 0 - Success, others - Exception |
message | String | True | Detailed exception information |
data | Object | False | Data object, usually empty in case of exception |
- pagination | Pagination | True | Pagination information (refer to common data structure description) |
- result | Object | True | Task list (refer to the description below) |
Task List
Field | Type | Required | Description |
---|---|---|---|
id | Long | True | Task ID |
algoType | Integer | True | Task type (11: TTS voice model generation, 12: character image model generation, 14: video synthesis, 20: video character face swap, 18: character image model update, 25: voice conversion, 32: image green screen preview, 33: video green screen preview, 41: TTS V3 voice model generation) |
algoSubType1 | String | True | For character models: model specification (2K/4K); for video synthesis: specification of the character model used (2K/4K) |
algoSubType2 | String | False | Video synthesis: result format: webm/mp4 |
algoSubType3 | String | False | Video synthesis: result frame rate |
status | Integer | True | Task status. 0: not started, 1: Dispatcher queue waiting, 2: algorithm processing, 3: canceled, 5: completed, 9: exception |
productParam | String | True | Task result as a JSON string, including the video URL (videoPath), video duration (duration), and thumbnail URL (thumbPath) |
startTime | String | True | Algorithm start time (yyyy-MM-dd HH:mm:ss) |
endTime | String | True | Algorithm end time (yyyy-MM-dd HH:mm:ss) |
# Response Example
{
"code": 0,
"message": "success",
"data": {
"pagination": {
"pageNo": 1,
"numberPages": 1,
"numberRecords": 2,
"pageSize": 2,
"startIndex": 0
},
"result": [
{
"id": 27,
"algoType": 12,
"algoSubType1": "4K",
"algoSubType2": null,
"algoSubType3": null,
"productParam": "\"{\\\"duration\\\":1234,\\\"thumbPath\\\":\\\"https://oss-cn-hangzhou.aliyuncs.com/dwg-aigc-paas/materials/a8610d001aaa412ab2e0433fc848b48f/thumb.jpg\\\",\\\"videoPath\\\":\\\"https://oss-cn-hangzhou.aliyuncs.com/dwg-aigc-paas/materials/a8610d001aaa412ab2e0433fc848b48f/output.mp4\\\"}\"",
"startTime": "2023-02-17 16:53:26",
"endTime": "2023-02-18 10:03:21",
"status": 5
},
{
"id": 7,
"algoType": 12,
"algoSubType1": "4K",
"algoSubType2": null,
"algoSubType3": null,
"productParam": "{}",
"startTime": "2023-02-17 16:56:26",
"endTime": "2023-02-17 17:43:19",
"status": 9
}
]
}
}
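Note that productParam in the first result item above is double-encoded: a JSON string whose value is itself another JSON string. A minimal Python sketch of a tolerant decoder (the function name and example URL are illustrative, not part of the API):

```python
import json

def decode_product_param(raw):
    """productParam arrives as a JSON string; in some responses it is
    double-encoded, so keep decoding until a dict (or list) appears."""
    value = raw
    while isinstance(value, str):
        value = json.loads(value)
    return value

# Double-encoded form, mirroring the first result item above.
double = json.dumps(json.dumps(
    {"duration": 1234, "videoPath": "https://example/output.mp4"}))
info = decode_product_param(double)
print(info["duration"])  # 1234
```

The same helper also handles the single-encoded `"{}"` placeholder seen in the second (failed) result item.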
# Get Account Task Information Details
# Interface Description
Queries all tasks of a given algorithm type under a user's account, including each task's current status, original input, and result items. The task list supports pagination.
# Request URL
POST
/api/2dvh/v1/task/listWithQueue
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
userId | Long | False | User ID, default to the current logged-in account ID |
algoType | Integer | True | Task type (11: TTS voice model generation (old), 12: character image model generation, 14: video synthesis, 20: video character face swap, 18: character image model update, 25: voice conversion, 32: image green screen preview, 33: video green screen preview, 41: TTS Qid voice model generation) |
status | Integer | True | Task status (0: not started, 1: Dispatcher queue waiting, 2: algorithm processing, 3: canceled, 5: completed, 9: exception, -1: all) |
key | String | False | Task ID/role name query (exact match) |
pageNo | Integer | False | Current page number (default 1) |
pageSize | Integer | False | Number of items per page (default 10) |
sortName | String | False | Sort field name |
sortValue | String | False | Sort order: asc, desc |
# Request Example
{
"algoType": 12,
"pageSize": 10,
"pageNo": 1
}
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Number | True | 0 - Success, others - Exception |
message | String | True | Detailed exception information |
data | Object | False | Data object, usually empty in case of exception |
- pagination | Pagination | True | Pagination information (refer to common data structure description) |
- result | Object | True | Task list (refer to the description below) |
Task List
Field | Type | Required | Description |
---|---|---|---|
id | Long | True | Task ID |
algoType | Integer | True | Task type (11: TTS voice model generation, 12: character image model generation, 14: video synthesis, 20: video character face swap, 18: character image model update, 25: voice conversion, 32: image green screen preview, 33: video green screen preview, 41: TTS V3 voice model generation) |
materialId | Long | True | Model ID |
materialName | String | True | Material name |
queueInfo | String | False | Queue information |
status | Integer | True | Task status. 0: not started, 1: Dispatcher queue waiting, 2: algorithm processing, 3: canceled, 5: completed, 9: exception |
productParam | String | True | Task result as a JSON string, including the video URL (videoPath), video duration (duration), and thumbnail URL (thumbPath) |
extendParam | String | False | Extended parameters, including faceFeatureId for character image model generation |
algoSubType1 | String | True | For character models: model specification (2K/4K); for video synthesis: specification of the character model used (2K/4K) |
algoSubType2 | String | False | Video synthesis: result format: webm/mp4 |
algoSubType3 | String | False | Video synthesis: result frame rate |
taskInfo | String | True | Initial task parameters and original files |
algoId | String | True | Algorithm task ID |
message | String | False | Error information |
submitTime | String | True | Algorithm submission time (yyyy-MM-dd HH:mm:ss) |
startTime | String | False | Algorithm start time (yyyy-MM-dd HH:mm:ss) |
endTime | String | False | Algorithm end time (yyyy-MM-dd HH:mm:ss) |
owner | Long | True | Account owning the task |
ownerPhone | String | True | Account phone number |
# Response Example
{
"code": 0,
"message": "success",
"data": {
"pagination": {
"pageNo": 1,
"numberPages": 1,
"numberRecords": 2,
"pageSize": 2,
"startIndex": 0
},
"result": [
{
"id": 8833,
"materialId": 8122,
"materialName": "Mario_4_talk.mp4_sensetime-segment_type_green screen segmentation",
"productParam": "{\"pkgPath\": \"https://dwg-aigc-paas-test.oss-cn-hangzhou.aliyuncs.com/download/8/b6ecebc8233b47809dedd6731c052d15_s1/b6ecebc8233b47809dedd6731c052d15_s1_result.zip\", \"thumbPath\": \"https://dwg-aigc-paas-test.oss-cn-hangzhou.aliyuncs.com/download/8/b6ecebc8233b47809dedd6731c052d15_s1/b6ecebc8233b47809dedd6731c052d15_s1_result.png\", \"faceFeaturePath\": \"https://dwg-aigc-paas-test.oss-cn-hangzhou.aliyuncs.com/download/8/b6ecebc8233b47809dedd6731c052d15_s1/b6ecebc8233b47809dedd6731c052d15_s1_face_feature.zip\"}",
"extendParam": "{\"faceFeatureId\": \"b6ecebc8233b47809dedd6731c052d15_s1\"}",
"startTime": "2023-06-07 23:31:50",
"endTime": "2023-06-08 05:19:29",
"status": 5,
"message": "{}",
"algoType": 12,
"algoId": "b6ecebc8233b47809dedd6731c052d15_s1",
"algoSubType1": "4K",
"algoSubType2": null,
"algoSubType3": null,
"submitTime": "2023-06-07 17:34:40",
"ownerPhone": "18311096857",
"owner": 8,
"queueInfo": null,
"taskInfo": "{\"create2DAvatarModel\": {\"videoUrl\": \"https://ailab-storage-eus.oss-us-west-1.aliyuncs.com/31_trim_result/Mario_4_talk.mp4?OSSAccessKeyId=LTAI5tE2Hq2BAqr8EBzxmSrR&Expires=37686060051&Signature=C1L%2FxpHD%2FW155s%2BuhTocyVvsUfo%3D\", \"accountId\": 8, \"assetScale\": 1.0, \"existTaskId\": 0, \"firstCreate\": true, \"materialName\": \"Mario_4_talk.mp4_sensetime-segment_type_green screen segmentation\", \"segmentStyle\": 1}}"
},
{
"id": 9093,
"materialId": 8258,
"materialName": "wu0609_sensetime-segment_type_green screen segmentation",
"productParam": null,
"extendParam": null,
"startTime": "2023-06-09 10:56:38",
"endTime": null,
"status": 2,
"message": "{}",
"algoType": 12,
"algoId": "5f6006acb891496f93bfeeff601201fe_s1",
"algoSubType1": "4K",
"algoSubType2": null,
"algoSubType3": null,
"submitTime": "2023-06-09 10:56:36",
"ownerPhone": "18311096857",
"owner": 8,
"queueInfo": null,
"taskInfo": "{\"create2DAvatarModel\": {\"videoUrl\": \"http://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/wanxing_0606/zhuzong.mp4\", \"accountId\": 8, \"assetScale\": 1.0, \"existTaskId\": 0, \"firstCreate\": true, \"materialName\": \"wu0609_sensetime-segment_type_green screen segmentation\", \"segmentStyle\": 1}}"
},
{
"id": 8528,
"materialId": 7908,
"materialName": "Claire_3_talk.mp4_sensetime-segment_type_green screen segmentation",
"productParam": "{\"pkgPath\": \"https://dwg-aigc-paas-test.oss-cn-hangzhou.aliyuncs.com/download/8/1907b913f78845168529bad59f36a43f_s1/1907b913f78845168529bad59f36a43f_s1_result.zip\", \"thumbPath\": \"https://dwg-aigc-paas-test.oss-cn-hangzhou.aliyuncs.com/download/8/1907b913f78845168529bad59f36a43f_s1/1907b913f78845168529bad59f36a43f_s1_result.png\", \"faceFeaturePath\": \"https://dwg-aigc-paas-test.oss-cn-hangzhou.aliyuncs.com/download/8/1907b913f78845168529bad59f36a43f_s1/1907b913f78845168529bad59f36a43f_s1_face_feature.zip\"}",
"extendParam": "{\"faceFeatureId\": \"1907b913f78845168529bad59f36a43f_s1\"}",
"startTime": "2023-06-06 02:32:13",
"endTime": "2023-06-06 20:17:19",
"status": 9,
"message": "{\"errorMsg\": \"Algorithm heart beat is overtime!!!\"}",
"algoType": 12,
"algoId": "1907b913f78845168529bad59f36a43f_s1",
"algoSubType1": "4K",
"algoSubType2": null,
"algoSubType3": null,
"submitTime": "2023-06-05 20:41:43",
"ownerPhone": "18311096857",
"owner": 8,
"queueInfo": null,
"taskInfo": "{\"create2DAvatarModel\": {\"videoUrl\": \"https://ailab-storage-eus.oss-us-west-1.aliyuncs.com/online_videos/Claire_3_talk.mp4?OSSAccessKeyId=LTAI5tE2Hq2BAqr8EBzxmSrR&Expires=1689391554&Signature=pMSBmAlawZ7h2sxjUO8Dk%2B1dHRg%3D\", \"accountId\": 8, \"assetScale\": 1.0, \"existTaskId\": 0, \"firstCreate\": true, \"materialName\": \"Claire_3_talk.mp4_sensetime-segment_type_green screen segmentation\", \"segmentStyle\": 2}}"
},
{
"id": 9161,
"materialId": 8317,
"materialName": "Eddie_3_talk_trim_sensetime_0_绿幕分割",
"productParam": null,
"extendParam": null,
"startTime": null,
"endTime": null,
"status": 1,
"message": "{}",
"algoType": 18,
"algoId": "1667070933254279169",
"algoSubType1": "4K",
"algoSubType2": null,
"algoSubType3": null,
"submitTime": "2023-06-09 15:28:48",
"ownerPhone": "18311096857",
"owner": 8,
"queueInfo": "8/9",
"taskInfo": "{\"rebuild2DAvatarModelVideo\": {\"assetEnd\": 120.0, \"modelUrl\": \"https://dwg-aigc-paas-test.oss-cn-hangzhou.aliyuncs.com/download/8/ba80636d8a77423083af66174375a130_s1/ba80636d8a77423083af66174375a130_s1_result.zip\", \"videoUrl\": \"http://dwg-aigc-paas.oss-cn-hangzhou.aliyuncs.com/wanxing_0606/Eddie_3_talk_trim.mp4.mp4\", \"accountId\": 8, \"assetScale\": 1.0, \"assetStart\": 60.0, \"existTaskId\": 0, \"firstCreate\": true, \"materialName\": \"Eddie_3_talk_trim_sensetime_0_绿幕分割\", \"segmentStyle\": 1}}"
}
]
}
}
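extendParam and taskInfo in this response also arrive as JSON strings. A small Python sketch that extracts faceFeatureId and the original video URL (`item` is trimmed from the first result above; the video URL is a placeholder):

```python
import json

# One trimmed result item; both fields are JSON-encoded strings.
item = {
    "extendParam": "{\"faceFeatureId\": \"b6ecebc8233b47809dedd6731c052d15_s1\"}",
    "taskInfo": "{\"create2DAvatarModel\": {\"videoUrl\": \"https://example/video.mp4\", \"accountId\": 8}}",
}

# extendParam may be null for unfinished tasks, so guard before decoding.
extend = json.loads(item["extendParam"]) if item["extendParam"] else {}
task = json.loads(item["taskInfo"])

print(extend.get("faceFeatureId"))
print(task["create2DAvatarModel"]["videoUrl"])
```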
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
id | Long | True | Task ID |
# Request Example
http://xxx/api/2dvh/v1/task/cancel?id=1
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Error |
message | String | True | Error Detailed Information |
data | Object | False | Value is null |
# Response Example
{
"code": 0,
"message": "success",
"data": null
}
# Delete Task
# Interface Description
Allows users to delete tasks that are not currently in progress. After deletion, the task information is no longer retained.
# Request Address
DELETE
/api/2dvh/v1/task/del
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
id | Long | True | Task ID |
# Request Example
http://xxx/api/2dvh/v1/task/del/id
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Error |
message | String | True | Error Detailed Information |
data | Object | False | Value is null |
# Response Example
{
"code": 0,
"message": "success",
"data": null
}
# Restart Task
# Interface Description
Allows users to restart tasks that ended in an exception. The task ID remains unchanged after the restart.
# Request Address
GET
/api/2dvh/v1/task/restart
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
id | Long | True | Task ID |
# Request Example
http://xxx/api/2dvh/v1/task/restart?id=1
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Error |
message | String | True | Error Detailed Information |
data | Object | False | Task ID |
# Response Example
{
"code": 0,
"message": "success",
"data": 2
}
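The cancel, delete, and restart endpoints above carry eligibility rules implied by their descriptions: delete applies only to tasks that are not in progress, and restart only to exception tasks. A hypothetical client-side guard, sketched in Python using the status codes documented earlier:

```python
# Status codes from the task tables: 1 = queue waiting, 2 = processing,
# 3 = canceled, 5 = completed, 9 = exception.
ONGOING = {1, 2}
EXCEPTION = 9

def can_delete(status: int) -> bool:
    """Deletion is allowed only for tasks that are not in progress."""
    return status not in ONGOING

def can_restart(status: int) -> bool:
    """Restart is described only for tasks that ended in an exception."""
    return status == EXCEPTION
```

Checking these conditions before calling the endpoints avoids predictable error responses, though the server remains the final authority.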
# Query Task Phase Time Consumption Information
# Interface Description
Queries the time consumed by each phase of a task. Currently only video synthesis tasks are supported.
# Request Address
GET
/api/2dvh/v1/task/phase/cost
# Request Header
Content-Type:
application/json
# Request Parameters
Field | Type | Required | Description |
---|---|---|---|
id | Long | True | Task ID |
# Request Example
https://xxx/api/2dvh/v1/task/phase/cost?id=1
# Response Elements
Field | Type | Required | Description |
---|---|---|---|
code | Integer | True | 0 - Success, Others - Error |
message | String | True | Error Detailed Information |
data | Array | False | Array of phase timing entries (fields below); null in case of exception |
phase | String | True | Algorithm phase: asset_download, parse_json, st_mobile_change_package, preprocess, main_process, postprocess, result_upload |
startTime | String | True | Phase start time (yyyy-MM-dd HH:mm:ss) |
endTime | String | True | Phase completion time (yyyy-MM-dd HH:mm:ss) |
costTime | Integer | True | Time consumed (milliseconds) |
callCount | Integer | False | Phase repeat count, a null value indicates no repetition |
# Response Example
{
"code": 0,
"message": "success",
"data": [
{
"phase": "asset_download",
"startTime": "2023-11-01 16:41:17",
"endTime": "2023-11-01 16:41:17",
"costTime": 0
},
{
"id": 291340,
"phase": "parse_json",
"startTime": "2023-11-01 16:41:17",
"endTime": "2023-11-01 16:41:18",
"costTime": 124
},
{
"id": 291340,
"phase": "st_mobile_change_package",
"startTime": "2023-11-01 16:41:18",
"endTime": "2023-11-01 16:41:20",
"costTime": 2242
},
{
"id": 291340,
"phase": "preprocess",
"startTime": "2023-11-01 16:41:20",
"endTime": "2023-11-01 16:41:21",
"costTime": 1238
},
{
"id": 291340,
"phase": "main_process",
"startTime": "2023-11-01 16:41:21",
"endTime": "2023-11-01 16:41:25",
"costTime": 3692
},
{
"id": 291340,
"phase": "result_upload",
"startTime": "2023-11-01 16:41:25",
"endTime": "2023-11-01 16:41:25",
"costTime": 189
},
{
"id": 291340,
"phase": "postprocess",
"startTime": "2023-11-01 16:41:25",
"endTime": "2023-11-01 16:41:25",
"costTime": 457
}
]
}
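The data array lends itself to simple aggregation, such as the total run time and the slowest phase. A Python sketch over a trimmed copy of the example above:

```python
# Phase entries trimmed from the response example (id fields omitted).
data = [
    {"phase": "asset_download", "costTime": 0},
    {"phase": "parse_json", "costTime": 124},
    {"phase": "st_mobile_change_package", "costTime": 2242},
    {"phase": "preprocess", "costTime": 1238},
    {"phase": "main_process", "costTime": 3692},
    {"phase": "result_upload", "costTime": 189},
    {"phase": "postprocess", "costTime": 457},
]

# Total wall-clock cost across phases, and the dominant phase.
total_ms = sum(entry["costTime"] for entry in data)
slowest = max(data, key=lambda entry: entry["costTime"])
print(total_ms, slowest["phase"])  # 7942 main_process
```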
# Task Completion Callback Parameters
When the API is used, the system reports task status to the callback address configured for the account. To enable task callbacks, provide the callback address to the administrator when the account is created.
If the user has configured an AuthKey, the callback also carries authentication information (a timestamp and a signature).
The callback endpoint must accept HTTP POST requests with Content-Type application/json.
Field | Type | Required | Description |
---|---|---|---|
taskId | Integer | True | Task ID |
materialId | Integer | True | Material ID |
materialName | String | True | Material Name |
algoType | Integer | True | Task type (11: TTS voice model generation, 12: character image model generation, 14: video synthesis, 18: character image model update, 20: video character face swap, 25: voice conversion, 32: image green screen preview, 33: video green screen preview, 41: TTS V3 voice model generation) |
algoSubType1 | String | True | For character models: model specification (2K/4K); for video synthesis: specification of the character model used (2K/4K) |
algoSubType2 | String | False | Video synthesis: Result format: webm/mp4 |
algoSubType3 | String | False | Video synthesis: Result frame rate |
status | Integer | True | Status 3: Canceled, 5: Completed, 9: Error |
taskResult | String | False | Error Message |
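A receiving endpoint will typically validate the callback payload before acting on it. A minimal Python sketch (`parse_callback` is a hypothetical helper; the field names and status codes come from the table above):

```python
# Required callback fields and terminal status codes from the table above.
REQUIRED = ("taskId", "materialId", "materialName", "algoType", "status")
TERMINAL = {3: "cancelled", 5: "completed", 9: "error"}

def parse_callback(payload: dict) -> str:
    """Check required fields and map the terminal status code to a label."""
    missing = [field for field in REQUIRED if field not in payload]
    if missing:
        raise ValueError(f"callback missing fields: {missing}")
    return TERMINAL.get(payload["status"], "unknown")
```

In a real service this function would run inside the POST handler registered at the callback address, after any AuthKey signature check.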
This encompasses all the algorithmic capabilities the platform can provide.