Turn one photo into a talking video with OmniHuman 1.5
Feed in a single portrait and any voice track, and OmniHuman 1.5 delivers film-grade lip-sync, expressive facial motion, and natural head gestures.
What OmniHuman 1.5 can do
Single-image avatar
Animate one portrait into a fully performing character — no rigging, no training, no reshoot.
Precise lip-sync
Mouth shapes, jaw movement, and phoneme timing lock tightly to the voice track you provide.
Emotional performance
Expressions, micro-movements, and head gestures respond to the prosody and energy of the audio.
Cinematic quality
ByteDance's upgraded generator produces natural skin, hair, and lighting that hold up in close-ups.
Vertical, square, widescreen
Export 9:16 for Shorts, 1:1 for feed, and 16:9 for YouTube or web from the same source image.
Mask-guided targeting
Use mask support to lock lip-sync to the right subject when multiple faces share the frame.
Built for real creators
Social media spokesperson videos
Turn a founder headshot into TikTok, Reels, and Shorts explainers in any language, without booking studio time or on-camera talent.
Ad creatives and product demos
Generate dozens of localized avatar ads from one portrait — swap scripts and voices while the performance stays consistent.
Course and training videos
Build onboarding, HR, and e-learning modules where a friendly instructor speaks directly to the camera for under thirty seconds at a time.
Personalized outreach and sales
Create 15-second talking-head intros for email and LinkedIn outreach that feel recorded but scale like text.
Specifications
- Provider: ByteDance
- Input types: Portrait image + audio
- Supported image formats: JPEG / PNG
- Supported audio formats: MP3 / WAV / M4A
- Aspect ratios: 9:16, 16:9, 1:1
- Resolutions: 720p, 1080p
- Max duration (720p): 60 seconds
- Max duration (1080p): 30 seconds
- Mask support: Yes (target specific face)
- Text prompt guidance: Optional
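The limits listed above can be checked before a render is submitted. The following is a minimal sketch, not Popcraft's or ByteDance's actual validation: the `RenderRequest` shape and the `validate` helper are hypothetical names introduced for illustration, and only the formats, aspect ratios, resolutions, and duration caps come from the specifications.

```python
import os
from dataclasses import dataclass

# Values copied from the specifications list; everything else is illustrative.
MAX_SECONDS_BY_RESOLUTION = {"720p": 60, "1080p": 30}
ASPECT_RATIOS = {"9:16", "16:9", "1:1"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}   # common JPEG / PNG extensions
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


@dataclass
class RenderRequest:
    """Hypothetical request shape, not a documented API payload."""
    image_path: str
    audio_path: str
    audio_seconds: float
    aspect_ratio: str
    resolution: str


def validate(request: RenderRequest) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if os.path.splitext(request.image_path)[1].lower() not in IMAGE_EXTENSIONS:
        problems.append("image must be JPEG or PNG")
    if os.path.splitext(request.audio_path)[1].lower() not in AUDIO_EXTENSIONS:
        problems.append("audio must be MP3, WAV, or M4A")
    if request.aspect_ratio not in ASPECT_RATIOS:
        problems.append("aspect ratio must be 9:16, 16:9, or 1:1")
    limit = MAX_SECONDS_BY_RESOLUTION.get(request.resolution)
    if limit is None:
        problems.append("resolution must be 720p or 1080p")
    elif request.audio_seconds > limit:
        problems.append(
            f"{request.resolution} renders are limited to {limit} seconds"
        )
    return problems
```

For example, a 45-second voice track passes at 720p but yields one problem at 1080p, where the cap is 30 seconds.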
How Popcraft uses OmniHuman 1.5
OmniHuman 1.5 powers Popcraft's Character Studio talking-video workflow. Upload a portrait, attach a voice track from ElevenLabs TTS or your own recording, pick an aspect ratio and resolution, and Popcraft handles mask selection, submission through the Batch AI gateway, and progress streaming over SSE. The finished clip saves to your Avatar project and asset gallery, ready to splice into the AI Agent pipeline alongside generated B-roll, SFX, and background music on the Remotion multi-track timeline.
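To illustrate what progress streaming over SSE involves, here is a minimal sketch of a Server-Sent Events parser. It handles a simplified subset of the SSE wire format (`event:` and `data:` fields, comment lines, blank-line dispatch). The event name `progress` and the JSON payload fields in the usage example are assumptions made for illustration; Popcraft's actual event schema is not documented here.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event, data = "message", []          # "message" is the SSE default type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue                      # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]         # strip the single optional space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Usage with a hypothetical stream; the payload fields are assumed.
sample = [
    ": keep-alive",
    "event: progress",
    'data: {"percent": 40}',
    "",
    "event: progress",
    'data: {"percent": 100}',
    "",
]
for item in parse_sse(sample):
    payload = json.loads(item["data"])
    print(item["event"], payload["percent"])
```

In a real client the lines would come from a streaming HTTP response rather than a list, and the parser would stay the same.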
Frequently asked questions
What is OmniHuman 1.5?
OmniHuman 1.5 is a talking-head video model from ByteDance that animates a single portrait into a lifelike speaking character. It produces expressive lip-sync, facial performance, and subtle head motion from one image plus audio.
How does OmniHuman 1.5 work?
You provide a portrait and an audio track. The model analyses the voice for rhythm, prosody, and phonemes, then generates a video where the pictured person speaks in sync with the audio and performs matching facial expressions.
Is OmniHuman 1.5 free to use on Popcraft?
Free Popcraft accounts include credits you can spend on OmniHuman 1.5. Longer or higher-resolution renders consume more credits, and paid plans or top-ups extend your monthly budget.
Can I use the videos commercially?
Popcraft paid plans grant commercial rights to the videos you render. You still need consent to use the likeness of any real person pictured in your source portrait. Never generate talking-head content of someone without permission.
How long can a video be?
Up to 30 seconds per render at 1080p and up to 60 seconds at 720p. Longer videos are assembled by stitching multiple clips together on the Remotion timeline.
Which aspect ratios are supported?
9:16 for vertical short-form video, 16:9 for widescreen YouTube and landing pages, and 1:1 for square feed posts. Popcraft exposes all three from the Character Studio.
How do I create a talking video in Popcraft?
Open the Character page, upload a portrait, attach your audio, pick aspect ratio and resolution, and submit. Progress streams live and the finished talking-head video lands in your gallery.
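Because each render is capped at 30 seconds (1080p) or 60 seconds (720p), a longer voice track has to be split into several renders before the clips are stitched together. The sketch below shows one way to plan those cuts by dividing the track into the fewest equal-length clips that fit the cap. The `plan_clips` helper is hypothetical and is not how Popcraft necessarily does it; in practice, cut points would probably be nudged to pauses in the speech.

```python
import math


def plan_clips(total_seconds: float, max_clip_seconds: float) -> list[tuple[float, float]]:
    """Split a track into the fewest equal clips that each fit the cap.

    Returns (start, end) offsets in seconds for every clip.
    """
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / max_clip_seconds)
    length = total_seconds / count       # equal lengths avoid a tiny last clip
    return [
        (i * length, min((i + 1) * length, total_seconds))
        for i in range(count)
    ]


# A 150-second narration at 720p (60-second cap) needs three renders.
print(plan_clips(150, 60))
```

The same narration at 1080p, with its 30-second cap, would need five renders of 30 seconds each.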
Ready to try OmniHuman 1.5?
Start creating in seconds with 100 free credits — no card required.
Try talking video free