AI Models — Video, Image, Audio & Avatar Generation
Generate with 13 best-in-class AI models on one platform, including Seedance, Veo 3.1, Kling, Nano Banana, Seedream, and ElevenLabs. Compare what each model does and try any of them free.
Video 5 models
Generate cinematic video from text, images, or clips with Seedance 2.0 4K on Popcraft. Native synced audio, multi-shot sequences, and razor-sharp detail at up to 4K (2160p).
New: Seedance 2.0 Mini is ByteDance's leanest video tier, up to 50% cheaper than standard Seedance 2.0 and ~2× faster than the Fast tier, with text, image, video, and audio prompting. Coming soon to Popcraft.

Turn images and prompts into cinematic video with Seedance 2.0 on Popcraft. Reference-to-video, first/last frame, and multi-aspect outputs at up to 1080p.

Generate cinematic 1080p video with built-in audio using Google's Veo 3.1 on Popcraft. Reference-to-video, first/last frame, and synchronized sound from a single prompt.

Generate multi-shot AI video with Kling 3.0 Omni on Popcraft. Fuse up to 7 reference images into connected scenes with synchronized native audio at 1080p.
Image 3 models

Generate images with Google Nano Banana 2 on Popcraft. Character consistency, 14 aspect ratios, 4K resolution, web-grounded knowledge — free to try.

Generate images with ByteDance Seedream 5 Lite on Popcraft. Live web search, visual reasoning, 2K output in seconds — free to try.

Generate premium images with Google Nano Banana Pro on Popcraft. Complex multi-element compositions, 4K output, pro-grade typography and style transfer — free to try.
Audio 3 models
Turn scripts into lifelike voiceover with ElevenLabs TTS on Popcraft. 70+ languages, emotional delivery, and studio-grade multilingual speech.
Score videos in seconds with Music V1 on Popcraft. Generate AI background music, instrumental BGM, and mood-matched tracks from 3 to 200 seconds long.
Generate custom sound effects from text with ElevenLabs SFX on Popcraft. 0.5–22s clips, tunable prompt influence, and 48 kHz studio-ready audio.
Character 2 models
Turn a single portrait and audio into a lifelike talking video with OmniHuman 1.5 on Popcraft. 1080p lip-sync, 9:16/16:9/1:1, up to 30 seconds.
Generate long-form talking avatar videos with Kling Avatar on Popcraft. 9:16, 16:9, and 1:1 outputs up to 60 seconds from a single image and audio.
How to choose an AI model
Different models excel at different jobs. For cinematic, high-resolution shots, Seedance 2.0 4K leads; for dialogue-driven scenes with built-in audio, choose Veo 3.1; for multi-shot sequences built from several references, choose Kling 3.0 Omni. Image work splits between Nano Banana 2's speed and Seedream 5 Lite's fidelity, while avatars come from OmniHuman 1.5 and Kling Avatar. Use the table below to match a model to your use case.
Model comparison
| Model | Type | Best for | Max output | Provider |
|---|---|---|---|---|
| Seedance 2.0 4K | Video | Cinematic, high-res | 4K | ByteDance |
| Veo 3.1 | Video | Dialogue + audio | 1080p | Google |
| Kling 3.0 Omni | Video | Multi-shot | 1080p | Kuaishou |
| Nano Banana 2 | Image | Fast iterations | 4K | Google |
| Seedream 5 Lite | Image | High-fidelity stills | 2K | ByteDance |
| ElevenLabs TTS | Audio | Voiceover | — | ElevenLabs |
| OmniHuman 1.5 | Avatar | Talking avatars | 1080p | ByteDance |
Frequently asked questions
The best video model depends on the job: Seedance 2.0 4K for resolution, Veo 3.1 for synced audio, Kling 3.0 Omni for multi-shot. The comparison table above maps each model to its strength.
Every model is available on the free tier with starter credits — no card required.
Seedance 2.0 focuses on reference-driven, high-resolution video; Veo 3.1 generates 1080p video with native synchronized audio. Open either model page to try them.