A voice AI company launches a music generator. An image generator adds video. A video generator adds native audio. A music generator starts collecting biometric voice data.
If you stepped away from AI creative tools for a month, you’d come back to find every company racing to become the Adobe Creative Suite of generative AI. The problem is they’re all running in different directions at once — and creators are the ones stuck figuring out which moving target to bet on.
ElevenLabs Wants to Be More Than a Voice
ElevenLabs, the company valued at $11 billion after its February Series C, quietly launched ElevenMusic on iOS in early April. The app lets anyone generate songs from text prompts — up to seven per day for free, or 500 per month at $9.99.
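Those tiers are worth a quick back-of-the-envelope check (a sketch using only the figures above; the real value depends on how many of your generations are keepers):

```python
# Paid tier: $9.99/month for 500 generations.
monthly_price = 9.99
monthly_quota = 500
print(f"${monthly_price / monthly_quota:.3f} per generation")  # → $0.020

# Free tier: 7 generations/day works out to roughly 210/month.
print(7 * 30)  # → 210
```

At two cents a song, the paid tier is priced for heavy iteration, not occasional use.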
The company built its reputation on voice synthesis and text-to-speech. Music is the obvious adjacent move: if you can generate realistic human voices, generating singing isn’t a huge leap. The strategic calculus is straightforward — ElevenLabs knows voice AI will eventually become commoditized, and diversifying into music is a hedge against irrelevance.
The results are mixed but promising. ElevenMusic handles electronic, ambient, and lo-fi music well. Complex arrangements — rock with dynamic range, orchestral pieces requiring nuance — expose the model’s limitations compared to Suno. But ElevenLabs brought one thing its competitors lack: commercial licensing from day one, built through partnerships with labels and publishers.
That’s a calculated bet. Suno spent years in legal battles before settling with Warner Music Group in late 2025, and Udio settled with Universal around the same time. ElevenLabs is trying to skip the lawsuit phase entirely.
Midjourney Discovers It Can Move
Midjourney V7 shipped as more than an image upgrade. It’s the company’s first model with native video generation — clips up to 21 seconds with camera moves like orbital, push-in, and crane shots.
This isn’t a gimmick. Midjourney is trying to become the tool where you go from concept art to motion in a single workflow. Draft mode renders images at 10x speed for half the cost, voice prompting lets you describe what you want out loud, and model personalization learns your aesthetic preferences over time.
The company has surpassed 10 million active users, generating over 500 million images daily. Adding video keeps those users inside the ecosystem instead of losing them to Runway or Kling when they need something that moves.
But Midjourney’s video isn’t competing at the top end yet. Twenty-one seconds is useful for social content and motion concepts, not production work.
Kling Is Winning the Video War (From China)
While Western companies added video as a side feature, Kuaishou’s Kling 3.0 went all-in. It currently holds the #1 Elo rating among AI video models, ahead of Google Veo 3.1, Runway Gen-4.5, and Pika 2.2.
The spec sheet reads like a provocation: native 4K output, clips up to 15 seconds, multi-character dialogue with correct lip-sync, built-in multilingual audio, and chain-of-thought reasoning for scene coherence. Reviewers specifically praise the motion quality — the natural sway of a coat, the bounce of an umbrella, realistic reflections on wet pavement.
Kling also has the most generous free tier of any major AI video tool: 66 free credits daily, no credit card required. That’s deliberate. Build the user base now, monetize later — a playbook that worked for TikTok.
The catch: a 30-40% failed-generation rate on the free tier, generation times up to 15 minutes per clip, and a credit system that punishes iteration. You can make impressive one-off clips. Doing serious creative work means paying — or waiting.
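A rough model shows how fast that free tier runs out. Only the 66-credit daily grant and the 30-40% failure rate come from the numbers above; the credits-per-clip figure is a hypothetical placeholder, since actual costs vary by model and settings:

```python
daily_credits = 66
credits_per_clip = 10   # hypothetical placeholder; real pricing varies by model
failure_rate = 0.35     # midpoint of the reported 30-40% failure rate

attempts = daily_credits // credits_per_clip
usable = attempts * (1 - failure_rate)
print(f"{attempts} attempts, ~{usable:.1f} usable clips per day")  # → 6 attempts, ~3.9 usable clips per day
```

Even in this optimistic sketch, iterating five or six times on a single shot burns the whole day's allowance, which is the "punishes iteration" problem in practice.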
Suno Wants to Become You
Suno v5.5, launched March 26, introduced the most personal feature set in AI music: Voices (sing in your own voice), Custom Models (train the AI on your original music), and My Taste (let the algorithm learn your preferences).
The company generates over 7 million songs per day. It settled its copyright lawsuits. It’s building licensed models with major labels. And now it wants your biometric voice data and your entire creative catalog to train a personalized model.
We covered the privacy implications in detail when v5.5 launched. The voice verification system, the data retention policies, and the fine print around Custom Models all warrant scrutiny. But the bigger story is what Suno is becoming: not just a tool that generates music, but a tool that generates music as you.
That’s a different product. That’s a creative identity platform. And it puts Suno in direct competition not just with other music generators, but with the recording studio itself.
The Pattern: From Tool to Platform
Here’s what’s actually happening. Each of these companies started with one capability:
- ElevenLabs: Voice synthesis → now music
- Midjourney: Images → now video and voice prompting
- Kling: Video → now multi-modal with native audio
- Suno: Music → now voice cloning and personal model training
They’re all heading toward the same destination: a single platform that handles every part of creative production. Generate the image, animate it, add music, record a voiceover, publish — without leaving the app.
This is the platform play. The tool that locks in creators across the most workflows wins. It’s the same dynamic that made Adobe dominant for three decades: once your image editing, video editing, and design work all live in one ecosystem, switching costs become painful.
The difference is speed. Adobe took 30 years to assemble Creative Suite through acquisitions. These AI companies are trying to build comparable feature sets in 30 months.
What Creators Actually Think
The survey data tells a more complicated story than the hype suggests. A Symphonic survey of 1,200 music creators found 87% have incorporated AI into at least part of their workflow. But the how matters: 66% use it for songwriting assistance, and more than half use it for marketing tasks like cover art and bios. Only a fraction use AI to generate complete songs.
In the visual arts world, the Artsy AI Survey 2026 found 61% of galleries say none of their artists use AI in their practice. The adoption gap between commercial creative work (high) and fine art (low) keeps widening.
The pattern is consistent: creators are using these tools for the boring parts — marketing materials, rough drafts, background music, placeholder images — and doing the creative work themselves. The tools converging into platforms doesn’t change that dynamic. It just means the boring parts get consolidated.
The Open-Source Counter-Argument
Not everyone’s buying the platform thesis. Open-source alternatives are gaining ground precisely because they let creators mix and match.
ACE-Step 1.5 generates commercial-grade music locally — under 2 seconds on an A100, under 10 seconds on an RTX 3090, with less than 4GB of VRAM. It’s MIT-licensed and outperforms Suno v5 on the SongEval benchmark. FLUX.2 competes with Midjourney on image quality while being free, open-source, and runnable locally.
The open-source pitch is simple: why lock yourself into one platform’s ecosystem when you can assemble your own stack? Use FLUX for images, ACE-Step for music, a local Whisper model for transcription, and keep your data on your own hardware.
The tradeoff is friction. Setting up local models requires technical knowledge. The platform play works because most creators don’t want to configure CUDA drivers — they want to type a prompt and get a song.
What This Means
The next six months will determine whether AI creative tools consolidate into two or three dominant platforms or fragment into specialized tools. The convergence push is real, but so are the limits — no single company has cracked every creative medium at a professional level.
For creators, the practical advice hasn’t changed: use these tools for the parts of your work that don’t require your creative judgment. Generate a mood board, not the final piece. Sketch a melody, don’t ship it as your single. Use AI to iterate faster, not to replace the iteration.
The companies building these tools want you to live inside their platform. The creators getting the most value are the ones treating every tool as disposable — learning the workflow, not the interface.
What You Can Do
If you’re a musician: Try ACE-Step 1.5 locally before handing your voice to Suno. You get commercial-grade generation without the data collection.
If you’re a visual artist: FLUX.2 gives you Midjourney-quality image generation for free. Run it locally and keep your style references on your own machine.
If you need video: Kling 3.0’s free tier is genuinely useful for rough concepts and social content. Don’t rely on it for final production work — the failure rate is too high.
If you’re using any platform: Read the terms of service, especially around voice data, custom model training, and commercial licensing. What you upload to personalize your AI may not stay yours.