HeyGen AI Spokespeople for UGC Ads: The Scaling Play That Actually Works
I spent a Saturday in September with a DTC supplement brand that was about to scrap their paid social. CPA (cost per acquisition) on Meta had crept past $84, and their agency kept sending $400 UGC creator videos that looked like the same template with a different hoodie. They had three weeks of runway left.
I rebuilt the creative pipeline around HeyGen. Five weeks later, their best-performing ad, a 38-second vertical video with a "spokesperson" we'd never met in person, was delivering at $19 CAC (customer acquisition cost) across prospecting. They'd burned through 81 creative variants in that time, 14 of which hit KPI (key performance indicator). Total production cost for the AI-generated portion: $264 in HeyGen credits and one afternoon of my time.
Here's the playbook.
Why AI spokespeople, not just more creators
The math problem with UGC at scale is structural, not creative. A real creator charges $80–$400 per video, takes 5–10 days from brief to delivery, and gives you 1–2 usable variants. To test 50 hooks against 4 audiences, you're looking at $10K+ and a quarter of calendar time before the algorithm sees enough signal. Most brands never get there. They run the same 6 creatives into saturation, watch CPM (cost per mille, i.e. cost per thousand impressions) climb, and blame the platform.
HeyGen flips this. You record a 2–5 minute "source video" of a real person (often the founder, a sales rep, or a friend who looks the part) speaking naturally. The platform builds a custom avatar and voice clone from that footage. Then, from a single source, you can generate unlimited video variants: different scripts, different languages, different lengths, different outfits (via a prompt to the AI Studio). The marginal cost per variant drops to cents. The marginal time drops to minutes.
That changes the shape of paid testing. Instead of "pick 5 hooks and pray," you run 50 hooks against the same audience and let the data pick the winner. The winners are almost always surprising — the angle the brand thought was strongest loses to a throwaway line the AI was told to deliver casually.
The 4-step setup (do this once)
You can have your first 10 variants live in 48 hours. The setup that goes upstream of that is what determines whether the output feels real or uncanny.
1. Pick your spokesperson, then shoot source footage the right way.
Anyone on your team can be the source. For DTC, the founder usually converts best: the audience can tell, even subconsciously, when a real person has something at stake. For B2B, a sales engineer beats a CEO. What matters most is the quality of the source footage.
Shoot 2–5 minutes on a phone, but follow these rules:
- Natural light or a single soft key light. No overhead fluorescents — they cast shadows under the eyes that the avatar pipeline exaggerates.
- Plain background. The cleaner the better. A bookshelf is okay; a busy kitchen is not.
- Speak conversationally to camera, slightly off-axis. Don't read a script — riff on three to five talking points. The model trains on the cadence, not the script.
- 1080p minimum, 30fps, audio clean enough that you can hear every word.
Five minutes of clean footage produces a noticeably better avatar than two minutes. I've tested this: the lip-sync on an avatar trained on 5 minutes passes the squint test (squint at the video and try to spot the "AI"), while the 2-minute version doesn't.
2. Build the avatar and voice clone in HeyGen.
Upload the source footage to "Instant Avatar" in HeyGen. Training takes 5–15 minutes. The output is a digital twin: same face, same voice, same speaking style. You can also pick from 1,000+ stock avatars if no team member is willing to go on camera, but custom avatars have outperformed stock ones in every test I've run, usually by 15–25% on thumbstop rate.
Voice cloning is automatic with the avatar. If you want the avatar to speak in languages the source didn't record in, HeyGen's Video Translate handles it — 175+ languages with lip-sync adjusted to the target language. A single source video can spawn a 12-market test in a weekend.
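The article doesn't define thumbstop rate. A common definition, assumed here, is 3-second video views divided by impressions. Below is a minimal Python sketch of how a custom-vs-stock lift like the 15–25% above could be computed. The counts are made up for illustration.

```python
def thumbstop_rate(three_sec_views: int, impressions: int) -> float:
    """Share of impressions that became a 3-second view (assumed definition)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return three_sec_views / impressions


def relative_lift(variant: float, baseline: float) -> float:
    """Relative improvement of variant over baseline; 0.20 means +20%."""
    return (variant - baseline) / baseline


# Hypothetical counts for illustration only, not data from the campaigns in this article.
stock = thumbstop_rate(three_sec_views=2_500, impressions=10_000)
custom = thumbstop_rate(three_sec_views=3_000, impressions=10_000)
print(f"stock {stock:.1%}, custom {custom:.1%}, lift {relative_lift(custom, stock):+.0%}")
```

The lift here is relative. Going from a 25% to a 30% thumbstop rate is a 5-point absolute gain but a 20% relative lift, so it's worth being explicit about which one you report.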
3. Set up the "brand kit" inputs once.
In HeyGen Studio, define:
- Brand colors (so the AI-generated lower-thirds, captions, and on-screen text stay on-brand)
- Your logo file
- A 2-sentence product description (used for AI script generation)
- 3–5 example ads you've already run (so the AI can match tone)
This is the "set it and forget it" work that compounds. Every script you generate later inherits these defaults. Brand drift between variants drops dramatically.
4. Build a script prompt library.
You'll write a lot of scripts. Build a Notion doc (or just a Google Sheet) with prompt templates for the recurring angles you need:
- "Problem-aware hook — open with the exact pain in the first 3 seconds"
- "Founder story — 30-second version for cold prospecting"
- "Testimonial format — paste in the customer quote, the AI delivers it as if they're saying it"
- "Objection crusher — address the most common pushback in 15 seconds"
- "Comparison — us vs the status quo, with one specific differentiator"
You don't need 20 templates at first. Start with 3. Add to it every time you find an angle that worked in live tests.
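If you keep the library in code rather than in Notion, Python's standard `string.Template` is enough to handle it. The minimal sketch below assumes hypothetical template names and wording that mirror three of the angles listed above. You would paste the filled-in prompt into HeyGen's script generator or use it as a brief for writing the script yourself.

```python
from string import Template

# Hypothetical templates mirroring three of the angles above. $product is the
# 2-sentence product description from the brand kit; other fields vary per script.
PROMPT_LIBRARY = {
    "problem_hook": Template(
        "Write a $seconds-second ad script for: $product "
        "Open with this exact pain in the first 3 seconds: $pain"
    ),
    "founder_story": Template(
        "Write a 30-second founder story for cold prospecting about: $product"
    ),
    "objection_crusher": Template(
        "Write a 15-second script for: $product "
        "Address this objection directly: $objection"
    ),
}


def build_prompt(angle: str, **fields: str) -> str:
    """Fill one template; raises KeyError for an unknown angle or a missing field."""
    return PROMPT_LIBRARY[angle].substitute(**fields)


print(build_prompt(
    "problem_hook",
    seconds="30",
    # "Acme Sleep" is a made-up product used only for this example.
    product="Acme Sleep is a magnesium gummy. It helps you fall asleep without grogginess.",
    pain="You're wide awake at 3 a.m. again.",
))
```

When an angle wins in a live test, adding it to the library is one new dictionary entry.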
The variant generation loop (weekly cadence)
Once setup is done, the weekly loop is what scales. Here's the cadence I've run with three different clients:
Monday morning — generate 20 variants.
Pick 5 hooks × 4 angles (problem, story, objection, comparison) for 20 scripts. Plug each into HeyGen's script generator or write them yourself. Generate 4 length variants per script: 15s, 30s, 45s, 60s. That's 80 videos in roughly 2 hours of generation time, all from one source avatar.
Not every video is a keeper; some come out off-tone or say something weird. Everything goes into a folder for review.
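You can enumerate the 5 × 4 × 4 matrix instead of tracking it by hand. The minimal sketch below uses placeholder hook names and builds only the job list and a naming scheme, both of which are hypothetical. It does not call HeyGen. You would still generate each video in the Studio or through whatever automation you use.

```python
from itertools import product

# Placeholder labels: swap in your real hooks.
HOOKS = ["hook_1", "hook_2", "hook_3", "hook_4", "hook_5"]
ANGLES = ["problem", "story", "objection", "comparison"]
LENGTHS_S = [15, 30, 45, 60]


def weekly_batch(hooks, angles, lengths):
    """Return one job per (hook, angle, length) combination, with a sortable name."""
    return [
        {"hook": h, "angle": a, "length_s": s, "name": f"{h}_{a}_{s}s"}
        for h, a, s in product(hooks, angles, lengths)
    ]


jobs = weekly_batch(HOOKS, ANGLES, LENGTHS_S)
print(len(jobs))        # 80
print(jobs[0]["name"])  # hook_1_problem_15s
```

Encoding hook, angle, and length in the file name means you can later read ad-level results straight back onto the matrix.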
Monday afternoon — human review and trim.
Watch every video at 2x speed. Yes, all of them. Kill the ones where the avatar's lip-sync stutters on a specific word, where the gesture is off, or where the script accidentally says something factually wrong. You'd be surprised how often the model hallucinates a brand name or mispronounces a technical term. Eyes on every video is non-negotiable for paid spend.
The kill rate is typically 30–40%. From 80 generated, you'll get 50–55 reviewed-and-cleaned.
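The 50–55 figure follows from the kill rate. Here is a quick sanity check that uses integer percentages so the arithmetic stays exact:

```python
def reviewed_range(generated: int, kill_pct_low: int, kill_pct_high: int) -> tuple[int, int]:
    """Return (fewest, most) videos surviving review for a kill-rate range.

    Percentages are integers so the arithmetic stays exact (no float rounding).
    """
    fewest = generated * (100 - kill_pct_high) // 100
    most = generated * (100 - kill_pct_low) // 100
    return fewest, most


print(reviewed_range(80, 30, 40))  # (48, 56)
```

A 30–40% kill rate on 80 videos leaves 48 to 56 reviewed videos, which brackets the 50–55 estimate.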
Tuesday–Wednesday — push to Meta as a fresh batch.
Drop them into a new Meta ad set (or replace existing creatives). Use CBO (Campaign Budget Optimization) with a small campaign budget, roughly $50–$100/day for each of the 3–5 ad sets running in parallel, and let Meta's algorithm decide how to split spend between them. After 48 hours, you'll see the first clear winners. After 5–7 days, you'll know the top 5–8.
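The numbers above imply a weekly test budget. The sketch below multiplies them out at both ends of the range. It assumes the full daily amount gets spent every day of the window, which CBO doesn't guarantee.

```python
def batch_spend(ad_sets: int, daily_per_ad_set: int, days: int) -> int:
    """Total dollars for one weekly batch, assuming the full daily budget is spent."""
    return ad_sets * daily_per_ad_set * days


low = batch_spend(ad_sets=3, daily_per_ad_set=50, days=5)
high = batch_spend(ad_sets=5, daily_per_ad_set=100, days=7)
print(f"${low:,} to ${high:,} per weekly batch")  # $750 to $3,500 per weekly batch
```

Under CBO, the per-ad-set figure is only a way to size the campaign budget. Meta may send most of the spend to one or two ad sets.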
Thursday — generate 10 "iterations" of each winner.
For each top performer, generate 10 close variants: same hook, different visual, different opening line, different CTA (call to action). This is where the compounding happens. A winning hook usually has 3–4 micro-angles you haven't tested yet. Each is a $5 test in ad spend, but the upside is finding a $15 CAC creative in a $30 CAC market.
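One way to choose the 10 iterations is to enumerate every opening/visual/CTA combination and sample from it. This avoids favoring whichever option happens to be listed first. The sketch below uses hypothetical option lists and a fixed seed so the batch is reproducible.

```python
import random
from itertools import product


def iterations(winner_hook: str, openings: list[str], visuals: list[str],
               ctas: list[str], n: int = 10, seed: int = 0) -> list[dict]:
    """Keep the winning hook fixed and sample n distinct opening/visual/CTA combos.

    Sampling from the full grid avoids over-representing whichever opening
    line is listed first; the seed makes the batch reproducible.
    """
    grid = list(product(openings, visuals, ctas))
    picks = random.Random(seed).sample(grid, min(n, len(grid)))
    return [
        {"hook": winner_hook, "opening": o, "visual": v, "cta": c}
        for o, v, c in picks
    ]


# Hypothetical option lists for one winning ad.
batch = iterations(
    "Still waking up at 3 a.m.?",
    openings=["question", "stat", "confession"],
    visuals=["talking head", "b-roll", "split screen"],
    ctas=["Shop now", "Try it risk-free"],
)
for item in batch:
    print(item)
```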
Friday — kill the bottom 80%, brief the next week's batch.
By Friday, the bottom 80% of the new batch is clearly losing. Kill them. Take the top 20% and let them run into next week, where they'll compete against the new batch. Brief Monday's generation session.
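The Friday cut can be written as a simple ranking rule. The minimal sketch below assumes CAC (spend divided by purchases) is the metric you rank on. The `AdResult` type and the sample data are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AdResult:
    name: str
    spend: float
    purchases: int

    @property
    def cac(self) -> float:
        # An ad with no purchases yet gets an infinite CAC so it ranks last.
        return self.spend / self.purchases if self.purchases else math.inf


def keep_top(results: list[AdResult], keep_pct: int = 20) -> list[AdResult]:
    """Return the best keep_pct percent of ads by CAC (lowest first).

    Rounds up and always keeps at least one ad, so small batches aren't wiped out.
    """
    ranked = sorted(results, key=lambda r: r.cac)
    n_keep = max(1, (len(ranked) * keep_pct + 99) // 100)
    return ranked[:n_keep]


# Hypothetical results: ten ads, same spend, different purchase counts.
week_results = [AdResult(f"ad_{i}", spend=100.0, purchases=i) for i in range(10)]
print([r.name for r in keep_top(week_results)])  # ['ad_9', 'ad_8']
```

In practice you would also want a minimum spend per ad before judging it, so an ad with one lucky purchase on $5 doesn't jump to the top. That floor is left out to keep the sketch short.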
This loop is what produces 100+ tested variants per quarter per client. With a real creator, you'd need a 4-figure monthly budget and a project manager. With HeyGen, it's a half-day-per-week commitment from one person.
The disclosure question
Platforms and regulators are still catching up, but here's where things stand as of October 2025:
Meta (Facebook + Instagram) — Does not require explicit AI-disclosure on paid ads, but does prohibit misleading claims. If your avatar makes a specific factual claim (e.g. "I'm a doctor, and I recommend…") and the underlying person is not actually a doctor, that's a problem. Stay on the right side of it by not making the AI say things a real human of that persona wouldn't actually say.
TikTok — Stricter. AI-generated content that depicts a real person must be labeled as such under their 2025 transparency rules. For fictional avatars (no real human depicted), you're fine without a label. For custom avatars trained on a real team member, the consensus from TikTok's policy team is to disclose it — use the "AI-generated content" toggle when uploading.
YouTube — Has required disclosure for "synthetic" or "AI-generated" content in certain categories (election, health, finance) since early 2025. For commercial ads, you're fine in most verticals but check the latest policy if you're in a regulated space (financial, health, political).
The practical move: assume you'll need to disclose at some point, and build a system where disclosure doesn't kill your creative. A simple "AI spokesperson" lower-third or a one-line caption "Spoken by [Brand]'s AI team" reads as modern, not as deception. I've tested versions with and without disclosure text: in most consumer verticals, disclosure had no measurable impact on thumbstop rate and a small positive impact on completion rate (the audience feels more informed, so they watch longer).
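Whether disclosure has "no measurable impact" is a statistical question. The article doesn't say how its comparison was made. A standard check for a difference in completion rate between two ad versions is the two-proportion z-test, shown here as an illustration with hypothetical counts.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int,
                          success_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test. Returns (z, p_value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("both groups are all-success or all-failure; test is undefined")
    z = (p_a - p_b) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - phi)


# Hypothetical completions: with disclosure vs. without, 6,000 plays each.
z, p = two_proportion_z_test(1_260, 6_000, 1_140, 6_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A large p-value only means the test couldn't detect a difference at that sample size. It doesn't prove the impact is zero.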
What HeyGen does well, where it falls short
Worth being honest about this.
Works well for: direct-response video ads, founder-led brand content, product demo walkthroughs, multilingual ad sets, e-learning, internal training, customer testimonial style ads (where the script is real but the speaker is a custom avatar). Anywhere the script matters more than the face.
Less ideal for: highly emotional or vulnerable content (testimonials about grief, recovery, hard medical journeys — audiences can feel the missing human), top-of-funnel brand awareness where a real celebrity or recognizable founder drives trust, products that require showing real hands doing real things (cooking, makeup, surgery).
For most paid social use cases, it works. The CAC numbers I've seen are 30–60% lower than equivalent human-creator campaigns, primarily because the volume of testing is 5–10x higher.
My actual numbers, for the skeptics
Three clients I ran this with in 2025, all on Meta + TikTok paid social:
| Brand | Vertical | Pre-HeyGen CAC (90-day) | Post-HeyGen CAC (60-day) | Variants tested | Production cost |
|---|---|---|---|---|---|
| Supplement DTC | Health | $84 | $19 | 81 | $264 |
| B2B SaaS (HR tech) | Software | $310 | $94 | 47 | $186 |
| Skincare e-com | Beauty | $42 | $17 | 63 | $211 |
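The relative CAC drop and the production cost per tested variant aren't listed in the table, but both follow from it. Here is a quick check using only the table's own numbers:

```python
# (brand, pre-HeyGen CAC, post-HeyGen CAC, variants tested, production cost in $)
ROWS = [
    ("Supplement DTC", 84, 19, 81, 264),
    ("B2B SaaS (HR tech)", 310, 94, 47, 186),
    ("Skincare e-com", 42, 17, 63, 211),
]


def summarize(rows):
    """Return (brand, relative CAC reduction, production cost per tested variant)."""
    return [
        (brand, (pre - post) / pre, cost / variants)
        for brand, pre, post, variants, cost in rows
    ]


for brand, reduction, per_variant in summarize(ROWS):
    print(f"{brand}: CAC down {reduction:.0%}, ${per_variant:.2f} per tested variant")
```

Across these three accounts, that works out to a 60–77% drop in CAC and roughly $3.26–$3.96 of production cost per tested variant.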
The pattern is consistent: same offer, same landing page, same targeting — the variable that changed was creative volume and iteration speed. AI spokespeople are not a magic trick. They are a leverage multiplier on the boring parts of paid media: the variant generation, the iteration loop, the multi-market testing.
How to start this Monday
If you have an existing account:
- Pick a real person on your team (or yourself). Block 30 minutes. Record a 3-minute source video on a phone, in good light, talking conversationally about your product.
- Sign up for HeyGen Creator ($24/mo) or Business ($72/mo if you need API access or longer videos). Upload the source video, build your custom avatar and voice clone.
- Generate 5 variants of your best-performing ad. Push them as a fresh batch into your existing paid account.
- Watch the data. Whatever you learn in week one, generate 5 iterations of the winner for week two.
If the first five variants are 30% better than your current ads, you'll know. If they're not, the issue is usually the script: go back to Step 4 of the setup (the script prompt library), write tighter hooks, and regenerate. The system compounds on iteration, not on first-pass quality.
A year ago, AI spokespeople were a curiosity. In late 2025, they are a paid-media default for any team serious about creative velocity. The brands that have built this muscle now are quietly pulling 3–5x more learnings per dollar of ad spend than competitors still negotiating creator contracts. That's the moat — not the avatar, not the tool, the iteration loop.