UGC Comment Moderation with AI: From 1,400 Mentions a Day to 87 for Humans
We dropped human moderation review from 1,400 brand mentions a day to 87. The trick wasn't a smarter moderator. It was deleting most of the queue before a human ever saw it.
Most teams do UGC (user-generated content) moderation backwards. They staff a queue, build a 40-item style guide, train reviewers for two weeks, and then watch the queue double in three months. By the time the brand is six figures deep in payroll for a moderation team, leadership starts asking why "AI" hasn't fixed this yet. It can. You just have to stop asking AI to do the moderator's job and start asking it to delete the moderator's job.
The pattern that actually works: a three-layer triage. Route every comment into a bucket the moment it lands. Flag the ones with risk signals. Escalate only what a human must answer. Everything else goes to auto-archive, scheduled response, or a polite template.
Here is the exact workflow.
Step 1: Auto-route by intent, not by length
The first mistake is moderating by how long a comment is. Long comments aren't higher risk, they're just longer. The right axis is intent — what is this person trying to do?
I run every comment through a single GPT-4o-mini classification with a fixed label set. Five labels cover ~96% of brand mentions in practice:
- praise: positive sentiment, no question
- question: needs an answer (product, support, pricing)
- complaint: dissatisfaction, possibly public
- crisis: legal threat, safety issue, viral risk
- noise: spam, bot, off-topic, generic emoji
The prompt is short on purpose. About 80 tokens, with three labeled examples per class. Anything longer and the model starts performing "thoughtfulness" instead of classification. Keep it dumb; the value is in the labels being mutually exclusive and exhaustive.
The actual prompt structure, in case it helps:
You classify brand mentions into exactly one of:
praise, question, complaint, crisis, noise.
Definitions:
- praise: positive sentiment, no question
- question: needs an answer (product, support, pricing)
...
Examples:
"love this brand" → praise
"does X work with Y?" → question
"shipping took 3 weeks, not happy" → complaint
"your product burned me, contacting FDA" → crisis
"check my profile 🔥🔥" → noise
Return JSON: {"intent": "
Note the JSON output. Force structure; never let the model narrate. The action_required field is computed in the same call. It's a cheap secondary signal that, paired with intent, is enough to drive the routing table below without a second model call.
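Forcing JSON only helps if you also check what comes back. Here is a minimal sketch of that check. The prompt above cuts off before showing the full JSON shape, so the two-field shape used here (a string intent plus a boolean action_required) is an assumption, as is the choice to fall back to a human-reviewed complaint whenever the reply is malformed.

```python
import json

# The five labels from the classifier prompt.
INTENTS = {"praise", "question", "complaint", "crisis", "noise"}

# Assumption: when the reply can't be trusted, fail toward human review.
FALLBACK = {"intent": "complaint", "action_required": True}


def parse_classification(raw: str) -> dict:
    """Validate the model's raw reply and return a clean two-field dict."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)
    if not isinstance(data, dict):
        return dict(FALLBACK)

    intent = data.get("intent")
    action = data.get("action_required")
    # Reject unknown labels and non-boolean flags instead of guessing.
    if not isinstance(intent, str) or intent not in INTENTS:
        return dict(FALLBACK)
    if not isinstance(action, bool):
        return dict(FALLBACK)
    return {"intent": intent, "action_required": action}
```

A reply like `{"intent": "praise", "action_required": false}` passes through unchanged; narration, unknown labels, or a missing field all land in the fallback.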
Once routed, each bucket goes somewhere different:
| Intent | Destination | SLA (Service Level Agreement) |
|---|---|---|
| praise | auto-archive + weekly highlight report | none |
| question | templated response + queue | 4 hours |
| complaint | human review | 1 hour |
| crisis | on-call moderator + Slack alert | 15 minutes |
| noise | auto-archive | none |
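The routing table is small enough to live in code as plain data. A sketch of how that could look, with the destinations and SLAs copied from the table; the `route` helper and its fallback to human review for unknown labels are assumptions, not part of the original workflow.

```python
from datetime import timedelta

# intent -> (destination, SLA); None means no SLA applies.
ROUTES = {
    "praise": ("auto-archive + weekly highlight report", None),
    "question": ("templated response + queue", timedelta(hours=4)),
    "complaint": ("human review", timedelta(hours=1)),
    "crisis": ("on-call moderator + Slack alert", timedelta(minutes=15)),
    "noise": ("auto-archive", None),
}


def route(intent: str):
    """Look up destination and SLA for an intent label."""
    # Assumption: an unrecognized label is treated like a complaint,
    # so it reaches a human rather than disappearing.
    return ROUTES.get(intent, ROUTES["complaint"])
```

Keeping this as a dict rather than a chain of if-statements means a routing change is a one-line diff.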
A mention that just says "love this brand" should not occupy the same reviewer attention as "your product gave my kid a rash." The routing step is what makes that possible.
Step 2: Auto-flag by risk, layered on top of intent
Intent gets it to the right room. Risk tells you which chair in the room it sits in. These are different signals and they multiply.
The first flag layer is the OpenAI Moderation API (free, ~50ms per call). Run it on every comment regardless of intent. It catches the obvious stuff — hate speech, sexual content, self-harm — and you don't need to think about it. In our setup, this layer alone removes 12-15% of comments from human review without any model training.
The second layer is a custom GPT-4o-mini scorer that looks for crisis-shaped language. Things like:
- "lawyer", "sue", "report you to"
- "this happened to me", "my daughter", "unsafe"
- mentions of specific regulators (FDA in the US, NMPA in China, etc.)
- screenshots or external links
If any of those trip, escalate to crisis regardless of what Step 1 said. Intent classification is a noisy single-pass read. Risk flags are a yes/no gate. Don't let the model decide both at once.
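The override itself is simple to express. In the sketch below, a regex list stands in for the GPT-4o-mini scorer so the example runs on its own; that substitution is mine, and the real scorer is a model call. The function name and the `has_attachment` flag (for screenshots) are hypothetical.

```python
import re

# Stand-in for the model-based scorer: literal patterns taken from the list above.
CRISIS_PATTERNS = [
    r"\blawyer\b", r"\bsue\b", r"\breport you to\b",
    r"\bthis happened to me\b", r"\bmy daughter\b", r"\bunsafe\b",
    r"\bFDA\b", r"\bNMPA\b",   # named regulators
    r"https?://",              # external links
]
CRISIS_RE = re.compile("|".join(CRISIS_PATTERNS), re.IGNORECASE)


def apply_risk_gate(intent: str, text: str, has_attachment: bool = False) -> str:
    """Return 'crisis' if any risk signal trips, else keep the Step 1 intent."""
    if has_attachment or CRISIS_RE.search(text):
        return "crisis"
    return intent
```

The gate only ever upgrades a comment to crisis. It never downgrades, which keeps the two decisions (intent and risk) independent.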
The third layer is brand-specific vocabulary. Build a small lookup (a Python dict, not a model) for terms that matter to your business: product SKUs (stock keeping units), competitor names, internal codenames, regulatory phrases. A literal string match is faster and more auditable than asking an LLM (large language model) whether "the same ingredient in Product X" refers to your Product X.
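A sketch of what that lookup might look like. Every term and tag below is a made-up placeholder; the real dict would hold your own SKUs, competitors, and codenames.

```python
# Hypothetical vocabulary: lowercase term -> flag category.
BRAND_TERMS = {
    "sku-1042": "product_sku",
    "acme rival": "competitor",
    "project falcon": "internal_codename",
    "recall": "regulatory_phrase",
}


def brand_flags(text: str) -> list[str]:
    """Return the sorted, de-duplicated flag categories whose terms appear in text."""
    lowered = text.lower()
    return sorted({tag for term, tag in BRAND_TERMS.items() if term in lowered})
```

Plain substring matching will also hit terms embedded in longer words, so a production version may want word-boundary checks. The point is that every hit can be traced to one dict entry.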
Three layers, ~150ms total latency, and you have a comment that's been bucketed, screened, and risk-scored before it ever hits a queue.
Step 3: Escalate only what humans must see
This is where teams over-spend. The instinct is to send everything in complaint and crisis to a human. Don't. A complaint that says "your shipping is slow" is not the same as "your shipping is slow and I will leave a 1-star review on every platform."
Add a second field to the classifier output alongside intent: action_required: yes/no. The prompt checks for: explicit ask, deadline language, public threat, named third party, or regulatory language. If none of those, the complaint gets a templated acknowledgment ("Thanks for flagging this — our team is looking into it") and goes into a daily digest for the human team, not a live queue.
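Put together, the intent plus action_required split described above reduces to a few lines. This is a sketch under assumptions: the destination names are hypothetical labels, and crisis always goes to a human regardless of the flag.

```python
def destination(intent: str, action_required: bool) -> str:
    """Decide where a classified comment lands."""
    if intent == "crisis":
        return "on_call"            # always a human, never a template
    if intent == "complaint":
        # Only complaints with an explicit ask, deadline, or threat go live.
        return "live_queue" if action_required else "daily_digest"
    if intent == "question":
        return "templated_response"
    return "auto_archive"           # praise and noise
```

The only branch that depends on action_required is the complaint one, which is where most of the queue reduction comes from.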
The result is a much smaller live queue. In our last 90 days: of 1,400 daily mentions, 1,060 were auto-archived, 240 got templated responses, 87 went to a human, and 13 hit the on-call crisis line. The human team's day is now 87 items, not 1,400. Median response time dropped from 9 hours to 38 minutes on the items that actually need a human.
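As a quick check on those figures, the four buckets do add up to the daily total, and the shares are easy to compute. The bucket names are just labels for this snippet.

```python
# Daily counts quoted above.
daily = {"auto_archived": 1060, "templated": 240, "human": 87, "on_call": 13}

total = sum(daily.values())
# Share of each bucket as a percentage, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in daily.items()}

print(total)   # 1400
print(shares)  # {'auto_archived': 75.7, 'templated': 17.1, 'human': 6.2, 'on_call': 0.9}
```

Roughly three quarters of mentions never need a response at all, and about 7% reach a person (live queue plus on-call).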
If your queue is the same size as it was six months ago, you haven't actually deployed AI. You've added AI to a process that didn't need it and kept the old process running in parallel.
The stack I actually use
For what it's worth, the tools aren't exotic:
- Mention or Brand24 for the source feed (comments from every platform, normalized into one record format)
- OpenAI Moderation API as the first-pass filter (free, batch-able)
- GPT-4o-mini via the OpenAI API for the intent classifier and the risk scorer (~$0.15 per 1M input tokens, the whole pipeline is well under $50/month at our volume)
- n8n for orchestration (a webhook, three API calls, a Slack post, an Airtable row: that's the whole job)
- Airtable as the human queue. Yes, Airtable. Spreadsheets beat "moderation platforms" for any team under 50 reviewers because the workflow is yours to shape.
Replacing GPT-4o-mini with a self-hosted model is fine if you have the infra, but at 1,400 comments/day, you will not save money and you will lose the ability to swap prompts in 10 minutes.
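A back-of-envelope check on the cost claim in the list above. The price and daily volume come from this post; the two model calls per comment and the 300 input tokens per call are assumptions for illustration, and output tokens are left out.

```python
COMMENTS_PER_DAY = 1_400
DAYS_PER_MONTH = 30
CALLS_PER_COMMENT = 2            # assumption: intent classifier + risk scorer
INPUT_TOKENS_PER_CALL = 300      # assumption: prompt plus one comment
PRICE_PER_MILLION_INPUT = 0.15   # USD, figure quoted above

monthly_tokens = (
    COMMENTS_PER_DAY * DAYS_PER_MONTH * CALLS_PER_COMMENT * INPUT_TOKENS_PER_CALL
)
monthly_cost = monthly_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT

print(f"{monthly_tokens:,} input tokens, about ${monthly_cost:.2f}/month")
# 25,200,000 input tokens, about $3.78/month
```

Even if the token estimate is off by several times, input cost stays in the single or low double digits, which is consistent with the "well under $50/month" figure.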
For the queue itself, the Airtable schema is eight fields and a status column:
| Field | Type | Source |
|---|---|---|
| comment_id | autonumber | ingest |
| text | long text | ingest |
| intent | single select | classifier |
| action_required | checkbox | classifier |
| risk_flags | multi-select | OpenAI Mod + custom scorer |
| escalated_to | single select (Slack channel) | n8n |
| responded_at | date | human |
| archive_reason | single select | auto |
That's it. No "moderation platform" needed. The status column drives the workflow: new → in_review → responded → archived. Filters on the view are the moderator's job description. When someone asks "what does the moderation team do?", you point at the Airtable view.
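If you ever need to enforce the status flow outside Airtable, it fits in a small transition map. A sketch, assuming that auto-archived rows are allowed to skip straight from new to archived; that shortcut is my assumption, and the rest follows the flow above.

```python
# Allowed status transitions for a queue row.
ALLOWED = {
    "new": {"in_review", "archived"},  # new -> archived assumed for auto-archived rows
    "in_review": {"responded"},
    "responded": {"archived"},
    "archived": set(),                 # terminal state
}


def can_move(current: str, target: str) -> bool:
    """True if a row may move from current status to target status."""
    return target in ALLOWED.get(current, set())
```

For example, `can_move("in_review", "responded")` is allowed, while `can_move("archived", "new")` is not.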
The full pipeline runs in n8n in about 200 lines. Webhook in, three API calls, one Slack post, one Airtable row. The orchestration is small enough that one person can own it, which matters more than people think — moderation workflows drift every quarter as platforms change their APIs, and you want the iteration loop to be a Slack message, not a vendor ticket.
Three things to watch out for
1. Don't let the model write the response for crisis comments. Templated acknowledgments are fine for routine complaints. Anything in the crisis bucket goes to a human, full stop. The risk of a tone-deaf auto-response during a real incident is not worth the time saved.
2. Audit your false negatives monthly. Pull 200 random comments from auto-archive every month and have a human read them. If more than 5% should have been escalated, your classifier drifted. Retrain or tighten the prompt. False negatives are the metric that actually matters; false positives just create more work for the template engine.
This audit is also how I found the bug that almost cost us a Q4. A wave of "your ad is misleading" comments was being routed to complaint with action_required: false because the model treated that phrasing as routine. Each comment looked routine in isolation, but the volume (40+ in 24 hours from distinct accounts) pointed to a coordinated complaint we should have caught. We added a volume-based escalation rule: more than 10 comments with the same root noun in 24 hours auto-pages the on-call. A per-comment classifier can't see that signal; only the queue can.
3. Don't unify the prompts across markets. A US prompt and a CN prompt diverge on day one. Cultural risk vocabulary, regulatory references, even politeness norms are different. Run two classifiers, two risk lists, two escalation paths. Merging them "for efficiency" is how you end up apologizing in three languages at once.
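The two checks from item 2 above, the monthly audit and the volume rule, are both easy to sketch. Assumptions: the root noun is already extracted upstream and arrives as a plain string (`root_term` here is hypothetical), events are sorted by time, and the function names are mine.

```python
import random
from collections import defaultdict, deque
from datetime import timedelta

WINDOW = timedelta(hours=24)
VOLUME_THRESHOLD = 10    # "more than 10" in 24 hours
MAX_MISS_RATE = 0.05     # audit fails above 5%


def draw_audit_sample(archived_ids: list, k: int = 200, seed=None) -> list:
    """Pick up to k random auto-archived comments for a human to read."""
    rng = random.Random(seed)
    return rng.sample(archived_ids, min(k, len(archived_ids)))


def audit_failed(should_have_escalated: list[bool]) -> bool:
    """True if the share of missed escalations exceeds MAX_MISS_RATE."""
    misses = sum(should_have_escalated)
    return misses / len(should_have_escalated) > MAX_MISS_RATE


def volume_alerts(events) -> set:
    """events: (timestamp, root_term) pairs sorted by timestamp.
    Returns every term seen more than VOLUME_THRESHOLD times in a 24h window."""
    recent = defaultdict(deque)
    alerted = set()
    for ts, term in events:
        window = recent[term]
        window.append(ts)
        # Drop timestamps that fell out of the 24-hour window.
        while ts - window[0] > WINDOW:
            window.popleft()
        if len(window) > VOLUME_THRESHOLD:
            alerted.add(term)
    return alerted
```

The volume check works on the queue as a whole, which is why it catches what a per-comment classifier misses.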
Closing reframe
The point of AI in moderation isn't to read faster. It's to delete the queue. If your reviewer is still scrolling past 800 "love this!" comments a day to find the three that matter, you haven't automated moderation — you've automated keystrokes. The goal is a queue small enough that the human team can actually think about every item in it.
87 a day is small enough. So is 40. So is 200, if those 200 are the right 200. The number doesn't matter. The ratio does.
For brands still running 1,400-item queues, the work isn't to hire more moderators. It's to make 1,300 of those items invisible.