
Run SmolLM Locally as Your Private Content Rewriter: No API Bill, No Data Leak

February 6, 2025


A compliance officer at a SaaS client once emailed me a one-line answer to a question I'd been asking for three weeks: "No ChatGPT. No Claude. No external AI for our product descriptions."

Fair enough. They were rewriting 4,000 product pages to prep for a site migration, the content lived under NDA, and their legal team wasn't going to let it leave the building. I pulled out my wallet to estimate the budget for a "human rewriting team" — then stopped. I'd been tinkering with SmolLM 1.7B on my own laptop for a few weeks. The model was small, the output wasn't GPT-4, and for the boring copy-rewrite work that ate 80% of those 4,000 pages? It was good enough.

We rewrote the whole catalog locally. Nothing left the laptop. The bill was zero.

That was the moment "run a local model" stopped being a hobby and started being a workflow.

Why SmolLM, and Why Local

SmolLM is Hugging Face's family of small language models — currently in the SmolLM2 generation, with three sizes: 135M, 360M, and 1.7B parameters. The big one (1.7B) is the only one I use for content work; the smaller two are fine for classification and extraction but can't write a coherent paragraph.

The reason SmolLM is interesting for marketers isn't raw intelligence. It's the practical triangle: it runs on a normal laptop (the 1.7B model needs roughly 2-3GB of RAM, fits comfortably on a machine with 16GB), it responds in 2-5 seconds for a 200-word rewrite, and the quality for routine copy-rewrite tasks sits in the "good enough to ship with light editing" range.

Compared to calling an API:

No per-token cost. A 4,000-page rewrite via OpenAI's GPT-4o-mini costs about $30-60. The same job through local SmolLM costs electricity — maybe 30 cents on your laptop's power brick.
No data leaving your machine. Compliance teams sleep at night. NDAs don't get tested.
No rate limits, no outages. Your rewrite pipeline doesn't break because OpenAI had a bad Tuesday.
Latency is consistent. First call is slow (model load), but every rewrite after that is 2-5 seconds, no network round-trip.

The trade-offs are real: quality sits a tier below GPT-4o on nuanced creative work, and the $0 API bill is paid for in your own compute and setup time. But for the unglamorous 80% of marketing content (product descriptions, meta rewrites, ad copy variants, email subject line tests), SmolLM 1.7B is the most practical local option I've found.

The 10-Minute Setup

You need three things: Ollama, the SmolLM model, and a way to call it from a script.

Step 1 — Install Ollama. It's a single binary that runs the model locally and exposes a simple API. macOS, Linux, and Windows (with WSL2 or native) are all supported. Go to ollama.com, download, install. Done.

Step 2 — Pull the model. Open a terminal:

ollama pull smollm:1.7b

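This downloads about 1.1GB. After that, the model lives on your disk and loads into memory the first time you call it. One caveat: as far as I can tell, the smollm tag on Ollama is the first-generation model, and SmolLM2 is listed separately as smollm2, so check the library page for the exact tag and download size if you want the newer generation.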

Step 3 — Smoke test. In the same terminal:

ollama run smollm:1.7b "Rewrite this product description in a friendlier tone: Our CRM helps sales teams close more deals."

If you get back a rewritten version in 3-5 seconds, you're in business. If you get a 20-second pause and a sluggish laptop fan, your machine is on the edge — close Chrome and try again.
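The same smoke test can be done from Python before you start a batch. This is a minimal sketch, assuming Ollama is listening on its default port and that its "list local models" endpoint (/api/tags) returns a JSON object with a "models" list whose entries carry a "name" field. The helper names are made up for illustration, and it uses only the standard library so it runs before you have installed anything else.

```python
import json
import urllib.request

# Assumption: default Ollama address; /api/tags lists locally pulled models.
TAGS_URL = "http://localhost:11434/api/tags"


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def is_model_pulled(tags_payload: dict, wanted: str = "smollm:1.7b") -> bool:
    """True if `wanted` appears in the local model list."""
    return wanted in model_names(tags_payload)


def check_ollama(wanted: str = "smollm:1.7b", timeout: float = 5.0) -> str:
    """Return a short status string instead of raising on connection errors."""
    try:
        with urllib.request.urlopen(TAGS_URL, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError) as exc:  # URLError is a subclass of OSError
        return f"Ollama not reachable: {exc}"
    if is_model_pulled(payload, wanted):
        return f"OK: {wanted} is available"
    return f"Ollama is up, but {wanted} is not pulled yet"
```

Calling print(check_ollama()) at the top of a batch script turns "the server isn't running" into a one-line message instead of a stack trace forty rows in.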

That's the install. The rest is a Python script.

The Actual Rewriter Script

The Ollama API is a thin wrapper around the model — you POST a prompt, you get back a response. For a content rewriter, you want a small loop: read source text, send to model, save the rewrite. Here's what I use as the starting point.

import csv

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "smollm:1.7b"

REWRITE_PROMPT = """You are a marketing copy editor. Rewrite the text below
in a {tone} tone. Keep the same meaning. Stay within {word_limit}% of the
original length. Return ONLY the rewritten text — no preamble, no notes.

Original:
\"\"\"
{source}
\"\"\"

Rewritten:"""

def rewrite(text: str, tone: str = "warmer", word_limit: int = 120) -> str:
    """Send one text to the local model and return the rewrite.

    word_limit is a percentage of the original length (120 = up to 120%).
    """
    prompt = REWRITE_PROMPT.format(
        tone=tone,
        word_limit=word_limit,
        source=text.strip()
    )
    response = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": 0.4,   # lower = more predictable rewrites
            "num_predict": 400,   # hard cap on output tokens
        }
    }, timeout=300)  # generous, since the first call also loads the model
    response.raise_for_status()
    return response.json()["response"].strip()

def batch_rewrite(input_csv: str, output_csv: str, tone: str = "warmer"):
    with open(input_csv, newline="", encoding="utf-8") as fin, \
         open(output_csv, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames + ["rewrite"])
        writer.writeheader()
        for row in reader:
            row["rewrite"] = rewrite(row["source"], tone=tone)
            writer.writerow(row)
            print(f"Rewrote: {row.get('id', '?')[:30]}")

if __name__ == "__main__":
    batch_rewrite("products.csv", "products_rewritten.csv", tone="warmer")

Three things that matter here:

temperature: 0.4. Local small models are more prone to wandering than GPT-4. Lower temperature keeps the rewrite on-rails. If you want more creative variants, push to 0.7 — but expect to discard more.
num_predict: 400. A hard cap on output tokens. Without it, the model occasionally invents "and here's a second version" or keeps going after the rewrite. Capping it forces the model to stop.
The prompt template. Telling the model "Return ONLY the rewritten text — no preamble" cuts about 80% of the garbage output. Small models love to explain what they're about to do.
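The prompt instruction catches most of the chatter, but not all of it. For what slips through, a small post-filter helps. The sketch below is one possible approach, not part of the script above: the list of lead-in phrases is a guess and should be extended with whatever your own spot-check turns up, and clean_output is a hypothetical helper name.

```python
import re

# Assumption: typical chatty openers. Only a first line that starts with one of
# these AND ends with a colon is treated as preamble, to avoid eating real copy.
PREAMBLE = re.compile(
    r"^(?:here(?:['’]s| is)|sure|certainly|okay)\b[^\n]*:[ \t]*\n+",
    re.IGNORECASE,
)


def clean_output(text: str) -> str:
    """Strip a leading 'Here's the rewrite:' line and wrapping quotes."""
    text = text.strip()
    text = PREAMBLE.sub("", text, count=1)
    # Drop a single pair of matching quotes wrapped around the whole rewrite.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1]
    return text.strip()
```

In the batch loop this would wrap the model call, for example row["rewrite"] = clean_output(rewrite(row["source"], tone=tone)).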

For a 4,000-row batch, this script runs at roughly 150-250 rewrites per hour on an M1 MacBook Pro. That is about 15-25 seconds per row end to end, so the full catalog is a 16-27 hour job. The same job on a desktop with a discrete GPU runs 3-4x faster. Either way, you set it and walk away.
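A job that runs for hours will get interrupted at some point, whether by a closed lid or a reboot. One way to make the batch restartable, sketched here under the assumption that every row has a unique "id" column (the script above only uses it for logging), is to read back what has already been written and skip those rows. Both function names are hypothetical.

```python
import csv
from pathlib import Path


def done_ids(output_csv: str, id_key: str = "id") -> set[str]:
    """Collect the ids already present in a (possibly partial) output file."""
    path = Path(output_csv)
    if not path.exists():
        return set()
    with path.open(newline="", encoding="utf-8") as f:
        return {row[id_key] for row in csv.DictReader(f) if row.get(id_key)}


def pending_rows(rows, finished: set[str], id_key: str = "id") -> list[dict]:
    """Keep only the rows whose id has not been rewritten yet."""
    return [row for row in rows if row.get(id_key) not in finished]
```

To use this, the output file would need to be opened in append mode on a restart, with the header written only when the file is new.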

Prompt Engineering for Small Models

This is the part most guides skip, and it's where your real output quality is won or lost. SmolLM 1.7B is smart enough to be useful but not smart enough to guess what you want. The prompt is the entire interface.

Tell it the role, then the task, then the constraint. The three-part prompt above (role / task / constraint) is the minimum structure that works. Drop any part and quality drops.

Be explicit about what NOT to do. Add lines like "Do not add information not in the original" and "Do not use marketing clichés ('unleash', 'supercharge', 'game-changing')." Small models default to cliché in a way large models don't.
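Telling the model what to avoid is half of it; checking that it listened is the other half. A minimal sketch of a cliché check, using the three example words from the paragraph above as the banned list (extend it with your own) and a hypothetical function name:

```python
import re

# The three examples from the text; add your own repeat offenders.
BANNED = ("unleash", "supercharge", "game-changing")


def find_cliches(text: str, banned: tuple[str, ...] = BANNED) -> list[str]:
    """Return the banned terms that appear in `text`, case-insensitively."""
    found = []
    for term in banned:
        # Word boundary only at the start, so "unleash" also catches "unleashed".
        if re.search(r"\b" + re.escape(term), text, re.IGNORECASE):
            found.append(term)
    return found
```

Any row where find_cliches returns a non-empty list is a candidate for a retry or a manual edit.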

For tone control, give a contrast pair. Instead of just "warmer," try:

Tone: warmer than the original. Example of warmer: "We built this so your
team can stop wrestling with spreadsheets." Example of colder (avoid):
"Our solution provides spreadsheet automation capabilities."

Two examples — one good, one bad — give the model a target. This single change improved my "good enough without editing" rate from about 55% to 75% on product descriptions.
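To keep the contrast pair out of hand-edited strings, it can be folded into the prompt template. The sketch below assumes a template modeled on the one in the script; the wording, the placeholder names, and build_contrast_prompt are illustrative choices rather than a tested recipe.

```python
# Assumption: a variant of the article's template with slots for the two examples.
CONTRAST_PROMPT = """You are a marketing copy editor. Rewrite the text below.
Tone: {tone} than the original.
Example of {tone}: "{good}"
Example of {opposite} (avoid): "{bad}"
Keep the same meaning. Return ONLY the rewritten text.

Original:
\"\"\"
{source}
\"\"\"

Rewritten:"""


def build_contrast_prompt(source: str, tone: str, opposite: str,
                          good: str, bad: str) -> str:
    """Fill the template with one target example and one example to avoid."""
    return CONTRAST_PROMPT.format(
        tone=tone,
        opposite=opposite,
        good=good,
        bad=bad,
        source=source.strip(),
    )
```

Keeping the examples as arguments makes it easy to swap in a different pair per client or per content type without touching the template.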

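Batch by length, not by source. Group your inputs by character count. Don't send a 50-word product description and a 500-word blog intro through the same prompt: a template, length instruction, and num_predict cap tuned for short copy will truncate or pad long-form text, and vice versa. (Each /api/generate call in the script is independent, so nothing carries over between rows; the mismatch comes from the shared settings.) Use one prompt for short copy, another for long-form, and don't mix them in the same batch.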
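Grouping by character count takes only a few lines. This is a sketch with made-up numbers: the 600-character cutoff and the per-bucket num_predict values are assumptions to tune on your own data, and bucket_by_length is a hypothetical helper.

```python
# Assumption: illustrative per-bucket output caps, not measured values.
BUCKET_OPTIONS = {
    "short": {"num_predict": 200},
    "long": {"num_predict": 800},
}


def bucket_by_length(rows, text_key: str = "source",
                     short_max_chars: int = 600) -> dict[str, list[dict]]:
    """Split rows into 'short' and 'long' buckets by character count."""
    buckets: dict[str, list[dict]] = {"short": [], "long": []}
    for row in rows:
        size = len(row[text_key])
        buckets["short" if size <= short_max_chars else "long"].append(row)
    return buckets
```

Each bucket then gets its own prompt and its own run, so a setting tuned for product blurbs never touches a blog intro.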

Test on 20 samples first. Run the script on 20 representative rows from your dataset. Read every output. If the model is consistently adding product features that weren't in the source, tighten the prompt. If it's consistently under-rewriting (changing one word and calling it done), raise the temperature. Don't trust it on the full 4,000 until it passes your spot-check.
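Reading every output is the point of the spot-check, but a couple of cheap flags tell you where to look first. The sketch below uses the standard library's difflib to estimate how much of the source survived word for word. The 0.15 and 1.2 thresholds are assumptions (the second mirrors the 120% length limit in the prompt), and both function names are hypothetical.

```python
import difflib
import random


def sample_rows(rows, k: int = 20, seed: int = 0) -> list:
    """Pick k rows reproducibly; return everything if there are k or fewer."""
    rows = list(rows)
    if len(rows) <= k:
        return rows
    return random.Random(seed).sample(rows, k)


def review_flags(source: str, rewrite: str, min_change: float = 0.15,
                 max_length_ratio: float = 1.2) -> list[str]:
    """Flag rewrites that barely changed the text or ran too long."""
    flags = []
    src_words, out_words = source.split(), rewrite.split()
    # ratio() is 1.0 for identical word sequences and 0.0 for nothing in common.
    similarity = difflib.SequenceMatcher(None, src_words, out_words).ratio()
    if 1 - similarity < min_change:
        flags.append("under-rewritten")
    if src_words and len(out_words) / len(src_words) > max_length_ratio:
        flags.append("too-long")
    return flags
```

These flags do not catch invented product features; that still takes a human reading the sample against the source.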

Where SmolLM 1.7B Shines — and Where It Doesn't

Honest list, from running this on roughly 30 client jobs in the last year:

Works well:

Product description rewrites (the SaaS client's 4,000 pages)
Meta title and description variants for A/B tests
Email subject line variations
Ad copy variants for paid social
Paragraph-level paraphrasing of blog drafts
Tone shifts: formal → conversational, or vice versa

Works badly:

Long-form generation (anything over 400 words) — coherence drops off a cliff
Anything requiring real research or facts — the model will confidently invent
Nuanced brand voice — too small to hold a complex voice brief in context
Multilingual rewriting in anything other than English (the multilingual benchmarks are good for a 1.7B model, but the rewrite quality for French or Japanese is several tiers below English)

The mental model: SmolLM 1.7B is a competent junior copy editor who follows instructions well but needs supervision. It is not a strategic writer. Use it for the high-volume, low-stakes work that you'd otherwise pay a freelancer $0.10/word to do, and keep your strategic writing for either a larger model or a human.

The Compliance Argument That Sells It Internally

If you're pitching this to a security or legal team, the conversation is easier than you'd think. "We're calling OpenAI's API" triggers a vendor review, a data processing agreement, a review of where data is stored, and often a hard no for anything under NDA. "We're running an open-source model on a laptop" is a different conversation entirely. The data flow diagram is one box. The vendor review is a one-pager.

I've now had this exact conversation at four different companies, and three of them said yes outright. The fourth wanted to use it only on a single air-gapped machine in a locked room: also a yes, just more dramatic.

A Small Note on Hardware

You don't need a GPU. An M1/M2/M3 MacBook with 16GB RAM runs the 1.7B model comfortably. A modern Windows laptop with 16GB+ RAM works (use the Windows native Ollama build or WSL2). A Linux box with even modest specs is fine.

What does NOT work: an 8GB laptop running on battery. Once the model pushes the system into swap, it eats your memory and your battery in equal measure. If you only have 8GB, stick to the 360M model and accept that quality drops a tier.

The Real Trade-Off

Running a model locally isn't free. It costs your time to set up, your time to debug, your laptop's electricity, and your patience when the model produces a 200-word rewrite that needed to be 100 words. What you save is money, vendor risk, and the slow bleed of API costs that scales with every new use case you discover.

For the marketing teams I work with, the break-even is around 50,000 words of rewrite work per month. Below that, the API is probably a better answer. Above that, local wins on cost, and the privacy story is a bonus you'll appreciate the first time a client asks "is our content going to OpenAI?"

Start small. Rewrite 50 product descriptions. See if the quality clears your bar. If it does, scale.

