Multilingual Keyword Research with ChatGPT: EN to ES/PT/JP Without Losing Intent

Last quarter, a running shoe brand asked me to expand their English keyword list into Spanish, Portuguese, and Japanese before they opened three new storefronts. The agency they had hired before delivered 4,000 "translations"; when I sampled them, about 60% carried the wrong intent. "Best running shoes" became "mejores zapatos para correr," which a Mexican shopper almost never types. They had used machine translation, then called it keyword research.

Here is the 4-step workflow I actually use now. It is not fancy. It is just the order of operations that keeps the intent from leaking out.

Step 1 — Start with 50 clean English seed keywords

Skip this step and the rest is garbage in, garbage out. Pull your seed list from somewhere that already knows intent: your Search Console, your paid search conversion terms, or Ahrefs/Semrush parent topics. Strip out branded queries and anything with a U.S.-only product reference (state names, "free shipping 2-day," etc.). You want 50 queries that are product or problem searches, not navigation searches.

The point: ChatGPT is great at expanding a list, terrible at figuring out which 50 queries are worth expanding in the first place.
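The mechanical part of this cleanup (dropping branded queries, U.S.-only references, and duplicates) can be scripted before you start judging intent by hand. The sketch below is only an illustration: the brand term and the pattern list are hypothetical placeholders, and it cannot tell a navigation search from a product search, so that call stays with you.

```python
import re

# Hypothetical markers; replace with your own brand names and market terms.
BRAND_TERMS = {"acme"}
US_ONLY_PATTERNS = [
    r"\bnear me\b",
    r"\b2-day\b",
    r"\bsame-day delivery\b",
    r"\bamazon prime\b",
    r"\b(california|texas|florida|new york)\b",  # extend with other state names
]


def is_clean_seed(keyword: str) -> bool:
    """Return True if the keyword has no brand term and no U.S.-only reference."""
    text = keyword.lower()
    if any(brand in text.split() for brand in BRAND_TERMS):
        return False
    return not any(re.search(pattern, text) for pattern in US_ONLY_PATTERNS)


def clean_seeds(keywords, limit=50):
    """Normalize case and spacing, drop duplicates and unclean queries, keep `limit`."""
    seen = set()
    kept = []
    for kw in keywords:
        norm = " ".join(kw.lower().split())
        if norm in seen or not is_clean_seed(norm):
            continue
        seen.add(norm)
        kept.append(norm)
    return kept[:limit]
```

Feed it the raw export from Search Console or your paid search report, then review what survives; the filter only removes the obvious cases.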

Step 2 — Translate AND expand in one ChatGPT prompt

Most people stop at translation. That is the first mistake. Translation keeps the volume, but it does not catch the way locals actually describe the same problem.

The prompt I run on GPT-4o:

You are an SEO localizer. I will give you English keywords that represent
specific search intents (informational, commercial, transactional).

For each English keyword:
1. Translate it to Spanish (Mexico), Portuguese (Brazil), and Japanese
   using how locals actually search, not a literal translation.
2. For each translated keyword, add 1-2 natural local variations real
   shoppers use (e.g., synonyms, brand-agnostic alternatives, common
   misspellings, plural/singular forms).
3. Tag each result with its intent label in brackets: [I], [C], or [T].
4. If a literal translation would mislead intent, write "[INTENT WARNING]"
   next to it and give the locally-natural alternative.

Output as a table. Do not summarize, do not add commentary.

Keyword 1: [paste]
Keyword 2: [paste]
...

I paste 10 keywords at a time. More than that and it starts hallucinating intent labels on the second pass. The [INTENT WARNING] flag is the part that actually saves you time, because it forces the model to flag the cases it is unsure about instead of confidently giving you a wrong answer.
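If you would rather not count to 10 by hand, the batching is easy to script. This is a minimal sketch, assuming you keep the prompt instructions above (everything before the "Keyword 1:" lines) in a string; the function names are mine, and it only builds the prompt text, it does not call any API.

```python
def batch(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_prompts(instructions: str, keywords, size=10):
    """Return one prompt per batch, numbering keywords from 1 within each batch."""
    prompts = []
    for chunk in batch(keywords, size):
        lines = [f"Keyword {i}: {kw}" for i, kw in enumerate(chunk, start=1)]
        prompts.append(instructions.rstrip() + "\n\n" + "\n".join(lines))
    return prompts
```

With 50 seeds this gives five prompts. Numbering restarts in each batch so every prompt looks the same to the model.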

Step 3 — Human intent check on the flagged rows

The flagged rows are where intent dies if you do not catch it. A real example from the running shoe project:

| EN | Naive translation | Local-natural | Issue |
|---|---|---|---|
| best running shoes for flat feet | mejores zapatos para correr pies planos | mejores tenis para correr con pie plano | "Zapatos" in Mexico reads as dress shoes. "Tenis" is the runner's word. |
| cheap running shoes | zapatillas de correr baratas | tenis para correr baratos | Same problem with a different word: "zapatillas" is not what ES-MX shoppers type; "tenis" is. |
| trail running shoes | calçados de corrida para trilha | tênis de trilha | PT-BR shoppers search by the activity, not "shoe type + activity." |
| running shoes for beginners | 走るのが初めての ランニングシューズ | 初心者 ランニングシューズ おすすめ | JP searches almost always end with おすすめ (recommended) or 選び方 (how to choose) for commercial intent. |

The rule: if you are not a native speaker of that market, pay a freelancer on Upwork for one hour to review 50 flagged rows. It costs very little next to publishing content around the wrong terms, and it is the only step that genuinely requires a human.
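To hand a reviewer only the flagged rows, you can filter the model output before sending it. The sketch below assumes ChatGPT returned a pipe-delimited table with a header row, which is a common output shape but not guaranteed; if your output looks different, the parser needs adjusting.

```python
FLAG = "[INTENT WARNING]"


def parse_table(text: str):
    """Split a pipe-delimited table into rows of stripped cells."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # skip the header separator, e.g. |---|---|
        rows.append(cells)
    return rows


def flagged_rows(text: str):
    """Return data rows (header excluded) where any cell carries the flag."""
    rows = parse_table(text)
    return [row for row in rows[1:] if any(FLAG in cell for cell in row)]
```

Run it on each batch's output and collect the results into one sheet per language, so the reviewer's hour goes to the rows that need a native eye.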

Step 4 — Validate demand in Google Trends

Now you have a clean, intent-preserved list of 400-600 keywords across three languages. The last step is to confirm the demand is actually there, because ChatGPT is willing to invent fluent Japanese for a query that gets 10 searches a month.

In Google Trends, set the geography to Mexico for ES, Brazil for PT, and Japan for JP. Pull each cluster (group keywords by intent) and look at two things:

  1. Trend direction over 24 months. If a cluster is flat-to-declining, you do not want to anchor content around it.
  2. Related queries at the bottom of the page. This is the gold. Google tells you what real searchers typed. Often these are 20-30% different from what ChatGPT suggested. Fold the good ones back into your list.

For the running shoe brand, JP "トレイルランニング シューズ" (trail running shoes) was 4x the volume ChatGPT had assumed, and ES-MX "tenis para correr mujer" (women's running shoes) was a category we had missed entirely in step 2.
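Both checks can be made a little less eyeball-driven once you have the numbers out of Google Trends. This is a rough sketch, assuming you have 24 monthly interest values per cluster and a list of related queries copied from the page. Comparing the last 12 months with the first 12 is just one simple way to read direction, and the 10% tolerance is an arbitrary assumption, not a Google Trends concept.

```python
from statistics import mean


def trend_direction(monthly_interest, tolerance=0.10):
    """Classify a 24-month series by comparing the last 12 months to the first 12."""
    if len(monthly_interest) != 24:
        raise ValueError("expected 24 monthly values")
    first = mean(monthly_interest[:12])
    last = mean(monthly_interest[12:])
    if first == 0:
        return "rising" if last > 0 else "flat"
    change = (last - first) / first
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "declining"
    return "flat"


def new_related_queries(related, current_keywords):
    """Return related queries that are not in the list yet, in their original order."""
    known = {" ".join(k.lower().split()) for k in current_keywords}
    fresh = []
    for query in related:
        norm = " ".join(query.lower().split())
        if norm not in known:
            known.add(norm)
            fresh.append(norm)
    return fresh
```

Anything that comes back "flat" or "declining" deserves a second look before you build content around it, and the output of `new_related_queries` is the candidate list to fold back in after a quick intent check.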

What to watch out for

Three failure modes I have seen wreck multilingual launches:

  • Literal translations across cultures. English is the rare language where "shoes" is the default. In ES-MX it is tenis for athletic, zapatos for dress. In JP, シューズ (shoes) is a loanword and 靴 (kutsu) is the native word, and which one shoppers use depends on the category. ChatGPT will get this right about 70% of the time and be confidently wrong the other 30%.
  • U.S.-only intent baked into the EN list. "Same-day delivery," "Amazon Prime," "near me" all mean very different things in São Paulo or Osaka. Strip them at step 1 or your international list is full of dead queries.
  • No volume check on JP. Japanese keyword tools are scarce and most teams skip volume validation because it is annoying. Then they publish 50 articles around queries with 0 search volume. Run Google Trends on every cluster. It is free.

The whole workflow takes me about 6 hours for three languages. The agency quote I was replacing was 2 weeks and 4,000 dubious keywords. The difference is not the AI. The difference is putting the human intent check between the AI and the validation step, not after both.