Generate Valid JSON-LD Schema with Claude (No Rich Results Test Surprises)
I pasted a recipe post into Claude last quarter and asked for Recipe schema. It gave me 60 lines of JSON-LD that looked perfect. Indented, valid JSON, all the right @type values. I dropped it into Google's Rich Results Test and got this:
Error: Missing field "recipeIngredient"
Error: Invalid value in field "recipeYield"
Warning: Either "image" or "video" should be specified
Warning: Missing field "author"
Warning: Missing field "datePublished"
Four issues from a single page. Claude had invented an ingredients field (the schema calls it recipeIngredient), passed "4 servings" to a field that wants an integer, and silently dropped author and datePublished because the page's HTML didn't make them obvious. The JSON was syntactically clean. It was semantically broken.
This is the trap. A large language model can produce JSON that parses. Producing JSON that passes Google's structured-data validator is a different problem, one that's about grounding the output in two things at once: the exact Schema.org spec, and the actual content of the page. Most prompts give the model neither, and the result is the kind of plausible-looking schema that fails on the first Rich Results Test.
Here's the workflow I use now. Two prompts, one validation loop, and a short list of errors to grep for before you even touch the validator.
What "valid" actually means in Google's eyes
Worth pausing on, because "valid JSON-LD" gets used to mean three different things.
- Syntactically valid JSON. The braces close. A JSON parser doesn't choke. This is trivial — any LLM gets it right almost every time.
- Schema.org-compliant. The @type exists, the property names match what Schema.org defines, value types are correct (string vs integer vs URL vs ISO 8601 date). A JSON-LD playground like the one at json-ld.org will confirm this.
- Eligible for rich results. Google's Rich Results Test layers on its own requirements: some Schema.org-valid properties are ignored by Google, and some properties are required by Google but optional in Schema.org. This is what actually decides whether your stars, recipe cards, FAQ accordions, or HowTo carousels show up in SERPs.
When clients say "the schema doesn't work," they almost always mean layer 3, not layer 1 or 2. Claude can clear layer 1 with one hand tied behind its back. Layer 2 needs prompting. Layer 3 needs prompting and a tight feedback loop with the validator. The whole article below is about how to get to layer 3 without 14 iterations.
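To make the gap between layer 1 and layer 2 concrete, here is a minimal Python sketch, not a real validator. It assumes the JSON-LD is a single top-level object, and the `VALUE_CHECKS` table and the function names are hypothetical and deliberately tiny: they cover only a few value types mentioned above (ISO 8601 dates, absolute URLs).

```python
import json
from datetime import datetime
from urllib.parse import urlparse


def is_iso8601(value):
    """True if value is a string that datetime.fromisoformat accepts."""
    if not isinstance(value, str):
        return False
    # Older Python versions reject a trailing "Z", so normalise it first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_absolute_url(value):
    """True for an http(s) URL, or a non-empty list of them.

    Assumption: this sketch does not handle ImageObject dicts.
    """
    if isinstance(value, list):
        return bool(value) and all(is_absolute_url(item) for item in value)
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical, illustrative table: property name -> value-type check.
VALUE_CHECKS = {
    "datePublished": is_iso8601,
    "dateModified": is_iso8601,
    "image": is_absolute_url,
}


def check_layers(raw):
    """Layer 1: does it parse? Layer 2 (partial): are value types plausible?"""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"parses": False, "type_errors": []}
    if not isinstance(data, dict):
        return {"parses": True, "type_errors": []}
    errors = [
        name
        for name, check in VALUE_CHECKS.items()
        if name in data and not check(data[name])
    ]
    return {"parses": True, "type_errors": errors}


raw = (
    '{"@context": "https://schema.org", "@type": "Article", '
    '"headline": "Hello", "datePublished": "April 2, 2025", '
    '"image": "/img/hero.jpg"}'
)
print(check_layers(raw))
```

The sample parses cleanly, yet both `datePublished` and `image` are flagged. Nothing here says anything about layer 3; only the Rich Results Test can tell you that.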
The two failure modes that account for 90% of the errors
Before I show the prompt, name the enemy. Across maybe 200 schema generations I've done with Claude (Recipe, Article, Product, FAQPage, HowTo, LocalBusiness, BreadcrumbList — the usual SERP-relevant set), almost every failure falls into one of two buckets.
Failure mode 1: invented property names. Claude has seen thousands of JSON-LD examples in its training data, and not all of them used the canonical Schema.org property names. So it confidently writes ingredients instead of recipeIngredient, cookingTime instead of cookTime, rating instead of aggregateRating, imageUrl instead of image. These look right. They are wrong. The validator either silently ignores them or throws a "Missing required field" because the field it expected is now absent.
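A quick pre-validator check for this failure mode can be sketched in a few lines of Python. The mapping below contains only the four misnamed properties listed above; it is not a complete list, and the function name is my own. The sketch walks nested objects and arrays so it also catches misnamed keys inside sub-objects.

```python
import json

# Non-canonical names mentioned in this article -> canonical Schema.org names.
# Extend this with whatever your own generations keep getting wrong.
KNOWN_MISNAMES = {
    "ingredients": "recipeIngredient",
    "cookingTime": "cookTime",
    "rating": "aggregateRating",
    "imageUrl": "image",
}


def find_misnamed_properties(node, path=""):
    """Return (path, suggested_name) pairs for every known-bad key."""
    findings = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if key in KNOWN_MISNAMES:
                findings.append((here, KNOWN_MISNAMES[key]))
            findings.extend(find_misnamed_properties(value, here))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            findings.extend(find_misnamed_properties(item, f"{path}[{index}]"))
    return findings


sample = json.loads(
    '{"@type": "Recipe", "name": "Soup", "ingredients": ["water"], '
    '"cookingTime": "PT20M", '
    '"nutrition": {"@type": "NutritionInformation", "calories": "100 calories"}}'
)
for where, suggestion in find_misnamed_properties(sample):
    print(f"{where}: did you mean {suggestion}?")
```

This only catches names you already know are wrong. It is a grep, not a substitute for checking the output against the Schema.org definition of the type.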
Failure mode 2: properties not grounded in the page. The model invents an author, a datePublished, an aggregateRating with a reviewCount of 47, because most pages have those things and it's pattern-completing. Sometimes the page genuinely has that data and the model just couldn't see it. Sometimes the page doesn't and the model fabricated it. Both are bad — but the fabricated case is worse, because it's the one that gets you a Google manual action for "structured data not matching visible content."
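One rough way to catch fabricated values is to check that each text value in the generated JSON-LD actually appears in the page's visible text. The sketch below assumes you already have the visible text as a plain string; the function names and the default field list are hypothetical. It compares only name-like text fields, because dates and numbers are usually formatted differently on the page than in the markup.

```python
import re


def normalize(text):
    """Collapse whitespace and ignore case so trivial differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def get_path(data, dotted):
    """Follow a dotted path like 'author.name'; return None if it is absent."""
    current = data
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def ungrounded_fields(jsonld, page_text,
                      fields=("headline", "author.name", "publisher.name")):
    """Return the fields whose string value is not found in the page text."""
    haystack = normalize(page_text)
    missing = []
    for field in fields:
        value = get_path(jsonld, field)
        if isinstance(value, str) and normalize(value) not in haystack:
            missing.append(field)
    return missing


page_text = """Slow-Roasted Tomato Soup
A weeknight recipe. Published by Example Kitchen."""

jsonld = {
    "@type": "Article",
    "headline": "Slow-Roasted Tomato Soup",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "publisher": {"@type": "Organization", "name": "Example Kitchen"},
}
print(ungrounded_fields(jsonld, page_text))
```

A flagged field means either the model made the value up or it paraphrased something that is on the page. In both cases it is worth a manual look before the markup ships.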
Both failure modes have the same fix: constrain the model with the actual spec and the actual page, in the same prompt.
Prompt 1: extract first, then format
Don't ask Claude to write JSON-LD in one shot. Split it. The first call extracts the data points from the page; the second call formats them into JSON-LD. This sounds redundant. It is the single biggest change that brought my Rich Results Test pass rate from ~40% on the first try to ~85%.
Here's the extraction prompt, for an Article page as an example. Adapt the field list per schema type.
You are extracting structured data points from an HTML page so they
can be marked up as Schema.org Article. Below is the page's rendered
HTML and visible text.
Extract ONLY these fields. For each one, return either:
- The exact value as it appears on the page, OR
- the string "NOT_FOUND" if the page does not contain it.
DO NOT infer, guess, or fill in plausible defaults. If the page does
not name the author, return "NOT_FOUND" — do not write "Editorial Team."
Fields:
- headline (must be 110 characters or fewer)
- description
- author.name
- author.url (the author's profile page, if linked)
- datePublished (ISO 8601, e.g. "2025-04-02T09:00:00+08:00")
- dateModified (ISO 8601, or NOT_FOUND)
- image (URL of the primary article image, absolute URL only)
- publisher.name
- publisher.logo (absolute URL)
- mainEntityOfPage (the canonical URL of this article)
Output: a JSON object with exactly these keys. No prose.
HTML/text:
[paste here]
Three things this prompt is doing that a one-shot "write JSON-LD for this page" prompt is not.
The NOT_FOUND rule is the one that kills failure mode 2. It explicitly invites the model to admit it doesn't have the data, which is the only way you find out before the validator does that your page is missing a required field.
The "exact value as it appears" rule kills the secondary failure of paraphrased values. A headline should match the visible H1. If Claude rewrites it for "clarity," the structured data is now misaligned with the page, and that's a known signal Google penalizes.
The character limit on headline is a Google-specific constraint (Articles get rich results only if headline is ≤ 110 characters). Bake it into the extraction so the model truncates at the extraction step, where you can sanity-check it, rather than at the formatting step where you might miss it.
You run this prompt, you eyeball the JSON, you fix the NOT_FOUNDs by hand or by going back to fix the underlying page. Only then do you go to Prompt 2.
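The eyeballing step can be partly automated. Here is a small Python sketch that reviews the extraction output before it goes to the second prompt. It assumes the model returned a flat JSON object with the dotted keys from the extraction prompt (for example "author.name"); the function name is hypothetical, and the 110-character limit is the one quoted above.

```python
import json

HEADLINE_LIMIT = 110  # the headline limit used in the extraction prompt


def review_extraction(raw):
    """Return a list of human-readable problems found in the extraction JSON."""
    data = json.loads(raw)
    problems = []

    # Fields the model could not find on the page: fix by hand or fix the page.
    for key, value in data.items():
        if value == "NOT_FOUND":
            problems.append(f"{key}: NOT_FOUND")

    # Headline length is easier to catch here than after formatting.
    headline = data.get("headline")
    if (
        isinstance(headline, str)
        and headline != "NOT_FOUND"
        and len(headline) > HEADLINE_LIMIT
    ):
        problems.append(
            f"headline: {len(headline)} characters (limit {HEADLINE_LIMIT})"
        )
    return problems


extraction = json.dumps({
    "headline": "A" * 120,
    "description": "Short summary",
    "author.name": "NOT_FOUND",
    "datePublished": "2025-04-02T09:00:00+08:00",
})
for problem in review_extraction(extraction):
    print(problem)
```

An empty list does not mean the extraction is right, only that nothing is flagged as missing or too long. You still want to read the values once.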
Prompt 2: format with the spec inline
Now the formatting call. The trick here is to paste the relevant Schema.org field reference into the prompt so the model isn't relying on training-data memory of which property names are canonical.
Format the data below as Schema.org Article JSON-LD, following the
Google rich results requirements at
https://developers.google.com/search/docs/appearance/structured-data/article.
Use exactly these property names (these are the canonical Schema.org
names; do not substitute "imageUrl" for "image" or similar):
Required by Google:
- @context: "https://schema.org"
- @type: "Article" (or "NewsArticle" / "BlogPosting" if more specific)
- headline (string, max 110 chars)
- image (URL, or array of URLs — prefer 16:9, 4:3, 1:1)
- datePublished (ISO 8601 with timezone)
- author (object with @type "Person" or "Organization", and "name";
optionally "url")
Recommended:
- dateModified (ISO 8601)
- description
- publisher (object with @type "Organization", "name", and "logo"
which is an ImageObject with "url")
- mainEntityOfPage (URL, the canonical of the article)
Rules:
- If a field's input value is "NOT_FOUND", OMIT the field entirely.
Do not include it with an empty string or null.
- Wrap the JSON-LD in a