fix(seo): clear remaining docs Ahrefs errors — page size, email obfuscation, canonical orphans (MARTECH-17) by juan-arcadedev · Pull Request #1023 · ArcadeAI/docs

juan-arcadedev · 2026-06-16T20:31:31Z

Clears the three remaining docs.arcade.dev Ahrefs Site Audit error classes (the 36 error URLs at health 95), per MARTECH-17. Decisions this round: keep Cloudflare Email Obfuscation on + fix in-repo; shrink pages via lazy-loaded detail with no IA/URL change (no pagination).

1. Page size > 2 MB (20 pages)

Toolkit reference pages serialized the entire ToolkitData into the initial payload and server-rendered every tool. Now:

- toToolkitSummary() strips the heavy per-tool fields (parameters/output/codeExample) from the client ToolkitPage payload.
- Per-tool detail is lazy-fetched on expand from the existing /api/toolkit-data/[toolkitId] route (new useToolDetail hook, one cached fetch per page).
- The per-tool sections and the scope picker render client-side after mount, so the server HTML carries only the crawlable summary (Available Tools table + names-only sidebar). The per-link secret icon (793 inline SVGs ≈ 404 KB on github-api) is dropped from the sidebar; secret info stays in the table's Secrets column and the expanded tool detail.
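
To make the stripping step above concrete, here is a minimal sketch of what a summary function along these lines could look like. Only the three heavy field names (parameters, output, codeExample) come from this description; the surrounding type shapes are assumptions for illustration and are not the repo's actual types.

```typescript
// Hypothetical shapes. Only the heavy field names (parameters, output,
// codeExample) come from the PR description; the rest is assumed.
interface ToolDefinition {
  name: string;
  qualifiedName: string;
  description: string;
  secrets: string[];
  parameters: unknown[];
  output: unknown;
  codeExample: string;
}

interface ToolkitData {
  id: string;
  label: string;
  tools: ToolDefinition[];
}

// A tool with the heavy per-tool fields removed.
type ToolSummary = Omit<ToolDefinition, "parameters" | "output" | "codeExample">;

interface ToolkitSummary extends Omit<ToolkitData, "tools"> {
  tools: ToolSummary[];
}

// Returns a copy of the toolkit whose tools keep only the light fields.
// The input object is not mutated.
function toToolkitSummary(data: ToolkitData): ToolkitSummary {
  return {
    ...data,
    tools: data.tools.map(
      ({ parameters, output, codeExample, ...summary }) => summary
    ),
  };
}
```

Because the summary is built with a rest pattern rather than by deleting keys, any light field added to the tool type later would carry over to the summary automatically in this sketch.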

Result (measured on a prod build): github-api 10.3 MB → 1.40 MB; posthog 1.50 MB; all 20 pages now < 2 MB. Verified in a browser: clicking "Show details" lazy-loads and renders the parameters table; deep links auto-expand and scroll; sidebar search still filters.
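
The "one cached fetch per page" behavior mentioned above can be sketched as a small promise cache. This is an illustration under assumptions, not the useToolDetail hook itself: the detail shape and the injected fetcher are hypothetical, and the name loadToolkitDetail is borrowed from the review thread further down.

```typescript
// Hypothetical detail shape; the real route returns the full toolkit data.
interface ToolkitDetail {
  id: string;
  tools: { name: string; parameters: unknown[] }[];
}

// The fetcher is injected so the cache logic stays independent of fetch().
type DetailFetcher = (toolkitId: string) => Promise<ToolkitDetail>;

function createToolkitDetailLoader(fetchDetail: DetailFetcher) {
  // One promise per toolkit: concurrent expands share a single request.
  const requests = new Map<string, Promise<ToolkitDetail>>();

  return function loadToolkitDetail(toolkitId: string): Promise<ToolkitDetail> {
    const cached = requests.get(toolkitId);
    if (cached) return cached;

    const request = fetchDetail(toolkitId);
    requests.set(toolkitId, request);
    // Evict failed requests so a later expand can retry.
    request.catch(() => {
      requests.delete(toolkitId);
    });
    return request;
  };
}
```

In a browser the fetcher would presumably wrap a request to /api/toolkit-data/[toolkitId] and parse the JSON body. Caching the promise rather than the resolved value means that expanding several tools before the first response arrives still triggers only one request.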

2. Cloudflare email obfuscation (~16 URLs)

Cloudflare rewrites email-like text into /cdn-cgi/l/email-protection (a 404 for crawlers). New neutralize-emails.tsx inserts a zero-width <wbr> before each @ (invisible, copy-safe) in toolkit summaries + tool descriptions, so there's no contiguous match. 5 guide example emails reworded to placeholders. (Cloudflare ignores the <script> RSC payload and JSON API responses, so the lazy-loaded detail is safe automatically.)
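
As a rough illustration of the <wbr> approach described above: the real neutralize-emails.tsx is a React component, so the string-based version below is a simplified stand-in, and both function names are hypothetical.

```typescript
// Escape text for safe inclusion in HTML before adding markup of our own.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Insert a <wbr> before every "@" so the HTML source never contains a
// contiguous address-like run. <wbr> renders nothing and is not part of
// the copied text, so users still see and copy the original string.
function neutralizeEmailsToHtml(text: string): string {
  return escapeHtml(text).replace(/@/g, "<wbr>@");
}
```

For example, "ada@example.com" becomes "ada<wbr>@example.com" in the markup. Splitting on every "@" rather than on a full email regex is a deliberate simplification here: it also covers connection-string patterns such as user:pass@host without having to enumerate them.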

3. Canonical orphans (2)

generateMetadata now canonicalizes to the toolkit's own category + slug (new getToolkitCanonicalPath), so the wrong-category alias development/pagerduty-api (a docsLink/category data mismatch) points at the linked customer-support/pagerduty-api instead of self. Hidden toolkits (notion) emit robots: noindex.
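
The canonicalization rule above can be sketched as follows. The key point is that the canonical path is derived from the toolkit's own category and slug, never from the category segment of the URL that was requested. The entry shape, the routePrefix parameter, and the returned object (loosely modeled on the alternates.canonical and robots metadata fields in Next.js) are assumptions for illustration.

```typescript
// Hypothetical toolkit record; only category, slug and "hidden" matter here.
interface ToolkitEntry {
  slug: string;
  category: string;
  hidden?: boolean;
}

// Canonical path comes from the toolkit's own data, so an alias under the
// wrong category still points at the one true location.
function getToolkitCanonicalPath(toolkit: ToolkitEntry, routePrefix: string): string {
  return `${routePrefix}/${toolkit.category}/${toolkit.slug}`;
}

interface ToolkitMetadata {
  alternates: { canonical: string };
  robots?: { index: boolean };
}

// Builds the metadata a page-level generateMetadata could return.
function buildToolkitMetadata(toolkit: ToolkitEntry, routePrefix: string): ToolkitMetadata {
  const metadata: ToolkitMetadata = {
    alternates: { canonical: getToolkitCanonicalPath(toolkit, routePrefix) },
  };
  // Hidden toolkits stay reachable but are excluded from indexing.
  if (toolkit.hidden) metadata.robots = { index: false };
  return metadata;
}
```

With this shape, a request that arrives via development/pagerduty-api still yields a canonical ending in customer-support/pagerduty-api, because the requested category never enters the computation.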

Tests

- tests/page-size.test.ts: every toolkit's summary stays under budget and keeps no heavy fields.
- tests/neutralize-emails.test.tsx: the <wbr> neutralizer leaves no contiguous email in rendered output.
- Extended tests/integration-index-links.test.ts: no non-hidden toolkit canonicalizes to an orphan.
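
For a sense of what the page-size guard checks, here is a minimal sketch of a budget audit. It is not the contents of tests/page-size.test.ts: the helper names, the summary shape, and the budget passed in are all hypothetical, and the byte count assumes well-formed strings.

```typescript
// Heavy fields that must not appear in a summary (names from the PR text).
const HEAVY_FIELDS = ["parameters", "output", "codeExample"];

interface SummaryLike {
  id: string;
  tools: Record<string, unknown>[];
}

// UTF-8 size of a string, computed without runtime-specific APIs.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // surrogate pair encodes one 4-byte code point
      i++;
    } else bytes += 3;
  }
  return bytes;
}

// Returns human-readable problems; an empty array means the summary passes.
function auditSummary(summary: SummaryLike, budgetBytes: number): string[] {
  const problems: string[] = [];
  const size = utf8ByteLength(JSON.stringify(summary));
  if (size > budgetBytes) {
    problems.push(`${summary.id}: ${size} bytes exceeds budget of ${budgetBytes}`);
  }
  for (const tool of summary.tools) {
    for (const field of HEAVY_FIELDS) {
      if (field in tool) problems.push(`${summary.id}: tool still carries "${field}"`);
    }
  }
  return problems;
}
```

Measuring bytes rather than string length matters for toolkits whose descriptions contain non-ASCII text, since crawlers and audit tools measure the transferred payload.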

…s errors (MARTECH-17)

Clears the three remaining docs.arcade.dev Ahrefs Site Audit error classes.

Page size > 2 MB (20 pages): toolkit reference pages serialized the entire ToolkitData into the initial payload and server-rendered every tool. Now the server HTML carries only a crawlable summary (Available Tools table + names-only sidebar); per-tool detail (parameters/output/codeExample) is stripped via toToolkitSummary and lazy-fetched on expand from the existing /api/toolkit-data/[toolkitId] route. Tool sections + the scope picker render client-side after mount. github-api 10.3 MB -> 1.40 MB; all 20 pages now < 2 MB.

Cloudflare email obfuscation (~16 URLs): a <wbr>-based neutralizer keeps email/connection-string patterns out of contiguous server-HTML text (so Cloudflare cannot rewrite them into /cdn-cgi/l/email-protection 404s), applied to toolkit summaries + tool descriptions. 5 guide example emails reworded.

Canonical orphans (2 URLs): toolkit pages canonicalize to their own category + slug (so a wrong-category alias like development/pagerduty-api points at the linked customer-support page), and hidden toolkits (notion) emit robots noindex.

Tests: page-size budget + email-neutralizer unit tests; integration-index guard extended to assert no toolkit canonicalizes to an orphan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel · 2026-06-16T20:31:37Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
docs	Ready	Preview, Comment	Jun 18, 2026 12:34am

- Cache the full ToolDefinition the /api/toolkit-data response already returns instead of Pick-ing a ToolDetail subset and re-merging it with the summary on expand. Drops the ToolDetail type, the per-field map, and the spread merge; ToolSection just uses the fetched tool.
- Drop splitEmails() on the per-tool description: that section renders client-only (gated by sectionsMounted + expanded), so it's never in the server HTML Cloudflare scans. The SSR Available Tools table still neutralizes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

sdserranog · 2026-06-17T13:40:02Z

-      />
-      <ToolMetadataSection metadata={tool.metadata} />
-      <ToolDescriptionSection showDescription={showDescription} tool={tool} />
-      <ToolParametersSection showParameters={showParameters} tool={tool} />


Removing parameters/output here triggers ScopePicker's fallback branch for every toolkit. The copied "Selected tools" JSON loses parameters/output and also changes the name from the qualified name to the short name, which breaks downstream tool configs that expect the qualified name.

Suggestion: have "Copy selected" call loadToolkitDetail(toolkitId) and rebuild the full JSON on demand. If the detail isn't loaded yet, fall back, but emit name: t.qualifiedName ?? t.name so the identifier stays correct.

good catch — fixed in 2d49550. copy selected now lazy-fetches the full per-tool detail and keeps the qualified name, instead of dropping to the short-name fallback.
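
The fix discussed in this thread can be sketched as below. The shape of the copied "Selected tools" JSON and the type names are assumptions; the only detail taken from the thread is that the emitted name should be the qualified name whenever one exists, with or without loaded detail.

```typescript
// Hypothetical shapes for the summary-level tool and its loaded detail.
interface PickedTool {
  name: string;
  qualifiedName?: string;
}

interface PickedToolDetail extends PickedTool {
  parameters: unknown[];
  output: unknown;
}

// Builds the JSON for the selected tools. When detail is available the
// entries include parameters/output; otherwise only the name is emitted.
// In both branches the qualified name wins over the short name.
function buildSelectedToolsJson(
  selected: PickedTool[],
  detail?: PickedToolDetail[]
): string {
  const detailByName = new Map<string, PickedToolDetail>();
  for (const tool of detail ?? []) detailByName.set(tool.name, tool);

  const entries = selected.map((tool) => {
    const full = detailByName.get(tool.name);
    const name = tool.qualifiedName ?? tool.name;
    return full
      ? {
          name: full.qualifiedName ?? name,
          parameters: full.parameters,
          output: full.output,
        }
      : { name };
  });
  return JSON.stringify(entries, null, 2);
}
```

Keeping the name resolution identical in both branches is what prevents the regression described above, where the fallback silently switched identifiers.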

sdserranog · 2026-06-17T13:46:04Z

-    return <ToolkitPage data={data} />;
+    // Pass a summary (per-tool detail stripped) so the heavy fields never enter
+    // the initial Flight payload — detail is fetched on expand. See MARTECH-17.
+    return <ToolkitPage data={toToolkitSummary(data)} />;


nit: the agent markdown view is generated at the edge from the rendered HTML (no in-repo markdown route, copy-page-override.tsx just refetches with Accept: text/markdown). Stripping tool detail from SSR and making sections client-only means that markdown now loses parameters, output, and examples for toolkit pages. Classic SEO and llms.txt are unaffected.

Suggestion: add a server markdown route for these pages that builds markdown directly from the full ToolkitData, independent of the slimmed HTML.

addressed in 85e7153: added a data-derived markdown route (content-negotiated on /api/toolkit-data) and pointed copy-page-override at it for toolkit pages, so the agent/markdown view keeps params/output/examples. heads up: hitting the page URL directly with Accept: text/markdown still goes through the edge HTML to markdown path, covering that too would need a middleware rewrite.

…namic, idle)

Review feedback on PR #1023 (sdserranog):

- ScopePicker "Copy tools JSON" lazily fetches full per-tool detail so the copied JSON keeps parameters/output and always uses the qualified tool name. Previously, dropping detail from the summary forced the basic fallback, which emitted the short name (breaking downstream tool configs). Unifies the copy buttons via an optional async getText.
- Tool sections + scope picker render via next/dynamic({ ssr: false }) instead of a manual sectionsMounted flag.
- useToolDetail starts "idle" rather than reporting "loading" while collapsed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Toolkit pages render per-tool detail client-only, so the edge HTML→markdown "copy page" / agent view lost parameters, output and examples. Build the markdown straight from ToolkitData instead:

- toToolkitMarkdown() serializes the full toolkit (tools, parameter tables, output, scopes/secrets, example input).
- /api/toolkit-data/[toolkitId] content-negotiates: markdown for Accept: text/markdown, JSON otherwise.
- copy-page-override fetches that route for toolkit pages, falling back to the normal page fetch for static integration pages and other routes.

Known limit: external agents fetching the page URL directly with Accept: text/markdown still hit the Vercel edge HTML→markdown path; making that data-derived too would need a middleware rewrite, which can't distinguish dynamic toolkit pages from static integration pages without fs access.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
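
To illustrate the data-derived markdown described in this commit, here is a reduced sketch of a serializer. It covers only headings, a parameter table, and secrets; the type shapes and the exact markdown layout are assumptions and will differ from the real toToolkitMarkdown().

```typescript
// Hypothetical, trimmed-down data shapes for the sketch.
interface MdParameter {
  name: string;
  type: string;
  required: boolean;
  description: string;
}

interface MdTool {
  qualifiedName: string;
  description: string;
  parameters: MdParameter[];
  secrets: string[];
}

interface MdToolkit {
  label: string;
  tools: MdTool[];
}

// Pipes and newlines would break a markdown table row, so neutralize them.
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

function toToolkitMarkdown(toolkit: MdToolkit): string {
  const lines: string[] = [`# ${toolkit.label}`, ""];
  for (const tool of toolkit.tools) {
    lines.push(`## ${tool.qualifiedName}`, "", tool.description, "");
    if (tool.parameters.length > 0) {
      lines.push("| Parameter | Type | Required | Description |");
      lines.push("| --- | --- | --- | --- |");
      for (const p of tool.parameters) {
        const required = p.required ? "yes" : "no";
        lines.push(
          `| ${escapeCell(p.name)} | ${escapeCell(p.type)} | ${required} | ${escapeCell(p.description)} |`
        );
      }
      lines.push("");
    }
    if (tool.secrets.length > 0) {
      lines.push(`Secrets: ${tool.secrets.join(", ")}`, "");
    }
  }
  return lines.join("\n");
}
```

Building from the data instead of the rendered HTML means the markdown no longer depends on which sections happen to be server-rendered, which is the property this commit is after.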

sdserranog

This is great! 🚢

sdserranog · 2026-06-17T23:42:15Z

+    // client-only, so the "copy as markdown" / agent view builds full markdown
+    // straight from the data here instead of from the slimmed HTML.
+    if ((request.headers.get("accept") ?? "").includes("text/markdown")) {
+      return new NextResponse(toToolkitMarkdown(data), {


nit: the same URL returns JSON or markdown depending on Accept, but the response is cached public with no Vary: Accept, so a CDN can serve the wrong representation to a later request.
Fix: add Vary: Accept to the response headers on this route.

good catch — that would bite us behind a CDN. added Vary: Accept to the route's cache headers (covers both the JSON and markdown responses) in 6764530.
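
The negotiation and the Vary fix from this thread can be sketched as a pure function. The Accept check mirrors the diff excerpt above; everything else is an assumption, including the response shape, the max-age value, and the function names. The real route returns a NextResponse.

```typescript
// Plain-object response so the sketch runs without any framework.
interface NegotiatedResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Shared by both representations. "Vary: Accept" tells shared caches that
// the body depends on the request's Accept header. The max-age value is
// hypothetical; the thread only states that the response is public.
const SHARED_CACHE_HEADERS: Record<string, string> = {
  "Cache-Control": "public, max-age=300",
  Vary: "Accept",
};

function wantsMarkdown(acceptHeader: string | null): boolean {
  return (acceptHeader ?? "").includes("text/markdown");
}

function negotiateToolkitResponse(
  acceptHeader: string | null,
  data: unknown,
  renderMarkdown: (data: unknown) => string
): NegotiatedResponse {
  if (wantsMarkdown(acceptHeader)) {
    return {
      status: 200,
      headers: { ...SHARED_CACHE_HEADERS, "Content-Type": "text/markdown; charset=utf-8" },
      body: renderMarkdown(data),
    };
  }
  return {
    status: 200,
    headers: { ...SHARED_CACHE_HEADERS, "Content-Type": "application/json" },
    body: JSON.stringify(data),
  };
}
```

Putting Vary in the shared headers rather than on one branch matters: a cache that stored the JSON response without Vary could still serve it to a later markdown request.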

The route content-negotiates JSON vs. text/markdown on Accept but responds with Cache-Control: public. Without Vary: Accept, a shared cache/CDN could serve one representation to a request that asked for the other. Add it to the shared cache headers so both responses carry it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel Bot deployed to Preview June 16, 2026 20:34 View deployment

vercel Bot deployed to Preview June 16, 2026 20:56 View deployment

sdserranog reviewed Jun 17, 2026

Comment thread app/_components/toolkit-docs/components/toolkit-page.tsx Outdated

sdserranog reviewed Jun 17, 2026

Comment thread app/_components/toolkit-docs/components/use-toolkit-detail.ts Outdated

sdserranog reviewed Jun 17, 2026

Juan Ibarlucea and others added 2 commits June 17, 2026 15:15

vercel Bot deployed to Preview June 17, 2026 18:19 View deployment

juan-arcadedev requested a review from sdserranog June 17, 2026 18:45

sdserranog approved these changes Jun 17, 2026

sdserranog reviewed Jun 17, 2026

vercel Bot deployed to Preview June 18, 2026 00:34 View deployment

juan-arcadedev merged commit ec12c60 into main Jun 18, 2026
8 checks passed

juan-arcadedev merged 5 commits into main from juan/martech-17-clear-the-remaining-docsarcadedev-ahrefs-errors-page-size
