fix(seo): clear remaining docs Ahrefs errors — page size, email obfuscation, canonical orphans (MARTECH-17)#1023

Merged: juan-arcadedev merged 5 commits into main from juan/martech-17-clear-the-remaining-docsarcadedev-ahrefs-errors-page-size on Jun 18, 2026

@juan-arcadedev juan-arcadedev commented Jun 16, 2026

Contributor

Clears the three remaining docs.arcade.dev Ahrefs Site Audit error classes (the 36 error URLs at health 95), per MARTECH-17. Decisions this round: keep Cloudflare Email Obfuscation on + fix in-repo; shrink pages via lazy-loaded detail with no IA/URL change (no pagination).

1. Page size > 2 MB (20 pages)

Toolkit reference pages serialized the entire ToolkitData into the initial payload and server-rendered every tool. Now:

  • toToolkitSummary() strips the heavy per-tool fields (parameters/output/codeExample) from the client ToolkitPage payload.
  • Per-tool detail is lazy-fetched on expand from the existing /api/toolkit-data/[toolkitId] route (new useToolDetail hook, one cached fetch per page).
  • The per-tool sections and the scope picker render client-side after mount, so the server HTML carries only the crawlable summary (Available Tools table + names-only sidebar). The per-link secret icon (793 inline SVGs ≈ 404 KB on github-api) is dropped from the sidebar — secret info stays in the table's Secrets column and the expanded tool detail.
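
A minimal sketch of the summary step described above, assuming hypothetical shapes for ToolkitData and ToolDefinition (the real types are not shown in this PR, so the field lists here are illustrative). It only demonstrates the idea of dropping parameters/output/codeExample before the data reaches the client payload:

```typescript
// Hypothetical shapes: the real ToolkitData / ToolDefinition are not shown in the PR.
interface ToolDefinition {
  name: string;
  qualifiedName: string;
  description: string;
  secrets: string[];
  parameters?: unknown[];
  output?: unknown;
  codeExample?: unknown;
}

interface ToolkitData {
  id: string;
  label: string;
  tools: ToolDefinition[];
}

// The summary keeps everything the crawlable table and sidebar need.
type ToolSummary = Omit<ToolDefinition, "parameters" | "output" | "codeExample">;
type ToolkitSummary = Omit<ToolkitData, "tools"> & { tools: ToolSummary[] };

// Returns a copy without the heavy per-tool fields; the input is not mutated.
function toToolkitSummary(data: ToolkitData): ToolkitSummary {
  return {
    ...data,
    tools: data.tools.map(
      ({ parameters: _parameters, output: _output, codeExample: _codeExample, ...summary }) =>
        summary,
    ),
  };
}
```

Destructuring with a rest element keeps the function independent of whichever light fields the real type carries; only the three heavy fields are named.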

Result (measured on a prod build): github-api 10.3 MB → 1.40 MB; posthog 1.50 MB; all 20 pages now < 2 MB. Verified in a browser: clicking "Show details" lazy-loads and renders the parameters table; deep links auto-expand and scroll; sidebar search still filters.
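
The "one cached fetch per page" behaviour can be sketched as a promise cache keyed by toolkit id. This is an illustration under assumptions, not the PR's useToolDetail hook: createDetailLoader and the injected fetcher are hypothetical names, and the real hook presumably wraps something similar in React state.

```typescript
// Hypothetical helper: the fetcher is injected so the sketch runs without a network.
type DetailFetcher<T> = (toolkitId: string) => Promise<T>;

function createDetailLoader<T>(fetcher: DetailFetcher<T>) {
  // Cache the in-flight promise, not the resolved value, so concurrent
  // expands of different tools share a single request.
  const cache = new Map<string, Promise<T>>();

  return function loadToolkitDetail(toolkitId: string): Promise<T> {
    let pending = cache.get(toolkitId);
    if (!pending) {
      pending = fetcher(toolkitId).catch((error) => {
        // Drop failed requests from the cache so a later expand can retry.
        cache.delete(toolkitId);
        throw error;
      });
      cache.set(toolkitId, pending);
    }
    return pending;
  };
}
```

In the page, the fetcher would call the existing /api/toolkit-data/[toolkitId] route; every tool section on the page can then await the same promise.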

2. Cloudflare email obfuscation (~16 URLs)

Cloudflare rewrites email-like text into /cdn-cgi/l/email-protection (a 404 for crawlers). New neutralize-emails.tsx inserts a zero-width <wbr> before each @ (invisible, copy-safe) in toolkit summaries + tool descriptions, so there's no contiguous match. 5 guide example emails reworded to placeholders. (Cloudflare ignores the <script> RSC payload and JSON API responses, so the lazy-loaded detail is safe automatically.)
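
A string-level sketch of the neutralizer idea, assuming the output is inserted as HTML. The PR's neutralize-emails.tsx is a React component whose code is not shown here, so neutralizeEmails and escapeHtml below are hypothetical stand-ins that only illustrate breaking the contiguous match with <wbr>:

```typescript
// Escape text so it can be placed safely between the inserted <wbr> tags.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Insert <wbr> before every "@" so the server HTML never contains a
// contiguous email-like (or connection-string-like) run of text.
function neutralizeEmails(text: string): string {
  return text.split("@").map(escapeHtml).join("<wbr>@");
}
```

Because <wbr> renders nothing and is not part of the text content, copying the rendered text still yields the original address.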

3. Canonical orphans (2)

generateMetadata now canonicalizes to the toolkit's own category + slug (new getToolkitCanonicalPath), so the wrong-category alias development/pagerduty-api (a docsLink/category data mismatch) points at the linked customer-support/pagerduty-api instead of self. Hidden toolkits (notion) emit robots: noindex.
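
A rough sketch of the canonicalization rule, assuming a hypothetical ToolkitRecord shape and an empty route prefix (the real prefix and data model are not shown in this PR). The returned object mirrors the alternates.canonical and robots fields of Next.js metadata, but is typed locally so the sketch stands alone:

```typescript
// Hypothetical record shape; only the fields needed for the rule are modeled.
interface ToolkitRecord {
  slug: string;
  category: string;
  hidden?: boolean;
}

// Build the canonical path from the toolkit's own data, never from the
// category segment of the URL that happened to be requested.
function getToolkitCanonicalPath(toolkit: ToolkitRecord, basePath = ""): string {
  return `${basePath}/${toolkit.category}/${toolkit.slug}`;
}

interface ToolkitSeo {
  alternates: { canonical: string };
  robots?: { index: boolean };
}

function buildToolkitSeo(toolkit: ToolkitRecord): ToolkitSeo {
  const seo: ToolkitSeo = { alternates: { canonical: getToolkitCanonicalPath(toolkit) } };
  if (toolkit.hidden) {
    seo.robots = { index: false }; // hidden toolkits are served but not indexed
  }
  return seo;
}
```

With this rule, a request that arrives through the development alias still canonicalizes to the customer-support page, because the requested category never enters the computation.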

Tests

  • tests/page-size.test.ts — every toolkit's summary stays under budget and keeps no heavy fields.
  • tests/neutralize-emails.test.tsx — the <wbr> neutralizer leaves no contiguous email in rendered output.
  • Extended tests/integration-index-links.test.ts — no non-hidden toolkit canonicalizes to an orphan.

…s errors (MARTECH-17)

Clears the three remaining docs.arcade.dev Ahrefs Site Audit error classes.

Page size > 2 MB (20 pages): toolkit reference pages serialized the entire
ToolkitData into the initial payload and server-rendered every tool. Now the
server HTML carries only a crawlable summary (Available Tools table + names-only
sidebar); per-tool detail (parameters/output/codeExample) is stripped via
toToolkitSummary and lazy-fetched on expand from the existing
/api/toolkit-data/[toolkitId] route. Tool sections + the scope picker render
client-side after mount. github-api 10.3 MB -> 1.40 MB; all 20 pages now < 2 MB.

Cloudflare email obfuscation (~16 URLs): a <wbr>-based neutralizer keeps
email/connection-string patterns out of contiguous server-HTML text (so
Cloudflare cannot rewrite them into /cdn-cgi/l/email-protection 404s), applied to
toolkit summaries + tool descriptions. 5 guide example emails reworded.

Canonical orphans (2 URLs): toolkit pages canonicalize to their own
category + slug (so a wrong-category alias like development/pagerduty-api points
at the linked customer-support page), and hidden toolkits (notion) emit robots
noindex.

Tests: page-size budget + email-neutralizer unit tests; integration-index guard
extended to assert no toolkit canonicalizes to an orphan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel Bot commented Jun 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: docs | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Jun 18, 2026 12:34am

- Cache the full ToolDefinition the /api/toolkit-data response already returns
  instead of Pick-ing a ToolDetail subset and re-merging it with the summary on
  expand. Drops the ToolDetail type, the per-field map, and the spread merge;
  ToolSection just uses the fetched tool.
- Drop splitEmails() on the per-tool description: that section renders
  client-only (gated by sectionsMounted + expanded), so it's never in the server
  HTML Cloudflare scans. The SSR Available Tools table still neutralizes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
/>
<ToolMetadataSection metadata={tool.metadata} />
<ToolDescriptionSection showDescription={showDescription} tool={tool} />
<ToolParametersSection showParameters={showParameters} tool={tool} />

Contributor

Removing parameters/output here triggers ScopePicker's fallback branch for every toolkit. The copied "Selected tools" JSON loses parameters/output and also changes the name from the qualified name to the short name, which breaks downstream tool configs that expect the qualified name.

Suggestion: have "Copy selected" call loadToolkitDetail(toolkitId) and rebuild the full JSON on demand. If the detail isn't loaded yet, fall back, but emit name: t.qualifiedName ?? t.name so the identifier stays correct.

Contributor Author

good catch — fixed in 2d49550. copy selected now lazy-fetches the full per-tool detail and keeps the qualified name, instead of dropping to the short-name fallback.
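
The naming rule from this thread can be sketched as follows. The types and buildSelectedToolsJson are hypothetical (the ScopePicker code is not shown here); the point is only that the qualified name is used whether or not the lazy-loaded detail is available:

```typescript
// Hypothetical shapes for the picker's selection and the lazy-loaded detail.
interface SelectedTool {
  name: string;
  qualifiedName?: string;
}

interface FullTool extends SelectedTool {
  parameters: unknown[];
  output: unknown;
}

// Build the "Selected tools" JSON. When detail is available it is merged in;
// otherwise the entry degrades to a name-only record, still qualified.
function buildSelectedToolsJson(
  selected: SelectedTool[],
  detailByName?: Map<string, FullTool>,
): string {
  const entries = selected.map((tool) => {
    const name = tool.qualifiedName ?? tool.name;
    const full = detailByName?.get(name);
    return full ? { name, parameters: full.parameters, output: full.output } : { name };
  });
  return JSON.stringify(entries, null, 2);
}
```

In the merged fix the copy action awaits the detail fetch first, so the name-only branch should be the exception rather than the default.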

Comment thread app/_components/toolkit-docs/components/toolkit-page.tsx Outdated
Comment thread app/_components/toolkit-docs/components/use-toolkit-detail.ts Outdated
- return <ToolkitPage data={data} />;
+ // Pass a summary (per-tool detail stripped) so the heavy fields never enter
+ // the initial Flight payload — detail is fetched on expand. See MARTECH-17.
+ return <ToolkitPage data={toToolkitSummary(data)} />;

Contributor

nit: the agent markdown view is generated at the edge from the rendered HTML (there is no in-repo markdown route; copy-page-override.tsx just refetches with Accept: text/markdown). Stripping tool detail from SSR and making sections client-only means that markdown now loses parameters, output, and examples for toolkit pages. Classic SEO and llms.txt are unaffected.

Suggestion: add a server markdown route for these pages that builds markdown directly from the full ToolkitData, independent of the slimmed HTML.

@juan-arcadedev juan-arcadedev Jun 17, 2026

Contributor Author

addressed in 85e7153: added a data-derived markdown route (content-negotiated on /api/toolkit-data) and pointed copy-page-override at it for toolkit pages, so the agent/markdown view keeps params/output/examples. heads up: hitting the page URL directly with Accept: text/markdown still goes through the edge HTML-to-markdown path; covering that too would need a middleware rewrite.
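
A framework-free sketch of the content negotiation described here. ToolkitLike, negotiateToolkitResponse, and this toy toToolkitMarkdown are assumptions for illustration; the real serializer also covers parameter tables, output, scopes/secrets, and example input:

```typescript
// Hypothetical, trimmed-down data shape for the sketch.
interface ToolkitLike {
  label: string;
  tools: { qualifiedName: string; description: string }[];
}

// Toy serializer: one heading per tool, built from data rather than from HTML.
function toToolkitMarkdown(data: ToolkitLike): string {
  const lines: string[] = [`# ${data.label}`, ""];
  for (const tool of data.tools) {
    lines.push(`## ${tool.qualifiedName}`, "", tool.description, "");
  }
  return lines.join("\n");
}

// Pick the representation from the Accept header: markdown when the client
// asks for text/markdown, JSON otherwise (including a missing header).
function negotiateToolkitResponse(
  acceptHeader: string | null,
  data: ToolkitLike,
): { contentType: string; body: string } {
  if ((acceptHeader ?? "").includes("text/markdown")) {
    return { contentType: "text/markdown; charset=utf-8", body: toToolkitMarkdown(data) };
  }
  return { contentType: "application/json", body: JSON.stringify(data) };
}
```

A substring check is deliberately simple; it ignores q-values, which seems acceptable when the only non-default representation is markdown.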

Juan Ibarlucea and others added 2 commits June 17, 2026 15:15
…namic, idle)

Review feedback on PR #1023 (sdserranog):
- ScopePicker "Copy tools JSON" lazily fetches full per-tool detail so the
  copied JSON keeps parameters/output and always uses the qualified tool name.
  Previously, dropping detail from the summary forced the basic fallback, which
  emitted the short name (breaking downstream tool configs). Unifies the copy
  buttons via an optional async getText.
- Tool sections + scope picker render via next/dynamic({ ssr: false }) instead
  of a manual sectionsMounted flag.
- useToolDetail starts "idle" rather than reporting "loading" while collapsed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Toolkit pages render per-tool detail client-only, so the edge HTML→markdown
"copy page" / agent view lost parameters, output and examples. Build the
markdown straight from ToolkitData instead:
- toToolkitMarkdown() serializes the full toolkit (tools, parameter tables,
  output, scopes/secrets, example input).
- /api/toolkit-data/[toolkitId] content-negotiates: markdown for
  Accept: text/markdown, JSON otherwise.
- copy-page-override fetches that route for toolkit pages, falling back to the
  normal page fetch for static integration pages and other routes.

Known limit: external agents fetching the page URL directly with
Accept: text/markdown still hit the Vercel edge HTML→markdown path; making that
data-derived too would need a middleware rewrite, which can't distinguish
dynamic toolkit pages from static integration pages without fs access.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@sdserranog sdserranog left a comment

Contributor

This is great! 🚢

// client-only, so the "copy as markdown" / agent view builds full markdown
// straight from the data here instead of from the slimmed HTML.
if ((request.headers.get("accept") ?? "").includes("text/markdown")) {
return new NextResponse(toToolkitMarkdown(data), {

Contributor

nit: the same URL returns JSON or markdown depending on Accept, but the response is cached public with no Vary: Accept, so a CDN can serve the wrong representation to a later request.
Fix: add Vary: Accept to the response headers on this route.

Contributor Author

good catch — that would bite us behind a CDN. added Vary: Accept to the route's cache headers (covers both the JSON and markdown responses) in 6764530.
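
Why Vary: Accept matters can be shown with a deliberately simplified model of a shared cache's key. cacheKey below is a hypothetical illustration, not how any particular CDN is implemented, and the max-age value is an assumed placeholder (the thread only states that the response is cached public):

```typescript
// Shared headers for both representations; max-age is an assumed placeholder.
const TOOLKIT_DATA_CACHE_HEADERS: Record<string, string> = {
  "Cache-Control": "public, max-age=3600",
  Vary: "Accept",
};

// Simplified cache key: the URL plus the request's value for every header
// named in Vary. Request header names are expected in lowercase.
function cacheKey(
  url: string,
  requestHeaders: Record<string, string>,
  vary: string | undefined,
): string {
  const variedParts = (vary ?? "")
    .split(",")
    .map((header) => header.trim().toLowerCase())
    .filter((header) => header.length > 0)
    .sort()
    .map((header) => `${header}=${requestHeaders[header] ?? ""}`);
  return [url, ...variedParts].join("|");
}
```

Without Vary, a markdown request and a JSON request for the same URL collapse onto one key, so whichever response is cached first is served to both; with Vary: Accept they are stored separately.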

The route content-negotiates JSON vs. text/markdown on Accept but responds with
Cache-Control: public — without Vary: Accept a shared cache/CDN could serve one
representation to a request that asked for the other. Add it to the shared
cache headers so both responses carry it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@juan-arcadedev juan-arcadedev merged commit ec12c60 into main Jun 18, 2026
8 checks passed