fix(seo): clear remaining docs Ahrefs errors — page size, email obfuscation, canonical orphans (MARTECH-17)#1023
…s errors (MARTECH-17)

Clears the three remaining docs.arcade.dev Ahrefs Site Audit error classes.

Page size > 2 MB (20 pages): toolkit reference pages serialized the entire ToolkitData into the initial payload and server-rendered every tool. Now the server HTML carries only a crawlable summary (Available Tools table + names-only sidebar); per-tool detail (parameters/output/codeExample) is stripped via toToolkitSummary and lazy-fetched on expand from the existing /api/toolkit-data/[toolkitId] route. Tool sections + the scope picker render client-side after mount. github-api 10.3 MB -> 1.40 MB; all 20 pages now < 2 MB.

Cloudflare email obfuscation (~16 URLs): a <wbr>-based neutralizer keeps email/connection-string patterns out of contiguous server-HTML text (so Cloudflare cannot rewrite them into /cdn-cgi/l/email-protection 404s), applied to toolkit summaries + tool descriptions. 5 guide example emails reworded.

Canonical orphans (2 URLs): toolkit pages canonicalize to their own category + slug (so a wrong-category alias like development/pagerduty-api points at the linked customer-support page), and hidden toolkits (notion) emit robots noindex.

Tests: page-size budget + email-neutralizer unit tests; integration-index guard extended to assert no toolkit canonicalizes to an orphan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Cache the full ToolDefinition the /api/toolkit-data response already returns instead of Pick-ing a ToolDetail subset and re-merging it with the summary on expand. Drops the ToolDetail type, the per-field map, and the spread merge; ToolSection just uses the fetched tool.
- Drop splitEmails() on the per-tool description: that section renders client-only (gated by sectionsMounted + expanded), so it's never in the server HTML Cloudflare scans. The SSR Available Tools table still neutralizes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
/>
<ToolMetadataSection metadata={tool.metadata} />
<ToolDescriptionSection showDescription={showDescription} tool={tool} />
<ToolParametersSection showParameters={showParameters} tool={tool} />
Removing parameters/output here triggers ScopePicker's fallback branch for every toolkit. The copied "Selected tools" JSON loses parameters/output and also changes the name from the qualified name to the short name, which breaks downstream tool configs that expect the qualified name.
Suggestion: have "Copy selected" call loadToolkitDetail(toolkitId) and rebuild the full JSON on demand. If the detail isn't loaded yet, fall back, but emit the name: t.qualifiedName ?? t.name so the identifier stays correct.
good catch — fixed in 2d49550. copy selected now lazy-fetches the full per-tool detail and keeps the qualified name, instead of dropping to the short-name fallback.
| return <ToolkitPage data={data} />; | ||
| // Pass a summary (per-tool detail stripped) so the heavy fields never enter | ||
| // the initial Flight payload — detail is fetched on expand. See MARTECH-17. | ||
| return <ToolkitPage data={toToolkitSummary(data)} />; |
nit: the agent markdown view is generated at the edge from the rendered HTML (no in-repo markdown route, copy-page-override.tsx just refetches with Accept: text/markdown). Stripping tool detail from SSR and making sections client-only means that markdown now loses parameters, output, and examples for toolkit pages. Classic SEO and llms.txt are unaffected.
Suggestion: add a server markdown route for these pages that builds markdown directly from the full ToolkitData, independent of the slimmed HTML.
addressed in 85e7153: added a data-derived markdown route (content-negotiated on /api/toolkit-data) and pointed copy-page-override at it for toolkit pages, so the agent/markdown view keeps params/output/examples. heads up: hitting the page URL directly with Accept: text/markdown still goes through the edge HTML to markdown path, covering that too would need a middleware rewrite.
…namic, idle)

Review feedback on PR #1023 (sdserranog):

- ScopePicker "Copy tools JSON" lazily fetches full per-tool detail so the copied JSON keeps parameters/output and always uses the qualified tool name. Previously, dropping detail from the summary forced the basic fallback, which emitted the short name (breaking downstream tool configs). Unifies the copy buttons via an optional async getText.
- Tool sections + scope picker render via next/dynamic({ ssr: false }) instead of a manual sectionsMounted flag.
- useToolDetail starts "idle" rather than reporting "loading" while collapsed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
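A minimal sketch of the naming rule behind the "Copy tools JSON" fix, for illustration only. The types and the `buildSelectedToolsJson` helper are hypothetical and are not the PR's code; the network fetch is left out, so the function simply takes whatever detail has already been loaded. The one rule taken from the review is that the emitted name prefers the qualified name, with or without detail.

```typescript
// Hypothetical shapes; the repo's real types are not shown in this PR page.
interface ToolSummary {
  name: string;
  qualifiedName?: string;
}

interface LoadedTool extends ToolSummary {
  parameters: unknown[];
  output: unknown;
}

// Build the copied JSON for the selected tools.
// `loaded` is the lazily fetched detail keyed by short name, or undefined
// when the fetch has not completed (the fallback branch).
function buildSelectedToolsJson(
  selected: ToolSummary[],
  loaded?: Map<string, LoadedTool>,
): string {
  const tools = selected.map((t) => {
    const full = loaded?.get(t.name);
    // Prefer the qualified name in both branches so downstream configs
    // never receive the short name when a qualified one exists.
    const name = full?.qualifiedName ?? t.qualifiedName ?? t.name;
    return full
      ? { name, parameters: full.parameters, output: full.output }
      : { name };
  });
  return JSON.stringify(tools, null, 2);
}
```

Keeping the name resolution in one expression means the fallback branch cannot drift back to emitting the short name.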
Toolkit pages render per-tool detail client-only, so the edge HTML→markdown "copy page" / agent view lost parameters, output and examples. Build the markdown straight from ToolkitData instead:

- toToolkitMarkdown() serializes the full toolkit (tools, parameter tables, output, scopes/secrets, example input).
- /api/toolkit-data/[toolkitId] content-negotiates: markdown for Accept: text/markdown, JSON otherwise.
- copy-page-override fetches that route for toolkit pages, falling back to the normal page fetch for static integration pages and other routes.

Known limit: external agents fetching the page URL directly with Accept: text/markdown still hit the Vercel edge HTML→markdown path; making that data-derived too would need a middleware rewrite, which can't distinguish dynamic toolkit pages from static integration pages without fs access.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
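To make the data-derived markdown idea concrete, here is a rough sketch of serializing a toolkit to markdown with one parameter table per tool. It is not the PR's `toToolkitMarkdown()`: the `Toolkit`, `Tool`, and `ToolParameter` shapes and the `toolkitToMarkdown` name are assumptions, and output, scopes/secrets, and example input are omitted for brevity.

```typescript
// Hypothetical, simplified shapes for illustration.
interface ToolParameter {
  name: string;
  type: string;
  required: boolean;
  description: string;
}

interface Tool {
  qualifiedName: string;
  description: string;
  parameters: ToolParameter[];
}

interface Toolkit {
  label: string;
  tools: Tool[];
}

// Keep table cells on one line and stop "|" from splitting a cell.
const escapeCell = (s: string): string =>
  s.replace(/\|/g, "\\|").replace(/\n/g, " ");

// Serialize the toolkit: a title, then one section per tool with an
// optional parameter table.
function toolkitToMarkdown(toolkit: Toolkit): string {
  const lines: string[] = [`# ${toolkit.label}`, ""];
  for (const tool of toolkit.tools) {
    lines.push(`## ${tool.qualifiedName}`, "", tool.description, "");
    if (tool.parameters.length > 0) {
      lines.push(
        "| Parameter | Type | Required | Description |",
        "| --- | --- | --- | --- |",
      );
      for (const p of tool.parameters) {
        lines.push(
          `| ${escapeCell(p.name)} | ${escapeCell(p.type)} | ${p.required ? "yes" : "no"} | ${escapeCell(p.description)} |`,
        );
      }
      lines.push("");
    }
  }
  return lines.join("\n");
}
```

Because the markdown is built from the data rather than the rendered HTML, it does not depend on which sections the page happens to server-render.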
// client-only, so the "copy as markdown" / agent view builds full markdown
// straight from the data here instead of from the slimmed HTML.
if ((request.headers.get("accept") ?? "").includes("text/markdown")) {
return new NextResponse(toToolkitMarkdown(data), {
nit: the same URL returns JSON or markdown depending on Accept, but the response is cached public with no Vary: Accept, so a CDN can serve the wrong representation to a later request.
Fix: add Vary: Accept to the response headers on this route.
good catch — that would bite us behind a CDN. added Vary: Accept to the route's cache headers (covers both the JSON and markdown responses) in 6764530.
The route content-negotiates JSON vs. text/markdown on Accept but responds with Cache-Control: public — without Vary: Accept a shared cache/CDN could serve one representation to a request that asked for the other. Add it to the shared cache headers so both responses carry it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
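The caching hazard can be sketched without any framework. Below, `negotiate` mirrors the Accept check quoted in the diff, and `cacheKey` models how a shared cache that honors `Vary` would key its entries. Both function names, the exact content-type strings, and the header object are illustrative assumptions; the source states only `Cache-Control: public` and `Vary: Accept`.

```typescript
interface Representation {
  contentType: string;
  headers: Record<string, string>;
}

// Assumed shared headers; only these two values are named in the PR.
const SHARED_CACHE_HEADERS: Record<string, string> = {
  "Cache-Control": "public",
  Vary: "Accept",
};

// Pick markdown when the client asks for it, JSON otherwise.
// Both representations carry the same cache headers.
function negotiate(accept: string | null): Representation {
  const wantsMarkdown = (accept ?? "").includes("text/markdown");
  return {
    contentType: wantsMarkdown
      ? "text/markdown; charset=utf-8"
      : "application/json",
    headers: { ...SHARED_CACHE_HEADERS },
  };
}

// Model of a Vary-aware cache key: the URL plus the value of every
// request header named in Vary. `requestHeaders` uses lowercase keys.
function cacheKey(
  url: string,
  requestHeaders: Record<string, string>,
  vary: string,
): string {
  const parts = vary
    .split(",")
    .map((h) => h.trim().toLowerCase())
    .filter(Boolean)
    .map((h) => `${h}=${requestHeaders[h] ?? ""}`);
  return [url, ...parts].join("|");
}
```

With an empty `Vary`, the JSON and markdown requests collapse onto the same key, which is the wrong-representation case the review describes; with `Vary: Accept` they get distinct keys.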
Clears the three remaining docs.arcade.dev Ahrefs Site Audit error classes (the 36 error URLs at health 95), per MARTECH-17. Decisions this round: keep Cloudflare Email Obfuscation on + fix in-repo; shrink pages via lazy-loaded detail with no IA/URL change (no pagination).
1. Page size > 2 MB (20 pages)
Toolkit reference pages serialized the entire `ToolkitData` into the initial payload and server-rendered every tool. Now:

- `toToolkitSummary()` strips the heavy per-tool fields (parameters/output/codeExample) from the client `ToolkitPage` payload.
- Per-tool detail is lazy-fetched on expand from the existing `/api/toolkit-data/[toolkitId]` route (new `useToolDetail` hook, one cached fetch per page).

Result (measured on a prod build): github-api 10.3 MB → 1.40 MB; posthog 1.50 MB; all 20 pages now < 2 MB. Verified in a browser: clicking "Show details" lazy-loads and renders the parameters table; deep links auto-expand and scroll; sidebar search still filters.
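As an illustration of the stripping step, the sketch below drops the three heavy fields named above and keeps everything else. The type shapes and the `toToolkitSummarySketch` and `approxPayloadSize` names are assumptions for the example; the repo's real `ToolkitData` and `toToolkitSummary()` will differ in detail.

```typescript
// Hypothetical, simplified shapes.
interface ToolDefinition {
  name: string;
  qualifiedName: string;
  description: string;
  parameters?: unknown[];
  output?: unknown;
  codeExample?: unknown;
}

interface ToolkitData {
  id: string;
  label: string;
  tools: ToolDefinition[];
}

type ToolSummary = Omit<ToolDefinition, "parameters" | "output" | "codeExample">;
type ToolkitSummary = Omit<ToolkitData, "tools"> & { tools: ToolSummary[] };

// Remove the heavy per-tool fields; the rest (names, descriptions) is what
// a tools table and a names-only sidebar would need.
function toToolkitSummarySketch(data: ToolkitData): ToolkitSummary {
  return {
    ...data,
    tools: data.tools.map(
      ({ parameters: _parameters, output: _output, codeExample: _codeExample, ...summary }) => summary,
    ),
  };
}

// Rough size proxy: length of the JSON string in UTF-16 code units,
// not exact bytes on the wire.
function approxPayloadSize(value: unknown): number {
  return JSON.stringify(value).length;
}
```

A budget test in the spirit of `tests/page-size.test.ts` could then assert that `approxPayloadSize(summary)` stays under a chosen limit and that none of the three field names appears in the serialized summary.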
2. Cloudflare email obfuscation (~16 URLs)
Cloudflare rewrites email-like text into `/cdn-cgi/l/email-protection` (a 404 for crawlers). New `neutralize-emails.tsx` inserts a zero-width `<wbr>` before each `@` (invisible, copy-safe) in toolkit summaries + tool descriptions, so there's no contiguous match. 5 guide example emails reworded to placeholders. (Cloudflare ignores the `<script>` RSC payload and JSON API responses, so the lazy-loaded detail is safe automatically.)

3. Canonical orphans (2)
`generateMetadata` now canonicalizes to the toolkit's own category + slug (new `getToolkitCanonicalPath`), so the wrong-category alias `development/pagerduty-api` (a docsLink/category data mismatch) points at the linked `customer-support/pagerduty-api` instead of self. Hidden toolkits (notion) emit `robots: noindex`.

Tests
- `tests/page-size.test.ts`: every toolkit's summary stays under budget and keeps no heavy fields.
- `tests/neutralize-emails.test.tsx`: the `<wbr>` neutralizer leaves no contiguous email in rendered output.
- `tests/integration-index-links.test.ts`: no non-hidden toolkit canonicalizes to an orphan.
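The property the neutralizer test checks can be sketched on plain strings. This is not the contents of `neutralize-emails.tsx`, which renders React nodes: `splitAtAtSigns`, `neutralizeToHtml`, and the `EMAIL_LIKE` pattern are hypothetical, the pattern is only a rough stand-in for whatever Cloudflare actually matches, and the string version does no HTML escaping.

```typescript
// Rough email-like pattern used only for the check below. Cloudflare's real
// matcher is not documented here, so treat this as an approximation.
const EMAIL_LIKE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// Split before every "@" so each later segment starts with "@".
// A lookahead keeps the "@" in the output instead of consuming it.
function splitAtAtSigns(text: string): string[] {
  return text.split(/(?=@)/);
}

// Join the segments with <wbr>. The tag contributes no text, so the visible
// and copied string is unchanged, but the markup no longer contains a
// contiguous "local@domain" run.
function neutralizeToHtml(text: string): string {
  return splitAtAtSigns(text).join("<wbr>");
}
```

For example, `neutralizeToHtml("Contact support@example.com")` yields `Contact support<wbr>@example.com`, which the pattern above no longer matches; text without an `@` passes through unchanged.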