ci: Add PR previews via Codespaces #5264
Recovers the codespaces preview work from the pr-codespace branch and updates it to the current repo: mise-pinned node 24 / pnpm 11, the current realm layout (base/catalog/skills/openrouter), corrected prerender-manager (4222) and worker-manager (4210) ports, and the RESOLVED_OPENROUTER_REALM_URL host build var. Postgres now uses the repo's own infra:ensure-pg (boxel-pg) via docker-in-docker rather than a separate compose service. The realm server is launched by hand to run plain HTTP (TLS terminates at the GitHub port-forwarding edge), bypassing the mandatory local dev cert, and with REALM_SERVER_SKIP_BOOT_INDEX so readiness doesn't block on a from-scratch index that would need a prerender host the Codespace doesn't run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
workflow_dispatch can only be triggered for workflows on the default branch, so the preview couldn't be exercised before merge. Switch to a push trigger on the preview branch: start-services.sh records the Codespace name + forwarding domain in .devcontainer/codespace-target.env and commits/pushes it, and the workflow reads that file to derive the forwarded backend URLs. The Codespace's startup push triggers the first build and later code pushes rebuild against the same backend. A guard skips the build when no target file is present, and workflow_dispatch remains as a manual fallback (now needing only the Codespace name). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
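As a rough illustration of how forwarded backend URLs can be derived from the Codespace name and forwarding domain, here is a minimal sketch. `forwarded_url` is a hypothetical helper, not the code in start-services.sh or the workflow; it assumes the `https://<name>-<port>.<domain>` shape of Codespaces forwarded addresses and falls back to `app.github.dev` when the domain variable is unset.

```sh
# Sketch only: forwarded_url is a hypothetical helper name.
# CODESPACE_NAME and GITHUB_CODESPACES_PORT_FORWARDING_DOMAIN are set by
# Codespaces; the app.github.dev fallback is an assumption for illustration.
forwarded_url() {
  port=$1
  name=${CODESPACE_NAME:?CODESPACE_NAME is not set}
  domain=${GITHUB_CODESPACES_PORT_FORWARDING_DOMAIN:-app.github.dev}
  # Forwarded ports are exposed as https://<name>-<port>.<domain>
  printf 'https://%s-%s.%s\n' "$name" "$port" "$domain"
}
```

For example, `forwarded_url 4201` would give the public realm URL and `forwarded_url 8008` the public Matrix URL for the current Codespace.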
Preview deployments

Host Test Results: 1 files ±0, 1 suites ±0, 1h 44m 12s ⏱️ +3m 3s
Results for commit d379c80. ± Comparison against earlier commit f958758.

Realm Server Test Results: 1 files ±0, 1 suites ±0, 12m 38s ⏱️ +6s
Results for commit d379c80. ± Comparison against earlier commit f958758.
The floating javascript-node:24 tag now resolves to Debian trixie, where the docker-in-docker feature's default Moby packages are unavailable and container creation fails before any setup runs. Pin the base image to bookworm and set the feature's moby option to false (installs Docker CE from Docker's official repo) per the feature's own error guidance. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
start-services.sh blocks on the realm server to keep its child services alive, but as a postStartCommand that never returns it wedges the Codespaces agent — SSH never comes up and the Codespace sits unusable in "Available". Launch it detached via nohup from postStartCommand (logging to /tmp/start-services.log) so the lifecycle hook returns immediately while the services keep running. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
postCreate runs as the 'node' user but the Dockerfile's mise install ran as root, leaving the binary under /root where node can't see it (exit 127). Install mise to /usr/local/bin (on PATH for every user) and call it unqualified from the setup/start scripts. Also grant the Codespace read access to cardstack/boxel-catalog and cardstack/boxel-skills so setup.sh's catalog:setup / skills:setup clones authenticate; a Codespace token only reaches its own repo by default. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codespaces Preview
The catalog/skills :setup scripts attempt an SSH clone first, which hangs on an interactive host-key prompt in the non-interactive postCreate context (a Codespace has an HTTPS token credential helper but no SSH key). Rewrite git@github.com: URLs to https:// before the clones so they authenticate with the token instead of blocking. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
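One way to express the SSH-to-HTTPS rewrite is git's `url.<base>.insteadOf` setting. The sketch below is an assumption about how it could be done, not necessarily what setup.sh does; `use_https_for_github` is a hypothetical function name.

```sh
# Sketch only: rewrite SSH-style GitHub remotes to HTTPS for this user, so
# clones go through the HTTPS token credential helper instead of SSH.
use_https_for_github() {
  # Any remote starting with git@github.com: is fetched from https://github.com/
  git config --global url."https://github.com/".insteadOf "git@github.com:"
}
```

Calling `use_https_for_github` before catalog:setup / skills:setup should be enough; `git config --global --get-regexp '^url\.'` shows the resulting rule.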
The hand-rolled realm-server launch omitted GRAFANA_SECRET, which main.ts treats as required (process.exit(-1)), so the server died at startup and readiness never passed. Set it (and LOW_CREDIT_THRESHOLD) to match the canonical mise-tasks/services/realm-server invocation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
moby:false routed the feature through Docker CE, whose buildx v0.35.0 artifact download 404s and fails the build. It was only added to dodge the Debian trixie issue, which the bookworm pin already fixes — so the default Moby path works now. Drop moby:false and disable the buildx install (the Codespace only runs containers, never builds images). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s render The realm server hard-requires a reachable host: main.ts fetches HOST_URL at startup and process.exit(-2)s if it can't, and the prerenderer renders cards against it. The Codespace ran no host, so the realm server failed and prerendering couldn't launch Chrome. Run the host (vite) on http://localhost:4200 with the public Codespace realm URLs, install the Chromium shared libraries puppeteer's bundled Chrome needs, set HOST_URL so the realm distURL and prerender target the local http host, and start the host first — waiting for it before the prerender and realm server. Bump the machine to 16 GB for the added load. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A plain backgrounded postStartCommand (nohup &) was reaped after the hook returned — the services never ran and the log stayed empty. Run it under setsid in its own session (stdin from /dev/null) so a process-group cleanup of the lifecycle hook doesn't take it down. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
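A minimal sketch of that detached launch, assuming a Linux container with `setsid` from util-linux available. `launch_detached` is a hypothetical helper name, and the log path and script path in the usage line are illustrative.

```sh
# Sketch only: start a command in its own session so that cleanup of the
# lifecycle hook's process group does not take it down.
# Usage: launch_detached LOGFILE COMMAND [ARGS...]
launch_detached() {
  log=$1
  shift
  # setsid: new session; nohup: ignore SIGHUP; stdin from /dev/null so the
  # child never holds the hook's terminal; output appended to the log.
  setsid nohup "$@" </dev/null >>"$log" 2>&1 &
}
```

A postStartCommand could then call something like `launch_detached /tmp/start-services.log bash .devcontainer/start-services.sh` and return immediately.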
The prerender pipeline is HTTPS-only: it probes https://localhost:4200 and passes --ignore-certificate-errors. Running the vite host over plain HTTP produced ERR_SSL_PROTOCOL_ERROR, and the http overrides for HOST_URL / REALM_BASE_URL never reached the mise services anyway (mise re-sources env-vars.sh without the ambient exports, so they kept the https defaults). Generate a self-signed cert at the path env-vars.sh probes, so vite, the realm server and the prerender all speak HTTPS consistently; trust it in Node via NODE_EXTRA_CA_CERTS. Drop the http overrides and probe readiness with https + -k. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
vite dev's multi-minute cold-start compile on the constrained Codespace blew the readiness timeouts: /_standby navigations timed out, so the prerender manager never registered a worker, so the worker manager never started, so the realm server's hardcoded 30s waitForWorkerManager failed and it stopped. Build the host once into a static dist in setup.sh (public Codespace URLs baked in) and serve it with vite preview, which starts in seconds. Reorder start-services so the prerender precedes the worker and the realm server launches only after the worker manager reports ready, so its 30s internal wait succeeds immediately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Successfully-built containers had no SSH server, so `gh codespace ssh` only ever connected to failed/recovery containers. Add the sshd devcontainer feature so the running backend can be reached and driven over SSH. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The local host build was costly and fragile (dev build's /_standby loaded too slowly for the prerender's 30s probe; the production build errored and serve-dist 404'd). The codespaces-preview workflow already builds a production host with this Codespace's URLs and deploys it to S3/CloudFront, which loads /_standby in ~0.15s. Drop the local build from setup.sh. start-services.sh now records + pushes the target file up front (triggering the CI rebuild), then points the realm server's distURL (HOST_URL) and the prerenderer (BOXEL_HOST_URL) at the S3 host, waiting for it to be live before starting them. Realm boots against the already-deployed S3 host; the fresh rebuild for this Codespace lands in the background. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
serverURL defaults to https://localhost:4201, and the realm injects it into the host page as realmServerURL, so a browser visiting the realm is sent to localhost. Pass --serverURL=<public Codespace URL> so the served host app talks to the forwarded realm instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ropped A fresh Codespace's target-file push was rejected whenever origin had advanced, so the S3 host stayed built for an earlier Codespace (dead Matrix/icons URLs). Rebase onto origin before writing + committing the target file so the push fast-forwards and the host rebuilds for the live Codespace. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The realm server already rewrites the host's Ember config at serve time (serve-index.ts: matrixURL, matrixServerName, realmServerURL, resolved realm URLs), so the preview's entry point should be the realm URL: it injects this Codespace's endpoints, and no per-Codespace host rebuild is needed. Two fixes so a reviewer can actually log in:
- serve-index injects matrixClient's backend URL (localhost in a Codespace, unreachable from a browser). Add a RESOLVED_MATRIX_URL env override for the browser-facing Matrix URL, mirroring how serverURL is already separate from the local bind. No-op when unset (prod/staging).
- start-services.sh sets RESOLVED_MATRIX_URL + MATRIX_SERVER_NAME=localhost on the realm, and seeds the dev user/password on this Codespace's Synapse so a reviewer can log in to the preview.
The banner now points reviewers at the realm URL with user/password.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rebuild) The realm server injects this Codespace's endpoints into the host config at serve time, so the host build no longer needs Codespace-specific URLs baked in. Build it generically once per code change and deploy to S3; the realm fetches that bundle as distURL and serves it config-injected. Removes the target-file push, the rebase-before-push divergence handling, and the codespace-target.env file — the whole per-Codespace rebuild dance that kept the S3 host pinned to a stale/dead Codespace. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Visiting the forwarded realm URL bounced to https://localhost:4201: GitHub's port forwarding terminates TLS at its edge and forwards plain HTTP to the backend, but the realm served HTTPS and 308-redirected the plain-HTTP request to its own https bind address. Drop the self-signed cert (the prerender uses the public S3 host now, so nothing local needs TLS): no cert -> realm serves plain HTTP -> no redirect, GitHub's edge provides the public HTTPS. Force REALM_BASE_URL + Matrix/icons to http so the worker/prerender dial the plain-HTTP realm, and probe readiness over http. serverURL/toUrls stay the public URLs, so the realm still injects public addresses into the served host page. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Codespace preview serves the host app from the realm server's own origin, but rewrote asset URLs to the S3 preview bucket. ES-module scripts are fetched in CORS mode, and the preview CloudFront sends no Access-Control-Allow-Origin, so every JS/CSS load was blocked. Point the host app's asset URLs at the realm origin (ASSETS_URL_OVERRIDE) so they're same-origin, and have the realm proxy /assets, /@embroider and the favicons through to the actual bundle (HOST_URL). Adds proxyAssetPaths and a hostDistURL upstream that defaults to assetsURL, so normal deployments are unaffected (there the HTML points assets straight at the host CDN and these paths are never requested from the realm). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ernally The reviewer's Matrix login 401'd because port 8008 was private: a private GitHub-forwarded port auth-gates cross-origin XHR at the edge (302/401). The realm (4201), icons (4206) and Matrix (8008) are cross-origin to each other, so all three must be public. This can't be done from inside the Codespace — the ambient GITHUB_TOKEN lacks the `codespace` scope, so `gh codespace ports visibility` exits 4. (The earlier inline call never actually worked.) Make the attempt best-effort and, on failure, print clear instructions to set the ports public from the VS Code Ports panel (or via gh from outside); visibility persists per-codespace, so it's a one-time step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
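The best-effort attempt with a manual fallback might look roughly like the sketch below. `make_ports_public` is a hypothetical function name and the message wording is illustrative; the `gh codespace ports visibility PORT:VISIBILITY` form is the standard GitHub CLI syntax.

```sh
# Sketch only: try to make the three cross-origin ports public, and fall back
# to printed instructions when the token lacks the `codespace` scope.
make_ports_public() {
  if gh codespace ports visibility 4201:public 4206:public 8008:public \
       -c "${CODESPACE_NAME:-}" >/dev/null 2>&1; then
    echo "Ports 4201, 4206 and 8008 are now public."
  else
    echo "Could not change port visibility from inside the Codespace." >&2
    echo "Set ports 4201, 4206 and 8008 to Public in the VS Code Ports panel," >&2
    echo "or run 'gh codespace ports visibility' from outside the Codespace." >&2
  fi
  # Best-effort: never fail the caller, even under `set -e`.
  return 0
}
```

Because the function always returns 0, a script running under `set -euo pipefail` keeps going when the `gh` call is rejected.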
/_server-session 500'd with "Unable to login to matrix as user realm_server: 403 M_FORBIDDEN". The realm server authenticates to Synapse as realm_server (password derived from REALM_SECRET_SEED) to mint session tokens, but start-services.sh only registered the `user` test account, so realm_server never existed. Run register-realm-users with the matching REALM_SECRET_SEED before launching the realm so realm_server and the per-realm users exist with the seed-derived password. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All card/module requests 404'd while login worked. Realm content is matched by full URL via fullRequestURL/virtualNetwork, which derives the scheme from x-forwarded-proto. GitHub Codespaces port forwarding terminates TLS at the edge and forwards plain HTTP with no x-forwarded-proto, so the realm built http:// URLs that matched none of the realms registered under their https identities → 404. Path-based control endpoints (login, _server-session) were unaffected, which is why login worked but cards didn't. Add REALM_SERVER_ASSUME_HTTPS (set in the Codespace) to assert the external scheme via a front middleware. Off by default; production load balancers set x-forwarded-proto themselves. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A re-run of start-services.sh would hit EADDRINUSE on 4201 because the previous realm was still bound, exit, and leave the old (pre-change) realm serving — silently masking the change under test. Kill the node service chain at the top before starting; Docker Postgres/Synapse are re-asserted idempotently. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The https-assert alone wasn't enough: GitHub's tunnel forwards to the backend with Host: localhost:<port>, so the realm built https://localhost/… URLs that still matched no realm. Pin the Host to the realm's own serverURL alongside x-forwarded-proto so realm-content URL resolution matches the https://<codespace> identities. Still gated by REALM_SERVER_ASSUME_HTTPS (off in production). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bootstrap realms (catalog, skills, openrouter) 403'd: migrations seed realm_user_permissions with hardcoded https://localhost:4201/… URLs, but the realm is served under the codespace's forwarded URL, so the *:read rows matched no realm. Env mode already solves this by rewriting those rows to its Traefik hostname (fixupEnvironmentModePermissions). Generalize that to honor an explicit REALM_SERVER_PERMISSIONS_BASE_URL override and set it (to the forwarded realm URL) in the Codespace, so both hosting modes share one rewrite path. No-op in production (neither var set). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
With REALM_SERVER_SKIP_BOOT_INDEX the bootstrap realms mounted but never indexed, so card instances (the AI system card, skills, catalog cards) 404'd from getCard even though their source files exist. Drop the skip so base/catalog/skills/openrouter full-index on boot. Also fix the readiness probe: _readiness-check is per-realm, so probe /base/_readiness-check (the server root 404s); ASSUME_HTTPS rewrites the localhost probe to the realm URL. Longer timeout covers the from-scratch index. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
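To make the probe change concrete, here is a small sketch of building a per-realm readiness URL. `readiness_url` and `probe_realm` are hypothetical helpers, and the curl options are an assumption rather than the script's actual flags.

```sh
# Sketch only: _readiness-check lives under each realm path, not the server root.
# Usage: readiness_url SERVER_BASE_URL REALM_NAME
readiness_url() {
  # ${1%/} strips one trailing slash so the result has no double slash.
  printf '%s/%s/_readiness-check\n' "${1%/}" "$2"
}

# Returns 0 once the realm answers the probe with a success status.
probe_realm() {
  curl -fsS -o /dev/null --max-time 10 "$(readiness_url "$1" "$2")"
}
```

With these, `probe_realm http://localhost:4201 base` targets `/base/_readiness-check` instead of the server root, which 404s.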
…ed URL Indexing failed on every realm because the worker/prerender mise tasks addressed realms at https://localhost:4201 (the standard-mode default) while the realm is mounted at the codespace's forwarded URL and serves plain HTTP — so base index hit a TLS error and the rest got the host SPA's HTML instead of JSON. mise's _.source re-evaluation ignores an ambient REALM_BASE_URL override, but CODESPACE_NAME does reach it, so set REALM_BASE_URL from it here — mirroring the env-mode branch's single consistent realm URL, with the GitHub edge in place of Traefik. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The teardown grepped `prerender-manager` (matches nothing — the process is `prerender/manager-server`) and the npm-script wrappers rather than the ts-node entry points, so stale prerender managers/workers survived a re-run. A leftover manager then held :4222 and the new one EADDRINUSE'd and died, so the worker's render dispatch failed and indexing stalled. Match the actual entry points (transpileOnly main/worker, manager-server, prerender-server) plus the wrappers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
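The mismatch is easy to reproduce with a pattern check. The sketch below uses a hypothetical pattern and hypothetical sample command lines built from the entry-point names mentioned above; the real teardown's pattern list may differ.

```sh
# Sketch only: an extended regex covering the prerender entry points.
STALE_SERVICE_PATTERN='prerender/(manager-server|prerender-server)'

# Returns 0 when a command line would be matched by the teardown pattern.
matches_stale_service() {
  printf '%s\n' "$1" | grep -Eq "$STALE_SERVICE_PATTERN"
}

kill_stale_services() {
  # pkill -f matches against the full command line. It exits 1 when nothing
  # matched, which must not abort a `set -e` script.
  pkill -f "$STALE_SERVICE_PATTERN" || true
}
```

A command line such as `ts-node --transpileOnly prerender/manager-server` matches this pattern, while the old `prerender-manager` string does not appear in it at all, which is why the earlier teardown left the manager running.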
Proxy Matrix and icons through the realm server so a reviewer's browser only talks to the one forwarded realm port instead of separate cross-origin Matrix (8008) and icons (4206) ports, each of which would need to be made public. Adds proxyRequest (a method+body+streaming reverse proxy), routes /_matrix, /.well-known/matrix and /@cardstack/boxel-icons to their localhost backends, and points the host's matrixURL/iconsURL at the realm origin. Gated by REALM_SERVER_PROXY_MATRIX_ICONS; dormant in normal deployments. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The worker and prerender address the realm at its forwarded https URL; they were looping back out through the GitHub edge, which needs the port public. Add a local TLS shim (env-mode's Traefik, rebuilt): /etc/hosts maps the forwarded realm hostname to 127.0.0.1 and a :443 proxy forwards to the plain-HTTP realm on :4201, so the worker/prerender reach it over loopback — no edge, no public port. Node (NODE_TLS_REJECT_UNAUTHORIZED) and Chrome (PUPPETEER_CHROME_ARGS) skip validation of the loopback self-signed cert. The browser is external and still uses the edge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…port" This reverts commit d403e8c.
postStartCommand fires on every container start, racing docker-in-docker. start-services.sh runs `set -euo pipefail`, so the first Docker command (Postgres/Synapse) aborted the whole start when the daemon wasn't up yet — leaving a fresh Codespace with no services (host briefly up, Synapse down, no indexing). Wait for `docker info` before touching Docker. Also harden the launch (nohup) and log to a suspend-surviving path for diagnosis. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
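Waiting for the daemon can be done with a small retry helper. This is a sketch under assumptions: `wait_until` is a hypothetical name, and the one-second poll interval and the timeout in the usage line are illustrative values.

```sh
# Sketch only: retry a command once per second until it succeeds or times out.
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
wait_until() {
  timeout_s=$1
  shift
  elapsed=0
  # A failing command in an `until` condition does not trip `set -e`.
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "timed out after ${timeout_s}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

Placing `wait_until 120 docker info` ahead of the first Postgres/Synapse command would keep `set -euo pipefail` from aborting the start while docker-in-docker is still coming up.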
No description provided.