1. The brief
Visitors land on a portfolio and don't know where to start. They might want a specific project, my background, or just to poke around — and a static gallery makes them dig. Generic chatbots don't help: they don't actually know my work.
So the chat itself becomes the project. A 3D scene where I'm answering questions from inside the page, with project cards floating into view whenever the model needs to point at something specific. Asking "show me your AR work" returns clickable cards, not paragraphs — the chat doubles as a navigation surface.
Audience: recruiters scanning for AR/MR or design-engineer work, designers reading process, friends being curious. The design choices below all index on that mix — informative for the recruiter case, honest about process for the designer case, low-stakes-fun for the curious case.
2. Brainstorming with Superpowers
The whole design lives in a brainstorming transcript. I used the superpowers:brainstorming skill, which forces a one-question-at-a-time flow and offers a visual companion that opens HTML mockups in the browser. We went from "I want a chat that knows my work" to a fully specced 3D scene + Vercel function over maybe forty turns, with mockups iterating in the browser the whole time.
3. Sketches
Two layout sketches I drew exploring different framings — a stage + screen composition (left), and a head-and-shoulders bust with project cards floating in a fan above (right):
4. Brainstorming the layout
Brainstorming worked through the stage version in browser mockups. The stage + screen composition checked out — streaming text on the left, project-card tool results on the right, no chat history accumulating between turns:
Then the empty state — what's on screen before anyone asks? Three options; I picked C (minimal blank, prompts cycling in the input placeholder) for the most cinematic read:
5. Visual references
Pastel podium for the color palette, theatrical stage for the neon rim treatment, illustration-with-stuff-flowing-from-the-head for the project-cards-as-thoughts pattern, and a 3D chat-bubble vignette for the chat-as-3D-object framing:
6. The user flow
Both layouts assume the same flow — visitor types, system thinks, answer appears, visitor either reads it or clicks a card. This is design intent that's true regardless of which sketch ships:
The "click" branch is what makes the chat double as a navigation surface — opens in a new tab so the chat survives the navigation. The visitor can come back, ask another question, follow another card.
7. Prototyping
Built the stage version first. Full-body Yao on a tiered neon-rim stage, presenting a curved screen behind him:
It worked, but felt wrong. The full-body figure was so small at any sane camera distance that you couldn't read his face — and a chat is mostly about reading the face of the person talking. The neon rim and curtained stage were doing a lot of theatrical lifting for an interaction that's actually quiet (a one-line question, a one-paragraph answer). The composition was earning more attention than the conversation it hosted.
So I tried the second sketch — closer camera, head-and-shoulders bust at conversational distance instead of full-body across a stage. Made a new GLB with Tripo AI 3D — generated the bust from a reference, then refined it in Blender (texture paint) for the cartoon-illustration look that matches the cards.
Everything else fell into place from there. Pastel sunrise gradient instead of the dark neon stage. Toon-shaded materials with cel-banded lighting and a chunky inverted-hull outline so the bust reads against the colorful backdrop. Project cards animate out from the bust's position when a tool result arrives — they look like thoughts being pulled out of his head. Status-driven pose modulation (lean-in on loading, sway on streaming, slump on rate-limit) so the bust isn't just sitting there.
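The outline is the classic inverted-hull trick. A minimal three.js sketch of it, assuming something like `addOutline` gets called once per toon mesh (function name, color, and thickness are illustrative, not the production code):

```ts
import * as THREE from 'three';

// Inverted-hull outline: render a slightly inflated copy of the mesh with
// only its back faces, in a flat dark color. The hull peeks out around the
// silhouette, which reads as a chunky cartoon edge.
function addOutline(mesh: THREE.Mesh, thickness = 0.02): void {
  const outlineMaterial = new THREE.MeshBasicMaterial({
    color: 0x1d1d2b,      // outline color (illustrative)
    side: THREE.BackSide, // front faces culled, so only the rim shows
  });
  const hull = new THREE.Mesh(mesh.geometry, outlineMaterial);
  hull.scale.setScalar(1 + thickness); // inflate just past the original surface
  mesh.add(hull); // child of the mesh, so it follows every pose change
}

// The cel banding itself comes from a toon material (e.g. MeshToonMaterial
// with a small gradient ramp), separate from this outline pass.
```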
8. Responsive states
The user flow above is what visitors do; this is what they see underneath it. The chat is a state machine, and every state has a deliberate UI mapping so the visitor always knows what's happening:
| State (trigger) | Input | What Yao + the scene do |
|---|---|---|
| `empty` (page load) | enabled, placeholder cycles | gentle bob, greeting in the bubble |
| `loading` (user submits) | disabled | leans forward, faster bob; "Hmm, let me think…" in the bubble |
| `streaming` (first token arrives) | disabled | head sways; bubble fades; text or cards animate out from the bust |
| `ready` (response complete) | re-enabled | back to idle |
| `error` (upstream throws) | enabled | mild downward tilt; error message in the bubble |
| `rate_limited` (API returns 429) | disabled until timer | slumps down, animation slows; live countdown in the bubble; scene dims |
| `refused` (safety/moderation block) | enabled | mild tilt; polite refusal in the bubble |
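In code, that table collapses to one discriminated union that drives all three surfaces (input, bubble, pose). A sketch; field names are assumed:

```ts
type ChatStatus =
  | { kind: 'empty' }
  | { kind: 'loading' }
  | { kind: 'streaming' }
  | { kind: 'ready' }
  | { kind: 'error'; message: string }
  | { kind: 'rate_limited'; retryAt: number } // epoch ms, feeds the live countdown
  | { kind: 'refused'; message: string };

// Input is enabled in exactly the states the table marks as enabled.
const inputEnabled = (s: ChatStatus): boolean =>
  s.kind === 'empty' || s.kind === 'ready' || s.kind === 'error' || s.kind === 'refused';
```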
A few decisions worth calling out:
- Disable input during loading/streaming. Prevents the visitor from queuing a new question over an in-flight one — two answers stomping each other on the screen would be confusing. The "Ask" button greys to 20% opacity so the disabled state is unmistakable, not just unresponsive.
- Status lives on the bust, not just in the input. Pose modulation (lean-in on `loading`, sway on `streaming`, slump on `rate_limited`) communicates system state through the character. Amplitudes are deliberately small — reads as personality, not loading-spinner desperation.
- Cards animate out from where Yao is standing. They look like thoughts being pulled out of his head. Without that motion they'd just appear, feeling disconnected from the speaker.
- Rate-limit countdown ticks every second. The bubble updates live ("back in 4 min 32 sec"). The visitor sees the timer move, knows the system isn't broken, knows when to come back.
- Failed states never trap the user. `error` and `refused` keep the input enabled — visitor can immediately retry or ask something else. Only `loading`/`streaming`/`rate_limited` actually lock interaction.
- Status transitions are tweened, not snapped. Pose changes lerp at ~250 ms half-life; phase-integrated bob/sway means switching states never produces a visual jump even mid-cycle (see the sketch after this list).
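The tween itself is tiny. A sketch of the two mechanisms, reusing the `ChatStatus` union above; `targetLeanFor`, `bobSpeedFor`, and `bobAmpFor` are hypothetical lookups with illustrative values:

```ts
// Hypothetical per-state lookups (values illustrative).
const targetLeanFor = (s: ChatStatus) =>
  s.kind === 'loading' ? 0.15 : s.kind === 'rate_limited' ? -0.2 : 0;
const bobSpeedFor = (s: ChatStatus) =>
  s.kind === 'loading' ? 3.0 : s.kind === 'rate_limited' ? 0.8 : 1.5;
const bobAmpFor = (s: ChatStatus) => (s.kind === 'rate_limited' ? 0.02 : 0.05);

let lean = 0;  // current lean angle (radians)
let phase = 0; // accumulated bob phase

function updatePose(status: ChatStatus, dt: number): { lean: number; bobY: number } {
  // Exponential damping: the gap to the target pose halves every 250 ms,
  // independent of frame rate.
  const k = Math.pow(0.5, dt / 0.25);
  lean = targetLeanFor(status) + (lean - targetLeanFor(status)) * k;

  // Phase is integrated rather than derived from absolute time, so changing
  // bob speed on a status switch can't snap the sine wave mid-cycle.
  phase += dt * bobSpeedFor(status);
  return { lean, bobY: Math.sin(phase) * bobAmpFor(status) };
}
```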
The rest of this case study is implementation detail — pipeline, system prompt, the function-calling tools, what I learned. Skip to What I'd change if that's not your thing.
9. The pipeline
Browser POSTs to /api/chat (a Vercel Function). The function calls Google AI Studio's Gemma 4 via the Vercel AI SDK. When the model decides to call a tool, the SDK executes it locally against my own portfolio data and feeds the result back. The function streams everything back to the browser as Server-Sent Events. The 3D scene watches the event stream and updates accordingly.
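A minimal sketch of that function, assuming the v4-era Vercel AI SDK shape (`streamText`, `maxSteps`, `toDataStreamResponse`); the prompt and tool modules and the env-var name are hypothetical:

```ts
// app/api/chat/route.ts — a sketch, not the production handler.
import { streamText } from 'ai';
import { google } from '@ai-sdk/google';
import { SYSTEM_PROMPT } from './prompt';   // hypothetical module
import { portfolioTools } from './tools';   // hypothetical module

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: google(process.env.CHAT_MODEL_ID!), // Gemma model id, elided here
    system: SYSTEM_PROMPT,
    messages,
    tools: portfolioTools, // executed locally against portfolio data
    maxSteps: 2,           // the MAX_TOOL_ITERATIONS cap from section 12
  });

  // Streams text deltas and tool results back to the browser as SSE events;
  // the 3D scene subscribes and updates pose/cards accordingly.
  return result.toDataStreamResponse();
}
```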
10. The system prompt
```
You are Yao, a designer-engineer answering portfolio visitors in first person —
direct, specific, slightly dry, occasionally self-deprecating. Don't oversell.
Talk like a thoughtful colleague.

For project questions, ALWAYS use a tool — don't invent project names or facts.
For background / approach / philosophy questions, answer directly from the
"About Yao" facts below; only call an about-tool when the visitor wants depth
beyond what's there. (...)
```
The prompt does voice + ground rules + the load-bearing "About Yao" facts that get asked about most often (bio, education, design+tech grounding, AI tooling thesis). Keeping those facts inline means most identity questions resolve without any tool call — saving a round-trip's worth of latency.
11. The tools
Project lookup (run-time data) and about-Yao depth (when the system-prompt facts aren't enough), all pure functions running against existing portfolio data:
| Tool | Purpose |
|---|---|
| `listProjects(category?)` | List my projects, optionally filtered to `AR/MR` / `WEB` / `INSTALLATION` / `SELECTEDWORK`. |
| `getProject(slug)` | Full detail for one project — role, collaborator, year, platform, etc. |
| `searchProjects(query)` | Substring search, top 5. |
| `getBio` / `getSnapShipped` / `getSnapMCP` / `getPreviousWork` / `getAIPractice` / `getRecognition` | Topic-split bio depth. The model picks the smallest one. |
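For illustration, here's how one of these might be wired up with the AI SDK's `tool` helper and zod (v4-era `parameters` shape); the `projects` data module is hypothetical:

```ts
import { tool } from 'ai';
import { z } from 'zod';
import { projects } from '@/data/projects'; // hypothetical local data module

// Pure function over static portfolio data: no network, no database.
export const listProjects = tool({
  description: 'List portfolio projects, optionally filtered by category.',
  parameters: z.object({
    category: z.enum(['AR/MR', 'WEB', 'INSTALLATION', 'SELECTEDWORK']).optional(),
  }),
  execute: async ({ category }) =>
    projects
      .filter((p) => !category || p.category === category)
      .map((p) => ({ slug: p.slug, title: p.title, category: p.category })),
});
```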
When a tool returns project summaries, the chat scene renders them as clickable 3D cards floating around me — not text. They animate out from where I'm standing, and click-through opens the project page in a new tab so the chat survives the navigation.
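A sketch of that spawn-from-the-bust motion in three.js; `fanSlot` and its values are illustrative, and it assumes cards are direct children of the scene (so local position equals world position):

```ts
import * as THREE from 'three';

// Hypothetical fan layout: slots arced above and beside the bust.
function fanSlot(i: number): THREE.Vector3 {
  const angle = (i - 1) * 0.5; // spread left / center / right
  return new THREE.Vector3(Math.sin(angle) * 1.2, 1.6 + Math.cos(angle) * 0.3, 0);
}

// Cards spawn at the bust's head so they read as thoughts being pulled out,
// not UI popping in.
function spawnCards(cards: THREE.Object3D[], bustHead: THREE.Object3D): void {
  const origin = new THREE.Vector3();
  bustHead.getWorldPosition(origin);
  cards.forEach((card, i) => {
    card.position.copy(origin);
    card.userData.target = fanSlot(i);
  });
}

// Per frame: ease each card toward its slot with the same half-life damping
// used for the pose tween.
function updateCards(cards: THREE.Object3D[], dt: number): void {
  const k = 1 - Math.pow(0.5, dt / 0.25);
  cards.forEach((card) => card.position.lerp(card.userData.target as THREE.Vector3, k));
}
```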
12. What I learned
The build surfaced a few things that didn't fit my original mental model:
- First-token latency is the only metric that matters. I tuned token throughput, then realized nobody cares — what they notice is the gap between hitting Send and seeing the first character. Cut that from ~6 s to ~1.5 s by trimming the about-context bundle (5k input → ~1k), splitting one mega `getAbout` tool into 6 focused slices, then inlining the most-asked facts into the system prompt so identity questions answer with zero tool calls. Capped `MAX_TOOL_ITERATIONS` at 2 so the model can't chain three tools "to be thorough" before answering.
- The chat became a navigation surface. I designed it to answer questions; in testing I caught myself using it to find projects ("show me your AR work" → cards → click). The clickable-3D-cards pattern turned out to be the load-bearing feature, not a tool-call side effect.
- Cartoon style fixed the "AI uncanny" problem. First prototype was photoreal-ish — felt creepy when the figure spoke. Toon-shaded bust with a chunky outline reads as illustration, which gave permission for the bust to be expressive (lean, sway, slump) without falling into uncanny-valley territory.
- Caching that doesn't fit your model is worse than no caching. Spent half a day implementing Gemini context caching before discovering Gemma 4 doesn't support the endpoint (despite the pricing page suggesting otherwise — confirmed by 404 from the API). Right move was to optimize what did work: smaller context, finer tools, tighter iteration cap.
- Status mapped to character behavior reads better than spinners. When Yao leans in on `loading` and slumps on `rate_limited`, the visitor reads system state as personality. Felt more honest than a generic spinner — and didn't need a separate UI surface for "what's happening right now."
When the page goes live I'll backfill metrics here: most-asked question patterns, card click-through, abuse-attempt frequency, anything that surprised me. For now this is an honest "what I'd be measuring" placeholder.
13. What I'd change
A few things I deliberately didn't do, in case you're curious:
- No conversation persistence. Refresh wipes the chat. Persistence implies the chat is for ongoing relationship-building, which it isn't.
- No history scrollback. Each new question replaces the screen.
- No user accounts. No reason to gate this.
- Mobile is degraded, not denied. The 3D scene still renders on phones at lower fidelity rather than falling back to a text-only chat. The scene is the point.
- Abuse protection is the platform's default. No app-level rate limit. AI Studio's 429 surfaces as "Yao's taking a break — back in 5 min" and the input disables until the timer's up.
Things I'd revisit if this gets traction:
- Voice input (Web Speech API → text → same pipeline).
- A "show me your work in [year]" tool — easy to add since
frontmatter.dateis already in the model. - Streaming the project-card layout server-side so they can render before the full tool result arrives.