1. The brief
Visitors land on a portfolio and don't know where to start. They might want a specific project, my background, or just to poke around — and a static gallery makes them dig. Generic chatbots don't help: they don't actually know my work.
So the chat itself becomes the project. A 3D scene where I'm answering questions from inside the page, with project cards floating into view whenever the model needs to point at something specific. Asking "show me your AR work" returns clickable cards, not paragraphs — the chat doubles as a navigation surface.
Audience: recruiters scanning for AR/MR or design-engineer work, designers reading process, friends being curious. The design choices below all index on that mix — informative for the recruiter case, honest about process for the designer case, low-stakes-fun for the curious case.
2. Brainstorming with Superpowers
The whole design lives in a brainstorming transcript. I used the superpowers:brainstorming skill, which forces a one-question-at-a-time flow and offers a visual companion that opens HTML mockups in the browser. We went from "I want a chat that knows my work" to a fully specced 3D scene + Vercel function over maybe forty turns, with mockups iterating in the browser the whole time.
3. Sketches
Two layout sketches I drew exploring different framings — a stage + screen composition (left), and a head-and-shoulders bust with project cards floating in a fan above (right):
4. Brainstorming the layout
Brainstorming worked through the stage version in browser mockups. The stage + screen composition checked out — streaming text on the left, project-card tool results on the right, no chat history accumulating between turns:
Then the empty state — what's on screen before anyone asks? Three options; I picked C (minimal blank, prompts cycling in the input placeholder) for the most cinematic read:
5. Visual references
Pastel podium for the color palette, theatrical stage for the neon rim treatment, illustration-with-stuff-flowing-from-the-head for the project-cards-as-thoughts pattern, and a 3D chat-bubble vignette for the chat-as-3D-object framing:
6. The user flow
Both layouts assume the same flow — visitor types, system thinks, answer appears, visitor either reads it or clicks a card. This is design intent that's true regardless of which sketch ships:
The "click" branch is what makes the chat double as a navigation surface — opens in a new tab so the chat survives the navigation. The visitor can come back, ask another question, follow another card.
7. Prototyping
Built the stage version first. Full-body Yao on a tiered neon-rim stage, presenting a curved screen behind him:
It worked, but felt wrong. The full-body figure was so small at any sane camera distance that you couldn't read his face — and a chat is mostly about reading the face of the person talking. The neon rim and curtained stage were doing a lot of theatrical lifting for an interaction that's actually quiet (a one-line question, a one-paragraph answer). The composition was earning more attention than the conversation it hosted.
So I tried the second sketch — closer camera, head-and-shoulders bust at conversational distance instead of full-body across a stage. Made a new GLB with Tripo AI 3D — generated the bust from a reference, then refined it in Blender (texture paint) for the cartoon-illustration look that matches the cards.
Everything else fell into place from there. Pastel sunrise gradient instead of the dark neon stage. Toon-shaded materials with cel-banded lighting and a chunky inverted-hull outline so the bust reads against the colorful backdrop. Project cards animate out from the bust's position when a tool result arrives — they look like thoughts being pulled out of his head. Status-driven pose modulation (lean-in on loading, sway on streaming, slump on rate-limit) so the bust isn't just sitting there.
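The outline is the classic inverted-hull trick. A minimal three.js sketch of it, assuming something like `addOutline` gets called once per toon mesh (function name, color, and thickness are illustrative, not the production code):

```ts
import * as THREE from 'three';

// Inverted-hull outline: render a slightly inflated copy of the mesh with
// only its back faces, in a flat dark color. The hull peeks out around the
// silhouette, which reads as a chunky cartoon edge.
function addOutline(mesh: THREE.Mesh, thickness = 0.02): void {
  const outlineMaterial = new THREE.MeshBasicMaterial({
    color: 0x1d1d2b,      // outline color (illustrative)
    side: THREE.BackSide, // front faces culled, so only the rim shows
  });
  const hull = new THREE.Mesh(mesh.geometry, outlineMaterial);
  hull.scale.setScalar(1 + thickness); // inflate just past the original surface
  mesh.add(hull); // child of the mesh, so it follows every pose change
}

// The cel banding itself comes from a toon material (e.g. MeshToonMaterial
// with a small gradient ramp), separate from this outline pass.
```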
8. Responsive states
The user flow above is what visitors do; this is what they see underneath it. The chat is a state machine, and every state has a deliberate UI mapping so the visitor always knows what's happening:
| State (trigger) | Input | What Yao + the scene do |
|---|---|---|
| `empty` (page load) | enabled, placeholder cycles | gentle bob, greeting in the bubble |
| `loading` (user submits) | disabled | leans forward, faster bob; "Hmm, let me think…" in the bubble |
| `streaming` (first token arrives) | disabled | head sways; bubble fades; text or cards animate out from the bust |
| `ready` (response complete) | re-enabled | back to idle |
| `error` (upstream throws) | enabled | mild downward tilt; error message in the bubble |
| `rate_limited` (API returns 429) | disabled until timer | slumps down, animation slows; live countdown in the bubble; scene dims |
| `refused` (safety/moderation block) | enabled | mild tilt; polite refusal in the bubble |
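In code, that table collapses to one discriminated union that drives all three surfaces (input, bubble, pose). A sketch; field names are assumed:

```ts
type ChatStatus =
  | { kind: 'empty' }
  | { kind: 'loading' }
  | { kind: 'streaming' }
  | { kind: 'ready' }
  | { kind: 'error'; message: string }
  | { kind: 'rate_limited'; retryAt: number } // epoch ms, feeds the live countdown
  | { kind: 'refused'; message: string };

// Input is enabled in exactly the states the table marks as enabled.
const inputEnabled = (s: ChatStatus): boolean =>
  s.kind === 'empty' || s.kind === 'ready' || s.kind === 'error' || s.kind === 'refused';
```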
A few decisions worth calling out:
- Disable input during loading/streaming. Prevents the visitor from queuing a new question over an in-flight one — two answers stomping each other on the screen would be confusing. The "Ask" button greys to 20% opacity so the disabled state is unmistakable, not just unresponsive.
- Status lives on the bust, not just in the input. Pose modulation (lean-in on `loading`, sway on `streaming`, slump on `rate_limited`) communicates system state through the character. Amplitudes are deliberately small — reads as personality, not loading-spinner desperation.
- Cards animate out from where Yao is standing. They look like thoughts being pulled out of his head. Without that motion they'd just appear, feeling disconnected from the speaker.
- Rate-limit countdown ticks every second. The bubble updates live ("back in 4 min 32 sec"). The visitor sees the timer move, knows the system isn't broken, knows when to come back.
- Failed states never trap the user. `error` and `refused` keep the input enabled — visitor can immediately retry or ask something else. Only `loading`/`streaming`/`rate_limited` actually lock interaction.
- Status transitions are tweened, not snapped. Pose changes lerp at ~250 ms half-life; phase-integrated bob/sway means switching states never produces a visual jump even mid-cycle (see the sketch after this list).
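The tween itself is tiny. A sketch of the two mechanisms, reusing the `ChatStatus` union above; `targetLeanFor`, `bobSpeedFor`, and `bobAmpFor` are hypothetical lookups with illustrative values:

```ts
// Hypothetical per-state lookups (values illustrative).
const targetLeanFor = (s: ChatStatus) =>
  s.kind === 'loading' ? 0.15 : s.kind === 'rate_limited' ? -0.2 : 0;
const bobSpeedFor = (s: ChatStatus) =>
  s.kind === 'loading' ? 3.0 : s.kind === 'rate_limited' ? 0.8 : 1.5;
const bobAmpFor = (s: ChatStatus) => (s.kind === 'rate_limited' ? 0.02 : 0.05);

let lean = 0;  // current lean angle (radians)
let phase = 0; // accumulated bob phase

function updatePose(status: ChatStatus, dt: number): { lean: number; bobY: number } {
  // Exponential damping: the gap to the target pose halves every 250 ms,
  // independent of frame rate.
  const k = Math.pow(0.5, dt / 0.25);
  lean = targetLeanFor(status) + (lean - targetLeanFor(status)) * k;

  // Phase is integrated rather than derived from absolute time, so changing
  // bob speed on a status switch can't snap the sine wave mid-cycle.
  phase += dt * bobSpeedFor(status);
  return { lean, bobY: Math.sin(phase) * bobAmpFor(status) };
}
```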
The rest of this case study is implementation detail — pipeline, system prompt, the function-calling tools, what I learned. Skip to What I'd change if that's not your thing.
9. The pipeline
Browser POSTs to /api/chat (a Vercel Function). The function calls Google AI Studio's Gemma 4 via the Vercel AI SDK. When the model decides to call a tool, the SDK executes it locally against my own portfolio data and feeds the result back. The function streams everything back to the browser as Server-Sent Events. The 3D scene watches the event stream and updates accordingly.
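A minimal sketch of that function, assuming the v4-era Vercel AI SDK shape (`streamText`, `maxSteps`, `toDataStreamResponse`); the prompt and tool modules and the env-var name are hypothetical:

```ts
// app/api/chat/route.ts — a sketch, not the production handler.
import { streamText } from 'ai';
import { google } from '@ai-sdk/google';
import { SYSTEM_PROMPT } from './prompt';   // hypothetical module
import { portfolioTools } from './tools';   // hypothetical module

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: google(process.env.CHAT_MODEL_ID!), // Gemma model id, elided here
    system: SYSTEM_PROMPT,
    messages,
    tools: portfolioTools, // executed locally against portfolio data
    maxSteps: 2,           // the MAX_TOOL_ITERATIONS cap from section 12
  });

  // Streams text deltas and tool results back to the browser as SSE events;
  // the 3D scene subscribes and updates pose/cards accordingly.
  return result.toDataStreamResponse();
}
```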
10. The system prompt
```
You are Yao, a designer-engineer answering portfolio visitors in first person —
direct, specific, slightly dry, occasionally self-deprecating. Don't oversell.
Talk like a thoughtful colleague.

For project questions, ALWAYS use a tool — don't invent project names or facts.
For background / approach / philosophy questions, answer directly from the
"About Yao" facts below; only call an about-tool when the visitor wants depth
beyond what's there. (...)
```
The prompt does voice + ground rules + the load-bearing "About Yao" facts that get asked about most often (bio, education, design+tech grounding, AI tooling thesis). Keeping those facts inline means most identity questions resolve without any tool call — saving a round-trip's worth of latency.
11. The tools
Project lookup (run-time data) and about-Yao depth (when the system-prompt facts aren't enough), all pure functions running against existing portfolio data:
| Tool | Purpose |
|---|---|
| `listProjects(category?)` | List my projects, optionally filtered to `AR/MR` / `WEB` / `INSTALLATION` / `SELECTEDWORK`. |
| `getProject(slug)` | Full detail for one project — role, collaborator, year, platform, etc. |
| `searchProjects(query)` | Substring search, top 5. |
| `getBio` / `getSnapShipped` / `getSnapMCP` / `getPreviousWork` / `getAIPractice` / `getRecognition` | Topic-split bio depth. The model picks the smallest one. |
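For illustration, here's how one of these might be wired up with the AI SDK's `tool` helper and zod (v4-era `parameters` shape); the `projects` data module is hypothetical:

```ts
import { tool } from 'ai';
import { z } from 'zod';
import { projects } from '@/data/projects'; // hypothetical local data module

// Pure function over static portfolio data: no network, no database.
export const listProjects = tool({
  description: 'List portfolio projects, optionally filtered by category.',
  parameters: z.object({
    category: z.enum(['AR/MR', 'WEB', 'INSTALLATION', 'SELECTEDWORK']).optional(),
  }),
  execute: async ({ category }) =>
    projects
      .filter((p) => !category || p.category === category)
      .map((p) => ({ slug: p.slug, title: p.title, category: p.category })),
});
```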
When a tool returns project summaries, the chat scene renders them as clickable 3D cards floating around me — not text. They animate out from where I'm standing, and click-through opens the project page in a new tab so the chat survives the navigation.
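A sketch of that spawn-from-the-bust motion in three.js; `fanSlot` and its values are illustrative, and it assumes cards are direct children of the scene (so local position equals world position):

```ts
import * as THREE from 'three';

// Hypothetical fan layout: slots arced above and beside the bust.
function fanSlot(i: number): THREE.Vector3 {
  const angle = (i - 1) * 0.5; // spread left / center / right
  return new THREE.Vector3(Math.sin(angle) * 1.2, 1.6 + Math.cos(angle) * 0.3, 0);
}

// Cards spawn at the bust's head so they read as thoughts being pulled out,
// not UI popping in.
function spawnCards(cards: THREE.Object3D[], bustHead: THREE.Object3D): void {
  const origin = new THREE.Vector3();
  bustHead.getWorldPosition(origin);
  cards.forEach((card, i) => {
    card.position.copy(origin);
    card.userData.target = fanSlot(i);
  });
}

// Per frame: ease each card toward its slot with the same half-life damping
// used for the pose tween.
function updateCards(cards: THREE.Object3D[], dt: number): void {
  const k = 1 - Math.pow(0.5, dt / 0.25);
  cards.forEach((card) => card.position.lerp(card.userData.target as THREE.Vector3, k));
}
```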
12. What I learned
The build surfaced a few things that didn't fit my original mental model:
- First-token latency is the only metric that matters. I tuned token throughput, then realized nobody cares — what they notice is the gap between hitting Send and seeing the first character. Cut that from ~6 s to ~1.5 s by trimming the about-context bundle (5k input → ~1k), splitting one mega `getAbout` tool into 6 focused slices, then inlining the most-asked facts into the system prompt so identity questions answer with zero tool calls. Capped `MAX_TOOL_ITERATIONS` at 2 so the model can't chain three tools "to be thorough" before answering.
- The chat became a navigation surface. I designed it to answer questions; in testing I caught myself using it to find projects ("show me your AR work" → cards → click). The clickable-3D-cards pattern turned out to be the load-bearing feature, not a tool-call side effect.
- Cartoon style fixed the "AI uncanny" problem. First prototype was photoreal-ish — felt creepy when the figure spoke. Toon-shaded bust with a chunky outline reads as illustration, which gave permission for the bust to be expressive (lean, sway, slump) without falling into uncanny-valley territory.
- Caching that doesn't fit your model is worse than no caching. Spent half a day implementing Gemini context caching before discovering Gemma 4 doesn't support the endpoint (despite the pricing page suggesting otherwise — confirmed by 404 from the API). Right move was to optimize what did work: smaller context, finer tools, tighter iteration cap.
- Status mapped to character behavior reads better than spinners. When Yao leans in on `loading` and slumps on `rate_limited`, the visitor reads system state as personality. Felt more honest than a generic spinner — and didn't need a separate UI surface for "what's happening right now."
When the page goes live I'll backfill metrics here: most-asked question patterns, card click-through, abuse-attempt frequency, anything that surprised me. For now this is an honest "what I'd be measuring" placeholder.
13. What I'd change
A few things I deliberately didn't do, in case you're curious:
- No conversation persistence. Refresh wipes the chat. Persistence implies the chat is for ongoing relationship-building, which it isn't.
- No history scrollback. Each new question replaces the screen.
- No user accounts. No reason to gate this.
- Mobile is degraded, not denied. The 3D scene still renders on phones at lower fidelity rather than falling back to a text-only chat. The scene is the point.
- Abuse protection is the platform's default. No app-level rate limit. AI Studio's 429 surfaces as "Yao's taking a break — back in 5 min" and the input disables until the timer's up.
Things I'd revisit if this gets traction:
- Voice input (Web Speech API → text → same pipeline).
- A "show me your work in [year]" tool — easy to add since
frontmatter.dateis already in the model. - Streaming the project-card layout server-side so they can render before the full tool result arrives.