Anthropic Common Problems

Prompt Playground System Design

System Design · Medium · Last reported April 2026
By AceOffer · Updated April 2026 · Reported 7× across 190+ candidate reports

Understanding the Problem

Design a prompt playground — a web-based platform that lets users (developers/engineers) compose and send prompts to an LLM and receive responses. Key constraints stated by interviewers: (1) each conversation turn is stateless — the model does NOT receive prior context by default; (2) the system must persist and display prior prompt/response history; (3) users can share conversations with others. The interviewer emphasizes UX, workflow, and frontend behavior over pure backend infrastructure. Follow-up sub-questions (may come unprompted): how to handle very large prompts (10 MB+), how to keep the client performant when many windows are open each with a large prompt, and how to index very long prompt text.
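The stateless-turn constraint and the history requirement can look contradictory, so a small sketch may help. It assumes a hypothetical `call_model` callable standing in for the LLM API; `Turn` and `Conversation` are illustrative names, not part of the reported problem. The point is only that history is persisted for display and sharing while the model receives just the current prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    prompt: str
    response: str


@dataclass
class Conversation:
    """Persisted history for display; never forwarded to the model."""
    turns: List[Turn] = field(default_factory=list)

    def send(self, prompt: str, call_model: Callable[[str], str]) -> str:
        # Stateless turn: the model sees only the current prompt.
        response = call_model(prompt)
        # History is stored for the UI (and for sharing), not for context.
        self.turns.append(Turn(prompt=prompt, response=response))
        return response


# Stand-in model (hypothetical) that records exactly what it was given.
seen: List[str] = []


def fake_model(prompt: str) -> str:
    seen.append(prompt)
    return f"echo: {prompt}"


conv = Conversation()
conv.send("first question", fake_model)
conv.send("second question", fake_model)
```

After both calls, `seen` holds the two prompts individually and `conv.turns` holds the two stored turns; nothing from the first turn was passed into the second call.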

Functional Requirements

Structured requirements coming soon. For now, see the full problem statement above and the deep-dive prompts below.

Non-Functional Requirements

Latency, throughput, availability, consistency targets — being authored.

The Set Up

Defining the Core Entities

Core entities (Request, Batch, Worker, Cache, etc.) — being authored.

The API

POST /endpoint → describe request shape
GET /endpoint → describe response shape
(API spec being authored)

High-Level Design

Component diagram + walkthrough mapping each functional requirement to a system flow — being authored.

Potential Deep Dives

These are the directions the interviewer is likely to push you. Each one has multiple valid solutions at different quality tiers.

1) How do you handle a prompt that is 10 MB or larger? (when: Candidate finishes base design)

Bad
Naive approach with serious trade-off — being authored.
Good
Solid baseline with reasonable trade-offs — being authored.
Great
Production-grade approach with explicit trade-off rationale — being authored.
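One direction the Insider Notes attribute to passing candidates is offloading the prompt body to object storage and keeping only a small metadata row in the database. The following is a minimal sketch of that idea, not a reference answer. `InMemoryObjectStore`, `PromptRecord`, `PREVIEW_CHARS`, and the `prompts/` key prefix are all hypothetical; a real system would use an actual blob store and would likely upload in chunks rather than hold the whole body in memory.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict

PREVIEW_CHARS = 200  # hypothetical preview length kept in the metadata row


@dataclass
class PromptRecord:
    """Small metadata row; the full body lives in object storage."""
    object_key: str
    size_bytes: int
    sha256: str
    preview: str


class InMemoryObjectStore:
    """Stand-in for a blob store (hypothetical interface)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_prompt(store: InMemoryObjectStore, text: str) -> PromptRecord:
    body = text.encode("utf-8")
    digest = hashlib.sha256(body).hexdigest()
    key = f"prompts/{digest}"  # content-addressed: identical bodies share a key
    store.put(key, body)
    return PromptRecord(key, len(body), digest, text[:PREVIEW_CHARS])


def load_prompt(store: InMemoryObjectStore, record: PromptRecord) -> str:
    body = store.get(record.object_key)
    # Check the body against the checksum stored in the metadata row.
    if hashlib.sha256(body).hexdigest() != record.sha256:
        raise ValueError("prompt body does not match stored checksum")
    return body.decode("utf-8")


store = InMemoryObjectStore()
big_prompt = "x" * (10 * 1024 * 1024)  # a 10 MB prompt
record = save_prompt(store, big_prompt)
```

The history view can then render from `record.preview` and `record.size_bytes` alone, fetching the full body only when the user opens that turn.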

2) A user has 20 browser tabs open, each with a very large prompt. The client is becoming slow. How do you improve UX? (when: After large-prompt answer)

Bad
Naive approach with serious trade-off — being authored.
Good
Solid baseline with reasonable trade-offs — being authored.
Great
Production-grade approach with explicit trade-off rationale — being authored.
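Virtual scrolling is one of the client-side techniques the Insider Notes list for this question. The windowing arithmetic behind it is sketched below. In practice this logic would run in the browser; Python is used here only to illustrate the calculation. The function names, the fixed `line_height`, and the `overscan` default are assumptions made for the example.

```python
from typing import List, Tuple


def visible_range(
    scroll_top: int,
    viewport_height: int,
    line_height: int,
    total_lines: int,
    overscan: int = 5,
) -> Tuple[int, int]:
    """Return the half-open [start, end) range of line indexes to render."""
    first = scroll_top // line_height
    # Ceiling division: index just past the last line touching the viewport.
    last = -(-(scroll_top + viewport_height) // line_height)
    # Render a few extra lines on each side so fast scrolling does not flash.
    start = max(0, first - overscan)
    end = min(total_lines, last + overscan)
    return start, end


def render_window(
    lines: List[str], scroll_top: int, viewport_height: int, line_height: int
) -> List[str]:
    start, end = visible_range(scroll_top, viewport_height, line_height, len(lines))
    return lines[start:end]


lines = [f"line {i}" for i in range(200_000)]
window = render_window(lines, scroll_top=20_000, viewport_height=600, line_height=20)
```

Only the lines in `window` would be mounted in the DOM, so render cost depends on the viewport size rather than on the size of the prompt.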

3) What are your core entities, their relationships, and how would you index them? (when: After entity design)

Bad
Naive approach with serious trade-off — being authored.
Good
Solid baseline with reasonable trade-offs — being authored.
Great
Production-grade approach with explicit trade-off rationale — being authored.
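As a starting point for discussion, here is one possible entity layout expressed as a SQLite schema. Every table, column, and index name is an assumption made for illustration; the reports do not specify a schema. Following the interviewer hint about not putting 10 MB of content in a DB column, turn rows hold storage keys plus a short preview instead of the full text. Full-text search over the bodies is not shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE conversations (
    id INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES users(id),
    title TEXT NOT NULL,
    updated_at INTEGER NOT NULL
);
CREATE TABLE turns (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    seq INTEGER NOT NULL,
    prompt_key TEXT NOT NULL,      -- pointer to the body in object storage
    response_key TEXT NOT NULL,
    prompt_preview TEXT NOT NULL,  -- short excerpt for list views
    UNIQUE (conversation_id, seq)  -- also serves as the history index
);
-- "My recent conversations" for a given user.
CREATE INDEX idx_conversations_owner_recent
    ON conversations (owner_id, updated_at DESC);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO users (id, email) VALUES (1, 'dev@example.com')")
db.execute(
    "INSERT INTO conversations (id, owner_id, title, updated_at) "
    "VALUES (10, 1, 'Tokenizer test', 1700000000)"
)
for seq in range(3):
    db.execute(
        "INSERT INTO turns "
        "(conversation_id, seq, prompt_key, response_key, prompt_preview) "
        "VALUES (?, ?, ?, ?, ?)",
        (10, seq, f"prompts/{seq}", f"responses/{seq}", f"prompt {seq}"),
    )

# Ordered history for one conversation, served by the (conversation_id, seq) index.
history = db.execute(
    "SELECT seq, prompt_preview FROM turns WHERE conversation_id = ? ORDER BY seq",
    (10,),
).fetchall()
```

The two access paths this sketch optimizes for are listing a user's recent conversations and loading one conversation's turns in order.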

4) How do you implement the share-conversation feature? (when: After core design)

Bad
Naive approach with serious trade-off — being authored.
Good
Solid baseline with reasonable trade-offs — being authored.
Great
Production-grade approach with explicit trade-off rationale — being authored.
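The Insider Notes flag read-only enforcement and expiry as the parts of sharing that candidates tend to underspecify. A minimal sketch of a token-based share link covering both is below. `ShareService`, `ShareLink`, and the in-memory dictionary are hypothetical stand-ins for a real service and table, and the `now` parameter exists only so the expiry logic can be exercised deterministically.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ShareLink:
    token: str
    conversation_id: str
    expires_at: float  # unix seconds
    read_only: bool = True


class ShareService:
    def __init__(self) -> None:
        self._links: Dict[str, ShareLink] = {}

    def create(
        self, conversation_id: str, ttl_seconds: float, now: Optional[float] = None
    ) -> ShareLink:
        now = time.time() if now is None else now
        # Unguessable token: possession of the URL is the capability.
        link = ShareLink(secrets.token_urlsafe(32), conversation_id, now + ttl_seconds)
        self._links[link.token] = link
        return link

    def resolve(self, token: str, now: Optional[float] = None) -> Optional[ShareLink]:
        now = time.time() if now is None else now
        link = self._links.get(token)
        if link is None or now >= link.expires_at:
            return None  # unknown or expired links look identical to callers
        return link

    def can_write(self, token: str, now: Optional[float] = None) -> bool:
        link = self.resolve(token, now)
        return link is not None and not link.read_only


svc = ShareService()
link = svc.create("conv-1", ttl_seconds=3600, now=1000.0)
```

The write check lives on the server, so a viewer cannot gain edit access by changing client state.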

5) Would a key-value database work for prompt caching? How does a vector DB compare? (when: Candidate mentions caching prompts)

Bad
Naive approach with serious trade-off — being authored.
Good
Solid baseline with reasonable trade-offs — being authored.
Great
Production-grade approach with explicit trade-off rationale — being authored.
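The Insider Notes observe that a key-value store keyed by a hash works for exact-match caching. A minimal sketch of that pattern follows, with a plain dictionary standing in for the KV database. `cache_key` and `ExactMatchCache` are hypothetical names, and the sketch assumes that reusing a stored response for an identical (model, prompt) pair is acceptable for the product.

```python
import hashlib
import json
from typing import Dict, Optional


def cache_key(model: str, prompt: str) -> str:
    # Hash a canonical encoding so a 10 MB prompt becomes a fixed-size key.
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ExactMatchCache:
    """Dictionary stand-in for a key-value database."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get(self, model: str, prompt: str) -> Optional[str]:
        return self._store.get(cache_key(model, prompt))

    def put(self, model: str, prompt: str, response: str) -> None:
        self._store[cache_key(model, prompt)] = response


cache = ExactMatchCache()
cache.put("model-a", "Summarize this log file.", "cached summary")
hit = cache.get("model-a", "Summarize this log file.")
miss = cache.get("model-a", "Summarize this log file!")  # one character differs
```

The near-duplicate miss is the dividing line. If the requirement is only "same prompt, same answer", the KV approach is sufficient. If the requirement is "find similar past prompts", that is the case a vector DB addresses, at the cost of an embedding pipeline and an ANN index.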

What is Expected at Each Level?

L4 / Mid-level
Cover the happy path. Clarify scope. Identify the obvious bottleneck. Pick a reasonable storage layer and scaling approach.
L5 / Senior (Target)
All of the above, plus: explicit failure handling, durability vs latency trade-offs, and the right batching/caching strategy with a clear rationale.
L6 / Staff+
All of the above plus: organizational concerns (rollout, migration, on-call), quantitative analysis, multi-region considerations, what could go wrong with the proposed solution at 10x scale.

Insider Notes

**Common mistakes:**
- Treating it like a standard backend system design (drawing microservice diagrams) instead of focusing on UX and workflow first
- Ignoring client-side performance entirely
- Storing large prompt content inline in a relational DB row without considering size limits
- Not addressing streaming for model responses (buffering the full response before sending it to the client)
- Forgetting the stateless-turn constraint and designing a full context-window management system
- Underspecifying the share feature (no read-only enforcement, no expiry/permissions)

**Interviewer hints:**
- This is not a traditional system design: no diagram required; treat it like a Google Doc discussion
- Focus on UX and workflow, not just backend infrastructure
- Think about what happens on the client when prompts are very large
- Would a key-value DB be sufficient here, or do you really need a vector DB?
- Think about how the content is stored when it's 10 MB: you don't want that in a DB column

**What passers do:**
- Led with UX and user workflow before jumping to backend components
- Proactively identified the 10 MB+ large prompt problem and proposed object storage offload + streaming
- Addressed client-side performance (virtual scrolling, lazy loading, web workers) without being asked
- Clearly defined core entities, relationships, and indexes early
- Handled the conversational/whiteboard format naturally: structured verbal reasoning, no insistence on drawing
- Distinguished between what belongs on client vs server vs storage layer

**Why people fail:**
- Jumped directly to backend infrastructure without discussing UX or user workflow
- Could not address large prompt handling when asked as a follow-up
- Treated the interview as a traditional SD whiteboard and got confused by the conversational format
- Culture/HM rounds cited as additional failure vectors independent of technical performance
- Answered follow-ups correctly but feedback was still negative (high bar, pipeline saturation noted by multiple candidates)

**Edge cases probed:**
- Prompt content is 10 MB+ (exceeds typical DB column and memory limits)
- User opens 20+ browser windows each with large prompts: client-side memory/render performance
- Indexing/searching very long prompt text efficiently
- Share link permissions (read-only vs editable, expiry)
- Stateless model turns: ensuring no prior context leaks into subsequent calls

**Alternative approaches:**
- Vector DB for semantic prompt search: enables similarity-based retrieval of past prompts; adds operational complexity (embedding pipeline, ANN index). Key-value DB with hash partition works fine for exact-match caching and is simpler. Mentioned in one interview, where the interviewer probed whether a KV DB was sufficient.
- Event sourcing for conversation history: append-only log gives full audit trail and easy replay; adds complexity for simple read-path queries. Overkill for a playground unless audit is required.
- CRDT / operational-transform for collaborative editing: one interviewer framed the session as 'Google Doc style'; real-time multi-user co-editing of prompts would require CRDT (e.g., Yjs). Adds significant complexity; confirm scope before implementing.
- Full in-DB storage of prompt content: simpler for small prompts (<1 MB); degrades at 10 MB+ due to row size, replication overhead, and memory pressure on DB. Object storage offload is preferred at scale.