Vibe Coding to Production — Anshuman Biswas, Ph.D.

Vibe Coding to Prod

Build one feature end‑to‑end

Anshuman Biswas

VP of Engineering, Elastio

Vibes → Specs → Tests → Code → Prod

Goal of this talk

  • Pick one audience‑suggested feature for my blogging platform.
  • Design in docs → codify as tests → generate & iterate code.
  • Ship a safe, observable, production‑ready slice.

Stack: Go backend + React frontend • CI • Playwright • Go tests

[Diagram: Blog → IDE (Cursor + Claude Code) → Prod]

Quick agenda

Tools

  • Build in Cursor + Claude Code
  • Also show OpenAI Codex† & Gemini CLI

Concepts

  • Vibe coding in brief (Karpathy)
  • Hype vs. real progress
  • Docs → Code pipeline

Practice

  • Local vs. cloud agents
  • Context windows & tokens
  • Test‑first vibe coding
  • Task design tips

† Codex is shown historically; modern OpenAI models provide coding capabilities.

What is “Vibe Coding”?

Natural language + tests → code

Program by intent: describe the behavior and constraints; let the agent draft code; tighten the loop with tests and edits.

Popularized by Andrej Karpathy (ex‑OpenAI, Tesla). You steer through vibes (desired outcomes), specs, and test feedback.

Hype vs. progress

[Chart: hype curve vs. real capabilities rising over time]
  • Yes, there’s a hype cycle.
  • But some capabilities improve at breakneck speed (reasoning, tool use, repo‑scale edits).
  • Adopt selectively: pair agents with tests, telemetry, and review.

Docs as the blueprint → Code as the artifact

[Diagram: docs / design notes → generated code]
  • Write the design doc & acceptance criteria first.
  • Generate scaffolding/code repeatedly until tests pass.
  • The doc becomes the durable spec; code evolves freely.

Two paths to move fast

Keep data local

Run Ollama with Qwen Coder in an IDE like Roo Code.

Great for privacy, air‑gapped work, or sensitive repos (see the Go sketch below).

[Diagram: laptop + Ollama (local) → Qwen Coder • Roo Code]

Max out cloud agents

Use the largest context & highest‑capability tiers (e.g., Claude Code, OpenAI).

Fast iteration, strong repo tools, multi‑file edits.

[Diagram: cloud agents: Claude Code / GPT]
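For the local path, here is a minimal Go sketch of calling Ollama's documented /api/generate endpoint; it assumes Ollama is running on its default port with qwen2.5-coder already pulled.

// ollama_local.go: minimal sketch for the local path (assumptions above)
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

type generateRequest struct {
    Model  string `json:"model"`
    Prompt string `json:"prompt"`
    Stream bool   `json:"stream"`
}

type generateResponse struct {
    Response string `json:"response"`
}

func main() {
    body, _ := json.Marshal(generateRequest{
        Model:  "qwen2.5-coder",
        Prompt: "Write a Go HTTP handler that returns 201 with post JSON.",
        Stream: false, // one JSON object back instead of a stream
    })
    resp, err := http.Post("http://localhost:11434/api/generate",
        "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var out generateResponse
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        panic(err)
    }
    fmt.Println(out.Response)
}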

Context windows & tokens

  • A “token” ≈ 4 chars in English; the context window is how much the model can read at once.
  • Chunk repos; provide the right files + tests + docs for each task.
  • Use retrieval or editor‑select to keep prompts lean and on‑target.
[Diagram: context window capacity = prompt + code + tests]

Context windows — getting practical

  • Token math: 1 token ≈ 3–4 chars (English). 8K ≈ a few files; 32–200K ≈ small repos.
  • Plan the window: include only the files + tests the agent needs now.
  • Chunk work: decompose into patches that keep prompts < 40–60% of the window (budgeting sketch below).
  • Pin the spec: keep the doc + acceptance criteria in context for every step.
  • Use diffs: ask for file‑scoped changes or unified diffs to minimize token use.
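A back‑of‑envelope helper for that budgeting, assuming the 3–4 chars/token heuristic above; it is planning math, not a real tokenizer, and real tokenizers vary by model.

// internal/tokenbudget/budget.go: planning math only, not a tokenizer
package tokenbudget

// EstimateTokens approximates token count with the ~4 chars/token rule.
func EstimateTokens(text string) int {
    return len(text)/4 + 1
}

// FitsBudget reports whether prompt + code + tests stay under a target
// fraction of the window (e.g., 0.5 of a 200K window, per the 40–60% tip).
func FitsBudget(parts []string, window int, fraction float64) bool {
    total := 0
    for _, p := range parts {
        total += EstimateTokens(p)
    }
    return float64(total) <= fraction*float64(window)
}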

Claude Code: avoid context thrash

  • Large edits and long chats can push earlier instructions out of the window.
  • Create a project brief: add a root‑level CLAUDE.md (or docs/ai/brief.md) and pin/reference it.
  • Keep it 1–2 pages; include repo map, style decisions, test commands, and “definition of done”.
  • Refresh the brief as the project evolves; keep IDs, routes, and invariants there.
# CLAUDE.md — project memory (excerpt)
Project: anshumanbiswas.com blog
Stack: Go 1.22, React, Postgres

## Principles
- Tests first: go test ./..., pnpm test:e2e
- Observability: JSON logs; pass context for tracing
- Security: validated inputs; parameterized SQL; least-privilege

## Conventions
- Packages: internal/blog, cmd/api
- HTTP errors: RFC7807 problem+json with type, detail
- CI: run Go + Playwright on PR; require green

## Current task
Create Post API; unique slug; return 201 JSON; update E2E.
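The RFC 7807 convention named in the brief, sketched in Go; the package and helper names are illustrative, while the field set comes from the spec.

// internal/httperr/problem.go: sketch of the brief's RFC 7807 convention
package httperr

import (
    "encoding/json"
    "net/http"
)

// Problem is an RFC 7807 "problem details" payload.
type Problem struct {
    Type   string `json:"type"`
    Title  string `json:"title"`
    Status int    `json:"status"`
    Detail string `json:"detail,omitempty"`
}

// Write sends the problem as application/problem+json.
func Write(w http.ResponseWriter, p Problem) {
    w.Header().Set("Content-Type", "application/problem+json")
    w.WriteHeader(p.Status)
    _ = json.NewEncoder(w).Encode(p)
}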

LLM memory & sessions

What “memory” means

  • Session state: the running chat history (older turns fall out as the window fills).
  • Product memory: tool‑level preferences or pinned docs (e.g., CLAUDE.md).
  • Source of truth: keep canonical details in versioned docs/tests—not only in chat.

When to start a new session

  • You feel drift or the agent contradicts earlier decisions.
  • Prompt size or repo context regularly exceeds the window.
  • You switch to a new feature or refactor that needs a clean brief.
  • The chat exceeds ~100 interactions and context compression is noticeably degrading answers.

Before you reset, copy important facts into CLAUDE.md (or your brief) so the next session starts aligned.

Test‑first vibe coding

Go unit tests • Playwright E2E

Ship confidence comes from tests, not vibes. Start with a failing test to focus the agent.

Go backend — minimal unit test

// blog/service_test.go
package blog_test

import (
    "context"
    "testing"

    // module paths are illustrative; match the repo's internal layout
    "example.com/site/internal/blog"
    "example.com/site/internal/blog/memory"
)

func TestCreatePost(t *testing.T) {
    repo := memory.NewPostRepo()
    svc := blog.NewService(repo)

    req := blog.CreatePostRequest{
        Title: "Hello", Slug: "hello", Body: "World",
    }
    got, err := svc.CreatePost(context.Background(), req)
    if err != nil {
        t.Fatal(err)
    }

    if got.Slug != "hello" {
        t.Fatalf("expected slug 'hello', got %q", got.Slug)
    }
}
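One shape the generated implementation might take: a minimal sketch that satisfies the test above, where the real service would add validation, slugify, and typed errors.

// blog/service.go: minimal sketch satisfying TestCreatePost (illustrative)
package blog

import "context"

type Post struct {
    ID    string
    Title string
    Slug  string
    Body  string
}

type CreatePostRequest struct {
    Title, Slug, Body string
}

// PostRepo abstracts storage so tests can use an in-memory implementation.
type PostRepo interface {
    Save(ctx context.Context, p Post) (Post, error)
}

type Service struct{ repo PostRepo }

func NewService(repo PostRepo) *Service { return &Service{repo: repo} }

// CreatePost persists the post; slug uniqueness lives in the real version.
func (s *Service) CreatePost(ctx context.Context, req CreatePostRequest) (Post, error) {
    return s.repo.Save(ctx, Post{Title: req.Title, Slug: req.Slug, Body: req.Body})
}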

React frontend — Playwright E2E

// tests/create-post.spec.ts
import { test, expect } from '@playwright/test';

test('create post flow', async ({ page }) => {
  await page.goto('http://localhost:3000/admin');
  await page.getByRole('textbox', { name: /title/i }).fill('Hello');
  await page.getByRole('textbox', { name: /slug/i }).fill('hello');
  await page.getByRole('button', { name: /publish/i }).click();

  await expect(page).toHaveURL(/\/hello$/);
  await expect(page.getByRole('heading', { name: 'Hello' })).toBeVisible();
});

Design tasks the agent can win

Short goals • Measurable • Constraints
  • Short, scoped goals: “Add /api/posts POST handler that returns 201 + JSON.”
  • Measure done: “Green tests: TestCreatePost, TestListPosts, create-post.spec.”
  • Provide assets: schema, sample payloads, test stubs, error cases.
  • State constraints: idempotency, logging, authz, perf budgets.
  • Prefer patches: ask for diffs or file‑scoped changes.

Task spec template (copy/paste)

# Task
Implement Create Post API in Go service.

## Inputs
- Storage: PostgreSQL (table posts)
- Fields: id, title, slug, body, created_at
- Constraints: unique slug; server‑side slugify

## Done when
- POST /api/posts returns 201 with post JSON
- Unit: TestCreatePost, TestListPosts pass
- E2E: create-post.spec passes

## Notes
- Add repository + service layers
- Log structured JSON; redact secrets
- Include Swagger/OpenAPI snippet
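A hedged sketch of the "Done when" contract above; the handler name and slugify are illustrative, and the real version would go through the service layer and let Postgres enforce the unique slug.

// cmd/api/posts.go: sketch of the "Done when" contract (names illustrative)
package main

import (
    "encoding/json"
    "net/http"
    "strings"
    "time"
)

// slugify is the server-side slugify from the spec; a real version also
// strips punctuation and guarantees uniqueness.
func slugify(title string) string {
    return strings.ToLower(strings.Join(strings.Fields(title), "-"))
}

func createPostHandler(w http.ResponseWriter, r *http.Request) {
    var req struct {
        Title string `json:"title"`
        Body  string `json:"body"`
    }
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, "invalid JSON", http.StatusBadRequest)
        return
    }
    post := map[string]any{ // id would come back from Postgres
        "title":      req.Title,
        "slug":       slugify(req.Title),
        "body":       req.Body,
        "created_at": time.Now().UTC(),
    }
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusCreated)
    _ = json.NewEncoder(w).Encode(post)
}

func main() {
    // method-scoped route patterns are available as of Go 1.22
    http.HandleFunc("POST /api/posts", createPostHandler)
    http.ListenAndServe(":8080", nil)
}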

When “longer thinking” helps

  • Some models have deliberate/slow modes for heavier refactors.
  • Use for cross‑file changes, complex migrations, or API design.

Refactor at scale

  • Tools like blitzy.com aim to read very large repos and refactor while preserving behavior.
  • Anchor every refactor in tests + type checks + CI to ensure parity (parity test sketch below).
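One way to anchor that parity is a characterization test; in this sketch both implementations are stand-ins for the pre- and post-refactor code.

// parity_test.go: pin behavior before a large refactor by diffing
// old vs. new implementations (both are stand-ins here)
package blog

import (
    "strings"
    "testing"
)

func slugifyOld(s string) string { // legacy implementation, kept during the window
    return strings.ToLower(strings.Join(strings.Fields(s), "-"))
}

func slugifyNew(s string) string { // refactored implementation under test
    return strings.ToLower(strings.Join(strings.Fields(s), "-"))
}

func TestSlugifyParity(t *testing.T) {
    for _, in := range []string{"Hello World", "  Go 1.22! ", "already-a-slug"} {
        if got, want := slugifyNew(in), slugifyOld(in); got != want {
            t.Errorf("slugify(%q): new=%q, old=%q", in, got, want)
        }
    }
}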

Security & observability by default

  • Threat model: guard against prompt injection and secret exfiltration; watch for supply‑chain changes.
  • Static/dynamic scans in CI; SBOM; dependency pinning.
  • Add logs/metrics/traces to new endpoints on day one (sketch below).
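Day-one logging can be a one-screen middleware; a sketch using the standard library's log/slog (Go 1.21+), with metrics and traces hanging off the same wrapper.

// middleware.go: day-one structured logging sketch with log/slog
package main

import (
    "log/slog"
    "net/http"
    "os"
    "time"
)

func withLogging(logger *slog.Logger, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next.ServeHTTP(w, r)
        logger.Info("request",
            "method", r.Method,
            "path", r.URL.Path,
            "duration_ms", time.Since(start).Milliseconds(),
        )
    })
}

func main() {
    logger := slog.New(slog.NewJSONHandler(os.Stdout, nil)) // JSON logs per the brief
    if err := http.ListenAndServe(":8080", withLogging(logger, http.DefaultServeMux)); err != nil {
        logger.Error("server exited", "err", err)
    }
}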

Live demo: pick the feature

Your feature idea?
  • Suggest a feature for anshumanbiswas.com.
  • We’ll draft the doc, write a failing test, and let the agent code.
  • Ship it if CI is green.

Thank you

Slides & SVGs provided in this chat. Questions welcome.

Twitter/X: @anchoo2kewl • Site: anshumanbiswas.com
