Vibe Coding to Production — Anshuman Biswas, Ph.D.

Vibe Coding to Prod

Build one feature end‑to‑end

Anshuman Biswas

VP of Engineering, Elastio

Vibes → Specs → Tests → Code → Prod

Goal of this talk

  • Pick one audience‑suggested feature for my blogging platform.
  • Design in docs → codify as tests → generate & iterate code.
  • Ship a safe, observable, production‑ready slice.

Stack: Go backend + React frontend • CI • Playwright • Go tests

[Diagram: Blog → IDE (Cursor + Claude Code) → Prod]

Quick agenda

Tools

  • Build in Cursor + Claude Code
  • Also show OpenAI Codex† & Gemini CLI

Concepts

  • Vibe coding in brief (Karpathy)
  • Hype vs. real progress
  • Docs → Code pipeline

Practice

  • Local vs. cloud agents
  • Context windows & tokens
  • Test‑first vibe coding
  • Task design tips

† Codex is shown historically; modern OpenAI models provide coding capabilities.

What is “Vibe Coding”?

Natural language + tests → code

Program by intent: describe the behavior and constraints; let the agent draft code; tighten the loop with tests and edits.

Popularized by Andrej Karpathy (ex‑OpenAI, Tesla). You steer through vibes (desired outcomes), specs, and test feedback.

Hype vs. progress

[Chart: hype curve vs. real capabilities rising over time]
  • Yes, there’s a hype cycle.
  • But some capabilities improve at breakneck speed (reasoning, tool use, repo‑scale edits).
  • Adopt selectively: pair agents with tests, telemetry, and review.

Docs as the blueprint → Code as the artifact

[Diagram: docs / design notes → generated code]
  • Write the design doc & acceptance criteria first.
  • Generate scaffolding/code repeatedly until tests pass.
  • The doc becomes the durable spec; code evolves freely.

Two paths to move fast

Keep data local

Run Ollama with Qwen Coder in an IDE like Roo Code.

Great for privacy, air‑gapped work, or sensitive repos (see the Go sketch below).

[Diagram: laptop + Ollama (local) → Qwen Coder • Roo Code]

Max out cloud agents

Use the largest context & highest‑capability tiers (e.g., Claude Code, OpenAI).

Fast iteration, strong repo tools, multi‑file edits.

[Diagram: cloud agents: Claude Code / GPT]
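For the local path, here is a minimal Go sketch of calling Ollama's documented /api/generate endpoint; it assumes Ollama is running on its default port with qwen2.5-coder already pulled.

// ollama_local.go: minimal sketch for the local path (assumptions above)
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

type generateRequest struct {
    Model  string `json:"model"`
    Prompt string `json:"prompt"`
    Stream bool   `json:"stream"`
}

type generateResponse struct {
    Response string `json:"response"`
}

func main() {
    body, _ := json.Marshal(generateRequest{
        Model:  "qwen2.5-coder",
        Prompt: "Write a Go HTTP handler that returns 201 with post JSON.",
        Stream: false, // one JSON object back instead of a stream
    })
    resp, err := http.Post("http://localhost:11434/api/generate",
        "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var out generateResponse
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        panic(err)
    }
    fmt.Println(out.Response)
}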

Context windows & tokens

  • A “token” ≈ 4 chars in English; the context window is how much the model can read at once.
  • Chunk repos; provide the right files + tests + docs for each task.
  • Use retrieval or editor‑select to keep prompts lean and on‑target.
[Diagram: context window capacity = prompt + code + tests]

Context windows — getting practical

  • Token math: 1 token ≈ 3–4 chars (English). 8K ≈ a few files; 32–200K ≈ small repos.
  • Plan the window: include only the files + tests the agent needs now.
  • Chunk work: decompose into patches that keep prompts < 40–60% of the window (budgeting sketch below).
  • Pin the spec: keep the doc + acceptance criteria in context for every step.
  • Use diffs: ask for file‑scoped changes or unified diffs to minimize token use.
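A back‑of‑envelope helper for that budgeting, assuming the 3–4 chars/token heuristic above; it is planning math, not a real tokenizer, and real tokenizers vary by model.

// internal/tokenbudget/budget.go: planning math only, not a tokenizer
package tokenbudget

// EstimateTokens approximates token count with the ~4 chars/token rule.
func EstimateTokens(text string) int {
    return len(text)/4 + 1
}

// FitsBudget reports whether prompt + code + tests stay under a target
// fraction of the window (e.g., 0.5 of a 200K window, per the 40–60% tip).
func FitsBudget(parts []string, window int, fraction float64) bool {
    total := 0
    for _, p := range parts {
        total += EstimateTokens(p)
    }
    return float64(total) <= fraction*float64(window)
}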

Claude Code: avoid context thrash

  • Large edits and long chats can push earlier instructions out of the window.
  • Create a project brief: add a root‑level CLAUDE.md (or docs/ai/brief.md) and pin/reference it.
  • Keep it 1–2 pages; include repo map, style decisions, test commands, and “definition of done”.
  • Refresh the brief as the project evolves; keep IDs, routes, and invariants there.
# CLAUDE.md — project memory (excerpt)
Project: anshumanbiswas.com blog
Stack: Go 1.22, React, Postgres

## Principles
- Tests first: go test ./..., pnpm test:e2e
- Observability: JSON logs; pass context for tracing
- Security: validated inputs; parameterized SQL; least-privilege

## Conventions
- Packages: internal/blog, cmd/api
- HTTP errors: RFC7807 problem+json with type, detail
- CI: run Go + Playwright on PR; require green

## Current task
Create Post API; unique slug; return 201 JSON; update E2E.
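The RFC 7807 convention named in the brief, sketched in Go; the package and helper names are illustrative, while the field set comes from the spec.

// internal/httperr/problem.go: sketch of the brief's RFC 7807 convention
package httperr

import (
    "encoding/json"
    "net/http"
)

// Problem is an RFC 7807 "problem details" payload.
type Problem struct {
    Type   string `json:"type"`
    Title  string `json:"title"`
    Status int    `json:"status"`
    Detail string `json:"detail,omitempty"`
}

// Write sends the problem as application/problem+json.
func Write(w http.ResponseWriter, p Problem) {
    w.Header().Set("Content-Type", "application/problem+json")
    w.WriteHeader(p.Status)
    _ = json.NewEncoder(w).Encode(p)
}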

LLM memory & sessions

What “memory” means

  • Session state: the running chat history (older turns fall out as the window fills).
  • Product memory: tool‑level preferences or pinned docs (e.g., CLAUDE.md).
  • Source of truth: keep canonical details in versioned docs/tests—not only in chat.

When to start a new session

  • You feel drift or the agent contradicts earlier decisions.
  • Prompt size or repo context regularly exceeds the window.
  • You switch to a new feature or refactor that needs a clean brief.
  • The chat exceeds ~100 interactions and context compression is noticeably degrading answers.

Before you reset, copy important facts into CLAUDE.md (or your brief) so the next session starts aligned.

Test‑first vibe coding

Go unit tests • Playwright E2E

Ship confidence comes from tests, not vibes. Start with a failing test to focus the agent.

Go backend — minimal unit test

// blog/service_test.go
package blog_test

import (
    "context"
    "testing"

    // module paths are illustrative; match the repo's internal layout
    "example.com/site/internal/blog"
    "example.com/site/internal/blog/memory"
)

func TestCreatePost(t *testing.T) {
    repo := memory.NewPostRepo()
    svc := blog.NewService(repo)

    req := blog.CreatePostRequest{
        Title: "Hello", Slug: "hello", Body: "World",
    }
    got, err := svc.CreatePost(context.Background(), req)
    if err != nil {
        t.Fatal(err)
    }

    if got.Slug != "hello" {
        t.Fatalf("expected slug 'hello', got %q", got.Slug)
    }
}
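One shape the generated implementation might take: a minimal sketch that satisfies the test above, where the real service would add validation, slugify, and typed errors.

// blog/service.go: minimal sketch satisfying TestCreatePost (illustrative)
package blog

import "context"

type Post struct {
    ID    string
    Title string
    Slug  string
    Body  string
}

type CreatePostRequest struct {
    Title, Slug, Body string
}

// PostRepo abstracts storage so tests can use an in-memory implementation.
type PostRepo interface {
    Save(ctx context.Context, p Post) (Post, error)
}

type Service struct{ repo PostRepo }

func NewService(repo PostRepo) *Service { return &Service{repo: repo} }

// CreatePost persists the post; slug uniqueness lives in the real version.
func (s *Service) CreatePost(ctx context.Context, req CreatePostRequest) (Post, error) {
    return s.repo.Save(ctx, Post{Title: req.Title, Slug: req.Slug, Body: req.Body})
}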

React frontend — Playwright E2E

// tests/create-post.spec.ts
import { test, expect } from '@playwright/test';

test('create post flow', async ({ page }) => {
  await page.goto('http://localhost:3000/admin');
  await page.getByRole('textbox', { name: /title/i }).fill('Hello');
  await page.getByRole('textbox', { name: /slug/i }).fill('hello');
  await page.getByRole('button', { name: /publish/i }).click();

  await expect(page).toHaveURL(/\/hello$/);
  await expect(page.getByRole('heading', { name: 'Hello' })).toBeVisible();
});

Design tasks the agent can win

Short goals • Measurable • Constraints
  • Short, scoped goals: “Add /api/posts POST handler that returns 201 + JSON.”
  • Measure done: “Green tests: TestCreatePost, TestListPosts, create-post.spec.”
  • Provide assets: schema, sample payloads, test stubs, error cases.
  • State constraints: idempotency, logging, authz, perf budgets.
  • Prefer patches: ask for diffs or file‑scoped changes.

Task spec template (copy/paste)

# Task
Implement Create Post API in Go service.

## Inputs
- Storage: PostgreSQL (table posts)
- Fields: id, title, slug, body, created_at
- Constraints: unique slug; server‑side slugify

## Done when
- POST /api/posts returns 201 with post JSON
- Unit: TestCreatePost, TestListPosts pass
- E2E: create-post.spec passes

## Notes
- Add repository + service layers
- Log structured JSON; redact secrets
- Include Swagger/OpenAPI snippet
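A hedged sketch of the "Done when" contract above; the handler name and slugify are illustrative, and the real version would go through the service layer and let Postgres enforce the unique slug.

// cmd/api/posts.go: sketch of the "Done when" contract (names illustrative)
package main

import (
    "encoding/json"
    "net/http"
    "strings"
    "time"
)

// slugify is the server-side slugify from the spec; a real version also
// strips punctuation and guarantees uniqueness.
func slugify(title string) string {
    return strings.ToLower(strings.Join(strings.Fields(title), "-"))
}

func createPostHandler(w http.ResponseWriter, r *http.Request) {
    var req struct {
        Title string `json:"title"`
        Body  string `json:"body"`
    }
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, "invalid JSON", http.StatusBadRequest)
        return
    }
    post := map[string]any{ // id would come back from Postgres
        "title":      req.Title,
        "slug":       slugify(req.Title),
        "body":       req.Body,
        "created_at": time.Now().UTC(),
    }
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusCreated)
    _ = json.NewEncoder(w).Encode(post)
}

func main() {
    // method-scoped route patterns are available as of Go 1.22
    http.HandleFunc("POST /api/posts", createPostHandler)
    http.ListenAndServe(":8080", nil)
}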

When “longer thinking” helps

  • Some models have deliberate/slow modes for heavier refactors.
  • Use for cross‑file changes, complex migrations, or API design.

Refactor at scale

  • Tools like blitzy.com aim to read very large repos and refactor while preserving behavior.
  • Anchor every refactor in tests + type checks + CI to ensure parity (parity test sketch below).
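One way to anchor that parity is a characterization test; in this sketch both implementations are stand-ins for the pre- and post-refactor code.

// parity_test.go: pin behavior before a large refactor by diffing
// old vs. new implementations (both are stand-ins here)
package blog

import (
    "strings"
    "testing"
)

func slugifyOld(s string) string { // legacy implementation, kept during the window
    return strings.ToLower(strings.Join(strings.Fields(s), "-"))
}

func slugifyNew(s string) string { // refactored implementation under test
    return strings.ToLower(strings.Join(strings.Fields(s), "-"))
}

func TestSlugifyParity(t *testing.T) {
    for _, in := range []string{"Hello World", "  Go 1.22! ", "already-a-slug"} {
        if got, want := slugifyNew(in), slugifyOld(in); got != want {
            t.Errorf("slugify(%q): new=%q, old=%q", in, got, want)
        }
    }
}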

Security & observability by default

  • Threat model: guard against prompt injection and secret exfiltration; watch for supply‑chain changes.
  • Static/dynamic scans in CI; SBOM; dependency pinning.
  • Add logs/metrics/traces to new endpoints on day one (sketch below).
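Day-one logging can be a one-screen middleware; a sketch using the standard library's log/slog (Go 1.21+), with metrics and traces hanging off the same wrapper.

// middleware.go: day-one structured logging sketch with log/slog
package main

import (
    "log/slog"
    "net/http"
    "os"
    "time"
)

func withLogging(logger *slog.Logger, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next.ServeHTTP(w, r)
        logger.Info("request",
            "method", r.Method,
            "path", r.URL.Path,
            "duration_ms", time.Since(start).Milliseconds(),
        )
    })
}

func main() {
    logger := slog.New(slog.NewJSONHandler(os.Stdout, nil)) // JSON logs per the brief
    if err := http.ListenAndServe(":8080", withLogging(logger, http.DefaultServeMux)); err != nil {
        logger.Error("server exited", "err", err)
    }
}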

Live demo: pick the feature

Your feature idea?
  • Suggest a feature for anshumanbiswas.com.
  • We’ll draft the doc, write a failing test, and let the agent code.
  • Ship it if CI is green.

Thank you

Slides & SVGs provided in this chat. Questions welcome.

Twitter/X: @anchoo2kewl • Site: anshumanbiswas.com
