Inside Shopify's AI-first engineering playbook

Shopify is the gold standard for context engineering at scale. They built a centralised LLM proxy, MCP-wired every internal system, made "AI-reflexive" usage a performance review criterion, and report an estimated 20% productivity gain across engineering. Here are the eight patterns to steal.

The takeaways

Standardise infrastructure, not tools. One LLM proxy. Many AI tools. Cursor, Claude Code, Copilot, Codex — all routed through it.
MCP-wire everything. Salesforce, Slack, Google Workspace, internal wiki, data warehouse. Access controls preserved.
Make internal tools trivial to ship. Their tool "Quick" lets anyone deploy a JS/HTML file at a URL in seconds. Non-engineers ship tools.
"AI-reflexive" is now a performance metric. Tobi Lütke's term. Use AI by default — explain when you didn't.
Comprehension debt is the #1 risk. Engineers must understand 2-3 layers below where they're working. Abdicate the toil, not the thinking.

In April 2025, Shopify CEO Tobi Lütke sent an internal memo that went viral. The headline: using AI is now a "fundamental expectation" of every Shopify employee, factored into performance reviews. Managers requesting new headcount have to show why AI couldn't do the job.

One year later, the results are striking. Tobi has personally shipped more code in three weeks than in the decade prior. His coding agent made Shopify's Liquid template engine 53% faster. Engineering productivity is up an estimated 20%. And it all rests on a context-engineering foundation that most companies haven't built.

Bessemer Venture Partners sat down with Farhan Thawar, Shopify's VP and Head of Engineering, to document the system. Here's what's worth stealing.

1. Standardise infrastructure, not tools

The instinct most companies have is to pick one AI tool — Cursor, Copilot, Claude Code — and roll it out. Shopify did the opposite.

They built an internal LLM proxy — a centralised gateway through which every AI request from every tool flows. Behind that proxy: OpenAI, Anthropic, Google models. In front of it: every developer tool the team wants to use.

"At Shopify, we always have one tool for one job — except for AI. We don't know yet which company, workflow, or model is going to win."

Farhan Thawar, VP Engineering, Shopify

The benefits compound. Shopify buys tokens in bulk (cost discounts), tracks spending across teams (visibility), and switches models behind the scenes as new ones get better. If someone spends $250 in tokens in a day, Farhan gets an alert, but he investigates rather than shutting it down. Usually they're trying something ambitious.
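
To make the idea concrete, here is a minimal sketch of the bookkeeping such a gateway might do. It is not Shopify's implementation: the alias names, provider names, and the Gateway class are all hypothetical, and the only detail taken from the interview is the $250-a-day alert that notifies rather than blocks.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical routing table: a stable alias maps to (provider, model).
# Tools only ever see the alias, so the backend can be swapped centrally.
ROUTES = {
    "default-chat": ("provider_a", "model-large"),
    "fast-chat": ("provider_b", "model-small"),
}

DAILY_ALERT_USD = 250.0  # alert threshold mentioned in the interview


@dataclass
class Gateway:
    routes: dict = field(default_factory=lambda: dict(ROUTES))
    user_daily: dict = field(default_factory=lambda: defaultdict(float))
    team_total: dict = field(default_factory=lambda: defaultdict(float))
    alerts: list = field(default_factory=list)

    def route(self, alias):
        """Resolve an alias to whichever backend is currently configured."""
        return self.routes[alias]

    def record(self, user, team, day, cost_usd):
        """Track spend per user per day and per team; alert once on crossing."""
        before = self.user_daily[(user, day)]
        after = before + cost_usd
        self.user_daily[(user, day)] = after
        self.team_total[team] += cost_usd
        # Alert only at the moment the threshold is crossed. Nothing is
        # blocked: the alert is a prompt to go and ask what is being built.
        if before < DAILY_ALERT_USD <= after:
            self.alerts.append((user, day, after))
```

Swapping a model is then a one-line change to the routing table (for example, pointing "default-chat" at a different provider), and no developer tool has to be reconfigured.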

The lesson

The AI ecosystem moves too fast to bet on one tool. Standardise the layer underneath — costs, observability, model-switching. Let engineers experiment with whatever's working that month.

2. Connect AI to every internal system via MCP

An LLM with no context is useless. An LLM that can read your Salesforce, query your data warehouse, search Slack, and check your calendar — that's a coworker.

Shopify wired their internal systems to AI through Model Context Protocol (MCP) servers. Their internal wiki, their product management tool (GSD), their data warehouse, Slack, Google Workspace, and Salesforce all expose themselves as MCP endpoints.

Crucially, access controls are preserved. AI only retrieves what the user already has permission to see. Same auth flow, same audit trail.

"Because it's going through the same auth flow that you have, it's not going to give me information that I don't have access to."

Farhan Thawar, VP Engineering, Shopify
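
The permission-preserving part is the piece worth sketching. The example below does not use any MCP SDK and is not Shopify's code; it only illustrates the principle that a retrieval tool should filter by the caller's own permissions and log every request. The Doc and User types, the group names, and the search function are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups that may read this document


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset  # groups the user already belongs to


AUDIT_LOG = []  # (user, query, returned doc ids), appended on every call


def search(user, query, docs):
    """Return matching docs, but only those the caller could already open."""
    hits = [
        d for d in docs
        if query.lower() in d.text.lower()
        and bool(user.groups & d.allowed_groups)  # same check as direct access
    ]
    AUDIT_LOG.append((user.name, query, [d.doc_id for d in hits]))
    return hits
```

The design choice is that the AI tool never gets a privileged service account: it runs as the user, so the answer to "what can the assistant see?" is always "exactly what you can see."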

3. Make internal tools trivial to ship

Farhan compares Shopify's internal tooling moment to GeoCities — anyone could publish a website with a URL.

They built a tool called Quick. Drop in a JS, TS or HTML file. Assign it a URL. Instantly deployed. Anyone in the company can access it.

The result: non-engineers — sales, finance, support — now build their own dashboards. Salespeople wrote queries to pull merchant data into briefing docs. Finance teams built MBR generators. One person sent Farhan a Quick link compiling everything he needed to know about a merchant before a call.

This is what Farhan calls "n-of-1 software" — tools built for one person's specific workflow, deployed in minutes. No engineering tickets. No backlog.
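
How Quick works internally isn't described in the interview, so the following is only a sketch of the core idea: take a file, give it a slug, hand back a URL. The host name, the Registry class, and the decision to store files unmodified are all assumptions.

```python
from pathlib import PurePosixPath

BASE_URL = "https://quick.example.internal"  # hypothetical internal host
ALLOWED_SUFFIXES = {".html", ".js", ".ts"}   # file types named in the article


class Registry:
    """In-memory stand-in for a 'drop a file, get a URL' deploy service."""

    def __init__(self):
        self._sites = {}  # slug -> (filename, content)

    def deploy(self, slug, filename, content):
        """Register a file under a slug and return the URL it is served at."""
        suffix = PurePosixPath(filename).suffix.lower()
        if suffix not in ALLOWED_SUFFIXES:
            raise ValueError(f"unsupported file type: {suffix!r}")
        self._sites[slug] = (filename, content)
        return f"{BASE_URL}/{slug}"

    def fetch(self, slug):
        """Look up what is deployed at a slug; raises KeyError if nothing is."""
        return self._sites[slug]
```

A call such as Registry().deploy("merchant-brief", "index.html", "<h1>Brief</h1>") returns a shareable URL in one step, which is the whole point: no ticket, no pipeline, no review queue for a tool only one person needs.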

4. Cultural adoption beats top-down mandate

The memo from Tobi got the press. But the actual adoption came from culture.

Farhan posts examples of what he ships with AI — framed not as "look how smart I am" but as "look how lazy I am." Tobi does the same. Leadership openly shares prompts that worked.

The result: adoption spread organically beyond engineering. Sales started using Cursor. Finance. HR. Cursor's own team asked Farhan how he got salespeople to adopt it so deeply.

"A lot of our tactics were simply nudging, showing people demos, and bragging about how 'lazy' we are — working smarter, not harder. We'd say, 'Look what I built in five minutes.' There's no forcing."

Farhan Thawar, VP Engineering, Shopify

Tobi codified the cultural expectation in a single word: "AI-reflexive." The instinct to reach for AI first when you hit a problem. Performance reviews factor it in.

5. Track the right metrics (not output volume)

Lines of code. Pull requests. Story points. All useless when AI can generate code 10x faster.

Farhan's preferred signal: weekly demos. Teams show what they built. Leadership watches. Alignment, blockers, and momentum become visible without a metric to game.

His one quantitative metric: reversion rate. How often does a merged PR get rolled back? If AI were generating worse code, this would spike. It hasn't. Engineers ship slightly more PRs per week with AI — and the reversion rate is flat.
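
Reversion rate is simple enough to compute from merge history. Here is a minimal sketch, assuming each merged PR is recorded with the week it merged and a flag for whether it was later rolled back; the PR record and field names are hypothetical, not Shopify's schema.

```python
from dataclasses import dataclass


@dataclass
class PR:
    number: int
    week: str       # e.g. "2026-W02", the week the PR was merged
    reverted: bool  # True if the merge was later rolled back


def reversion_rate(prs):
    """Fraction of merged PRs that were rolled back (0.0 if there are none)."""
    if not prs:
        return 0.0
    return sum(p.reverted for p in prs) / len(prs)


def weekly_rates(prs):
    """Reversion rate per week, ordered by week label."""
    by_week = {}
    for p in prs:
        by_week.setdefault(p.week, []).append(p)
    return {week: reversion_rate(group) for week, group in sorted(by_week.items())}
```

The signal to watch is the trend: PR volume going up while the weekly rate stays flat is the pattern Farhan describes.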

~20%: estimated productivity gain across Shopify engineering

3,000+: Cursor licences across engineering

53%: speed-up Tobi got on Shopify's Liquid engine with an AI coding agent

6. Build with quality guardrails

Speed without quality is regression in disguise. Shopify enforces two non-negotiables:

Senior engineer review on every PR. "Shopify is not yet at the place where we allow AI to check in code automatically into the repos. We still require a human PR reviewer." That review is now the bottleneck — but Farhan considers it a necessary safeguard.

AI as a security partner, not a security source. Farhan is sceptical that AI writes more secure code by default (it often writes more verbose code, which means more attack surface). Instead, Shopify uses AI to find security vulnerabilities: fuzz testing, IDOR analysis, API boundary probes. Tedious work for humans. Perfect work for LLMs.
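
For readers unfamiliar with IDOR (insecure direct object reference), the check itself is mechanical: request every object as someone who doesn't own it and see whether the server says yes. The sketch below shows that check against two toy handlers. It contains no LLM and is not Shopify's tooling; the order table, the handlers, and probe_idor are hypothetical, and the point is only to show the kind of tedious probe an AI agent could be asked to write and run at scale.

```python
ORDERS = {101: "alice", 102: "bob"}  # hypothetical order_id -> owner


def vulnerable_get_order(user, order_id):
    """Bug: checks that the order exists, but never checks who owns it."""
    return 200 if order_id in ORDERS else 404


def fixed_get_order(user, order_id):
    """Returns 403 when the caller is not the owner of the order."""
    owner = ORDERS.get(order_id)
    if owner is None:
        return 404
    return 200 if owner == user else 403


def probe_idor(handler, owners, users):
    """Request every object as every non-owner; any 200 is a finding."""
    findings = []
    for object_id, owner in owners.items():
        for user in users:
            if user != owner and handler(user, object_id) == 200:
                findings.append((user, object_id))
    return findings
```

Run against the vulnerable handler with users alice and bob, the probe reports that each can read the other's order; run against the fixed handler, it reports nothing.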

7. Beware comprehension debt

This is Farhan's biggest worry, and worth the longest pause.

If AI generates all the code, engineers gradually lose understanding of the systems they maintain. When something breaks at 3am, no one knows how it works. This is comprehension debt — and unlike financial debt, it's silent until catastrophic.

"In general, I tell my team that they need to understand things two or three layers below the layer they're working at. You shouldn't abdicate the thinking. You should abdicate the toil."

Farhan Thawar, VP Engineering, Shopify

The metaphor he uses: Formula 1 drivers don't just know how to drive — they understand the engine, the braking systems, the tyre compounds. That depth is what lets them react when something goes wrong. Same for engineers in an AI-native world.

8. Bet on agentic harnesses

Farhan calls this the 2026 unlock. Two patterns are emerging:

Parallel agents. A senior engineer launches 10 AI agents to work on different parts of the codebase simultaneously. They review outputs, discard what doesn't work, merge what does. Dramatically increased pace.

Sequential critique loops. One agent runs through extended reasoning — generating, critiquing, refining over 45+ minute cycles. Multi-model interrogation. Deeper answers.
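
Both patterns reduce to small pieces of control flow. The sketch below is an illustration under stated assumptions, not a real harness: run_agent is a stub standing in for a coding agent, the tests_pass flag stands in for the engineer's review, and the critique and revise callables stand in for model calls.

```python
from concurrent.futures import ThreadPoolExecutor


def run_agent(task):
    """Stub agent. A real one would call a model and edit a working copy."""
    return {
        "task": task,
        "patch": f"patch for {task}",
        "tests_pass": "flaky" not in task,  # placeholder for human review
    }


def fan_out(tasks, max_workers=10):
    """Parallel agents: run one agent per task, then split keep vs discard."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_agent, tasks))  # preserves task order
    kept = [r for r in results if r["tests_pass"]]
    discarded = [r for r in results if not r["tests_pass"]]
    return kept, discarded


def critique_loop(draft, critique, revise, max_rounds=5):
    """Sequential critique: revise until the critic is satisfied or we stop."""
    for _ in range(max_rounds):
        issues = critique(draft)
        if not issues:
            break
        draft = revise(draft, issues)
    return draft
```

In practice the critic and the reviser can be different models, which is one way to read "multi-model interrogation"; the round cap is there so a loop that never converges still terminates.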

"If you don't figure out how to harness the agents in 2026, you'll be behind."

Farhan Thawar, VP Engineering, Shopify

Why this matters for founders

Shopify has a 6-person ML infrastructure team. Tobi has political capital to mandate AI usage across thousands of employees. Most of this isn't replicable at the scale of a 5–30 person business.

But the patterns are. The LLM proxy, the MCP-wired tools, the weekly demos, the cultural framing, the "abdicate toil not thinking" principle — these scale down beautifully. They're the architecture of a serious AI-first business, sized for any team.

That's where we come in. We take what Shopify built with a 6-person infra team and ship it for founders with no engineering team at all. Founder-owned. Self-hosted. On your stack.

Want this for your business?

We build context-engineering systems for founders. Sized for your team, built on your infrastructure, owned by you end-to-end.
