AI-Assisted Flutter Development: Claude Code in Production
How I use Claude Code with CLAUDE.md, custom skills, and architecture guardrails in a real Flutter project. 827 commits, ~30% faster migration.
For senior developers, tech leads, and CTOs wondering whether AI coding tools actually work in production — or just generate impressive demos.
TL;DR: I used Claude Code across 827 commits in a Flutter e-commerce app for a Swiss retailer. The difference between “vibe coding” and AI-assisted development is not the AI — it is the infrastructure around it. A CLAUDE.md file defines project context. Custom skills (slash commands) switch the AI between migration mode, legacy mode, and test mode. Architecture guardrails prevent the AI from “improving” code it should not touch. Result: ~30% faster migration timeline. Not because the AI wrote perfect code, but because it handled the mechanical parts while I made the decisions that required judgment.
Vibe Coding vs. AI-Assisted Development
These two terms get used interchangeably in 2026. They should not be.
Vibe coding is prompting and committing. You describe what you want, the AI generates code, you paste it in. Maybe it works. Maybe it compiles. Maybe it follows the same pattern as the rest of your codebase. You don’t know until something breaks, and by then three more features are built on top of it.
AI-assisted development is structured collaboration. The AI knows your architecture. It knows which patterns to follow in which parts of the codebase. It knows when to use TaskEither and when to use try/catch. It knows because you told it — explicitly, in configuration files that it reads before writing a single line.
Both use the same underlying models. The difference is entirely in setup. One produces code that looks right in a PR diff. The other produces code that works in production six months later.
I spent ten months on a project that started as vibe-coded and ended as something far more disciplined.
The Project: 827 Commits, One Codebase
The customer was a renowned Swiss retailer that had launched its mobile e-commerce app two months earlier. I joined when the app had an 85% crash-free rate, meaning roughly one in seven sessions ended in a crash. API calls inside widgets, three different error-handling styles, state management by tutorial roulette.
Over 22 weeks, I migrated the app to Clean Architecture while shipping features. 827 commits. Crash-free rate past 97%. I have written about the migration strategy itself separately. This post is about the AI tooling that made it possible to move that fast.
My only AI coding tool was Claude Code — an agentic coding CLI that reads your entire project and executes multi-step development tasks. Not Copilot, not Cursor. The distinction matters because this workflow depends on Claude Code features: CLAUDE.md configuration files and custom skills.
The Context Problem
Here is what happens when you point an AI at a Flutter codebase without project context.
You ask it to fetch product data. It writes a service class with try/catch. You ask it to fetch user data. It writes a repository with TaskEither from fpdart. You ask it to handle cart state. It generates a ChangeNotifier. You already use Riverpod.
Three requests, three different patterns. Each defensible in isolation. Together, a maintenance nightmare.
The AI has no way to know your project uses TaskEither for error handling and AsyncNotifier for state management. It draws from the sum of all Flutter code it has ever seen, and that sum includes every pattern and anti-pattern in existence.
The fix is not a smarter model. The fix is project context.
CLAUDE.md: Your Project’s AI Configuration
Claude Code reads a CLAUDE.md file from your project root at the start of every session. Architecture decisions, naming conventions, import rules — everything the AI needs before it touches your code.
Here is a simplified version from the retailer project:
## Architecture
This project uses Clean Architecture with feature-first organization.
### Decision Matrix
- Touching existing unmigrated code? -> Follow legacy patterns
- Writing a new feature? -> Use Clean Architecture
- Migrating an existing feature? -> Follow migration checklist
### Import Rules
- Domain layer: NO imports from data or presentation
- Presentation layer: imports domain only
- Data layer: implements domain interfaces
### Error Handling
- New code: TaskEither<AppFailure, T> from fpdart
- Legacy code: existing try/catch (do NOT refactor)
### State Management
- New features: Riverpod with code generation
- AsyncNotifier for async state
- Do NOT use ChangeNotifier, StateNotifier, or Bloc
This is not documentation for humans. Humans have context from standups and PR reviews. CLAUDE.md is documentation for the AI: explicit, unambiguous, with decision trees instead of guidelines.
The decision matrix is the most important part. Without it, the AI defaults to “write the best code possible.” During a migration, “best code possible” is context-dependent. A bug fix in unmigrated code should follow legacy patterns. The same logic, written during a migration, should follow Clean Architecture. Same feature, two correct approaches, depending entirely on intent. No model training covers that distinction. It has to be configured per project.
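One way to read the decision matrix is as a pure function from intent to code style. The Dart sketch below only illustrates that idea; the enum and function names are hypothetical and do not come from the retailer project.

```dart
/// What the developer intends to do (hypothetical names for illustration).
enum TaskIntent { touchUnmigratedCode, newFeature, migrateFeature }

/// Which set of conventions the change has to follow.
enum CodeStyle { legacyPatterns, cleanArchitecture, migrationChecklist }

/// Mirrors the three rows of the decision matrix: the intent alone
/// selects the style, not the quality of the surrounding code.
CodeStyle styleFor(TaskIntent intent) => switch (intent) {
      TaskIntent.touchUnmigratedCode => CodeStyle.legacyPatterns,
      TaskIntent.newFeature => CodeStyle.cleanArchitecture,
      TaskIntent.migrateFeature => CodeStyle.migrationChecklist,
    };

void main() {
  // A bug fix in unmigrated code stays in the legacy style.
  print(styleFor(TaskIntent.touchUnmigratedCode)); // CodeStyle.legacyPatterns
}
```

The point of the sketch is that there is no "it depends" branch: every intent maps to exactly one style, which is what makes the rule usable as AI configuration.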
Custom Skills: Different Tasks, Different AI Behavior
A CLAUDE.md sets the baseline context. But different development tasks need the AI to behave differently. That is where custom skills come in.
Custom skills are slash commands in Claude Code. Each one loads a specific set of instructions that override or extend the baseline context. On the retailer project, I used five:
/migration enforces a strict sequence:
When migrating a feature:
1. Create domain layer first (entities, repository interfaces, use cases)
2. Create data layer (implementations, data sources, DTOs)
3. Create presentation layer (providers, pages, widgets)
4. Write tests for each layer
5. Remove legacy code only after tests pass
The AI does not jump ahead to the presentation layer because it is “easier.” It starts with the domain, writes the interfaces, and builds outward. Every migrated feature has the same structure.
/legacy-code is the opposite. “Follow existing patterns. Do not introduce Clean Architecture imports. Match the style of surrounding code.” This was the hardest skill to get right. The AI’s instinct is to improve. It sees a try/catch and wants to refactor it into TaskEither. That instinct is correct in migration mode and catastrophic in legacy mode. A “quick improvement” to a legacy service class can break five widgets that depend on its exact interface.
/ui-component loads design system rules. /riverpod enforces code generation with @riverpod annotation and AsyncNotifier. /test-workflow sets testing conventions — unit tests for use cases, widget tests for composed components, mock repositories via domain interfaces.
Each skill changes the AI’s behavior without changing the AI itself. The model is the same. The context is different. (For more on how AI agents and tool systems like these work, see AI Agents, MCP, and Tools Explained.)
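As a rough sketch of what such a skill can look like on disk: Claude Code loads project slash commands from Markdown files under `.claude/commands/`, so, assuming the skill is implemented that way, `/legacy-code` could live in `.claude/commands/legacy-code.md`. The wording below is reconstructed from the rules quoted above and is not the actual file from the retailer project; the description text is illustrative.

```markdown
---
description: Work on unmigrated code without changing its style
---

You are editing legacy (unmigrated) code.

- Follow existing patterns.
- Do not introduce Clean Architecture imports.
- Match the style of surrounding code.
- Keep existing try/catch error handling. Do NOT refactor it to TaskEither.
- Keep public method signatures exactly as they are. Widgets depend on them.
```

Keeping each skill this short makes it easy to spot which rule the AI ignored when the output goes wrong.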
Architecture Guardrails: The Biggest Win
If I had to pick one concept from this entire setup that delivered the most value, it is architecture guardrails.
During a migration, two valid architectural styles coexist for months. Every task requires a decision — which style applies here? Humans handle this through judgment. AI does not have that judgment. Without explicit guardrails, it sees legacy code and “improves” it. That creates a third style — half-migrated code that follows neither convention consistently. Worse than legacy code because it is unpredictable.
The decision matrix in CLAUDE.md solved this. Not a suggestion — a rule. “Touching unmigrated code? Follow legacy patterns.” No ambiguity.
Here is what the same task looks like with and without guardrails.
Without guardrails, fixing a bug in a legacy service:
// AI "improves" the legacy code while fixing the bug
class ProductService {
  TaskEither<AppFailure, Product> getProduct(String id) {
    return TaskEither.tryCatch(
      () => _httpClient.get('/products/$id').then((r) => Product.fromJson(r.data)),
      (error, stack) => AppFailure.unexpected(error.toString()),
    );
  }
}
// Problem: 12 widgets depend on Future<Product>, not TaskEither
With guardrails, same bug, same service:
// AI fixes the bug using existing patterns
class ProductService {
  Future<Product> getProduct(String id) async {
    try {
      final response = await _httpClient.get('/products/$id');
      return Product.fromJson(response.data);
    } catch (e) {
      if (e is TypeError) throw ProductNotFoundException(id);
      rethrow;
    }
  }
}
// Bug fixed, existing interface preserved
The second version is not “better code” in the abstract. It is the correct code for this context: context-dependent correctness.
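If newly migrated code needs the same data, one option is to leave the legacy service untouched and adapt it at the boundary. The sketch below assumes that approach; it is not taken from the retailer project. To stay runnable without dependencies it uses a hand-rolled `Result` type as a stand-in for fpdart's `TaskEither<AppFailure, T>`, and `ProductRepositoryAdapter`, `Ok`, and `Err` are hypothetical names.

```dart
// Minimal stand-ins so the sketch runs without Flutter or fpdart.
class Product {
  final String id;
  const Product(this.id);
}

class AppFailure {
  final String message;
  const AppFailure(this.message);
}

/// Hand-rolled substitute for fpdart's Either; NOT the fpdart API.
sealed class Result<T> {}

class Ok<T> extends Result<T> {
  final T value;
  Ok(this.value);
}

class Err<T> extends Result<T> {
  final AppFailure failure;
  Err(this.failure);
}

/// Legacy service: the signature stays Future<Product>,
/// so existing widgets keep compiling and behaving as before.
class ProductService {
  Future<Product> getProduct(String id) async {
    if (id.isEmpty) throw ArgumentError('id must not be empty');
    return Product(id);
  }
}

/// New code wraps the legacy call instead of rewriting it.
class ProductRepositoryAdapter {
  final ProductService _legacy;
  ProductRepositoryAdapter(this._legacy);

  Future<Result<Product>> getProduct(String id) async {
    try {
      return Ok<Product>(await _legacy.getProduct(id));
    } catch (e) {
      // Exceptions from the legacy layer become typed failures here.
      return Err<Product>(AppFailure(e.toString()));
    }
  }
}

Future<void> main() async {
  final repo = ProductRepositoryAdapter(ProductService());
  final result = await repo.getProduct('42');
  switch (result) {
    case Ok(:final value):
      print('loaded ${value.id}'); // loaded 42
    case Err(:final failure):
      print('failed: ${failure.message}');
  }
}
```

The legacy class is unchanged, so both styles stay internally consistent while they coexist.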
What AI Is Good At (and Where It Falls Short)
After 827 commits using Claude Code, I have a clear picture of where AI pair programming delivers value and where it does not.
Where it excels
Boilerplate is the obvious one. Clean Architecture is verbose by design — entity classes, repository interfaces, use cases, data sources, DTOs, mappers, provider declarations. The structure is formulaic. Once the AI has seen two migrated features, it generates the scaffolding for the third with minimal correction.
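To give a sense of how formulaic that scaffolding is, here is a minimal sketch of one slice: an entity, a repository interface, a DTO, and a mapper. All names and the `price_in_cents` JSON field are hypothetical. The code uses plain Dart only, so it leaves out the fpdart and Riverpod parts the project used.

```dart
/// Domain entity: no JSON, no framework types.
class Product {
  final String id;
  final String name;
  final int priceInCents;
  const Product({
    required this.id,
    required this.name,
    required this.priceInCents,
  });
}

/// Domain-layer contract; the data layer implements it.
abstract interface class ProductRepository {
  Future<Product> getProduct(String id);
}

/// Data-layer DTO that mirrors the (assumed) API payload.
class ProductDto {
  final String id;
  final String name;
  final int priceInCents;
  const ProductDto({
    required this.id,
    required this.name,
    required this.priceInCents,
  });

  factory ProductDto.fromJson(Map<String, dynamic> json) => ProductDto(
        id: json['id'] as String,
        name: json['name'] as String,
        priceInCents: json['price_in_cents'] as int,
      );

  /// Mapper from DTO to entity; keeps JSON details out of the domain.
  Product toEntity() =>
      Product(id: id, name: name, priceInCents: priceInCents);
}

void main() {
  final dto = ProductDto.fromJson(
    {'id': 'p1', 'name': 'Mug', 'price_in_cents': 1290},
  );
  final product = dto.toEntity();
  print('${product.name}: ${product.priceInCents}'); // Mug: 1290
}
```

Almost nothing here requires judgment, which is why it is a good fit for generation and a poor use of a developer's attention.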
Test generation is close behind. “Write unit tests for this use case” with the project’s testing conventions loaded produces usable tests 80% of the time. The remaining 20% need manual adjustment, usually around edge cases the AI cannot infer from the interface alone.
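A use-case test in this style might look like the sketch below. It assumes a hypothetical `GetProduct` use case and fakes the repository through its domain interface. To stay dependency-free it uses plain checks in `main` rather than `package:test`, which a real project would normally use.

```dart
class Product {
  final String id;
  const Product(this.id);
}

/// Domain interface the use case depends on.
abstract interface class ProductRepository {
  Future<Product?> findById(String id);
}

/// Hypothetical use case: rejects blank ids before hitting the repository.
class GetProduct {
  final ProductRepository _repository;
  GetProduct(this._repository);

  Future<Product?> call(String id) {
    if (id.trim().isEmpty) {
      throw ArgumentError.value(id, 'id', 'must not be blank');
    }
    return _repository.findById(id);
  }
}

/// In-memory fake implementing the domain interface; counts calls.
class FakeProductRepository implements ProductRepository {
  final Map<String, Product> _items;
  int calls = 0;
  FakeProductRepository(this._items);

  @override
  Future<Product?> findById(String id) async {
    calls++;
    return _items[id];
  }
}

/// Tiny stand-in for a test framework assertion.
void check(bool condition, String description) {
  if (!condition) throw StateError('FAILED: $description');
  print('ok: $description');
}

Future<void> main() async {
  final repo = FakeProductRepository({'p1': const Product('p1')});
  final getProduct = GetProduct(repo);

  check((await getProduct('p1'))?.id == 'p1', 'returns a known product');
  check(await getProduct('missing') == null, 'returns null for unknown id');

  // The kind of edge case that is hard to infer from the interface alone.
  var threw = false;
  try {
    await getProduct('  ');
  } on ArgumentError {
    threw = true;
  }
  check(threw && repo.calls == 2, 'rejects blank id without a repository call');
}
```

The first two checks are what generated tests typically cover; the blank-id case is the sort of adjustment that tends to fall into the manual remainder.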
Pattern consistency is something AI handles better than humans. A developer on their fifteenth provider declaration in a week starts taking shortcuts. The AI does not get tired. Every AsyncNotifier follows the same structure. That consistency compounds over months.
Where it falls short
Architectural decisions remain firmly human territory. “Should we introduce a caching layer here?” depends on traffic patterns, backend SLA, and six other factors the AI cannot observe. It will give you a confident answer. That answer may be wrong.
Business context is invisible to the AI. It does not know that the German market has different legal requirements for price display than the Swiss market, or that certain API fields return stale data because the backend redesign is not finished. These are the things that cause real bugs.
Subtle bugs are the dangerous category. The AI writes code that compiles, passes the tests it generated, and looks correct in review. But it might use the wrong comparison operator for a currency calculation or handle a timezone edge case incorrectly. AI-generated code needs the same scrutiny as human-written code. Arguably more, because its confidence makes it easier to rubber-stamp.
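A hypothetical example of the kind of bug meant here: a free-shipping check that compiles, reads naturally, and is still wrong at the boundary. The threshold, the function names, and the business rule are invented for illustration.

```dart
/// Assumed business rule: orders of CHF 50.00 or more ship free.
const freeShippingThresholdInCents = 5000;

/// Plausible-looking but wrong: `>` excludes an order of exactly CHF 50.00.
bool shipsFreeBuggy(int totalInCents) =>
    totalInCents > freeShippingThresholdInCents;

/// Correct for the assumed rule: the threshold itself qualifies.
bool shipsFree(int totalInCents) =>
    totalInCents >= freeShippingThresholdInCents;

void main() {
  // Tests that only probe values far from the boundary pass for both.
  print(shipsFreeBuggy(9900) == shipsFree(9900)); // true
  print(shipsFreeBuggy(1000) == shipsFree(1000)); // true

  // Only the boundary value exposes the difference.
  print(shipsFreeBuggy(5000)); // false
  print(shipsFree(5000)); // true
}
```

If the AI wrote both the function and its tests, nothing forces a boundary case into the suite. That is the part a reviewer has to bring.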
The AI also lacks the judgment to not write code. Sometimes the correct response is “this duplicates existing functionality.” The AI will always produce something.
The Real Impact: ~30% Faster, Not Magic
The migration took 22 weeks. Without AI-assisted development, my estimate is 30-32 weeks. That is roughly eight to ten weeks saved, not the ten-times productivity claim you see in conference talks.
The 30% comes from two sources. The mechanical parts of each migration step were faster — scaffolding, boilerplate, initial test suites. And pattern consistency reduced the review cycle. Fewer “why does this feature handle errors differently?” conversations.
What AI did not speed up: architectural planning, debugging production issues via Crashlytics traces, and the final QA pass on each step.
The honest math: 70% of the speed gain came from boilerplate reduction. 30% came from consistency enforcement. Zero percent came from the AI making better architectural choices than I would have. I applied a similar Claude Code workflow in a serverless-to-monolith migration with comparable results.
If your team is planning a similar migration or evaluating how AI tooling fits into production workflows, I have done this before.
Setting This Up for Your Project
This works for any project, not just Flutter — I use the same CLAUDE.md and custom skills approach on web apps, backend services, and infrastructure projects. For a broader perspective on AI in app development, see Building Apps with AI.
Start with CLAUDE.md. Document your architecture in machine-readable terms. Not “we prefer clean code” — the AI does not know what you mean by that. Instead: “Domain layer classes live in lib/features/{name}/domain/. They must not import from data/ or presentation/. Error handling uses TaskEither<AppFailure, T>.” Concrete paths. Concrete types. Concrete rules.
Add a decision matrix. If your codebase has multiple styles (most do), codify which style applies when. Write it as a decision tree, not prose.
Create your first custom skill for whatever task you do most often. Start with one. Add more as you identify repeated patterns in your AI interactions.
Run the AI on low-risk tasks first. A utility function. A test for an existing module. Review the output carefully — import paths, naming conventions, whether it followed the CLAUDE.md rules. Update CLAUDE.md when the instructions are ambiguous. Expand to feature work once the context is dialed in.
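Part of that review can be automated. The sketch below checks the domain import rule quoted above against a file's source text. It is an illustration, not the project's tooling: the function name, the regular expression, and the assumption that layers can be recognised by `/domain/`, `/data/`, and `/presentation/` path segments are mine.

```dart
/// Captures the URI of a Dart import directive,
/// e.g. import 'package:app/x.dart';
final _importPattern =
    RegExp(r'''^\s*import\s+['"]([^'"]+)['"]''', multiLine: true);

/// Returns the imports in [source] that break the rule
/// "domain must not import from data/ or presentation/".
/// Files outside a /domain/ directory are not checked.
List<String> findDomainImportViolations(String filePath, String source) {
  if (!filePath.contains('/domain/')) return const [];
  return _importPattern
      .allMatches(source)
      .map((match) => match.group(1)!)
      .where((uri) => uri.contains('/data/') || uri.contains('/presentation/'))
      .toList();
}

void main() {
  // Hypothetical file contents; the package name `shop` is made up.
  const source = '''
import 'package:shop/features/cart/domain/entities/cart.dart';
import 'package:shop/features/cart/data/dtos/cart_dto.dart';
import '../../presentation/pages/cart_page.dart';
''';

  final violations = findDomainImportViolations(
    'lib/features/cart/domain/usecases/get_cart.dart',
    source,
  );
  for (final uri in violations) {
    print('forbidden import in domain layer: $uri');
  }
}
```

Run against the sample, this reports the `data/` and `presentation/` imports and leaves the `domain/` one alone. A string match like this misses relative imports without a leading path segment, so treat it as a first filter rather than a guarantee.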
The CLAUDE.md is a living document. Mine changed over fifty times during the retailer project. Over ten months, it became a remarkably precise description of the project’s architecture. A side effect: it is also the best onboarding document the project has.
What This Means for Teams
The CLAUDE.md and custom skills are project-level configuration, not personal preference. When the entire team uses them, a junior developer using /migration gets the same structural scaffolding as a senior developer. The architecture decisions are encoded in the tooling, not locked inside one person’s head.
This shifts code review from “did you follow the pattern?” to “is the business logic correct?” And it forces the team to articulate rules that previously lived as tribal knowledge. “When you fix a bug in legacy code, follow legacy patterns” is a nuanced rule most teams never write down. The AI forces you to write it down, and the whole team benefits.
There is a handover benefit too: the CLAUDE.md and custom skills stay with the codebase. When a freelancer finishes an engagement, the team inherits a precise, machine-readable description of their own architecture. No knowledge walks out the door.
Agentic Coding: Where This Is Heading
The current setup — AI as a careful assistant with explicit guardrails — is an intermediate step. The next stage is agentic coding: the AI executes multi-step tasks autonomously, creates files, runs tests, fixes failures, and commits the result.
For me, this is already partially reality in 2026. Claude Code can run a migration end-to-end — create the file structure, generate the code, write tests, run them, and correct failures. I review the outcome instead of every intermediate step.
But the same principle holds: autonomy without guardrails is dangerous. The more autonomy the AI gets, the more important the architecture rules in CLAUDE.md become. Without them, autonomous coding is just vibe coding with extra steps.
Conclusion
AI-assisted development in production is not about the AI being smart. It is about the developer being deliberate.
In a production codebase, AI without project context generates code that looks right and behaves wrong. Without guardrails, it “improves” code that should not be touched. The infrastructure matters more than the model — CLAUDE.md, custom skills, explicit rules about which patterns apply where. That is what separates vibe coding from production-grade AI development. When this setup becomes your default rather than an experiment, AI is no longer assisting your development — it is native to it.
My results on the retailer project: 827 commits, crash-free rate from 85% to past 97%, ~30% faster migration timeline. Not because the AI was magic. Because the AI had context.
The bar for AI-assisted development in 2026 is not “can the AI write code?” It can. The bar is: “can the AI write code that belongs in your codebase?” That takes work. The work is worth it.
I help teams set up AI-assisted development workflows for production codebases. If that is relevant for your project, let’s talk.
Frequently Asked Questions
What is the difference between vibe coding and AI-assisted development?
Vibe coding means prompting an AI and committing the output with minimal review or structural guidance. AI-assisted development means providing the AI with explicit project context — architecture rules, conventions, decision matrices — so that its output is consistent with the existing codebase. Same models, different infrastructure.
What is CLAUDE.md?
A configuration file that Claude Code reads from your project root at the start of every session. It contains architecture descriptions, naming conventions, import rules, and decision trees that guide the AI’s behavior — project documentation written for the AI.
Can AI coding assistants handle architecture migrations?
With guardrails, yes. Unguided, an AI will try to “improve” all code it touches, creating pattern chaos during a migration. Custom skills and decision matrices in CLAUDE.md constrain the AI to follow the correct patterns for each context.
How much faster is AI-assisted Flutter development?
Roughly 20-30% faster for implementation work. Boilerplate-heavy work sees the biggest gains. Architectural planning, debugging, and QA are not meaningfully faster.
Does this approach work with tools other than Claude Code?
The concepts — project context files, task-specific configurations, architecture guardrails — apply broadly. The specific implementation (CLAUDE.md, custom skills) is Claude Code. Other tools have their own mechanisms. The principle is the same: give the AI explicit, machine-readable project context.