My AI Coding Setup: Tools, Context, and Guardrails

Jun 02, 2026

Three tools. That is the core of the setup.

Claude Code for the heavy work: feature implementation, debugging, refactoring, test writing, review preparation. Sometimes from the terminal, sometimes from inside the editor. Where it runs depends on what it is touching, and I will get to why.

Claude Desktop for the work that is not about code generation at all. Planning sessions. Reviewing specifications before I start building. Thinking through architectural trade-offs in conversation. When I need the AI to help me reason about a problem rather than produce an artefact, this is where that happens. The distinction matters: Claude Code is for doing. Claude Desktop is for thinking.

GitHub Copilot across my editors. IntelliJ for the Java services, PhpStorm for the PHP work, and VSCode for everything else: Angular, Go, React, Python, and lately some Ruby I have been picking up. Copilot handles the lighter-weight interactions: autocomplete, inline completions, quick suggestions while I am reading or editing code. The kind of assistance where I want the AI close to the cursor but constrained in scope. It fills lines. It does not make decisions.

That is it. I do not use Cursor, Windsurf, Replit, or any of the newer agentic IDE platforms. Not because they are bad. Because every tool I add is another context surface, another set of default behaviours, another thing that might do something I did not ask it to do. In a domain where a dispensing logic error can trigger an FDA inquiry, I want tools with clearly separated responsibilities and well-understood boundaries. It took me longer than I would like to admit to learn that. My first six months with AI tools looked like a graveyard of extensions, plugins, and experimental integrations, each promising productivity and each adding unpredictability I did not need.

The CLAUDE.md Architecture

This is where most of the leverage lives, and where most people underinvest.

Claude Code loads CLAUDE.md files at the start of every session. They go into the system prompt and stay in context for the entire conversation. If you are not using them, or if you have a single bloated file at the root, you are leaving most of the tool’s value on the table.

I use a layered architecture. It took a few iterations to get right.

The root CLAUDE.md is short. Roughly 40 lines. It covers only what applies everywhere: the tech stack overview, git conventions (conventional commits, imperative mood, no AI attribution), the test command, and a handful of behavioural rules I never want the agent to violate. “Use the simplest possible approach.” “Do not refactor code outside the scope of the current task.” “Do not add abstractions that do not exist in the current codebase.”

That last rule was born from pain. Early on, the agent kept introducing design patterns into services that used a different structural approach. Perfectly reasonable patterns in isolation. Completely inconsistent with how the existing codebase was built. I spent a full afternoon unpicking one of them before I added the constraint. Now it is the third line in the root file.

The heavier context lives in subdirectory CLAUDE.md files that load on demand as the agent moves through the codebase. Each service has its own file covering its domain logic, its data access patterns, its integration points with other services, and the compliance rules that govern its behaviour. One service’s file covers formulary mapping logic and the relationships between clinical and billing identifiers. Another covers allocation and replenishment rules, concurrency handling, and the compliance constraints that determine ordering workflows.

This matters because context is a budget, not a library. A fresh Claude Code session consumes roughly 20,000 tokens loading the system prompt and tool definitions. Every token of context you front-load is a token not available for the actual work. Anthropic’s internal testing found that unguided attempts succeed about 33% of the time. Layered files shift a meaningful chunk of the other 67% from “the agent did not know the constraint” to “the agent knew the constraint and either respected it or flagged a conflict.”

Skills: The Context You Do Not Pay For Until You Need It

CLAUDE.md files are always-on context. They load at session start and stay in the window for the entire conversation. Skills are the opposite: they are on-demand context that only fires when the agent recognises a relevant task.

A skill is a SKILL.md file that gives Claude Code a specialised playbook for a specific class of work. The file has a short header (roughly 60 tokens) that describes when it should trigger, and a full body (typically 1,500 to 2,500 tokens) that contains the actual instructions. Only the header loads at session start. The full body loads when the agent determines the skill is relevant. Five skills registered against a project cost roughly 300 tokens at session start instead of 12,500. That is a significant difference when context is your fundamental constraint.

I set up skills for the categories of work where Claude Code consistently made the same mistakes until I gave it explicit instructions. A testing skill that encodes how we structure test classes, which mocking patterns we use, and how we name test methods in each language. A service layer skill that captures the validation-orchestration-persistence separation pattern our Java services follow. A compliance annotation skill that tells the agent how to mark code that implements a regulatory requirement, so the annotations survive refactoring and the audit trail stays intact.

The key insight with skills is that they are not documentation. They are behaviour modification packaged as context. A CLAUDE.md rule says “do not do X.” A skill says “when you encounter this category of task, here is exactly how to do it, including the steps you would not think of on your own.” The CLAUDE.md prevents mistakes. The skills produce correct output from the start.

The ecosystem around skills is maturing fast. Andrej Karpathy’s skills repo crossed 78,000 GitHub stars in April 2026. Vercel’s skills.sh directory indexes public skills across every major agent. But the skills that matter most are the ones you write yourself, because they encode the specific patterns and constraints of your codebase that no public skill will ever capture. Writing a good skill takes about an hour. The return on that hour is every future session where the agent gets it right without being corrected.

What Goes in the File (and What Does Not)

There is a temptation to dump everything in. Architectural decision records, API docs, database schemas, team history. I did this at first. The agent’s output quality actually got worse. Too much context creates noise. The agent starts making connections between things that are unrelated, or treating aspirational standards as current constraints.

Here is what earns its space:

Build and test commands. The exact commands. Not a link to a wiki page. The actual shell invocation. Claude needs to run the tests after every change, and if it has to navigate to a documentation site first, it will not.

The three or four conventions that differ from the language default. Not a comprehensive style guide. Just the things Claude will get wrong without explicit instruction. When you work across Java, PHP, Angular, Go, and Python in the same week, each project has its own patterns for service layer composition, error handling, and naming. The CLAUDE.md for each captures the deviations from what Claude would assume. Standard language conventions stay out, because Claude already knows them.

Domain constraints that the code cannot express. This is the one that matters most in regulated software. The codebase is full of validation checks, locking mechanisms, and audit logging patterns that look like over-engineering if you do not know the compliance context behind them. Without that context in the CLAUDE.md, the agent might reasonably conclude a check is redundant and remove it during a refactoring pass. I have seen this nearly happen. A validation rule that exists because of a specific federal reporting requirement looks exactly like defensive overreach to an agent that does not know the regulation. The CLAUDE.md for that service now includes a one-line note explaining why the check exists. That one line is worth more than any amount of architectural documentation.

Hard boundaries. Things the agent must never do. Phrased in the imperative, no ambiguity. I keep this list short. Four rules.

What stays out:

Anything the agent can discover by reading the code. If the project structure is conventional and the naming is clear, the agent does not need a map. Let it navigate.

Historical context that does not affect current decisions. The migration from framework X to framework Y three years ago is not useful for a task today. If the migration left artefacts that need special handling, describe the artefacts, not the history.

Aspirational standards. If your file says “all services should have 90% test coverage” but the current coverage is 45%, the agent will either waste time writing unnecessary tests or get confused about why the codebase does not match the stated standard. Describe what is, not what you wish.

Where the Agent Runs Depends on What It Is Touching

I do not run Claude Code the same way everywhere. In JetBrains, IntelliJ and PhpStorm, I run it in the terminal, separate from the IDE. In VSCode, I run it inside the editor.

That is not an inconsistency. It is a risk calculation.

The JetBrains work is the enterprise pharmaceutical compliance software. Java services handling formulary logic, PHP services processing billing workflows. Code that carries regulatory weight. When I work on these codebases, I want to control exactly what the agent sees. Running Claude Code in the terminal means I point it at specific files, specific directories, specific problems. It does not have ambient awareness of everything open in the editor. It cannot wander into a file I did not intend it to read, find something it should not have found, and act on context that was there by accident rather than by design.

In April, a Cursor agent at PocketOS found a Railway API token in an unrelated file and used it to delete a production database. That happened because the agent had access to more context than the task required. When the consequences of that kind of accident are compliance violations and patient safety risks, the extra isolation of a terminal session is worth the inconvenience of switching between windows.

The VSCode work is different. Angular frontends, Go services, React components, Python scripts, the Ruby I have been learning. The consequence profile is lower. A mistake in a frontend component is a bug I fix in the next commit, not a regulatory exposure I explain to an auditor. In that context, the convenience of running Claude Code inside the editor, with its awareness of open files and project structure, is a reasonable trade-off. The agent’s ambient access to the workspace is useful rather than dangerous, because the workspace itself is lower-stakes.

The principle is not “terminal is always safer.” The principle is that the degree of isolation should match the consequence severity of the code. High-consequence, regulated code gets the tightest scope control I can manage. Lower-stakes code gets the convenience of tighter editor integration. Treating every codebase the same way is either paranoid or reckless depending on which direction you err.

Guardrails That Have Actually Fired

Guardrail lists are easy to write and easy to ignore. Let me show you one in practice.

My CLAUDE.md includes: “Never remove or weaken existing validation rules without explicit instruction. If a validation check exists, assume it is there for a regulatory reason until told otherwise.”

A few weeks ago I asked Claude Code to clean up a service class that had grown unwieldy. Standard refactoring task: extract methods, simplify control flow, remove dead code. The agent did good work on most of it. But in the diff, buried between two reasonable extractions, it had removed a null check on a billing identifier field. The check looked redundant. The field was already validated upstream. From a pure code quality perspective, the agent’s reasoning was sound: the check was defensive, and defensive checks that duplicate upstream validation are textbook dead code.

Except this check existed because the upstream validation had failed silently in production eighteen months earlier, and the downstream check was added as a safety net after a compliance incident. That context was not in the code. It was not even in the CLAUDE.md. It was in my head, because I was there when the incident happened.

The guardrail caught it. Or rather, the guardrail primed me to catch it. Because the CLAUDE.md says “assume every validation check has a regulatory reason,” I was specifically scanning the diff for weakened validations. Without that rule, I might have approved the removal as a sensible cleanup. The rule is not a cage for the agent. It is a checklist for me.

That is the honest truth about guardrails in AI-assisted development right now. They are not enforceable in the way that a type system is enforceable. They are conventions that increase the probability that a human catches a mistake before it ships. The layered permission model (CLAUDE.md rules for domain constraints, settings.json for operational permissions) shifts the default from “the agent can do anything unless something stops it” to “the agent can do a defined set of things and must ask about everything else.” That inversion of defaults is the most important architectural decision in my entire setup.

What I Am Not Doing

I want to be honest about where I draw the line, because overselling this setup would undermine everything I have written.

I do not use AI for architectural decisions. The agent can implement a design. It does not choose the design. Decisions about service boundaries, data models, integration patterns, and domain abstractions happen in conversations with other engineers, informed by regulatory requirements the agent does not understand. Claude Desktop is useful for thinking through options in these conversations. But the decision is human.

I do not use AI for production incident response. When something breaks, the priority is speed and containment. I do not have time to frame the problem for an agent. I need to read logs, check metrics, and draw on the institutional knowledge in my head about how the system behaves under failure conditions. That knowledge is not in any CLAUDE.md file. It might never be.

I do not use AI to make review judgements. The agent prepares briefings for review (I delegate investigation to subagents that read the changes, summarise the intent, and flag modified validation rules or API contracts). But the judgement call, whether a change is architecturally sound, whether it respects the compliance constraints, whether it introduces subtle interaction effects, is mine.

These are not temporary limitations I expect to outgrow as the models improve. These are categories of work where human judgement is the product. Delegating them would not save time. It would create risk.

The Philosophy, If You Want One

Here is the shortest version of what I believe about AI-assisted engineering in 2026, distilled from a year of using these tools on software that carries regulatory consequences:

The agent is fast. You are right. Use the speed for the things where speed helps: implementation, refactoring, test generation, boilerplate. Do not use it for the things where speed is irrelevant or actively dangerous: architecture, incident response, compliance reasoning, the decision about whether the thing should be built at all.

The agent is confident. It will always produce output. It will never tell you it does not know enough to do the task safely. That confidence is the feature and the risk. Your job is to build the structure (CLAUDE.md files, permission boundaries, scope control, review discipline) that channels the confidence toward useful work and catches it when it wanders.

The agent is amnesiac. It does not remember the last session. It does not know what happened in production last week. It does not carry the institutional knowledge that makes you a senior engineer. Every session is day one. The CLAUDE.md files are your attempt to compress years of context into a few hundred tokens. They will always be incomplete. Knowing what they miss is as important as knowing what they contain.

None of this is going to be in the marketing material. But it is what the workflow actually looks like when the code matters.

The Senior Engineer's Compass

Discussion about this post

Ready for more?