Vibe Coding at NERSC

Practical AI-Assisted Coding for NERSC Users

NUG Community Call: Grads@NERSC

April 16, 2026

What Is Vibe Coding

And What It Is Not

Vibe Coding

People use vibe coding to mean “let the model write the code.”

That is not the goal here.

This talk is about coding with agents, not surrendering to them:

  • You describe the task and constraints
  • The model proposes code, commands, tests, or refactors
  • You run, inspect, and verify
  • You stay in charge

What Is Agentic AI

Agentic AI means the model does more than answer a question.

It can:

  • Understand a task
  • Break it into steps
  • Use tools
  • Check results
  • Iterate toward a goal

That shifts AI from chatting to doing.

Chatbot vs Coding Agent

Chatbot

  • Explains
  • Drafts
  • Waits for your next prompt

Coding agent

  • Reads files
  • Runs commands
  • Edits code
  • Checks results

That difference matters for real technical work.

Why It Matters

For NERSC Users

Why This Matters at NERSC

NERSC workflows often involve:

  • Slurm job scripts
  • Modules and environments
  • MPI or GPU launch details
  • Filesystem paths and data movement
  • Analysis pipelines
  • Logs and scheduler errors

These are exactly the kinds of tasks where AI can remove a lot of mechanical effort.

Good First Use Cases

  • Drafting or improving Slurm scripts
  • Explaining squeue, sacct, or job failures
  • Translating shell workflows into more structured scripts, such as Python
  • Generating plotting or post-processing scripts
  • Cleaning up notebooks
  • Writing README files for reproducible workflows

Less Suitable Use Cases

Be careful when:

  • Performance is critical
  • Security decisions matter
  • The model is guessing about MPI, GPU, or filesystem behavior
  • You are handling secrets or sensitive data

If correctness really matters, the model does not get the final vote.

How It Works

The Building Blocks

Building Blocks

Agentic AI usually combines:

  • LLM for reasoning and language
  • User interface such as CLI, IDE, or web UI
  • Tools for shell, file editing, search, or APIs
  • Knowledge from docs, code, logs, and configs
  • Orchestration for retries, planning, and safety checks
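How these pieces fit together can be sketched as a small loop. This is a toy illustration, not how any particular agent is implemented: `run_agent`, `scripted_model`, and the `count_lines` tool are hypothetical names, and the "model" is a fixed script standing in for an LLM.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: each tool takes a string argument and returns text.
Tools = Dict[str, Callable[[str], str]]


def run_agent(propose: Callable[[List[str]], Tuple[str, str]],
              tools: Tools, max_steps: int = 5) -> List[str]:
    """Orchestration loop: ask the model for an action, run the tool, record the result."""
    history: List[str] = []
    for _ in range(max_steps):          # hard step limit as a basic safety check
        tool_name, argument = propose(history)
        if tool_name == "done":         # the model signals that the goal is met
            break
        if tool_name not in tools:      # refuse tools that were never registered
            history.append(f"error: unknown tool {tool_name!r}")
            continue
        result = tools[tool_name](argument)
        history.append(f"{tool_name}({argument!r}) -> {result}")
    return history


def scripted_model(history: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM: replays a fixed plan, one action per call."""
    plan = [("count_lines", "a\nb\nc"), ("rm_everything", "/"), ("done", "")]
    return plan[len(history)]


tools: Tools = {"count_lines": lambda text: str(len(text.splitlines()))}
trace = run_agent(scripted_model, tools)
```

The second planned action names a tool that was never registered, so the loop records an error instead of running anything. That is the orchestration layer acting as a safety check.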

Why Tool Use Matters

For NERSC-style work, useful tools include:

  • Reading scripts and config files
  • Running shell commands
  • Searching code and docs
  • Parsing logs and tracebacks
  • Editing scripts and tests

Tool use is what makes the model useful beyond brainstorming.

Why CLI Agents Matter

CLI agents fit HPC work well because they can:

  • Inspect repositories directly
  • Read config files and logs
  • Run tests and linters
  • Suggest small patches
  • Fit naturally into terminal-heavy workflows

That matches how many NERSC users already work.

Beyond Basic Chat

RAG, MCP, and Skills

RAG In Practice

RAG (retrieval-augmented generation) fetches relevant external material and adds it to the model's context.

At NERSC, that might include:

  • Project documentation
  • README files
  • User notes
  • Job logs
  • Local workflow docs

That helps the model answer from your environment instead of generic assumptions.
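The retrieval step can be sketched with plain word overlap. This is a simplification: production RAG systems usually rank with embeddings, and the document names and contents below are made up for illustration.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> set:
    """Lowercase word set; a real system would likely use embeddings instead."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(query: str, docs: Dict[str, str], k: int = 2) -> List[str]:
    """Return the names of the k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda name: len(q & tokenize(docs[name])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: Dict[str, str], k: int = 2) -> str:
    """Prepend the retrieved documents to the question as context."""
    context = "\n\n".join(f"[{name}]\n{docs[name]}" for name in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical local notes; contents are invented for this sketch.
docs = {
    "README.md": "Build with make. Run tests with pytest.",
    "job_notes.txt": "The slurm job failed with an out of memory error on the gpu node.",
    "plotting.md": "Use matplotlib to plot the loss curve.",
}
top = retrieve("why did my slurm job fail with out of memory", docs, k=1)
```

Here the job notes rank first because they share the most words with the question, so the model would see them before answering.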

MCP Servers

MCP (Model Context Protocol) is a standard way to expose tools and data to LLMs.

Why it matters:

  • Reusable tools across agents
  • Structured tool access
  • Clearer safety boundaries

Think of MCP as a structured interface that lets the model call capabilities instead of only producing text.
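To make "structured tool access" concrete, here is a sketch of what a tool invocation looks like on the wire. As I understand the MCP specification, messages are JSON-RPC 2.0 and tools are invoked with the `tools/call` method; check the current spec before relying on details. The tool name `summarize_job_failures` and its `days` argument are hypothetical.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


# "summarize_job_failures" and "days" are invented names for illustration.
message = make_tool_call(1, "summarize_job_failures", {"days": 7})
```

Because the request names one tool and passes typed arguments, a server can validate or refuse it, which is harder to do with free-form text.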

Skills And Reusable Workflows

A useful agent workflow often depends on reusable skills:

  • Run a validation script
  • Summarize recent job failures
  • Check workflow status
  • Convert outputs into a report or figure

Good skills reduce ambiguity and make the model less likely to hallucinate.

Coding Assistants

What the Tooling Looks Like

Types of Coding Assistants

There are several ways to code with AI:

  • Chat web interfaces
  • Cloud-hosted agents
  • AI-augmented IDEs
  • Command line interfaces

The exact vendor matters less than the workflow you build around it.

Installing a Coding Agent

Most command line agents rely on a recent Node installation:

# install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash

# start a new shell so that the nvm command is available

# install node / npm
nvm install --lts

Then install the agent you want to use; you do not need all three:

npm install -g @anthropic-ai/claude-code
npm install -g @openai/codex
npm install -g @google/gemini-cli

Start Small

Good starter prompts:

  • “Explain this Slurm script and suggest small improvements.”
  • “Here is my traceback from Perlmutter. Rank the likely causes.”
  • “Convert this shell preprocessing workflow into Python.”
  • “Summarize what this sacct output says about the failed job.”

Start with tasks you can verify quickly.

Best Practices

Getting Better Results

Good Engineering Helps Good AI

Agents work better when your workflow is already disciplined:

  • Break large tasks into smaller ones
  • Use Git commits between working states
  • Run linters, compilers, and tests
  • Give the model checks it can run itself

If the model can inspect the result, it can often help fix its own mistakes.
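A check the model can run itself can be as simple as a script that runs commands and reports pass or fail. This is a minimal sketch: `run_checks` is a hypothetical helper, and the two commands are stand-ins chosen so the example runs anywhere.

```python
import subprocess
import sys
from typing import Dict, List


def run_checks(checks: Dict[str, List[str]]) -> Dict[str, bool]:
    """Run each named command and record whether it exited with status 0."""
    results = {}
    for name, command in checks.items():
        completed = subprocess.run(command, capture_output=True, text=True)
        results[name] = completed.returncode == 0
    return results


# Stand-in commands; in a real project these might be your linter,
# compiler, or test runner.
checks = {
    "passes": [sys.executable, "-c", "print('ok')"],
    "fails": [sys.executable, "-c", "raise SystemExit(1)"],
}
results = run_checks(checks)
```

Pointing the agent at one entry point like this gives it a clear, repeatable signal after every change.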

Context Engineering

The less irrelevant context you include, the better the model performs.

Useful habits:

  • Write AGENTS.md or similar instructions
  • Document project structure and build steps
  • Mention cluster-specific assumptions explicitly
  • Start fresh sessions when switching tasks

For HPC work, environment details matter a lot.

Prompting That Works

A good prompt usually includes:

  • The goal
  • The target system, such as Perlmutter
  • Relevant files
  • Current scripts or config
  • Logs, stderr, or outputs
  • Clear success criteria

Ask for: the smallest viable patch, explicit assumptions, ranked hypotheses, and verification steps.

Weak Prompt vs Strong Prompt

Weak

“Help me run my code on NERSC.”

Strong

“I am running a Python MPI workflow on Perlmutter. Here is my Slurm script, module list, and stderr. The job hangs after initialization. Propose the smallest script changes to improve launch reliability and explain why.”

A Practical Workflow

  1. Start with a concrete task
  2. Provide the exact script, log, or config involved
  3. Ask for the smallest useful next step
  4. Run and inspect the result
  5. Iterate based on evidence

Small loops beat giant prompts.

Bigger Projects

For larger efforts:

  1. Write a specification
  2. Ask the model to identify ambiguities
  3. Have it produce an implementation plan
  4. Use a fresh coding session to execute that plan

It is slower up front, but usually more reliable.

Multi-Agent Use

There are good reasons to use several agents:

  • Try a different model when one gets stuck
  • Use one for debugging and another for implementation
  • Split independent tasks across separate sessions
  • Pair this with Git branches or worktrees when needed

More parallelism can help, but only if you review the outputs carefully.

Subagents

Subagents are specialized helper agents inside a larger workflow.

Instead of one agent doing everything, you can delegate side tasks to smaller workers:

  • a read-only explorer
  • a debugger
  • a reviewer
  • a docs researcher

That keeps the main conversation cleaner and lets each worker focus on one job.

Why Subagents Help

Subagents are useful when a side task would otherwise pollute your main context:

  • searching a large codebase
  • reading long logs
  • mapping affected files
  • checking documentation

They also make parallel work possible.
One worker can explore, another can review, and the main agent can integrate.

Good Subagent Patterns

Good subagents are:

  • narrow in scope
  • clear about when to use them
  • limited to the tools they actually need
  • cheap and fast when possible

Common patterns:

  • read-only explorer
  • code reviewer
  • debugger
  • browser tester
  • docs or API researcher

Safety and Security

Keep Humans In Charge

Human Responsibilities Do Not Go Away

You still need to:

  • Define success clearly
  • Review important changes
  • Check correctness against real outputs
  • Protect secrets and sensitive data
  • Benchmark when performance matters

Agents increase your leverage; they do not reduce your accountability.

Security Considerations

Keep these in mind:

  • Avoid unrestricted command execution in unsafe environments
  • Do not paste credentials or private keys into external tools
  • Be careful with unpublished or sensitive research data
  • Be cautious with third-party plugins, tools, and MCP servers

Sandboxing

Sandboxing means letting the agent run commands inside enforced boundaries.

Those boundaries usually cover:

  • which files it can read or write
  • whether it can use the network
  • when it must stop and ask for approval

The point is not just trust.
The point is trust plus technical limits.

Why Sandboxing Matters

Without sandboxing, agent autonomy can turn into:

  • approval fatigue
  • accidental damage
  • easier data exfiltration
  • risk from prompt injection or malicious dependencies

With sandboxing, the agent can do routine work inside a safe boundary,
and pause only when it needs to go beyond it.

Common Sandbox Modes

Most tools end up with three practical levels:

  • Read-only
    Inspect files, but do not modify them
  • Workspace-write
    Edit inside a project boundary and run routine local commands
  • Full access
    No meaningful sandbox boundary

For most local work, the sweet spot is usually:
workspace-write + approval on request
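The idea of a project boundary can be illustrated with a path check. This sketch only shows the policy decision; it is not an enforcement mechanism, and real sandboxes enforce limits at the operating system level. `is_inside_workspace` is a hypothetical helper.

```python
from pathlib import Path


def is_inside_workspace(candidate: str, workspace: str) -> bool:
    """True if candidate resolves to a location inside workspace."""
    root = Path(workspace).resolve()
    target = (root / candidate).resolve()   # collapses ".." and follows symlinks
    return target == root or root in target.parents


allowed = is_inside_workspace("src/train.py", "/tmp/project")
escaped = is_inside_workspace("../other/secret.txt", "/tmp/project")
absolute = is_inside_workspace("/etc/passwd", "/tmp/project")
```

Resolving the path before comparing matters: a naive string prefix check would accept `../` tricks that step outside the workspace.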

Sandbox Design Principles

  • Start restrictive and widen only as needed
  • Filesystem isolation matters
  • Network isolation matters too
  • Subprocesses should inherit the same boundary
  • Approvals and sandboxing are different layers

If an agent can run code, package managers, tests, or build tools,
those subprocesses need the same rules as the parent.

Open Sandbox Options

If you want stronger or self-hosted isolation, there are open tools in this space:

  • OpenSandbox
    General sandbox infrastructure for commands, files, browsers, and dev tools
  • microsandbox
    VM-level isolation with very fast startup, designed for untrusted code
  • SmolVM
    MicroVM runtime for agent workloads with strong isolation
  • E2B
    Sandbox platform for running agent code and tools, with self-hosted and BYOC options

Common Failure Modes

Coding agents often fail by:

  • Inventing commands or flags
  • Assuming the wrong module or environment setup
  • Misreading scheduler behavior
  • Making large risky rewrites
  • Producing code that looks plausible but was never run

This is why verification matters.

Verification Is The Whole Game

Ask:

  • Does the job submit?
  • Does it run to completion?
  • Do tests pass?
  • Does the output make sense?
  • Is performance still acceptable?
  • Does the explanation match the observed behavior?

Trust should be proportional to evidence.

Practical HPC Examples

Where This Helps

Example: Drafting A Slurm Script

User intent:

“Run this Python preprocessing step on one 4-GPU node, save logs, and write outputs to scratch.”

Agent support:

  • Draft the Slurm script
  • Add environment setup
  • Add log file paths
  • Suggest a dry-run checklist
  • Point out missing assumptions

Example: A Perlmutter GPU Job

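#!/bin/bash
# Request one GPU node with 4 GPUs for 10 minutes
#SBATCH --constraint=gpu
#SBATCH --gpus=4
#SBATCH --nodes=1
#SBATCH --time=00:10:00
#SBATCH --account=<your_account>
#SBATCH --qos=regular
#SBATCH --job-name=prep
# %j in the output file name expands to the job ID
#SBATCH --output=slurm-%j.out

# Load the Python environment on the compute node
module load python
# Launch 4 tasks and write results to scratch
srun -n 4 python preprocess.py --input data.h5 --out "$SCRATCH/results"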

The agent can draft this.
You still need to verify the resources, account, modules, launch pattern, and output path.

Example: Debugging A Failed Job

Useful prompt ingredients:

  • The sbatch script
  • The exact error output
  • Relevant sacct or log excerpts
  • The software environment
  • What changed since the last successful run

Ask the agent to rank likely causes and suggest the smallest next diagnostic step.

Example: Debugging With sacct

sacct -j 12345678 --format=JobID,JobName,Partition,Account,AllocTRES,State,ExitCode,Elapsed

Good prompt:

“Here is the sacct output for my failed Perlmutter job and my batch script. Explain what the state and exit code suggest, then propose the next two debugging steps.”
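Before handing sacct output to an agent, it can help to reduce it to the steps that did not complete. The sketch below assumes pipe-delimited output such as `sacct --parsable2` prints; the sample text is made up, and `parse_sacct` and `failed_steps` are hypothetical helpers.

```python
from typing import Dict, List


def parse_sacct(text: str) -> List[Dict[str, str]]:
    """Parse pipe-delimited sacct output (header row first) into dicts."""
    lines = [line for line in text.strip().splitlines() if line]
    header = lines[0].split("|")
    return [dict(zip(header, line.split("|"))) for line in lines[1:]]


def failed_steps(rows: List[Dict[str, str]]) -> List[str]:
    """List job steps whose state is anything other than COMPLETED."""
    return [f"{r['JobID']}: {r['State']} (exit {r['ExitCode']})"
            for r in rows if r["State"] != "COMPLETED"]


# Invented sample in the shape of pipe-delimited sacct output.
sample = """JobID|JobName|State|ExitCode|Elapsed
12345678|prep|FAILED|1:0|00:00:12
12345678.batch|batch|FAILED|1:0|00:00:12
12345678.extern|extern|COMPLETED|0:0|00:00:12
"""
report = failed_steps(parse_sacct(sample))
```

A short report like this is easier for both you and the agent to reason about than a wide table.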

Example: Converting A Workflow

Many users have shell-heavy workflows.

An agent can help:

  • Convert shell pipelines into Python
  • Add argument parsing
  • Improve logging
  • Add validation checks
  • Make the workflow easier to rerun and document
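The items above can be sketched as a small Python skeleton. This is one possible shape, not a prescribed layout: the `--input` and `--out` options mirror the preprocess.py call shown elsewhere in this talk, while `validate` and the log messages are hypothetical.

```python
import argparse
import logging
from pathlib import Path
from typing import List, Optional

log = logging.getLogger("preprocess")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Argument parsing replaces positional shell variables like $1 and $2."""
    parser = argparse.ArgumentParser(description="Preprocess one input file.")
    parser.add_argument("--input", required=True, type=Path, help="input data file")
    parser.add_argument("--out", required=True, type=Path, help="output directory")
    parser.add_argument("--verbose", action="store_true", help="enable debug logging")
    return parser.parse_args(argv)


def validate(args: argparse.Namespace) -> List[str]:
    """Validation checks: collect problems before doing any work."""
    problems = []
    if not args.input.is_file():
        problems.append(f"input file not found: {args.input}")
    return problems


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
    problems = validate(args)
    for problem in problems:
        log.error(problem)
    if problems:
        return 1                        # nonzero exit so Slurm marks the step failed
    args.out.mkdir(parents=True, exist_ok=True)
    log.info("would process %s into %s", args.input, args.out)  # real work goes here
    return 0
```

In a script you would end the file with `raise SystemExit(main())` under the usual `if __name__ == "__main__":` guard, so the return value becomes the exit code.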

Example: Scaling A Training Workflow

One useful NERSC example is the
nersc-dl-multigpu vibe-coding demo:

  • start from a notebook or single-GPU workflow
  • turn it into a cleaner training script
  • add checkpoint-restart support
  • adapt it for multi-GPU or multi-node launches on Perlmutter

This is a good example of where agents help with workflow evolution,
not just code generation.

Repo:
https://github.com/sparticlesteve/nersc-dl-multigpu/tree/vibecoding-demo

What Makes That Example Useful

It shows a realistic progression:

  • interactive experimentation
  • script cleanup and modularization
  • batch submission
  • distributed launch setup
  • checkpointing and restart

That is exactly the kind of multi-step engineering work where an agent can
save time, while the user still has to verify launch details, performance,
and correctness.
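The checkpoint and restart step is worth seeing in miniature. This is a framework-agnostic sketch and is not taken from the demo repository: the state is a small JSON dict, where a real training script would save model and optimizer state with its framework's own tools.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temporary file, then rename, so a killed job leaves no partial file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)               # rename within one filesystem


def load_checkpoint(path: Path) -> dict:
    """Resume from the checkpoint if it exists, otherwise start fresh."""
    if path.exists():
        return json.loads(path.read_text())
    return {"step": 0, "total": 0}


def run(path: Path, stop_at: int) -> dict:
    """Toy 'training' loop: accumulate a sum and checkpoint after every step."""
    state = load_checkpoint(path)
    while state["step"] < stop_at:
        state["step"] += 1
        state["total"] += state["step"]
        save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = Path(workdir) / "state.json"
    run(ckpt, stop_at=3)                # first job stops at step 3, e.g. at its time limit
    final = run(ckpt, stop_at=5)        # second job resumes from step 3
```

The second call picks up at step 3 instead of starting over, which is the behavior to verify in any agent-written restart logic.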

Example: Documentation Acceleration

Agents are also useful for:

  • Turning working notes into a README
  • Writing a quickstart from command history
  • Summarizing analysis steps
  • Producing cleaner instructions for collaborators

This is often one of the highest-return use cases.

Common AI Mistakes On HPC Systems

Be especially skeptical when the model:

  • Invents sbatch or srun flags
  • Confuses login-node work with compute-node work
  • Assumes pip install is always the right move
  • Guesses the wrong module names or versions
  • Mixes up $HOME, $SCRATCH, and project storage
  • Suggests an MPI launch pattern that does not match your code
  • Assumes GPU access without the right Slurm constraints

These are common failure modes, not rare edge cases.

Slurm And Module Advice Needs Verification

Questions to check every time:

  • Is the queue or QOS valid?
  • Are node counts and GPU counts consistent?
  • Is the account correct?
  • Are the module names real on this system?
  • Does the launch command match the application model?
  • Is the filesystem path appropriate for the workload?

If the model cannot answer from real local context, it is guessing.
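Some of these questions can be partly automated. The sketch below applies a few illustrative rules to a batch script. The rules are assumptions made for this example, not NERSC policy, and the script does not validate anything against Slurm itself. `parse_sbatch` and `lint` are hypothetical helpers.

```python
import re
from typing import Dict, List


def parse_sbatch(script: str) -> Dict[str, str]:
    """Collect long-form '#SBATCH --key=value' directives into a dict."""
    return dict(re.findall(r"^#SBATCH\s+--([\w-]+)=(\S+)", script, flags=re.MULTILINE))


def lint(script: str) -> List[str]:
    """Return human-readable warnings; an empty list means no rule fired."""
    opts = parse_sbatch(script)
    warnings = []
    if "<" in opts.get("account", "<missing>"):
        warnings.append("account is missing or still a placeholder")
    if "time" not in opts:
        warnings.append("no --time limit set")
    match = re.search(r"^srun\s+.*?-n\s+(\d+)", script, flags=re.MULTILINE)
    if match and "gpus" in opts and int(match.group(1)) != int(opts["gpus"]):
        warnings.append("srun task count differs from --gpus (may be intentional)")
    return warnings


script = """#!/bin/bash
#SBATCH --constraint=gpu
#SBATCH --gpus=4
#SBATCH --nodes=1
#SBATCH --time=00:10:00
#SBATCH --account=<your_account>

srun -n 8 python preprocess.py
"""
warnings = lint(script)
```

A warning here means "look again", not "this is wrong". Whether 8 tasks on 4 GPUs is correct depends on your application.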

Prompt Pattern For NERSC Tasks

System: Perlmutter
Goal: Run a PyTorch training script on 2 GPU nodes
Files: train.py, env.sh, job.slurm
Problem: Job exits immediately after launch
Evidence: stderr, slurm output, module list, conda env
Constraint: Keep the current Python environment
Ask: Propose the smallest patch and list verification steps

This gives the model enough structure to be useful.
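If you reuse this pattern often, a small template keeps the fields consistent. This is a convenience sketch; `TaskPrompt` is a hypothetical class, and the field names simply follow the labels in the pattern above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskPrompt:
    system: str
    goal: str
    problem: str
    ask: str
    files: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Emit one 'Label: value' line per non-empty field, in a fixed order."""
        rows = [
            ("System", self.system),
            ("Goal", self.goal),
            ("Files", ", ".join(self.files)),
            ("Problem", self.problem),
            ("Evidence", ", ".join(self.evidence)),
            ("Constraint", "; ".join(self.constraints)),
            ("Ask", self.ask),
        ]
        return "\n".join(f"{label}: {value}" for label, value in rows if value)


prompt = TaskPrompt(
    system="Perlmutter",
    goal="Run a PyTorch training script on 2 GPU nodes",
    problem="Job exits immediately after launch",
    ask="Propose the smallest patch and list verification steps",
    files=["train.py", "env.sh", "job.slurm"],
).render()
```

Empty fields are skipped, which makes it obvious when you have not yet gathered evidence or stated a constraint.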

Other Useful Command Line Tools

A few tools pair especially well with agents:

  • Marp for slide generation
  • Git worktree for isolated parallel experiments
  • MarkItDown for converting documents into Markdown
  • Existing project scripts and CLI tools

Often the best workflow builds on tools you already trust.

Closing Thought

Capability + Control

Agentic AI is about capability + control:

  • Capability comes from tools, memory, and orchestration
  • Control comes from clear boundaries, permissions, and human oversight

When done well, it turns a chatbot into a practical collaborator for HPC work.

If You Only Remember Four Things

  1. Give the model real context
  2. Ask for small concrete steps
  3. Verify every meaningful result
  4. Never outsource judgment

Thank you!

Questions?