Which local models actually work with Claude Code on a 48GB MacBook Pro?


Claude Code running with devstral-128k via Ollama

I Tested 18 Local Models So You Don’t Have To

Ollama released Anthropic API compatibility in January 2026, so I tested 18 local models with Claude Code to find out which ones actually work for agentic coding tasks.

TL;DR

  1. devstral-small-2:24b is the winner - best quality, smallest footprint, zero interventions
  2. You MUST configure the context window - Ollama defaults to 4K; use 64K minimum
  3. Expect 12-24 min for tasks that take ~2 min with Opus 4.5 - but it works!

My Setup

| Spec    | Value                |
|---------|----------------------|
| Machine | MacBook Pro          |
| Chip    | Apple M4 Pro         |
| RAM     | 48 GB unified memory |
| Ollama  | v0.14.2              |

Models

Here’s everything I tested, sorted by size:

| Model                 | Size  | Release  | SWE-bench | Type              |
|-----------------------|-------|----------|-----------|-------------------|
| nemotron-3-nano:30b   | 24GB  | Dec 2025 | -         | MoE               |
| cogito:32b            | 20GB  | Jul 2025 | -         | Hybrid reasoning  |
| granite4:32b-a9b-h    | ~20GB | Oct 2025 | -         | General-purpose   |
| command-r:35b         | 19GB  | Mar 2024 | -         | RAG-optimized     |
| qwen2.5-coder:32b     | 19GB  | Nov 2024 | 9.0%      | Coding            |
| deepseek-r1:32b       | 19GB  | Jan 2025 | 41.4%     | Reasoning         |
| qwen3-coder:30b       | 18GB  | Jul 2025 | 51.6%     | Coding            |
| qwen3:30b             | 18GB  | Apr 2025 | -         | General-purpose   |
| devstral-small-2:24b  | 15GB  | Dec 2025 | 68.0%     | Agentic coding    |
| mistral-small3.2:24b  | 15GB  | Jun 2025 | -         | General-purpose   |
| magistral:24b         | 14GB  | Jun 2025 | -         | Reasoning         |
| gpt-oss:20b           | 14GB  | Aug 2025 | -         | General-purpose   |
| cogito:14b            | 9GB   | Jul 2025 | -         | Hybrid reasoning  |
| deepseek-coder-v2:16b | 8.9GB | Jun 2024 | -         | Coding (no tools) |
| rnj-1:8b              | 5.1GB | Dec 2025 | 20.8%     | General-purpose   |
| phi4-mini:3.8b        | 2.5GB | Feb 2025 | -         | General-purpose   |
| granite4:3b           | 2.1GB | Oct 2025 | -         | General-purpose   |
| functiongemma:270m    | 301MB | Dec 2025 | -         | Function calling  |
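
If you want to reproduce the shortlist, the three models that ended up completing the task can be pulled in one go (tags as written above - verify them against the Ollama registry, since exact tags may differ):

```bash
# Pull the models that completed the /init task in my tests.
for m in devstral-small-2:24b qwen3-coder:30b granite4:32b-a9b-h; do
  ollama pull "$m"
done
```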

Experiments

I chose a very simple task: run /init on a repo (jupyterlab-latex) to generate CLAUDE.md, which is normally the first thing I do in a new repo. It’s deceptively hard though - the model has to discover tools, explore multiple files, and synthesize documentation without hallucinating. One or two runs per model; treat results as field notes.

My first two models (nemotron, gpt-oss) used Ollama’s default context window - which is how I discovered the 4K limit issue. After that, I set context to 64K+ in Ollama’s settings.
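
For reference, here's what raising the limit looks like outside the GUI. This is a sketch assuming a reasonably recent Ollama build - the OLLAMA_CONTEXT_LENGTH environment variable and the Modelfile num_ctx parameter are standard Ollama knobs, but check your version's docs before relying on them:

```bash
# Option 1: raise the server-wide default context length
# (read by the Ollama server at startup).
export OLLAMA_CONTEXT_LENGTH=65536

# Option 2: bake a larger context window into a named model variant.
cat > Modelfile <<'EOF'
FROM devstral-small-2:24b
PARAMETER num_ctx 65536
EOF
ollama create devstral-small-2-64k -f Modelfile
```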

nemotron-3-nano:30b

My first attempt revealed a critical failure mode. With the default context window, the model’s thinking block explicitly shows it decided to skip reading files entirely:

“We don’t have details of repo… There haven’t been any reads yet… Let’s assume typical repo structure”

Instead of using tools to explore, it fabricated an entire codebase structure. The output described a React/Node.js monorepo with /frontend and /backend directories - neither of which exist in jupyterlab-latex (a Python/TypeScript JupyterLab extension). It invented commands like npm run dev and referenced non-existent config files.

This failure led me to discover Ollama’s default 4K context limit. After configuring a 128K context window, subsequent attempts worked much better:

Read → Glob → Read → Read → Read → Read → Glob → Read → Write

The model properly explored the codebase, but still stopped mid-task and required a follow-up prompt (“Continue”) to finish. Final output was accurate and high quality - proving the model can work, but context configuration is critical.

gpt-oss:20b

Also tested early with the default context window. Fast but unreliable:

  • Direct prompt: Finished quickly but low quality output
  • /init skill: Tool parameter errors, empty results, needed intervention
Sautéed for 2m 37s (Claude Code's task timer)

devstral-small-2:24b ⭐ Winner

With 128K context configured from the start, this was a perfect run. The model immediately understood the task:

“I’ll analyze this codebase and create a CLAUDE.md file with the essential information for future instances.”

Tool call sequence shows direct, confident tool usage:

Bash → Bash → Bash → Read → Bash → Bash → Bash → Read → Read → Read → Bash → Write

No confusion about subagents or tool parameters - it went straight for Bash and Read to explore the codebase, then used Write to create the output.

The output was 180 lines of documentation with actual function names, Python config examples, and a 5-step communication flow diagram. Every file reference checked out - no hallucinations.

Why did devstral outperform? Mistral trained it specifically for agentic coding and tool use (it posts 68.0% on SWE-bench), and it shows in the tool calls - direct and confident, no subagent confusion.

Sautéed for 17m 12s

qwen3-coder:30b

Also configured with 128K context. The model’s first instinct was to delegate to a subagent. From the session trace, it tried to spawn an Explore agent twice:

{
  "description": "Explore codebase structure",
  "prompt": "Explore the structure of this JupyterLab LaTeX extension repository...",
  "subagent_type": "Explore"
}

This isn't an Ollama bug, but a mismatch between what Claude Code can do in a given environment and what the model decides to attempt. Claude Code has a notion of subagents (like an "Explore" helper), but in my setup those weren't available or configured, so the tool call failed. Ollama's docs do advertise Claude Code support, though, so it's worth saying explicitly: with third-party models, expect occasional tooling mismatches like this even when the transport API is compatible.
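
As an aside: if you want Task calls like this to have somewhere to go, Claude Code supports project-level subagent definitions. The sketch below is my assumption of what that looks like (a markdown file with YAML frontmatter under .claude/agents/) - treat the location and field names as unverified and check your Claude Code version's docs:

```bash
# Hypothetical project-level "Explore" subagent, so that Task calls with
# subagent_type "Explore" can dispatch to something. Directory layout and
# frontmatter fields are assumptions, not verified against this setup.
mkdir -p .claude/agents
cat > .claude/agents/explore.md <<'EOF'
---
name: Explore
description: Read-only codebase exploration helper
tools: Read, Glob, Grep, Bash
---
Explore the repository structure and report the key files, build
commands, and architecture. Do not modify anything.
EOF
```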

When the Task tool failed (subagents weren’t configured), qwen3-coder adapted gracefully. Tool sequence shows the recovery:

Task → Task → Bash → Read → Read → Read → Read → Read → Read → Read → Read → Write

After two failed Explore attempts, it switched to direct Bash and Read tools and completed the task without further intervention. Output quality was good - accurate, no hallucinations, but less detailed than devstral (86 lines vs 180).

Sautéed for 23m 48s

granite4:32b-a9b-h

An interesting comparison point - this is IBM’s general-purpose 32B model, not a coding specialist. With 128K context configured, it completed the task in under 7 minutes - the fastest successful run.

The trade-off: minimal exploration. Tool sequence:

Read → Write

Just two tool calls - read the README, write CLAUDE.md. No codebase exploration, no package.json check, no architecture analysis. The output was decent:

  • ✅ Correct project type (JupyterLab LaTeX extension)
  • ✅ Correct commands (jlpm run build, jlpm run watch)
  • ✅ Mermaid architecture diagram
  • ⚠️ Some hallucinated details (referenced src/components/Toolbar.tsx without verifying it exists)

At 32K context, it stalled - started correctly (Glob → Read), but got stuck after reading files and never produced output. A different failure mode than devstral’s 32K hallucination.

Verdict: Works, but lazy. General-purpose models can complete agentic tasks but tend to “wing it” with minimal tool use, while coding specialists explore more thoroughly.

Sautéed for ~7m

qwen3:30b

The general-purpose Qwen3 (not the coder variant). This was the worst performer - pure hallucination with zero exploration.

Tool sequence:

Write

Just one tool call. The thinking block is revealing - it explicitly acknowledged it couldn’t see files but proceeded anyway:

“Since I can’t actually see the files, I’ll have to rely on the context provided.”

It inferred file structure from git status in the system prompt, then fabricated everything:

  • python jupyterlab_latex/build.py - wrong command (should be jlpm run build)
  • latex_cleanup.py - fabricated filename
  • flake8 - assumed linter without checking

At 128K context, it consumed 31GB RAM (vs 18GB on disk) - pushing my 48GB system into swap. The memory pressure may have contributed to its laziness, but the thinking block shows it consciously chose to guess rather than explore.
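
If you're running this close to the ceiling of a 48GB machine, it's worth keeping an eye on memory while the model works. Two stock macOS commands cover it (output formats vary slightly across macOS versions):

```bash
# Current memory pressure stats, including the system-wide free percentage.
memory_pressure
# Total / used / free swap - non-zero "used" means you're already paging.
sysctl vm.swapusage
```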

Key finding: The coder fine-tuning isn’t just about coding knowledge - it teaches the model to actually use tools instead of guessing. qwen3-coder explored properly; qwen3 base hallucinated everything.

Sautéed for ~5m

qwen2.5-coder:32b

Failed. Despite having 128K context configured, through multiple attempts it kept reaching for the Explore subagent tool and then abruptly stopping without completing any work. Unlike qwen3-coder which recovered when Explore failed, qwen2.5-coder couldn’t adapt. Same model family, different generation, completely different behavior when things go wrong.

mistral-small3.2:24b

Failed - hallucinated tool parameters. This model understands it should use tools but invents wrong parameter schemas. From the session trace, it tried to call the Task tool with made-up parameters:

// Attempt 1:
{"instruction": "...", "max_depth": 100}

// Attempt 2:
{"subagent_name": "Explore", "subagent_type": "Explore", "subagent_prompt": "..."}

The actual required parameters are description and prompt. When it received clear error messages explaining this, it simply repeated “I’m going to use the Task tool…” and stopped - unable to self-correct.

This is a different failure mode than hallucinating content (qwen3) or refusing (functiongemma). The model has learned about tools but not the actual invocation format. Worth noting: devstral-small-2 is also a Mistral model and works perfectly - the difference is devstral’s agentic specialization.

Memory: 37GB loaded at 128K context (vs 15GB on disk).
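
A quick way to get numbers like these yourself is `ollama ps`, which lists each loaded model with its live in-memory size and CPU/GPU split - usually far larger than the on-disk size once the KV cache for a big context window is allocated:

```bash
# Show currently loaded models, their in-memory footprint, and how the
# layers are split between CPU and GPU; re-run it while Claude Code works.
ollama ps
```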

magistral:24b

Failed - narrated tools instead of invoking them. This new Mistral reasoning model understood the task and knew which tools to use, but wrote out tool calls as text instead of actually executing them:

"Let me use the Glob tool to find these patterns:

```bash
Glob pattern: **/README.md
Glob pattern: .github/readme*
...
```

Now that I have the relevant files, let's analyze..."

Zero actual tool calls were made. The model described what it would do, assumed the tools had run, and proceeded to the next step. This suggests training on tool documentation without actual tool-use interactions.

Memory: 23GB loaded at 128K context (vs 14GB on disk).

Native context limitation: magistral’s native context is only 39K. Even with Ollama allocating 128K, the model may not effectively use context beyond its training limit - which could explain why it never received the tool invocation format.
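
You can check a model's native context length before allocating a larger window than it was trained for - `ollama show` reports it in the model details (the grep is just a convenience):

```bash
# The "context length" line is the native training context,
# independent of whatever num_ctx you allocate at load time.
ollama show magistral:24b | grep -i "context length"
```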

cogito:32b

Failed - memory issues and context-limited stall. This hybrid reasoning model has different failure modes depending on context configuration:

At 128K context: Loaded 64GB into memory (41% CPU / 59% GPU split). On my 48GB system, this caused severe memory thrashing - spiky memory pressure, swap usage, and zero tokens produced after 5+ minutes.

At 64K context: Loaded 42GB (8% CPU / 92% GPU). Still tight but runnable. Same stalling behavior.

At 32K context: Loaded 30GB (100% GPU). Actually started working! Made correct Glob and Read calls, explored the codebase properly:

Glob → Read README.md → "Let me create a todo list..."

But then it just… stopped. It said “Let me start with writing the overview section first” and ended without writing anything. Even nudging it with a “continue” prompt didn’t help - it was completely stuck.

This is the same pattern as granite4:32b at 32K context: can explore but can’t complete. 32K context is insufficient for task completion - the model loses track of the goal mid-execution.

cogito:14b

Failed - multiple tool issues. Testing the smaller cogito variant to see if the 7-15B range had any surprises. It did, but not good ones.

Memory: Even at 9GB on disk, loaded 45GB at 128K context with 15% CPU offload. At 64K context it was more manageable.

Tool sequence shows multiple failure modes:

Read README.md ✅ → Read copilot-instructions.md ✅ (not found) →
WebSearch ❌ (hallucinated) → TodoWrite ❌ (wrong params, twice) →
Printed CLAUDE.md as text ⚠️

  1. Hallucinated WebSearch - tool doesn’t exist in Claude Code, got empty results
  2. Wrong TodoWrite params - missing required activeForm field, tried twice without learning
  3. Never used Write tool - just printed the CLAUDE.md content as markdown text instead of writing to file

The generated content was actually reasonable - correct commands, accurate architecture. But the model “completed” the task by printing output rather than writing the file. It understood the goal but couldn’t execute properly.

Time: ~7.7 minutes

The cogito family (both 32b and 14b) consistently fails with Claude Code’s tool schemas - different sizes, different failure modes, same outcome.

command-r:35b

Failed - nested tool parameter schema. The last untested model in the viable 15-35B range. At 128K context it didn’t fit on my GPU. At 64K and 32K it loaded but failed with the same tool schema issue.

From the trace, the model wrapped all tool parameters in a nested structure:

{
  "tool_name": "Task",
  "parameters": {
    "description": "...",
    "prompt": "...",
    "subagent_type": "general-purpose"
  }
}

The correct format is flat parameters at the top level. It made 4 tool calls (3 Task, 1 TodoWrite) - all failed with validation errors like “required parameter description is missing” because the nesting caused parameters to be undefined at the expected level.

Unlike mistral-small3.2 which invented wrong parameter names, command-r uses the correct parameter names but wraps them incorrectly. When it received validation errors, it didn’t retry - just output a text-based “Action Plan” and stopped.

This suggests Cohere’s tool-calling format differs from the Anthropic API schema. The model was trained on a different tool invocation structure.

Context comparison:

  • 32K: 4 tool calls, all failed, gave up quickly (~7 min)
  • 64K: 29 tool calls, all failed, kept retrying same broken schema (~9.5 min)

More context didn’t help - it just gave the model more runway to keep failing the same way. It never learned from the error messages.

Results

✅ Worked

| Model            | Quality   | Time   | Notes                                |
|------------------|-----------|--------|--------------------------------------|
| devstral-small-2 | Excellent | 17 min | No hallucinations, no interventions  |
| qwen3-coder      | Good      | 24 min | Recovered after Explore failed       |
| granite4:32b     | Good      | ~7 min | Fast but lazy, minor hallucinations* |

⚠️ Completed With Issues

| Model           | Quality | Time   | Issue                                  |
|-----------------|---------|--------|----------------------------------------|
| gpt-oss:20b     | Low     | ~3 min | Needed intervention                    |
| nemotron-3-nano | Mixed   | -      | Hallucinated on first attempt          |
| qwen3:30b       | Poor    | ~5 min | No exploration, fabricated everything  |

❌ Failed

| Model                 | Time     | Failure Mode                       |
|-----------------------|----------|------------------------------------|
| qwen2.5-coder:32b     | -        | Stuck on Explore subagent          |
| mistral-small3.2:24b  | -        | Wrong tool parameter schema        |
| magistral:24b         | -        | Narrated tools instead of invoking |
| cogito:32b            | -        | Memory thrashing, context stall    |
| cogito:14b            | ~8 min   | Hallucinated WebSearch tool        |
| command-r:35b         | 7-10 min | Nested tool parameters             |
| deepseek-r1:32b       | -        | No tool support in Ollama          |
| deepseek-coder-v2:16b | -        | No tool support in Ollama          |
| functiongemma:270m    | -        | Refuses everything                 |
| granite4:3b           | -        | Hallucinates without tools         |
| phi4-mini:3.8b        | -        | Invents fake tool names            |
| rnj-1:8b              | -        | Silent, zero output                |

*granite4:32b referenced files it never verified existed. It “works” in the sense that it completes the task and produces usable output, but you’d want to review it before trusting it. devstral and qwen3-coder are trustworthy out of the box.
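
A cheap way to do that review: pull anything path-like out of the generated file and check it against the repo. A rough sketch - the regex is a heuristic, not part of any tool, so tune it for your project:

```bash
# Extract likely file paths from CLAUDE.md and flag any that don't exist.
# Run from the repo root; heuristic only.
grep -oE '[A-Za-z0-9_./-]+\.(ts|tsx|py|json|toml|md)' CLAUDE.md | sort -u |
while read -r path; do
  [ -e "$path" ] || echo "referenced but missing: $path"
done
```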

Winner: devstral-small-2 - best quality, smallest footprint, zero interventions.

Model Outputs

Below are the actual CLAUDE.md files generated by each of the models that completed the task, so you can compare them directly:

devstral-small-2

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is jupyterlab-latex, a JupyterLab extension for live-editing of LaTeX documents. It provides:

  • Real-time LaTeX compilation and PDF preview
  • SyncTeX support for editor-PDF synchronization
  • Writing tools (text formatting, lists, tables, plots)
  • Mathematical constants and symbols insertion

The extension consists of:

  1. Frontend (TypeScript): JupyterLab UI components and plugins
  2. Backend (Python): Server extension for LaTeX compilation

Development Setup

Prerequisites

  • Node.js (for building TypeScript)
  • Python 3.8+
  • JupyterLab 4.0+
  • LaTeX distribution (e.g., TeX Live, MiKTeX) with pdflatex, xelatex, or tectonic

Installation

# Development install
pip install -e .

# Link with JupyterLab
jupyter labextension develop . --overwrite
jupyter server extension enable jupyterlab_latex

# Build TypeScript
jlpm run build

Building

  • jlpm run build: Build TypeScript and labextension
  • jlpm run build:prod: Production build (clean + minified)
  • jlpm run watch: Watch mode for development
  • jupyter labextension build .: Build labextension

Linting

  • jlpm run lint: Run ESLint, Prettier, and Stylelint
  • jlpm run eslint: Run ESLint only
  • jlpm run prettier: Run Prettier formatting
  • jlpm run stylelint: Run Stylelint

Cleaning

  • jlpm run clean: Clean build artifacts
  • jlpm run clean:all: Clean everything (lib, labextension, lint cache)

Architecture

Frontend Structure (TypeScript)

Location: src/

  • index.ts: Main plugin entry point

    • Defines JupyterFrontEndPlugin for LaTeX extension
    • Sets up commands, toolbar buttons, and context menu items
    • Manages PDF preview and SyncTeX synchronization
  • pdf.ts: PDF viewer components

    • PDFJSViewer: PDF rendering widget
    • PDFJSViewerFactory: Document registry factory
    • PDFJSDocumentWidget: Widget wrapper
  • error.tsx: Error panel component for LaTeX compilation errors

  • pagenumber.tsx: Page number display component

Backend Structure (Python)

Location: jupyterlab_latex/

  • __init__.py: Server extension entry point

    • Registers /latex/build and /latex/synctex handlers
    • Sets up Tornado web handlers
  • build.py: LaTeX compilation handler

    • LatexBuildHandler: Main API handler for compilation
    • latex_cleanup(): Context manager for file cleanup
    • build_tex_cmd_sequence(): Builds LaTeX command sequences
    • filter_output(): Filters LaTeX warnings/errors
    • run_latex(): Executes LaTeX commands
  • synctex.py: SyncTeX handler for editor-PDF synchronization

  • config.py: Configuration schema

    • LatexConfig: Configuration class with settings for:
      • latex_command: LaTeX engine (xelatex, pdflatex, tectonic)
      • bib_command: BibTeX command
      • run_times: Number of compilation passes
      • shell_escape: Security setting (restricted/allow/disallow)
      • cleanup: Whether to clean intermediate files
      • manual_cmd_args: Custom command arguments
      • disable_bibtex: Skip BibTeX compilation
  • util.py: Utility functions

    • run_command(): Execute shell commands with async support

Key Features

  1. Live Preview: Opens PDF alongside .tex editor, auto-updates on save
  2. SyncTeX: Click in PDF to jump to source, click in source to jump to PDF
  3. Toolbar: Adds buttons for formatting (bold, italic), sub/superscripts, lists, tables, plots
  4. Menu: Constants (π, φ) and symbols (≤, ≥, ∈) insertion
  5. Error Handling: Shows compilation errors in a dedicated panel

Communication Flow

  1. User saves .tex file → fileChanged signal emitted
  2. Frontend calls /latex/build API endpoint
  3. Backend runs LaTeX compilation
  4. PDF is generated/updated → displayed in PDF viewer
  5. SyncTeX data generated for navigation

Configuration

Users can configure the extension via jupyter_notebook_config.py:

# Use pdflatex instead of xelatex
c.LatexConfig.latex_command = 'pdflatex'

# Custom compilation command
c.LatexConfig.manual_cmd_args = [
    'lualatex',
    '-interaction=nonstopmode',
    '-shell-escape',
    '{filename}.tex'
]

# Disable BibTeX compilation
c.LatexConfig.disable_bibtex = True

# Clean intermediate files
c.LatexConfig.cleanup = True

Testing Approach

The project uses:

  • TypeScript with ESLint for frontend
  • Python for backend
  • No dedicated test suite found in repository

Important Files

  • package.json: Frontend dependencies and scripts
  • pyproject.toml: Python package configuration
  • schema/plugin.json: JupyterLab settings schema
  • README.md: User documentation
  • CHANGELOG.md: Release notes

Common Development Tasks

  1. Adding a new toolbar button:

    • Create button in EditorToolbarPanel class in src/index.ts
    • Add icon SVG in style/icons/
    • Register command with app.commands.addCommand()
  2. Adding a new menu item:

    • Add to addLatexMenu() function
    • Create command handler
  3. Modifying LaTeX compilation:

    • Update build_tex_cmd_sequence() in jupyterlab_latex/build.py
    • Ensure SyncTeX flags are preserved
  4. Adding new symbols/constants:

    • Extend the constants or symbols maps in addLatexMenu()

Notes

  • The extension uses PDF.js for rendering PDFs in the browser
  • SyncTeX requires .synctex.gz files generated by LaTeX
  • File cleanup is optional and controlled by cleanup config
  • Manual command arguments support {filename} placeholder
qwen3-coder

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

This is the JupyterLab LaTeX extension, which provides live editing of LaTeX documents within JupyterLab. It allows users to compile and preview LaTeX documents in real-time, with features like SyncTeX support for bidirectional navigation between source and preview.

Key Architecture Components

Frontend (TypeScript/JavaScript)

  • The extension is built using JupyterLab's extension system
  • Core frontend functionality is in src/index.ts
  • Uses JupyterLab's application framework, widget system, and services
  • Implements a PDF viewer using pdfjs-dist for displaying compiled LaTeX output
  • Provides toolbar buttons and context menu items for LaTeX editing tools
  • Implements SyncTeX functionality for bidirectional navigation between editor and PDF

Backend (Python)

  • Server extension in jupyterlab_latex/ directory
  • Handles LaTeX compilation through the Jupyter Server API
  • Uses tornado.process.Subprocess for executing LaTeX commands
  • Provides API endpoints at /latex/build and /latex/synctex
  • Configuration is handled through jupyterlab_latex/config.py using traitlets

Build System

  • Uses TypeScript for frontend with tsc compiler
  • Uses JupyterLab's build system with @jupyterlab/builder
  • Uses yarn for package management
  • Builds both frontend and backend extensions

Development Setup

To develop this extension, you need:

  1. NodeJS for building the frontend
  2. Python 3.8+ for the server extension
  3. JupyterLab 4.0+ for running the development environment

Development commands:

  • jlpm install - Install dependencies
  • jlpm build - Build the extension
  • jlpm watch - Watch for changes and rebuild automatically
  • jupyter labextension develop . --overwrite - Link development version
  • jupyter server extension enable jupyterlab_latex - Enable server extension

Key Files and Directories

  • src/index.ts - Main extension entry point
  • jupyterlab_latex/__init__.py - Extension initialization
  • jupyterlab_latex/build.py - LaTeX compilation handler
  • jupyterlab_latex/config.py - Configuration handling
  • jupyterlab_latex/util.py - Utility functions for running commands
  • package.json - Extension metadata and build scripts
  • pyproject.toml - Python package configuration
  • README.md - Documentation and usage instructions

Key Features

  1. Real-time LaTeX compilation and preview
  2. SyncTeX support for bidirectional navigation between source and PDF
  3. Toolbar buttons for common LaTeX editing tasks (subscript, superscript, lists, etc.)
  4. Context menu integration for preview and SyncTeX actions
  5. Support for multiple LaTeX engines (xelatex, pdflatex, tectonic)
  6. Configuration options for customizing compilation commands
  7. BibTeX support for bibliography compilation
  8. Error handling and display of compilation errors

Testing

The extension can be tested by:

  1. Installing in development mode
  2. Opening a .tex file in JupyterLab
  3. Using the preview toolbar button or context menu
  4. Verifying that LaTeX compilation works and PDF is generated
  5. Testing SyncTeX functionality by clicking in PDF and editor

Common Development Tasks

  1. Adding new toolbar buttons or menu items
  2. Modifying LaTeX compilation commands or options
  3. Enhancing error handling or display
  4. Adding new LaTeX editing features
  5. Improving SyncTeX integration
  6. Customizing the extension's appearance or behavior
granite4-32b

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Common Development Commands

  • Build: jlpm run build - Compile the TypeScript source and generate JS bundles.
  • Watch Build: jlpm run watch - Continuously rebuild on file changes while you develop.
  • Run JupyterLab (with extension enabled): jupyter lab - Launch a local JupyterLab instance that loads this extension. The server extension must be installed with pip install -e . or conda install -c conda-forge jupyterlab-latex beforehand.
  • Lint: No dedicated lint step; TypeScript compilation (jlpm run build) will surface syntax errors. Ensure you have ESLint/TSLint configured in your editor if desired.
  • Run Tests (if any): The repository does not expose a test runner command. If unit tests are added later, they would typically be executed via pytest or the JupyterLab testing framework (jlpm run test).
  • Format Code: Use Prettier/TypeScript formatter configured in the project; simply save files after opening them to auto‑format.

High‑Level Architecture Overview

flowchart TD
    subgraph Frontend (LabExtension)
        A[LaTeX UI Components] -->|Provides toolbar, dialogs, preview panel|
        B[LitElement / React components] --> C[Preview iframe]
    end
    subgraph Backend (Server Extension)
        D[Python entrypoint: jupyterlab_latex]
        E[LatexConfig] -->|Customizes LaTeX command, shell escape|
        F[Bibtex Helper] -->|Runs bibtex if .bib files exist|
        G[Compile Runner] -->|Executes latex_command with arguments|
    end
    A -->|Sends compile request to| D
    D --> E
    D --> F
    D --> G
  • LabExtension (frontend): Provides the UI for LaTeX preview, toolbar buttons (subscript/superscript/bold/etc.), table creation dialog, and plot insertion. It registers a command latex:showPreview that triggers compilation.
  • Server Extension (backend): Implements the core logic:
    • LatexConfig holds configuration values such as latex_command, run_times, disable_bibtex, etc., which can be overridden via JupyterLab's config system.
    • When a compile request arrives, it builds an argument list (default: [latex_command, '-interaction=nonstopmode', '-halt-on-error', ... , '{filename}.tex']).
    • It runs the LaTeX command in a subprocess to produce *.pdf. If .bib files are present and bibtex is enabled, it runs bibtex (or custom command) before recompiling.
  • Configuration: Customization via Jupyter config (jupyter_notebook_config.py):
    c.LatexConfig.latex_command = 'pdflatex'      # or 'xelatex', 'lualatex'
    c.LatexConfig.run_times   = 2                # multi‑pass for refs
    c.LatexConfig.disable_bibtex = False         # enable bibtex by default
    
  • Security: The extension respects LaTeX's shell‑escape policy (c.LatexConfig.shell_escape). By default it is restricted; you can set to allow if needed.
  • Integration Points:
    • Toolbar: Buttons call UI actions that emit events handled by the backend via Jupyter messages.
    • Commands: Registered with Lab's command palette (latex:showPreview, etc.).
    • Mime Renderers: The preview panel renders PDF output using an iframe or PDF.js viewer embedded in the frontend.

Tips for Development

  1. Start a development server: conda env create -f environment.yml && conda activate jupyterlab-latex-env (or use pip/conda as described). Then run jlpm install followed by jlpm run watch and open another terminal with jupyter lab --watch. Any change in TypeScript will trigger a rebuild automatically.
  2. Testing: Currently no test suite is shipped; add unit tests under tests/ using pytest if needed, e.g., pytest -q to run them.
  3. Debugging LaTeX Errors: When compilation fails, the server writes logs to <notebook_dir>/latex.log. The frontend shows these in an error panel for user visibility.
  4. Adding New Features:
    • For additional toolbar icons or plot types, extend src/components/Toolbar.tsx and implement corresponding backend handlers that generate LaTeX code snippets.
    • To support a new compilation engine (e.g., tectonic), modify LatexConfig.manual_cmd_args via config or expose an option UI in the frontend.
  5. Version Compatibility: Ensure you are using JupyterLab ≥4.x and NodeJS 18+ to avoid breaking changes introduced after this extension's last release.

Resources & References (from README)

  • BibTeX handling – automatic if .bib files exist, configurable via disable_bibtex flag.
  • Multiple compile passes – set run_times = 2 for resolving references like \ref{}.
  • Custom compilation command – use manual_cmd_args to override default LaTeX invocation.
  • Security settings – control shell escape behavior with shell_escape.

This CLAUDE.md summarizes essential commands and architectural knowledge needed to be productive when working on the jupyterlab-latex repository.

nemotron-30b

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Common Development Commands

  • Build the extension:

     jlpm build:lib && jlpm build:labextension
    

    For production build:

     jlpm clean && jlpm build:prod
    
  • Linting and formatting:

    jlpm run lint:check        # Run eslint and stylelint checks
    jlpm prettier              # Run prettier to format files
    jlpm fix                   # Auto-fix linting issues
    
  • Watch mode for development:

    jlpm watch                 # Watch src and rebuild on changes
    
  • Install the extension in development mode (after cloning):

    pip install -e .
    jupyter labextension develop . --overwrite
    jupyter server extension enable jupyterlab_latex
    

Architecture Overview

The LaTeX extension consists of two primary components:

  1. LaTeX Front‑end Plugin (latexPlugin) – registers commands, toolbar buttons, and context‑menu items for creating new .tex files, opening live previews, and handling SyncTeX mapping between editor and PDF.

  2. PDFJS Renderer (pdfjsPlugin) – provides a viewer widget for rendering PDF files using PDF.js, tracks PDF widgets, and integrates with the notebook file browser.

Key concepts:

  • Uses JupyterFrontEnd's plugin system to add commands like latex:open-preview, latex:synctex-edit, and latex:synctex-view.
  • Leverages WidgetTracker and IPDFJSTracker for managing PDF widget lifecycle.
  • SyncTeX integration enables forward (editor → PDF) and reverse (PDF → editor) navigation via synctexEditRequest and synctexViewRequest functions.
  • Toolbar extensions add formatting shortcuts (subscript, superscript, fraction, alignment, list/tree generation, plot insertion) that operate on the current editor selection.

Configuration can be customized through jupyter_notebook_config.py, e.g., changing the LaTeX compilation command or disabling SyncTeX.

Testing & Verification

  • The project uses a CI pipeline defined in .github/workflows/build.yml. Steps include:
    • Installing dependencies (python -m pip install .[test])
    • Running lint checks (jlpm run lint:check)
    • Building the extension and verifying server/labextension listings.
    • Running python -m jupyterlab.browser_check for browser compatibility validation.
  • There is no dedicated unit‑test command; verification relies on manual preview testing in JupyterLab and automated CI checks.

Customization Points

  • Compilation Command – modify via c.LatexConfig.manual_cmd_args or c.LatexConfig.synctex_command.
  • Shell Escape Settings – control with c.LatexConfig.shell_escape.

These sections give a concise map of typical development tasks and the overall extension architecture for anyone (including future Claude instances) who needs to work with this repository.

Failure Modes

Testing revealed distinct ways models fail at agentic tasks:

| Failure Mode         | Example                  | Probable Cause                                       |
|----------------------|--------------------------|------------------------------------------------------|
| Refuses              | functiongemma            | Too conservative, confused by system prompts         |
| Hallucinates content | qwen3:30b, granite4:3b   | Skips tools, fabricates output                       |
| Hallucinates tools   | phi4-mini                | Invents non-existent tool names                      |
| Hallucinates params  | mistral-small3.2         | Knows tools exist, wrong schema                      |
| Narrates tools       | magistral                | Describes tools in text, never invokes               |
| Stuck on subagent    | qwen2.5-coder            | Can't adapt when Explore fails                       |
| Context stall        | cogito:32b, granite4@32K | Explores correctly, stops mid-task                   |
| Nested params        | command-r                | Wraps params in {"tool_name": X, "parameters": {…}}  |
| Silent               | rnj-1:8b                 | Zero output, can't process system prompts            |

The more sophisticated failures (wrong params, narration, nested params) suggest models trained on different tool-calling formats or documentation rather than actual Anthropic API interactions. Native context window also matters - magistral (39K native) failed even with 128K allocated.

How Local Models Compare to Cloud

SWE-bench Verified is what everyone uses to evaluate agentic coding - 500 real GitHub issues that models must solve. Here’s how local models compare to cloud:

Frontier Cloud Models (Proprietary)

| Model             | SWE-bench |
|-------------------|-----------|
| Gemini 3 Flash    | 75-76%    |
| Claude Opus 4.5   | 74-81%    |
| GPT-5.2           | 72-75%    |
| Claude Sonnet 4.5 | 70.6%     |
| Claude Haiku 4.5  | 68.8%     |

Large Open Weights (Won’t fit 48GB)

| Model            | SWE-bench | Size |
|------------------|-----------|------|
| Devstral 2       | 72.2%     | 123B |
| Qwen3-Coder-480B | 67%       | 480B |
| DeepSeek-V3.1    | 66%       | 671B |

Local Models (Fits 48GB)

| Model             | SWE-bench | Result      |
|-------------------|-----------|-------------|
| devstral-small-2  | 68.0%     | ⭐ Winner   |
| qwen3-coder:30b   | 51.6%     | ✅ Good     |
| deepseek-r1:32b   | 41.4%     | ❌ No tools |
| qwen2.5-coder:32b | 9.0%      | ❌ Stuck    |

The gap is surprisingly small. devstral-small-2 at 68% matches Claude Haiku 4.5 and trails Opus by only 6-8 points. A 24B model running locally keeps up with 100B+ models - turns out agentic training matters more than size.

SWE-bench score also predicts Claude Code success: models without published scores aren’t coding-focused and failed my tests.

Conclusions

Local models can do real agentic work now. devstral-small-2 completed the task reliably, with no hand-holding. It’s slower than cloud (17 min vs 2 min), but it runs on my laptop completely offline.

Key Takeaways

  1. devstral-small-2 wins - best results, smallest footprint, built for this
  2. The gap is smaller than I expected - 68% SWE-bench matches Haiku, trails Opus by 8 points
  3. Context window matters - Ollama defaults to 4K; bump it to 64K or watch models hallucinate
  4. SWE-bench predicts success - no published score usually means it won’t work
  5. Speed hurts - 17-24 minutes vs 2 minutes on cloud
  6. Check tool support first - not all models work with Ollama’s Anthropic API
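
On that last point, recent Ollama builds list a model's capabilities in `ollama show`, so you can tell before wiring a model into Claude Code whether it supports tool calling at all (exact output varies by version):

```bash
# Look for "tools" under Capabilities; models without it (like the
# deepseek builds above) can't drive Claude Code's tool loop.
ollama show deepseek-r1:32b
```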

What Works

devstral-small-2 and qwen3-coder both work reliably. The tool calling infrastructure is solid when the model supports it. Ollama 0.14.0 makes setup easy - no more LiteLLM translation layer.

What Doesn’t Work (Yet)

Most models can’t finish multi-step agentic tasks without help. Context overflow causes hallucinations (fabricated URLs, wrong repo names). And 8-12x slower than cloud is hard to ignore.

Critical: Set Context to 64K+

Ollama defaults to 4K context regardless of what model cards advertise. Claude Code’s system prompts overflow this, causing silent failures or hallucinations.

Ollama settings showing context length slider

| Context | Result                            |
|---------|-----------------------------------|
| 4-16K   | ❌ Zero tool calls                |
| 32K     | ⚠️ Starts fine, then hallucinates |
| 64K+    | ✅ Works                          |

Quick Start

# 1. Install Ollama 0.14.0+ and pull devstral
ollama pull devstral-small-2

# 2. Set context to 64K in Ollama settings (GUI slider)

# 3. Add alias to ~/.zshrc
alias claude-local='ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_API_KEY=ollama CLAUDE_CODE_USE_BEDROCK=0 claude --model devstral-small-2'

# 4. Run it
source ~/.zshrc
claude-local