<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Blogs</title>
    <link>https://taletskiy.com/blogs/</link>
    <description>All blog posts from Konstantin Taletskiy.</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <atom:link href="https://taletskiy.com/blogs/index.xml" rel="self" type="application/rss+xml"></atom:link>
    <item>
      <title>700 JupyterLab 4 Extensions!</title>
      <link>https://taletskiy.com/blogs/700-jupyterlab-extensions/</link>
      <pubDate>Fri, 13 Mar 2026 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/700-jupyterlab-extensions/</guid>
      <description><![CDATA[<p>The JupyterLab extension ecosystem just crossed 700 extensions compatible with JupyterLab 4. Here's what the latest wave tells us about where notebooks are heading.</p><p><a href="https://taletskiy.com/blogs/700-jupyterlab-extensions/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p><img src="https://taletskiy.com/img/700_extensions_collage.png" alt="700 JupyterLab 4 Extensions!"></p><p>The JupyterLab extension ecosystem just crossed 700 extensions compatible with JupyterLab 4. Here's what the latest wave tells us about where notebooks are heading.</p><p><img src="https://taletskiy.com/img/700_extensions_collage.png" alt="700 extensions for JupyterLab 4, and counting!"></p>
<p>The JupyterLab extension ecosystem just crossed <strong>700 extensions compatible with JupyterLab 4!</strong></p>
<p>That&rsquo;s 700 community-built plugins — from astronomical data viewers to reactive notebook editors, from genome browsers to workflow managers — created by hundreds of developers, research labs, and companies around the world.</p>
<h2 id="what-are-jupyterlab-extensions">What Are JupyterLab Extensions?</h2>
<p>Extensions are how JupyterLab becomes a Git client, a dashboard builder, a genomics viewer, or an AI workspace — without changing the core application. Install one with <code>pip install</code>, and it activates automatically.</p>
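<p>For example, installing a prebuilt extension from PyPI takes two commands (jupyterlab-git is just one illustration; restart JupyterLab afterwards to load it):</p>
<pre tabindex="0"><code>pip install jupyterlab-git      # any prebuilt JupyterLab 4 extension on PyPI
jupyter labextension list       # confirm the extension was picked up
</code></pre>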
<p><img src="https://taletskiy.com/img/popular_extensions_collage.png" alt="Popular JupyterLab extensions"></p>
<p>This is by design: JupyterLab itself is built as a collection of extensions — the <a href="https://github.com/jupyterlab/jupyterlab/tree/main/packages/filebrowser">file browser</a>, the <a href="https://github.com/jupyterlab/jupyterlab/tree/main/packages/notebook">notebook editor</a>, the <a href="https://github.com/jupyterlab/jupyterlab/tree/main/packages/terminal">terminal</a> are all plugins. The same architecture that powers the core lets the community build what they need. For background, see <a href="https://blog.jupyter.org/99-ways-to-extend-the-jupyter-ecosystem-11e5dab7c54">99 ways to extend the Jupyter ecosystem</a>.</p>
<h2 id="the-ecosystem-at-700">The Ecosystem at 700</h2>
<ul>
<li><strong>700+ extensions compatible with JupyterLab 4</strong></li>
<li><strong>~960 total extensions</strong> published on PyPI</li>
<li><strong>~9.8 million downloads/month</strong></li>
<li><strong>100M+ total downloads</strong> in the past year</li>
</ul>
<p>By any measure, a substantial software layer has grown around JupyterLab.</p>
<h2 id="how-we-got-here">How We Got Here</h2>
<p>The ecosystem crossed <strong>600 JL4-compatible extensions in late October 2025</strong>, days before <a href="https://www.jupytercon.com/">JupyterCon in San Diego</a>. At the conference, we ran a full-day <a href="https://jupytercon.github.io/jupytercon2025-developingextensions/">Extension Development for Everyone</a> tutorial with hands-on rapid prototyping. By early March 2026, we hit <strong>700</strong>.</p>
<p>The ecosystem has been growing at a steady pace, averaging about 18 new extensions per month, with November 2025 setting an all-time monthly record of 33. Modern tooling is helping: better templates, documentation, and code-generation tools have lowered the barrier to entry for work that once required deep familiarity with TypeScript, Lumino, and JupyterLab internals.</p>
<h2 id="where-the-extensions-are">Where the Extensions Are</h2>
<p><img src="https://taletskiy.com/img/extensions_per_category.png" alt="Number of JupyterLab extensions by category"></p>
<p><img src="https://taletskiy.com/img/extensions_downloads_per_category.png" alt="Monthly PyPI downloads by category"></p>
<p>Development &amp; Version Control dominates both in count (267) and downloads (5.4M/month). Visualization &amp; Dashboards (2.7M/month) and System &amp; Resource Management (602K/month) round out the top three most downloaded categories. But the fastest-growing categories point to where things are heading. Here&rsquo;s what&rsquo;s new in 2026:</p>
<h2 id="jupyterlabs-ai-layer-starts-taking-shape">JupyterLab&rsquo;s AI Layer Starts Taking Shape</h2>
<p>AI isn&rsquo;t yet the biggest category in JupyterLab, but it may be the clearest signal of where new interaction patterns are emerging:</p>
<ul>
<li><strong><a href="https://labextensions.dev/extensions/jupyter-ai-acp-client?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyter-ai-acp-client</a></strong> — Brings external AI agents into JupyterLab&rsquo;s chat via the Agent Communication Protocol. Ships with Claude Code and Kiro personas.</li>
<li><strong><a href="https://labextensions.dev/extensions/nb-margin?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">nb-margin</a></strong> — Annotate cells with margin comments, and Claude Code edits them. A different paradigm from chat-based AI.</li>
<li><strong><a href="https://labextensions.dev/extensions/jupyterlite-ai-kernels?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyterlite-ai-kernels</a></strong> — AI-powered kernels for JupyterLite, from Jeremy Tuloup. AI-assisted computation entirely in the browser.</li>
<li><strong><a href="https://labextensions.dev/extensions/jupyter-chat-components?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyter-chat-components</a></strong> — Reusable chat UI components from Project Jupyter — building blocks for the next generation of AI tools.</li>
</ul>
<p>These extensions reflect what the JupyterLab team identified as a 2026 priority: first-class integration with AI tooling.</p>
<h2 id="reproducibility-gets-a-toolchain">Reproducibility Gets a Toolchain</h2>
<p><strong><a href="https://labextensions.dev/extensions/calkit-python?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">calkit-python</a></strong> is the most downloaded new extension of 2026 (11,000+ monthly downloads). It gives notebooks project-scoped environments, graphical package management via Astral&rsquo;s <code>uv</code>, and one-click notebook pipelines with freshness tracking. Think &ldquo;Makefiles for notebooks&rdquo; meets &ldquo;Poetry for Jupyter.&rdquo;</p>
<p><img src="https://taletskiy.com/img/calkit_screenshot.png" alt="Calkit manages notebook pipelines with environment tracking and one-click reruns. The orange ‘run’ button signals stale outputs that need to be regenerated."></p>
<p><strong><a href="https://labextensions.dev/extensions/jupyter-projspec?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyter-projspec</a></strong> (from the fsspec contributors) takes a complementary approach — it brings <a href="https://github.com/fsspec/projspec">projspec</a> into JupyterLab, letting you scan and analyze project structures directly from the notebook environment.</p>
<p><img src="https://taletskiy.com/img/jupyter-projspec-hero.png" alt="jupyter-projspec integrates to system filebrowser to show the project metadata"></p>
<h2 id="marimo-comes-to-jupyterlab">Marimo Comes to JupyterLab</h2>
<p><a href="https://marimo.io">Marimo</a> — the reactive notebook editor — now runs inside JupyterLab. The official <strong><a href="https://labextensions.dev/extensions/marimo-jupyter-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">marimo-jupyter-extension</a></strong> from the Marimo team lets you open and edit <code>_mo.py</code> Marimo files directly in JupyterLab, bringing reactive execution to JupyterHub deployments without leaving the Jupyter environment. This matters for teams that want to adopt Marimo incrementally — you can keep your JupyterHub infrastructure and add Marimo as another file type alongside traditional notebooks.</p>
<h2 id="science">Science</h2>
<ul>
<li><strong><a href="https://labextensions.dev/extensions/fitsview?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">fitsview</a></strong> — Stream FITS astronomical data slices directly in JupyterLab without downloading full files.</li>
<li><strong><a href="https://labextensions.dev/extensions/jupyterlab-urdf-test?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyterlab-urdf-test</a></strong> — 3D robot model viewer/editor (URDF + Three.js), from <a href="https://github.com/jupyter-robotics">jupyter-robotics</a>.</li>
<li><strong><a href="https://labextensions.dev/extensions/climb-jupyter-igv?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">climb-jupyter-igv</a></strong> — Integrative Genomics Viewer with S3 access for bioinformatics.</li>
<li><strong><a href="https://labextensions.dev/extensions/ggblab?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">ggblab</a></strong> — GeoGebra interactive geometry with bidirectional Python communication. Second most downloaded new extension of 2026.</li>
</ul>
<h2 id="accessibility">Accessibility</h2>
<p>Accessibility has been a growing focus for JupyterLab core, and extensions are starting to address it too:</p>
<ul>
<li><strong><a href="https://labextensions.dev/extensions/jupyterlab-a11y-checker?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyterlab-a11y-checker</a></strong> — From UC Berkeley&rsquo;s <a href="https://github.com/berkeley-dsep-infra/jupyterlab-a11y-checker">DSEP infrastructure team</a>, this extension scans notebooks for WCAG 2.1 AA issues: missing alt text, heading structure, table headers, color contrast, and link text. Guided fix interfaces, optional AI suggestions, and a CLI for CI pipelines. Over 11,000 total downloads and a <a href="https://a11y-checker-guide.datahub.berkeley.edu/">documentation site</a>.</li>
<li><strong><a href="https://labextensions.dev/extensions/jupyterlab-change-ui-font-size-fix?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyterlab-change-ui-font-size-fix</a></strong> — Fixes file browser icon misalignment when users change the UI font size — a small but real pain point for anyone who needs larger text.</li>
</ul>
<h2 id="27-extensions-one-platform">27 Extensions, One Platform</h2>
<p><a href="https://github.com/stellarshenson/stellars-jupyterlab-ds">Stellars</a> is a JupyterLab-based data science platform — GPU support, MLflow, TensorBoard, Optuna — assembled from <strong>27 custom extensions</strong> covering everything from <a href="https://labextensions.dev/extensions/jupyterlab-branding-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">branding</a> and <a href="https://labextensions.dev/extensions/jupyterlab-vscode-icons-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">file icons</a> to <a href="https://labextensions.dev/extensions/jupyterlab-kernel-terminal-workspace-culler-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">kernel management</a> and <a href="https://labextensions.dev/extensions/jupyterlab-drawio-render-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">diagram rendering</a>, <a href="https://labextensions.dev/extensions/jupyterlab-vscode-icons-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">VS Code file icons</a>, <a href="https://labextensions.dev/extensions/jupyterlab-trash-mgmt-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">trash management</a>, <a href="https://labextensions.dev/extensions/jupyterlab-mmd-to-png-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">Mermaid-to-PNG conversion</a>, and more. JupyterLab is now flexible enough that one developer can assemble a domain-specific product entirely from extension building blocks.</p>
<h2 id="want-to-build-your-own">Want to Build Your Own?</h2>
<p>The JupyterCon tutorial is fully available: <a href="https://jupytercon.github.io/jupytercon2025-developingextensions/">step-by-step materials</a> and the complete <a href="https://www.youtube.com/watch?v=z-KZ6CjZjbM">YouTube recording</a>. It covers scaffolding, plugin architecture, publishing to PyPI, and rapid prototyping techniques. The tools have never been more accessible.</p>
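<p>As a rough sketch of the scaffolding step, the official extension template gets you from zero to an installable plugin; the commands below follow the template&rsquo;s README, so check the tutorial materials for the current flags:</p>
<pre tabindex="0"><code>pip install copier jinja2-time
copier copy --trust https://github.com/jupyterlab/extension-template .
# then, inside the generated project:
pip install -e .
jupyter labextension develop . --overwrite
</code></pre>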
<h2 id="how-we-track-this">How We Track This</h2>
<p>The data behind this post comes from the <a href="https://labextensions.dev?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">JupyterLab Extension Marketplace</a>, a community <a href="https://github.com/orbrx/jupyter-marketplace">project</a> that tracks all published JupyterLab extensions using PyPI data. The marketplace refreshes automatically and provides download trends, category breakdowns, and discovery tools.</p>
<p><img src="https://taletskiy.com/img/marketplace_screenshot.png" alt="JupyterLab Marketplace"></p>
<p>For more on the data and methodology, see our <a href="https://www.youtube.com/watch?v=OWt3Yzhrs1E">PyData Boston 2025 talk</a>.</p>
<h2 id="whats-next">What&rsquo;s Next</h2>
<ul>
<li><strong>New interaction patterns</strong> are still being figured out — chat-based assistance, cell annotations, agent protocols. Most likely all of them will stick, each serving different use cases.</li>
<li><strong>Reproducibility tooling</strong> suggests the community is ready for opinionated workflow management built into the notebook experience.</li>
<li><strong>Cross-notebook-format support</strong> (Marimo, Quarto) hints at a future where JupyterLab is the IDE and the notebook format is a choice.</li>
</ul>
<p>Ensuring extensions keep working as JupyterLab evolves is critical — the team has been <a href="https://github.com/jupyterlab/frontends-team-compass/issues/301">discussing extension compatibility testing</a> on recent contributor calls.</p>
<p>For the <a href="https://labextensions.dev?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">Marketplace</a> itself, we&rsquo;re working on:</p>
<ul>
<li><strong>Deeper integration with JupyterLab Extension Manager</strong> — deep links and &ldquo;Install in JupyterLab&rdquo; buttons to go from discovery to installation in one click.</li>
<li><strong>Expanding Trove classifiers</strong> to indicate Jupyter Notebook and JupyterLite support. All three use the same extension system, with important caveats: Notebook extensions need to target different UI elements, and JupyterLite extensions cannot have a server component.</li>
<li><strong>Better contribution signals</strong> — surfacing commits, PRs, and issues to help users gauge how actively maintained an extension is.</li>
</ul>
<p>At 700 extensions, the community now shapes JupyterLab as much as the core team does. If you&rsquo;re building extensions, thank you! Every one of them makes Jupyter better for someone.</p>
<p><a href="https://taletskiy.com/blogs/700-jupyterlab-extensions/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Skills Hub: Building an Enterprise Trust Layer for AI Agent Skills</title>
      <link>https://taletskiy.com/blogs/skills-hub-cko-2026/</link>
      <pubDate>Sat, 21 Feb 2026 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/skills-hub-cko-2026/</guid>
      <description><![CDATA[<p>How our team won Judges' Recognition at Anaconda's CKO 2026 hackathon in Portugal by building an enterprise trust layer for AI agent skills.</p><p><a href="https://taletskiy.com/blogs/skills-hub-cko-2026/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p><img src="https://taletskiy.com/img/skills-hub-thumb.jpeg" alt="Skills Hub: Building an Enterprise Trust Layer for AI Agent Skills"></p><p>How our team won Judges' Recognition at Anaconda's CKO 2026 hackathon in Portugal by building an enterprise trust layer for AI agent skills.</p><p><em>How our team won Judges&rsquo; Recognition at Anaconda&rsquo;s CKO 2026 hackathon in Portugal</em></p>
<h2 id="the-setup">The setup</h2>
<p>Every year, Anaconda brings the entire company together for CKO — part all-hands, part hackathon, part team building. This year it was in Portugal. The hackathon gives teams three days to build something from scratch, and this year I joined a team of eight to tackle a problem I&rsquo;d been thinking about for months.</p>
<p><img src="https://taletskiy.com/img/IMG_4936.jpeg" alt="The CKO 2026 venue — round tables, green-lit stage, and the whole company together in Portugal"></p>
<p><img src="https://taletskiy.com/img/IMG_4935.jpeg" alt="The hackathon structure: three days, three paths to glory"></p>
<h2 id="the-problem-ai-assistants-dont-know-your-rules">The problem: AI assistants don&rsquo;t know your rules</h2>
<p>Here&rsquo;s a scene that plays out a hundred times a day across the Python ecosystem: you open Claude Code in a project that has a perfectly good conda environment set up. You ask it to run the tests. And it fires off <code>pytest</code> with whatever system Python it finds first. No environment activation. No awareness that conda exists.</p>
<p>I know this scene well because I&rsquo;ve lived it. The non-interactive shell that AI coding assistants spawn doesn&rsquo;t load your shell config, so <code>conda activate</code> fails with the unhelpful &ldquo;Run &lsquo;conda init&rsquo; first&rdquo; error. I built a skill called snakehug to solve this — it teaches AI assistants the correct activation patterns for conda, mamba, and pixi, asks you which environment to use once, then remembers it for every future session.</p>
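<p>A minimal reproduction of the failure, plus the kind of invocation that does work in a fresh, non-interactive shell (the install path and environment name are illustrative; snakehug discovers the right ones for your setup):</p>
<pre tabindex="0"><code># What an AI assistant effectively runs - this fails without shell init:
bash -c 'conda activate myenv &amp;&amp; pytest'
# CondaError: Run 'conda init' before 'conda activate'

# Sourcing conda's shell hook first (or using `conda run`) works:
bash -c 'source ~/miniconda3/etc/profile.d/conda.sh &amp;&amp; conda activate myenv &amp;&amp; pytest'
bash -c 'conda run -n myenv pytest'
</code></pre>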
<p>But snakehug solving <em>my</em> problem on <em>my</em> machine is different from solving it for a team, a department, or an enterprise. That&rsquo;s the gap we went after.</p>
<h2 id="the-bigger-picture-a-supply-chain-problem-for-ai-behaviors">The bigger picture: a supply chain problem for AI behaviors</h2>
<p>In December 2025, Anthropic released the Agent Skills (SKILL.md) open standard. Within eight weeks, 40+ major platforms adopted it — OpenAI Codex, GitHub Copilot, Cursor, Gemini CLI, Windsurf. The anthropics/skills repo hit 66,800 GitHub stars. Skills are simple by design: a directory with a SKILL.md file containing YAML frontmatter and markdown instructions.</p>
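<p>A minimal sketch of that layout (field names beyond <code>name</code> and <code>description</code> vary; see the SKILL.md spec for the full schema):</p>
<pre tabindex="0"><code>mkdir -p my-skill
cat &gt; my-skill/SKILL.md &lt;&lt;'EOF'
---
name: my-skill
description: One-line summary the agent reads to decide when to load this skill.
---

Markdown instructions the agent follows once the skill is activated.
EOF
</code></pre>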
<p>The simplicity is the point — and the problem. The standard deliberately omits versioning, dependency resolution, signing, and sandboxing. Third-party marketplaces have appeared claiming 160,000+ skills, but these are mostly auto-indexed GitHub repos with no curation. Aikido Security recently found hallucinated npx commands spreading through hundreds of repos via unvetted skills.</p>
<p>This is essentially the npm supply chain attack, but for instructions that can execute arbitrary code on your machine. The ecosystem has the same shape as early package management: lots of content, no trust infrastructure.</p>
<h2 id="what-we-built-skills-hub">What we built: Skills Hub</h2>
<p>Our hackathon project was Skills Hub — an enterprise trust layer for agent skills. The framing that clicked for us was: &ldquo;Package Security Manager, but for AI behaviors.&rdquo; Anaconda already provides the trust layer between open-source packages and enterprise environments. Skills need the same thing.</p>
<p>In three days, the team shipped four components:</p>
<p><strong>A backend API</strong> that stores, validates, and serves skills. Every skill goes through frontmatter validation before it&rsquo;s published. Skills are categorized by trust level — Anaconda-curated, company internal, and external — so security teams can control what reaches developers.</p>
<p><strong>A CLI extension</strong> (<code>anaconda skills</code>) that plugs into the existing Anaconda CLI. Upload, list, install, and inspect skills with commands that feel familiar to anyone who&rsquo;s used <code>conda</code>. Authentication flows through the same <code>anaconda-auth</code> infrastructure that enterprises already trust.</p>
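<p>Roughly the intended feel (the subcommands mirror the verbs above, but the exact syntax was a hackathon prototype, not a shipped interface):</p>
<pre tabindex="0"><code>anaconda skills list                # browse the catalog by trust level
anaconda skills install snakehug    # pull a skill into your local agent setup
anaconda skills upload ./my-skill   # publish after frontmatter validation
</code></pre>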
<p><strong>A web frontend</strong> with a catalog UI showing skills organized by trust level with color-coded badges, search, filtering, and upload capabilities.</p>
<p><strong>An Anaconda Desktop integration</strong> using the new Feature Modules system — a native sidebar panel that lets you browse and install skills without leaving the desktop app. This was my contribution, and dogfooding the new module system was a great way to put it through its paces.</p>
<h2 id="the-demo-that-won-it">The demo that won it</h2>
<p>For the hackathon presentation, we structured the demo as a before-and-after. I opened with one line of Portuguese (&ldquo;Olá! Eu sou Konstantin&hellip; e o meu português acaba aqui&rdquo;, which translates to &ldquo;Hi! I&rsquo;m Konstantin&hellip; and my Portuguese ends here&rdquo;), then cut straight to the problem.</p>
<p><strong>Before:</strong> A screen recording of Claude Code in a project with a conda environment. I ask it to run the tests. It uses bare <code>pytest</code> with system Python. No conda awareness. The environment is completely ignored.</p>
<p><strong>After:</strong> Same project, same prompt, but now with snakehug installed. Claude Code detects conda, asks which environment to use, activates it correctly, and the tests pass. One SKILL.md file — completely different behavior.</p>
<p>Then the pivot: &ldquo;But how does this skill reach every developer on your team safely?&rdquo; Cut to the CLI uploading snakehug to Skills Hub, then to the web UI showing it in the catalog with trust badges and validated metadata, and finally a flash of the Desktop integration.</p>
<p>The whole thing ran just under two minutes. The judges gave us Judges&rsquo; Recognition (honorable mention), which we were very happy with given the quality of the other projects.</p>
<p><img src="https://taletskiy.com/img/IMG_4940.jpeg" alt="At the CKO 2026 stage"></p>
<h2 id="how-we-built-it-ai-assisted-spec-driven-development">How we built it: AI-assisted spec-driven development</h2>
<p>The meta-story of the hackathon was almost as interesting as the project itself. We built the entire thing using AI-driven spec-driven development with open-source tools — primarily OpenCode and SpecKit.</p>
<p><img src="https://taletskiy.com/img/IMG_4939.jpeg" alt="Early brainstorming — the whiteboard where Skills Hub took shape"></p>
<p>The approach: before writing any code, you write a spec. The spec directory contains a formal specification, research notes, a data model, an implementation plan, and a task breakdown. Then the AI coding assistant implements against that spec. Each feature lived on a dedicated branch matching its spec number, with PRs reviewed and merged to main.</p>
<p>The backend accumulated 45 commits across 12 branches and 8 spec directories. The CLI had 21 commits across 5 branches and 6 spec directories. For three days of work, that&rsquo;s a remarkable amount of structured, traceable output. The specs serve double duty — they&rsquo;re the design documents <em>and</em> the context that makes AI assistance effective.</p>
<p>This isn&rsquo;t just an interesting development methodology. It&rsquo;s directly relevant to the enterprise story: if your team is going to use AI coding tools, you want reproducibility, auditability, and a paper trail. Spec-driven development gives you that.</p>
<h2 id="the-skill-i-contributed-snakehug">The skill I contributed: snakehug</h2>
<p>Snakehug started as a personal itch — I was tired of Claude Code ignoring my conda environments. The core insight is that AI assistants spawn non-interactive shells, which don&rsquo;t load your shell config. So <code>conda activate</code> fails, and the assistant falls back to whatever Python is on the system PATH.</p>
<p>The skill works in three phases:</p>
<ol>
<li><strong>First run:</strong> Detect which environment managers are installed (conda, mamba, micromamba, pixi), ask the user which environment to use, test that activation actually works</li>
<li><strong>Save config:</strong> Write the complete working activation command to the project&rsquo;s <code>CLAUDE.md</code></li>
<li><strong>Future runs:</strong> Automatically use the saved pattern — no prompting, no detection, just correct behavior</li>
</ol>
<p>The key design decision was saving the <em>complete activation command</em> that works in a fresh shell, not just the environment name. Different managers need different invocation patterns (<code>source conda.sh &amp;&amp; conda run</code> vs. <code>eval &quot;$(mamba shell hook)&quot;</code> vs. <code>pixi run</code>), and getting this wrong silently is worse than failing loudly.</p>
<p>For the hackathon, I refactored snakehug to the single SKILL.md format for compatibility with the Skills Hub API, and we used it as the flagship demo skill.</p>
<h2 id="whats-next-and-what-didnt-make-the-demo">What&rsquo;s next (and what didn&rsquo;t make the demo)</h2>
<p>The piece we deliberately left out of the two-minute demo is skill-gen — a 12-phase pipeline that generates validated SKILL.md files from your team&rsquo;s real agent conversation logs. The idea is that instead of manually writing skills, you extract them from patterns in how your developers actually correct their AI assistants. More usage → more traces → better skills. Nobody else in the ecosystem is doing this.</p>
<p>We mentioned it as a teaser in the closing (&ldquo;what we didn&rsquo;t show today&rdquo;) and it landed well as a &ldquo;one more thing&rdquo; during Q&amp;A.</p>
<p>Whether Skills Hub becomes an Anaconda product is above my pay grade. But the gap is real: enterprises need a trust layer between the wild west of community skills and the developers who use them. Someone is going to build it. I&rsquo;m glad we got to prototype what it could look like.</p>
<h2 id="thanks">Thanks</h2>
<p><img src="https://taletskiy.com/img/IMG_4994.jpeg" alt="Part of the Skills Hub team at CKO 2026"></p>
<p>This was a genuine team effort. Anil Kulkarni led the project and kept us focused. Albert DeFusco built the backend and CLI infrastructure. Denis Dupeyron contributed the upload pipeline and source type system. Max Huang built the skill-gen pipeline. Anna Ratner designed the UI. Arisha Mays implemented it. And we had a great time in Portugal.</p>
<p>Oh, and we made a fado song about Skills Hub. Because when in Portugal.</p>
<p><a href="https://taletskiy.com/blogs/skills-hub-cko-2026/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Jupyter Projspec: Bringing Project Discovery to JupyterLab</title>
      <link>https://taletskiy.com/blogs/jupyter-projspec/</link>
      <pubDate>Fri, 06 Feb 2026 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/jupyter-projspec/</guid>
      <description><![CDATA[<p>A JupyterLab extension that automatically discovers and displays project structure — built in collaboration with Martin Durant and Rosio Reyes at Anaconda</p><p><a href="https://taletskiy.com/blogs/jupyter-projspec/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p><img src="https://taletskiy.com/img/jupyter-projspec-hero.png" alt="Jupyter Projspec: Bringing Project Discovery to JupyterLab"></p><p>A JupyterLab extension that automatically discovers and displays project structure — built in collaboration with Martin Durant and Rosio Reyes at Anaconda</p><p><img src="https://taletskiy.com/img/jupyter-projspec-hero.png" alt="jupyter-projspec sidebar and chips in JupyterLab"></p>
<h2 id="what-is-jupyter-projspec">What is Jupyter Projspec?</h2>
<p>Have you ever opened a directory in JupyterLab and wondered — what kind of project is this? Is it a Python package? A data pipeline? A machine learning experiment? What can I do with it?</p>
<p><a href="https://github.com/fsspec/jupyter-projspec">Jupyter Projspec</a> is a JupyterLab extension that answers these questions automatically. It scans your working directory using <a href="https://github.com/fsspec/projspec">projspec</a> — a project discovery library by Martin Durant — and presents a structured view of what&rsquo;s inside: the project type, its contents, specifications, and available build artifacts.</p>
<p>Think of it as an intelligent project inspector for JupyterLab.</p>
<h2 id="featured-on-anacondas-podcast">Featured on Anaconda&rsquo;s Podcast</h2>
<p>Today, jupyter-projspec was featured on Anaconda&rsquo;s <a href="https://www.youtube.com/live/tF5XIH4sTyM?si=3aQ_wMLz0eeY_Z0b&amp;t=1486">Numerically Speaking</a> podcast! The segment starts around the 24:46 mark.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<h2 id="why-build-this">Why Build This?</h2>
<p>When you work with data and code every day, you encounter many different project layouts — pyproject.toml-based Python packages, conda recipes, Zarr stores, HuggingFace datasets, and more. Each has its own conventions, its own set of files to look at, and its own actions you might want to take.</p>
<p>Projspec, created by Martin Durant as part of the <a href="https://github.com/fsspec">fsspec</a> ecosystem, provides a unified way to detect and describe these project types. It can look at a directory and tell you: this is a Python package with these entry points, or this is a Zarr dataset with these arrays, or this is a conda recipe that can be built into a package.</p>
<p>The missing piece was surfacing this information where people actually work — inside JupyterLab. That&rsquo;s what jupyter-projspec does.</p>
<h2 id="how-it-works">How It Works</h2>
<p>The extension adds two UI elements to JupyterLab: a <strong>sidebar panel</strong> that displays project information in a collapsible tree view, and <strong>colored badge chips</strong> in the file browser that show detected project types at a glance. When you navigate to a directory, it calls the projspec Python backend to scan and identify what&rsquo;s there, then renders the results.</p>
<h3 id="projspecs-three-concepts">Projspec&rsquo;s Three Concepts</h3>
<p>To understand what the extension shows, it helps to know projspec&rsquo;s model. From projspec&rsquo;s perspective, every project directory has three layers:</p>
<ul>
<li><strong>Specs</strong> &ndash; the project types detected (e.g., <code>PythonLibrary</code>, <code>GitRepo</code>, <code>Pixi</code>). A single directory can match multiple specs simultaneously &ndash; a typical project might be a Git repo, a Python library, and a Pixi workspace all at once.</li>
<li><strong>Contents</strong> &ndash; read-only metadata describing what&rsquo;s in the project: environment specs (pip/conda/npm), package info, licenses, commands, and descriptive metadata. Contents tell you <em>what is here</em>.</li>
<li><strong>Artifacts</strong> &ndash; actions the project can perform: building wheels, creating conda packages, generating lock files, spinning up Docker containers, running servers. Artifacts tell you <em>what you can do</em>.</li>
</ul>
<p>Projspec currently recognizes 23+ project types out of the box, covering the Python ecosystem (pyproject.toml libraries, Poetry, uv, Pixi, conda recipes), JavaScript (Node, Yarn, JupyterLab extensions), Rust (Cargo), web frameworks (Django, Streamlit, PyScript), documentation (mdBook, ReadTheDocs), data projects (Frictionless Data Packages, HuggingFace repos), IDEs (VS Code, JetBrains, Zed), and more. Detection is plugin-based &ndash; each project type registers itself and provides a fast <code>match()</code> method that checks for marker files (like <code>pyproject.toml</code>, <code>Cargo.toml</code>, or <code>pixi.toml</code>).</p>
<h3 id="the-architecture">The Architecture</h3>
<p>Under the hood, the extension has two layers:</p>
<p><strong>Server extension</strong> (Python, Tornado): Exposes a REST endpoint at <code>GET /jupyter-projspec/scan</code>. It takes a directory path, validates it to prevent directory traversal, calls <code>projspec.Project(path).to_dict()</code>, and returns the full project tree as JSON.</p>
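<p>In other words, a scan is just an authenticated GET against the running Jupyter server; something like this sketch (the query-parameter name and token handling are assumptions, so check the extension&rsquo;s handlers for the exact contract):</p>
<pre tabindex="0"><code>curl -H "Authorization: token $JUPYTER_TOKEN" \
  "http://localhost:8888/jupyter-projspec/scan?path=." | jq .
</code></pre>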
<p><strong>Frontend extension</strong> (TypeScript, React): Two widgets subscribe to JupyterLab&rsquo;s <code>fileBrowser.model.pathChanged</code> signal, so they update automatically when you navigate directories. The sidebar panel renders each detected spec as a collapsible section with nested views for contents and artifacts. The chips widget injects colored badges below the file browser breadcrumbs &ndash; each chip is labeled with the spec name (like &ldquo;Python Library&rdquo; or &ldquo;Git Repo&rdquo;) and clicking one scrolls to and expands that spec in the sidebar. Both widgets debounce their API calls and use AbortController to cancel in-flight requests when the path changes rapidly.</p>
<h2 id="building-it-a-collaborative-effort">Building It: A Collaborative Effort</h2>
<p>This extension came together through a collaboration that I&rsquo;m really proud of.</p>
<p><strong>Martin Durant</strong> is the author of projspec (and fsspec, kerchunk, Intake, and many other foundational Python data tools). He built the discovery engine that makes this possible — the ability to look at any directory and understand what kind of project it is. Working closely with Martin meant we could iterate quickly on what the extension should expose and how the Python API should evolve to support the JupyterLab use case.</p>
<p><strong>Rosio Reyes</strong>, my colleague on the OSS-Jupyter team at Anaconda, contributed to the frontend development and UX design. Rosio also works on <a href="https://github.com/fsspec/jupyter-fsspec">jupyter-fsspec</a>, the JupyterLab extension for browsing remote filesystems — so there&rsquo;s a natural connection between browsing files (jupyter-fsspec) and understanding what those files represent (jupyter-projspec).</p>
<h2 id="whats-next">What&rsquo;s Next</h2>
<p>The current release handles project scanning, the sidebar tree view, and file browser chips. Here&rsquo;s where we&rsquo;re heading next:</p>
<ul>
<li><strong>Artifact actions</strong> &ndash; not just showing what can be built, but letting you trigger builds directly from the UI with Make buttons (e.g., &ldquo;build this conda package&rdquo; or &ldquo;generate a lock file&rdquo;). This is actively in development on a PR branch, with server-side command resolution through projspec, concurrency limits, and output capture already working.</li>
<li><strong>Remote filesystem support</strong> &ndash; leveraging fsspec to scan projects on S3, GCS, or any other supported backend, with a natural bridge to <a href="https://github.com/fsspec/jupyter-fsspec">jupyter-fsspec</a> for browsing those remote files</li>
<li><strong>More project types in projspec</strong> &ndash; expanding detection to cover data formats like Zarr, OME-NGFF, and STAC catalogs, alongside the existing HuggingFace and Frictionless Data support</li>
</ul>
<h2 id="try-it-out">Try It Out</h2>
<p>The extension is open source and available on GitHub:</p>
<ul>
<li><strong>jupyter-projspec</strong>: <a href="https://github.com/fsspec/jupyter-projspec">github.com/fsspec/jupyter-projspec</a></li>
<li><strong>projspec</strong> (the underlying library): <a href="https://github.com/fsspec/projspec">github.com/fsspec/projspec</a></li>
<li><strong>projspec docs</strong>: <a href="https://projspec.readthedocs.io">projspec.readthedocs.io</a></li>
</ul>
<p>Install it with pip:</p>
<pre tabindex="0"><code>pip install jupyter-projspec
</code></pre><p>This pulls in projspec as a dependency. After installation, restart JupyterLab and you&rsquo;ll see the Projspec panel in the right sidebar.</p>
<p>If you&rsquo;re interested in project discovery for Jupyter, we&rsquo;d love your feedback. Open an issue, try it on your own projects, or come say hi in the Jupyter community channels.</p>
<hr>
<p><em>I&rsquo;m a Senior Software Engineer on the OSS-Jupyter team at Anaconda, where I work on JupyterLab core contributions, extensions, and community tools. You can find more of my work at <a href="https://labextensions.dev">labextensions.dev</a> and follow my conference talks and open source adventures on this blog.</em></p>
<p><a href="https://taletskiy.com/blogs/jupyter-projspec/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Adding Voice to Claude Code (with Audio Ducking)</title>
      <link>https://taletskiy.com/blogs/claude-code-tts/</link>
      <pubDate>Fri, 23 Jan 2026 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/claude-code-tts/</guid>
      <description><![CDATA[<p>Text-to-speech for Claude Code that automatically lowers your music when it speaks - like Google Maps in CarPlay.</p><p><a href="https://taletskiy.com/blogs/claude-code-tts/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>Text-to-speech for Claude Code that automatically lowers your music when it speaks - like Google Maps in CarPlay.</p><h2 id="claude-code-can-talk">Claude Code Can Talk</h2>
<p>I spend a lot of time in Claude Code. Reading responses while coding is fine, but sometimes I want to hear what Claude is saying while I&rsquo;m looking at something else.</p>
<p><a href="https://git.sr.ht/~cg/claude-code-tts">claude-code-tts</a> by Chris Goff does exactly this. It uses <a href="https://github.com/nazdridoy/kokoro-tts">Kokoro TTS</a>, a fast local text-to-speech model, to read Claude&rsquo;s responses aloud. Hooks into Claude Code&rsquo;s event system - when Claude finishes responding, it extracts the text and speaks it.</p>
<p><strong>Note:</strong> The upstream project uses <code>tac</code> (GNU coreutils) which doesn&rsquo;t exist on macOS. My fork replaces it with <code>tail -r</code>.</p>
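<p>The substitution is one-for-one; both print a file in reverse line order (the file name here is illustrative):</p>
<pre tabindex="0"><code>tac transcript.txt | head -n 1      # GNU coreutils; not on stock macOS
tail -r transcript.txt | head -n 1  # BSD/macOS equivalent used in the fork
</code></pre>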
<video width="100%" controls>
  <source src="https://taletskiy.com/videos/claude-code-tts.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
<h2 id="audio-ducking">Audio Ducking</h2>
<p>With TTS working, I had a new problem: I like music while coding. Claude talking over music is hard to hear.</p>
<p>CarPlay solved this with audio ducking - when navigation speaks, music volume drops, then comes back. I added this for Claude Code TTS.</p>
<p>The implementation controls Apple Music directly via AppleScript:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Duck to 5% before TTS starts</span>
</span></span><span style="display:flex;"><span>osascript -e <span style="color:#e6db74">&#34;tell application \&#34;Music\&#34; to set sound volume to 5&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Restore when TTS finishes</span>
</span></span><span style="display:flex;"><span>osascript -e <span style="color:#e6db74">&#34;tell application \&#34;Music\&#34; to set sound volume to </span>$original<span style="color:#e6db74">&#34;</span>
</span></span></code></pre></div><p>Only Apple Music&rsquo;s internal volume changes - system volume stays the same, so TTS plays at full volume while music is ducked. A background process monitors when TTS finishes and restores the volume automatically.</p>
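<p>Conceptually, the restore half looks roughly like this sketch (the process name being polled is an assumption; the real hook tracks the TTS job it launched):</p>
<pre tabindex="0"><code># Remember the current volume, duck, then restore once TTS exits
# (assumes TTS is already playing when the loop starts)
original=$(osascript -e 'tell application "Music" to get sound volume')
osascript -e 'tell application "Music" to set sound volume to 5'
(
  while pgrep -f kokoro-tts &gt;/dev/null; do sleep 0.5; done
  osascript -e "tell application \"Music\" to set sound volume to $original"
) &amp;
</code></pre>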
<h2 id="configuration">Configuration</h2>
<p>Set in <code>~/.claude/settings.json</code>:</p>
<table>
  <thead>
      <tr>
          <th>Variable</th>
          <th>Default</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>KOKORO_VOICE</code></td>
          <td><code>af_sky</code></td>
          <td><a href="https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md">Voice to use</a></td>
      </tr>
      <tr>
          <td><code>AUDIO_DUCK_ENABLED</code></td>
          <td><code>true</code></td>
          <td>Set to <code>false</code> to disable ducking</td>
      </tr>
      <tr>
          <td><code>DUCK_LEVEL</code></td>
          <td><code>5</code></td>
          <td>Percentage of original volume during TTS</td>
      </tr>
  </tbody>
</table>
<h2 id="try-it">Try It</h2>
<p>My fork with audio ducking: <a href="https://github.com/ktaletsk/claude-code-tts">github.com/ktaletsk/claude-code-tts</a></p>
<p>Original: <a href="https://git.sr.ht/~cg/claude-code-tts">git.sr.ht/~cg/claude-code-tts</a></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>git clone https://github.com/ktaletsk/claude-code-tts
</span></span><span style="display:flex;"><span>cd claude-code-tts
</span></span><span style="display:flex;"><span>./install.sh
</span></span></code></pre></div><p>Now I can code, listen to music, and hear Claude - all without fighting for audio space.</p>
<p><a href="https://taletskiy.com/blogs/claude-code-tts/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Which local models actually work with Claude Code on a 48GB MacBook Pro?</title>
      <link>https://taletskiy.com/blogs/ollama-claude-code/</link>
      <pubDate>Thu, 15 Jan 2026 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/ollama-claude-code/</guid>
      <description><![CDATA[<p>A little experiment evaluating local models for agentic tasks in Claude Code</p><p><a href="https://taletskiy.com/blogs/ollama-claude-code/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p><img src="https://taletskiy.com/img/ollama-claude-code-hero.png" alt="Which local models actually work with Claude Code on a 48GB MacBook Pro?"></p><p>A little experiment evaluating local models for agentic tasks in Claude Code</p><p><img src="https://taletskiy.com/img/ollama-claude-code-hero.png" alt="Claude Code running with devstral-128k via Ollama"></p>
<h2 id="i-tested-18-local-models-so-you-dont-have-to">I Tested 18 Local Models So You Don&rsquo;t Have To</h2>
<p>Ollama <a href="https://github.com/ollama/ollama/releases/tag/v0.14.0">released</a> Anthropic API compatibility in January 2026, so I tested <strong>18 local models</strong> with Claude Code to find out which ones actually work for agentic coding tasks.</p>
<blockquote>
<p><strong>TL;DR</strong></p>
<ol>
<li><a href="https://ollama.com/library/devstral-small-2"><code>devstral-small-2:24b</code></a> is the winner - best quality, fastest, zero interventions</li>
<li><strong>You MUST configure context window</strong> - Ollama defaults to 4K; use 64K minimum</li>
<li><strong>Expect 12-24 min for tasks that take ~2 min with Opus 4.5</strong> - but it works!</li>
</ol></blockquote>
<ul>
<li>Ollama docs: <a href="https://docs.ollama.com/integrations/claude-code">https://docs.ollama.com/integrations/claude-code</a></li>
<li>Anthropic API compatibility: <a href="https://docs.ollama.com/api/anthropic-compatibility">https://docs.ollama.com/api/anthropic-compatibility</a></li>
</ul>
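<p>The wiring itself is small; a hedged sketch (<code>ANTHROPIC_BASE_URL</code> is Claude Code&rsquo;s documented endpoint override, and <code>--model</code> is its standard flag, but follow the Ollama docs above for the exact host, path, and any dummy API key it expects):</p>
<pre tabindex="0"><code>ollama pull devstral-small-2:24b
export ANTHROPIC_BASE_URL=http://localhost:11434   # local Ollama server
claude --model devstral-small-2:24b                # must match a pulled model
</code></pre>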
<h2 id="my-setup">My Setup</h2>
<table>
  <thead>
      <tr>
          <th>Spec</th>
          <th>Value</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Machine</td>
          <td>MacBook Pro</td>
      </tr>
      <tr>
          <td>Chip</td>
          <td>Apple M4 Pro</td>
      </tr>
      <tr>
          <td>RAM</td>
          <td>48 GB unified memory</td>
      </tr>
      <tr>
          <td>Ollama</td>
          <td>v0.14.2</td>
      </tr>
  </tbody>
</table>
<h2 id="models">Models</h2>
<p>Here&rsquo;s everything I tested, sorted by size:</p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Size</th>
          <th>Release</th>
          <th>SWE-bench</th>
          <th>Type</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>nemotron-3-nano:30b</td>
          <td>24GB</td>
          <td>Dec 2025</td>
          <td>-</td>
          <td>MoE</td>
      </tr>
      <tr>
          <td>cogito:32b</td>
          <td>20GB</td>
          <td>Jul 2025</td>
          <td>-</td>
          <td>Hybrid reasoning</td>
      </tr>
      <tr>
          <td>granite4:32b-a9b-h</td>
          <td>~20GB</td>
          <td>Oct 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>command-r:35b</td>
          <td>19GB</td>
          <td>Mar 2024</td>
          <td>-</td>
          <td>RAG-optimized</td>
      </tr>
      <tr>
          <td>qwen2.5-coder:32b</td>
          <td>19GB</td>
          <td>Nov 2024</td>
          <td>9.0%</td>
          <td>Coding</td>
      </tr>
      <tr>
          <td>deepseek-r1:32b</td>
          <td>19GB</td>
          <td>Jan 2025</td>
          <td>41.4%</td>
          <td>Reasoning</td>
      </tr>
      <tr>
          <td>qwen3-coder:30b</td>
          <td>18GB</td>
          <td>Jul 2025</td>
          <td>51.6%</td>
          <td>Coding</td>
      </tr>
      <tr>
          <td>qwen3:30b</td>
          <td>18GB</td>
          <td>Apr 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>devstral-small-2:24b</td>
          <td>15GB</td>
          <td>Dec 2025</td>
          <td>68.0%</td>
          <td>Agentic coding</td>
      </tr>
      <tr>
          <td>mistral-small3.2:24b</td>
          <td>15GB</td>
          <td>Jun 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>magistral:24b</td>
          <td>14GB</td>
          <td>Jun 2025</td>
          <td>-</td>
          <td>Reasoning</td>
      </tr>
      <tr>
          <td>gpt-oss:20b</td>
          <td>14GB</td>
          <td>Aug 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>cogito:14b</td>
          <td>9GB</td>
          <td>Jul 2025</td>
          <td>-</td>
          <td>Hybrid reasoning</td>
      </tr>
      <tr>
          <td>deepseek-coder-v2:16b</td>
          <td>8.9GB</td>
          <td>Jun 2024</td>
          <td>-</td>
          <td>Coding (no tools)</td>
      </tr>
      <tr>
          <td>rnj-1:8b</td>
          <td>5.1GB</td>
          <td>Dec 2025</td>
          <td>20.8%</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>phi4-mini:3.8b</td>
          <td>2.5GB</td>
          <td>Feb 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>granite4:3b</td>
          <td>2.1GB</td>
          <td>Oct 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>functiongemma:270m</td>
          <td>301MB</td>
          <td>Dec 2025</td>
          <td>-</td>
          <td>Function calling</td>
      </tr>
  </tbody>
</table>
<h2 id="experiments">Experiments</h2>
<p>I chose a very simple task: run <code>/init</code> on a repo (<code>jupyterlab-latex</code>) to generate CLAUDE.md, which is normally the first thing I do in a new repo. It&rsquo;s deceptively hard though - the model has to discover tools, explore multiple files, and synthesize documentation without hallucinating. One or two runs per model; treat results as field notes.</p>
<p>My first two models (nemotron, gpt-oss) used Ollama&rsquo;s default context window - which is how I discovered the 4K limit issue. After that, I set context to 64K+ in Ollama&rsquo;s settings.</p>
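<p>Two common ways to raise the limit (the variable and parameter names come from Ollama&rsquo;s documentation; double-check them against your Ollama version):</p>
<pre tabindex="0"><code># Server-wide default context length:
OLLAMA_CONTEXT_LENGTH=65536 ollama serve

# Or bake a larger context into a named model variant:
cat &gt; Modelfile &lt;&lt;'EOF'
FROM devstral-small-2:24b
PARAMETER num_ctx 131072
EOF
ollama create devstral-128k -f Modelfile
</code></pre>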
<h3 id="nemotron-3-nano30b"><code>nemotron-3-nano:30b</code></h3>
<p>My first attempt revealed a critical failure mode. With the default context window, the model&rsquo;s thinking block explicitly shows it decided to skip reading files entirely:</p>
<blockquote>
<p><em>&ldquo;We don&rsquo;t have details of repo&hellip; There haven&rsquo;t been any reads yet&hellip; Let&rsquo;s assume typical repo structure&rdquo;</em></p></blockquote>
<p>Instead of using tools to explore, it <strong>fabricated an entire codebase structure</strong>. The output described a React/Node.js monorepo with <code>/frontend</code> and <code>/backend</code> directories - neither of which exists in jupyterlab-latex (a Python/TypeScript JupyterLab extension). It invented commands like <code>npm run dev</code> and referenced non-existent config files.</p>
<p>This failure led me to discover Ollama&rsquo;s default 4K context limit. After configuring a 128K context window, subsequent attempts worked much better:</p>
<pre tabindex="0"><code>Read → Glob → Read → Read → Read → Read → Glob → Read → Write
</code></pre><p>The model properly explored the codebase, but still stopped mid-task and required a follow-up prompt (&ldquo;Continue&rdquo;) to finish. Final output was accurate and high quality - proving the model <em>can</em> work, but context configuration is critical.</p>
<h3 id="gpt-oss20b"><code>gpt-oss:20b</code></h3>
<p>Also tested early with the default context window. Fast but unreliable:</p>
<ul>
<li>Direct prompt: Finished quickly but low quality output</li>
<li><code>/init</code> skill: Tool parameter errors, empty results, needed intervention</li>
</ul>
<pre tabindex="0"><code>Sautéed for 2m 37s  (Claude Code&#39;s task timer)
</code></pre><h3 id="devstral-small-224b--winner"><code>devstral-small-2:24b</code> ⭐ Winner</h3>
<p>With 128K context configured from the start, this was a <strong>perfect run</strong>. The model immediately understood the task:</p>
<blockquote>
<p>&ldquo;I&rsquo;ll analyze this codebase and create a CLAUDE.md file with the essential information for future instances.&rdquo;</p></blockquote>
<p>Tool call sequence shows direct, confident tool usage:</p>
<pre tabindex="0"><code>Bash → Bash → Bash → Read → Bash → Bash → Bash → Read → Read → Read → Bash → Write
</code></pre><p>No confusion about subagents or tool parameters - it went straight for <code>Bash</code> and <code>Read</code> to explore the codebase, then used <code>Write</code> to create the output.</p>
<p>The output was 180 lines of documentation with actual function names, Python config examples, and a 5-step communication flow diagram. Every file reference checked out - no hallucinations.</p>
<p>Why did devstral outperform? Mistral trained it specifically for SWE-Bench (68.0% score) and tool-use scenarios. You can see it in the tool calls - direct and confident, no subagent confusion.</p>
<pre tabindex="0"><code>Sautéed for 17m 12s
</code></pre><h3 id="qwen3-coder30b"><code>qwen3-coder:30b</code></h3>
<p>Also configured with 128K context. The model&rsquo;s first instinct was to delegate to a subagent. From the session trace, it tried to spawn an Explore agent twice:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;Explore codebase structure&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;prompt&#34;</span>: <span style="color:#e6db74">&#34;Explore the structure of this JupyterLab LaTeX extension repository...&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;subagent_type&#34;</span>: <span style="color:#e6db74">&#34;Explore&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This isn&rsquo;t an Ollama bug, but a mismatch between <strong>what Claude Code can do in a given environment</strong> and <strong>what the model decides to attempt</strong>. Claude Code has a notion of subagents (like an &ldquo;Explore&rdquo; helper), but in my setup those weren&rsquo;t available/configured, so that tool call fails. Ollama&rsquo;s docs do advertise Claude Code usage, though, so it&rsquo;s worth calling out explicitly: with third-party models, you should expect occasional &ldquo;tooling weirdness&rdquo; like this even if the transport API is compatible.</p>
<p>When the Task tool failed (subagents weren&rsquo;t configured), qwen3-coder adapted gracefully. Tool sequence shows the recovery:</p>
<pre tabindex="0"><code>Task → Task → Bash → Read → Read → Read → Read → Read → Read → Read → Read → Write
</code></pre><p>After two failed Explore attempts, it switched to direct <code>Bash</code> and <code>Read</code> tools and completed the task without further intervention. Output quality was good - accurate, no hallucinations, but less detailed than devstral (86 lines vs 180).</p>
<pre tabindex="0"><code>Sautéed for 23m 48s
</code></pre><h3 id="granite432b-a9b-h"><code>granite4:32b-a9b-h</code></h3>
<p>An interesting comparison point - this is IBM&rsquo;s general-purpose 32B model, not a coding specialist. With 128K context configured, it completed the task in <strong>under 7 minutes</strong> - the fastest successful run.</p>
<p>The trade-off: minimal exploration. Tool sequence:</p>
<pre tabindex="0"><code>Read → Write
</code></pre><p>Just two tool calls - read the README, write CLAUDE.md. No codebase exploration, no package.json check, no architecture analysis. The output was decent:</p>
<ul>
<li>✅ Correct project type (JupyterLab LaTeX extension)</li>
<li>✅ Correct commands (<code>jlpm run build</code>, <code>jlpm run watch</code>)</li>
<li>✅ Mermaid architecture diagram</li>
<li>⚠️ Some hallucinated details (referenced <code>src/components/Toolbar.tsx</code> without verifying it exists)</li>
</ul>
<p>At 32K context, it stalled - started correctly (Glob → Read), but got stuck after reading files and never produced output. A different failure mode than devstral&rsquo;s 32K hallucination.</p>
<p><strong>Verdict:</strong> Works, but lazy. General-purpose models can complete agentic tasks but tend to &ldquo;wing it&rdquo; with minimal tool use, while coding specialists explore more thoroughly.</p>
<pre tabindex="0"><code>Sautéed for ~7m
</code></pre><h3 id="qwen330b"><code>qwen3:30b</code></h3>
<p>The general-purpose Qwen3 (not the coder variant). This was the worst performer - <strong>pure hallucination with zero exploration</strong>.</p>
<p>Tool sequence:</p>
<pre tabindex="0"><code>Write
</code></pre><p>Just one tool call. The thinking block is revealing - it explicitly acknowledged it couldn&rsquo;t see files but proceeded anyway:</p>
<blockquote>
<p><em>&ldquo;Since I can&rsquo;t actually see the files, I&rsquo;ll have to rely on the context provided.&rdquo;</em></p></blockquote>
<p>It inferred file structure from git status in the system prompt, then fabricated everything:</p>
<ul>
<li>❌ <code>python jupyterlab_latex/build.py</code> - wrong command (should be <code>jlpm run build</code>)</li>
<li>❌ <code>latex_cleanup.py</code> - fabricated filename</li>
<li>❌ <code>flake8</code> - assumed linter without checking</li>
</ul>
<p>At 128K context, it consumed <strong>31GB RAM</strong> (vs 18GB on disk) - pushing my 48GB system into swap. The memory pressure may have contributed to its laziness, but the thinking block shows it consciously chose to guess rather than explore.</p>
<p><strong>Key finding:</strong> The coder fine-tuning isn&rsquo;t just about coding knowledge - it teaches the model to actually use tools instead of guessing. qwen3-coder explored properly; qwen3 base hallucinated everything.</p>
<pre tabindex="0"><code>Sautéed for ~5m
</code></pre><h3 id="qwen25-coder32b"><code>qwen2.5-coder:32b</code></h3>
<p><strong>Failed.</strong> Despite having 128K context configured, through multiple attempts it kept reaching for the <code>Explore</code> subagent tool and then abruptly stopping without completing any work. Unlike qwen3-coder which recovered when Explore failed, qwen2.5-coder couldn&rsquo;t adapt. Same model family, different generation, completely different behavior when things go wrong.</p>
<h3 id="mistral-small3224b"><code>mistral-small3.2:24b</code></h3>
<p><strong>Failed - hallucinated tool parameters.</strong> This model understands it should use tools but invents wrong parameter schemas. From the session trace, it tried to call the Task tool with made-up parameters:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#75715e">// Attempt 1:
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>{<span style="color:#f92672">&#34;instruction&#34;</span>: <span style="color:#e6db74">&#34;...&#34;</span>, <span style="color:#f92672">&#34;max_depth&#34;</span>: <span style="color:#ae81ff">100</span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// Attempt 2:
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>{<span style="color:#f92672">&#34;subagent_name&#34;</span>: <span style="color:#e6db74">&#34;Explore&#34;</span>, <span style="color:#f92672">&#34;subagent_type&#34;</span>: <span style="color:#e6db74">&#34;Explore&#34;</span>, <span style="color:#f92672">&#34;subagent_prompt&#34;</span>: <span style="color:#e6db74">&#34;...&#34;</span>}
</span></span></code></pre></div><p>The actual required parameters are <code>description</code> and <code>prompt</code>. When it received clear error messages explaining this, it simply repeated &ldquo;I&rsquo;m going to use the Task tool&hellip;&rdquo; and stopped - unable to self-correct.</p>
<p>This is a different failure mode than hallucinating content (qwen3) or refusing (functiongemma). The model has learned <em>about</em> tools but not the actual invocation format. Worth noting: devstral-small-2 is also a Mistral model and works perfectly - the difference is devstral&rsquo;s agentic specialization.</p>
<p><strong>Memory:</strong> 37GB loaded at 128K context (vs 15GB on disk).</p>
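<p>For reference, the loaded-vs-on-disk figures quoted throughout this post are easy to check on your own machine. A minimal sketch, assuming a reasonably recent Ollama CLI (the output columns may differ slightly between versions):</p>
<pre tabindex="0"><code># On-disk model sizes, as stored in Ollama&#39;s local model store
ollama list

# Memory footprint and CPU/GPU split of whatever is currently loaded
# (run this while Claude Code is mid-task to catch the model in memory)
ollama ps
</code></pre>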
<h3 id="magistral24b"><code>magistral:24b</code></h3>
<p><strong>Failed - narrated tools instead of invoking them.</strong> This new Mistral reasoning model understood the task and knew which tools to use, but wrote out tool calls as text instead of actually executing them:</p>
<pre tabindex="0"><code>&#34;Let me use the Glob tool to find these patterns:

```bash
Glob pattern: **/README.md
Glob pattern: .github/readme*
...
```

Now that I have the relevant files, let&#39;s analyze...&#34;
</code></pre><p>Zero actual tool calls were made. The model described what it <em>would</em> do, assumed the tools had run, and proceeded to the next step. This suggests training on tool documentation without actual tool-use interactions.</p>
<p><strong>Memory:</strong> 23GB loaded at 128K context (vs 14GB on disk).</p>
<p><strong>Native context limitation:</strong> magistral&rsquo;s native context is only 39K. Even with Ollama allocating 128K, the model may not effectively use context beyond its training limit - which could explain why it never received the tool invocation format.</p>
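<p>One way to check a model&rsquo;s native context before allocating more in Ollama is to inspect its metadata. A quick check, assuming a recent Ollama CLI (older versions expose this differently):</p>
<pre tabindex="0"><code># Print model metadata, including the context length the model was built with
ollama show magistral:24b

# Look for the &#34;context length&#34; line in the Model section and compare it
# to whatever you are allocating in Ollama&#39;s settings
</code></pre>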
<h3 id="cogito32b"><code>cogito:32b</code></h3>
<p><strong>Failed - memory issues and context-limited stall.</strong> This hybrid reasoning model has different failure modes depending on context configuration:</p>
<p><strong>At 128K context:</strong> Loaded 64GB into memory (41% CPU / 59% GPU split). On my 48GB system, this caused severe memory thrashing - spiky memory pressure, swap usage, and zero tokens produced after 5+ minutes.</p>
<p><strong>At 64K context:</strong> Loaded 42GB (8% CPU / 92% GPU). Still tight but runnable. Same stalling behavior.</p>
<p><strong>At 32K context:</strong> Loaded 30GB (100% GPU). Actually started working! Made correct Glob and Read calls, explored the codebase properly:</p>
<pre tabindex="0"><code>Glob → Read README.md → &#34;Let me create a todo list...&#34;
</code></pre><p>But then it just&hellip; stopped. It said &ldquo;Let me start with writing the overview section first&rdquo; and ended without writing anything. Even nudging it with a &ldquo;continue&rdquo; prompt didn&rsquo;t help - completely stuck.</p>
<p>This is the same pattern as granite4:32b at 32K context: can explore but can&rsquo;t complete. <strong>32K context is insufficient for task completion</strong> - the model loses track of the goal mid-execution.</p>
<h3 id="cogito14b"><code>cogito:14b</code></h3>
<p><strong>Failed - multiple tool issues.</strong> Testing the smaller cogito variant to see if the 7-15B range had any surprises. It did, but not good ones.</p>
<p><strong>Memory:</strong> Even at 9GB on disk, it loaded 45GB at 128K context with 15% CPU offload. At 64K context it was more manageable.</p>
<p>Tool sequence shows multiple failure modes:</p>
<pre tabindex="0"><code>Read README.md ✅ → Read copilot-instructions.md ✅ (not found) →
WebSearch ❌ (hallucinated) → TodoWrite ❌ (wrong params, twice) →
Printed CLAUDE.md as text ⚠️
</code></pre><ol>
<li><strong>Hallucinated <code>WebSearch</code></strong> - tool doesn&rsquo;t exist in Claude Code, got empty results</li>
<li><strong>Wrong TodoWrite params</strong> - missing required <code>activeForm</code> field, tried twice without learning</li>
<li><strong>Never used Write tool</strong> - just printed the CLAUDE.md content as markdown text instead of writing to file</li>
</ol>
<p>The generated content was actually reasonable - correct commands, accurate architecture. But the model &ldquo;completed&rdquo; the task by printing output rather than writing the file. It understood the goal but couldn&rsquo;t execute properly.</p>
<p><strong>Time:</strong> ~7.7 minutes</p>
<p>The cogito family (both 32b and 14b) consistently fails with Claude Code&rsquo;s tool schemas - different sizes, different failure modes, same outcome.</p>
<h3 id="command-r35b"><code>command-r:35b</code></h3>
<p><strong>Failed - nested tool parameter schema.</strong> The last untested model in the viable 15-35B range. At 128K context it didn&rsquo;t fit on my GPU. At 64K and 32K it loaded but failed with the same tool schema issue.</p>
<p>From the trace, the model wrapped all tool parameters in a nested structure:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;tool_name&#34;</span>: <span style="color:#e6db74">&#34;Task&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;parameters&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;...&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;prompt&#34;</span>: <span style="color:#e6db74">&#34;...&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;subagent_type&#34;</span>: <span style="color:#e6db74">&#34;general-purpose&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The correct format is flat parameters at the top level. It made 4 tool calls (3 Task, 1 TodoWrite) - all failed with validation errors like &ldquo;required parameter <code>description</code> is missing&rdquo; because the nesting caused parameters to be undefined at the expected level.</p>
<p>Unlike mistral-small3.2 which invented wrong parameter <em>names</em>, command-r uses the correct parameter names but wraps them incorrectly. When it received validation errors, it didn&rsquo;t retry - just output a text-based &ldquo;Action Plan&rdquo; and stopped.</p>
<p>This suggests Cohere&rsquo;s tool-calling format differs from the Anthropic API schema. The model was trained on a different tool invocation structure.</p>
<p><strong>Context comparison:</strong></p>
<ul>
<li><strong>32K</strong>: 4 tool calls, all failed, gave up quickly (~7 min)</li>
<li><strong>64K</strong>: 29 tool calls, all failed, kept retrying same broken schema (~9.5 min)</li>
</ul>
<p>More context didn&rsquo;t help - it just gave the model more runway to keep failing the same way. It never learned from the error messages.</p>
<h2 id="results">Results</h2>
<h3 id="-worked">✅ Worked</h3>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Quality</th>
          <th>Time</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>devstral-small-2</strong> ⭐</td>
          <td>Excellent</td>
          <td>17 min</td>
          <td>No hallucinations, no interventions</td>
      </tr>
      <tr>
          <td>qwen3-coder</td>
          <td>Good</td>
          <td>24 min</td>
          <td>Recovered after Explore failed</td>
      </tr>
      <tr>
          <td>granite4:32b</td>
          <td>Good</td>
          <td>~7 min</td>
          <td>Fast but lazy, minor hallucinations*</td>
      </tr>
  </tbody>
</table>
<h3 id="-completed-with-issues">⚠️ Completed With Issues</h3>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Quality</th>
          <th>Time</th>
          <th>Issue</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>gpt-oss:20b</td>
          <td>Low</td>
          <td>~3 min</td>
          <td>Needed intervention</td>
      </tr>
      <tr>
          <td>nemotron-3-nano</td>
          <td>Mixed</td>
          <td>-</td>
          <td>Hallucinated on first attempt</td>
      </tr>
      <tr>
          <td>qwen3:30b</td>
          <td>Poor</td>
          <td>~5 min</td>
          <td>Zero tool calls, fabricated everything</td>
      </tr>
  </tbody>
</table>
<h3 id="-failed">❌ Failed</h3>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Time</th>
          <th>Failure Mode</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>qwen2.5-coder:32b</td>
          <td>-</td>
          <td>Stuck on Explore subagent</td>
      </tr>
      <tr>
          <td>mistral-small3.2:24b</td>
          <td>-</td>
          <td>Wrong tool parameter schema</td>
      </tr>
      <tr>
          <td>magistral:24b</td>
          <td>-</td>
          <td>Narrated tools instead of invoking</td>
      </tr>
      <tr>
          <td>cogito:32b</td>
          <td>-</td>
          <td>Memory thrashing, context stall</td>
      </tr>
      <tr>
          <td>cogito:14b</td>
          <td>~8 min</td>
          <td>Hallucinated WebSearch tool</td>
      </tr>
      <tr>
          <td>command-r:35b</td>
          <td>7-10 min</td>
          <td>Nested tool parameters</td>
      </tr>
      <tr>
          <td>deepseek-r1:32b</td>
          <td>-</td>
          <td>No tool support in Ollama</td>
      </tr>
      <tr>
          <td>deepseek-coder-v2:16b</td>
          <td>-</td>
          <td>No tool support in Ollama</td>
      </tr>
      <tr>
          <td>functiongemma:270m</td>
          <td>-</td>
          <td>Refuses everything</td>
      </tr>
      <tr>
          <td>granite4:3b</td>
          <td>-</td>
          <td>Hallucinates without tools</td>
      </tr>
      <tr>
          <td>phi4-mini:3.8b</td>
          <td>-</td>
          <td>Invents fake tool names</td>
      </tr>
      <tr>
          <td>rnj-1:8b</td>
          <td>-</td>
          <td>Silent, zero output</td>
      </tr>
  </tbody>
</table>
<p>*granite4:32b referenced files it never verified existed. It &ldquo;works&rdquo; in the sense that it completes the task and produces usable output, but you&rsquo;d want to review it before trusting it. devstral and qwen3-coder are trustworthy out of the box.</p>
<p><strong>Winner: devstral-small-2</strong> - best quality, smallest footprint, zero interventions.</p>
<h3 id="model-outputs">Model Outputs</h3>
<p>The actual CLAUDE.md files generated by each model (devstral-small-2, qwen3-coder, granite4-32b, nemotron-30b) can be compared side by side in the interactive viewer on <a href="https://taletskiy.com/blogs/ollama-claude-code/">the full post</a>.</p>










<h3 id="failure-modes">Failure Modes</h3>
<p>Testing revealed distinct ways models fail at agentic tasks:</p>
<table>
  <thead>
      <tr>
          <th>Failure Mode</th>
          <th>Example</th>
          <th>Probable Cause</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Refuses</strong></td>
          <td>functiongemma</td>
          <td>Too conservative, confused by system prompts</td>
      </tr>
      <tr>
          <td><strong>Hallucinates content</strong></td>
          <td>qwen3:30b, granite4:3b</td>
          <td>Skips tools, fabricates output</td>
      </tr>
      <tr>
          <td><strong>Hallucinates tools</strong></td>
          <td>phi4-mini</td>
          <td>Invents non-existent tool names</td>
      </tr>
      <tr>
          <td><strong>Hallucinates params</strong></td>
          <td>mistral-small3.2</td>
          <td>Knows tools exist, wrong schema</td>
      </tr>
      <tr>
          <td><strong>Narrates tools</strong></td>
          <td>magistral</td>
          <td>Describes tools in text, never invokes</td>
      </tr>
      <tr>
          <td><strong>Stuck on subagent</strong></td>
          <td>qwen2.5-coder</td>
          <td>Can&rsquo;t adapt when Explore fails</td>
      </tr>
      <tr>
          <td><strong>Context stall</strong></td>
          <td>cogito:32b, granite4@32K</td>
          <td>Explores correctly, stops mid-task</td>
      </tr>
      <tr>
          <td><strong>Nested params</strong></td>
          <td>command-r</td>
          <td>Wraps params in {&ldquo;tool_name&rdquo;:X,&ldquo;parameters&rdquo;:{&hellip;}}</td>
      </tr>
      <tr>
          <td><strong>Silent</strong></td>
          <td>rnj-1:8b</td>
          <td>Zero output, can&rsquo;t process system prompts</td>
      </tr>
  </tbody>
</table>
<p>The more sophisticated failures (wrong params, narration, nested params) suggest models trained on different tool-calling formats or documentation rather than actual Anthropic API interactions. Native context window also matters - magistral (39K native) failed even with 128K allocated.</p>
<h3 id="how-local-models-compare-to-cloud">How Local Models Compare to Cloud</h3>
<p>SWE-bench Verified is the standard benchmark for agentic coding - 500 real GitHub issues that models must resolve. Here&rsquo;s how local models compare to cloud:</p>
<p><strong>Frontier Cloud Models (Proprietary)</strong></p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>SWE-bench</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Gemini 3 Flash</td>
          <td>75-76%</td>
      </tr>
      <tr>
          <td>Claude Opus 4.5</td>
          <td>74-81%</td>
      </tr>
      <tr>
          <td>GPT-5.2</td>
          <td>72-75%</td>
      </tr>
      <tr>
          <td>Claude Sonnet 4.5</td>
          <td>70.6%</td>
      </tr>
      <tr>
          <td>Claude Haiku 4.5</td>
          <td>68.8%</td>
      </tr>
  </tbody>
</table>
<p><strong>Large Open Weights (Won&rsquo;t fit 48GB)</strong></p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>SWE-bench</th>
          <th>Size</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Devstral 2</td>
          <td>72.2%</td>
          <td>123B</td>
      </tr>
      <tr>
          <td>Qwen3-Coder-480B</td>
          <td>67%</td>
          <td>480B</td>
      </tr>
      <tr>
          <td>DeepSeek-V3.1</td>
          <td>66%</td>
          <td>671B</td>
      </tr>
  </tbody>
</table>
<p><strong>Local Models (Fits 48GB)</strong></p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>SWE-bench</th>
          <th>Result</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>devstral-small-2</strong></td>
          <td><strong>68.0%</strong></td>
          <td>⭐ Winner</td>
      </tr>
      <tr>
          <td>qwen3-coder:30b</td>
          <td>51.6%</td>
          <td>✅ Good</td>
      </tr>
      <tr>
          <td>deepseek-r1:32b</td>
          <td>41.4%</td>
          <td>❌ No tools</td>
      </tr>
      <tr>
          <td>qwen2.5-coder:32b</td>
          <td>9.0%</td>
          <td>❌ Stuck</td>
      </tr>
  </tbody>
</table>
<p><strong>The gap is surprisingly small.</strong> devstral-small-2 at 68% matches Claude Haiku 4.5 and trails Opus by only 6-8 points. A 24B model running locally keeps up with 100B+ models - turns out agentic training matters more than size.</p>
<p>SWE-bench score also predicts Claude Code success: models without published scores aren&rsquo;t coding-focused and failed my tests.</p>
<h2 id="conclusions">Conclusions</h2>
<p>Local models can do real agentic work now. devstral-small-2 completed the task reliably, with no hand-holding. It&rsquo;s slower than cloud (17 min vs 2 min), but it runs on my laptop completely offline.</p>
<h3 id="key-takeaways">Key Takeaways</h3>
<ol>
<li><strong>devstral-small-2 wins</strong> - best results, smallest footprint, built for this</li>
<li><strong>The gap is smaller than I expected</strong> - 68% SWE-bench matches Haiku, trails Opus by 8 points</li>
<li><strong>Context window matters</strong> - Ollama defaults to 4K; bump it to 64K or watch models hallucinate</li>
<li><strong>SWE-bench predicts success</strong> - no published score usually means it won&rsquo;t work</li>
<li><strong>Speed hurts</strong> - 17-24 minutes vs 2 minutes on cloud</li>
<li><strong>Check tool support first</strong> - not all models work with Ollama&rsquo;s Anthropic API</li>
</ol>
<h3 id="what-works">What Works</h3>
<p>devstral-small-2 and qwen3-coder both work reliably. The tool calling infrastructure is solid when the model supports it. Ollama 0.14.0 makes setup easy - no more LiteLLM translation layer.</p>
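<p>Before pointing Claude Code at a new model, it&rsquo;s worth confirming that the model actually advertises tool support. A hedged sketch - recent Ollama builds list a capabilities section in <code>ollama show</code>, though the exact output format varies by version:</p>
<pre tabindex="0"><code># Models without tool support fail silently or never call a single tool
ollama show devstral-small-2 | grep -iA5 capabilities

# Expect &#34;tools&#34; to appear in the list; if it doesn&#39;t, the model
# probably can&#39;t drive Claude Code&#39;s tool calls at all
</code></pre>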
<h3 id="what-doesnt-work-yet">What Doesn&rsquo;t Work (Yet)</h3>
<p>Most models can&rsquo;t finish multi-step agentic tasks without help. Context overflow causes hallucinations (fabricated URLs, wrong repo names). And 8-12x slower than cloud is hard to ignore.</p>
<h3 id="critical-set-context-to-64k">Critical: Set Context to 64K+</h3>
<p>Ollama defaults to 4K context regardless of what model cards advertise. Claude Code&rsquo;s system prompts overflow this, causing silent failures or hallucinations.</p>
<p><img src="https://taletskiy.com/img/ollama-context-setting.png" alt="Ollama settings showing context length slider"></p>
<table>
  <thead>
      <tr>
          <th>Context</th>
          <th>Result</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>4-16K</td>
          <td>❌ Zero tool calls</td>
      </tr>
      <tr>
          <td>32K</td>
          <td>⚠️ Starts fine, then hallucinates</td>
      </tr>
      <tr>
          <td>64K+</td>
          <td>✅ Works</td>
      </tr>
  </tbody>
</table>
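<p>If you prefer the command line over the GUI slider, the same bump can be baked into a model variant with a Modelfile. A minimal sketch - the variant name and the 64K value are just examples, and the <code>OLLAMA_CONTEXT_LENGTH</code> server default only exists in newer Ollama releases:</p>
<pre tabindex="0"><code># Create a 64K-context variant of devstral without touching the GUI
printf &#39;FROM devstral-small-2\nPARAMETER num_ctx 65536\n&#39; &gt; Modelfile.devstral-64k
ollama create devstral-small-2-64k -f Modelfile.devstral-64k

# Or set a server-wide default context length (newer Ollama releases)
# OLLAMA_CONTEXT_LENGTH=65536 ollama serve
</code></pre>
<p>Point the <code>claude --model</code> flag in the alias below at the new variant name if you go this route.</p>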
<h3 id="quick-start">Quick Start</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># 1. Install Ollama 0.14.0+ and pull devstral</span>
</span></span><span style="display:flex;"><span>ollama pull devstral-small-2
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># 2. Set context to 64K in Ollama settings (GUI slider)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># 3. Add alias to ~/.zshrc</span>
</span></span><span style="display:flex;"><span>alias claude-local<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_API_KEY=ollama CLAUDE_CODE_USE_BEDROCK=0 claude --model devstral-small-2&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># 4. Run it</span>
</span></span><span style="display:flex;"><span>source ~/.zshrc
</span></span><span style="display:flex;"><span>claude-local
</span></span></code></pre></div><p><a href="https://taletskiy.com/blogs/ollama-claude-code/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>No Code, No Problem</title>
      <link>https://ktaletsk.quarto.pub/no-code-no-problem</link>
      <pubDate>Thu, 11 Dec 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://ktaletsk.quarto.pub/no-code-no-problem</guid>
      <description><![CDATA[<p>PyData Boston 2025 lightning talk on non-code contributions to Open Source</p><p><a href="https://ktaletsk.quarto.pub/no-code-no-problem">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>PyData Boston 2025 lightning talk on non-code contributions to Open Source</p><p><a href="https://ktaletsk.quarto.pub/no-code-no-problem">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>The JupyterLab Extension Ecosystem at PyData Boston 2025</title>
      <link>https://taletskiy.com/blogs/pydata-boston-25/</link>
      <pubDate>Wed, 10 Dec 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/pydata-boston-25/</guid>
      <description><![CDATA[<p> I had an opportunity to present at PyData Boston 2025 about analyzing the JupyterLab extension ecosystem.
The talk analyzes the current state of the JupyterLab extension landscape in 2025 using public PyPI (via BigQuery) and GitHub data to quantify growth, momentum, and health by examining metrics such as monthly downloads by category, release recency, the relationship between stars and downloads, and the emergence of AI-focused extensions.
</p><p><a href="https://taletskiy.com/blogs/pydata-boston-25/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p> I had an opportunity to present at PyData Boston 2025 about analyzing the JupyterLab extension ecosystem.
The talk analyzes the current state of the JupyterLab extension landscape in 2025 using public PyPI (via BigQuery) and GitHub data to quantify growth, momentum, and health by examining metrics such as monthly downloads by category, release recency, the relationship between stars and downloads, and the emergence of AI-focused extensions.
</p><div class="video-container">
    
</div>
<p>I had an opportunity to present at PyData Boston 2025 about analyzing the JupyterLab extension ecosystem.</p>
<p>The talk analyzes the current state of the JupyterLab extension landscape in 2025 using public PyPI (via BigQuery) and GitHub data to quantify growth, momentum, and health by examining metrics such as monthly downloads by category, release recency, the relationship between stars and downloads, and the emergence of AI-focused extensions.</p>
<p><a href="https://taletskiy.com/blogs/pydata-boston-25/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Jupyter Open Studio Day SF 2025</title>
      <link>https://taletskiy.com/blogs/jupytrer-open-studio-day/</link>
      <pubDate>Mon, 10 Nov 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/jupytrer-open-studio-day/</guid>
<description><![CDATA[<p>The fun did not stop after JupyterCon. The Monday after the conference, Bloomberg invited everyone to their office to collaborate on Jupyter projects. There were many friendly faces who decided to make the trek from Southern California to the Bay while they were here for JupyterCon.
</p><p><a href="https://taletskiy.com/blogs/jupytrer-open-studio-day/">Read the full post on taletskiy.com</a></p>]]></description>
<content:encoded><![CDATA[<p>The fun did not stop after JupyterCon. The Monday after the conference, Bloomberg invited everyone to their office to collaborate on Jupyter projects. There were many friendly faces who decided to make the trek from Southern California to the Bay while they were here for JupyterCon.
</p><p>The fun did not stop after JupyterCon. The Monday after the conference, Bloomberg <a href="https://go.bloomberg.com/attend/invite/jupyter-open-studio-day-november-10-2025/">invited</a> everyone to their office to collaborate on Jupyter projects. There were many friendly faces who decided to make the trek from Southern California to the Bay while they were here for JupyterCon.</p>
<p><img src="https://taletskiy.com/img/jupytercon/jupyter-open-studio-day.jpeg" alt="Jupyter Open Studio Day venue"></p>
<p>Building on the momentum from the <a href="https://taletskiy.com/blogs/jupytrer-open-studio-day/#sprint-day">Sprint Day</a>, I continued to explore those topics during the event.</p>
<ul>
<li>I exported all GitHub PRs and issues related to the filebrowser package (<a href="https://github.com/jupyterlab/jupyterlab/issues?q=label%3Apkg%3Afilebrowser"><code>label:pkg:filebrowser</code></a>) and ran an analysis with Claude to find which of the 800+ items might be relevant to upload/copy/move UX. As a good first issue to solve (I&rsquo;d never contributed to the JupyterLab core!), I prototyped a button to cancel file uploads in JupyterLab. Below is my feature in action, and the full presentation is available <a href="https://hackmd.io/@eqt0f1ICTrun-afIFzMReA/H1A0n1ggZg">here</a>.</li>
</ul>
<div style="position: relative; padding-bottom: 62.42774566473989%; height: 0;"></div>
<ul>
<li>I had a chat with participants about how my Jupyter Marketplace can be useful for developers and what additional signals to include. I appreciated a suggestion from Ely @ Bloomberg to include a contribution activity indicator (number of commits/issues/PRs over some period of time).</li>
<li>I had an opportunity to help Hannah Chen @ Bloomberg try out and set up my <a href="https://github.com/orbrx/auto-dashboards">Auto Dashboards</a> extension for generating Streamlit dashboards from Jupyter notebooks with live preview inside JupyterLab. She is a <code>uv</code> user, so I learned how to do a development install of JupyterLab extensions using <code>uv</code> and updated my instructions.</li>
</ul>
<p><img src="https://taletskiy.com/img/jupytercon/bloomberg-office-view.jpeg" alt="View from Bloomberg office during Jupyter Open Studio Day"></p>
<p>Huge thanks to Ely and Bloomberg for the invite and organizing the event for us!</p>
<p><img src="https://taletskiy.com/img/jupytercon/sf-ferry-skyline.jpeg" alt="San Francisco Ferry Building and a skyline"></p><p><a href="https://taletskiy.com/blogs/jupytrer-open-studio-day/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>JupyterCon 2025 Reflections</title>
      <link>https://taletskiy.com/blogs/jupytercon-25/</link>
      <pubDate>Thu, 06 Nov 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/jupytercon-25/</guid>
      <description><![CDATA[<p>Another JupyterCon is in the books!
I have been a part of this community for the last 7 years, starting as a user, then building on top of Jupyter OSS projects and API and finally starting to contribute back to the core projects. I am really grateful for all the people I met along the way 🙏 This post is a reflection on my experience.
</p><p><a href="https://taletskiy.com/blogs/jupytercon-25/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>Another JupyterCon is in the books!
I have been a part of this community for the last 7 years, starting as a user, then building on top of Jupyter OSS projects and API and finally starting to contribute back to the core projects. I am really grateful for all the people I met along the way 🙏 This post is a reflection on my experience.
</p><p>Another JupyterCon is in the books!</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p>I have been a part of this community for the last 7 years, starting as a user, then building on top of Jupyter OSS projects and API and finally starting to contribute back to the core projects. I am really grateful for all the people I met along the way 🙏
This post is a reflection on my experience.</p>
<p><img src="https://taletskiy.com/img/jupytercon/me-at-jupytercon2025.jpeg" alt="Me at JupyterCon 2025"></p>
<h2 id="extension-development-tutorial">Extension Development Tutorial</h2>
<p>One of the main ways I have always participated in the community is through JupyterLab extensions. This is what makes JupyterLab the next step after Notebook &ndash; an extensible architecture starting in the core itself (JupyterLab is built as a collection of extensions) and extending outward to allow exploring new ideas (collaboration, AI) and enhancing UX millions of users can rely on (git, LaTeX, ipywidgets). As an extension author, contributor and maintainer, I&rsquo;ve seen an explosion of AI-related ideas in the Jupyter space. To better highlight the changes happening in the ecosystem, I built a community extension marketplace, <a href="https://labextensions.dev">labextensions.dev</a>, which surfaces the most important signals (categories, downloads, GitHub stars) to both users and developers.</p>
<p>So, naturally, when the JupyterCon CFP opened, I submitted a workshop proposal combining the things I am most interested in: mentoring a new generation of contributors and exploring AI coding tools in the ways they can be helpful (or not). It turns out there were 3 more very similar workshops, so we combined forces with Rosio Reyes, Jason Grout and Matt Fisher and put together a full-day tutorial! It was the first workshop I had ever organized, and I dove in head first. It was no small feat, but our amazing team made it possible. I would also like to thank Lahari Chowtoori for providing AWS Bedrock credits for the participants so they could use Claude Code, and Zach Sailer for agreeing to do a demo of Jupyter AI in action.</p>
<p>But when conference day rolled around, we were ready with a repo and a website complete with all the steps. It is fully open source (MIT license) and will be available to the community for the time being. You can find the tutorial materials here: <a href="https://jupytercon.github.io/jupytercon2025-developingextensions/">jupytercon.github.io/jupytercon2025-developingextensions</a>.</p>
<p>And I&rsquo;m also happy to report that the entire session was recorded and uploaded to YouTube</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p><img src="https://taletskiy.com/img/jupytercon/tutorial-room.jpeg" alt="JupyterCon Tutorial Room">
<em>Anatomy of the extension</em></p>
<p>The day went in a flash, but when it was all said and done we were able to see the impact clearly:</p>
<ol>
<li>Participants were able to follow our instructions: we&rsquo;ve seen <a href="https://github.com/topics/jupytercon2025">30 repos</a> created during the tutorial</li>
<li>Participants enjoyed the experience and found it empowering, as they shared in our DMs and in public <a href="https://medium.com/womenintechnology/reflections-from-jupytercon-2025-8ace9e6b27ab">posts</a></li>
<li>Some participants (especially on Windows) struggled with the environment installation steps. Extensions use a somewhat complex stack (Python, Node.js), and tools like <code>git</code> or <code>gh-cli</code> were hard to get working. I would strongly consider creating a cloud-hosted backup option (e.g. GitHub Codespaces) to give participants a ready-to-go environment if their local one is impossible to set up.</li>
<li>Despite the difficulties, at least one of the attendees (Lingtao Xie @ Esri) has since created a brand new JupyterLab extension, <a href="https://labextensions.dev/extensions/jupyterlab-todo-list">jupyterlab-todo-list</a>! After the conference she mentioned that she enjoyed the workshop and invited feedback on the extension as she keeps learning React and TypeScript — exactly the kind of follow‑through and openness that makes this community so fun to work with.</li>
</ol>
<p><img src="https://taletskiy.com/img/jupytercon/todo-extension-screenshot.png" alt="Screenshot of jupyterlab-todo-list extension in action"></p>
<ol start="5">
<li>We might also have made the wrong assumptions about the number of participants and their interests. This is because we had very limited data on the workshop participants from the conference organizers. Turns out, pre-registration for a particular workshop was not required, only for the workshop day. Additionally, badges were not scanned at the entrance to the room, so we have limited ways of knowing who attended the session. I hope this will be addressed by the Jupyter/Linux Foundation when planning the next JupyterCon!</li>
</ol>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1489.jpeg" alt="JupyterCon 2025 Workshop">
<em>Wrap up of the tutorial</em></p>
<p>Overall, I had a great time teaching people and troubleshooting with them as a TA. Most importantly, we laid a strong foundation for the next tutorials as we created a strong written guide alongside the presentation.</p>
<h2 id="jupyterhub-satellites">JupyterHub satellites</h2>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p>This is a talk I wanted to present for quite some time.
I had a chance to do a brief intro to Notebooks Hub&rsquo;s approach to running non-notebook applications at the <a href="https://taletskiy.com/blogs/jupytercon-23/">previous JupyterCon</a>,
but the opportunity to present this in full detail came only after I departed the team.
I am grateful to Axle for the opportunity to still present this material, especially because the topic found such strong interest in the community.</p>
<p>Initially, I was planning to organize a Birds-of-a-Feather (BoF) session with JupyterHub deployers,
but it ended up being a talk. As it turned out, Yuvi Panda @ 2i2c was giving his own perspective (<a href="https://jupytercon2025.sched.com/event/28H4l/not-just-for-notebooks-jupyterhub-in-2025-yuvi-p-2i2c">Not Just Notebooks</a>) on running applications, and his talk really echoed mine.</p>
<p>&ldquo;JupyterHub satellites&rdquo;, as I call them, are really just applications other than Jupyter Notebook/Lab (RStudio, VSCode, Streamlit) orchestrated by JupyterHub. Even though the community has had tools and recipes for a long time now, our approach was a little different, as we relied on a standalone proxy (jhsingle_native_proxy) and standardized Docker containers. Both my talk and Yuvi&rsquo;s highlighted the need to update documentation and centralize the existing recipes (scripts, Docker images) to unlock even more satellites (e.g. <a href="https://marimo.io/">Marimo</a>, <a href="https://www.dyad.sh/">Dyad</a>, <a href="https://github.com/posit-dev/positron">Positron</a>, etc.) through community efforts.</p>
<p>Recent updates to Jupyter Server Proxy, which added a standalone mode, finally allowed for both standalone and integrated experiences in JupyterHub with a unified codebase under an official Jupyter repo. Since jhsingle_native_proxy was not actively maintained, this provides an off-ramp for existing users to join the community effort.</p>
<p>After my talk, I heard from JupyterHub users who are interested in collaborating on open-sourcing the recipes for wrapping dashboards in Docker containers. I hope to meet them at the <a href="https://jupyter.zulipchat.com/#narrow/channel/469744-jupyterhub/topic/Hub.20Dash.3A.202-3.20December.202025/near/554490824">Hub Dash</a> on December 2-3.</p>
<p>To wrap up, I would like to thank Yuvi Panda and Chris Holdgraf @ 2i2c for productive conversations on the topic before and after this talk!</p>
<h2 id="favorite-talks">Favorite talks</h2>
<p>Even though I spent a lot of time at the Anaconda booth and in hallway conversations, I still managed to sneak out for a few talks that really stuck with me.</p>
<p><div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<em>Incredibly inspiring</em></p>
<p><div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<em>Very useful practical advice for breaking into OSS contributions</em></p>
<p><div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<em>Do not compare F1 car and SUV; Jupyter is not an IDE</em></p>
<h2 id="first-conference-as-anacondiac">First conference as Anacondiac</h2>
<p>This was my first conference as an OSS Jupyter developer working at Anaconda.</p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1427.jpeg" alt="Me at JupyterCon 2025, wearing Anaconda jacket and showing my badge"></p>
<p>We had a strong showing this year with talks across multiple tracks, a sponsored talk at the Demo Theater, and a delightful booth where I got a chance to meet so many of our users!</p>
<h3 id="demo-usage-patterns-in-the-jupyter-ecosystem-jack-evans-anaconda">Demo: Usage Patterns in the Jupyter Ecosystem (Jack Evans, Anaconda)</h3>
<p>One of the most thought‑provoking sessions for me was Jack’s demo based on internal telemetry about how people actually use Jupyter. One stat that really stuck with me: based on Anaconda’s data, <strong>around 79% of users still prefer the classic Notebook interface over JupyterLab</strong>, which is a humbling reminder to keep investing in Notebook UX even as we push the ecosystem forward.</p>
<p>You can dig into the full deck here:</p>

<h3 id="lightning-talk-whats-new-in-jupyter-frontends-jeremy-tuloup-quantstack--rosio-reyes-anaconda">Lightning Talk: What&rsquo;s New in Jupyter Frontends (Jeremy Tuloup, QuantStack &amp; Rosio Reyes, Anaconda)</h3>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p>This was a fast but very dense overview of what’s landing across JupyterLab and the classic Notebook experience. As someone who works on extensions, it was great to see how improvements in the Lab frontend keep flowing back into the “plain notebook” UX that so many users still rely on.</p>
<h3 id="the-lifecycle-of-a-jupyter-environment-from-exploration-to-productiongrade-pipelines-dawn-wages-anaconda">The Lifecycle of a Jupyter Environment: From Exploration To Production‑Grade Pipelines (Dawn Wages, Anaconda)</h3>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p>Dawn’s talk did a great job walking through the journey from an exploratory notebook to a maintainable ETL pipeline, with practical tooling like Papermill, nbconvert and PyScript/Voila/Panel in the mix. I especially appreciated the emphasis on planning for production from the start instead of treating “pipeline-izing” as an afterthought.</p>
<h3 id="runtime-agents-unleashing-event-sourced-collaboration-for-jupyter-kyle-kelley-anaconda">Runtime Agents: Unleashing Event Sourced Collaboration for Jupyter (Kyle Kelley, Anaconda)</h3>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p>Kyle made a strong case for “moving notebooks to the server side” so that state lives independently of the browser tab. I loved seeing concrete demos of long‑running, resilient sessions and collaborative editing that felt much closer to how people actually work with notebooks day‑to‑day.</p>
<h3 id="conversations-at-the-booth">Conversations at the booth</h3>
<p><img src="https://taletskiy.com/img/jupytercon/at-anaconda-booth.jpeg" alt="Anaconda crew at the booth">
<em>Anaconda crew at the booth. Left to right: myself, Peter Wang, Dan Yeaw, Daina Bouquin, Rosio Reyes</em></p>
<p>Since the conference was well attended by local students (kudos to JupyterCon!), the topic of job-search advice came up a lot. As someone who has mentored and interviewed engineers throughout my career, I highlighted the importance of pursuing personal projects and open source contributions. Being able to see your ideas, contributions and code style in the open is such a powerful signal to hiring managers. I shared my favorite personal anecdote about how a very well crafted <a href="https://colab.research.google.com/github/ktaletsk/CPF/blob/master/1D_Example/CPF_1D_toy.ipynb">Colab notebook</a> helped me get my first job after grad school. Below are some reflections from the students.</p>
<ul>
<li><a href="https://www.linkedin.com/in/fariha-sheikh-usc/">Fariha Sheikh</a>&rsquo;s reflections 👉 <a href="https://www.linkedin.com/feed/update/urn:li:share:7392660528554225664">View this post on LinkedIn</a></li>
</ul>

<ul>
<li><a href="https://www.linkedin.com/in/abaghyangor/">Gor Abaghyan</a>&rsquo;s experience. At the booth and later at Sprints, we talked through his <a href="https://pokeagent.github.io/">PokéAgent Challenge</a> setup and I suggested Docker as a way to both debug and pick up a tool that would pay off later; after the conference he messaged that he’d built the <code>mgba</code> bindings from source, had the ReAct agent running, and completed about 30% of the run, describing it as a really great experience.</li>
</ul>
<h3 id="exploring-the-city-and-connecting-to-fellow-anacondiacs">Exploring the city and connecting to fellow Anacondiacs</h3>
<p>As we wrapped each day, the Anaconda team would pack into a local restaurant and invite fellow Jovians for dinner and conversation about Jupyter, Python and Open Source.</p>
<p><img src="https://taletskiy.com/img/jupytercon/el-agave-dinner.jpeg" alt="Dinner at El Agave with Jupyter community">
<em>El Agave dinner: A memorable night with great food and amazing people from the Jupyter community.</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/anaconda-x-deepnote-dinner.jpg" alt="Anaconda × Deepnote Dinner at JupyterCon 2025">
<em>Dinner with the Deepnote team. Photo credit: Dawn Wages</em></p>
<h2 id="venue-and-city">Venue and city</h2>
<p>Set in beautiful San Diego, this was a great place to be at the beginning of November. Paradise Point Resort did a great job creating such a welcoming experience. Continuing the success of JupyterCon 2023, this year&rsquo;s catering was perfect. Not only did they provide breakfast and lunch, but the variety of snacks and desserts was like no other conference!</p>
<p><img src="https://taletskiy.com/img/jupytercon/jupytercon-lunch.jpg" alt="Conference lunch at JupyterCon 2025">
<em>Photo credit: Dawn Wages</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1419.jpeg" alt="Paradise Point Resort beach at night">
<em>Paradise Point Resort beach at night – the end of a perfect JupyterCon day</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1402.jpeg" alt="Paradise Point Resort at sunset">
<em>Lights and decorations at Paradise Point Resort</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1426.jpeg" alt="Paradise Point Resort grounds with conference banners">
<em>Flowers at Paradise Point Resort</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/san-diego-gaslamp.jpeg" alt="San Diego Gaslamp District">
<em>San Diego Gaslamp District</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1530.jpeg" alt="Ghirardelli store in Gaslamp district">
<em>Ghirardelli store in Gaslamp district</em></p>
<p>After wrapping up the conference, I spent some quality time exploring San Diego with my family. The San Diego Zoo was a favorite, with its lush landscapes, panda exhibit, and countless other animal encounters.</p>
<p><img src="https://taletskiy.com/img/jupytercon/san-diego-zoo.jpeg" alt="San Diego Zoo Adventure">
<em>San Diego Zoo was a big highlight for the family!</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/san-diego-zoo-panda.jpeg" alt="San Diego Zoo Panda Exhibit">
<em>The panda exhibit was my daughter&rsquo;s favorite</em></p>
<p>We also managed to visit Legoland, making the trip a perfect mix of work and play!</p>
<p><img src="https://taletskiy.com/img/jupytercon/legoland.jpeg" alt="Legoland Adventure">
<em>We topped off the trip with a visit to Legoland!</em></p>
<h2 id="sprint-day">Sprint Day</h2>
<p>The energy from the conference carried right into Sprint Day. Kirstie Whitaker and Zach Sailer kicked off the Sprints by opening the floor to anyone who had an idea of what to work on together. One by one, participants lined up to explain <a href="https://jupyter.zulipchat.com/#narrow/channel/531269-jupytercon/topic/.E2.9C.94.20sprints/with/558372277">their ideas</a> in 30 seconds. The diversity of topics was remarkable, spanning from infrastructure challenges like Kubernetes directory management and JupyterHub cost optimization, to emerging AI integrations including browser-based AI and Jupyter AI coordination. Others focused on improving the documentation and publishing ecosystem with MyST and JupyterBook enhancements, while several participants tackled developer experience improvements from package audits to Git workflows. What struck me was the balance between technical infrastructure work and efforts to make Jupyter more accessible – WYSIWYG editors, better documentation, and collecting user stories to understand pain points. This breadth really showcased how the Jupyter ecosystem continues to evolve in multiple directions simultaneously, driven by the diverse needs of its community.</p>
<h3 id="tackling-file-browser-ux-challenges">Tackling File Browser UX Challenges</h3>
<p>During the sprints, Andrew Thornton from Maxar raised an issue that resonated with many in the room: accidental drag-and-drop operations in JupyterLab can trigger large file copies with no visual feedback, no progress indicators, and no way to cancel them. His users were experiencing disk space issues and frozen servers from these unintentional operations.</p>
<p>This sparked a productive discussion where Taran Rorem shared that he had already solved this issue with a custom extension that wraps the file browser&rsquo;s rename and move methods. He generously shared his approach using the <code>IFileBrowserFactory</code> interface, demonstrating how a relatively simple plugin could intercept these operations and add the missing feedback layer. We created a <a href="https://jupyter.zulipchat.com/#narrow/channel/531269-jupytercon/topic/Lab.20file.20browser.20drag-drop.20copy.20announcement">Zulip topic</a> to continue tracking this issue and coordinate solutions.</p>
<p>This conversation opened up a broader examination of file operations UX in JupyterLab. Through discussions with various users and my own analysis of issues and PRs, I identified several critical gaps:</p>
<ul>
<li><strong>No cancellation for uploads</strong>: Users accidentally uploading large files have no choice but to wait or kill the server</li>
<li><strong>Missing progress indicators</strong>: File copies and moves happen silently, leaving users uncertain if operations are running or complete</li>
<li><strong>No operation queue visibility</strong>: When handling multiple file operations, users can only see one progress bar at a time</li>
<li><strong>Risk of data corruption</strong>: Users may shut down servers thinking operations are complete when they&rsquo;re still in progress</li>
</ul>
<p>These are some rough edges causing daily frustrations for users working with large datasets, remote servers, or production workflows. I compiled these user stories and began prototyping solutions, which I later presented at the Jupyter Open Studio Day at Bloomberg (more on that in a future post).</p>
<h3 id="jupyter-ai-v3-and-personas">Jupyter AI v3 and Personas</h3>
<p>As we started working in groups, the Jupyter AI team took the stage to run through the setup and development of Personas for the upcoming Jupyter AI v3. This topic was so popular that it captivated the room that morning, with multiple people (including myself) circling the stage with their chairs, laptops going full speed.</p>
<p>I helped to start a <a href="https://jupyter.zulipchat.com/#narrow/channel/531269-jupytercon/topic/.E2.9C.94.20jupyter-ai-sprint/with/558372308">Zulip thread</a> documenting the setup steps. While still in its early days, the Persona approach is a very powerful concept, deliberately steering away from the currently popular &ldquo;agent&rdquo; approach of 2025 and combining traits of AI models and tools under one umbrella. If you want to learn more about the philosophy of Jupyter&rsquo;s approach to AI, watch an <a href="https://thenewstack.io/from-physics-to-the-future-brian-granger-on-project-jupyter-in-the-age-of-ai/">interview with Brian Granger</a> on his vision for Jupyter, AI, and collaboration between humans and machines at TheNewStack.</p>
<p>How does one create a Jupyter AI Persona? It turns out to be very simple. You just need to write a Python class inheriting from <code>BasePersona</code> that overrides the <code>PersonaDefaults</code> metadata (so that your Persona has its own name) and <code>process_message</code>, which receives the input <code>Message</code> and sends a response back to the chat via <code>self.send_message</code>.</p>
<p>With this simple API comes a great power and a great responsibility.</p>
<p>Power comes from not being tied to a particular AI framework (i.e. LangChain in Jupyter AI v1 and v2). You can readily grab a simple SDK usage example from a provider&rsquo;s docs and add it to your Persona. Boom &ndash; you&rsquo;ve got yourself support for a new provider in Jupyter AI. After seeing the demos, I immediately wanted to experiment with Cerebras AI, which enables very fast inference at 1000-2000 tokens/s.</p>
<p>To explore the possible issues with the new API, I created a silly &ldquo;hacker&rdquo; persona that immediately deletes all .ipynb files in the directory when mentioned. Sadly, it just worked, so the community needs to figure out an approach to this issue &ndash; either enabling guardrails or building a trusted ecosystem of personas (after all, users are always responsible for what they install with <code>pip install</code>; this is no different).</p>
<video controls width="700">
  <source src="https://jupyter.zulipchat.com/user_uploads/1430/6PB3qgncf87SuoxC-rtgGukO/h4cker-persona.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
<p><em>Demo of the &ldquo;hacker&rdquo; persona in action</em></p>
<h3 id="community-collaboration-in-action">Community Collaboration in Action</h3>
<p>What struck me most about Sprint Day was how quickly the community rallied around these seemingly &ldquo;small&rdquo; UX issues that have big impacts on daily productivity. Within hours, we had identified the problems, shared existing solutions, created tracking issues, and started planning implementations. This is the Jupyter community at its best – practitioners identifying real problems and immediately working together toward solutions.</p>
<h2 id="community">Community</h2>
<p>Throughout the conference, I had a great time talking with members of the Jupyter community, in particular our tutorial working group and the <a href="https://jupyter.org/about#community-building-working-group-members">Community Building Working Group Members</a>, which (not surprisingly!) overlap quite a bit.</p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_4026.jpeg" alt="Presenting at JupyterCon 2025"></p>
<p><img src="https://taletskiy.com/img/jupytercon/ans-electronics-repair-shop.jpeg" alt="AN’S ELECTRONICS REPAIR ice cream shop">
<em>Decompressing after our workshop with Rosio Reyes and Matt Fisher at An&rsquo;s Electronics Repair ice cream shop. Not a real repair shop, but it shows its menu items on CRTs!</em></p>
<p>It was so awesome to meet the people whom I regularly see on my screen, in GitHub issues, in Zoom calls!</p>
<p><img src="https://taletskiy.com/img/jupytercon/triage-call-crew.jpeg" alt="Triage Call Crew at JupyterCon 2025"></p>
<p>And of course, it was such a privilege to shake hands and talk to the Project Jupyter leaders and creators:  Fernando Pérez, Brian Granger, Min Ragan-Kelley and many others.</p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1525.jpeg" alt="At JupyterCon with Brian Granger, Sylvain Corlay and Jason Weill in the back">
<em>At JupyterCon with Brian Granger, Sylvain Corlay and Jason Weill in the back</em></p>
<p><a href="https://taletskiy.com/blogs/jupytercon-25/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Beyond Code Comparison: Mito&#39;s Functional Evaluation Approach to AI Testing</title>
      <link>https://taletskiy.com/blogs/functional-llm-evals/</link>
      <pubDate>Sun, 06 Apr 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/functional-llm-evals/</guid>
      <description><![CDATA[<p>Learn how Mito's execution-based evaluation approach focuses on the functional results of AI-generated code rather than superficial similarity.</p><p><a href="https://taletskiy.com/blogs/functional-llm-evals/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>Learn how Mito's execution-based evaluation approach focuses on the functional results of AI-generated code rather than superficial similarity.</p><p>Evaluating the performance and capabilities of Large Language Models (LLMs) is a crucial process, and while there is no single unified approach, patterns can be adapted for specific use cases. For general-purpose models, benchmarks like MMLU are popular, but for domain- and task-specific models, more focused evaluations often provide greater benefit. When it comes to AI-generated code for tasks like data analysis, traditional evaluation methods often fall short by focusing on superficial code similarity. Mito embraces a different paradigm, centering its evaluation on what truly matters: the <strong>functional results of the generated code</strong>.</p>
<h2 id="the-problem-with-traditional-code-evaluation">The Problem with Traditional Code Evaluation</h2>
<p>Traditional code evaluation frequently falls into the trap of expecting AI-generated code to be a carbon copy of a human-written reference solution. This &ldquo;string matching trap&rdquo; overlooks the fundamental truth that there are countless valid ways to write code that achieves the same outcome. Differences in stylistic choices like variable names, code formatting, or even the algorithmic approach taken do not necessarily impact the code&rsquo;s functionality. Insisting on exact string matches can stifle AI creativity and miss equally valid, albeit structurally different, solutions.</p>
<h2 id="mitos-execution-based-evaluation">Mito&rsquo;s Execution-Based Evaluation</h2>
<p>Instead of getting bogged down in syntactic comparisons, Mito&rsquo;s evaluation system focuses on what code actually does: it executes both the AI-generated code and a reference solution within isolated environments and then compares their effects.</p>
<p>This execution-based approach assesses two critical dimensions:</p>
<h3 id="1-global-variable-state-comparison">1. Global Variable State Comparison</h3>
<p>After the execution of both code snippets, Mito intelligently compares the variables that were created or modified. This comparison is not a simple object identity check but understands the nuances of different data types:</p>
<ul>
<li>For basic data types like integers and strings, a direct equality check is performed.</li>
<li>For pandas DataFrames, DataFrame-specific equality methods are used to compare the data content, irrespective of object identity.</li>
<li>NumPy arrays are compared using functions that correctly handle special values like NaN.</li>
<li>For custom objects, their defined equality behavior is respected.</li>
</ul>
<p>This sophisticated comparison means that if two code solutions generate a DataFrame with the same data, they will pass this test, even if their underlying implementations (e.g., using df.query() versus boolean indexing) differ.</p>
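<p>As an illustration only (this is a sketch, not Mito&rsquo;s actual source), a type-aware comparison along these lines might look like:</p>
<pre tabindex="0"><code>import numpy as np
import pandas as pd

def values_match(expected, actual):
    &#34;&#34;&#34;Compare two values using type-appropriate equality checks.&#34;&#34;&#34;
    if isinstance(expected, pd.DataFrame):
        # Compare data content, not object identity
        return isinstance(actual, pd.DataFrame) and expected.equals(actual)
    if isinstance(expected, np.ndarray):
        # array_equal with equal_nan treats matching NaNs as equal
        return np.array_equal(expected, actual, equal_nan=True)
    # Basic types and custom objects fall back to their own __eq__
    return expected == actual
</code></pre>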
<h3 id="2-output-comparison">2. Output Comparison</h3>
<p>Beyond variable state, Mito also captures and compares the standard output (anything printed to the console) from both executions. This ensures that any visualization or reporting functionality works identically.</p>
<p><img src="https://taletskiy.com/img/evals.png" alt="Mito’s Execution-Based Evaluation"></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>summary <span style="color:#f92672">=</span> df<span style="color:#f92672">.</span>groupby(<span style="color:#e6db74">&#39;region&#39;</span>)<span style="color:#f92672">.</span>agg({
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;revenue&#39;</span>: <span style="color:#e6db74">&#39;sum&#39;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;order_value&#39;</span>: <span style="color:#e6db74">&#39;mean&#39;</span>
</span></span><span style="display:flex;"><span>})<span style="color:#f92672">.</span>reset_index()
</span></span></code></pre></div><h2 id="real-world-example-the-power-of-functional-equivalence">Real-World Example: The Power of Functional Equivalence</h2>
<p>Consider this example:</p>
<p>User request: &ldquo;Create a DataFrame summarizing sales by region, showing total revenue and mean order value.&rdquo;</p>
<p>Reference solution:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>summary <span style="color:#f92672">=</span> df<span style="color:#f92672">.</span>groupby(<span style="color:#e6db74">&#39;region&#39;</span>)<span style="color:#f92672">.</span>agg({
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;revenue&#39;</span>: <span style="color:#e6db74">&#39;sum&#39;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;order_value&#39;</span>: <span style="color:#e6db74">&#39;mean&#39;</span>
</span></span><span style="display:flex;"><span>})<span style="color:#f92672">.</span>reset_index()
</span></span></code></pre></div><p>AI-generated solution:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>region_groups <span style="color:#f92672">=</span> df<span style="color:#f92672">.</span>groupby(<span style="color:#e6db74">&#39;region&#39;</span>)
</span></span><span style="display:flex;"><span>total_revenue <span style="color:#f92672">=</span> region_groups[<span style="color:#e6db74">&#39;revenue&#39;</span>]<span style="color:#f92672">.</span>sum()
</span></span><span style="display:flex;"><span>avg_order <span style="color:#f92672">=</span> region_groups[<span style="color:#e6db74">&#39;order_value&#39;</span>]<span style="color:#f92672">.</span>mean()
</span></span><span style="display:flex;"><span>summary <span style="color:#f92672">=</span> pd<span style="color:#f92672">.</span>DataFrame({
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;region&#39;</span>: total_revenue<span style="color:#f92672">.</span>index,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;revenue&#39;</span>: total_revenue<span style="color:#f92672">.</span>values,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;order_value&#39;</span>: avg_order<span style="color:#f92672">.</span>values
</span></span><span style="display:flex;"><span>})
</span></span></code></pre></div><p>Under traditional string comparison, these would be considered completely different. But Mito&rsquo;s execution-based evaluation recognizes that both produce functionally identical summary DataFrames with the same data.</p>
<h2 id="why-this-approach-matters">Why This Approach Matters</h2>
<p>This execution-focused evaluation brings several critical advantages:</p>
<ol>
<li>Embraces AI Creativity: It allows AI to find novel solutions rather than forcing it to mimic a specific style.</li>
<li>Focuses on User Intent: What matters is whether the AI satisfied the user&rsquo;s request, not how it constructed the solution.</li>
<li>Handles Edge Cases Naturally: The comparison automatically handles complexities like floating-point precision differences and object equivalence.</li>
<li>Mirrors Real-World Usage: Users care about results, not code aesthetics—this approach aligns evaluation with actual success criteria.</li>
<li>Enables Objective Measurement: Success is binary and objectively determinable: either the code produces the correct output and variable state, or it doesn&rsquo;t.</li>
</ol>
<h2 id="implementing-your-own-execution-based-evaluation">Implementing Your Own Execution-Based Evaluation</h2>
<p>The core of Mito&rsquo;s approach can be adapted by others building AI coding tools:</p>
<ol>
<li>Execute reference and AI code in isolated environments</li>
<li>Capture the resulting variable state and outputs</li>
<li>Compare them using type-appropriate equality checks</li>
<li>Base success on functional equivalence rather than code similarity</li>
</ol>
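<p>A minimal sketch of those four steps (illustrative only &ndash; a real harness would add sandboxing, timeouts, per-run copies of the input data, and richer reporting) could look like this, reusing the <code>values_match</code> helper sketched earlier:</p>
<pre tabindex="0"><code>import io
from contextlib import redirect_stdout

def run_snippet(code, setup_globals):
    &#34;&#34;&#34;Execute a code snippet in its own namespace, capturing variables and stdout.&#34;&#34;&#34;
    namespace = dict(setup_globals)  # shallow copy; real harnesses re-create inputs per run
    captured = io.StringIO()
    with redirect_stdout(captured):
        exec(code, namespace)
    return namespace, captured.getvalue()

def functionally_equivalent(reference_code, ai_code, setup_globals, variables):
    &#34;&#34;&#34;Success means matching printed output and matching values for the named variables.&#34;&#34;&#34;
    ref_vars, ref_out = run_snippet(reference_code, setup_globals)
    ai_vars, ai_out = run_snippet(ai_code, setup_globals)
    if ref_out != ai_out:
        return False
    return all(values_match(ref_vars[name], ai_vars.get(name)) for name in variables)
</code></pre>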
<p>This shift from syntactic comparison to functional evaluation represents a fundamental advancement in how we should evaluate AI-generated code—focusing on what matters most: whether the code does what the user asked for.</p>
<p>By focusing on execution results rather than implementation details, Mito has created an evaluation framework that truly measures what matters for users—reliable results over superficial code similarity.</p>
<p><a href="https://taletskiy.com/blogs/functional-llm-evals/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Building Auto Dashboards - A Hackathon Journey</title>
      <link>https://taletskiy.com/blogs/hackathon_experience/</link>
      <pubDate>Sat, 08 Feb 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/hackathon_experience/</guid>
      <description><![CDATA[<p>Reflecting on my experience of creating the Auto Dashboards project during a hackathon.</p><p><a href="https://taletskiy.com/blogs/hackathon_experience/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>Reflecting on my experience of creating the Auto Dashboards project during a hackathon.</p><h2 id="how-it-all-started">How It All Started</h2>
<p>The first Bay Area H4CK D4Y has wrapped up, and what an exhilarating experience it was! As we stepped into 2025, I was eager to capture the enthusiasm and hope for the new year. Most importantly, I wanted to embrace that uncomfortably exciting itch to build and make something happen—a sentiment shared by my fellow participants.</p>
<p>Ten passionate, friendly, and capable hackers gathered for a day of building. We explored every possible Gen AI technology under the sun and worked on real-world projects. The day was so productive that we didn&rsquo;t have enough time to finish the demos!</p>
<p><img src="https://taletskiy.com/img/hackathon_1.jpeg" alt="Hackathon Group Photo">
<em>The amazing team of hackers at the Bay Area H4CK D4Y.</em></p>
<p><img src="https://taletskiy.com/img/hackathon_2.jpeg" alt="Demo Time">
<em>My demo session showcasing Auto-Dashboard POC.</em></p>
<h2 id="the-hackathon-experience">The Hackathon Experience</h2>
<p>I had a blast at H4CK D4Y Bay Area! We tested all the latest coding agents, and I was particularly impressed with Cline in combination with Gemini 2.0 Pro. Others experimented with Devin and Windsurf.</p>
<p>Here&rsquo;s a glimpse of the projects we worked on:</p>
<ol>
<li>Automated university assignment grader</li>
<li>Tool to infer security breaches from logs</li>
<li>Book ranking tracker</li>
<li>Tool to convert Jupyter Notebooks to Streamlit apps</li>
<li>Voice transcription</li>
<li>Document classification and filing tool</li>
</ol>
<h2 id="auto-dashboards">Auto-Dashboards</h2>
<p>During the event, I built and demoed a tool for single-click notebook-to-dashboard conversion, with results rendered side-by-side in JupyterLab. The source will be published <a href="https://github.com/orbrx/auto-dashboards">here</a>.</p>
<p>I started with the existing Streamlit rendering extension from Elyra and added a new endpoint and UI to convert a notebook to a Python script and send it to an LLM for code-to-code translation. The output is a Streamlit dashboard, with the tables, headings, and widget APIs from Jupyter replaced by their Streamlit equivalents.</p>
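<p>A stripped-down version of that pipeline might look roughly like the sketch below; <code>llm_client.complete</code> stands in for whatever LLM call performs the code-to-code translation and is not part of the published extension.</p>
<pre tabindex="0"><code>from nbconvert import PythonExporter

def notebook_to_dashboard(notebook_path, llm_client):
    # 1. Flatten the notebook into a plain Python script
    script, _ = PythonExporter().from_filename(notebook_path)
    # 2. Ask the LLM to rewrite it as a Streamlit app (hypothetical client/helper)
    prompt = (
        &#34;Rewrite this Jupyter-exported script as a Streamlit dashboard, &#34;
        &#34;replacing Jupyter display and widget calls with Streamlit equivalents:\n\n&#34; + script
    )
    return llm_client.complete(prompt)  # returns the Streamlit app source code
</code></pre>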
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install auto-dashboards jupyterlab
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>A big thank you to Jasmine Robinson, Luke Fernandez, Paul &ldquo;π&rdquo; Ivanov, Smit Lunagariya, CL Kao, Salman Munaf, and Scott Behrens for an incredibly fun day. Special thanks to Itay Dafna, the organizer, who brought together a diverse group of participants from companies like Netflix, Google, and TikTok from across the Bay Area.</p>
<p><a href="https://taletskiy.com/blogs/hackathon_experience/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Notebooks Hub at JupyterCon 2023</title>
      <link>https://taletskiy.com/blogs/jupytercon-23/</link>
      <pubDate>Wed, 10 May 2023 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/jupytercon-23/</guid>
      <description><![CDATA[<p> I had an opportunity to present a lightning talk at JupyterCon 2023 in Paris. The talk starts at 36:50 mark.
</p><p><a href="https://taletskiy.com/blogs/jupytercon-23/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p> I had an opportunity to present a lightning talk at JupyterCon 2023 in Paris. The talk starts at 36:50 mark.
</p><div class="video-container">
    
</div>
<p>I had an opportunity to present a lightning talk at JupyterCon 2023 in Paris.
The talk starts at 36:50 mark.</p>
<p><a href="https://taletskiy.com/blogs/jupytercon-23/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Play MIDI tones on Adafruit Clue</title>
      <link>https://taletskiy.com/blogs/adafruit-clue-midi/</link>
      <pubDate>Thu, 05 May 2022 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/adafruit-clue-midi/</guid>
      <description><![CDATA[<p>Learn how to play MIDI tones on the Adafruit Clue microcontroller using CircuitPython. This tutorial explores converting MIDI files into playable sounds using the device's built-in speaker, creating a foundation for game development.</p><p><a href="https://taletskiy.com/blogs/adafruit-clue-midi/">Read the full post on taletskiy.com</a></p>]]></description>
<content:encoded><![CDATA[<p>Learn how to play MIDI tones on the Adafruit Clue microcontroller using CircuitPython. This tutorial explores converting MIDI files into playable sounds using the device's built-in speaker, creating a foundation for game development.</p><p>I recently got an Adafruit Clue microcontroller (shout out to Chipy and JFrog for the prize!). It is packed with sensors and a color LCD screen, but the best part is that it is programmable with CircuitPython. With its small screen and a couple of buttons on the side, it reminded me of an old pocket gaming console, so I decided to build a little game that would run on the Clue. When I started, there were no game engines written for the Clue or CircuitPython, so I decided to build my own.</p>
<p>What good is a game without a soundtrack? The first thing I got interested in when creating the game was playing sounds with the tiny speaker found on the Clue. It only has a basic API that plays a tone at a given frequency for a given duration.</p>
<p>The Adafruit Clue has a built-in tiny speaker that can be controlled using CircuitPython&rsquo;s <code>audioio</code> module. Here&rsquo;s the basic API for playing tones:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> adafruit_clue <span style="color:#f92672">import</span> clue
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Play a tone at 440 Hz (A4 note) for 1 second</span>
</span></span><span style="display:flex;"><span>clue<span style="color:#f92672">.</span>play_tone(<span style="color:#ae81ff">440</span>, <span style="color:#ae81ff">1.0</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Play a higher note (C5 at 523 Hz) for 0.5 seconds</span>
</span></span><span style="display:flex;"><span>clue<span style="color:#f92672">.</span>play_tone(<span style="color:#ae81ff">523</span>, <span style="color:#ae81ff">0.5</span>)
</span></span></code></pre></div><p>The <code>play_tone</code> function takes two parameters:</p>
<ul>
<li><code>frequency</code>: The frequency of the tone in Hz (higher values create higher-pitched sounds)</li>
<li><code>duration</code>: How long to play the tone in seconds</li>
</ul>
<p>This simple API makes it easy to play individual tones, but it blocks program execution while the tone plays. For more complex sounds or background music, you would need to implement timing mechanisms around this basic functionality.</p>
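<p>For a short melody, the blocking call is enough &ndash; just play the notes one after another:</p>
<pre tabindex="0"><code># A tiny melody as (frequency_hz, duration_s) pairs: C5, E5, G5
melody = [(523, 0.3), (659, 0.3), (784, 0.6)]

for freq, duration in melody:
    clue.play_tone(freq, duration)  # blocks until the note finishes
</code></pre>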
<p>There is an excellent post explaining the connection between musical notes and frequencies, with CircuitPython code for playing Jingle Bells and a Hanukkah tune: <a href="https://blog.wokwi.com/play-musical-notes-on-circuitpython/">https://blog.wokwi.com/play-musical-notes-on-circuitpython/</a></p>
<p>But what if we don&rsquo;t have the notes written down for us? I thought about finding a soundtrack I like in MIDI format on the Internet and dropping it into my Clue drive folder, so that a Python script could read the file and convert it to a sequence of notes. What I didn&rsquo;t know was the internal structure of a MIDI file. It is more complicated than just a sequence of notes, but we can still extract that information.</p>
<h2 id="extracting-notes">Extracting notes</h2>
<p><a href="https://github.com/mido/mido">https://github.com/mido/mido</a></p>
<p><a href="https://taletskiy.com/blogs/adafruit-clue-midi/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Create a desktop shortcut for JupyterLab on Windows</title>
      <link>https://medium.com/@kostal91/create-a-desktop-shortcut-for-jupyterlab-on-windows-9fcabcfa0d3f</link>
      <pubDate>Mon, 03 Feb 2020 00:00:00 &#43;0000</pubDate>
      <guid>https://medium.com/@kostal91/create-a-desktop-shortcut-for-jupyterlab-on-windows-9fcabcfa0d3f</guid>
      <description><![CDATA[<p>Simple guide for creating a convenient desktop shortcut to launch JupyterLab</p><p><a href="https://medium.com/@kostal91/create-a-desktop-shortcut-for-jupyterlab-on-windows-9fcabcfa0d3f">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>Simple guide for creating a convenient desktop shortcut to launch JupyterLab</p><p><a href="https://medium.com/@kostal91/create-a-desktop-shortcut-for-jupyterlab-on-windows-9fcabcfa0d3f">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Pool Limited Queue Processing in Python</title>
      <link>https://taletskiy.com/blogs/python-pool-limited-queue-processing/</link>
      <pubDate>Sun, 02 Feb 2020 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/python-pool-limited-queue-processing/</guid>
      <description><![CDATA[<p>A practical guide to using Python's multiprocessing library to implement a system where many parallel processes write to a queue, while a limited pool of workers processes those queue items. Includes solutions for common challenges and Windows-specific issues.</p><p><a href="https://taletskiy.com/blogs/python-pool-limited-queue-processing/">Read the full post on taletskiy.com</a></p>]]></description>
<content:encoded><![CDATA[<p>A practical guide to using Python's multiprocessing library to implement a system where many parallel processes write to a queue, while a limited pool of workers processes those queue items. Includes solutions for common challenges and Windows-specific issues.</p><p>I was recently confronted with a problem: I needed to build a large number (on the order of 100) of Docker containers and then push them to the registry. The Docker SDK for Python provided an excellent handle on that, and together with the <code>multiprocessing</code> library it allowed me to parallelize the task very effectively. However, after some initial testing I discovered that pushing multiple images to the registry stalled, likely due to an overload of simultaneous uploads. In my testing, I was only able to run 2-3 simultaneous <code>docker push</code> commands before any new ones I added got stalled. At that point I decided to limit the simultaneous uploads to a small number of parallel workers, while still utilizing a large number of workers to facilitate image builds. A combination of a queue (<code>multiprocessing.Queue</code>) for passing work from builder workers to pusher workers and a process pool (<code>multiprocessing.Pool</code>) looked like the best candidate. Yet, there are small nuances and gaps in the documentation that took me some time to understand (especially when using <code>multiprocessing</code> on Windows). Below is a small tutorial on how to use these data structures and objects.</p>
<h2 id="problem-formulation">Problem formulation</h2>
<p><img src="https://taletskiy.com/img/multiprocessing.png" alt="Problem formulation"></p>
<p>In this toy problem we have a large array of parallel Processes writing results into the Queue. Alongside them, there is a single-threaded reader Process checking for new items in the Queue and assigning them to new Processes in the Pool, such that only a small fixed number of these Processes are running at the same time. Let&rsquo;s go through all the elements below.</p>
<h2 id="process"><code>Process</code></h2>
<p>For our large array of parallel workers on the left, we are going to use <code>multiprocessing.Process()</code>. From the official <a href="https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing.Process">reference</a>: &ldquo;<code>Process</code> objects represent activity that is run in a separate process&rdquo;. Starting a process requires two things: the target function to be called and the <code>Process</code> call itself. Let&rsquo;s take a look:
<pre tabindex="0"><code>from multiprocessing import Process

def proc(i):
    print(f&#39;I am Process {i}&#39;)

if __name__ ==  &#39;__main__&#39;:
    for i in range(10):
        Process(target=proc, args=(i,)).start()
</code></pre><p>In the example above we created 10 <code>Process</code>es and launched them all at the same time. Each process runs an instance of the <code>proc()</code> function with arguments taken from <code>args</code>. Because the order of execution is not guaranteed, running it produces something like:
<pre tabindex="0"><code>I am Process 6
I am Process 2
I am Process 0
I am Process 3
I am Process 7
I am Process 4
I am Process 8
I am Process 1
I am Process 5
I am Process 9
</code></pre><p>Notice also the interesting syntax of <code>args=(i,)</code>. <code>Process</code> requires that <code>args</code> be iterable, so changing it to <code>args=(i)</code> or <code>args=i</code> will lead to a <code>TypeError</code>.</p>
<h2 id="queue"><code>Queue</code></h2>
<p>Now, it is time to introduce <code>multiprocessing.Queue()</code>. According to the <a href="https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing.Queue">reference</a>, it &ldquo;returns a process shared queue implemented using a pipe and a few locks/semaphores&rdquo;. A queue allows us to put objects into it and process them elsewhere asynchronously. Importantly, queues are thread and process safe. Let&rsquo;s modify our previous example to add a <code>Queue</code> object and pass it to our parallel <code>Process</code>es:
<pre tabindex="0"><code>from multiprocessing import Process, Queue

def writer(i,q):
    message = f&#39;I am Process {i}&#39;
    q.put(message)

if __name__ ==  &#39;__main__&#39;:    
    # Create multiprocessing queue
    q = Queue()
    # Create a group of parallel writers and start them
    for i in range(10):
        Process(target=writer, args=(i,q,)).start()

    # Read the queue sequentially
    for i in range(10):
        message = q.get()
        print(message)
</code></pre><p>Keep in mind that <code>Queue.get()</code> is a blocking method, so we are not going to miss any messages in that queue.</p>
<p>The next step in solving our problem is to switch to parallel reads from the queue. We could just spawn the reader processes the same way we spawned the writers, but that would allow up to 10 of them to run in parallel. What should we do if we are limited to a smaller number of readers, as in the original problem description?</p>
<h2 id="pool"><code>Pool</code></h2>
<p>Enter <a href="https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing.pool.Pool"><code>multiprocessing.Pool()</code></a>: &ldquo;A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation&rdquo;. Using <code>Pool</code> we can submit as many jobs as we like, but only <code>processes</code> of them will be running at any given moment.</p>
<p>Let&rsquo;s see how it behaves if we throw all the readers into the <code>Pool</code>:
<pre tabindex="0"><code>from multiprocessing import Process, Queue, Pool

def writer(i,q):
    message = f&#39;I am Process {i}&#39;
    q.put(message)

def reader(i,q):
    message = q.get()
    print(message)

if __name__ ==  &#39;__main__&#39;:    
    # Create multiprocessing queue
    q = Queue()

    # Create a group of parallel writers and start them
    for i in range(10):
        Process(target=writer, args=(i,q,)).start()

    # Create multiprocessing pool
    p = Pool(10)

    # Create a group of parallel readers and start them
    # Number of readers is matching the number of writers
    # However, the number of simultaneously running
    #   readers is constrained to the pool size
    for i in range(10):
        p.apply_async(reader, (i,q,))
</code></pre><p>However, if we run the code above, we get no output. What happened? When we called <code>apply_async</code>, execution immediately moved on and, since nothing else was left in the main function, exited. Thankfully, the <code>multiprocessing</code> reference provides a way to wait for the execution results:
<pre tabindex="0"><code>from multiprocessing import Process, Queue, Pool

def writer(i,q):
    message = f&#39;I am Process {i}&#39;
    q.put(message)

def reader(i,q):
    message = q.get()
    print(message)

if __name__ ==  &#39;__main__&#39;:    
    # Create multiprocessing queue
    q = Queue()

    # Create a group of parallel writers and start them
    for i in range(10):
        Process(target=writer, args=(i,q,)).start()

    # Create multiprocessing pool
    p = Pool(10)

    # Create a group of parallel readers and start them
    # Number of readers is matching the number of writers
    # However, the number of simultaneously running
    #   readers is constrained to the pool size    
    readers = []
    for i in range(10):
        readers.append(p.apply_async(reader, (i,q,)))
    
    # Wait for the asynchronous reader threads to finish
    [r.get() for r in readers]
</code></pre><p>This time, if we run the code, we get the following error: <code>RuntimeError: Queue objects should only be shared between processes through inheritance</code>. <code>multiprocessing.Manager</code> will enable us to manage the queue and also make it accessible to different workers:
<pre tabindex="0"><code>from multiprocessing import Process, Queue, Pool, Manager

def writer(i,q):
    message = f&#39;I am Process {i}&#39;
    q.put(message)

def reader(i,q):
    message = q.get()
    print(message)

if __name__ ==  &#39;__main__&#39;:    
    # Create manager
    m = Manager()
    
    # Create multiprocessing queue
    q = m.Queue()

    # Create a group of parallel writers and start them
    for i in range(10):
        Process(target=writer, args=(i,q,)).start()

    # Create multiprocessing pool
    p = Pool(10)

    # Create a group of parallel readers and start them
    # Number of readers is matching the number of writers
    # However, the number of simultaneously running
    #   readers is constrained to the pool size    
    readers = []
    for i in range(10):
        readers.append(p.apply_async(reader, (i,q,)))
    
    # Wait for the asynchronous reader threads to finish
    [r.get() for r in readers]
</code></pre><p>Finally, we get the results we expect:</p>
<pre tabindex="0"><code>&gt; python pl.py
I am Process 1
I am Process 4
I am Process 9
I am Process 8
I am Process 0
I am Process 5
I am Process 7
I am Process 2
I am Process 6
I am Process 3
</code></pre><h2 id="windows-related-quirks">Windows-related quirks</h2>
<p>I initially started working on this problem on a Linux machine, but later continued on Windows. Unfortunately, many things did not work right away. Here is what you need to know:</p>
<ol>
<li>Interrupting the program execution (Ctrl+C) will not work right away with the code above. The <a href="https://stackoverflow.com/a/6191991">workaround</a> is to add an initializer function for the pool workers:</li>
</ol>
<pre tabindex="0"><code>def init_worker():
    &#34;&#34;&#34;
    Pool worker initialization, required for keyboard interrupt on Windows
    &#34;&#34;&#34;
    signal.signal(signal.SIGINT, signal.SIG_IGN)

p = Pool(num_readers, init_worker)
</code></pre><ol start="2">
<li>I was not able to run the code in a Jupyter notebook on Windows unless I moved the worker functions into a separate <code>.py</code> file and imported them into my notebook. Related to that, you won&rsquo;t be able to run the scripts above without wrapping the main code in <code>if __name__ ==  '__main__':</code></li>
</ol>
<h2 id="final-result">Final Result</h2>
<p>As finishing touches, let&rsquo;s add the following:</p>
<ul>
<li>delays to imitate CPU-bound work on reader and writer</li>
<li>Exception handling when waiting for reader threads to finish</li>
<li>Configurable number of writer and reader threads</li>
<li>Some function documentation</li>
</ul>
<p>Here is the final result:</p>
<pre tabindex="0"><code>from multiprocessing import Pool, Queue, Process, Manager
import random
import signal
import time

num_writers = 10
num_readers = 3

def writer(i,q):
    # Imitate CPU-bound work happening in writer
    delay = random.randint(1,10)
    time.sleep(delay)

    # Put the result into the queue
    t = time.time()
    print(f&#39;I am writer {i}: {t}&#39;)
    q.put(t)

def init_worker():
    &#34;&#34;&#34;
    Pool worker initialization, required for keyboard interrupt on Windows
    &#34;&#34;&#34;
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def reader(i, q):
    &#34;&#34;&#34;
    Queue reader worker
    &#34;&#34;&#34;

    # Read the top message from the queue
    message = q.get()

    # Imitate CPU-bound work happening in reader
    time.sleep(3)
    print(f&#39;I am reader {i}: {message}&#39;)

if __name__ ==  &#39;__main__&#39;:
    # Create manager
    m = Manager()
    
    # Create multiprocessing queue
    q = m.Queue()

    # Create a group of parallel writers and start them
    for i in range(num_writers):
        Process(target=writer, args=(i,q,)).start()

    # Create multiprocessing pool
    p = Pool(num_readers, init_worker)

    # Create a group of parallel readers and start them
    # Number of readers is matching the number of writers
    # However, the number of simultaneously running
    #   readers is constrained to the pool size
    readers = []
    for i in range(num_writers):
        readers.append(p.apply_async(reader, (i,q,)))
    
    # Wait for the asynchronous reader threads to finish
    try:
        [r.get() for r in readers]
    except:
        print(&#39;Interrupted&#39;)
        p.terminate()
        p.join()
</code></pre><p>If you run it, you will get something like this:</p>
<pre tabindex="0"><code>&gt; python final.py
I am writer 8: 1580659076.783544
I am writer 3: 1580659076.783544
I am reader 0: 1580659076.783544
I am reader 1: 1580659076.783544
I am writer 7: 1580659079.7990372
I am writer 2: 1580659080.7971141
I am writer 1: 1580659081.785277
I am writer 4: 1580659082.7955923
I am reader 2: 1580659079.7990372
I am reader 3: 1580659080.7971141
I am writer 6: 1580659083.800029
I am writer 0: 1580659084.7862694
I am reader 4: 1580659081.785277
I am writer 9: 1580659085.7819643
I am writer 5: 1580659085.7919443
I am reader 5: 1580659082.7955923
I am reader 6: 1580659083.800029
I am reader 7: 1580659084.7862694
I am reader 8: 1580659085.7819643
I am reader 9: 1580659085.7919443
</code></pre><p><a href="https://taletskiy.com/blogs/python-pool-limited-queue-processing/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>On Pytorch Tensors and Autograd</title>
      <link>https://taletskiy.com/blogs/pytorch-autograd/</link>
      <pubDate>Sun, 26 Jan 2020 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/pytorch-autograd/</guid>
      <description><![CDATA[<p>A brief explanation of PyTorch's Autograd system, computational graphs, and how gradient calculation works in deep learning frameworks.</p><p><a href="https://taletskiy.com/blogs/pytorch-autograd/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>A brief explanation of PyTorch's Autograd system, computational graphs, and how gradient calculation works in deep learning frameworks.</p><h1 id="on-pytorch-tensors-and-autograd">On Pytorch Tensors and Autograd</h1>
<p>Somehow, the PyTorch blitz tutorial on Autograd completely confused me. I could not understand what <code>.backward()</code>, <code>.grad</code>, and <code>grad_fn</code> do.</p>
<p>Fortunately, I found an excellent explanation of Autograd and the computational graph here: <a href="https://blog.paperspace.com/pytorch-101-understanding-graphs-and-automatic-differentiation/">https://blog.paperspace.com/pytorch-101-understanding-graphs-and-automatic-differentiation/</a>. Just for my own notes, and for anyone interested, here is my short recap:</p>
<ul>
<li>Computational graph &ndash; records the order of operations performed on tensors. Edges of the graph represent local gradients; leaves of the graph are independent variables (the inputs and the weights/biases in the case of a neural network).</li>
<li><code>tensor.backward()</code> computes the gradients all the way back through the computational graph and accumulates the results in the leaves. Without an explicit gradient argument it can only be called on a scalar (0-rank) tensor.</li>
<li><code>tensor.grad</code> holds the gradient accumulated by calls to <code>.backward()</code> with respect to the given tensor.</li>
</ul>
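<p>A tiny example ties these together:</p>
<pre tabindex="0"><code>import torch

x = torch.tensor(2.0, requires_grad=True)  # leaf tensor
y = x ** 2 + 3 * x                         # y.grad_fn records how y was computed
y.backward()                               # backprop through the graph
print(x.grad)                              # tensor(7.) -- dy/dx = 2*x + 3 at x = 2
</code></pre>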
<p><a href="https://taletskiy.com/blogs/pytorch-autograd/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>US Radio Spectrum Interactive Visualization with Python and BokehJS</title>
      <link>https://taletskiy.com/blogs/bokeh-radio-spectrum/</link>
      <pubDate>Wed, 07 Aug 2019 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/bokeh-radio-spectrum/</guid>
      <description><![CDATA[<p>Creating an interactive visualization of the US radio frequency spectrum using Python and BokehJS to explore frequency allocations across different bands, replacing static FCC infographics with a dynamic, color-coded interface.</p><p><a href="https://taletskiy.com/blogs/bokeh-radio-spectrum/">Read the full post on taletskiy.com</a></p>]]></description>
<content:encoded><![CDATA[<p>Creating an interactive visualization of the US radio frequency spectrum using Python and BokehJS to explore frequency allocations across different bands, replacing static FCC infographics with a dynamic, color-coded interface.</p><p>So, I recently came across the Bokeh visualization library, which uses Python to generate plots that are rendered in JS (like most of the great <em>interactive</em> visualization tools these days). I also noticed a fellow Twitter user took on the challenge to learn Bokeh over the next 30 days and post their results, so I decided to try as well.</p>
<p>I was hearing a lot about 5G coming to the US soon and got curious about which frequencies will be used and which are available. I did not know much about it, so I searched and found this PDF infographic: <a href="https://www.ntia.doc.gov/files/ntia/publications/2003-allochrt.pdf">https://www.ntia.doc.gov/files/ntia/publications/2003-allochrt.pdf</a>. There were a couple of problems with it: it was too small and non-interactive, so I decided to make a better one myself.</p>
<p>The preliminary result is here:</p>

            
<div class="bk-root" id="556cd84c-dfdc-47be-bb89-15e1006b3bea" data-root-id="2137"></div>      

        <script type="text/javascript">
          (function() {
            var fn = function() {
              Bokeh.safely(function() {
                (function(root) {
                  function embed_document(root) {
<pre><code>              var docs_json = document.getElementById('2422').textContent;
              var render_items = [{&quot;docid&quot;:&quot;54bbd253-fa4f-4743-802a-c4b1e795fb49&quot;,&quot;roots&quot;:{&quot;2137&quot;:&quot;556cd84c-dfdc-47be-bb89-15e1006b3bea&quot;}}];
              root.Bokeh.embed.embed_items(docs_json, render_items);
            
              }
              if (root.Bokeh !== undefined) {
                embed_document(root);
              } else {
                var attempts = 0;
                var timer = setInterval(function(root) {
                  if (root.Bokeh !== undefined) {
                    embed_document(root);
                    clearInterval(timer);
                  }
                  attempts++;
                  if (attempts &gt; 100) {
                    console.log(&quot;Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing&quot;);
                    clearInterval(timer);
                  }
                }, 10, root)
              }
            })(window);
          });
        };
        if (document.readyState != &quot;loading&quot;) fn();
        else document.addEventListener(&quot;DOMContentLoaded&quot;, fn);
      })();
    &lt;/script&gt;
</code></pre>
<p>You can scroll with your mouse wheel or swipe left and right with touch/pointer. If you hover over a band, it shows that band&rsquo;s purpose. Bands with similar purposes were automatically assigned the same color.</p><p><a href="https://taletskiy.com/blogs/bokeh-radio-spectrum/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Displaying real-time webcam stream in IPython at (relatively) high framerate</title>
      <link>https://medium.com/@kostal91/displaying-real-time-webcam-stream-in-ipython-at-relatively-high-framerate-8e67428ac522</link>
      <pubDate>Sun, 15 Apr 2018 00:00:00 &#43;0000</pubDate>
      <guid>https://medium.com/@kostal91/displaying-real-time-webcam-stream-in-ipython-at-relatively-high-framerate-8e67428ac522</guid>
      <description><![CDATA[<p>How to efficiently display webcam video feeds in Jupyter notebooks</p><p><a href="https://medium.com/@kostal91/displaying-real-time-webcam-stream-in-ipython-at-relatively-high-framerate-8e67428ac522">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>How to efficiently display webcam video feeds in Jupyter notebooks</p><p><a href="https://medium.com/@kostal91/displaying-real-time-webcam-stream-in-ipython-at-relatively-high-framerate-8e67428ac522">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
  </channel>
</rss>
