<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Blogs</title>
    <link>https://taletskiy.com/blogs/</link>
    <description>All blog posts from Konstantin Taletskiy.</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <atom:link href="https://taletskiy.com/blogs/index.xml" rel="self" type="application/rss+xml"></atom:link>
    <item>
      <title>700 JupyterLab 4 Extensions!</title>
      <link>https://taletskiy.com/blogs/700-jupyterlab-extensions/</link>
      <pubDate>Fri, 13 Mar 2026 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/700-jupyterlab-extensions/</guid>
      <description><![CDATA[<p>The JupyterLab extension ecosystem just crossed 700 extensions compatible with JupyterLab 4. Here's what the latest wave tells us about where notebooks are heading.</p><p><a href="https://taletskiy.com/blogs/700-jupyterlab-extensions/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p><img src="https://taletskiy.com/img/700_extensions_collage.png" alt="700 JupyterLab 4 Extensions!"></p><p>The JupyterLab extension ecosystem just crossed 700 extensions compatible with JupyterLab 4. Here's what the latest wave tells us about where notebooks are heading.</p><p><img src="https://taletskiy.com/img/700_extensions_collage.png" alt="700 extensions for JupyterLab 4, and counting!"></p>
<p>The JupyterLab extension ecosystem just crossed <strong>700 extensions compatible with JupyterLab 4!</strong></p>
<p>That&rsquo;s 700 community-built plugins — from astronomical data viewers to reactive notebook editors, from genome browsers to workflow managers — created by hundreds of developers, research labs, and companies around the world.</p>
<h2 id="what-are-jupyterlab-extensions">What Are JupyterLab Extensions?</h2>
<p>Extensions are how JupyterLab becomes a Git client, a dashboard builder, a genomics viewer, or an AI workspace — without changing the core application. Install one with <code>pip install</code>, and it activates automatically.</p>
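<p>For example, installing a prebuilt extension from PyPI takes two commands (jupyterlab-git is just one illustration; restart JupyterLab afterwards to load it):</p>
<pre tabindex="0"><code>pip install jupyterlab-git      # any prebuilt JupyterLab 4 extension on PyPI
jupyter labextension list       # confirm the extension was picked up
</code></pre>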
<p><img src="https://taletskiy.com/img/popular_extensions_collage.png" alt="Popular JupyterLab extensions"></p>
<p>This is by design: JupyterLab itself is built as a collection of extensions — the <a href="https://github.com/jupyterlab/jupyterlab/tree/main/packages/filebrowser">file browser</a>, the <a href="https://github.com/jupyterlab/jupyterlab/tree/main/packages/notebook">notebook editor</a>, the <a href="https://github.com/jupyterlab/jupyterlab/tree/main/packages/terminal">terminal</a> are all plugins. The same architecture that powers the core lets the community build what they need. For background, see <a href="https://blog.jupyter.org/99-ways-to-extend-the-jupyter-ecosystem-11e5dab7c54">99 ways to extend the Jupyter ecosystem</a>.</p>
<h2 id="the-ecosystem-at-700">The Ecosystem at 700</h2>
<ul>
<li><strong>700+ extensions compatible with JupyterLab 4</strong></li>
<li><strong>~960 total extensions</strong> published on PyPI</li>
<li><strong>~9.8 million downloads/month</strong></li>
<li><strong>100M+ total downloads</strong> in the past year</li>
</ul>
<p>By any measure, a substantial software layer has grown around JupyterLab.</p>
<h2 id="how-we-got-here">How We Got Here</h2>
<p>The ecosystem crossed <strong>600 JL4-compatible extensions in late October 2025</strong>, days before <a href="https://www.jupytercon.com/">JupyterCon in San Diego</a>. At the conference, we ran a full-day <a href="https://jupytercon.github.io/jupytercon2025-developingextensions/">Extension Development for Everyone</a> tutorial with hands-on rapid prototyping. By early March 2026, we hit <strong>700</strong>.</p>
<p>The ecosystem has been growing at a steady pace, averaging about 18 new extensions per month, with November 2025 setting an all-time monthly record of 33. Modern tooling is helping: better templates, documentation, and code-generation tools have lowered the barrier to entry for work that once required deep familiarity with TypeScript, Lumino, and JupyterLab internals.</p>
<h2 id="where-the-extensions-are">Where the Extensions Are</h2>
<p><img src="https://taletskiy.com/img/extensions_per_category.png" alt="Number of JupyterLab extensions by category"></p>
<p><img src="https://taletskiy.com/img/extensions_downloads_per_category.png" alt="Monthly PyPI downloads by category"></p>
<p>Development &amp; Version Control dominates both in count (267) and downloads (5.4M/month). Visualization &amp; Dashboards (2.7M/month) and System &amp; Resource Management (602K/month) round out the top three most downloaded categories. But the fastest-growing categories point to where things are heading. Here&rsquo;s what&rsquo;s new in 2026:</p>
<h2 id="jupyterlabs-ai-layer-starts-taking-shape">JupyterLab&rsquo;s AI Layer Starts Taking Shape</h2>
<p>AI isn&rsquo;t yet the biggest category in JupyterLab, but it may be the clearest signal of where new interaction patterns are emerging:</p>
<ul>
<li><strong><a href="https://labextensions.dev/extensions/jupyter-ai-acp-client?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyter-ai-acp-client</a></strong> — Brings external AI agents into JupyterLab&rsquo;s chat via the Agent Communication Protocol. Ships with Claude Code and Kiro personas.</li>
<li><strong><a href="https://labextensions.dev/extensions/nb-margin?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">nb-margin</a></strong> — Annotate cells with margin comments, and Claude Code edits them. A different paradigm from chat-based AI.</li>
<li><strong><a href="https://labextensions.dev/extensions/jupyterlite-ai-kernels?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyterlite-ai-kernels</a></strong> — AI-powered kernels for JupyterLite, from Jeremy Tuloup. AI-assisted computation entirely in the browser.</li>
<li><strong><a href="https://labextensions.dev/extensions/jupyter-chat-components?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyter-chat-components</a></strong> — Reusable chat UI components from Project Jupyter — building blocks for the next generation of AI tools.</li>
</ul>
<p>These extensions reflect what the JupyterLab team identified as a 2026 priority: first-class integration with AI tooling.</p>
<h2 id="reproducibility-gets-a-toolchain">Reproducibility Gets a Toolchain</h2>
<p><strong><a href="https://labextensions.dev/extensions/calkit-python?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">calkit-python</a></strong> is the most downloaded new extension of 2026 (11,000+ monthly downloads). It gives notebooks project-scoped environments, graphical package management via Astral&rsquo;s <code>uv</code>, and one-click notebook pipelines with freshness tracking. Think &ldquo;Makefiles for notebooks&rdquo; meets &ldquo;Poetry for Jupyter.&rdquo;</p>
<p><img src="https://taletskiy.com/img/calkit_screenshot.png" alt="Calkit manages notebook pipelines with environment tracking and one-click reruns. The orange ‘run’ button signals stale outputs that need to be regenerated."></p>
<p><strong><a href="https://labextensions.dev/extensions/jupyter-projspec?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyter-projspec</a></strong> (from the fsspec contributors) takes a complementary approach — it brings <a href="https://github.com/fsspec/projspec">projspec</a> into JupyterLab, letting you scan and analyze project structures directly from the notebook environment.</p>
<p><img src="https://taletskiy.com/img/jupyter-projspec-hero.png" alt="jupyter-projspec integrates to system filebrowser to show the project metadata"></p>
<h2 id="marimo-comes-to-jupyterlab">Marimo Comes to JupyterLab</h2>
<p><a href="https://marimo.io">Marimo</a> — the reactive notebook editor — now runs inside JupyterLab. The official <strong><a href="https://labextensions.dev/extensions/marimo-jupyter-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">marimo-jupyter-extension</a></strong> from the Marimo team lets you open and edit <code>_mo.py</code> Marimo files directly in JupyterLab, bringing reactive execution to JupyterHub deployments without leaving the Jupyter environment. This matters for teams that want to adopt Marimo incrementally — you can keep your JupyterHub infrastructure and add Marimo as another file type alongside traditional notebooks.</p>
<h2 id="science">Science</h2>
<ul>
<li><strong><a href="https://labextensions.dev/extensions/fitsview?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">fitsview</a></strong> — Stream FITS astronomical data slices directly in JupyterLab without downloading full files.</li>
<li><strong><a href="https://labextensions.dev/extensions/jupyterlab-urdf-test?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyterlab-urdf-test</a></strong> — 3D robot model viewer/editor (URDF + Three.js), from <a href="https://github.com/jupyter-robotics">jupyter-robotics</a>.</li>
<li><strong><a href="https://labextensions.dev/extensions/climb-jupyter-igv?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">climb-jupyter-igv</a></strong> — Integrative Genomics Viewer with S3 access for bioinformatics.</li>
<li><strong><a href="https://labextensions.dev/extensions/ggblab?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">ggblab</a></strong> — GeoGebra interactive geometry with bidirectional Python communication. Second most downloaded new extension of 2026.</li>
</ul>
<h2 id="accessibility">Accessibility</h2>
<p>Accessibility has been a growing focus for JupyterLab core, and extensions are starting to address it too:</p>
<ul>
<li><strong><a href="https://labextensions.dev/extensions/jupyterlab-a11y-checker?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyterlab-a11y-checker</a></strong> — From UC Berkeley&rsquo;s <a href="https://github.com/berkeley-dsep-infra/jupyterlab-a11y-checker">DSEP infrastructure team</a>, this extension scans notebooks for WCAG 2.1 AA issues: missing alt text, heading structure, table headers, color contrast, and link text. Guided fix interfaces, optional AI suggestions, and a CLI for CI pipelines. Over 11,000 total downloads and a <a href="https://a11y-checker-guide.datahub.berkeley.edu/">documentation site</a>.</li>
<li><strong><a href="https://labextensions.dev/extensions/jupyterlab-change-ui-font-size-fix?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">jupyterlab-change-ui-font-size-fix</a></strong> — Fixes file browser icon misalignment when users change the UI font size — a small but real pain point for anyone who needs larger text.</li>
</ul>
<h2 id="27-extensions-one-platform">27 Extensions, One Platform</h2>
<p><a href="https://github.com/stellarshenson/stellars-jupyterlab-ds">Stellars</a> is a JupyterLab-based data science platform — GPU support, MLflow, TensorBoard, Optuna — assembled from <strong>27 custom extensions</strong> covering everything from <a href="https://labextensions.dev/extensions/jupyterlab-branding-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">branding</a> and <a href="https://labextensions.dev/extensions/jupyterlab-vscode-icons-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">file icons</a> to <a href="https://labextensions.dev/extensions/jupyterlab-kernel-terminal-workspace-culler-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">kernel management</a> and <a href="https://labextensions.dev/extensions/jupyterlab-drawio-render-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">diagram rendering</a>, <a href="https://labextensions.dev/extensions/jupyterlab-vscode-icons-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">VS Code file icons</a>, <a href="https://labextensions.dev/extensions/jupyterlab-trash-mgmt-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">trash management</a>, <a href="https://labextensions.dev/extensions/jupyterlab-mmd-to-png-extension?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">Mermaid-to-PNG conversion</a>, and more. JupyterLab is now flexible enough that one developer can assemble a domain-specific product entirely from extension building blocks.</p>
<h2 id="want-to-build-your-own">Want to Build Your Own?</h2>
<p>The JupyterCon tutorial is fully available: <a href="https://jupytercon.github.io/jupytercon2025-developingextensions/">step-by-step materials</a> and the complete <a href="https://www.youtube.com/watch?v=z-KZ6CjZjbM">YouTube recording</a>. It covers scaffolding, plugin architecture, publishing to PyPI, and rapid prototyping techniques. The tools have never been more accessible.</p>
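<p>As a rough sketch of the scaffolding step, the official extension template gets you from zero to an installable plugin; the commands below follow the template&rsquo;s README, so check the tutorial materials for the current flags:</p>
<pre tabindex="0"><code>pip install copier jinja2-time
copier copy --trust https://github.com/jupyterlab/extension-template .
# then, inside the generated project:
pip install -e .
jupyter labextension develop . --overwrite
</code></pre>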
<h2 id="how-we-track-this">How We Track This</h2>
<p>The data behind this post comes from the <a href="https://labextensions.dev?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">JupyterLab Extension Marketplace</a>, a community <a href="https://github.com/orbrx/jupyter-marketplace">project</a> that tracks all published JupyterLab extensions using PyPI data. The marketplace refreshes automatically and provides download trends, category breakdowns, and discovery tools.</p>
<p><img src="https://taletskiy.com/img/marketplace_screenshot.png" alt="JupyterLab Marketplace"></p>
<p>For more on the data and methodology, see our <a href="https://www.youtube.com/watch?v=OWt3Yzhrs1E">PyData Boston 2025 talk</a>.</p>
<h2 id="whats-next">What&rsquo;s Next</h2>
<ul>
<li><strong>New interaction patterns</strong> are still being figured out — chat-based assistance, cell annotations, agent protocols. Most likely all of them will stick, each serving different use cases.</li>
<li><strong>Reproducibility tooling</strong> suggests the community is ready for opinionated workflow management built into the notebook experience.</li>
<li><strong>Cross-notebook-format support</strong> (Marimo, Quarto) hints at a future where JupyterLab is the IDE and the notebook format is a choice.</li>
</ul>
<p>Ensuring extensions keep working as JupyterLab evolves is critical — the team has been <a href="https://github.com/jupyterlab/frontends-team-compass/issues/301">discussing extension compatibility testing</a> on recent contributor calls.</p>
<p>For the <a href="https://labextensions.dev?utm_source=blog&amp;utm_medium=post&amp;utm_campaign=700_extensions">Marketplace</a> itself, we&rsquo;re working on:</p>
<ul>
<li><strong>Deeper integration with JupyterLab Extension Manager</strong> — deep links and &ldquo;Install in JupyterLab&rdquo; buttons to go from discovery to installation in one click.</li>
<li><strong>Expanding Trove classifiers</strong> to indicate Jupyter Notebook and JupyterLite support. All three use the same extension system, with important caveats: Notebook extensions need to target different UI elements, and JupyterLite extensions cannot have a server component.</li>
<li><strong>Better contribution signals</strong> — surfacing commits, PRs, and issues to help users gauge how actively maintained an extension is.</li>
</ul>
<p>At 700 extensions, the community now shapes JupyterLab as much as the core team does. If you&rsquo;re building extensions, thank you! Every one of them makes Jupyter better for someone.</p>
<p><a href="https://taletskiy.com/blogs/700-jupyterlab-extensions/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Skills Hub: Building an Enterprise Trust Layer for AI Agent Skills</title>
      <link>https://taletskiy.com/blogs/skills-hub-cko-2026/</link>
      <pubDate>Sat, 21 Feb 2026 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/skills-hub-cko-2026/</guid>
      <description><![CDATA[<p>How our team won Judges' Recognition at Anaconda's CKO 2026 hackathon in Portugal by building an enterprise trust layer for AI agent skills.</p><p><a href="https://taletskiy.com/blogs/skills-hub-cko-2026/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p><img src="https://taletskiy.com/img/skills-hub-thumb.jpeg" alt="Skills Hub: Building an Enterprise Trust Layer for AI Agent Skills"></p><p>How our team won Judges' Recognition at Anaconda's CKO 2026 hackathon in Portugal by building an enterprise trust layer for AI agent skills.</p><p><em>How our team won Judges&rsquo; Recognition at Anaconda&rsquo;s CKO 2026 hackathon in Portugal</em></p>
<h2 id="the-setup">The setup</h2>
<p>Every year, Anaconda brings the entire company together for CKO — part all-hands, part hackathon, part team building. This year it was in Portugal. The hackathon gives teams three days to build something from scratch, and this year I joined a team of eight to tackle a problem I&rsquo;d been thinking about for months.</p>
<p><img src="https://taletskiy.com/img/IMG_4936.jpeg" alt="The CKO 2026 venue — round tables, green-lit stage, and the whole company together in Portugal"></p>
<p><img src="https://taletskiy.com/img/IMG_4935.jpeg" alt="The hackathon structure: three days, three paths to glory"></p>
<h2 id="the-problem-ai-assistants-dont-know-your-rules">The problem: AI assistants don&rsquo;t know your rules</h2>
<p>Here&rsquo;s a scene that plays out a hundred times a day across the Python ecosystem: you open Claude Code in a project that has a perfectly good conda environment set up. You ask it to run the tests. And it fires off <code>pytest</code> with whatever system Python it finds first. No environment activation. No awareness that conda exists.</p>
<p>I know this scene well because I&rsquo;ve lived it. The non-interactive shell that AI coding assistants spawn doesn&rsquo;t load your shell config, so <code>conda activate</code> fails with the unhelpful &ldquo;Run &lsquo;conda init&rsquo; first&rdquo; error. I built a skill called snakehug to solve this — it teaches AI assistants the correct activation patterns for conda, mamba, and pixi, asks you which environment to use once, then remembers it for every future session.</p>
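<p>A minimal reproduction of the failure, plus the kind of invocation that does work in a fresh, non-interactive shell (the install path and environment name are illustrative; snakehug discovers the right ones for your setup):</p>
<pre tabindex="0"><code># What an AI assistant effectively runs - this fails without shell init:
bash -c 'conda activate myenv &amp;&amp; pytest'
# CondaError: Run 'conda init' before 'conda activate'

# Sourcing conda's shell hook first (or using `conda run`) works:
bash -c 'source ~/miniconda3/etc/profile.d/conda.sh &amp;&amp; conda activate myenv &amp;&amp; pytest'
bash -c 'conda run -n myenv pytest'
</code></pre>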
<p>But snakehug solving <em>my</em> problem on <em>my</em> machine is different from solving it for a team, a department, or an enterprise. That&rsquo;s the gap we went after.</p>
<h2 id="the-bigger-picture-a-supply-chain-problem-for-ai-behaviors">The bigger picture: a supply chain problem for AI behaviors</h2>
<p>In December 2025, Anthropic released the Agent Skills (SKILL.md) open standard. Within eight weeks, 40+ major platforms adopted it — OpenAI Codex, GitHub Copilot, Cursor, Gemini CLI, Windsurf. The anthropics/skills repo hit 66,800 GitHub stars. Skills are simple by design: a directory with a SKILL.md file containing YAML frontmatter and markdown instructions.</p>
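<p>A minimal sketch of that layout (field names beyond <code>name</code> and <code>description</code> vary; see the SKILL.md spec for the full schema):</p>
<pre tabindex="0"><code>mkdir -p my-skill
cat &gt; my-skill/SKILL.md &lt;&lt;'EOF'
---
name: my-skill
description: One-line summary the agent reads to decide when to load this skill.
---

Markdown instructions the agent follows once the skill is activated.
EOF
</code></pre>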
<p>The simplicity is the point — and the problem. The standard deliberately omits versioning, dependency resolution, signing, and sandboxing. Third-party marketplaces have appeared claiming 160,000+ skills, but these are mostly auto-indexed GitHub repos with no curation. Aikido Security recently found hallucinated npx commands spreading through hundreds of repos via unvetted skills.</p>
<p>This is essentially the npm supply chain attack, but for instructions that can execute arbitrary code on your machine. The ecosystem has the same shape as early package management: lots of content, no trust infrastructure.</p>
<h2 id="what-we-built-skills-hub">What we built: Skills Hub</h2>
<p>Our hackathon project was Skills Hub — an enterprise trust layer for agent skills. The framing that clicked for us was: &ldquo;Package Security Manager, but for AI behaviors.&rdquo; Anaconda already provides the trust layer between open-source packages and enterprise environments. Skills need the same thing.</p>
<p>In three days, the team shipped four components:</p>
<p><strong>A backend API</strong> that stores, validates, and serves skills. Every skill goes through frontmatter validation before it&rsquo;s published. Skills are categorized by trust level — Anaconda-curated, company internal, and external — so security teams can control what reaches developers.</p>
<p><strong>A CLI extension</strong> (<code>anaconda skills</code>) that plugs into the existing Anaconda CLI. Upload, list, install, and inspect skills with commands that feel familiar to anyone who&rsquo;s used <code>conda</code>. Authentication flows through the same <code>anaconda-auth</code> infrastructure that enterprises already trust.</p>
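<p>Roughly the intended feel (the subcommands mirror the verbs above, but the exact syntax was a hackathon prototype, not a shipped interface):</p>
<pre tabindex="0"><code>anaconda skills list                # browse the catalog by trust level
anaconda skills install snakehug    # pull a skill into your local agent setup
anaconda skills upload ./my-skill   # publish after frontmatter validation
</code></pre>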
<p><strong>A web frontend</strong> with a catalog UI showing skills organized by trust level with color-coded badges, search, filtering, and upload capabilities.</p>
<p><strong>An Anaconda Desktop integration</strong> using the new Feature Modules system — a native sidebar panel that lets you browse and install skills without leaving the desktop app. This was my contribution, and dogfooding the new module system was a great way to put it through its paces.</p>
<h2 id="the-demo-that-won-it">The demo that won it</h2>
<p>For the hackathon presentation, we structured the demo as a before-and-after. I opened with one line of Portuguese (&ldquo;Olá! Eu sou Konstantin&hellip; e o meu português acaba aqui&rdquo;, which translates to &ldquo;Hi! I&rsquo;m Konstantin&hellip; and my Portuguese ends here&rdquo;), then cut straight to the problem.</p>
<p><strong>Before:</strong> A screen recording of Claude Code in a project with a conda environment. I ask it to run the tests. It uses bare <code>pytest</code> with system Python. No conda awareness. The environment is completely ignored.</p>
<p><strong>After:</strong> Same project, same prompt, but now with snakehug installed. Claude Code detects conda, asks which environment to use, activates it correctly, and the tests pass. One SKILL.md file — completely different behavior.</p>
<p>Then the pivot: &ldquo;But how does this skill reach every developer on your team safely?&rdquo; Cut to the CLI uploading snakehug to Skills Hub, then to the web UI showing it in the catalog with trust badges and validated metadata, and finally a flash of the Desktop integration.</p>
<p>The whole thing ran just under two minutes. The judges gave us Judges&rsquo; Recognition (honorable mention), which we were very happy with given the quality of the other projects.</p>
<p><img src="https://taletskiy.com/img/IMG_4940.jpeg" alt="At the CKO 2026 stage"></p>
<h2 id="how-we-built-it-ai-assisted-spec-driven-development">How we built it: AI-assisted spec-driven development</h2>
<p>The meta-story of the hackathon was almost as interesting as the project itself. We built the entire thing using AI-driven spec-driven development with open-source tools — primarily OpenCode and SpecKit.</p>
<p><img src="https://taletskiy.com/img/IMG_4939.jpeg" alt="Early brainstorming — the whiteboard where Skills Hub took shape"></p>
<p>The approach: before writing any code, you write a spec. The spec directory contains a formal specification, research notes, a data model, an implementation plan, and a task breakdown. Then the AI coding assistant implements against that spec. Each feature lived on a dedicated branch matching its spec number, with PRs reviewed and merged to main.</p>
<p>The backend accumulated 45 commits across 12 branches and 8 spec directories. The CLI had 21 commits across 5 branches and 6 spec directories. For three days of work, that&rsquo;s a remarkable amount of structured, traceable output. The specs serve double duty — they&rsquo;re the design documents <em>and</em> the context that makes AI assistance effective.</p>
<p>This isn&rsquo;t just an interesting development methodology. It&rsquo;s directly relevant to the enterprise story: if your team is going to use AI coding tools, you want reproducibility, auditability, and a paper trail. Spec-driven development gives you that.</p>
<h2 id="the-skill-i-contributed-snakehug">The skill I contributed: snakehug</h2>
<p>Snakehug started as a personal itch — I was tired of Claude Code ignoring my conda environments. The core insight is that AI assistants spawn non-interactive shells, which don&rsquo;t load your shell config. So <code>conda activate</code> fails, and the assistant falls back to whatever Python is on the system PATH.</p>
<p>The skill works in three phases:</p>
<ol>
<li><strong>First run:</strong> Detect which environment managers are installed (conda, mamba, micromamba, pixi), ask the user which environment to use, test that activation actually works</li>
<li><strong>Save config:</strong> Write the complete working activation command to the project&rsquo;s <code>CLAUDE.md</code></li>
<li><strong>Future runs:</strong> Automatically use the saved pattern — no prompting, no detection, just correct behavior</li>
</ol>
<p>The key design decision was saving the <em>complete activation command</em> that works in a fresh shell, not just the environment name. Different managers need different invocation patterns (<code>source conda.sh &amp;&amp; conda run</code> vs. <code>eval &quot;$(mamba shell hook)&quot;</code> vs. <code>pixi run</code>), and getting this wrong silently is worse than failing loudly.</p>
<p>For the hackathon, I refactored snakehug to the single SKILL.md format for compatibility with the Skills Hub API, and we used it as the flagship demo skill.</p>
<h2 id="whats-next-and-what-didnt-make-the-demo">What&rsquo;s next (and what didn&rsquo;t make the demo)</h2>
<p>The piece we deliberately left out of the two-minute demo is skill-gen — a 12-phase pipeline that generates validated SKILL.md files from your team&rsquo;s real agent conversation logs. The idea is that instead of manually writing skills, you extract them from patterns in how your developers actually correct their AI assistants. More usage → more traces → better skills. Nobody else in the ecosystem is doing this.</p>
<p>We mentioned it as a teaser in the closing (&ldquo;what we didn&rsquo;t show today&rdquo;) and it landed well as a &ldquo;one more thing&rdquo; during Q&amp;A.</p>
<p>Whether Skills Hub becomes an Anaconda product is above my pay grade. But the gap is real: enterprises need a trust layer between the wild west of community skills and the developers who use them. Someone is going to build it. I&rsquo;m glad we got to prototype what it could look like.</p>
<h2 id="thanks">Thanks</h2>
<p><img src="https://taletskiy.com/img/IMG_4994.jpeg" alt="Part of the Skills Hub team at CKO 2026"></p>
<p>This was a genuine team effort. Anil Kulkarni led the project and kept us focused. Albert DeFusco built the backend and CLI infrastructure. Denis Dupeyron contributed the upload pipeline and source type system. Max Huang built the skill-gen pipeline. Anna Ratner designed the UI. Arisha Mays implemented it. And we had a great time in Portugal.</p>
<p>Oh, and we made a fado song about Skills Hub. Because when in Portugal.</p>
<p><a href="https://taletskiy.com/blogs/skills-hub-cko-2026/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Jupyter Projspec: Bringing Project Discovery to JupyterLab</title>
      <link>https://taletskiy.com/blogs/jupyter-projspec/</link>
      <pubDate>Fri, 06 Feb 2026 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/jupyter-projspec/</guid>
      <description><![CDATA[<p>A JupyterLab extension that automatically discovers and displays project structure — built in collaboration with Martin Durant and Rosio Reyes at Anaconda</p><p><a href="https://taletskiy.com/blogs/jupyter-projspec/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p><img src="https://taletskiy.com/img/jupyter-projspec-hero.png" alt="Jupyter Projspec: Bringing Project Discovery to JupyterLab"></p><p>A JupyterLab extension that automatically discovers and displays project structure — built in collaboration with Martin Durant and Rosio Reyes at Anaconda</p><p><img src="https://taletskiy.com/img/jupyter-projspec-hero.png" alt="jupyter-projspec sidebar and chips in JupyterLab"></p>
<h2 id="what-is-jupyter-projspec">What is Jupyter Projspec?</h2>
<p>Have you ever opened a directory in JupyterLab and wondered — what kind of project is this? Is it a Python package? A data pipeline? A machine learning experiment? What can I do with it?</p>
<p><a href="https://github.com/fsspec/jupyter-projspec">Jupyter Projspec</a> is a JupyterLab extension that answers these questions automatically. It scans your working directory using <a href="https://github.com/fsspec/projspec">projspec</a> — a project discovery library by Martin Durant — and presents a structured view of what&rsquo;s inside: the project type, its contents, specifications, and available build artifacts.</p>
<p>Think of it as an intelligent project inspector for JupyterLab.</p>
<h2 id="featured-on-anacondas-podcast">Featured on Anaconda&rsquo;s Podcast</h2>
<p>Today, jupyter-projspec was featured on Anaconda&rsquo;s <a href="https://www.youtube.com/live/tF5XIH4sTyM?si=3aQ_wMLz0eeY_Z0b&amp;t=1486">Numerically Speaking</a> podcast! The segment starts around the 24:46 mark.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<h2 id="why-build-this">Why Build This?</h2>
<p>When you work with data and code every day, you encounter many different project layouts — pyproject.toml-based Python packages, conda recipes, Zarr stores, HuggingFace datasets, and more. Each has its own conventions, its own set of files to look at, and its own actions you might want to take.</p>
<p>Projspec, created by Martin Durant as part of the <a href="https://github.com/fsspec">fsspec</a> ecosystem, provides a unified way to detect and describe these project types. It can look at a directory and tell you: this is a Python package with these entry points, or this is a Zarr dataset with these arrays, or this is a conda recipe that can be built into a package.</p>
<p>The missing piece was surfacing this information where people actually work — inside JupyterLab. That&rsquo;s what jupyter-projspec does.</p>
<h2 id="how-it-works">How It Works</h2>
<p>The extension adds two UI elements to JupyterLab: a <strong>sidebar panel</strong> that displays project information in a collapsible tree view, and <strong>colored badge chips</strong> in the file browser that show detected project types at a glance. When you navigate to a directory, it calls the projspec Python backend to scan and identify what&rsquo;s there, then renders the results.</p>
<h3 id="projspecs-three-concepts">Projspec&rsquo;s Three Concepts</h3>
<p>To understand what the extension shows, it helps to know projspec&rsquo;s model. From projspec&rsquo;s perspective, every project directory has three layers:</p>
<ul>
<li><strong>Specs</strong> &ndash; the project types detected (e.g., <code>PythonLibrary</code>, <code>GitRepo</code>, <code>Pixi</code>). A single directory can match multiple specs simultaneously &ndash; a typical project might be a Git repo, a Python library, and a Pixi workspace all at once.</li>
<li><strong>Contents</strong> &ndash; read-only metadata describing what&rsquo;s in the project: environment specs (pip/conda/npm), package info, licenses, commands, and descriptive metadata. Contents tell you <em>what is here</em>.</li>
<li><strong>Artifacts</strong> &ndash; actions the project can perform: building wheels, creating conda packages, generating lock files, spinning up Docker containers, running servers. Artifacts tell you <em>what you can do</em>.</li>
</ul>
<p>Projspec currently recognizes 23+ project types out of the box, covering the Python ecosystem (pyproject.toml libraries, Poetry, uv, Pixi, conda recipes), JavaScript (Node, Yarn, JupyterLab extensions), Rust (Cargo), web frameworks (Django, Streamlit, PyScript), documentation (mdBook, ReadTheDocs), data projects (Frictionless Data Packages, HuggingFace repos), IDEs (VS Code, JetBrains, Zed), and more. Detection is plugin-based &ndash; each project type registers itself and provides a fast <code>match()</code> method that checks for marker files (like <code>pyproject.toml</code>, <code>Cargo.toml</code>, or <code>pixi.toml</code>).</p>
<h3 id="the-architecture">The Architecture</h3>
<p>Under the hood, the extension has two layers:</p>
<p><strong>Server extension</strong> (Python, Tornado): Exposes a REST endpoint at <code>GET /jupyter-projspec/scan</code>. It takes a directory path, validates it to prevent directory traversal, calls <code>projspec.Project(path).to_dict()</code>, and returns the full project tree as JSON.</p>
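<p>In other words, a scan is just an authenticated GET against the running Jupyter server; something like this sketch (the query-parameter name and token handling are assumptions, so check the extension&rsquo;s handlers for the exact contract):</p>
<pre tabindex="0"><code>curl -H "Authorization: token $JUPYTER_TOKEN" \
  "http://localhost:8888/jupyter-projspec/scan?path=." | jq .
</code></pre>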
<p><strong>Frontend extension</strong> (TypeScript, React): Two widgets subscribe to JupyterLab&rsquo;s <code>fileBrowser.model.pathChanged</code> signal, so they update automatically when you navigate directories. The sidebar panel renders each detected spec as a collapsible section with nested views for contents and artifacts. The chips widget injects colored badges below the file browser breadcrumbs &ndash; each chip is labeled with the spec name (like &ldquo;Python Library&rdquo; or &ldquo;Git Repo&rdquo;) and clicking one scrolls to and expands that spec in the sidebar. Both widgets debounce their API calls and use AbortController to cancel in-flight requests when the path changes rapidly.</p>
<h2 id="building-it-a-collaborative-effort">Building It: A Collaborative Effort</h2>
<p>This extension came together through a collaboration that I&rsquo;m really proud of.</p>
<p><strong>Martin Durant</strong> is the author of projspec (and fsspec, kerchunk, Intake, and many other foundational Python data tools). He built the discovery engine that makes this possible — the ability to look at any directory and understand what kind of project it is. Working closely with Martin meant we could iterate quickly on what the extension should expose and how the Python API should evolve to support the JupyterLab use case.</p>
<p><strong>Rosio Reyes</strong>, my colleague on the OSS-Jupyter team at Anaconda, contributed to the frontend development and UX design. Rosio also works on <a href="https://github.com/fsspec/jupyter-fsspec">jupyter-fsspec</a>, the JupyterLab extension for browsing remote filesystems — so there&rsquo;s a natural connection between browsing files (jupyter-fsspec) and understanding what those files represent (jupyter-projspec).</p>
<h2 id="whats-next">What&rsquo;s Next</h2>
<p>The current release handles project scanning, the sidebar tree view, and file browser chips. Here&rsquo;s where we&rsquo;re heading next:</p>
<ul>
<li><strong>Artifact actions</strong> &ndash; not just showing what can be built, but letting you trigger builds directly from the UI with Make buttons (e.g., &ldquo;build this conda package&rdquo; or &ldquo;generate a lock file&rdquo;). This is actively in development on a PR branch, with server-side command resolution through projspec, concurrency limits, and output capture already working.</li>
<li><strong>Remote filesystem support</strong> &ndash; leveraging fsspec to scan projects on S3, GCS, or any other supported backend, with a natural bridge to <a href="https://github.com/fsspec/jupyter-fsspec">jupyter-fsspec</a> for browsing those remote files</li>
<li><strong>More project types in projspec</strong> &ndash; expanding detection to cover data formats like Zarr, OME-NGFF, and STAC catalogs, alongside the existing HuggingFace and Frictionless Data support</li>
</ul>
<h2 id="try-it-out">Try It Out</h2>
<p>The extension is open source and available on GitHub:</p>
<ul>
<li><strong>jupyter-projspec</strong>: <a href="https://github.com/fsspec/jupyter-projspec">github.com/fsspec/jupyter-projspec</a></li>
<li><strong>projspec</strong> (the underlying library): <a href="https://github.com/fsspec/projspec">github.com/fsspec/projspec</a></li>
<li><strong>projspec docs</strong>: <a href="https://projspec.readthedocs.io">projspec.readthedocs.io</a></li>
</ul>
<p>Install it with pip:</p>
<pre tabindex="0"><code>pip install jupyter-projspec
</code></pre><p>This pulls in projspec as a dependency. After installation, restart JupyterLab and you&rsquo;ll see the Projspec panel in the right sidebar.</p>
<p>If you&rsquo;re interested in project discovery for Jupyter, we&rsquo;d love your feedback. Open an issue, try it on your own projects, or come say hi in the Jupyter community channels.</p>
<hr>
<p><em>I&rsquo;m a Senior Software Engineer on the OSS-Jupyter team at Anaconda, where I work on JupyterLab core contributions, extensions, and community tools. You can find more of my work at <a href="https://labextensions.dev">labextensions.dev</a> and follow my conference talks and open source adventures on this blog.</em></p>
<p><a href="https://taletskiy.com/blogs/jupyter-projspec/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Adding Voice to Claude Code (with Audio Ducking)</title>
      <link>https://taletskiy.com/blogs/claude-code-tts/</link>
      <pubDate>Fri, 23 Jan 2026 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/claude-code-tts/</guid>
      <description><![CDATA[<p>Text-to-speech for Claude Code that automatically lowers your music when it speaks - like Google Maps in CarPlay.</p><p><a href="https://taletskiy.com/blogs/claude-code-tts/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>Text-to-speech for Claude Code that automatically lowers your music when it speaks - like Google Maps in CarPlay.</p><h2 id="claude-code-can-talk">Claude Code Can Talk</h2>
<p>I spend a lot of time in Claude Code. Reading responses while coding is fine, but sometimes I want to hear what Claude is saying while I&rsquo;m looking at something else.</p>
<p><a href="https://git.sr.ht/~cg/claude-code-tts">claude-code-tts</a> by Chris Goff does exactly this. It uses <a href="https://github.com/nazdridoy/kokoro-tts">Kokoro TTS</a>, a fast local text-to-speech model, to read Claude&rsquo;s responses aloud. Hooks into Claude Code&rsquo;s event system - when Claude finishes responding, it extracts the text and speaks it.</p>
<p><strong>Note:</strong> The upstream project uses <code>tac</code> (GNU coreutils) which doesn&rsquo;t exist on macOS. My fork replaces it with <code>tail -r</code>.</p>
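<p>The substitution is one-for-one; both print a file in reverse line order (the file name here is illustrative):</p>
<pre tabindex="0"><code>tac transcript.txt | head -n 1      # GNU coreutils; not on stock macOS
tail -r transcript.txt | head -n 1  # BSD/macOS equivalent used in the fork
</code></pre>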
<video width="100%" controls>
  <source src="https://taletskiy.com/videos/claude-code-tts.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
<h2 id="audio-ducking">Audio Ducking</h2>
<p>With TTS working, I had a new problem: I like music while coding. Claude talking over music is hard to hear.</p>
<p>CarPlay solved this with audio ducking - when navigation speaks, music volume drops, then comes back. I added this for Claude Code TTS.</p>
<p>The implementation controls Apple Music directly via AppleScript:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Duck to 5% before TTS starts</span>
</span></span><span style="display:flex;"><span>osascript -e <span style="color:#e6db74">&#34;tell application \&#34;Music\&#34; to set sound volume to 5&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Restore when TTS finishes</span>
</span></span><span style="display:flex;"><span>osascript -e <span style="color:#e6db74">&#34;tell application \&#34;Music\&#34; to set sound volume to </span>$original<span style="color:#e6db74">&#34;</span>
</span></span></code></pre></div><p>Only Apple Music&rsquo;s internal volume changes - system volume stays the same, so TTS plays at full volume while music is ducked. A background process monitors when TTS finishes and restores the volume automatically.</p>
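<p>Conceptually, the restore half looks roughly like this sketch (the process name being polled is an assumption; the real hook tracks the TTS job it launched):</p>
<pre tabindex="0"><code># Remember the current volume, duck, then restore once TTS exits
# (assumes TTS is already playing when the loop starts)
original=$(osascript -e 'tell application "Music" to get sound volume')
osascript -e 'tell application "Music" to set sound volume to 5'
(
  while pgrep -f kokoro-tts &gt;/dev/null; do sleep 0.5; done
  osascript -e "tell application \"Music\" to set sound volume to $original"
) &amp;
</code></pre>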
<h2 id="configuration">Configuration</h2>
<p>Set in <code>~/.claude/settings.json</code>:</p>
<table>
  <thead>
      <tr>
          <th>Variable</th>
          <th>Default</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>KOKORO_VOICE</code></td>
          <td><code>af_sky</code></td>
          <td><a href="https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md">Voice to use</a></td>
      </tr>
      <tr>
          <td><code>AUDIO_DUCK_ENABLED</code></td>
          <td><code>true</code></td>
          <td>Set to <code>false</code> to disable ducking</td>
      </tr>
      <tr>
          <td><code>DUCK_LEVEL</code></td>
          <td><code>5</code></td>
          <td>Percentage of original volume during TTS</td>
      </tr>
  </tbody>
</table>
<h2 id="try-it">Try It</h2>
<p>My fork with audio ducking: <a href="https://github.com/ktaletsk/claude-code-tts">github.com/ktaletsk/claude-code-tts</a></p>
<p>Original: <a href="https://git.sr.ht/~cg/claude-code-tts">git.sr.ht/~cg/claude-code-tts</a></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>git clone https://github.com/ktaletsk/claude-code-tts
</span></span><span style="display:flex;"><span>cd claude-code-tts
</span></span><span style="display:flex;"><span>./install.sh
</span></span></code></pre></div><p>Now I can code, listen to music, and hear Claude - all without fighting for audio space.</p>
<p><a href="https://taletskiy.com/blogs/claude-code-tts/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Which local models actually work with Claude Code on a 48GB MacBook Pro?</title>
      <link>https://taletskiy.com/blogs/ollama-claude-code/</link>
      <pubDate>Thu, 15 Jan 2026 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/ollama-claude-code/</guid>
      <description><![CDATA[<p>A little experiment evaluating local models for agentic tasks in Claude Code</p><p><a href="https://taletskiy.com/blogs/ollama-claude-code/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p><img src="https://taletskiy.com/img/ollama-claude-code-hero.png" alt="Which local models actually work with Claude Code on a 48GB MacBook Pro?"></p><p>A little experiment evaluating local models for agentic tasks in Claude Code</p><p><img src="https://taletskiy.com/img/ollama-claude-code-hero.png" alt="Claude Code running with devstral-128k via Ollama"></p>
<h2 id="i-tested-18-local-models-so-you-dont-have-to">I Tested 18 Local Models So You Don&rsquo;t Have To</h2>
<p>Ollama <a href="https://github.com/ollama/ollama/releases/tag/v0.14.0">released</a> Anthropic API compatibility in January 2026, so I tested <strong>18 local models</strong> with Claude Code to find out which ones actually work for agentic coding tasks.</p>
<blockquote>
<p><strong>TL;DR</strong></p>
<ol>
<li><a href="https://ollama.com/library/devstral-small-2"><code>devstral-small-2:24b</code></a> is the winner - best quality, fastest, zero interventions</li>
<li><strong>You MUST configure context window</strong> - Ollama defaults to 4K; use 64K minimum</li>
<li><strong>Expect 12-24 min for tasks that take ~2 min with Opus 4.5</strong> - but it works!</li>
</ol></blockquote>
<ul>
<li>Ollama docs: <a href="https://docs.ollama.com/integrations/claude-code">https://docs.ollama.com/integrations/claude-code</a></li>
<li>Anthropic API compatibility: <a href="https://docs.ollama.com/api/anthropic-compatibility">https://docs.ollama.com/api/anthropic-compatibility</a></li>
</ul>
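<p>The wiring itself is small; a hedged sketch (<code>ANTHROPIC_BASE_URL</code> is Claude Code&rsquo;s documented endpoint override, and <code>--model</code> is its standard flag, but follow the Ollama docs above for the exact host, path, and any dummy API key it expects):</p>
<pre tabindex="0"><code>ollama pull devstral-small-2:24b
export ANTHROPIC_BASE_URL=http://localhost:11434   # local Ollama server
claude --model devstral-small-2:24b                # must match a pulled model
</code></pre>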
<h2 id="my-setup">My Setup</h2>
<table>
  <thead>
      <tr>
          <th>Spec</th>
          <th>Value</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Machine</td>
          <td>MacBook Pro</td>
      </tr>
      <tr>
          <td>Chip</td>
          <td>Apple M4 Pro</td>
      </tr>
      <tr>
          <td>RAM</td>
          <td>48 GB unified memory</td>
      </tr>
      <tr>
          <td>Ollama</td>
          <td>v0.14.2</td>
      </tr>
  </tbody>
</table>
<h2 id="models">Models</h2>
<p>Here&rsquo;s everything I tested, sorted by size:</p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Size</th>
          <th>Release</th>
          <th>SWE-bench</th>
          <th>Type</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>nemotron-3-nano:30b</td>
          <td>24GB</td>
          <td>Dec 2025</td>
          <td>-</td>
          <td>MoE</td>
      </tr>
      <tr>
          <td>cogito:32b</td>
          <td>20GB</td>
          <td>Jul 2025</td>
          <td>-</td>
          <td>Hybrid reasoning</td>
      </tr>
      <tr>
          <td>granite4:32b-a9b-h</td>
          <td>~20GB</td>
          <td>Oct 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>command-r:35b</td>
          <td>19GB</td>
          <td>Mar 2024</td>
          <td>-</td>
          <td>RAG-optimized</td>
      </tr>
      <tr>
          <td>qwen2.5-coder:32b</td>
          <td>19GB</td>
          <td>Nov 2024</td>
          <td>9.0%</td>
          <td>Coding</td>
      </tr>
      <tr>
          <td>deepseek-r1:32b</td>
          <td>19GB</td>
          <td>Jan 2025</td>
          <td>41.4%</td>
          <td>Reasoning</td>
      </tr>
      <tr>
          <td>qwen3-coder:30b</td>
          <td>18GB</td>
          <td>Jul 2025</td>
          <td>51.6%</td>
          <td>Coding</td>
      </tr>
      <tr>
          <td>qwen3:30b</td>
          <td>18GB</td>
          <td>Apr 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>devstral-small-2:24b</td>
          <td>15GB</td>
          <td>Dec 2025</td>
          <td>68.0%</td>
          <td>Agentic coding</td>
      </tr>
      <tr>
          <td>mistral-small3.2:24b</td>
          <td>15GB</td>
          <td>Jun 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>magistral:24b</td>
          <td>14GB</td>
          <td>Jun 2025</td>
          <td>-</td>
          <td>Reasoning</td>
      </tr>
      <tr>
          <td>gpt-oss:20b</td>
          <td>14GB</td>
          <td>Aug 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>cogito:14b</td>
          <td>9GB</td>
          <td>Jul 2025</td>
          <td>-</td>
          <td>Hybrid reasoning</td>
      </tr>
      <tr>
          <td>deepseek-coder-v2:16b</td>
          <td>8.9GB</td>
          <td>Jun 2024</td>
          <td>-</td>
          <td>Coding (no tools)</td>
      </tr>
      <tr>
          <td>rnj-1:8b</td>
          <td>5.1GB</td>
          <td>Dec 2025</td>
          <td>20.8%</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>phi4-mini:3.8b</td>
          <td>2.5GB</td>
          <td>Feb 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>granite4:3b</td>
          <td>2.1GB</td>
          <td>Oct 2025</td>
          <td>-</td>
          <td>General-purpose</td>
      </tr>
      <tr>
          <td>functiongemma:270m</td>
          <td>301MB</td>
          <td>Dec 2025</td>
          <td>-</td>
          <td>Function calling</td>
      </tr>
  </tbody>
</table>
<h2 id="experiments">Experiments</h2>
<p>I chose a very simple task: run <code>/init</code> on a repo (<code>jupyterlab-latex</code>) to generate CLAUDE.md, which is normally the first thing I do in a new repo. It&rsquo;s deceptively hard though - the model has to discover tools, explore multiple files, and synthesize documentation without hallucinating. One or two runs per model; treat results as field notes.</p>
<p>My first two models (nemotron, gpt-oss) used Ollama&rsquo;s default context window - which is how I discovered the 4K limit issue. After that, I set context to 64K+ in Ollama&rsquo;s settings.</p>
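<p>Two common ways to raise the limit (the variable and parameter names come from Ollama&rsquo;s documentation; double-check them against your Ollama version):</p>
<pre tabindex="0"><code># Server-wide default context length:
OLLAMA_CONTEXT_LENGTH=65536 ollama serve

# Or bake a larger context into a named model variant:
cat &gt; Modelfile &lt;&lt;'EOF'
FROM devstral-small-2:24b
PARAMETER num_ctx 131072
EOF
ollama create devstral-128k -f Modelfile
</code></pre>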
<h3 id="nemotron-3-nano30b"><code>nemotron-3-nano:30b</code></h3>
<p>My first attempt revealed a critical failure mode. With the default context window, the model&rsquo;s thinking block explicitly shows it decided to skip reading files entirely:</p>
<blockquote>
<p><em>&ldquo;We don&rsquo;t have details of repo&hellip; There haven&rsquo;t been any reads yet&hellip; Let&rsquo;s assume typical repo structure&rdquo;</em></p></blockquote>
<p>Instead of using tools to explore, it <strong>fabricated an entire codebase structure</strong>. The output described a React/Node.js monorepo with <code>/frontend</code> and <code>/backend</code> directories - neither of which exists in jupyterlab-latex (a Python/TypeScript JupyterLab extension). It invented commands like <code>npm run dev</code> and referenced non-existent config files.</p>
<p>This failure led me to discover Ollama&rsquo;s default 4K context limit. After configuring a 128K context window, subsequent attempts worked much better:</p>
<pre tabindex="0"><code>Read → Glob → Read → Read → Read → Read → Glob → Read → Write
</code></pre><p>The model properly explored the codebase, but still stopped mid-task and required a follow-up prompt (&ldquo;Continue&rdquo;) to finish. Final output was accurate and high quality - proving the model <em>can</em> work, but context configuration is critical.</p>
<h3 id="gpt-oss20b"><code>gpt-oss:20b</code></h3>
<p>Also tested early with the default context window. Fast but unreliable:</p>
<ul>
<li>Direct prompt: Finished quickly but low quality output</li>
<li><code>/init</code> skill: Tool parameter errors, empty results, needed intervention</li>
</ul>
<pre tabindex="0"><code>Sautéed for 2m 37s  (Claude Code&#39;s task timer)
</code></pre><h3 id="devstral-small-224b--winner"><code>devstral-small-2:24b</code> ⭐ Winner</h3>
<p>With 128K context configured from the start, this was a <strong>perfect run</strong>. The model immediately understood the task:</p>
<blockquote>
<p>&ldquo;I&rsquo;ll analyze this codebase and create a CLAUDE.md file with the essential information for future instances.&rdquo;</p></blockquote>
<p>Tool call sequence shows direct, confident tool usage:</p>
<pre tabindex="0"><code>Bash → Bash → Bash → Read → Bash → Bash → Bash → Read → Read → Read → Bash → Write
</code></pre><p>No confusion about subagents or tool parameters - it went straight for <code>Bash</code> and <code>Read</code> to explore the codebase, then used <code>Write</code> to create the output.</p>
<p>The output was 180 lines of documentation with actual function names, Python config examples, and a 5-step communication flow diagram. Every file reference checked out - no hallucinations.</p>
<p>Why did devstral outperform? Mistral trained it specifically for SWE-Bench (68.0% score) and tool-use scenarios. You can see it in the tool calls - direct and confident, no subagent confusion.</p>
<pre tabindex="0"><code>Sautéed for 17m 12s
</code></pre><h3 id="qwen3-coder30b"><code>qwen3-coder:30b</code></h3>
<p>Also configured with 128K context. The model&rsquo;s first instinct was to delegate to a subagent. From the session trace, it tried to spawn an Explore agent twice:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;Explore codebase structure&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;prompt&#34;</span>: <span style="color:#e6db74">&#34;Explore the structure of this JupyterLab LaTeX extension repository...&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;subagent_type&#34;</span>: <span style="color:#e6db74">&#34;Explore&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This isn&rsquo;t an Ollama bug, but a mismatch between <strong>what Claude Code can do in a given environment</strong> and <strong>what the model decides to attempt</strong>. Claude Code has a notion of subagents (like an &ldquo;Explore&rdquo; helper), but in my setup those weren&rsquo;t available/configured, so that tool call fails. Ollama&rsquo;s docs do advertise Claude Code usage, though, so it&rsquo;s worth calling out explicitly: with third-party models, you should expect occasional &ldquo;tooling weirdness&rdquo; like this even if the transport API is compatible.</p>
<p>When the Task tool failed (subagents weren&rsquo;t configured), qwen3-coder adapted gracefully. Tool sequence shows the recovery:</p>
<pre tabindex="0"><code>Task → Task → Bash → Read → Read → Read → Read → Read → Read → Read → Read → Write
</code></pre><p>After two failed Explore attempts, it switched to direct <code>Bash</code> and <code>Read</code> tools and completed the task without further intervention. Output quality was good - accurate, no hallucinations, but less detailed than devstral (86 lines vs 180).</p>
<pre tabindex="0"><code>Sautéed for 23m 48s
</code></pre><h3 id="granite432b-a9b-h"><code>granite4:32b-a9b-h</code></h3>
<p>An interesting comparison point - this is IBM&rsquo;s general-purpose 32B model, not a coding specialist. With 128K context configured, it completed the task in <strong>under 7 minutes</strong> - the fastest successful run.</p>
<p>The trade-off: minimal exploration. Tool sequence:</p>
<pre tabindex="0"><code>Read → Write
</code></pre><p>Just two tool calls - read the README, write CLAUDE.md. No codebase exploration, no package.json check, no architecture analysis. The output was decent:</p>
<ul>
<li>✅ Correct project type (JupyterLab LaTeX extension)</li>
<li>✅ Correct commands (<code>jlpm run build</code>, <code>jlpm run watch</code>)</li>
<li>✅ Mermaid architecture diagram</li>
<li>⚠️ Some hallucinated details (referenced <code>src/components/Toolbar.tsx</code> without verifying it exists)</li>
</ul>
<p>At 32K context, it stalled - started correctly (Glob → Read), but got stuck after reading files and never produced output. A different failure mode than devstral&rsquo;s 32K hallucination.</p>
<p><strong>Verdict:</strong> Works, but lazy. General-purpose models can complete agentic tasks but tend to &ldquo;wing it&rdquo; with minimal tool use, while coding specialists explore more thoroughly.</p>
<pre tabindex="0"><code>Sautéed for ~7m
</code></pre><h3 id="qwen330b"><code>qwen3:30b</code></h3>
<p>The general-purpose Qwen3 (not the coder variant). This was the worst performer - <strong>pure hallucination with zero exploration</strong>.</p>
<p>Tool sequence:</p>
<pre tabindex="0"><code>Write
</code></pre><p>Just one tool call. The thinking block is revealing - it explicitly acknowledged it couldn&rsquo;t see files but proceeded anyway:</p>
<blockquote>
<p><em>&ldquo;Since I can&rsquo;t actually see the files, I&rsquo;ll have to rely on the context provided.&rdquo;</em></p></blockquote>
<p>It inferred file structure from git status in the system prompt, then fabricated everything:</p>
<ul>
<li>❌ <code>python jupyterlab_latex/build.py</code> - wrong command (should be <code>jlpm run build</code>)</li>
<li>❌ <code>latex_cleanup.py</code> - fabricated filename</li>
<li>❌ <code>flake8</code> - assumed linter without checking</li>
</ul>
<p>At 128K context, it consumed <strong>31GB RAM</strong> (vs 18GB on disk) - pushing my 48GB system into swap. The memory pressure may have contributed to its laziness, but the thinking block shows it consciously chose to guess rather than explore.</p>
<p><strong>Key finding:</strong> The coder fine-tuning isn&rsquo;t just about coding knowledge - it teaches the model to actually use tools instead of guessing. qwen3-coder explored properly; qwen3 base hallucinated everything.</p>
<pre tabindex="0"><code>Sautéed for ~5m
</code></pre><h3 id="qwen25-coder32b"><code>qwen2.5-coder:32b</code></h3>
<p><strong>Failed.</strong> Despite having 128K context configured, through multiple attempts it kept reaching for the <code>Explore</code> subagent tool and then abruptly stopping without completing any work. Unlike qwen3-coder which recovered when Explore failed, qwen2.5-coder couldn&rsquo;t adapt. Same model family, different generation, completely different behavior when things go wrong.</p>
<h3 id="mistral-small3224b"><code>mistral-small3.2:24b</code></h3>
<p><strong>Failed - hallucinated tool parameters.</strong> This model understands it should use tools but invents wrong parameter schemas. From the session trace, it tried to call the Task tool with made-up parameters:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#75715e">// Attempt 1:
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>{<span style="color:#f92672">&#34;instruction&#34;</span>: <span style="color:#e6db74">&#34;...&#34;</span>, <span style="color:#f92672">&#34;max_depth&#34;</span>: <span style="color:#ae81ff">100</span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// Attempt 2:
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>{<span style="color:#f92672">&#34;subagent_name&#34;</span>: <span style="color:#e6db74">&#34;Explore&#34;</span>, <span style="color:#f92672">&#34;subagent_type&#34;</span>: <span style="color:#e6db74">&#34;Explore&#34;</span>, <span style="color:#f92672">&#34;subagent_prompt&#34;</span>: <span style="color:#e6db74">&#34;...&#34;</span>}
</span></span></code></pre></div><p>The actual required parameters are <code>description</code> and <code>prompt</code>. When it received clear error messages explaining this, it simply repeated &ldquo;I&rsquo;m going to use the Task tool&hellip;&rdquo; and stopped - unable to self-correct.</p>
<p>This is a different failure mode than hallucinating content (qwen3) or refusing (functiongemma). The model has learned <em>about</em> tools but not the actual invocation format. Worth noting: devstral-small-2 is also a Mistral model and works perfectly - the difference is devstral&rsquo;s agentic specialization.</p>
<p><strong>Memory:</strong> 37GB loaded at 128K context (vs 15GB on disk).</p>
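<p>For reference, the loaded-vs-on-disk figures quoted throughout this post are easy to check on your own machine. A minimal sketch, assuming a reasonably recent Ollama CLI (the output columns may differ slightly between versions):</p>
<pre tabindex="0"><code># On-disk model sizes, as stored in Ollama&#39;s local model store
ollama list

# Memory footprint and CPU/GPU split of whatever is currently loaded
# (run this while Claude Code is mid-task to catch the model in memory)
ollama ps
</code></pre>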
<h3 id="magistral24b"><code>magistral:24b</code></h3>
<p><strong>Failed - narrated tools instead of invoking them.</strong> This new Mistral reasoning model understood the task and knew which tools to use, but wrote out tool calls as text instead of actually executing them:</p>
<pre tabindex="0"><code>&#34;Let me use the Glob tool to find these patterns:

```bash
Glob pattern: **/README.md
Glob pattern: .github/readme*
...
```

Now that I have the relevant files, let&#39;s analyze...&#34;
</code></pre><p>Zero actual tool calls were made. The model described what it <em>would</em> do, assumed the tools had run, and proceeded to the next step. This suggests training on tool documentation without actual tool-use interactions.</p>
<p><strong>Memory:</strong> 23GB loaded at 128K context (vs 14GB on disk).</p>
<p><strong>Native context limitation:</strong> magistral&rsquo;s native context is only 39K. Even with Ollama allocating 128K, the model may not effectively use context beyond its training limit - which could explain why it never received the tool invocation format.</p>
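<p>One way to check a model&rsquo;s native context before allocating more in Ollama is to inspect its metadata. A quick check, assuming a recent Ollama CLI (older versions expose this differently):</p>
<pre tabindex="0"><code># Print model metadata, including the context length the model was built with
ollama show magistral:24b

# Look for the &#34;context length&#34; line in the Model section and compare it
# to whatever you are allocating in Ollama&#39;s settings
</code></pre>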
<h3 id="cogito32b"><code>cogito:32b</code></h3>
<p><strong>Failed - memory issues and context-limited stall.</strong> This hybrid reasoning model has different failure modes depending on context configuration:</p>
<p><strong>At 128K context:</strong> Loaded 64GB into memory (41% CPU / 59% GPU split). On my 48GB system, this caused severe memory thrashing - spiky memory pressure, swap usage, and zero tokens produced after 5+ minutes.</p>
<p><strong>At 64K context:</strong> Loaded 42GB (8% CPU / 92% GPU). Still tight but runnable. Same stalling behavior.</p>
<p><strong>At 32K context:</strong> Loaded 30GB (100% GPU). Actually started working! Made correct Glob and Read calls, explored the codebase properly:</p>
<pre tabindex="0"><code>Glob → Read README.md → &#34;Let me create a todo list...&#34;
</code></pre><p>But then it just&hellip; stopped. It said &ldquo;Let me start with writing the overview section first&rdquo; and ended without writing anything. Even nudging it with a &ldquo;continue&rdquo; prompt didn&rsquo;t help - completely stuck.</p>
<p>This is the same pattern as granite4:32b at 32K context: can explore but can&rsquo;t complete. <strong>32K context is insufficient for task completion</strong> - the model loses track of the goal mid-execution.</p>
<h3 id="cogito14b"><code>cogito:14b</code></h3>
<p><strong>Failed - multiple tool issues.</strong> Testing the smaller cogito variant to see if the 7-15B range had any surprises. It did, but not good ones.</p>
<p><strong>Memory:</strong> Even at 9GB on disk, it loaded 45GB at 128K context with 15% CPU offload. At 64K context it was more manageable.</p>
<p>Tool sequence shows multiple failure modes:</p>
<pre tabindex="0"><code>Read README.md ✅ → Read copilot-instructions.md ✅ (not found) →
WebSearch ❌ (hallucinated) → TodoWrite ❌ (wrong params, twice) →
Printed CLAUDE.md as text ⚠️
</code></pre><ol>
<li><strong>Hallucinated <code>WebSearch</code></strong> - tool doesn&rsquo;t exist in Claude Code, got empty results</li>
<li><strong>Wrong TodoWrite params</strong> - missing required <code>activeForm</code> field, tried twice without learning</li>
<li><strong>Never used Write tool</strong> - just printed the CLAUDE.md content as markdown text instead of writing to file</li>
</ol>
<p>The generated content was actually reasonable - correct commands, accurate architecture. But the model &ldquo;completed&rdquo; the task by printing output rather than writing the file. It understood the goal but couldn&rsquo;t execute properly.</p>
<p><strong>Time:</strong> ~7.7 minutes</p>
<p>The cogito family (both 32b and 14b) consistently fails with Claude Code&rsquo;s tool schemas - different sizes, different failure modes, same outcome.</p>
<h3 id="command-r35b"><code>command-r:35b</code></h3>
<p><strong>Failed - nested tool parameter schema.</strong> The last untested model in the viable 15-35B range. At 128K context it didn&rsquo;t fit on my GPU. At 64K and 32K it loaded but failed with the same tool schema issue.</p>
<p>From the trace, the model wrapped all tool parameters in a nested structure:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;tool_name&#34;</span>: <span style="color:#e6db74">&#34;Task&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;parameters&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;...&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;prompt&#34;</span>: <span style="color:#e6db74">&#34;...&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;subagent_type&#34;</span>: <span style="color:#e6db74">&#34;general-purpose&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The correct format is flat parameters at the top level. It made 4 tool calls (3 Task, 1 TodoWrite) - all failed with validation errors like &ldquo;required parameter <code>description</code> is missing&rdquo; because the nesting caused parameters to be undefined at the expected level.</p>
<p>Unlike mistral-small3.2 which invented wrong parameter <em>names</em>, command-r uses the correct parameter names but wraps them incorrectly. When it received validation errors, it didn&rsquo;t retry - just output a text-based &ldquo;Action Plan&rdquo; and stopped.</p>
<p>This suggests Cohere&rsquo;s tool-calling format differs from the Anthropic API schema. The model was trained on a different tool invocation structure.</p>
<p><strong>Context comparison:</strong></p>
<ul>
<li><strong>32K</strong>: 4 tool calls, all failed, gave up quickly (~7 min)</li>
<li><strong>64K</strong>: 29 tool calls, all failed, kept retrying same broken schema (~9.5 min)</li>
</ul>
<p>More context didn&rsquo;t help - it just gave the model more runway to keep failing the same way. It never learned from the error messages.</p>
<h2 id="results">Results</h2>
<h3 id="-worked">✅ Worked</h3>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Quality</th>
          <th>Time</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>devstral-small-2</strong> ⭐</td>
          <td>Excellent</td>
          <td>17 min</td>
          <td>No hallucinations, no interventions</td>
      </tr>
      <tr>
          <td>qwen3-coder</td>
          <td>Good</td>
          <td>24 min</td>
          <td>Recovered after Explore failed</td>
      </tr>
      <tr>
          <td>granite4:32b</td>
          <td>Good</td>
          <td>~7 min</td>
          <td>Fast but lazy, minor hallucinations*</td>
      </tr>
  </tbody>
</table>
<h3 id="-completed-with-issues">⚠️ Completed With Issues</h3>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Quality</th>
          <th>Time</th>
          <th>Issue</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>gpt-oss:20b</td>
          <td>Low</td>
          <td>~3 min</td>
          <td>Needed intervention</td>
      </tr>
      <tr>
          <td>nemotron-3-nano</td>
          <td>Mixed</td>
          <td>-</td>
          <td>Hallucinated on first attempt</td>
      </tr>
      <tr>
          <td>qwen3:30b</td>
          <td>Poor</td>
          <td>~5 min</td>
          <td>Zero tool calls, fabricated everything</td>
      </tr>
  </tbody>
</table>
<h3 id="-failed">❌ Failed</h3>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Time</th>
          <th>Failure Mode</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>qwen2.5-coder:32b</td>
          <td>-</td>
          <td>Stuck on Explore subagent</td>
      </tr>
      <tr>
          <td>mistral-small3.2:24b</td>
          <td>-</td>
          <td>Wrong tool parameter schema</td>
      </tr>
      <tr>
          <td>magistral:24b</td>
          <td>-</td>
          <td>Narrated tools instead of invoking</td>
      </tr>
      <tr>
          <td>cogito:32b</td>
          <td>-</td>
          <td>Memory thrashing, context stall</td>
      </tr>
      <tr>
          <td>cogito:14b</td>
          <td>~8 min</td>
          <td>Hallucinated WebSearch tool</td>
      </tr>
      <tr>
          <td>command-r:35b</td>
          <td>7-10 min</td>
          <td>Nested tool parameters</td>
      </tr>
      <tr>
          <td>deepseek-r1:32b</td>
          <td>-</td>
          <td>No tool support in Ollama</td>
      </tr>
      <tr>
          <td>deepseek-coder-v2:16b</td>
          <td>-</td>
          <td>No tool support in Ollama</td>
      </tr>
      <tr>
          <td>functiongemma:270m</td>
          <td>-</td>
          <td>Refuses everything</td>
      </tr>
      <tr>
          <td>granite4:3b</td>
          <td>-</td>
          <td>Hallucinates without tools</td>
      </tr>
      <tr>
          <td>phi4-mini:3.8b</td>
          <td>-</td>
          <td>Invents fake tool names</td>
      </tr>
      <tr>
          <td>rnj-1:8b</td>
          <td>-</td>
          <td>Silent, zero output</td>
      </tr>
  </tbody>
</table>
<p>*granite4:32b referenced files it never verified existed. It &ldquo;works&rdquo; in the sense that it completes the task and produces usable output, but you&rsquo;d want to review it before trusting it. devstral and qwen3-coder are trustworthy out of the box.</p>
<p><strong>Winner: devstral-small-2</strong> - best quality, smallest footprint, zero interventions.</p>
<h3 id="model-outputs">Model Outputs</h3>
<p>The actual CLAUDE.md files generated by each model (devstral-small-2, qwen3-coder, granite4-32b, nemotron-30b) can be compared side by side in the interactive viewer on <a href="https://taletskiy.com/blogs/ollama-claude-code/">the full post</a>.</p>










<h3 id="failure-modes">Failure Modes</h3>
<p>Testing revealed distinct ways models fail at agentic tasks:</p>
<table>
  <thead>
      <tr>
          <th>Failure Mode</th>
          <th>Example</th>
          <th>Probable Cause</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Refuses</strong></td>
          <td>functiongemma</td>
          <td>Too conservative, confused by system prompts</td>
      </tr>
      <tr>
          <td><strong>Hallucinates content</strong></td>
          <td>qwen3:30b, granite4:3b</td>
          <td>Skips tools, fabricates output</td>
      </tr>
      <tr>
          <td><strong>Hallucinates tools</strong></td>
          <td>phi4-mini</td>
          <td>Invents non-existent tool names</td>
      </tr>
      <tr>
          <td><strong>Hallucinates params</strong></td>
          <td>mistral-small3.2</td>
          <td>Knows tools exist, wrong schema</td>
      </tr>
      <tr>
          <td><strong>Narrates tools</strong></td>
          <td>magistral</td>
          <td>Describes tools in text, never invokes</td>
      </tr>
      <tr>
          <td><strong>Stuck on subagent</strong></td>
          <td>qwen2.5-coder</td>
          <td>Can&rsquo;t adapt when Explore fails</td>
      </tr>
      <tr>
          <td><strong>Context stall</strong></td>
          <td>cogito:32b, granite4@32K</td>
          <td>Explores correctly, stops mid-task</td>
      </tr>
      <tr>
          <td><strong>Nested params</strong></td>
          <td>command-r</td>
          <td>Wraps params in {&ldquo;tool_name&rdquo;:X,&ldquo;parameters&rdquo;:{&hellip;}}</td>
      </tr>
      <tr>
          <td><strong>Silent</strong></td>
          <td>rnj-1:8b</td>
          <td>Zero output, can&rsquo;t process system prompts</td>
      </tr>
  </tbody>
</table>
<p>The more sophisticated failures (wrong params, narration, nested params) suggest models trained on different tool-calling formats or documentation rather than actual Anthropic API interactions. Native context window also matters - magistral (39K native) failed even with 128K allocated.</p>
<h3 id="how-local-models-compare-to-cloud">How Local Models Compare to Cloud</h3>
<p>SWE-bench Verified is the standard benchmark for agentic coding - 500 real GitHub issues that models must resolve. Here&rsquo;s how local models compare to cloud:</p>
<p><strong>Frontier Cloud Models (Proprietary)</strong></p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>SWE-bench</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Gemini 3 Flash</td>
          <td>75-76%</td>
      </tr>
      <tr>
          <td>Claude Opus 4.5</td>
          <td>74-81%</td>
      </tr>
      <tr>
          <td>GPT-5.2</td>
          <td>72-75%</td>
      </tr>
      <tr>
          <td>Claude Sonnet 4.5</td>
          <td>70.6%</td>
      </tr>
      <tr>
          <td>Claude Haiku 4.5</td>
          <td>68.8%</td>
      </tr>
  </tbody>
</table>
<p><strong>Large Open Weights (Won&rsquo;t fit 48GB)</strong></p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>SWE-bench</th>
          <th>Size</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Devstral 2</td>
          <td>72.2%</td>
          <td>123B</td>
      </tr>
      <tr>
          <td>Qwen3-Coder-480B</td>
          <td>67%</td>
          <td>480B</td>
      </tr>
      <tr>
          <td>DeepSeek-V3.1</td>
          <td>66%</td>
          <td>671B</td>
      </tr>
  </tbody>
</table>
<p><strong>Local Models (Fits 48GB)</strong></p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>SWE-bench</th>
          <th>Result</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>devstral-small-2</strong></td>
          <td><strong>68.0%</strong></td>
          <td>⭐ Winner</td>
      </tr>
      <tr>
          <td>qwen3-coder:30b</td>
          <td>51.6%</td>
          <td>✅ Good</td>
      </tr>
      <tr>
          <td>deepseek-r1:32b</td>
          <td>41.4%</td>
          <td>❌ No tools</td>
      </tr>
      <tr>
          <td>qwen2.5-coder:32b</td>
          <td>9.0%</td>
          <td>❌ Stuck</td>
      </tr>
  </tbody>
</table>
<p><strong>The gap is surprisingly small.</strong> devstral-small-2 at 68% matches Claude Haiku 4.5 and trails Opus by only 6-8 points. A 24B model running locally keeps up with 100B+ models - turns out agentic training matters more than size.</p>
<p>SWE-bench score also predicts Claude Code success: models without published scores aren&rsquo;t coding-focused and failed my tests.</p>
<h2 id="conclusions">Conclusions</h2>
<p>Local models can do real agentic work now. devstral-small-2 completed the task reliably, with no hand-holding. It&rsquo;s slower than cloud (17 min vs 2 min), but it runs on my laptop completely offline.</p>
<h3 id="key-takeaways">Key Takeaways</h3>
<ol>
<li><strong>devstral-small-2 wins</strong> - best results, smallest footprint, built for this</li>
<li><strong>The gap is smaller than I expected</strong> - 68% SWE-bench matches Haiku, trails Opus by 8 points</li>
<li><strong>Context window matters</strong> - Ollama defaults to 4K; bump it to 64K or watch models hallucinate</li>
<li><strong>SWE-bench predicts success</strong> - no published score usually means it won&rsquo;t work</li>
<li><strong>Speed hurts</strong> - 17-24 minutes vs 2 minutes on cloud</li>
<li><strong>Check tool support first</strong> - not all models work with Ollama&rsquo;s Anthropic API</li>
</ol>
<h3 id="what-works">What Works</h3>
<p>devstral-small-2 and qwen3-coder both work reliably. The tool calling infrastructure is solid when the model supports it. Ollama 0.14.0 makes setup easy - no more LiteLLM translation layer.</p>
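<p>Before pointing Claude Code at a new model, it&rsquo;s worth confirming that the model actually advertises tool support. A hedged sketch - recent Ollama builds list a capabilities section in <code>ollama show</code>, though the exact output format varies by version:</p>
<pre tabindex="0"><code># Models without tool support fail silently or never call a single tool
ollama show devstral-small-2 | grep -iA5 capabilities

# Expect &#34;tools&#34; to appear in the list; if it doesn&#39;t, the model
# probably can&#39;t drive Claude Code&#39;s tool calls at all
</code></pre>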
<h3 id="what-doesnt-work-yet">What Doesn&rsquo;t Work (Yet)</h3>
<p>Most models can&rsquo;t finish multi-step agentic tasks without help. Context overflow causes hallucinations (fabricated URLs, wrong repo names). And 8-12x slower than cloud is hard to ignore.</p>
<h3 id="critical-set-context-to-64k">Critical: Set Context to 64K+</h3>
<p>Ollama defaults to 4K context regardless of what model cards advertise. Claude Code&rsquo;s system prompts overflow this, causing silent failures or hallucinations.</p>
<p><img src="https://taletskiy.com/img/ollama-context-setting.png" alt="Ollama settings showing context length slider"></p>
<table>
  <thead>
      <tr>
          <th>Context</th>
          <th>Result</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>4-16K</td>
          <td>❌ Zero tool calls</td>
      </tr>
      <tr>
          <td>32K</td>
          <td>⚠️ Starts fine, then hallucinates</td>
      </tr>
      <tr>
          <td>64K+</td>
          <td>✅ Works</td>
      </tr>
  </tbody>
</table>
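<p>If you prefer the command line over the GUI slider, the same bump can be baked into a model variant with a Modelfile. A minimal sketch - the variant name and the 64K value are just examples, and the <code>OLLAMA_CONTEXT_LENGTH</code> server default only exists in newer Ollama releases:</p>
<pre tabindex="0"><code># Create a 64K-context variant of devstral without touching the GUI
printf &#39;FROM devstral-small-2\nPARAMETER num_ctx 65536\n&#39; &gt; Modelfile.devstral-64k
ollama create devstral-small-2-64k -f Modelfile.devstral-64k

# Or set a server-wide default context length (newer Ollama releases)
# OLLAMA_CONTEXT_LENGTH=65536 ollama serve
</code></pre>
<p>Point the <code>claude --model</code> flag in the alias below at the new variant name if you go this route.</p>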
<h3 id="quick-start">Quick Start</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># 1. Install Ollama 0.14.0+ and pull devstral</span>
</span></span><span style="display:flex;"><span>ollama pull devstral-small-2
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># 2. Set context to 64K in Ollama settings (GUI slider)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># 3. Add alias to ~/.zshrc</span>
</span></span><span style="display:flex;"><span>alias claude-local<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_API_KEY=ollama CLAUDE_CODE_USE_BEDROCK=0 claude --model devstral-small-2&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># 4. Run it</span>
</span></span><span style="display:flex;"><span>source ~/.zshrc
</span></span><span style="display:flex;"><span>claude-local
</span></span></code></pre></div><p><a href="https://taletskiy.com/blogs/ollama-claude-code/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>No Code, No Problem</title>
      <link>https://ktaletsk.quarto.pub/no-code-no-problem</link>
      <pubDate>Thu, 11 Dec 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://ktaletsk.quarto.pub/no-code-no-problem</guid>
      <description><![CDATA[<p>PyData Boston 2025 lightning talk on non-code contributions to Open Source</p><p><a href="https://ktaletsk.quarto.pub/no-code-no-problem">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>PyData Boston 2025 lightning talk on non-code contributions to Open Source</p><p><a href="https://ktaletsk.quarto.pub/no-code-no-problem">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>The JupyterLab Extension Ecosystem at PyData Boston 2025</title>
      <link>https://taletskiy.com/blogs/pydata-boston-25/</link>
      <pubDate>Wed, 10 Dec 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/pydata-boston-25/</guid>
      <description><![CDATA[<p> I had an opportunity to present at PyData Boston 2025 about analyzing the JupyterLab extension ecosystem.
The talk analyzes the current state of the JupyterLab extension landscape in 2025 using public PyPI (via BigQuery) and GitHub data to quantify growth, momentum, and health by examining metrics such as monthly downloads by category, release recency, the relationship between stars and downloads, and the emergence of AI-focused extensions.
</p><p><a href="https://taletskiy.com/blogs/pydata-boston-25/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p> I had an opportunity to present at PyData Boston 2025 about analyzing the JupyterLab extension ecosystem.
The talk analyzes the current state of the JupyterLab extension landscape in 2025 using public PyPI (via BigQuery) and GitHub data to quantify growth, momentum, and health by examining metrics such as monthly downloads by category, release recency, the relationship between stars and downloads, and the emergence of AI-focused extensions.
</p><div class="video-container">
    
</div>
<p>I had an opportunity to present at PyData Boston 2025 about analyzing the JupyterLab extension ecosystem.</p>
<p>The talk analyzes the current state of the JupyterLab extension landscape in 2025 using public PyPI (via BigQuery) and GitHub data to quantify growth, momentum, and health by examining metrics such as monthly downloads by category, release recency, the relationship between stars and downloads, and the emergence of AI-focused extensions.</p>
<p><a href="https://taletskiy.com/blogs/pydata-boston-25/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Jupyter Open Studio Day SF 2025</title>
      <link>https://taletskiy.com/blogs/jupytrer-open-studio-day/</link>
      <pubDate>Mon, 10 Nov 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/jupytrer-open-studio-day/</guid>
<description><![CDATA[<p>The fun did not stop after JupyterCon. The Monday after the conference, Bloomberg invited everyone to their office to collaborate on Jupyter projects. There were many friendly faces who decided to make the trek from Southern California to the Bay while they were here for JupyterCon.
</p><p><a href="https://taletskiy.com/blogs/jupytrer-open-studio-day/">Read the full post on taletskiy.com</a></p>]]></description>
<content:encoded><![CDATA[<p>The fun did not stop after JupyterCon. The Monday after the conference, Bloomberg invited everyone to their office to collaborate on Jupyter projects. There were many friendly faces who decided to make the trek from Southern California to the Bay while they were here for JupyterCon.
</p><p>The fun did not stop after JupyterCon. The Monday after the conference, Bloomberg <a href="https://go.bloomberg.com/attend/invite/jupyter-open-studio-day-november-10-2025/">invited</a> everyone to their office to collaborate on Jupyter projects. There were many friendly faces who decided to make the trek from Southern California to the Bay while they were here for JupyterCon.</p>
<p><img src="https://taletskiy.com/img/jupytercon/jupyter-open-studio-day.jpeg" alt="Jupyter Open Studio Day venue"></p>
<p>Building on the momentum from the <a href="https://taletskiy.com/blogs/jupytrer-open-studio-day/#sprint-day">Sprint Day</a>, I continued to explore those topics during the event.</p>
<ul>
<li>I exported all GitHub PRs and issues related to the filebrowser package (<a href="https://github.com/jupyterlab/jupyterlab/issues?q=label%3Apkg%3Afilebrowser"><code>label:pkg:filebrowser</code></a>) and ran an analysis with Claude to find which of the 800+ items might be relevant to upload/copy/move UX. As a good first issue to solve (I&rsquo;d never contributed to the JupyterLab core!), I prototyped a button to cancel file uploads in JupyterLab. Below is my feature in action, and the full presentation is available <a href="https://hackmd.io/@eqt0f1ICTrun-afIFzMReA/H1A0n1ggZg">here</a>.</li>
</ul>
<div style="position: relative; padding-bottom: 62.42774566473989%; height: 0;"></div>
<ul>
<li>I had a chat with participants about how my Jupyter Marketplace can be useful for developers and what additional signals to include. I appreciated a suggestion from Ely @ Bloomberg to include a contribution activity indicator (number of commits/issues/PRs over some period of time).</li>
<li>I had an opportunity to help Hannah Chen @ Bloomberg try out and set up my <a href="https://github.com/orbrx/auto-dashboards">Auto Dashboards</a> extension for generating Streamlit dashboards from Jupyter notebooks with live preview inside JupyterLab. She is a <code>uv</code> user, so I learned how to do a development install of JupyterLab extensions using <code>uv</code> and updated my instructions.</li>
</ul>
<p><img src="https://taletskiy.com/img/jupytercon/bloomberg-office-view.jpeg" alt="View from Bloomberg office during Jupyter Open Studio Day"></p>
<p>Huge thanks to Ely and Bloomberg for the invite and organizing the event for us!</p>
<p><img src="https://taletskiy.com/img/jupytercon/sf-ferry-skyline.jpeg" alt="San Francisco Ferry Building and a skyline"></p><p><a href="https://taletskiy.com/blogs/jupytrer-open-studio-day/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>JupyterCon 2025 Reflections</title>
      <link>https://taletskiy.com/blogs/jupytercon-25/</link>
      <pubDate>Thu, 06 Nov 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/jupytercon-25/</guid>
      <description><![CDATA[<p>Another JupyterCon is in the books!
I have been a part of this community for the last 7 years, starting as a user, then building on top of Jupyter OSS projects and API and finally starting to contribute back to the core projects. I am really grateful for all the people I met along the way 🙏 This post is a reflection on my experience.
</p><p><a href="https://taletskiy.com/blogs/jupytercon-25/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>Another JupyterCon is in the books!
I have been a part of this community for the last 7 years, starting as a user, then building on top of Jupyter OSS projects and API and finally starting to contribute back to the core projects. I am really grateful for all the people I met along the way 🙏 This post is a reflection on my experience.
</p><p>Another JupyterCon is in the books!</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p>I have been a part of this community for the last 7 years, starting as a user, then building on top of Jupyter OSS projects and API and finally starting to contribute back to the core projects. I am really grateful for all the people I met along the way 🙏
This post is a reflection on my experience.</p>
<p><img src="https://taletskiy.com/img/jupytercon/me-at-jupytercon2025.jpeg" alt="Me at JupyterCon 2025"></p>
<h2 id="extension-development-tutorial">Extension Development Tutorial</h2>
<p>One of the main ways I have always participated in the community is through JupyterLab extensions. This is what makes JupyterLab the next step after Notebook &ndash; an extensible architecture starting in the core itself (JupyterLab is built as a collection of extensions) and extending outward to allow exploring new ideas (collaboration, AI) and enhancing UX millions of users can rely on (git, LaTeX, ipywidgets). As an extension author, contributor and maintainer, I&rsquo;ve seen an explosion of AI-related ideas in the Jupyter space. To better highlight the changes happening in the ecosystem, I built a community extension marketplace, <a href="https://labextensions.dev">labextensions.dev</a>, which surfaces the most important signals (categories, downloads, GitHub stars) to both users and developers.</p>
<p>So, naturally, when the JupyterCon CFP opened, I submitted a workshop proposal combining the things I am most interested in: mentoring a new generation of contributors and exploring AI coding tools in the ways they can be helpful (or not). It turns out there were 3 more very similar workshops, so we combined forces with Rosio Reyes, Jason Grout and Matt Fisher and put together a full-day tutorial! It was the first workshop I had ever organized, and I dove in head first. It was no small feat, but our amazing team made it possible. I would also like to thank Lahari Chowtoori for providing AWS Bedrock credits for the participants so they could use Claude Code, and Zach Sailer for agreeing to do a demo of Jupyter AI in action.</p>
<p>But when conference day rolled around, we were ready with a repo and a website complete with all the steps. It is fully open source (MIT license) and will be available to the community for the time being. You can find the tutorial materials here: <a href="https://jupytercon.github.io/jupytercon2025-developingextensions/">jupytercon.github.io/jupytercon2025-developingextensions</a>.</p>
<p>And I&rsquo;m also happy to report that the entire session was recorded and uploaded to YouTube</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p><img src="https://taletskiy.com/img/jupytercon/tutorial-room.jpeg" alt="JupyterCon Tutorial Room">
<em>Anatomy of the extension</em></p>
<p>The day went in a flash, but when it was all said and done we were able to see the impact clearly:</p>
<ol>
<li>Participants were able to follow our instructions: we&rsquo;ve seen <a href="https://github.com/topics/jupytercon2025">30 repos</a> created during the tutorial</li>
<li>Participants enjoyed the experience and found it empowering, as they shared in our DMs and in public <a href="https://medium.com/womenintechnology/reflections-from-jupytercon-2025-8ace9e6b27ab">posts</a></li>
<li>Some participants (especially on Windows) struggled with the environment installation steps. Extensions use a somewhat complex stack (Python, Node.js), and tools like <code>git</code> or <code>gh-cli</code> were hard to get working. I would strongly consider creating a cloud-hosted backup option (e.g. GitHub Codespaces) to give participants a ready-to-go environment if their local one is impossible to set up.</li>
<li>Despite the difficulties, at least one of the attendees (Lingtao Xie @ Esri) has since created a brand new JupyterLab extension, <a href="https://labextensions.dev/extensions/jupyterlab-todo-list">jupyterlab-todo-list</a>! After the conference she mentioned that she enjoyed the workshop and invited feedback on the extension as she keeps learning React and TypeScript — exactly the kind of follow‑through and openness that makes this community so fun to work with.</li>
</ol>
<p><img src="https://taletskiy.com/img/jupytercon/todo-extension-screenshot.png" alt="Screenshot of jupyterlab-todo-list extension in action"></p>
<ol start="5">
<li>We might also have made the wrong assumptions about the number of participants and their interests. This is because we had very limited data on the workshop participants from the conference organizers. Turns out, pre-registration for a particular workshop was not required, only for the workshop day. Additionally, badges were not scanned at the entrance to the room, so we have limited ways of knowing who attended the session. I hope this will be addressed by the Jupyter/Linux Foundation when planning the next JupyterCon!</li>
</ol>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1489.jpeg" alt="JupyterCon 2025 Workshop">
<em>Wrap up of the tutorial</em></p>
<p>Overall, I had a great time teaching people and troubleshooting with them as a TA. Most importantly, we laid a strong foundation for the next tutorials as we created a strong written guide alongside the presentation.</p>
<h2 id="jupyterhub-satellites">JupyterHub satellites</h2>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p>This is a talk I wanted to present for quite some time.
I had a chance to do a brief intro to Notebooks Hub&rsquo;s approach to running non-notebook applications at the <a href="https://taletskiy.com/blogs/jupytercon-23/">previous JupyterCon</a>,
but the opportunity to present this in full detail came only after I departed the team.
I am grateful to Axle for the opportunity to still present this material, especially because the topic found such strong interest in the community.</p>
<p>Initially, I was planning to organize a Birds-of-a-Feather (BoF) session with JupyterHub deployers,
but it ended up being a talk. As it turned out, Yuvi Panda @ 2i2c was giving his own perspective (<a href="https://jupytercon2025.sched.com/event/28H4l/not-just-for-notebooks-jupyterhub-in-2025-yuvi-p-2i2c">Not Just Notebooks</a>) on running applications, and his talk really echoed mine.</p>
<p>&ldquo;JupyterHub satellites&rdquo;, as I call them, are really just applications other than Jupyter Notebook/Lab (RStudio, VSCode, Streamlit) orchestrated by JupyterHub. Even though the community has had tools and recipes for a long time now, our approach was a little different, as we relied on a standalone proxy (jhsingle_native_proxy) and standardized Docker containers. Both my talk and Yuvi&rsquo;s highlighted the need to update documentation and centralize the existing recipes (scripts, Docker images) to unlock even more satellites (e.g. <a href="https://marimo.io/">Marimo</a>, <a href="https://www.dyad.sh/">Dyad</a>, <a href="https://github.com/posit-dev/positron">Positron</a>, etc.) through community efforts.</p>
<p>Recent updates to Jupyter Server Proxy, which added a standalone mode, finally allowed for both standalone and integrated experiences in JupyterHub with a unified codebase under an official Jupyter repo. Since jhsingle_native_proxy was not actively maintained, this provides an off-ramp for existing users to join the community effort.</p>
<p>After my talk, I heard from JupyterHub users who are interested in collaborating on open-sourcing the recipes for wrapping dashboards in Docker containers. I hope to meet them at the <a href="https://jupyter.zulipchat.com/#narrow/channel/469744-jupyterhub/topic/Hub.20Dash.3A.202-3.20December.202025/near/554490824">Hub Dash</a> on December 2-3.</p>
<p>To wrap up, I would like to thank Yuvi Panda and Chris Holdgraf @ 2i2c for productive conversations on the topic before and after this talk!</p>
<h2 id="favorite-talks">Favorite talks</h2>
<p>Even though I spent a lot of time at the Anaconda booth and in hallway conversations, I still managed to sneak out for a few talks that really stuck with me.</p>
<p><div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<em>Incredibly inspiring</em></p>
<p><div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<em>Very useful practical advice for breaking into OSS contributions</em></p>
<p><div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<em>Do not compare F1 car and SUV; Jupyter is not an IDE</em></p>
<h2 id="first-conference-as-anacondiac">First conference as Anacondiac</h2>
<p>This was my first conference as an OSS Jupyter developer working at Anaconda.</p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1427.jpeg" alt="Me at JupyterCon 2025, wearing Anaconda jacket and showing my badge"></p>
<p>We had a strong showing this year with talks across multiple tracks, a sponsored talk at the Demo Theater, and a delightful booth where I got a chance to meet so many of our users!</p>
<h3 id="demo-usage-patterns-in-the-jupyter-ecosystem-jack-evans-anaconda">Demo: Usage Patterns in the Jupyter Ecosystem (Jack Evans, Anaconda)</h3>
<p>One of the most thought‑provoking sessions for me was Jack’s demo based on internal telemetry about how people actually use Jupyter. One stat that really stuck with me: based on Anaconda’s data, <strong>around 79% of users still prefer the classic Notebook interface over JupyterLab</strong>, which is a humbling reminder to keep investing in Notebook UX even as we push the ecosystem forward.</p>
<p>You can dig into the full deck here:</p>

<h3 id="lightning-talk-whats-new-in-jupyter-frontends-jeremy-tuloup-quantstack--rosio-reyes-anaconda">Lightning Talk: What&rsquo;s New in Jupyter Frontends (Jeremy Tuloup, QuantStack &amp; Rosio Reyes, Anaconda)</h3>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p>This was a fast but very dense overview of what’s landing across JupyterLab and the classic Notebook experience. As someone who works on extensions, it was great to see how improvements in the Lab frontend keep flowing back into the “plain notebook” UX that so many users still rely on.</p>
<h3 id="the-lifecycle-of-a-jupyter-environment-from-exploration-to-productiongrade-pipelines-dawn-wages-anaconda">The Lifecycle of a Jupyter Environment: From Exploration To Production‑Grade Pipelines (Dawn Wages, Anaconda)</h3>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p>Dawn’s talk did a great job walking through the journey from an exploratory notebook to a maintainable ETL pipeline, with practical tooling like Papermill, nbconvert and PyScript/Voila/Panel in the mix. I especially appreciated the emphasis on planning for production from the start instead of treating “pipeline-izing” as an afterthought.</p>
<h3 id="runtime-agents-unleashing-event-sourced-collaboration-for-jupyter-kyle-kelley-anaconda">Runtime Agents: Unleashing Event Sourced Collaboration for Jupyter (Kyle Kelley, Anaconda)</h3>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      
    </div>

<p>Kyle made a strong case for “moving notebooks to the server side” so that state lives independently of the browser tab. I loved seeing concrete demos of long‑running, resilient sessions and collaborative editing that felt much closer to how people actually work with notebooks day‑to‑day.</p>
<h3 id="conversations-at-the-booth">Conversations at the booth</h3>
<p><img src="https://taletskiy.com/img/jupytercon/at-anaconda-booth.jpeg" alt="Anaconda crew at the booth">
<em>Anaconda crew at the booth. Left to right: myself, Peter Wang, Dan Yeaw, Daina Bouquin, Rosio Reyes</em></p>
<p>Since the conference was well attended by local students (kudos to JupyterCon!), the topic of job-search advice came up a lot. As someone who has mentored and interviewed engineers throughout my career, I highlighted the importance of pursuing personal projects and open source contributions. Being able to see your ideas, contributions and code style in the open is such a powerful signal to hiring managers. I shared my favorite personal anecdote about how a very well crafted <a href="https://colab.research.google.com/github/ktaletsk/CPF/blob/master/1D_Example/CPF_1D_toy.ipynb">Colab notebook</a> helped me get my first job after grad school. Below are some reflections from the students.</p>
<ul>
<li><a href="https://www.linkedin.com/in/fariha-sheikh-usc/">Fariha Sheikh</a>&rsquo;s reflections 👉 <a href="https://www.linkedin.com/feed/update/urn:li:share:7392660528554225664">View this post on LinkedIn</a></li>
</ul>

<ul>
<li><a href="https://www.linkedin.com/in/abaghyangor/">Gor Abaghyan</a>&rsquo;s experience. At the booth and later at Sprints, we talked through his <a href="https://pokeagent.github.io/">PokéAgent Challenge</a> setup and I suggested Docker as a way to both debug and pick up a tool that would pay off later; after the conference he messaged that he’d built the <code>mgba</code> bindings from source, had the ReAct agent running, and completed about 30% of the run, describing it as a really great experience.</li>
</ul>
<h3 id="exploring-the-city-and-connecting-to-fellow-anacondiacs">Exploring the city and connecting to fellow Anacondiacs</h3>
<p>As we wrapped each day, the Anaconda team would pack into a local restaurant and invite fellow Jovians for dinner and conversation about Jupyter, Python and Open Source.</p>
<p><img src="https://taletskiy.com/img/jupytercon/el-agave-dinner.jpeg" alt="Dinner at El Agave with Jupyter community">
<em>El Agave dinner: A memorable night with great food and amazing people from the Jupyter community.</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/anaconda-x-deepnote-dinner.jpg" alt="Anaconda × Deepnote Dinner at JupyterCon 2025">
<em>Dinner with the Deepnote team. Photo credit: Dawn Wages</em></p>
<h2 id="venue-and-city">Venue and city</h2>
<p>Set in beautiful San Diego, this was a great place to be at the beginning of November. Paradise Point Resort did a great job creating such a welcoming experience. Continuing the success of JupyterCon 2023, this year&rsquo;s catering was perfect. Not only did they provide breakfast and lunch, but the variety of snacks and desserts was like no other conference!</p>
<p><img src="https://taletskiy.com/img/jupytercon/jupytercon-lunch.jpg" alt="Conference lunch at JupyterCon 2025">
<em>Photo credit: Dawn Wages</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1419.jpeg" alt="Paradise Point Resort beach at night">
<em>Paradise Point Resort beach at night – the end of a perfect JupyterCon day</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1402.jpeg" alt="Paradise Point Resort at sunset">
<em>Lights and decorations at Paradise Point Resort</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1426.jpeg" alt="Paradise Point Resort grounds with conference banners">
<em>Flowers at Paradise Point Resort</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/san-diego-gaslamp.jpeg" alt="San Diego Gaslamp District">
<em>San Diego Gaslamp District</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1530.jpeg" alt="Ghirardelli store in Gaslamp district">
<em>Ghirardelli store in Gaslamp district</em></p>
<p>After wrapping up the conference, I spent some quality time exploring San Diego with my family. The San Diego Zoo was a favorite, with its lush landscapes, panda exhibit, and countless other animal encounters.</p>
<p><img src="https://taletskiy.com/img/jupytercon/san-diego-zoo.jpeg" alt="San Diego Zoo Adventure">
<em>San Diego Zoo was a big highlight for the family!</em></p>
<p><img src="https://taletskiy.com/img/jupytercon/san-diego-zoo-panda.jpeg" alt="San Diego Zoo Panda Exhibit">
<em>The panda exhibit was my daughter&rsquo;s favorite</em></p>
<p>We also managed to visit Legoland, making the trip a perfect mix of work and play!</p>
<p><img src="https://taletskiy.com/img/jupytercon/legoland.jpeg" alt="Legoland Adventure">
<em>We topped off the trip with a visit to Legoland!</em></p>
<h2 id="sprint-day">Sprint Day</h2>
<p>The energy from the conference carried right into Sprint Day. Kirstie Whitaker and Zach Sailer kicked off the Sprints by opening the floor to anyone who had an idea of what to work on together. One by one, participants lined up to explain <a href="https://jupyter.zulipchat.com/#narrow/channel/531269-jupytercon/topic/.E2.9C.94.20sprints/with/558372277">their ideas</a> in 30 seconds. The diversity of topics was remarkable, spanning from infrastructure challenges like Kubernetes directory management and JupyterHub cost optimization, to emerging AI integrations including browser-based AI and Jupyter AI coordination. Others focused on improving the documentation and publishing ecosystem with MyST and JupyterBook enhancements, while several participants tackled developer experience improvements from package audits to Git workflows. What struck me was the balance between technical infrastructure work and efforts to make Jupyter more accessible – WYSIWYG editors, better documentation, and collecting user stories to understand pain points. This breadth really showcased how the Jupyter ecosystem continues to evolve in multiple directions simultaneously, driven by the diverse needs of its community.</p>
<h3 id="tackling-file-browser-ux-challenges">Tackling File Browser UX Challenges</h3>
<p>During the sprints, Andrew Thornton from Maxar raised an issue that resonated with many in the room: accidental drag-and-drop operations in JupyterLab can trigger large file copies with no visual feedback, no progress indicators, and no way to cancel them. His users were experiencing disk space issues and frozen servers from these unintentional operations.</p>
<p>This sparked a productive discussion where Taran Rorem shared that he had already solved this issue with a custom extension that wraps the file browser&rsquo;s rename and move methods. He generously shared his approach using the <code>IFileBrowserFactory</code> interface, demonstrating how a relatively simple plugin could intercept these operations and add the missing feedback layer. We created a <a href="https://jupyter.zulipchat.com/#narrow/channel/531269-jupytercon/topic/Lab.20file.20browser.20drag-drop.20copy.20announcement">Zulip topic</a> to continue tracking this issue and coordinate solutions.</p>
<p>This conversation opened up a broader examination of file operations UX in JupyterLab. Through discussions with various users and my own analysis of issues and PRs, I identified several critical gaps:</p>
<ul>
<li><strong>No cancellation for uploads</strong>: Users accidentally uploading large files have no choice but to wait or kill the server</li>
<li><strong>Missing progress indicators</strong>: File copies and moves happen silently, leaving users uncertain if operations are running or complete</li>
<li><strong>No operation queue visibility</strong>: When handling multiple file operations, users can only see one progress bar at a time</li>
<li><strong>Risk of data corruption</strong>: Users may shut down servers thinking operations are complete when they&rsquo;re still in progress</li>
</ul>
<p>These are some rough edges causing daily frustrations for users working with large datasets, remote servers, or production workflows. I compiled these user stories and began prototyping solutions, which I later presented at the Jupyter Open Studio Day at Bloomberg (more on that in a future post).</p>
<h3 id="jupyter-ai-v3-and-personas">Jupyter AI v3 and Personas</h3>
<p>As we started working in groups, the Jupyter AI team took the stage to run through the setup and development of Personas for the upcoming Jupyter AI v3. This topic was so popular that it captivated the room that morning, with multiple people (including myself) circling the stage with their chairs, laptops going full speed.</p>
<p>I helped to start a <a href="https://jupyter.zulipchat.com/#narrow/channel/531269-jupytercon/topic/.E2.9C.94.20jupyter-ai-sprint/with/558372308">Zulip thread</a> documenting the setup steps. While still in its early days, the Persona approach is a very powerful concept, deliberately steering away from the currently popular &ldquo;agent&rdquo; approach of 2025 and combining traits of AI models and tools under one umbrella. If you want to learn more about the philosophy of Jupyter&rsquo;s approach to AI, watch an <a href="https://thenewstack.io/from-physics-to-the-future-brian-granger-on-project-jupyter-in-the-age-of-ai/">interview with Brian Granger</a> on his vision for Jupyter, AI, and collaboration between humans and machines at TheNewStack.</p>
<p>How does one create a Jupyter AI Persona? It turns out to be very simple. You just need to write a Python class inheriting from <code>BasePersona</code> that overrides the <code>PersonaDefaults</code> metadata (so that your Persona has its own name) and <code>process_message</code>, which receives the input <code>Message</code> and sends a response back to the chat via <code>self.send_message</code>.</p>
<p>With this simple API comes a great power and a great responsibility.</p>
<p>Power comes from not being tied to a particular AI framework (i.e. LangChain in Jupyter AI v1 and v2). You can readily grab a simple SDK usage example from a provider&rsquo;s docs and add it to your Persona. Boom &ndash; you&rsquo;ve got yourself support for a new provider in Jupyter AI. After seeing the demos, I immediately wanted to experiment with Cerebras AI, which enables very fast inference at 1000-2000 tokens/s.</p>
<p>To explore the possible issues with the new API, I created a silly &ldquo;hacker&rdquo; persona that immediately deletes all .ipynb files in the directory when mentioned. Sadly, it just worked, so the community needs to figure out an approach to this issue &ndash; either enabling guardrails or building a trusted ecosystem of personas (after all, users are always responsible for what they install with <code>pip install</code>; this is no different).</p>
<video controls width="700">
  <source src="https://jupyter.zulipchat.com/user_uploads/1430/6PB3qgncf87SuoxC-rtgGukO/h4cker-persona.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
<p><em>Demo of the &ldquo;hacker&rdquo; persona in action</em></p>
<h3 id="community-collaboration-in-action">Community Collaboration in Action</h3>
<p>What struck me most about Sprint Day was how quickly the community rallied around these seemingly &ldquo;small&rdquo; UX issues that have big impacts on daily productivity. Within hours, we had identified the problems, shared existing solutions, created tracking issues, and started planning implementations. This is the Jupyter community at its best – practitioners identifying real problems and immediately working together toward solutions.</p>
<h2 id="community">Community</h2>
<p>Throughout the conference, I had a great time talking with members of the Jupyter community, in particular our tutorial working group and the <a href="https://jupyter.org/about#community-building-working-group-members">Community Building Working Group Members</a>, which (not surprisingly!) overlap quite a bit.</p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_4026.jpeg" alt="Presenting at JupyterCon 2025"></p>
<p><img src="https://taletskiy.com/img/jupytercon/ans-electronics-repair-shop.jpeg" alt="AN’S ELECTRONICS REPAIR ice cream shop">
<em>Decompressing after our workshop with Rosio Reyes and Matt Fisher at An&rsquo;s Electronics Repair ice cream shop. Not a real repair shop, but it shows its menu items on CRTs!</em></p>
<p>It was so awesome to meet the people whom I regularly see on my screen, in GitHub issues, in Zoom calls!</p>
<p><img src="https://taletskiy.com/img/jupytercon/triage-call-crew.jpeg" alt="Triage Call Crew at JupyterCon 2025"></p>
<p>And of course, it was such a privilege to shake hands and talk to the Project Jupyter leaders and creators:  Fernando Pérez, Brian Granger, Min Ragan-Kelley and many others.</p>
<p><img src="https://taletskiy.com/img/jupytercon/IMG_1525.jpeg" alt="At JupyterCon with Brian Granger, Sylvain Corlay and Jason Weill in the back">
<em>At JupyterCon with Brian Granger, Sylvain Corlay and Jason Weill in the back</em></p>
<p><a href="https://taletskiy.com/blogs/jupytercon-25/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Beyond Code Comparison: Mito&#39;s Functional Evaluation Approach to AI Testing</title>
      <link>https://taletskiy.com/blogs/functional-llm-evals/</link>
      <pubDate>Sun, 06 Apr 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/functional-llm-evals/</guid>
      <description><![CDATA[<p>Learn how Mito's execution-based evaluation approach focuses on the functional results of AI-generated code rather than superficial similarity.</p><p><a href="https://taletskiy.com/blogs/functional-llm-evals/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>Learn how Mito's execution-based evaluation approach focuses on the functional results of AI-generated code rather than superficial similarity.</p><p>Evaluating the performance and capabilities of Large Language Models (LLMs) is a crucial process, and while there is no single unified approach, patterns can be adapted for specific use cases. For general-purpose models, benchmarks like MMLU are popular, but for domain- and task-specific models, more focused evaluations often provide greater benefit. When it comes to AI-generated code for tasks like data analysis, traditional evaluation methods often fall short by focusing on superficial code similarity. Mito embraces a different paradigm, centering its evaluation on what truly matters: the <strong>functional results of the generated code</strong>.</p>
<h2 id="the-problem-with-traditional-code-evaluation">The Problem with Traditional Code Evaluation</h2>
<p>Traditional code evaluation frequently falls into the trap of expecting AI-generated code to be a carbon copy of a human-written reference solution. This &ldquo;string matching trap&rdquo; overlooks the fundamental truth that there are countless valid ways to write code that achieves the same outcome. Differences in stylistic choices like variable names, code formatting, or even the algorithmic approach taken do not necessarily impact the code&rsquo;s functionality. Insisting on exact string matches can stifle AI creativity and miss equally valid, albeit structurally different, solutions.</p>
<h2 id="mitos-execution-based-evaluation">Mito&rsquo;s Execution-Based Evaluation</h2>
<p>Instead of getting bogged down in syntactic comparisons, Mito&rsquo;s evaluation system focuses on what code actually does: it executes both the AI-generated code and a reference solution within isolated environments and then compares their effects.</p>
<p>This execution-based approach assesses two critical dimensions:</p>
<h3 id="1-global-variable-state-comparison">1. Global Variable State Comparison</h3>
<p>After the execution of both code snippets, Mito intelligently compares the variables that were created or modified. This comparison is not a simple object identity check but understands the nuances of different data types:</p>
<ul>
<li>For basic data types like integers and strings, a direct equality check is performed.</li>
<li>For pandas DataFrames, DataFrame-specific equality methods are used to compare the data content, irrespective of object identity.</li>
<li>NumPy arrays are compared using functions that correctly handle special values like NaN.</li>
<li>For custom objects, their defined equality behavior is respected.</li>
</ul>
<p>This sophisticated comparison means that if two code solutions generate a DataFrame with the same data, they will pass this test, even if their underlying implementations (e.g., using df.query() versus boolean indexing) differ.</p>
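<p>As an illustration only (this is a sketch, not Mito&rsquo;s actual source), a type-aware comparison along these lines might look like:</p>
<pre tabindex="0"><code>import numpy as np
import pandas as pd

def values_match(expected, actual):
    &#34;&#34;&#34;Compare two values using type-appropriate equality checks.&#34;&#34;&#34;
    if isinstance(expected, pd.DataFrame):
        # Compare data content, not object identity
        return isinstance(actual, pd.DataFrame) and expected.equals(actual)
    if isinstance(expected, np.ndarray):
        # array_equal with equal_nan treats matching NaNs as equal
        return np.array_equal(expected, actual, equal_nan=True)
    # Basic types and custom objects fall back to their own __eq__
    return expected == actual
</code></pre>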
<h3 id="2-output-comparison">2. Output Comparison</h3>
<p>Beyond variable state, Mito also captures and compares the standard output (anything printed to the console) from both executions. This ensures that any visualization or reporting functionality works identically.</p>
<p><img src="https://taletskiy.com/img/evals.png" alt="Mito’s Execution-Based Evaluation"></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>summary <span style="color:#f92672">=</span> df<span style="color:#f92672">.</span>groupby(<span style="color:#e6db74">&#39;region&#39;</span>)<span style="color:#f92672">.</span>agg({
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;revenue&#39;</span>: <span style="color:#e6db74">&#39;sum&#39;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;order_value&#39;</span>: <span style="color:#e6db74">&#39;mean&#39;</span>
</span></span><span style="display:flex;"><span>})<span style="color:#f92672">.</span>reset_index()
</span></span></code></pre></div><h2 id="real-world-example-the-power-of-functional-equivalence">Real-World Example: The Power of Functional Equivalence</h2>
<p>Consider this example:</p>
<p>User request: &ldquo;Create a DataFrame summarizing sales by region, showing total revenue and mean order value.&rdquo;</p>
<p>Reference solution:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>summary <span style="color:#f92672">=</span> df<span style="color:#f92672">.</span>groupby(<span style="color:#e6db74">&#39;region&#39;</span>)<span style="color:#f92672">.</span>agg({
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;revenue&#39;</span>: <span style="color:#e6db74">&#39;sum&#39;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;order_value&#39;</span>: <span style="color:#e6db74">&#39;mean&#39;</span>
</span></span><span style="display:flex;"><span>})<span style="color:#f92672">.</span>reset_index()
</span></span></code></pre></div><p>AI-generated solution:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>region_groups <span style="color:#f92672">=</span> df<span style="color:#f92672">.</span>groupby(<span style="color:#e6db74">&#39;region&#39;</span>)
</span></span><span style="display:flex;"><span>total_revenue <span style="color:#f92672">=</span> region_groups[<span style="color:#e6db74">&#39;revenue&#39;</span>]<span style="color:#f92672">.</span>sum()
</span></span><span style="display:flex;"><span>avg_order <span style="color:#f92672">=</span> region_groups[<span style="color:#e6db74">&#39;order_value&#39;</span>]<span style="color:#f92672">.</span>mean()
</span></span><span style="display:flex;"><span>summary <span style="color:#f92672">=</span> pd<span style="color:#f92672">.</span>DataFrame({
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;region&#39;</span>: total_revenue<span style="color:#f92672">.</span>index,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;revenue&#39;</span>: total_revenue<span style="color:#f92672">.</span>values,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#39;order_value&#39;</span>: avg_order<span style="color:#f92672">.</span>values
</span></span><span style="display:flex;"><span>})
</span></span></code></pre></div><p>Under traditional string comparison, these would be considered completely different. But Mito&rsquo;s execution-based evaluation recognizes that both produce functionally identical summary DataFrames with the same data.</p>
<h2 id="why-this-approach-matters">Why This Approach Matters</h2>
<p>This execution-focused evaluation brings several critical advantages:</p>
<ol>
<li>Embraces AI Creativity: It allows AI to find novel solutions rather than forcing it to mimic a specific style.</li>
<li>Focuses on User Intent: What matters is whether the AI satisfied the user&rsquo;s request, not how it constructed the solution.</li>
<li>Handles Edge Cases Naturally: The comparison automatically handles complexities like floating-point precision differences and object equivalence.</li>
<li>Mirrors Real-World Usage: Users care about results, not code aesthetics—this approach aligns evaluation with actual success criteria.</li>
<li>Enables Objective Measurement: Success is binary and objectively determinable: either the code produces the correct output and variable state, or it doesn&rsquo;t.</li>
</ol>
<h2 id="implementing-your-own-execution-based-evaluation">Implementing Your Own Execution-Based Evaluation</h2>
<p>The core of Mito&rsquo;s approach can be adapted by others building AI coding tools:</p>
<ol>
<li>Execute reference and AI code in isolated environments</li>
<li>Capture the resulting variable state and outputs</li>
<li>Compare them using type-appropriate equality checks</li>
<li>Base success on functional equivalence rather than code similarity</li>
</ol>
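<p>A minimal sketch of those four steps (illustrative only &ndash; a real harness would add sandboxing, timeouts, per-run copies of the input data, and richer reporting) could look like this, reusing the <code>values_match</code> helper sketched earlier:</p>
<pre tabindex="0"><code>import io
from contextlib import redirect_stdout

def run_snippet(code, setup_globals):
    &#34;&#34;&#34;Execute a code snippet in its own namespace, capturing variables and stdout.&#34;&#34;&#34;
    namespace = dict(setup_globals)  # shallow copy; real harnesses re-create inputs per run
    captured = io.StringIO()
    with redirect_stdout(captured):
        exec(code, namespace)
    return namespace, captured.getvalue()

def functionally_equivalent(reference_code, ai_code, setup_globals, variables):
    &#34;&#34;&#34;Success means matching printed output and matching values for the named variables.&#34;&#34;&#34;
    ref_vars, ref_out = run_snippet(reference_code, setup_globals)
    ai_vars, ai_out = run_snippet(ai_code, setup_globals)
    if ref_out != ai_out:
        return False
    return all(values_match(ref_vars[name], ai_vars.get(name)) for name in variables)
</code></pre>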
<p>This shift from syntactic comparison to functional evaluation represents a fundamental advancement in how we should evaluate AI-generated code—focusing on what matters most: whether the code does what the user asked for.</p>
<p>By focusing on execution results rather than implementation details, Mito has created an evaluation framework that truly measures what matters for users—reliable results over superficial code similarity.</p>
<p><a href="https://taletskiy.com/blogs/functional-llm-evals/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Building Auto Dashboards - A Hackathon Journey</title>
      <link>https://taletskiy.com/blogs/hackathon_experience/</link>
      <pubDate>Sat, 08 Feb 2025 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/hackathon_experience/</guid>
      <description><![CDATA[<p>Reflecting on my experience of creating the Auto Dashboards project during a hackathon.</p><p><a href="https://taletskiy.com/blogs/hackathon_experience/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>Reflecting on my experience of creating the Auto Dashboards project during a hackathon.</p><h2 id="how-it-all-started">How It All Started</h2>
<p>The first Bay Area H4CK D4Y has wrapped up, and what an exhilarating experience it was! As we stepped into 2025, I was eager to capture the enthusiasm and hope for the new year. Most importantly, I wanted to embrace that uncomfortably exciting itch to build and make something happen—a sentiment shared by my fellow participants.</p>
<p>Ten passionate, friendly, and capable hackers gathered for a day of building. We explored every possible Gen AI technology under the sun and worked on real-world projects. The day was so productive that we didn&rsquo;t have enough time to finish the demos!</p>
<p><img src="https://taletskiy.com/img/hackathon_1.jpeg" alt="Hackathon Group Photo">
<em>The amazing team of hackers at the Bay Area H4CK D4Y.</em></p>
<p><img src="https://taletskiy.com/img/hackathon_2.jpeg" alt="Demo Time">
<em>My demo session showcasing Auto-Dashboard POC.</em></p>
<h2 id="the-hackathon-experience">The Hackathon Experience</h2>
<p>I had a blast at H4CK D4Y Bay Area! We tested all the latest coding agents, and I was particularly impressed with Cline in combination with Gemini 2.0 Pro. Others experimented with Devin and Windsurf.</p>
<p>Here&rsquo;s a glimpse of the projects we worked on:</p>
<ol>
<li>Automated university assignment grader</li>
<li>Tool to infer security breaches from logs</li>
<li>Book ranking tracker</li>
<li>Tool to convert Jupyter Notebooks to Streamlit apps</li>
<li>Voice transcription</li>
<li>Document classification and filing tool</li>
</ol>
<h2 id="auto-dashboards">Auto-Dashboards</h2>
<p>During the event, I built and demoed a tool for single-click notebook-to-dashboard conversion, with results rendered side-by-side in JupyterLab. The source will be published <a href="https://github.com/orbrx/auto-dashboards">here</a>.</p>
<p>I started with the existing Streamlit rendering extension from Elyra and added a new endpoint and UI to convert a notebook to a Python script and send it to an LLM for code-to-code translation. The output is a Streamlit dashboard, with the tables, headings, and widget APIs from Jupyter replaced by their Streamlit equivalents.</p>
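<p>A stripped-down version of that pipeline might look roughly like the sketch below; <code>llm_client.complete</code> stands in for whatever LLM call performs the code-to-code translation and is not part of the published extension.</p>
<pre tabindex="0"><code>from nbconvert import PythonExporter

def notebook_to_dashboard(notebook_path, llm_client):
    # 1. Flatten the notebook into a plain Python script
    script, _ = PythonExporter().from_filename(notebook_path)
    # 2. Ask the LLM to rewrite it as a Streamlit app (hypothetical client/helper)
    prompt = (
        &#34;Rewrite this Jupyter-exported script as a Streamlit dashboard, &#34;
        &#34;replacing Jupyter display and widget calls with Streamlit equivalents:\n\n&#34; + script
    )
    return llm_client.complete(prompt)  # returns the Streamlit app source code
</code></pre>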
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install auto-dashboards jupyterlab
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>A big thank you to Jasmine Robinson, Luke Fernandez, Paul &ldquo;π&rdquo; Ivanov, Smit Lunagariya, CL Kao, Salman Munaf, and Scott Behrens for an incredibly fun day. Special thanks to Itay Dafna, the organizer, who brought together a diverse group of participants from companies like Netflix, Google, and TikTok from across the Bay Area.</p>
<p><a href="https://taletskiy.com/blogs/hackathon_experience/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Notebooks Hub at JupyterCon 2023</title>
      <link>https://taletskiy.com/blogs/jupytercon-23/</link>
      <pubDate>Wed, 10 May 2023 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/jupytercon-23/</guid>
      <description><![CDATA[<p> I had an opportunity to present a lightning talk at JupyterCon 2023 in Paris. The talk starts at 36:50 mark.
</p><p><a href="https://taletskiy.com/blogs/jupytercon-23/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p> I had an opportunity to present a lightning talk at JupyterCon 2023 in Paris. The talk starts at 36:50 mark.
</p><div class="video-container">
    
</div>
<p>I had an opportunity to present a lightning talk at JupyterCon 2023 in Paris.
The talk starts at 36:50 mark.</p>
<p><a href="https://taletskiy.com/blogs/jupytercon-23/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Play MIDI tones on Adafruit Clue</title>
      <link>https://taletskiy.com/blogs/adafruit-clue-midi/</link>
      <pubDate>Thu, 05 May 2022 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/adafruit-clue-midi/</guid>
      <description><![CDATA[<p>Learn how to play MIDI tones on the Adafruit Clue microcontroller using CircuitPython. This tutorial explores converting MIDI files into playable sounds using the device's built-in speaker, creating a foundation for game development.</p><p><a href="https://taletskiy.com/blogs/adafruit-clue-midi/">Read the full post on taletskiy.com</a></p>]]></description>
<content:encoded><![CDATA[<p>Learn how to play MIDI tones on the Adafruit Clue microcontroller using CircuitPython. This tutorial explores converting MIDI files into playable sounds using the device's built-in speaker, creating a foundation for game development.</p><p>I recently got an Adafruit Clue microcontroller (shout out to Chipy and JFrog for the prize!). It is packed with sensors and a color LCD screen, but the best part is that it is programmable with CircuitPython. With its small screen and a couple of buttons on the side, it reminded me of an old pocket gaming console, so I decided to build a little game that would run on the Clue. When I started, there were no game engines written for the Clue or CircuitPython, so I decided to build my own.</p>
<p>What good is a game without a soundtrack? The first thing I got interested in when creating the game was playing sounds with the tiny speaker found on the Clue. It only has a basic API that plays a tone at a given frequency for a given duration.</p>
<p>The Adafruit Clue has a built-in tiny speaker that can be controlled using CircuitPython&rsquo;s <code>audioio</code> module. Here&rsquo;s the basic API for playing tones:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> adafruit_clue <span style="color:#f92672">import</span> clue
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Play a tone at 440 Hz (A4 note) for 1 second</span>
</span></span><span style="display:flex;"><span>clue<span style="color:#f92672">.</span>play_tone(<span style="color:#ae81ff">440</span>, <span style="color:#ae81ff">1.0</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Play a higher note (C5 at 523 Hz) for 0.5 seconds</span>
</span></span><span style="display:flex;"><span>clue<span style="color:#f92672">.</span>play_tone(<span style="color:#ae81ff">523</span>, <span style="color:#ae81ff">0.5</span>)
</span></span></code></pre></div><p>The <code>play_tone</code> function takes two parameters:</p>
<ul>
<li><code>frequency</code>: The frequency of the tone in Hz (higher values create higher-pitched sounds)</li>
<li><code>duration</code>: How long to play the tone in seconds</li>
</ul>
<p>This simple API makes it easy to play individual tones, but it blocks program execution while the tone plays. For more complex sounds or background music, you would need to implement timing mechanisms around this basic functionality.</p>
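<p>For a short melody, the blocking call is enough &ndash; just play the notes one after another:</p>
<pre tabindex="0"><code># A tiny melody as (frequency_hz, duration_s) pairs: C5, E5, G5
melody = [(523, 0.3), (659, 0.3), (784, 0.6)]

for freq, duration in melody:
    clue.play_tone(freq, duration)  # blocks until the note finishes
</code></pre>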
<p>There is an excellent post explaining the connection between musical notes and frequencies, with CircuitPython code for playing Jingle Bells and a Hanukkah tune: <a href="https://blog.wokwi.com/play-musical-notes-on-circuitpython/">https://blog.wokwi.com/play-musical-notes-on-circuitpython/</a></p>
<p>But what if we don&rsquo;t have the notes written down for us? I thought about finding a soundtrack I like in MIDI format on the Internet and dropping it into my Clue drive folder, so that a Python script could read the file and convert it to a sequence of notes. What I didn&rsquo;t know was the internal structure of a MIDI file. It is more complicated than just a sequence of notes, but we can still extract that information.</p>
<h2 id="extracting-notes">Extracting notes</h2>
<p><a href="https://github.com/mido/mido">https://github.com/mido/mido</a></p>
<p><a href="https://taletskiy.com/blogs/adafruit-clue-midi/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Create a desktop shortcut for JupyterLab on Windows</title>
      <link>https://medium.com/@kostal91/create-a-desktop-shortcut-for-jupyterlab-on-windows-9fcabcfa0d3f</link>
      <pubDate>Mon, 03 Feb 2020 00:00:00 &#43;0000</pubDate>
      <guid>https://medium.com/@kostal91/create-a-desktop-shortcut-for-jupyterlab-on-windows-9fcabcfa0d3f</guid>
      <description><![CDATA[<p>Simple guide for creating a convenient desktop shortcut to launch JupyterLab</p><p><a href="https://medium.com/@kostal91/create-a-desktop-shortcut-for-jupyterlab-on-windows-9fcabcfa0d3f">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>Simple guide for creating a convenient desktop shortcut to launch JupyterLab</p><p><a href="https://medium.com/@kostal91/create-a-desktop-shortcut-for-jupyterlab-on-windows-9fcabcfa0d3f">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Pool Limited Queue Processing in Python</title>
      <link>https://taletskiy.com/blogs/python-pool-limited-queue-processing/</link>
      <pubDate>Sun, 02 Feb 2020 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/python-pool-limited-queue-processing/</guid>
      <description><![CDATA[<p>A practical guide to using Python's multiprocessing library to implement a system where many parallel processes write to a queue, while a limited pool of workers processes those queue items. Includes solutions for common challenges and Windows-specific issues.</p><p><a href="https://taletskiy.com/blogs/python-pool-limited-queue-processing/">Read the full post on taletskiy.com</a></p>]]></description>
<content:encoded><![CDATA[<p>A practical guide to using Python's multiprocessing library to implement a system where many parallel processes write to a queue, while a limited pool of workers processes those queue items. Includes solutions for common challenges and Windows-specific issues.</p><p>I was recently confronted with a problem: I needed to build a large number (on the order of 100) of Docker containers and then push them to the registry. The Docker SDK for Python provided an excellent handle on that, and together with the <code>multiprocessing</code> library it allowed me to parallelize the task very effectively. However, after some initial testing I discovered that pushing multiple images to the registry stalled, likely due to an overload of simultaneous uploads. In my testing, I was only able to run 2-3 simultaneous <code>docker push</code> commands before any new ones I added got stalled. At that point I decided to limit the simultaneous uploads to a small number of parallel workers, while still utilizing a large number of workers to facilitate image builds. A combination of a queue (<code>multiprocessing.Queue</code>) for passing work from builder workers to pusher workers and a process pool (<code>multiprocessing.Pool</code>) looked like the best candidate. Yet, there are small nuances and gaps in the documentation that took me some time to understand (especially when using <code>multiprocessing</code> on Windows). Below is a small tutorial on how to use these data structures and objects.</p>
<h2 id="problem-formulation">Problem formulation</h2>
<p><img src="https://taletskiy.com/img/multiprocessing.png" alt="Problem formulation"></p>
<p>In this toy problem we have a large array of parallel Processes writing results into the Queue. Alongside them, there is a single-threaded reader Process checking for new items in the Queue and assigning them to new Processes in the Pool, such that only a small fixed number of these Processes are running at the same time. Let&rsquo;s go through all the elements below.</p>
<h2 id="process"><code>Process</code></h2>
<p>For our large array of parallel workers on the left, we are going to use <code>multiprocessing.Process()</code>. From the official <a href="https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing.Process">reference</a>: &ldquo;<code>Process</code> objects represent activity that is run in a separate process&rdquo;. Starting a process requires two things: the target function to be called and the <code>Process</code> call itself. Let&rsquo;s take a look:
<pre tabindex="0"><code>from multiprocessing import Process

def proc(i):
    print(f&#39;I am Process {i}&#39;)

if __name__ ==  &#39;__main__&#39;:
    for i in range(10):
        Process(target=proc, args=(i,)).start()
</code></pre><p>In the example above we created 10 <code>Process</code>es and launched them all at the same time. Each process runs an instance of the <code>proc()</code> function with arguments taken from <code>args</code>. Because the order of execution is not guaranteed, running it produces something like:
<pre tabindex="0"><code>I am Process 6
I am Process 2
I am Process 0
I am Process 3
I am Process 7
I am Process 4
I am Process 8
I am Process 1
I am Process 5
I am Process 9
</code></pre><p>Notice also the interesting syntax of <code>args=(i,)</code>. <code>Process</code> requires that <code>args</code> be iterable, so changing it to <code>args=(i)</code> or <code>args=i</code> will lead to a <code>TypeError</code>.</p>
<h2 id="queue"><code>Queue</code></h2>
<p>Now, it is time to introduce <code>multiprocessing.Queue()</code>. According to the <a href="https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing.Queue">reference</a>, it &ldquo;returns a process shared queue implemented using a pipe and a few locks/semaphores&rdquo;. A queue allows us to put objects into it and process them elsewhere asynchronously. Importantly, queues are thread and process safe. Let&rsquo;s modify our previous example to add a <code>Queue</code> object and pass it to our parallel <code>Process</code>es:
<pre tabindex="0"><code>from multiprocessing import Process, Queue

def writer(i,q):
    message = f&#39;I am Process {i}&#39;
    q.put(message)

if __name__ ==  &#39;__main__&#39;:    
    # Create multiprocessing queue
    q = Queue()
    # Create a group of parallel writers and start them
    for i in range(10):
        Process(target=writer, args=(i,q,)).start()

    # Read the queue sequentially
    for i in range(10):
        message = q.get()
        print(message)
</code></pre><p>Keep in mind that <code>Queue.get()</code> is a blocking method, so we are not going to miss any messages in that queue.</p>
<p>The next step in solving our problem is to switch to parallel reads from the queue. We could just spawn the reader processes the same way we spawned the writers, but that would allow up to 10 of them to run in parallel. What should we do if we are limited to a smaller number of readers, as in the original problem description?</p>
<h2 id="pool"><code>Pool</code></h2>
<p>Enter <a href="https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing.pool.Pool"><code>multiprocessing.Pool()</code></a>: &ldquo;A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation&rdquo;. Using <code>Pool</code> we can submit as many jobs as we like, but only <code>processes</code> of them will be running at any given moment.</p>
<p>Let&rsquo;s see how it behaves if we throw all the readers into the <code>Pool</code>:
<pre tabindex="0"><code>from multiprocessing import Process, Queue, Pool

def writer(i,q):
    message = f&#39;I am Process {i}&#39;
    q.put(message)

def reader(i,q):
    message = q.get()
    print(message)

if __name__ ==  &#39;__main__&#39;:    
    # Create multiprocessing queue
    q = Queue()

    # Create a group of parallel writers and start them
    for i in range(10):
        Process(target=writer, args=(i,q,)).start()

    # Create multiprocessing pool
    p = Pool(10)

    # Create a group of parallel readers and start them
    # Number of readers is matching the number of writers
    # However, the number of simultaneously running
    #   readers is constrained to the pool size
    for i in range(10):
        p.apply_async(reader, (i,q,))
</code></pre><p>However, if we run the code above, we get no output. What happened? When we called <code>apply_async</code>, execution immediately moved on and, since nothing else was left in the main function, exited. Thankfully, the <code>multiprocessing</code> reference provides a way to wait for the execution results:
<pre tabindex="0"><code>from multiprocessing import Process, Queue, Pool

def writer(i,q):
    message = f&#39;I am Process {i}&#39;
    q.put(message)

def reader(i,q):
    message = q.get()
    print(message)

if __name__ ==  &#39;__main__&#39;:    
    # Create multiprocessing queue
    q = Queue()

    # Create a group of parallel writers and start them
    for i in range(10):
        Process(target=writer, args=(i,q,)).start()

    # Create multiprocessing pool
    p = Pool(10)

    # Create a group of parallel readers and start them
    # Number of readers is matching the number of writers
    # However, the number of simultaneously running
    #   readers is constrained to the pool size    
    readers = []
    for i in range(10):
        readers.append(p.apply_async(reader, (i,q,)))
    
    # Wait for the asynchronous reader threads to finish
    [r.get() for r in readers]
</code></pre><p>This time, if we run the code, we get the following error: <code>RuntimeError: Queue objects should only be shared between processes through inheritance</code>. <code>multiprocessing.Manager</code> will enable us to manage the queue and also make it accessible to different workers:
<pre tabindex="0"><code>from multiprocessing import Process, Queue, Pool, Manager

def writer(i,q):
    message = f&#39;I am Process {i}&#39;
    q.put(message)

def reader(i,q):
    message = q.get()
    print(message)

if __name__ ==  &#39;__main__&#39;:    
    # Create manager
    m = Manager()
    
    # Create multiprocessing queue
    q = m.Queue()

    # Create a group of parallel writers and start them
    for i in range(10):
        Process(target=writer, args=(i,q,)).start()

    # Create multiprocessing pool
    p = Pool(10)

    # Create a group of parallel readers and start them
    # Number of readers is matching the number of writers
    # However, the number of simultaneously running
    #   readers is constrained to the pool size    
    readers = []
    for i in range(10):
        readers.append(p.apply_async(reader, (i,q,)))
    
    # Wait for the asynchronous reader threads to finish
    [r.get() for r in readers]
</code></pre><p>Finally, we get the results we expect:</p>
<pre tabindex="0"><code>&gt; python pl.py
I am Process 1
I am Process 4
I am Process 9
I am Process 8
I am Process 0
I am Process 5
I am Process 7
I am Process 2
I am Process 6
I am Process 3
</code></pre><h2 id="windows-related-quirks">Windows-related quirks</h2>
<p>I initially started working on this problem on a Linux machine, but later continued on Windows. Unfortunately, many things did not work right away. Here is what you need to know:</p>
<ol>
<li>Interrupting the program execution (Ctrl+C) will not work right away with the code above. The <a href="https://stackoverflow.com/a/6191991">workaround</a> is to add an initializer function for the pool workers:</li>
</ol>
<pre tabindex="0"><code>def init_worker():
    &#34;&#34;&#34;
    Pool worker initialization, required for keyboard interrupt on Windows
    &#34;&#34;&#34;
    signal.signal(signal.SIGINT, signal.SIG_IGN)

p = Pool(num_readers, init_worker)
</code></pre><ol start="2">
<li>I was not able to run the code in a Jupyter notebook on Windows unless I moved the worker functions into a separate <code>.py</code> file and imported them into my notebook. Related to that, you won&rsquo;t be able to run the scripts above without wrapping the main code in <code>if __name__ ==  '__main__':</code></li>
</ol>
<h2 id="final-result">Final Result</h2>
<p>As finishing touches, let&rsquo;s add the following:</p>
<ul>
<li>delays to imitate CPU-bound work on reader and writer</li>
<li>Exception handling when waiting for reader threads to finish</li>
<li>Configurable number of writer and reader threads</li>
<li>Some function documentation</li>
</ul>
<p>Here is the final result:</p>
<pre tabindex="0"><code>from multiprocessing import Pool, Queue, Process, Manager
import random
import signal
import time

num_writers = 10
num_readers = 3

def writer(i,q):
    # Imitate CPU-bound work happening in writer
    delay = random.randint(1,10)
    time.sleep(delay)

    # Put the result into the queue
    t = time.time()
    print(f&#39;I am writer {i}: {t}&#39;)
    q.put(t)

def init_worker():
    &#34;&#34;&#34;
    Pool worker initialization, required for keyboard interrupt on Windows
    &#34;&#34;&#34;
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def reader(i, q):
    &#34;&#34;&#34;
    Queue reader worker
    &#34;&#34;&#34;

    # Read the top message from the queue
    message = q.get()

    # Imitate CPU-bound work happening in reader
    time.sleep(3)
    print(f&#39;I am reader {i}: {message}&#39;)

if __name__ ==  &#39;__main__&#39;:
    # Create manager
    m = Manager()
    
    # Create multiprocessing queue
    q = m.Queue()

    # Create a group of parallel writers and start them
    for i in range(num_writers):
        Process(target=writer, args=(i,q,)).start()

    # Create multiprocessing pool
    p = Pool(num_readers, init_worker)

    # Create a group of parallel readers and start them
    # Number of readers is matching the number of writers
    # However, the number of simultaneously running
    #   readers is constrained to the pool size
    readers = []
    for i in range(num_writers):
        readers.append(p.apply_async(reader, (i,q,)))
    
    # Wait for the asynchronous reader threads to finish
    try:
        [r.get() for r in readers]
    except:
        print(&#39;Interrupted&#39;)
        p.terminate()
        p.join()
</code></pre><p>If you run it, you will get something like this:</p>
<pre tabindex="0"><code>&gt; python final.py
I am writer 8: 1580659076.783544
I am writer 3: 1580659076.783544
I am reader 0: 1580659076.783544
I am reader 1: 1580659076.783544
I am writer 7: 1580659079.7990372
I am writer 2: 1580659080.7971141
I am writer 1: 1580659081.785277
I am writer 4: 1580659082.7955923
I am reader 2: 1580659079.7990372
I am reader 3: 1580659080.7971141
I am writer 6: 1580659083.800029
I am writer 0: 1580659084.7862694
I am reader 4: 1580659081.785277
I am writer 9: 1580659085.7819643
I am writer 5: 1580659085.7919443
I am reader 5: 1580659082.7955923
I am reader 6: 1580659083.800029
I am reader 7: 1580659084.7862694
I am reader 8: 1580659085.7819643
I am reader 9: 1580659085.7919443
</code></pre><p><a href="https://taletskiy.com/blogs/python-pool-limited-queue-processing/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>On Pytorch Tensors and Autograd</title>
      <link>https://taletskiy.com/blogs/pytorch-autograd/</link>
      <pubDate>Sun, 26 Jan 2020 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/pytorch-autograd/</guid>
      <description><![CDATA[<p>A brief explanation of PyTorch's Autograd system, computational graphs, and how gradient calculation works in deep learning frameworks.</p><p><a href="https://taletskiy.com/blogs/pytorch-autograd/">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>A brief explanation of PyTorch's Autograd system, computational graphs, and how gradient calculation works in deep learning frameworks.</p><h1 id="on-pytorch-tensors-and-autograd">On Pytorch Tensors and Autograd</h1>
<p>Somehow, the PyTorch blitz tutorial on Autograd completely confused me. I could not understand what <code>.backward()</code>, <code>.grad</code>, and <code>grad_fn</code> do.</p>
<p>Fortunately, I found an excellent explanation of Autograd and the computational graph here: <a href="https://blog.paperspace.com/pytorch-101-understanding-graphs-and-automatic-differentiation/">https://blog.paperspace.com/pytorch-101-understanding-graphs-and-automatic-differentiation/</a>. Just for my own notes, and for anyone interested, here is my short recap:</p>
<ul>
<li>Computational graph &ndash; records the order of operations performed on tensors. Edges of the graph represent local gradients; leaves of the graph are independent variables (the inputs and the weights/biases in the case of a neural network).</li>
<li><code>tensor.backward()</code> computes the gradients all the way back through the computational graph and accumulates the results in the leaves. Without an explicit gradient argument it can only be called on a scalar (0-rank) tensor.</li>
<li><code>tensor.grad</code> holds the gradient accumulated by calls to <code>.backward()</code> with respect to the given tensor.</li>
</ul>
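<p>A tiny example ties these together:</p>
<pre tabindex="0"><code>import torch

x = torch.tensor(2.0, requires_grad=True)  # leaf tensor
y = x ** 2 + 3 * x                         # y.grad_fn records how y was computed
y.backward()                               # backprop through the graph
print(x.grad)                              # tensor(7.) -- dy/dx = 2*x + 3 at x = 2
</code></pre>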
<p><a href="https://taletskiy.com/blogs/pytorch-autograd/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>US Radio Spectrum Interactive Visualization with Python and BokehJS</title>
      <link>https://taletskiy.com/blogs/bokeh-radio-spectrum/</link>
      <pubDate>Wed, 07 Aug 2019 00:00:00 &#43;0000</pubDate>
      <guid>https://taletskiy.com/blogs/bokeh-radio-spectrum/</guid>
      <description><![CDATA[<p>Creating an interactive visualization of the US radio frequency spectrum using Python and BokehJS to explore frequency allocations across different bands, replacing static FCC infographics with a dynamic, color-coded interface.</p><p><a href="https://taletskiy.com/blogs/bokeh-radio-spectrum/">Read the full post on taletskiy.com</a></p>]]></description>
<content:encoded><![CDATA[<p>Creating an interactive visualization of the US radio frequency spectrum using Python and BokehJS to explore frequency allocations across different bands, replacing static FCC infographics with a dynamic, color-coded interface.</p><p>So, I recently came across the Bokeh visualization library, which uses Python to generate plots that are rendered in JS (like most of the great <em>interactive</em> visualization tools these days). I also noticed a fellow Twitter user took on the challenge to learn Bokeh over the next 30 days and post their results, so I decided to try as well.</p>
<p>I was hearing a lot about 5G coming to the US soon and got curious about which frequencies will be used and which are available. I did not know much about it, so I searched and found this PDF infographic: <a href="https://www.ntia.doc.gov/files/ntia/publications/2003-allochrt.pdf">https://www.ntia.doc.gov/files/ntia/publications/2003-allochrt.pdf</a>. There were a couple of problems with it: it was too small and non-interactive, so I decided to make a better one myself.</p>
<p>The preliminary result is here:</p>

            
<div class="bk-root" id="556cd84c-dfdc-47be-bb89-15e1006b3bea" data-root-id="2137"></div>      

        <script type="text/javascript">
          (function() {
            var fn = function() {
              Bokeh.safely(function() {
                (function(root) {
                  function embed_document(root) {
<pre><code>              var docs_json = document.getElementById('2422').textContent;
              var render_items = [{&quot;docid&quot;:&quot;54bbd253-fa4f-4743-802a-c4b1e795fb49&quot;,&quot;roots&quot;:{&quot;2137&quot;:&quot;556cd84c-dfdc-47be-bb89-15e1006b3bea&quot;}}];
              root.Bokeh.embed.embed_items(docs_json, render_items);
            
              }
              if (root.Bokeh !== undefined) {
                embed_document(root);
              } else {
                var attempts = 0;
                var timer = setInterval(function(root) {
                  if (root.Bokeh !== undefined) {
                    embed_document(root);
                    clearInterval(timer);
                  }
                  attempts++;
                  if (attempts &gt; 100) {
                    console.log(&quot;Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing&quot;);
                    clearInterval(timer);
                  }
                }, 10, root)
              }
            })(window);
          });
        };
        if (document.readyState != &quot;loading&quot;) fn();
        else document.addEventListener(&quot;DOMContentLoaded&quot;, fn);
      })();
    &lt;/script&gt;
</code></pre>
<p>You can scroll with your mouse wheel or swipe left and right with touch/pointer. If you hover over a band, it shows that band&rsquo;s purpose. Bands with similar purposes were automatically assigned the same color.</p><p><a href="https://taletskiy.com/blogs/bokeh-radio-spectrum/">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Displaying real-time webcam stream in IPython at (relatively) high framerate</title>
      <link>https://medium.com/@kostal91/displaying-real-time-webcam-stream-in-ipython-at-relatively-high-framerate-8e67428ac522</link>
      <pubDate>Sun, 15 Apr 2018 00:00:00 &#43;0000</pubDate>
      <guid>https://medium.com/@kostal91/displaying-real-time-webcam-stream-in-ipython-at-relatively-high-framerate-8e67428ac522</guid>
      <description><![CDATA[<p>How to efficiently display webcam video feeds in Jupyter notebooks</p><p><a href="https://medium.com/@kostal91/displaying-real-time-webcam-stream-in-ipython-at-relatively-high-framerate-8e67428ac522">Read the full post on taletskiy.com</a></p>]]></description>
      <content:encoded><![CDATA[<p>How to efficiently display webcam video feeds in Jupyter notebooks</p><p><a href="https://medium.com/@kostal91/displaying-real-time-webcam-stream-in-ipython-at-relatively-high-framerate-8e67428ac522">Read the full post on taletskiy.com</a></p>]]></content:encoded>
    </item>
  </channel>
</rss>
