Arc Notes Weekly #105: Enforce
Code is getting cheaper. Trust isn’t. This week we look at what happens when AI-generated software ships without verification, and at one supply-chain attack that proves enforcement can’t be an afterthought.
In this issue: why LLM-generated code can pass tests yet fail on performance, how a GitHub issue title triggered a supply-chain breach on 4,000 machines, why verification hasn’t caught up with AI-written code, quadtrees for spatial search, fine-tuning Qwen3.5 with Unsloth, building a sub-500ms voice agent for about $100, and why good software knows when to stop evolving.
Enjoy this week's round-up!
— Mahdi Yusuf (@myusuf3), also on LinkedIn
👋🏾 You are reading Architecture Notes, your Sunday newsletter curating the best system design and architecture news from around the web. We’d appreciate you sharing it with like-minded people.
Articles
How Quadtrees Speed Up Spatial Search, From Maps to Collision Detection
This interactive primer explains how quadtrees recursively split 2D space into four quadrants, showing why a point lookup takes about log₄(n) steps instead of scanning all points; for a million points, that's roughly 10 checks. It also walks through range queries, nearest-neighbor search, collision detection, and image compression, making clear when quadtrees shine and where they degrade, with enough concrete mechanics to sharpen your intuition.
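The splitting mechanic is simple enough to sketch. Here is a toy point quadtree in Python (my own illustration, with an assumed node capacity of 4, not the article's code):

```python
# Toy point quadtree: a node splits into four quadrants once it
# holds more than CAPACITY points.
CAPACITY = 4

class Quadtree:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size  # top-left corner, side length
        self.points = []
        self.children = None  # four sub-quadrants once split

    def insert(self, px, py):
        if not (self.x <= px < self.x + self.size and
                self.y <= py < self.y + self.size):
            return False  # point lies outside this node's region
        if self.children is None:
            if len(self.points) < CAPACITY:
                self.points.append((px, py))
                return True
            self._split()
        return any(c.insert(px, py) for c in self.children)

    def _split(self):
        h = self.size / 2
        self.children = [Quadtree(self.x + dx, self.y + dy, h)
                         for dx in (0, h) for dy in (0, h)]
        for p in self.points:  # redistribute points into the quadrants
            for c in self.children:
                if c.insert(*p):
                    break
        self.points = []

    def query(self, px, py):
        """Descend only into the quadrant containing the point: ~log4(n) steps."""
        if self.children is None:
            return (px, py) in self.points
        return any(c.query(px, py) for c in self.children
                   if c.x <= px < c.x + c.size and c.y <= py < c.y + c.size)
```

Each `query` visits one child per level, which is where the log₄(n) lookup cost in the article comes from.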
AI Is Writing More Software, but Verification Is Not Keeping Up
Leonardo de Moura argues that while Google and Microsoft say 25–30% of new code is AI-generated and Anthropic built a 100,000-line C compiler in two weeks for under $20,000, formal verification has not scaled with that output. He makes the case for proof-based development on platforms like Lean, pointing to an AI-assisted zlib proof and enterprise use at AWS and Microsoft as signs that verified software may become the practical bottleneck and advantage.
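For a flavor of what proof-based development looks like, here is a minimal, hypothetical Lean 4 snippet (not related to the zlib proof): the function and a machine-checked theorem about it ship together, and the build fails if the proof breaks.

```lean
-- The function and its specification live side by side;
-- the compiler rejects the file if the proof no longer holds.
def double (n : Nat) : Nat := n + n

-- Machine-checked guarantee: `double` agrees with multiplication by 2.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```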
How Docker Turned Linux Containers Into A Cross-Platform Developer Standard
How To Fine-Tune Qwen3.5 Models With Unsloth
Unsloth now supports fine-tuning the full Qwen3.5 family for both text and vision, from 0.8B to MoE models like 35B-A3B, with claimed 1.5× faster training and 50% lower VRAM than FA2 setups; bf16 LoRA for 35B-A3B runs on 74GB VRAM, while full fine-tuning uses about 4× more. It matters because the guide includes practical constraints and deployment paths, such as using Transformers v5, avoiding QLoRA on Qwen3.5, and exporting to GGUF or vLLM, plus enough notebook and VRAM detail to judge whether your hardware can handle it.
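VRAM figures like these are easy to sanity-check with back-of-the-envelope arithmetic (my own, not Unsloth's): the weights alone cost parameters × bytes-per-parameter, and everything else is overhead on top.

```python
def weight_memory_gib(n_params_billion: float, bytes_per_param: float = 2.0) -> float:
    """GiB needed just to hold the weights (bf16 = 2 bytes per parameter)."""
    return n_params_billion * 1e9 * bytes_per_param / 2**30

# A 35B-parameter model in bf16 needs ~65 GiB for weights alone,
# which is consistent with the 74GB LoRA figure once activations,
# KV cache, and adapter states are added.
base = weight_memory_gib(35)

# Full fine-tuning also stores gradients and optimizer states,
# hence the roughly 4x multiplier quoted in the guide.
full_ft = base * 4
```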
Why LLM-Written Code Can Look Right While Failing on Performance
A benchmark comparing SQLite with a ground-up, LLM-generated Rust rewrite found a simple 100-row primary-key lookup took 0.09 ms in SQLite versus 1,815.43 ms in the rewrite, largely because the planner missed SQLite’s INTEGER PRIMARY KEY fast path and fell back to full table scans. The article argues this is a broader pattern: LLMs often produce code that compiles, passes tests, and mirrors the requested architecture, yet misses the real invariants that make systems work efficiently, a gap with obvious consequences for anyone relying on generated code without deep verification.
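The fast path in question is visible from Python's standard sqlite3 module: an INTEGER PRIMARY KEY column aliases SQLite's internal rowid, so an equality lookup is a direct B-tree search, while filtering on an unindexed column forces a full scan. A minimal sketch (not the benchmark's code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, str(i)) for i in range(100)])

# id aliases the rowid, so this lookup is a B-tree search...
pk = conn.execute(
    "EXPLAIN QUERY PLAN SELECT val FROM t WHERE id = 42").fetchall()
# ...while an unindexed column forces a full table scan.
scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM t WHERE val = '42'").fetchall()

print(pk[0][-1])    # e.g. SEARCH t USING INTEGER PRIMARY KEY (rowid=?)
print(scan[0][-1])  # e.g. SCAN t
```

This is exactly the invariant the benchmark says the generated rewrite missed: both plans return correct rows, so tests pass either way, but only one is fast.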
Good Software Knows When To Stop Evolving
Using a satirical mockup in which the Unix ls command is replaced by an “AI-Powered Directory Intelligence” tool with a 30-day deprecation notice, Olivier Girardot argues that software should remain focused on its core job instead of endlessly expanding. Drawing on ideas from 37signals, such as saying no by default and treating constraints as advantages, he makes the case that restraint preserves usefulness and standards in an era of AI rebrands like MinIO becoming AIStor.
How One Developer Built a Sub-500ms Voice Agent From Scratch
Nick Tikhonov describes building a voice agent in about a day using roughly $100 in API credits, combining Twilio, Deepgram Flux, ElevenLabs, and later Groq’s llama-3.3-70b to reduce end-to-end latency from around 1.7 seconds locally to roughly 400 ms. The piece is most useful as a practical guide to what actually drives voice responsiveness: turn detection, streaming pipelines, warm TTS sockets, and regional deployment. It also shows why custom orchestration can sometimes outperform all-in-one platforms.
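The streaming-pipeline point is worth making concrete: perceived latency is governed by time-to-first-audio, not total generation time. A toy asyncio simulation with invented delays (no real Twilio, Deepgram, or ElevenLabs calls):

```python
import asyncio
import time

TOKEN_DELAY = 0.02  # pretend the LLM emits a token every 20 ms
N_TOKENS = 10

async def llm_tokens():
    """Stand-in for a streaming LLM response."""
    for i in range(N_TOKENS):
        await asyncio.sleep(TOKEN_DELAY)
        yield f"token{i} "

async def batch_ttfa():
    """Wait for the whole reply before sending anything to TTS."""
    start = time.monotonic()
    _ = "".join([t async for t in llm_tokens()])
    return time.monotonic() - start  # ~ N_TOKENS * TOKEN_DELAY

async def stream_ttfa():
    """Forward the first chunk to TTS the moment it arrives."""
    start = time.monotonic()
    async for _ in llm_tokens():
        return time.monotonic() - start  # ~ one TOKEN_DELAY

batch = asyncio.run(batch_ttfa())
stream = asyncio.run(stream_ttfa())
```

The same overlap trick applies at every stage of a voice pipeline, which is why streaming transcription and warm TTS sockets matter more than raw model speed.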
Projects
Google Workspace CLI
Google’s open-source Google Workspace CLI aims to unify access to Google services from the terminal. The GitHub repository already shows 16.1k stars, 608 forks, 22 tags, and 145 commits as of March 6, 2026. It matters for developers and administrators who want a single interface for Workspace tasks, especially as recent breaking changes (such as removing MCP server mode and multi-account support) highlight how quickly the tool is evolving.
llmfit
LLMFit is an open-source project by AlexsJones that analyzes local hardware and recommends compatible models, with CLI, TUI, desktop, API, Docker, and Nix packaging options now in the repo. For anyone choosing models under real memory and GPU constraints, it promises a faster way to narrow the field before downloading, and the breadth of interfaces suggests it is evolving beyond a simple compatibility checker.
How A GitHub Issue Title Led To Cline Installing OpenClaw On 4,000 Machines
This post traces the five-step “Clinejection” chain in which a malicious GitHub issue title exploited Cline’s AI triage workflow, poisoned GitHub Actions caches, stole release tokens, and led to the publication of cline@2.3.0 with a one-line postinstall hook that globally installed OpenClaw for about eight hours. It matters because the breach began with untrusted natural-language input and ended in a supply-chain compromise, showing how AI agents in CI/CD pipelines can quietly turn routine automation into a much larger trust problem.
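The final hop relied on ordinary npm behavior: any package can declare a postinstall script that runs automatically on every install. A generic, hypothetical illustration of the shape (not the actual cline@2.3.0 payload):

```json
{
  "name": "some-package",
  "version": "1.0.0",
  "scripts": {
    "postinstall": "node ./setup.js"
  }
}
```

Running `npm install --ignore-scripts` disables these lifecycle hooks, a common hardening step in CI pipelines for exactly this reason.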