Bringing AlphaGo to the Edge: Porting AutoGo to Apple Silicon with MLX
May 22, 2026
If you’ve been following the artificial intelligence space lately, you know that most of the oxygen in the room goes to Large Language Models (LLMs) and their brute-force scaling laws. But a few weeks ago, Eric Jang (former VP of AI at 1X Technologies and Senior Research Scientist at Google DeepMind) dropped a refreshing, first-principles reminder of what elegant AI engineering actually looks like.
On the Dwarkesh Podcast, Eric shared how he spent his sabbatical replicating an AlphaGo-like architecture from scratch for under $10,000 in compute costs—a project he open-sourced as autogo.
I was so captivated by the architectural efficiency of his codebase that I wanted to see how far we could push it in a local-first development ecosystem.
Today, I’m excited to announce autogo-mlx, a complete, native macOS port of Eric’s framework, rebuilt from the ground up for Apple Silicon using Apple’s MLX framework.
👉 Check out the repository here: github.com/nilbot/autogo-mlx
The Core Paradigm: Why AutoGo Matters Now
In modern reinforcement learning for LLMs (policy gradient methods, for example), models often struggle when rewards are extremely sparse: they stumble through a “blind trial-and-error” phase until they hit a token sequence that works.
AlphaGo solved this beautifully a decade ago by decoupling search from raw intuition. The approach pairs a compact, efficient 10-layer neural network (the intuition) with Monte Carlo Tree Search (MCTS, the look-ahead).
During self-play, MCTS acts as a look-ahead mechanism. Even if the win/loss reward at the end of a match is sparse, the tree search explores future possibilities and aggregates their results. This converts the single final outcome into dense, informative, per-state supervision signals for the neural network.
[Sparse Win/Loss Reward] ──> [MCTS Look-Ahead & Rollouts] ──> [High-Density Per-State Supervision] ──> [10-Layer Policy/Value Net]
It is a masterpiece of sample efficiency, proving that you don’t always need millions of dollars of cloud compute to train incredibly smart, autonomous agents if your algorithmic architecture is tight.
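To make the sparse-to-dense idea concrete, here is a minimal Python sketch in the style of AlphaGo Zero-type training, where the search's visit counts at each position become a policy target and the final result becomes a value target. The function names (`policy_target`, `label_game`) and the toy visit counts are illustrative assumptions, not code from autogo or autogo-mlx, whose target construction may differ.

```python
def policy_target(visit_counts, temperature=1.0):
    """Turn MCTS visit counts at one position into a probability distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Lower temperatures sharpen the distribution toward the most-visited move.
    weights = [count ** (1.0 / temperature) for count in visit_counts]
    total = sum(weights)
    if total == 0:
        raise ValueError("search produced no visits")
    return [w / total for w in weights]


def label_game(visit_counts_per_move, winner):
    """Attach a (policy, value) target to every position of one finished game.

    visit_counts_per_move: one list of visit counts per position, in play order.
    winner: +1 if the first player won, -1 if the second player won, 0 for a draw.
    """
    examples = []
    for move_index, counts in enumerate(visit_counts_per_move):
        # Players alternate, so the outcome is seen from the side to move.
        to_move = 1 if move_index % 2 == 0 else -1
        examples.append((policy_target(counts), winner * to_move))
    return examples


# Toy game with two positions and three legal moves each (made-up numbers).
game = [[10, 30, 60], [5, 5, 90]]
examples = label_game(game, winner=1)
for policy, value in examples:
    print(policy, value)
```

Every position in the game gets its own training pair, even though the game itself produced only a single win/loss bit.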
The Engineering Gap: Moving from Linux Clusters to the Mac
While Eric’s original autogo repository is brilliant, it reflects a classic enterprise infrastructure bias: it is heavily coupled to Linux container clusters and distributed NVIDIA GPUs orchestrated via cloud setups. For independent researchers, hobbyists, or developers sitting at a desk with a MacBook, spinning up cloud nodes just to experiment with RL architecture can become an expensive barrier to entry.
This is where MLX comes in. Apple’s machine learning framework runs array operations directly on the SoC’s CPU and GPU against a shared memory pool, and exposes them to Python developers through a NumPy-like API.
With autogo-mlx, I set out to achieve three goals:
- Ditch the CUDA Tax: Port every tensor operation, weight initialization, and layer configuration out of its original ecosystem and map it natively onto MLX primitives.
- Expose Unified Memory Bandwidth: Training loops with heavy MCTS look-ahead constantly move data between the CPU (which manages the search tree) and the GPU (which evaluates the policy and value network). On standard PC hardware with a discrete GPU, transfers over the PCIe bus can become the bottleneck. Because Apple Silicon uses a Unified Memory Architecture (UMA), the CPU and GPU share the same memory space, allowing the MCTS loop to run with near-zero copy overhead.
- Keep the Training Local and Free: The entire pipeline—from zero-knowledge initialization through self-play generation and policy iteration—now runs completely self-contained on your Mac Studio or MacBook. Zero cloud bills.
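The CPU/GPU hand-off described above is usually organized as batched leaf evaluation: the search collects positions on the CPU side and sends them to the network in groups. The sketch below shows only that pattern in plain Python. `evaluate_leaves`, the evaluator callables, and the batch size are hypothetical names and values, not the autogo-mlx API.

```python
def evaluate_leaves(leaves, evaluate_batch, batch_size=8):
    """Evaluate leaf positions in fixed-size batches.

    leaves: list of encoded board positions (anything the evaluator accepts).
    evaluate_batch: callable taking a list of positions and returning a list of
        (policy, value) pairs in the same order. It stands in for the network.
    """
    results = []
    for start in range(0, len(leaves), batch_size):
        batch = leaves[start:start + batch_size]
        outputs = evaluate_batch(batch)  # one network call per batch
        if len(outputs) != len(batch):
            raise ValueError("evaluator must return one output per position")
        results.extend(outputs)
    return results


def uniform_evaluator(batch):
    """Placeholder network: uniform policy over 3 moves and a neutral value."""
    return [([1 / 3, 1 / 3, 1 / 3], 0.0) for _ in batch]


calls = []  # records the size of every batch sent to the evaluator


def counting_evaluator(batch):
    calls.append(len(batch))
    return uniform_evaluator(batch)


leaves = [f"position-{i}" for i in range(20)]
outputs = evaluate_leaves(leaves, counting_evaluator, batch_size=8)
print(calls)  # [8, 8, 4]
```

In an MLX setting the evaluator would typically build an `mx.array` from the batch, call the model, and force the lazy computation with `mx.eval`. Since MLX arrays live in unified memory, no explicit host-to-device transfer call is involved.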
Diving Deep into the /experiments Folder
If you clone the repository, the core implementation is ready for you to poke at, but the real engineering insights lie inside the experiments/ directory.
I have meticulously ported over the experiment configuration, benchmarking tools, and telemetry structures from the original work. When you run local sessions, these tools allow you to track:
- Simulations-per-Second Throughput: See how many MCTS simulations your specific Apple Silicon chip (M1/M2/M3 Base, Pro, Max, or Ultra) can run per second.
- Convergence Behavior: Watch the cross-entropy and value loss metrics steadily drop as the local network learns from its own MCTS-guided self-play sessions.
- LLM-in-the-Loop Orchestration: The framework retains its capacity to interface with LLMs (like Claude) to act as an automated research assistant, evaluating the data generated in your local experiments/ folder, adjusting hyper-parameters, and writing updates back to your local workspace.
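As a rough illustration of the throughput and convergence metrics above, the sketch below times a batch of simulations and computes the two loss terms commonly used in AlphaZero-style training: cross-entropy against the search policy and squared error against the game outcome. The function names and toy numbers are assumptions. The actual telemetry code in experiments/ may compute or report these differently.

```python
import math
import time


def alphazero_style_loss(target_policy, predicted_policy, target_value, predicted_value):
    """Return (policy cross-entropy, value squared error) for one position."""
    eps = 1e-12  # avoids log(0) when the network assigns zero probability
    policy_loss = -sum(
        t * math.log(p + eps) for t, p in zip(target_policy, predicted_policy)
    )
    value_loss = (target_value - predicted_value) ** 2
    return policy_loss, value_loss


def simulations_per_second(run_simulation, num_simulations):
    """Time num_simulations calls of run_simulation and return the rate."""
    start = time.perf_counter()
    for _ in range(num_simulations):
        run_simulation()
    elapsed = time.perf_counter() - start
    return num_simulations / elapsed if elapsed > 0 else float("inf")


# Search strongly prefers move 1; an untrained network is still undecided.
policy_loss, value_loss = alphazero_style_loss(
    target_policy=[0.0, 1.0],
    predicted_policy=[0.5, 0.5],
    target_value=1.0,
    predicted_value=0.5,
)
print(f"policy loss {policy_loss:.4f}, value loss {value_loss:.4f}")

# A no-op stands in for one real MCTS simulation here.
rate = simulations_per_second(lambda: None, 1000)
print(f"{rate:.0f} simulations/s")
```

Both loss terms should trend downward as the network's predictions move toward the search policy and the observed game outcomes.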
Standing on the Shoulders of Giants
This project is entirely a port and adaptation, and all credit for the breakthrough implementation, research methodology, and algorithmic design goes directly to Eric Jang. His commitment to open-sourcing first-principles AI exploration is what makes projects like this possible.
If you are a systems engineer looking to see how MLX handles high-frequency CPU/GPU data sharing, an RL researcher wanting a lightweight sandbox, or an AI enthusiast who wants to watch their Mac generate a world-class board-game engine from scratch over a weekend, give autogo-mlx a spin.
If you find it useful, please consider dropping a ⭐ Star on the repository, opening an issue if you catch a bug, or submitting a pull request with performance optimizations!
Until next time, happy hacking.
— Nilbot