Microsoft Open-Sources BitNet: A Native 1-Bit LLM That Runs on Your CPU
Microsoft Research has released BitNet b1.58 2B4T, the first open-source, natively trained 1-bit large language model, alongside bitnet.cpp, an inference framework that makes it runnable on standard CPUs.
What Makes It Different
Every weight in a BitNet model is constrained to just three values: {-1, 0, +1}. Rather than quantizing an existing full-precision model after training, BitNet is trained from scratch with this ternary constraint. The result: no floating-point multiplications during inference, only the integer additions and subtractions your CPU was already designed for.
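To see why ternary weights eliminate multiplication, here is a minimal sketch (not BitNet's actual kernel, which uses packed bit representations and SIMD): with weights restricted to {-1, 0, +1}, a matrix-vector product reduces to adding, subtracting, or skipping each activation.

```python
import numpy as np

def ternary_matvec(W, x):
    """Matrix-vector product where W contains only {-1, 0, +1}.

    No multiplications: each weight either adds the activation,
    subtracts it, or skips it entirely.
    """
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            if W[i, j] == 1:
                out[i] += x[j]   # weight +1: add
            elif W[i, j] == -1:
                out[i] -= x[j]   # weight -1: subtract
            # weight 0: contributes nothing, skip
    return out

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))   # random ternary weight matrix
x = rng.integers(-5, 6, size=8)        # integer activations
assert np.array_equal(ternary_matvec(W, x), W @ x)
```

The zero weights also act as free sparsity: those entries cost nothing at inference time, which is part of why the b1.58 ternary scheme outperforms a pure binary {-1, +1} one.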
The flagship release, BitNet b1.58 2B4T, is a 2-billion-parameter model trained on 4 trillion tokens. It fits in roughly 0.4 GB of RAM and consumes approximately 0.028 joules per inference. Benchmarks show it performs on par with leading open-weight, full-precision models of comparable size.
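The 0.4 GB figure follows directly from the weight encoding: a ternary value carries log2(3) ≈ 1.58 bits of information (hence the "b1.58" in the name). A back-of-envelope check, ignoring packing overhead and non-weight tensors:

```python
import math

params = 2e9                       # 2 billion parameters
bits_per_weight = math.log2(3)     # ~1.585 bits for a {-1, 0, +1} value
gb = params * bits_per_weight / 8 / 1e9
print(f"{gb:.2f} GB")              # ~0.40 GB
```

A comparable 2B model in FP16 would need about 4 GB for weights alone, so the ternary encoding is roughly a 10x reduction.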
Performance Numbers
The bitnet.cpp framework delivers significant gains over standard llama.cpp inference:
- x86 CPUs: 2.37x to 6.17x speedup, 71.9% to 82.2% energy reduction
- ARM CPUs (MacBook, etc.): 1.37x to 5.07x speedup, 55.4% to 70.0% energy reduction
- A 100B-parameter BitNet model can also run on a single CPU at 5 to 7 tokens per second, near human reading speed
The model weights are published on Hugging Face under an MIT license and support both CPU and GPU backends. NPU support is listed as coming soon.
Why It Matters
BitNet shifts what's possible for local AI deployment. Applications that previously required cloud APIs or dedicated GPU hardware can now run entirely offline: on laptops, edge devices, phones, or in regions with limited connectivity.
The GitHub repo has crossed 32,000 stars, reflecting strong community interest in efficient local inference as a practical alternative to always-online, GPU-dependent pipelines.