Free Tool
Can My Mac Run It?
The definitive LLM memory calculator for Macs. Estimate the memory footprint, token generation speed, and compatibility for your specific Apple Silicon configuration.
1. Your Hardware
2. Choose a Model
e.g. 8 for an 8B model.
Lower precision = less memory, slight quality loss.
Prompt + max generated tokens combined.
Parallel sequences (1 for chat).
Advanced Architecture
Recommended for Your Mac
Memory Breakdown
Total RAM Required, Throughput (Est.), Time to First Token, Power Draw, Est. Cost / Hr, CO₂ / Hr
Frequently Asked Questions
How is token throughput calculated?
On Apple Silicon, token generation is bottlenecked by memory bandwidth, because each new token requires reading the loaded weights from memory. We divide your chip's memory bandwidth by the loaded model size, then apply a framework-specific efficiency factor (MLX ≈ 85%, Ollama ≈ 65%, vLLM ≈ 70%, SGLang ≈ 88%).
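As a rough illustration, the calculation described above can be sketched in a few lines of Python. The efficiency factors are the ones quoted in this answer; the function name, the 400 GB/s bandwidth, and the 4 GB model size are example assumptions, not values taken from the calculator itself.

```python
# Efficiency factors quoted in the FAQ answer above.
FRAMEWORK_EFFICIENCY = {
    "mlx": 0.85,
    "ollama": 0.65,
    "vllm": 0.70,
    "sglang": 0.88,
}


def estimate_tokens_per_second(
    bandwidth_gb_s: float, model_size_gb: float, framework: str
) -> float:
    """Bandwidth-bound estimate: (bandwidth / loaded model size) * efficiency."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    efficiency = FRAMEWORK_EFFICIENCY[framework.lower()]
    return bandwidth_gb_s / model_size_gb * efficiency


# Example inputs are assumptions: a 400 GB/s chip and a 4 GB loaded model.
for name in FRAMEWORK_EFFICIENCY:
    tps = estimate_tokens_per_second(400.0, 4.0, name)
    print(f"{name}: ~{tps:.0f} t/s")
```

Because the estimate scales inversely with loaded model size, halving the model's footprint (for example through lower precision) roughly doubles the projected throughput.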
Why does macOS version matter?
macOS reserves memory for the system. Sequoia and Tahoe reserve more than Sonoma due to on-device AI features. This calculator accounts for the difference so your estimate is realistic.
Why do frameworks use different amounts of RAM?
vLLM pre-allocates large KV cache blocks for high concurrency. Ollama carries Go runtime overhead (~600 MB). MLX and SGLang are leaner C++/Python backends (~200 MB base).
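A minimal sketch of how the base overheads above might be added to a model's weight size. The overhead figures come from this answer; the function name is hypothetical, and vLLM is left out because no base figure is given for it here.

```python
# Approximate base overheads from the answer above, in GB.
# vLLM is omitted: the text gives no figure for its pre-allocated KV cache.
FRAMEWORK_BASE_OVERHEAD_GB = {
    "ollama": 0.6,  # Go runtime overhead (~600 MB)
    "mlx": 0.2,     # ~200 MB base
    "sglang": 0.2,  # ~200 MB base
}


def runtime_footprint_gb(weights_gb: float, framework: str) -> float:
    """Weights plus the framework's base overhead (KV cache not included)."""
    return weights_gb + FRAMEWORK_BASE_OVERHEAD_GB[framework.lower()]


# Example: a 4 GB set of weights (an assumed value) under each framework.
for name in FRAMEWORK_BASE_OVERHEAD_GB:
    print(f"{name}: {runtime_footprint_gb(4.0, name):.1f} GB")
```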
What does quantization do?
Quantization reduces the precision of model weights (e.g. FP16 → INT8 or INT4), shrinking their memory footprint by roughly 2–4×. The trade-off is a small quality loss, usually acceptable for chat and code tasks.
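The arithmetic behind that reduction can be sketched as follows. This is a simplified estimate that counts only the raw weight bits; real quantized formats also store scaling metadata, so actual files run somewhat larger. The function name and the 8B example are illustrative assumptions.

```python
# Bits stored per weight at each precision.
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Raw weight size in GB (1 GB = 1e9 bytes), ignoring quantization metadata."""
    bits = BITS_PER_WEIGHT[precision.lower()]
    # (params_billions * 1e9 weights) * (bits / 8 bytes each) / 1e9 bytes per GB
    return params_billions * bits / 8


# Example: an 8B-parameter model at each precision.
for precision in BITS_PER_WEIGHT:
    print(f"{precision}: {weight_memory_gb(8, precision):.1f} GB")
```

For the 8B example, moving from FP16 to INT4 cuts the raw weight size to a quarter, which matches the upper end of the range quoted above.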
How are power and cost estimates calculated?
Power is derived from your chip's TDP under load. Cost uses the global average electricity rate ($0.15/kWh). Emissions use the global average grid carbon intensity (385 g CO₂e/kWh).
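A short sketch of the hourly cost and emissions arithmetic, using the two averages quoted above. The function name and the 40 W example draw are assumptions for illustration; the calculator's per-chip power figures are not reproduced here.

```python
ELECTRICITY_USD_PER_KWH = 0.15   # global average rate quoted above
GRID_G_CO2E_PER_KWH = 385.0      # global average carbon intensity quoted above


def hourly_cost_and_emissions(power_watts: float) -> tuple[float, float]:
    """Return (USD per hour, grams CO2e per hour) for a steady power draw."""
    kwh_per_hour = power_watts / 1000.0  # watts sustained for one hour
    cost_usd = kwh_per_hour * ELECTRICITY_USD_PER_KWH
    emissions_g = kwh_per_hour * GRID_G_CO2E_PER_KWH
    return cost_usd, emissions_g


# Example: an assumed 40 W draw under load.
cost, co2 = hourly_cost_and_emissions(40.0)
print(f"${cost:.3f}/hr, {co2:.1f} gCO2e/hr")
```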
Can I run models larger than my RAM?
Technically yes: macOS will swap to your SSD. But throughput drops to roughly 10% of normal, which makes real-time chat impractical. We flag this as "Tight Fit" or "Cannot Run" depending on how far the model overshoots your available memory.
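One way such a check could look is sketched below. The "Tight Fit" and "Cannot Run" labels and the 10% swap penalty come from this answer. The 25% overshoot threshold, the "Fits" label, and the function names are hypothetical, since the calculator's real cut-offs are not stated here.

```python
SWAP_THROUGHPUT_FACTOR = 0.10    # "roughly 10% of normal" per the answer above
TIGHT_FIT_MAX_OVERSHOOT = 0.25   # hypothetical threshold, not from the calculator


def classify_fit(required_gb: float, available_gb: float) -> str:
    """Label a model by how far its memory need exceeds available RAM."""
    if available_gb <= 0:
        raise ValueError("available_gb must be positive")
    if required_gb <= available_gb:
        return "Fits"  # hypothetical label for the no-overshoot case
    overshoot = (required_gb - available_gb) / available_gb
    return "Tight Fit" if overshoot <= TIGHT_FIT_MAX_OVERSHOOT else "Cannot Run"


def swapped_throughput(normal_tokens_per_second: float) -> float:
    """Estimated throughput once the model spills into swap."""
    return normal_tokens_per_second * SWAP_THROUGHPUT_FACTOR


# Example values are assumptions: 16 GB available, three model sizes.
for required in (10.0, 18.0, 32.0):
    print(f"{required} GB needed -> {classify_fit(required, 16.0)}")
```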