Free Tool

Can My Mac Run It?

The definitive LLM memory calculator for Macs. Estimate the memory footprint, token generation speed, and compatibility for your specific Apple Silicon configuration.

1. Your Hardware

Mac Processor

Unified Memory (RAM)

macOS Version

Inference Framework

2. Choose a Model

Search Models

Parameters (Billions)

e.g. 8 for an 8B model.

Quantization / Precision

Lower precision = less memory, slight quality loss.

Context Window (Tokens)

Prompt + max generated tokens combined.

Batch Size

Parallel sequences (1 for chat).

Advanced Architecture

Layers

Hidden Size

Attention Heads

KV Heads (GQA/MQA)

Recommended for Your Mac

Memory Breakdown

Total RAM Required

0.0 GB

: 16 GB

Model Weights 0.0 GB

KV Cache 0.0 GB

Framework Overhead & Activations 1.0 GB

macOS Reservation 2.0 GB

Excellent — runs comfortably with headroom to spare. Great for chat and heavy context.

Throughput (Est.)

~0 t/s

Time to First Token

<0.5 s

Power Draw

0 W

Est. Cost / Hr

$0.00

CO₂ / Hr

0 gCO₂e

Quick Start

Copy and paste into your terminal to get started:

$ mlx_lm.generate --model meta-llama/Meta-Llama-3-8B

Frequently Asked Questions

How is token throughput calculated?

On Apple Silicon, token generation is typically bottlenecked by memory bandwidth, because producing each token requires reading the full set of model weights from memory. We divide your chip's memory bandwidth by the loaded model size, then apply a framework-specific efficiency factor (MLX ≈ 85%, Ollama ≈ 65%, vLLM ≈ 70%, SGLang ≈ 88%).
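As a rough illustration, the estimate described above can be sketched in a few lines of Python. The efficiency factors are the ones quoted in this answer; the function name and the example inputs (a 400 GB/s chip and 4 GB of loaded weights) are assumptions for illustration, not the calculator's actual code.

```python
# Efficiency factors quoted in the answer above.
FRAMEWORK_EFFICIENCY = {
    "MLX": 0.85,
    "Ollama": 0.65,
    "vLLM": 0.70,
    "SGLang": 0.88,
}


def estimate_tokens_per_second(bandwidth_gb_s, model_size_gb, framework):
    """Bandwidth-bound decode estimate: bandwidth / model size * efficiency."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    ideal = bandwidth_gb_s / model_size_gb  # upper bound with no overhead
    return ideal * FRAMEWORK_EFFICIENCY[framework]


# Illustrative inputs only: 400 GB/s of bandwidth, 4 GB of loaded weights.
print(round(estimate_tokens_per_second(400, 4.0, "MLX"), 1))
```

With these inputs the estimate comes out at about 85 tokens per second. Real throughput also depends on context length and thermal limits, so treat the number as an upper-end guide.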

Why does macOS version matter?

macOS reserves memory for the system. Sequoia and Tahoe reserve more than Sonoma due to on-device AI features. This calculator accounts for the difference so your estimate is realistic.

Why do frameworks use different amounts of RAM?

vLLM pre-allocates large KV cache blocks for high concurrency. Ollama carries Go runtime overhead (~600 MB). MLX and SGLang are leaner C++/Python backends (~200 MB base).
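The KV cache that frameworks allocate grows with context length and batch size. The sketch below uses the commonly cited size formula together with the Advanced Architecture fields (Layers, Hidden Size, Attention Heads, KV Heads). It is an assumption about how such a figure is typically computed, not this calculator's exact method, and the 2-byte (FP16) cache entry is also assumed.

```python
def kv_cache_bytes(layers, hidden_size, attention_heads, kv_heads,
                   context_tokens, batch_size=1, bytes_per_value=2):
    """Approximate KV cache size in bytes for a decoder-only transformer.

    Each layer stores one key and one value vector per KV head per token.
    bytes_per_value=2 assumes an FP16 cache (an assumption, not a given).
    """
    head_dim = hidden_size // attention_heads
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * batch_size


# Example shape: 32 layers, hidden size 4096, 32 attention heads, 8 KV heads
# (the published Llama 3 8B configuration), with an 8192-token context.
size = kv_cache_bytes(32, 4096, 32, 8, context_tokens=8192)
print(f"{size / 1e9:.2f} GB")
```

Because the cache scales with the number of KV heads rather than attention heads, models using GQA or MQA need far less cache than the same model with full multi-head attention.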

What does quantization do?

Quantization reduces the precision of model weights (e.g. FP16 → INT4), shrinking weight memory by roughly 2× at 8-bit and 4× at 4-bit. The trade-off is a small quality loss, usually acceptable for chat and code tasks.
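The memory saving follows directly from the number of bits stored per weight. A minimal sketch, assuming 1 GB = 10^9 bytes and ignoring the small extra cost of scales and zero-points that real quantized formats carry; the function and table names are illustrative:

```python
# Bits stored per weight at each precision.
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "INT4": 4}


def weights_gb(params_billions, precision):
    """Approximate weight memory in GB: parameters * bits per weight / 8."""
    return params_billions * BITS_PER_WEIGHT[precision] / 8


# An 8B model at each precision.
for precision in BITS_PER_WEIGHT:
    print(precision, weights_gb(8, precision), "GB")
```

For an 8B model this gives 16 GB at FP16, 8 GB at INT8, and 4 GB at INT4, before the KV cache and framework overhead are added.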

How are power and cost estimates calculated?

Power is derived from your chip's TDP under load. Cost multiplies that power by the global average electricity rate ($0.15/kWh), and emissions multiply it by the global average grid carbon intensity (385 g CO₂e/kWh).
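The arithmetic is a unit conversion from watts to kilowatt-hours per hour. A small sketch using the two rates quoted in this answer; the function name and the 40 W example draw are assumptions for illustration:

```python
ELECTRICITY_USD_PER_KWH = 0.15   # rate quoted above
GRID_G_CO2E_PER_KWH = 385        # carbon intensity quoted above


def hourly_cost_and_emissions(power_watts):
    """Return (USD per hour, grams CO2e per hour) for a steady power draw."""
    kwh_per_hour = power_watts / 1000  # watts sustained for one hour, in kWh
    cost = kwh_per_hour * ELECTRICITY_USD_PER_KWH
    emissions = kwh_per_hour * GRID_G_CO2E_PER_KWH
    return cost, emissions


# Illustrative draw of 40 W under load.
cost, emissions = hourly_cost_and_emissions(40)
print(f"${cost:.3f}/hr, {emissions:.1f} gCO2e/hr")
```

Local electricity prices and grid mixes vary widely, so substituting your own rates will give a more representative figure.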

Can I run models larger than my RAM?

Technically yes: macOS will page to swap on your SSD. But throughput drops to roughly 10% of normal, which makes real-time chat impractical. We flag this as "Tight Fit" or "Cannot Run" depending on the size of the overshoot.
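One way to picture the flagging logic is to sum the four lines of the Memory Breakdown and compare the total with installed RAM. The sketch below is hypothetical: the 25% overshoot cutoff and the "Fits" label are placeholders chosen for illustration, since the calculator's real thresholds are not stated here.

```python
def total_required_gb(weights_gb, kv_cache_gb, overhead_gb, macos_gb):
    """Sum the four components shown in the Memory Breakdown."""
    return weights_gb + kv_cache_gb + overhead_gb + macos_gb


def classify_fit(required_gb, ram_gb, tight_margin=0.25):
    """Label a configuration by how far it overshoots installed RAM.

    tight_margin is a hypothetical cutoff, not the calculator's real value.
    """
    if required_gb <= ram_gb:
        return "Fits"  # placeholder label for any configuration within RAM
    overshoot = (required_gb - ram_gb) / ram_gb
    return "Tight Fit" if overshoot <= tight_margin else "Cannot Run"


# Example: 14 GB weights + 1 GB KV cache + 1 GB overhead + 2 GB for macOS
# on a 16 GB Mac.
required = total_required_gb(14.0, 1.0, 1.0, 2.0)
print(required, classify_fit(required, 16))
```

Here the 18 GB requirement overshoots a 16 GB machine by 12.5%, which this sketch labels "Tight Fit"; a 32 GB requirement on the same machine would be "Cannot Run".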