infai · local inference, simplified

You bought the GPU. Now use it.

infai auto-detects your GGUF models, wraps llama.cpp in a terminal UI, and lets you launch inference servers with one keypress.
No flags to memorize. No YAML. No scripts.

infai
~/models
 
  qwen2.5-7b-instruct-q4_k_m.gguf 7.2 GB
  deepseek-r1-8b-q5_k_s.gguf 5.8 GB
  llama-3.3-70b-q2_k.gguf 26.4 GB
  mistral-7b-v0.3-q6_k.gguf 5.5 GB
 
enter: select · /: filter · a: all · f: folders · q: quit

The problem is obvious

Every local LLM session starts the same way: scrolling through shell history, fixing a typo, forgetting a flag.

Before
$ llama-server \
    -m ~/models/qwen2.5-7b-q4_k_m.gguf \
    --ctx-size 65536 \
    --n-gpu-layers 99 \
    --batch-size 2048 \
    --ubatch-size 512 \
    --flash-attn \
    --host 0.0.0.0 \
    --port 8080

Every. Single. Time.

vs
After
$ infai # select model, press enter. that's it.

Profiles remember everything. You just pick and run.

01

Configure once, launch forever

Name your profile, set context size, GPU layers, batch parameters, flash attention, quantization type. Save it. Never think about it again.
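Concretely, each profile field maps onto one llama-server flag. A minimal sketch of that expansion, with illustrative field names and values (this is not infai's actual schema, just the shape of the idea):

```shell
# Sketch: what a saved profile expands to at launch time.
# Field names and values are illustrative, not infai's real schema.
MODEL="$HOME/models/qwen2.5-7b-instruct-q4_k_m.gguf"
CTX=65536       # context size
NGL=99          # GPU layers to offload
BATCH=2048      # batch size

# Print the command a profile would otherwise make you retype:
printf 'llama-server -m %s --ctx-size %s --n-gpu-layers %s --batch-size %s --flash-attn\n' \
  "$MODEL" "$CTX" "$NGL" "$BATCH"
```

The profile is just this mapping, persisted, so "launch" reduces to a lookup plus an exec.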

[screenshot: infai profile configuration]
02

Watch inference happen live

Built-in scrollable viewport streams server output in real-time. Stop, restart, or switch models without leaving the TUI.

[screenshot: infai live server logs]

What you get


Auto-detection

Point at your model directories. infai scans for GGUF files and indexes them. New model? Just rescan.
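The scan itself is nothing exotic. A rough shell stand-in for "point at a directory and index the GGUF files" (infai's own indexing logic may differ):

```shell
# List every GGUF under a model directory with its on-disk size.
# A rough stand-in for infai's scan, not its actual implementation.
scan_gguf() {
  dir="$1"
  find "$dir" -type f -name '*.gguf' -exec du -h {} \; 2>/dev/null
}

scan_gguf "$HOME/models"
```

Re-running the function after dropping a new file into the directory is the manual version of "just rescan".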


Reusable profiles

Multiple configs per model. Compare Q4_K_M vs Q5_K_S, or 4K vs 64K context, in seconds flat.


Live inference logs

Real-time scrollable viewport. No more tmux splits or tail -f in another window.


Terminal themes

Tokyonight, Everforest, One Dark, Rose Pine, Gruvbox. Match your terminal's vibe.


SQLite config

One database. No scattered dotfiles, no YAML, no env vars. Everything in one place.


One-key launch

Select model. Press enter. Server starts. That's the whole workflow.
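Once a server is up, any client that speaks the OpenAI-style HTTP API can talk to it, since llama-server exposes one. A quick smoke test, assuming the default host and port from the "Before" example:

```shell
# Assumes a llama-server instance is already listening on localhost:8080.
BODY='{"messages":[{"role":"user","content":"Say hi"}],"max_tokens":16}'
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d "$BODY" || echo 'no server running on :8080'
```

Swap the port if your profile binds elsewhere; the payload is plain OpenAI chat-completions JSON.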

45M+ GGUF downloads on HuggingFace, 2025
70% of local LLM users run on personal hardware
$2-15K typical hardware spend on local inference

You invested in the hardware. Stop wasting time on the flags.

Get it

Homebrew recommended
brew install dipankardas011/tap/infai

macOS & Linux. Pre-compiled. Zero dependencies.

Binary

Grab from GitHub Releases — Linux & macOS, amd64 & arm64.

From source
go install github.com/dipankardas011/infai@latest

Requires Go 1.23+ and a C compiler (for the SQLite driver).

What's next

llama.cpp today. Your single control plane for all local inference tomorrow.

shipped

llama.cpp

GGUF auto-detect, launch profiles, live logs, 5 themes