AI Dev Radar

AutoAWQ

frameworkneeds_review

AutoAWQ is a Python framework for quantizing and running large language models using Activation-aware Weight Quantization, typically reducing models to efficient 4-bit weights for local inference.

Visit →

Unsloth

frameworkneeds_review

Unsloth is an open-source framework for faster, memory-efficient fine-tuning of local large language models, especially using LoRA and QLoRA.

Visit →

Open WebUI

toolneeds_review

Open WebUI is a self-hosted, extensible web interface for running and managing local or remote LLMs through Ollama and OpenAI-compatible APIs.

Visit →

KoboldCpp

toolneeds_review

KoboldCpp is a portable local LLM runtime and server for running GGUF/GGML-style language models with a KoboldAI-compatible web UI and API.

Visit →

text-generation-webui

toolneeds_review

text-generation-webui is a popular Gradio-based web interface for running, chatting with, and experimenting with local large language models.

Visit →

GPT4All

toolneeds_review

GPT4All is an open-source ecosystem for running large language models locally on consumer hardware, with desktop apps, Python bindings, and model management tools.

Visit →

MLX LM

frameworkneeds_review

MLX LM is a Python package from Apple’s MLX ecosystem for running, fine-tuning, and serving large language models efficiently on Apple Silicon.

Visit →

ExLlamaV2

runtimeneeds_review

ExLlamaV2 is a high-performance Python/CUDA runtime for running quantized Llama-family large language models locally on NVIDIA GPUs.

Visit →

vLLM

runtimeneeds_review

vLLM is a high-throughput, memory-efficient inference and serving runtime for large language models, designed around techniques such as PagedAttention and continuous batching.

Visit →

llama.cpp

runtimeneeds_review

llama.cpp is a high-performance, portable C/C++ runtime for running large language models locally on CPUs, GPUs, and edge devices, commonly using GGUF-quantized model files.

Visit →

LM Studio

trending

toolconfirmedproduction

A desktop application for browsing, downloading, and running local LLMs with a built-in chat interface and OpenAI-compatible API server for integration with other tools.

Visit →

llama.cpp

popular

libraryconfirmedproduction

A high-performance C++ inference engine for large language models with advanced quantization techniques. It enables running models on consumer hardware and powers tools like Ollama.

Visit →v1

Ollama

popular

serviceconfirmedproduction

The leading open-source tool for running large language models locally. It provides a simple CLI interface for downloading, managing, and serving models like Llama, Mistral, and CodeLlama.

Visit →v1