AutoAWQ
AutoAWQ is a Python framework for quantizing and running large language models using Activation-aware Weight Quantization, typically reducing models to efficient 4-bit weights for local inference.
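To make "4-bit weights" concrete, here is a minimal sketch of plain group-wise 4-bit round-to-nearest quantization in pure Python. It is not AutoAWQ's implementation and it omits the activation-aware scaling step that gives AWQ its name; the function names (`quantize_group`, `dequantize_group`, `quantize`) and the group size are hypothetical choices made for illustration only.

```python
from typing import List, Tuple

# One quantized group: integer codes, plus the scale and minimum needed to decode them.
Group = Tuple[List[int], float, float]


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Round-to-nearest affine quantization of one group of weights."""
    qmax = (1 << bits) - 1                    # 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0     # fall back to 1.0 for constant groups
    codes = [max(0, min(qmax, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes: List[int], scale: float, w_min: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + w_min for c in codes]


def quantize(weights: List[float], group_size: int = 4, bits: int = 4) -> List[Group]:
    """Split a flat weight list into groups and quantize each one independently."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


if __name__ == "__main__":
    example = [0.12, -0.53, 0.98, -0.07, 0.31, 0.44, -0.91, 0.05]
    for codes, scale, w_min in quantize(example):
        print(codes, dequantize_group(codes, scale, w_min))
```

Each group stores its own scale and minimum, so the rounding error of any weight is bounded by half of that group's scale. Smaller groups lower the error at the cost of more per-group metadata.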
Unsloth is an open-source framework for faster, memory-efficient fine-tuning of local large language models, especially using LoRA and QLoRA.
Open WebUI is a self-hosted, extensible web interface for running and managing local or remote LLMs through Ollama and OpenAI-compatible APIs.
KoboldCpp is a portable local LLM runtime and server for running GGUF/GGML-style language models with a KoboldAI-compatible web UI and API.
text-generation-webui is a popular Gradio-based web interface for running, chatting with, and experimenting with local large language models.
GPT4All is an open-source ecosystem for running large language models locally on consumer hardware, with desktop apps, Python bindings, and model management tools.
MLX LM is a Python package from Apple’s MLX ecosystem for running, fine-tuning, and serving large language models efficiently on Apple Silicon.
ExLlamaV2 is a high-performance Python/CUDA runtime for running quantized Llama-family large language models locally on NVIDIA GPUs.
vLLM is a high-throughput, memory-efficient inference and serving runtime for large language models, designed around techniques such as PagedAttention and continuous batching.
llama.cpp is a high-performance, portable C/C++ runtime for running large language models locally on CPUs, GPUs, and edge devices, commonly using GGUF-quantized model files.
A desktop application for browsing, downloading, and running local LLMs with a built-in chat interface and OpenAI-compatible API server for integration with other tools.
A high-performance C++ inference engine for large language models with support for low-bit quantized weights. It enables running models on consumer hardware and powers tools like Ollama.
A widely used open-source tool for running large language models locally. It provides a simple CLI for downloading, managing, and serving models such as Llama, Mistral, and CodeLlama.