
🍋 Lemonade: Local LLMs with GPU and NPU acceleration


Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs.

Apps like n8n, VS Code Copilot, Morphik, and many more use Lemonade to seamlessly run LLMs on any PC.

Getting Started

  1. Install: Windows · Linux · Docker · Source
  2. Get Models: Browse and download with the Model Manager
  3. Chat: Try models with the built-in chat interface
  4. Mobile: Take your lemonade to go: iOS · Android (soon) · Source
  5. Connect: Use Lemonade with your favorite apps:

Open WebUI · n8n · Gaia · Infinity Arcade · Continue · GitHub Copilot · OpenHands · Dify · Deep Tutor · Iterate.ai

Want your app featured here? Discord · GitHub Issue · Email

Using the CLI

To run and chat with Gemma 3:

lemonade-server run Gemma-3-4b-it-GGUF

To install models ahead of time, use the pull command:

lemonade-server pull Gemma-3-4b-it-GGUF

To see all available models, use the list command:

lemonade-server list

Tip: You can use --llamacpp vulkan/rocm to select a backend when running GGUF models.
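
For example, to prefer the Vulkan backend when running a GGUF model (the flag name and values come from the tip above; the exact placement shown here is illustrative):

lemonade-server run Gemma-3-4b-it-GGUF --llamacpp vulkan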

Model Library

Model Manager

Lemonade supports GGUF, FLM, and ONNX models across CPU, GPU, and NPU (see supported configurations).

Use lemonade-server pull or the built-in Model Manager to download models. You can also import custom GGUF/ONNX models from Hugging Face.

Browse all built-in models →


Image Generation

Lemonade supports image generation using Stable Diffusion models via stable-diffusion.cpp.

# Pull an image generation model
lemonade-server pull SD-Turbo

# Start the server
lemonade-server serve

Available models: SD-Turbo (fast, 4-step), SDXL-Turbo, SD-1.5, SDXL-Base-1.0

See examples/api_image_generation.py for complete examples.
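
Once the server is running, images can be requested from Python as well. The snippet below is a minimal sketch that assumes the server exposes an OpenAI-style images endpoint at http://localhost:8000/api/v1; the prompt and output filename are illustrative, and examples/api_image_generation.py is the authoritative reference.

import base64
from openai import OpenAI

# Point the client at the local Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade"  # required but unused
)

# Request an image from SD-Turbo (assumes an OpenAI-compatible images endpoint)
result = client.images.generate(
    model="SD-Turbo",
    prompt="a glass of lemonade on a sunny porch",
    response_format="b64_json",
)

# Decode the base64 payload and save it to disk
with open("lemonade.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))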

Supported Configurations

Lemonade supports the following configurations and makes it easy to switch between them at runtime.

Hardware | Engine: OGA | Engine: llamacpp | Engine: FLM
🧠 CPU | All platforms | All platforms | -
🎮 GPU | - | Vulkan: All platforms; ROCm: Selected AMD platforms*; Metal: Apple Silicon | -
🤖 NPU | AMD Ryzen™ AI 300 series | - | Ryzen™ AI 300 series

* Supported AMD ROCm platforms:

Architecture | Platform Support | GPU Models
gfx1151 (STX Halo) | Windows, Ubuntu | Ryzen AI MAX+ Pro 395
gfx120X (RDNA4) | Windows, Ubuntu | Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT
gfx110X (RDNA3) | Windows, Ubuntu | Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT

Project Roadmap

Under Development: macOS, Apps marketplace, lemonade-eval CLI, ryzenai-server dedicated repo, Enhanced custom model support

Under Consideration: vLLM support, Text to speech, MLX support, Lemonade desktop app

Recently Completed: Image generation (stable-diffusion.cpp), General speech-to-text support (whisper.cpp), ROCm support for Ryzen AI 360-375 (Strix) APUs

Integrate Lemonade Server with Your Application

You can use any OpenAI-compatible client library by configuring it to use http://localhost:8000/api/v1 as the base URL. Official and popular OpenAI clients for different languages are listed below.

Feel free to pick and choose your preferred language.

Language | Client
Python | openai-python
C++ | openai-cpp
Java | openai-java
C# | openai-dotnet
Node.js | openai-node
Go | go-openai
Ruby | ruby-openai
Rust | async-openai
PHP | openai-php

Python Client Example

from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade"  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Print the response
print(completion.choices[0].message.content)

For more detailed integration instructions, see the Integration Guide.

FAQ

For answers to frequently asked questions, see our FAQ Guide.

Contributing

We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our contribution guide.

New contributors can find beginner-friendly issues tagged with "Good First Issue" to get started.


Maintainers

This is a community project maintained by @amd-pworfolk @bitgamma @danielholanda @jeremyfowers @Geramy @ramkrishna2910 @siavashhub @sofiageo @vgodsoe, and sponsored by AMD. You can reach us by filing an issue, emailing [email protected], or joining our Discord.

License and Attribution

This project is: