Introduction
Large Language Models (LLMs) have revolutionized how we interact with artificial intelligence, but accessing them often requires expensive cloud API calls or complex setups. Ollama solves this problem by providing a simple way to run LLMs locally, while Open WebUI offers an intuitive web interface similar to ChatGPT.
In this comprehensive tutorial, you’ll learn how to deploy both Ollama and Open WebUI on an Ubuntu 24.04 VPS using Docker Compose. This setup allows you to run powerful language models privately on your own infrastructure, giving you complete control over your data and conversations.
What you’ll accomplish:
- Set up a complete local LLM environment
- Deploy Ollama for model management and inference
- Configure Open WebUI for an intuitive chat interface
- Implement security best practices
- Optimize performance for your VPS resources
Prerequisites
Before beginning this tutorial, ensure you have:
System Requirements:
- Ubuntu 24.04 LTS VPS with minimum 8GB RAM (16GB+ recommended for larger models)
- At least 4 CPU cores for optimal performance
- 50GB+ storage space (models can be 4-70GB each)
- Root or sudo access to the server
Software Requirements:
- Docker Engine 24.0+ and Docker Compose v2
- Basic familiarity with command line operations
- SSH access to your VPS
Note: High-performance VPS instances with modern processors like AMD EPYC Milan provide significantly better inference speeds for language models.
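You can quickly confirm a server meets these minimums with a few standard commands; a sketch using GNU coreutils, checking against the thresholds listed above:

```shell
# Report RAM, CPU cores, and free disk so you can compare against the
# minimums above (8 GB RAM, 4 cores, 50 GB free)
echo "RAM (GB):  $(free -g | awk '/^Mem:/ {print $2}')"
echo "CPU cores: $(nproc)"
echo "Free on /: $(df -BG --output=avail / | tail -1 | tr -d ' ')"
```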
Step-by-Step Tutorial
Step 1: Update Your Ubuntu System
First, connect to your VPS via SSH and update the system packages:
sudo apt update && sudo apt upgrade -y
sudo apt install curl wget git -y
Step 2: Install Docker and Docker Compose
Install Docker using the official installation script:
# Download and install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Add your user to the docker group
sudo usermod -aG docker $USER
# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker
Log out and back in for group changes to take effect, then verify the installation:
docker --version
docker compose version
Step 3: Create Project Directory and Configuration
Create a dedicated directory for your Ollama deployment:
mkdir ~/ollama-webui
cd ~/ollama-webui
Step 4: Create Docker Compose Configuration
Create a docker-compose.yml file with the following configuration:
# Note: the top-level "version" key is obsolete with Docker Compose v2 and can be omitted
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      # Bind to localhost only: Open WebUI reaches Ollama over the internal
      # Docker network, and Ollama's API has no authentication of its own
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          memory: 2G

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=your-secret-key-here
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  open_webui_data:
Security Note: Replace your-secret-key-here with a strong, randomly generated secret key:
openssl rand -base64 32
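Rather than pasting the key into docker-compose.yml by hand, you can keep it in an .env file next to the compose file. This is a sketch that assumes you change the compose line to `- WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}`, which Compose substitutes from .env automatically:

```shell
# Write a freshly generated key to .env (read automatically by docker compose);
# assumes docker-compose.yml references it as ${WEBUI_SECRET_KEY}
echo "WEBUI_SECRET_KEY=$(openssl rand -base64 32)" > .env
chmod 600 .env   # keep the secret readable by your user only
```

Keeping the secret out of docker-compose.yml also lets you commit the compose file to version control safely.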
Step 5: Deploy the Services
Launch both services using Docker Compose:
docker compose up -d
Verify both containers are running:
docker compose ps
Monitor the logs to ensure everything starts correctly:
docker compose logs -f
Step 6: Download and Configure Language Models
Access the Ollama container to download your first model:
# Download a lightweight model for testing
docker exec -it ollama ollama pull llama3.2:3b
# For more powerful models (requires more RAM):
# docker exec -it ollama ollama pull llama3.1:8b
# docker exec -it ollama ollama pull codellama:7b
List available models:
docker exec -it ollama ollama list
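Beyond the web UI, you can exercise Ollama's REST API directly from the VPS. A minimal sketch, assuming the default port 11434 and the llama3.2:3b model pulled above; the request body is built first so you can inspect it before sending:

```shell
# Build a one-shot (non-streaming) request for Ollama's /api/generate endpoint
cat > /tmp/ollama-req.json <<'EOF'
{"model": "llama3.2:3b", "prompt": "Why is the sky blue?", "stream": false}
EOF
# Send it from the VPS itself once the containers are up:
# curl -s http://localhost:11434/api/generate -d @/tmp/ollama-req.json
```

With "stream": false the API returns a single JSON object whose "response" field holds the full completion, which is easier to script against than the default streamed chunks.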
Step 7: Configure Firewall (Security Best Practice)
Set up UFW firewall to secure your deployment:
# Enable UFW
sudo ufw enable
# Allow SSH (adjust port if needed)
sudo ufw allow 22/tcp
# Allow Open WebUI access
sudo ufw allow 3000/tcp
# Note: Docker publishes ports through its own iptables rules, which bypass
# UFW, so "ufw deny" alone will NOT block a port published as "11434:11434".
# Bind Ollama to localhost in docker-compose.yml instead
# ("127.0.0.1:11434:11434"); Open WebUI still reaches it over the internal
# Docker network. The deny rule below only adds defense in depth.
sudo ufw deny 11434/tcp
# Check firewall status
sudo ufw status
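Because Docker-published ports bypass UFW, it is worth confirming what is actually listening and on which addresses. If you bind Ollama's port as "127.0.0.1:11434:11434" in docker-compose.yml, the check below should show 11434 on 127.0.0.1 only; a sketch using ss from iproute2:

```shell
# List listening TCP sockets; Ollama (11434) should be bound to 127.0.0.1,
# Open WebUI (3000) to 0.0.0.0. The "|| true" keeps the check from failing
# with a non-zero exit before the containers are up.
ss -tln | grep -E ':(3000|11434)\b' || true
```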
Step 8: Access Open WebUI
Open your web browser and navigate to http://your-vps-ip:3000. You’ll be prompted to create an admin account on first access.
Once logged in, you can:
- Start conversations with downloaded models
- Download additional models through the web interface
- Customize model parameters and system prompts
- Create and manage different chat sessions
Best Practices
Performance Optimization
Resource Management:
- Monitor RAM usage with docker stats – smaller models like llama3.2:3b use ~4GB, while larger models can require 16GB+
- Use SSD storage for faster model loading and inference
- Consider CPU-optimized VPS instances for better performance
Model Selection Strategy:
- Start with smaller models (3B-7B parameters) to test your setup
- Gradually scale to larger models based on your VPS resources
- Use specialized models like CodeLlama for programming tasks
Security Considerations
Essential Security Measures:
- Never expose Ollama port (11434) to the internet – it lacks authentication
- Use strong passwords for Open WebUI admin accounts
- Regularly update container images: docker compose pull && docker compose up -d
- Consider setting up SSL/TLS with a reverse proxy like Nginx for production use
- Implement regular backups of your conversation data
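The image update can be automated with cron. A sketch of a crontab entry (added via crontab -e), assuming the project lives at /home/youruser/ollama-webui – the path and schedule are placeholders to adjust:

```
# m h dom mon dow  command — pull fresh images every Sunday at 04:00
0 4 * * 0  cd /home/youruser/ollama-webui && docker compose pull && docker compose up -d
```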
Maintenance and Monitoring
Set up log rotation to prevent disk space issues:
# View current log sizes
docker system df
# Clean up unused containers and images
docker system prune -f
Create a backup script for your data:
#!/bin/bash
cd ~/ollama-webui
docker compose down
# Named volumes (models, chat data) live under /var/lib/docker/volumes,
# prefixed by Compose with the project (directory) name
sudo tar -czf ~/ollama-backup-$(date +%Y%m%d).tar.gz ~/ollama-webui /var/lib/docker/volumes/ollama-webui_{ollama_data,open_webui_data}
docker compose up -d
Conclusion
You’ve successfully deployed a complete local LLM environment with Ollama and Open WebUI on Ubuntu 24.04. This setup provides you with:
- Complete privacy and control over your AI conversations
- No per-token costs or API rate limits
- The ability to customize and fine-tune models for your specific needs
- A scalable foundation for building AI-powered applications
Your deployment can handle multiple concurrent users and supports various language models depending on your VPS specifications. As your needs grow, you can easily scale by upgrading your server resources or deploying additional instances.
For optimal performance and reliability, consider high-performance VPS solutions with modern processors and NVMe storage. Modern cloud infrastructure ensures your LLM deployment runs smoothly and responds quickly to user interactions.
Ready to explore more advanced deployments? Consider investigating GPU-accelerated instances for even faster inference speeds, or setting up load-balanced multiple instances for enterprise workloads.