# ThemisDB v1.3.4 - Insert Performance Optimization 🚀
**Release Date:** December 28, 2025
**Type:** Minor Feature Release
**Focus:** Secondary Index Insert Performance
## 🎯 Highlights

### Massive Performance Improvements
- 23-77x faster bulk inserts via new Batch Insert API
- 98.2% latency reduction for 100-entity batches (810ms → 14.5ms)
- 60-200x faster index metadata lookups (<10 µs vs 600-2000 µs)
- Phase 1 & 2 goals dramatically exceeded
### New Features

- Batch Insert API (`putBatch()`) for optimal bulk insert performance
- Secondary Index Metadata Cache with TTL-based invalidation
- Comprehensive benchmarking suite for v1.3.4 optimizations
## 📊 Performance Results

### Batch Insert API Performance
| Batch Size | Single Inserts | Batch API | Speedup | Latency Reduction |
|---|---|---|---|---|
| 100 entities | 810ms (3.87 items/s) | 14.5ms (9,040 items/s) | 23.4x | 98.2% |
| 1000 entities | 3744ms (4.18 items/s) | 311ms (323,900 items/s) | 77.5x | 91.7% |
### Metadata Cache Impact
- Before: 600-2000 µs per insert (6 DB scans)
- After: <10 µs per insert (cached lookups)
- Improvement: 60-200x faster metadata access
### Phase Goal Achievement
- ✅ Phase 1 Target (+50-100%): Exceeded by 2,240%
- ✅ Phase 2 Target (+100-200%): Exceeded by 7,650%
## 🆕 New Features

### 1. Batch Insert API

New `putBatch()` method for optimal bulk insert performance:
#include "index/secondary_index.h"
// Prepare entities
std::vector<themis::BaseEntity> entities;
for (int i = 0; i < 1000; ++i) {
themis::BaseEntity entity("user_" + std::to_string(i));
entity.setField("email", "user" + std::to_string(i) + "@example.com");
entity.setField("username", "username_" + std::to_string(i));
entities.push_back(std::move(entity));
}
// Single batch insert (23-77x faster than individual inserts!)
auto status = indexMgr->putBatch("users", entities);Key Benefits:
- Single atomic commit for all entities
- Reduced commit overhead from ~2000 µs per entity to ~2 µs amortized
- Automatic rollback on any error (see the error-handling sketch below)
- Thread-safe and production-ready
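
Because the batch is atomic, error handling reduces to a single status check. Here is a minimal sketch building on the example above (the `status.ok` flag matches the Migration Guide snippets below; the per-entity fallback loop is an illustrative application choice, not part of the API):

```cpp
// Minimal error-handling sketch for putBatch(), reusing indexMgr and
// entities from the example above.
auto status = indexMgr->putBatch("users", entities);
if (!status.ok) {
    // Atomic rollback: nothing was written, so the whole batch can simply
    // be retried, or entities inserted one by one to isolate a bad row.
    for (const auto& entity : entities) {
        auto s = indexMgr->put("users", entity);
        if (!s.ok) { /* log and skip the offending entity */ }
    }
}
```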
### 2. Secondary Index Metadata Cache
Automatic in-memory caching of index configurations:
```cpp
#include "index/secondary_index_metadata_cache.h"

// Cache is transparent - no code changes needed!
// Index metadata is cached for 60 seconds by default.

// Manual cache control (optional):
auto& cache = SecondaryIndexMetadataCache::instance();

// Get cache statistics
auto stats = cache.get_stats();
std::cout << "Hit rate: " << stats.hit_rate() << "%" << std::endl;

// Manual cache invalidation (automatic on index changes)
cache.invalidate("table_name");

// Adjust TTL if needed
cache.set_ttl(std::chrono::seconds(120));
```

**Key Benefits:**
- Eliminates 6 DB scans per insert
- Thread-safe with `shared_mutex`
- Automatic invalidation on schema changes
- Statistics for monitoring
## 🔧 Improvements

### Index Update Performance
- Optimized `updateIndexesForPut_()` with single `pkBytes` computation
- Added `reserve()` calls for composite index column vectors
- Reduced allocations in sparse, geo, TTL, and fulltext index updates
- Eliminated shadowing variables for cleaner code
### Benchmark Suite

- New `bench_batch_insert` benchmark demonstrating API benefits
- Updated `bench_v1_3_4_optimizations` with cache validation
- Simple insert test for debugging
## 📚 Documentation

### New Documentation Files
- `BATCH_INSERT_PERFORMANCE_RESULTS.md` - Detailed benchmark results
- `V1_3_4_QUICK_SUMMARY.md` - Executive summary
- `V1_3_4_RELEASE_SUMMARY.md` - Complete release overview
- `V1_3_4_PERFORMANCE_ANALYSIS.md` - Mathematical analysis
- `V1_3_4_VALIDATION_REPORT.md` - Phase goal validation
- `INSERT_PERFORMANCE_DEEP_DIVE.md` - Root cause analysis
## 🐛 Bug Fixes
- Fixed `WriteBatch` commit issues with `TransactionDB` (requires WAL enabled)
- Removed all `pkBytes` shadowing declarations (compiler warnings)
- Fixed include paths in batch insert benchmarks
## ⚙️ Technical Details

### Root Cause Analysis
The v1.3.3 insert regression was caused by two primary bottlenecks:
1. **Metadata DB Scans (6x per insert):** 600-2000 µs overhead
   - Solution: In-memory metadata cache
   - Result: -1990 µs per insert
2. **Per-Insert Commit Overhead:** 500-2000 µs per commit
   - Solution: Batch Insert API with amortized commits
   - Result: -1900 µs amortized per insert
### Implementation Details

**Metadata Cache:**

- Location: `include/index/secondary_index_metadata_cache.h`
- Pattern: Thread-safe singleton with TTL (sketched below)
- Integration: Transparent in `updateIndexesForPut_()`
- Invalidation: Automatic on all 12 create/drop index methods
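
To make the singleton-with-TTL pattern concrete, here is a minimal sketch guarded by a `shared_mutex`; the `IndexMetadata` type and all member names are illustrative assumptions, not the actual ThemisDB declarations:

```cpp
#include <chrono>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

struct IndexMetadata { /* index configuration for one table (placeholder) */ };

class MetadataCache {
public:
    static MetadataCache& instance() {
        static MetadataCache cache;  // thread-safe initialization since C++11
        return cache;
    }

    std::optional<IndexMetadata> get(const std::string& table) {
        std::shared_lock lock(mutex_);  // many concurrent readers
        auto it = entries_.find(table);
        if (it == entries_.end() ||
            std::chrono::steady_clock::now() - it->second.loaded_at > ttl_) {
            return std::nullopt;  // miss or expired -> caller rescans the DB
        }
        return it->second.meta;
    }

    void put(const std::string& table, IndexMetadata meta) {
        std::unique_lock lock(mutex_);
        entries_[table] = {std::move(meta), std::chrono::steady_clock::now()};
    }

    void invalidate(const std::string& table) {
        std::unique_lock lock(mutex_);
        entries_.erase(table);  // hooked into the create/drop index paths
    }

private:
    struct Entry {
        IndexMetadata meta;
        std::chrono::steady_clock::time_point loaded_at;
    };
    std::shared_mutex mutex_;
    std::unordered_map<std::string, Entry> entries_;
    std::chrono::seconds ttl_{60};  // default TTL from this release
};
```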
**Batch Insert API:**

- Location: `src/index/secondary_index.cpp:772-825`
- Pattern: Single WriteBatch for N entities (sketched after this list)
- Error Handling: Automatic rollback on any failure
- Atomicity: All-or-nothing guarantee
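
The all-or-nothing guarantee falls out of staging every entry in one RocksDB `WriteBatch` and committing with a single `Write()` call; a simplified sketch under that assumption (key encoding and per-index fan-out elided, names illustrative):

```cpp
#include <string>
#include <utility>
#include <vector>
#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>

// Simplified sketch: stage all index entries, then one atomic Write().
rocksdb::Status putBatchSketch(
    rocksdb::DB* db,
    const std::vector<std::pair<std::string, std::string>>& kvs) {
    rocksdb::WriteBatch batch;
    for (const auto& [key, value] : kvs) {
        batch.Put(key, value);  // staged in memory, nothing hits the DB yet
    }
    rocksdb::WriteOptions opts;  // WAL stays enabled (required by TransactionDB)
    // One commit: either every Put lands or none does.
    return db->Write(opts, &batch);
}
```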
## 🔄 Migration Guide

### For Bulk Inserts

**Before (v1.3.3):**
```cpp
for (const auto& entity : entities) {
    auto status = indexMgr->put("table", entity);
    if (!status.ok) { /* handle error */ }
}
// 1000 entities × 2000 µs commit = 2 seconds overhead
```

**After (v1.3.4):**
auto status = indexMgr->putBatch("table", entities);
if (!status.ok) { /* handle error */ }
// 1 commit = 2 ms overhead (1000x faster!)No Changes Required
The metadata cache is automatically enabled for all existing code. No migration needed!
## 📦 Installation

### From GitHub Release

```bash
# Download binaries
wget https://github.com/yourusername/themis/releases/download/v1.3.4/themis-v1.3.4-linux-x64.tar.gz

# Extract
tar -xzf themis-v1.3.4-linux-x64.tar.gz

# Run
cd themis-v1.3.4
./themis_server --help
```

### Docker
```bash
# Pull image
docker pull yourusername/themis:1.3.4

# Run
docker run -p 7687:7687 -p 8080:8080 yourusername/themis:1.3.4
```

### Build from Source
```bash
git clone https://github.com/yourusername/themis.git
cd themis
git checkout v1.3.4

# Windows (MSVC)
cmake -S . -B build-msvc -G "Visual Studio 17 2022" -A x64 ^
  -DCMAKE_TOOLCHAIN_FILE="%VCPKG_ROOT%\scripts\buildsystems\vcpkg.cmake"
cmake --build build-msvc --config Release --parallel 8

# Linux
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel $(nproc)
```

## 🔜 What's Next (v1.3.5)
- Extended batch API for update and delete operations
- Adaptive cache TTL based on workload patterns
- Parallel batch processing for multi-core optimization
- Additional micro-optimizations for serialization
## 🙏 Contributors
- Core team for performance analysis and optimization
- Community for feedback on v1.3.3 performance regression
## 📝 Full Changelog
See CHANGELOG.md for complete version history.
## 🔗 Resources
- Documentation: docs/
- Benchmarks: benchmarks/
- Performance Analysis: V1_3_4_PERFORMANCE_ANALYSIS.md
- GitHub: https://github.com/yourusername/themis
- Docker Hub: https://hub.docker.com/r/yourusername/themis
**Questions or Issues?** Open an issue on GitHub.
---

# ThemisDB v1.3.0 - Native LLM Integration

**Release Date:** December 20, 2025
**Code Name:** "Keep Your Own Llamas"
## 🎉 Overview
ThemisDB v1.3.0 brings native LLM integration with embedded llama.cpp, enabling you to run AI/LLM workloads directly in your database without external API dependencies. This release introduces a complete plugin architecture, GPU acceleration, and enterprise-grade caching for production LLM deployments.
"ThemisDB keeps its own llamas." – Run LLaMA, Mistral, Phi-3 models (1B-70B params) directly in your database.
## 🚀 Major Features

### 🧠 Embedded LLM Engine (llama.cpp)
- Native Integration: llama.cpp embedded as local clone (not committed to repo)
- Model Support: GGUF format models (LLaMA 3, Mistral, Phi-3, etc.)
- Inference Engine: Full tokenization, evaluation, sampling, and detokenization pipeline (outlined below)
- Memory Management: Lazy model loading with configurable VRAM budgets
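
The four pipeline stages can be pictured with this illustrative outline; every name here (`Model`, `Token`, `Logits`, `tokenize`, `evaluate`, `sample`, `detokenize`, `is_eos`) is a placeholder, not the llama.cpp or ThemisDB API:

```cpp
#include <string>
#include <vector>

// Placeholder types and stage functions for illustration only.
struct Model;
using Token = int;
using Logits = std::vector<float>;

std::vector<Token> tokenize(Model&, const std::string& prompt);
Logits evaluate(Model&, const std::vector<Token>& context);  // forward pass
Token sample(const Logits&, float temperature);
std::string detokenize(Model&, Token t);
bool is_eos(Model&, Token t);

std::string generate(Model& model, const std::string& prompt, int max_tokens) {
    std::vector<Token> tokens = tokenize(model, prompt);  // 1. tokenization
    std::string output;
    for (int i = 0; i < max_tokens; ++i) {
        Logits logits = evaluate(model, tokens);          // 2. evaluation
        Token next = sample(logits, 0.7f);                // 3. sampling
        if (is_eos(model, next)) break;                   // stop at end-of-sequence
        output += detokenize(model, next);                // 4. detokenization
        tokens.push_back(next);                           // feed back for next step
    }
    return output;
}
```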
### ⚡ GPU Acceleration
- CUDA Support: NVIDIA GPU acceleration with 100x speedup vs CPU
- Metal Support: Apple Silicon optimization
- Vulkan Support: Cross-platform GPU backend
- Automatic Fallback: Graceful degradation to CPU when GPU unavailable
### 🧩 Plugin Architecture
- LlamaCppPlugin: Reference implementation for llama.cpp backend
- ILLMPlugin Interface: Extensible plugin system for custom LLM backends (sketched after this list)
- Plugin Manager: Centralized management with lifecycle control
- Hot-Swappable: Load/unload models and LoRA adapters dynamically
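
For orientation, a backend interface in this style might look as follows; the method set is a hedged guess, and the authoritative `ILLMPlugin` contract lives in `docs/llm/README_PLUGINS.md`:

```cpp
#include <string>

// Hedged sketch of an ILLMPlugin-style interface; method names here are
// illustrative assumptions, not the actual ThemisDB declarations.
class ILLMPluginSketch {
public:
    virtual ~ILLMPluginSketch() = default;
    virtual bool loadModel(const std::string& ggufPath) = 0;  // lazy, on-demand
    virtual void unloadModel() = 0;                           // free RAM/VRAM
    virtual std::string generate(const std::string& prompt,
                                 int maxTokens,
                                 float temperature) = 0;      // blocking inference
};

// The reference backend would then implement it:
// class LlamaCppPlugin : public ILLMPluginSketch { ... };
```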
### 🗃️ Advanced Model Management
- Lazy Loading: Ollama-style on-demand model loading (2-3s first load, instant cache hits)
- Multi-LoRA Manager: vLLM-style support for up to 16 concurrent LoRA adapters
- Model Pinning: Prevent eviction of critical models from memory
- TTL Management: Automatic model eviction after configurable idle time (default: 30 min; see the sketch below)
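
A minimal skeleton showing how lazy loading, pinning, and idle-TTL eviction fit together (illustrative only; the real manager also enforces VRAM budgets and LoRA adapter slots):

```cpp
#include <chrono>
#include <string>
#include <unordered_map>

// Illustrative model-manager skeleton: lazy load, pinning, idle-TTL eviction.
struct LoadedModel {
    // ... weights handle elided ...
    std::chrono::steady_clock::time_point last_used;
    bool pinned = false;  // pinned models survive eviction sweeps
};

class ModelManagerSketch {
public:
    LoadedModel& acquire(const std::string& path) {
        auto [it, inserted] = models_.try_emplace(path);
        if (inserted) { /* 2-3 s cold load happens here */ }
        it->second.last_used = std::chrono::steady_clock::now();  // warm hit: ~0 ms
        return it->second;
    }

    void evictIdle() {  // run periodically
        auto now = std::chrono::steady_clock::now();
        for (auto it = models_.begin(); it != models_.end();) {
            if (!it->second.pinned && now - it->second.last_used > ttl_)
                it = models_.erase(it);  // idle past TTL: unload
            else
                ++it;
        }
    }

private:
    std::unordered_map<std::string, LoadedModel> models_;
    std::chrono::minutes ttl_{30};  // default idle TTL
};
```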
### 💾 Enterprise Caching
- Response Cache: Semantic caching for identical queries (70-90% cost reduction; sketched after this list)
- Prefix Cache: Reuse common prompt prefixes across requests
- Model Metadata Cache: TBB lock-free cache for 10x faster metadata access
- KV Cache Buffer: Shared read-only buffers for 70% memory savings
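
In its simplest form, a response cache keys completions by the exact request, so repeated identical prompts skip inference entirely. This sketch uses a naive hash key (all names and the hashing scheme are assumptions; a semantic cache would match on embeddings so near-identical queries also hit):

```cpp
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>

// Illustrative response cache: identical (prompt, params) pairs reuse the
// previous completion instead of paying for a new forward pass.
class ResponseCacheSketch {
public:
    std::optional<std::string> lookup(const std::string& prompt, int maxTokens) {
        auto it = cache_.find(key(prompt, maxTokens));
        if (it == cache_.end()) return std::nullopt;
        return it->second;  // cache hit: no inference needed
    }

    void store(const std::string& prompt, int maxTokens, std::string completion) {
        cache_[key(prompt, maxTokens)] = std::move(completion);
    }

private:
    static size_t key(const std::string& prompt, int maxTokens) {
        // Naive key: hash of prompt + params, separated by a sentinel byte.
        return std::hash<std::string>{}(prompt + '\x1f' + std::to_string(maxTokens));
    }
    std::unordered_map<size_t, std::string> cache_;
};
```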
🔧 Build & Deployment
- Windows/MSVC Support: PowerShell build script with Visual Studio 2022
- MSVC Fixes:
/Zc:char8_t-compiler flag for llama.cpp compatibility - Docker BuildKit: Flexible llama.cpp source (local or git clone)
- Offline-First: Root vcpkg standardization for reproducible builds
### 📚 Documentation
- Consolidated Guides: Root README + 17 specialized LLM docs (380 KB)
- Integration Paths: Local clone approach (no git submodules)
- API Specifications: HTTP REST + gRPC binary protocols
- Client SDKs: Python, JavaScript, Go, Rust, Java, C# examples
## 📦 What's Included

### Binary Artifacts

- `themis_server.exe` (10.2 MB) - Windows x64 Release
- `llama.dll` (2.2 MB) - llama.cpp inference engine
- `ggml*.dll` (1.4 MB) - GGML computation kernels

### Source Components

- 23 LLM Headers (`include/llm/`)
- 23 LLM Implementations (`src/llm/`)
- 17 Documentation Guides (`docs/llm/`)
- 8+ Test Suites (`tests/test_llm_*.cpp`)
- PowerShell Build Script (`scripts/build-themis-server-llm.ps1`)
## 🔄 Breaking Changes

### ⚠️ llama.cpp Submodule Removed

- Before: `external/llama.cpp` as git submodule
- After: Local clone in project root (excluded via `.gitignore`/`.dockerignore`)
- Migration: Clone llama.cpp locally: `git clone https://github.com/ggerganov/llama.cpp.git`
### ⚠️ vcpkg Standardization

- Before: Multiple vcpkg locations (`external/vcpkg`, `./vcpkg`)
- After: Single root `./vcpkg` with `VCPKG_ROOT` standardization
- Migration: Set `VCPKG_ROOT=C:\VCC\themis\vcpkg` (or your path)
### ⚠️ Docker Build Context

- Before: `external/` copied into build context
- After: `external/` excluded; use BuildKit `--build-context` for llama.cpp
- Migration: See Docker build commands below
🛠️ Installation & Upgrade
Windows (MSVC)
# Clone repository
git clone https://github.com/makr-code/ThemisDB.git
cd ThemisDB
# Clone llama.cpp locally (required for LLM support)
git clone https://github.com/ggerganov/llama.cpp.git llama.cpp
# Build with LLM support
powershell -File scripts/build-themis-server-llm.ps1
# Verify build
./build-msvc/Release/themis_server.exe --helpDocker (with LLM)
```bash
# With local llama.cpp clone
docker buildx build \
  --build-arg ENABLE_LLM=ON \
  --build-context llama=./llama.cpp \
  -t themisdb:v1.3.0-llm .

# Without local clone (git clone in Docker)
docker buildx build \
  --build-arg ENABLE_LLM=ON \
  -t themisdb:v1.3.0-llm .
```

### Linux/WSL
```bash
# Clone and build
git clone https://github.com/makr-code/ThemisDB.git
cd ThemisDB
git clone https://github.com/ggerganov/llama.cpp.git llama.cpp
cmake -B build -DTHEMIS_ENABLE_LLM=ON
cmake --build build -j$(nproc)
./build/themis_server --help
```

## 📖 Quick Start
### 1. Download a Model

```bash
# Example: Mistral 7B Instruct Q4 (~4GB)
mkdir -p models
cd models
wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
```

### 2. Configure LLM
```yaml
# config/llm_config.yaml
llm:
  enabled: true
  plugin: llamacpp
  model:
    path: ./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
    n_gpu_layers: 32   # GPU offload layers
    n_ctx: 4096        # Context window
  cache:
    max_models: 3
    max_vram_mb: 24576 # 24 GB
```

### 3. Start Server
```bash
./themis_server --config config/llm_config.yaml
```

### 4. Run Inference (HTTP API)
```bash
curl -X POST http://localhost:8765/api/llm/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is ThemisDB?",
    "max_tokens": 512,
    "temperature": 0.7
  }'
```

## 🎯 Performance Benchmarks
### GPU vs CPU (Mistral-7B Q4, RTX 4090)
| Operation | CPU (20 cores) | GPU (CUDA) | Speedup |
|---|---|---|---|
| Model Load | 2.8s | 2.1s | 1.3x |
| Inference (512 tokens) | 32s | 0.3s | 107x |
| Throughput | 16 tok/s | 1,700 tok/s | 106x |
### Memory Usage (with Caching)
| Feature | Memory | Savings |
|---|---|---|
| Base Model (Mistral-7B Q4) | 4.2 GB | - |
| + Response Cache | 4.5 GB | 85% query cost |
| + Prefix Cache | 4.6 GB | 40% latency |
| + KV Cache Sharing | 3.1 GB | 70% memory |
### Lazy Loading Impact
| Scenario | Cold Start | Warm Cache | Benefit |
|---|---|---|---|
| First Request | 2.8s | - | - |
| Subsequent Requests | - | ~0ms | Instant |
| After TTL Expiry | 2.8s | - | Auto-reload |
## 🔒 Security Considerations
- Model Files: Store models outside web root with proper permissions
- API Authentication: Enable Bearer Token (JWT) authentication in production
- Rate Limiting: Configure per-user quotas for inference requests (see the sketch after this list)
- Resource Limits: Set `max_vram_mb` and `max_models` to prevent exhaustion
- Audit Logging: All LLM operations logged for compliance
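
One common way to enforce the per-user quota above is a token bucket in front of the inference endpoint; this is a generic sketch, not a ThemisDB API:

```cpp
#include <algorithm>
#include <chrono>
#include <string>
#include <unordered_map>

// Generic token-bucket sketch for per-user inference quotas.
class RateLimiterSketch {
public:
    RateLimiterSketch(double tokensPerSec, double burst)
        : rate_(tokensPerSec), burst_(burst) {}

    bool allow(const std::string& user) {
        auto now = std::chrono::steady_clock::now();
        auto& b = buckets_.try_emplace(user, Bucket{burst_, now}).first->second;
        double elapsed = std::chrono::duration<double>(now - b.last).count();
        b.tokens = std::min(burst_, b.tokens + elapsed * rate_);  // refill
        b.last = now;
        if (b.tokens < 1.0) return false;  // quota exhausted -> reject request
        b.tokens -= 1.0;
        return true;
    }

private:
    struct Bucket {
        double tokens;
        std::chrono::steady_clock::time_point last;
    };
    double rate_, burst_;
    std::unordered_map<std::string, Bucket> buckets_;
};
```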
## 📊 Known Limitations

- Windows DLL Export Limit: Use a static build (`THEMIS_CORE_SHARED=OFF`) to avoid the 65k symbol limit
- GPU Memory: Requires sufficient VRAM for model + overhead (~100 MB CUDA)
- Model Format: Only GGUF format supported (llama.cpp v2+)
- Concurrent Requests: Limited by available VRAM and KV cache size
- Docker BuildKit: Requires Docker 19.03+ and BuildKit enabled
## 🐛 Bug Fixes

- Fixed MSVC `char8_t` compilation errors in llama.cpp via the `/Zc:char8_t-` flag
- Resolved vcpkg path conflicts between `external/vcpkg` and the root `./vcpkg`
- Corrected Docker `.dockerignore` to exclude `llama.cpp/` from the build context
- Removed circular submodule dependencies (`docker/tmp/openssl`, `external/llama.cpp`)
- Fixed CMake generator/architecture issues on Windows (requires `-A x64`)
## 📝 Deprecations

- Git Submodules for llama.cpp: Deprecated in favor of the local clone approach
- `external/vcpkg`: Deprecated in favor of the root `./vcpkg` location
- Manual llama.cpp Setup: Use `scripts/setup-llamacpp.sh` or the PowerShell build script
## 🔮 Roadmap (v1.4.0)
- Streaming Generation: Server-Sent Events (SSE) for real-time responses
- Batch Inference: Process multiple requests in single forward pass
- Distributed Sharding: Multi-node LLM deployment with etcd coordination
- vLLM Plugin: Native vLLM backend for PagedAttention and continuous batching
- Model Replication: Raft consensus for cross-shard model synchronization
- Advanced Quantization: Support for AWQ, GPTQ, and custom quantization schemes
## 🙏 Acknowledgments
- llama.cpp Team: For the incredible inference engine (MIT License)
- GGML: For efficient tensor operations on CPU/GPU
- HuggingFace: For GGUF model hosting and community
- ThemisDB Contributors: For testing, feedback, and documentation improvements
## 📚 Documentation
- LLM Integration Guide: docs/llm/LLAMA_CPP_INTEGRATION.md
- Plugin Development: docs/llm/README_PLUGINS.md
- Architecture Review: docs/llm/INTEGRATION_REVIEW_AND_SEQUENCE.md
- HTTP API Spec: docs/llm/HTTP_API_SPECIFICATION.md
- Docker Deployment: DOCKER_DEPLOYMENT.md
- Build Guide: docs/build/README.md
📞 Support & Community
- GitHub Issues: [Report bugs or request features](https://git...