AnyGPT is the evolution of JacGPT — a platform where users upload their documents and get a fully customized AI assistant with a branded UI, API endpoint, and optionally an auto-specialized small language model trained on their data.
Phase 1: Open Source AnyGPT MVP
Goal: Users upload docs → get an AI assistant that answers questions from those docs with cutting-edge retrieval.
1.1 — Multi-tenant Document Ingestion
User document upload (PDF, Markdown, HTML, DOCX, code files)
Per-tenant isolated document stores and vector indices
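Multimodal ingestion via ColPali/ColQwen: PDF pages are embedded directly as images, so no parsing/OCR pipeline is needed; a vision-language model encodes text and visuals together, and pages are retrieved with late-interaction (multi-vector) scoring
Contextual chunking using Anthropic's Contextual Retrieval: an LLM writes a chunk-specific context preamble that is prepended to each chunk before embedding and BM25 indexing (Anthropic reports that contextual embeddings plus contextual BM25 cut top-20 retrieval failures by 49% versus naive chunking, and by 67% when reranking is added)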
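A minimal sketch of contextual chunking into per-tenant storage, assuming an arbitrary `llm` callable (any chat-model wrapper) and a plain in-memory dict standing in for pgvector/Qdrant. `TenantStore`, `split_into_chunks`, and `contextualize` are hypothetical names, the fixed-size word chunker is a simplification, and the prompt is modeled on the example in Anthropic's Contextual Retrieval write-up rather than copied from AnyGPT.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Prompt modeled on Anthropic's Contextual Retrieval example (illustrative wording).
CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n"
    "Here is the chunk we want to situate within the whole document\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Please give a short succinct context to situate this chunk within the overall "
    "document for the purposes of improving search retrieval of the chunk. "
    "Answer only with the succinct context and nothing else."
)


@dataclass
class TenantStore:
    """Hypothetical per-tenant store: every tenant's chunks live under their own key."""
    chunks: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, tenant_id: str, texts: List[str]) -> None:
        self.chunks.setdefault(tenant_id, []).extend(texts)

    def get(self, tenant_id: str) -> List[str]:
        # Return a copy so callers cannot mutate a tenant's data in place.
        return list(self.chunks.get(tenant_id, []))


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Naive fixed-size word chunking; a real pipeline would respect headings and sentences."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def contextualize(document: str, llm: Callable[[str], str], max_words: int = 200) -> List[str]:
    """Prepend an LLM-written preamble to each chunk before it is embedded and BM25-indexed."""
    out = []
    for chunk in split_into_chunks(document, max_words):
        preamble = llm(CONTEXT_PROMPT.format(document=document, chunk=chunk)).strip()
        out.append(f"{preamble}\n\n{chunk}")
    return out


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        return "This chunk is from ACME's Q2 2023 report, revenue section."

    store = TenantStore()
    store.add("acme", contextualize("Revenue grew 3% over the previous quarter.", fake_llm))
    print(store.get("acme")[0])
    print(store.get("globex"))  # a different tenant sees nothing
```

The contextualized text (preamble plus chunk) is what both the dense embedder and the BM25 index would see, which is why the same preamble helps both halves of the hybrid retriever.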
1.2 — Agentic RAG Pipeline (replaces naive RAG)
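RAPTOR hierarchical summarization tree: cluster chunk embeddings with Gaussian mixture models (GMMs), recursively summarize each cluster, and retrieve across tree levels to synthesize information spread through long documents (the RAPTOR paper reports a 20% absolute accuracy gain on the QuALITY benchmark when paired with GPT-4)
Hybrid retrieval: dense (contextual embeddings) + sparse (contextual BM25), followed by ColBERT v2 late-interaction reranking (the original ColBERT paper reports two orders of magnitude faster execution than BERT-based cross-encoder rerankers at competitive accuracy)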
Self-RAG reflection loop — model decides mid-generation whether to fetch additional evidence or critique its own draft
Corrective RAG retrieval evaluator — confidence scoring triggers fallback to broader search when initial retrieval quality is low
Agentic orchestration via Jac walkers — decompose complex queries into sub-tasks, retrieve iteratively, verify claims before responding
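A sketch of the hybrid retrieval step, assuming the dense and BM25 retrievers have already returned ranked doc ids and that per-token embeddings are available for reranking. Reciprocal rank fusion (k = 60, the constant used in the original RRF paper) is one common way to merge the two lists; the roadmap does not say which fusion AnyGPT uses. `maxsim` is the ColBERT late-interaction score, and the toy 2-d vectors stand in for real model outputs.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction: each query token keeps its best dot product over doc tokens; sum them."""
    def dot(a: Vector, b: Vector) -> float:
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def hybrid_search(
    dense_ranking: List[str],
    sparse_ranking: List[str],
    query_tokens: List[Vector],
    doc_token_index: Dict[str, List[Vector]],
    pool: int = 20,
    top_k: int = 5,
) -> List[str]:
    # 1) Fuse dense + BM25 candidates, 2) rerank the fused pool with MaxSim.
    candidates = reciprocal_rank_fusion([dense_ranking, sparse_ranking])[:pool]
    reranked = sorted(candidates, key=lambda d: maxsim(query_tokens, doc_token_index[d]), reverse=True)
    return reranked[:top_k]


if __name__ == "__main__":
    index = {"a": [[0.0, 1.0]], "b": [[1.0, 0.0]], "c": [[2.0, 0.0]], "d": [[0.0, 0.0]]}
    print(hybrid_search(["a", "b", "c"], ["b", "c", "d"], [[1.0, 0.0]], index, top_k=2))
```

Because RRF only uses ranks, it sidesteps the problem that cosine similarities and BM25 scores live on different scales; MaxSim then runs only on the small fused pool, which is what keeps late-interaction reranking cheap.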
1.3 — MCP Integration
Build an AnyGPT MCP server that exposes document stores as MCP Resources and search as MCP Tools
Support both MCP transports: Streamable HTTP for remote clients, STDIO for local/desktop clients
Users can connect external data sources (databases, APIs, wikis) as additional MCP servers — the agent queries across all sources
MCP-based tool use for actions beyond Q&A (write to databases, trigger workflows, call APIs)
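To make the Resources/Tools split concrete, here is a stdlib-only sketch of the JSON-RPC 2.0 messages an MCP server answers for `resources/list`, `resources/read`, `tools/list`, and `tools/call`. It deliberately skips the `initialize` handshake, capability negotiation, and both transports, so it is not a conformant server; a real implementation would use an official MCP SDK. The `anygpt://` URI scheme, the `search_documents` tool, and the keyword matching are hypothetical stand-ins for the real hybrid search.

```python
import json
from typing import Any, Dict

# Hypothetical single-tenant document store; the anygpt:// URI scheme is made up.
DOCS: Dict[str, str] = {
    "anygpt://acme/handbook.md": "Vacation policy: employees accrue 1.5 days per month.",
    "anygpt://acme/faq.md": "Support hours are 9am to 5pm.",
}

# Hypothetical search tool, described with a JSON Schema for its input.
TOOLS = [{
    "name": "search_documents",
    "description": "Keyword search over this tenant's documents.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]


def _reply(req_id: Any, result: Dict[str, Any]) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


def _error(req_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}})


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC request for the resources/* and tools/* methods."""
    req = json.loads(raw)
    req_id, method, params = req.get("id"), req.get("method"), req.get("params", {})
    if method == "resources/list":
        return _reply(req_id, {"resources": [
            {"uri": uri, "name": uri.rsplit("/", 1)[-1], "mimeType": "text/markdown"} for uri in DOCS
        ]})
    if method == "resources/read":
        uri = params.get("uri")
        if uri not in DOCS:
            return _error(req_id, -32602, f"Unknown resource: {uri}")
        return _reply(req_id, {"contents": [{"uri": uri, "mimeType": "text/markdown", "text": DOCS[uri]}]})
    if method == "tools/list":
        return _reply(req_id, {"tools": TOOLS})
    if method == "tools/call":
        if params.get("name") != "search_documents":
            return _error(req_id, -32602, f"Unknown tool: {params.get('name')}")
        query = params.get("arguments", {}).get("query", "").lower()
        hits = [uri for uri, text in DOCS.items() if query and query in text.lower()]
        return _reply(req_id, {"content": [{"type": "text", "text": "\n".join(hits) or "No matches."}],
                               "isError": False})
    return _error(req_id, -32601, f"Method not found: {method}")


if __name__ == "__main__":
    call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
            "params": {"name": "search_documents", "arguments": {"query": "vacation"}}}
    print(handle(json.dumps(call)))
```

External data sources in the third bullet would simply be additional MCP servers of this shape, which the agent lists and calls alongside AnyGPT's own.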
1.4 — Per-Tenant UI & API
Auto-generated branded chat UI per tenant (Jac Client + Mantine)
REST API endpoint per tenant for programmatic access
Embeddable widget (iframe/web component) for integration into existing sites
SSE streaming responses with agent reasoning transparency
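A sketch of the wire format for streaming with reasoning transparency, assuming separate SSE event types for agent reasoning steps and answer tokens. The event names (`reasoning`, `token`, `done`), the JSON payload shapes, and the `[DONE]` sentinel are hypothetical choices; only the `event:`/`data:`/blank-line framing comes from the SSE format itself.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Events frame; each payload line gets its own 'data:' field."""
    lines = [f"event: {event}"] if event else []
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event


def stream_answer(steps: Iterable[str], tokens: Iterable[str]) -> Iterator[str]:
    """Emit agent reasoning steps first, then answer tokens, then an end-of-stream marker."""
    for step in steps:
        yield sse_event(json.dumps({"step": step}), event="reasoning")
    for token in tokens:
        yield sse_event(json.dumps({"token": token}), event="token")
    yield sse_event("[DONE]", event="done")


if __name__ == "__main__":
    for frame in stream_answer(["Searching 3 documents", "Verifying citations"], ["The ", "answer."]):
        print(frame, end="")
```

Using named events lets the chat UI or embeddable widget subscribe to each stream separately (for example with `EventSource.addEventListener("reasoning", ...)`), so reasoning can be shown in a collapsible panel without being mixed into the answer text.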
1.5 — Open Source Launch
Dockerized single-command deployment (jac start)
Documentation and quickstart guide
Community contribution guidelines
Phase 2: Enterprise Edition MVP
Goal: Dashboard, configurator, integrations, on-prem deployment — no auto-specialize yet.
2.1 — Configuration Dashboard
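Web-based admin dashboard for managing tenants, documents, models, and integrations
Phase 3: Auto-Specialize (Priority in AnyGPT Roadmap)
Goal: Push-button auto-specialization that trains a small language model on the user's docs to perform as well as Opus/GPT-4 on domain questions.
3.2 — Agent-Generated Answer Dataset (Step 2)
Rejection sampling: generate multiple answers per question (k=8), score each against ground truth, keep only the best reasoning paths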
Generate preference pairs for DPO: (good answer, bad answer) tuples per question
Human-in-the-loop review interface for critical domain accuracy (enterprise tier)
Output: SFT dataset (question → best answer with CoT) + DPO dataset (question → preferred vs rejected)
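A sketch of the selection step, assuming `sample` wraps the teacher (Opus plus the agentic RAG pipeline) and `grade` returns a 0 to 1 agreement score against the reference answer; both are hypothetical callables, and the `min_score`/`min_margin` thresholds are illustrative. Pairing the best answer with the worst is one simple choice; pairing it with every sufficiently worse candidate would yield more DPO pairs per question. Record keys follow the `prompt`/`completion` and `prompt`/`chosen`/`rejected` conventions used by Hugging Face TRL datasets.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Record = Optional[Dict[str, str]]


@dataclass
class Candidate:
    answer: str   # teacher output: reasoning plus final answer
    score: float  # agreement with the reference answer, in [0, 1]


def build_records(question: str, candidates: List[Candidate],
                  min_score: float = 0.5, min_margin: float = 0.2) -> Tuple[Record, Record]:
    """Keep the best candidate for SFT; pair it with the worst one for DPO if they differ enough."""
    if not candidates:
        return None, None
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best.score < min_score:
        return None, None  # no reasoning path is good enough: drop (or send to human review)
    sft = {"prompt": question, "completion": best.answer}
    dpo = None
    if best.score - worst.score >= min_margin:
        dpo = {"prompt": question, "chosen": best.answer, "rejected": worst.answer}
    return sft, dpo


def rejection_sample(question: str, reference: str,
                     sample: Callable[[str], str],
                     grade: Callable[[str, str], float],
                     k: int = 8) -> Tuple[Record, Record]:
    """Draw k teacher answers, grade each against the reference, then build SFT/DPO records."""
    candidates = [Candidate(a, grade(a, reference)) for a in (sample(question) for _ in range(k))]
    return build_records(question, candidates)
```

The `min_margin` check matters for DPO: if the best and worst answers are nearly equally good, the pair carries little preference signal and mostly adds noise.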
3.3 — SLM Training Pipeline (Step 3)
Use the dataset to train an SLM that matches Opus on questions about the tenant's documents
Base model selection (configurable per tenant):
Phi-4-Mini (3.8B) — default, reasoning comparable to 7-9B models
Qwen 2.5 (7B) — strong multilingual, benefits most from distillation
Llama 3.2 (3B) — optimized for edge/mobile deployment
Mistral (7B) — strong general-purpose
Stage 1 — SFT with QLoRA: fine-tune on the synthetic Q&A dataset with the base model quantized to 4 bits (runs on a single 12 GB GPU). LoRA rank 16-64, targeting attention and MLP layers
Stage 2 — DPO alignment: Direct Preference Optimization on preference pairs (40-75% cheaper than RLHF, no reward model needed). Trains the model to prefer domain-accurate, well-reasoned responses
Stage 3 — Evaluation loop:
Hold out 10% of synthetic data as eval set
Run automated benchmarks: answer accuracy, retrieval faithfulness, hallucination rate
Compare SLM outputs against Opus outputs on same questions
If below threshold → generate more training data on failure cases → retrain → re-evaluate
Stage 4 — Model export: GGUF quantization (Q4_K_M) for local deployment, ONNX for edge, and vLLM-compatible weights for cloud serving
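Stage 2's objective, written out for a single preference pair: the loss is −log σ(β[(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]), where y_w is the chosen answer, y_l the rejected one, and π_ref the frozen reference model (typically the Stage 1 SFT checkpoint). The sketch computes it from precomputed sequence log-probabilities; β = 0.1 is a commonly used default, not a value from this roadmap. Trainers such as TRL's `DPOTrainer` compute this internally, so the code only illustrates why no separate reward model is needed.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed token log-probs under the policy and the frozen reference."""
    # Implicit reward of each answer = beta * (policy log-prob - reference log-prob).
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return softplus(-margin)  # equals -log(sigmoid(margin))


if __name__ == "__main__":
    # Policy already prefers the chosen answer more than the reference does: loss below log(2).
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

At initialization the policy equals the reference, every margin is zero, and the loss is log 2; training pushes the margin positive, which is exactly "preferring domain-accurate, well-reasoned responses" relative to the SFT model.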
One-click training trigger from dashboard — user uploads docs → clicks "Auto-Specialize" → gets a trained model
3.4 — Specialized Model Serving
Serve trained SLM alongside RAG pipeline — model handles domain questions natively, RAG provides grounding and citations
Automatic routing: use trained model for in-domain queries, fall back to large model for out-of-domain
Model versioning — retrain when documents are updated, keep previous versions
A/B testing — compare specialized SLM vs large model on live queries to validate quality
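A sketch of the routing, versioning, and A/B logic, assuming the top retrieval similarity from the tenant's index is used as the in-domain signal; the roadmap does not specify the signal, so `threshold`, the model ids, and `ModelRegistry` are hypothetical. Hashing the query (or session) id gives a stable A/B split, so a repeated query is always served by the same arm.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelRegistry:
    """Hypothetical per-tenant registry: each retrain appends a version; older ones are kept."""
    versions: List[str] = field(default_factory=list)

    def register(self, model_id: str) -> None:
        self.versions.append(model_id)

    @property
    def current(self) -> Optional[str]:
        return self.versions[-1] if self.versions else None


def route(top_retrieval_score: float, registry: ModelRegistry,
          large_model: str = "large-model", threshold: float = 0.6) -> str:
    """Treat a query as in-domain when the tenant's retriever found a strong match."""
    if registry.current is not None and top_retrieval_score >= threshold:
        return registry.current
    return large_model  # out-of-domain, or no specialized model trained yet


def ab_bucket(query_id: str, slm_share: float = 0.5) -> str:
    """Deterministic A/B assignment: the same id always lands in the same arm."""
    h = int(hashlib.sha256(query_id.encode()).hexdigest(), 16) % 10_000
    return "slm" if h < slm_share * 10_000 else "large"


if __name__ == "__main__":
    reg = ModelRegistry()
    reg.register("acme-slm-v1")
    reg.register("acme-slm-v2")  # retrained after a document update; v1 is still kept
    print(route(0.82, reg), route(0.15, reg), ab_bucket("session-42"))
```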
Technical Decisions
Why Not Old-School RAG?
| Traditional RAG | AnyGPT Approach |
|---|---|
| Naive chunking loses context | Contextual Retrieval prepends document-level context to each chunk |
| Single retrieval pass | Agentic RAG with Self-RAG reflection + Corrective RAG fallback |
| Text-only PDF parsing (lossy) | ColPali treats PDFs as images: no parsing pipeline, handles tables/figures natively |
| | RAPTOR builds recursive summary trees for synthesis across long docs |
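The "Agentic RAG" row can be sketched as a retrieval gate plus a critique loop. This assumes hypothetical `retrieve`/`broaden` callables returning (passage, relevance) pairs, plus `generate` and `is_supported` callables backed by an LLM; the thresholds are illustrative. The three-way correct/incorrect/ambiguous decision follows the Corrective RAG paper. Self-RAG proper trains the model to emit reflection tokens; the prompt-level critique here is a simpler stand-in, not the Self-RAG method itself.

```python
from typing import Callable, List, Tuple

Hits = List[Tuple[str, float]]  # (passage, relevance score in [0, 1])
Retriever = Callable[[str], Hits]


def corrective_retrieve(query: str, retrieve: Retriever, broaden: Retriever,
                        upper: float = 0.7, lower: float = 0.3) -> Tuple[str, List[str]]:
    """CRAG-style gate: keep the first pass, replace it, or combine it with a broader search."""
    hits = retrieve(query)
    best = max((score for _, score in hits), default=0.0)
    kept = [p for p, score in hits if score >= lower]
    if best >= upper:
        return "correct", kept
    if best < lower:
        return "incorrect", [p for p, _ in broaden(query)]
    return "ambiguous", kept + [p for p, _ in broaden(query) if p not in kept]


def answer_with_reflection(query: str, retrieve: Retriever, broaden: Retriever,
                           generate: Callable[[str, List[str]], str],
                           is_supported: Callable[[str, List[str]], bool],
                           max_rounds: int = 2) -> str:
    """Critique the draft; if it is not supported by the evidence, fetch more and regenerate."""
    _, passages = corrective_retrieve(query, retrieve, broaden)
    draft = generate(query, passages)
    for _ in range(max_rounds):
        if is_supported(draft, passages):
            break
        extra = [p for p, _ in broaden(draft) if p not in passages]
        if not extra:
            break  # nothing new to add; return the best effort so far
        passages = passages + extra
        draft = generate(query, passages)
    return draft
```

The `max_rounds` cap bounds latency and cost: each reflection round adds a retrieval call and a full regeneration.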
Why Not GraphRAG?
GraphRAG (knowledge graph construction + community detection) was considered but dropped due to:
High cost — requires LLM calls to extract entities/relationships from every document chunk at ingestion time, making it prohibitively expensive for large document corpora
High latency — graph construction is slow (minutes to hours per corpus), and graph traversal queries add significant overhead vs direct vector search
Diminishing returns — the combination of Contextual Retrieval + RAPTOR + Self-RAG achieves strong multi-hop reasoning without the graph infrastructure overhead
Why MCP?
MCP (Model Context Protocol) is now an industry standard: created by Anthropic, adopted by OpenAI and Google, and now hosted under the Linux Foundation. Benefits:
Standardized context layer — any MCP-compatible client can connect to AnyGPT's document stores
Extensible — users plug in their own data sources (Slack, Confluence, databases) without custom code
Future-proof — as the ecosystem grows, AnyGPT automatically gains compatibility with new tools and data sources
Why Tauri for Desktop?
10x smaller bundle (10MB vs 100MB+ Electron)
5x less memory (30-40MB vs 200-300MB idle)
Sidecar support — bundle llama.cpp/Ollama for fully offline AI
Mobile support — iOS + Android via Tauri 2.x (Electron can't do this)
Rust backend — fast, safe, system-level access for local model inference
Auto-Specialize IP
The push-button auto-specialization pipeline is AnyGPT's core differentiator:
User Docs ──→ Opus generates questions ──→ Opus + Agentic RAG generates answers
                                                        │
                                            ┌───────────┴───────────┐
                                            │      SFT Dataset      │
                                            │  DPO Preference Pairs │
                                            └───────────┬───────────┘
                                                        │
                                     QLoRA Fine-tune SLM (Phi-4/Qwen/Llama)
                                                        │
                                                  DPO Alignment
                                                        │
                                           Eval Loop (compare vs Opus)
                                                        │
                                            ┌───────────┴───────────┐
                                            │   Pass? → Deploy      │
                                            │   Fail? → More data   │
                                            │         → Retrain     │
                                            └───────────────────────┘
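The control flow in the diagram, reduced to a sketch: `train`, `evaluate`, and `generate_more` are hypothetical callables standing in for the QLoRA + DPO run, the benchmark suite, and teacher-driven data generation, and `threshold`/`max_rounds` are illustrative values. The 10% hold-out matches Stage 3 of the training pipeline.

```python
import random
from typing import Any, Callable, Dict, List, Sequence, Tuple


def split_holdout(examples: Sequence[Any], holdout: float = 0.1,
                  seed: int = 0) -> Tuple[List[Any], List[Any]]:
    """Deterministic shuffle, then hold out a fraction as the eval set."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_eval = max(1, round(len(items) * holdout))
    return items[n_eval:], items[:n_eval]


def auto_specialize(examples: Sequence[Any],
                    train: Callable[[List[Any]], str],
                    evaluate: Callable[[str, List[Any]], Tuple[float, List[Any]]],
                    generate_more: Callable[[List[Any]], List[Any]],
                    threshold: float = 0.9, max_rounds: int = 3) -> Dict[str, Any]:
    """Train, evaluate on the hold-out set, and retrain with targeted data until the bar is met."""
    train_set, eval_set = split_holdout(examples)
    model, score = None, 0.0
    for round_no in range(1, max_rounds + 1):
        model = train(train_set)                     # QLoRA SFT + DPO in the real pipeline
        score, failures = evaluate(model, eval_set)  # e.g. accuracy relative to teacher answers
        if score >= threshold:
            return {"status": "deploy", "model": model, "score": score, "rounds": round_no}
        train_set = train_set + generate_more(failures)  # more teacher data on the misses
    # Still below the bar: keep serving the large model and surface the result for review.
    return {"status": "below_threshold", "model": model, "score": score, "rounds": max_rounds}
```

Capping the number of rounds keeps a tenant whose documents never reach the bar from retraining indefinitely; that tenant simply stays on the large model.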
No competitor offers this as a push-button feature. Most white-label AI platforms (Dify, Flowise, Langflow, CustomGPT) stop at RAG — none auto-train specialized models. This is the moat.
Competitive Landscape
| Platform | RAG | Custom UI | API | SLM Training | Desktop | Open Source |
|---|---|---|---|---|---|---|
| AnyGPT | Agentic + RAPTOR + Self-RAG | ✅ White-label | ✅ Per-tenant | ✅ Auto-specialize | ✅ Tauri | ✅ |
| Dify | Basic RAG | Limited | ✅ | ❌ | ❌ | ✅ |
| CustomGPT | Basic RAG | ✅ | ✅ | ❌ | ❌ | ❌ |
| Flowise | Basic RAG | ❌ | ✅ | ❌ | ❌ | ✅ |
| Open WebUI | Basic RAG | Limited | ❌ | ❌ | ❌ | ✅ |
| AnythingLLM | Basic RAG | Limited | ✅ | ❌ | ✅ Electron | ✅ |
Stack
Language: Jac (Jaseci stack)
Backend: Jac walkers + byLLM plugin
Frontend: Jac Client (React-based) + Mantine UI
Vector Store: pgvector / Qdrant (replacing FAISS for multi-tenancy)
Multimodal Retrieval: ColPali/ColQwen
Reranking: ColBERT v2
LLM: Claude Opus (teacher/generation), configurable per tenant
SLM Training: QLoRA + DPO via HuggingFace Transformers + TRL
Vision
Built on the Jaseci stack. Two editions:
Architecture Overview
Roadmap
Phase 1: Open Source AnyGPT MVP
1.1 — Multi-tenant Document Ingestion
1.2 — Agentic RAG Pipeline (replaces naive RAG)
1.3 — MCP Integration
1.4 — Per-Tenant UI & API
1.5 — Open Source Launch
Phase 2: Enterprise Edition MVP
2.1 — Configuration Dashboard
2.2 — Integration Layer
2.3 — White-Label Desktop App
2.4 — On-Prem Tooling
Phase 3: Auto-Specialize (Priority in AnyGPT Roadmap)
Goal: Push-button auto-specialization that trains a small language model on the user's docs to perform as well as Opus/GPT-4 on domain questions.
3.1 — Synthetic Dataset Generation (Step 1)
3.2 — Agent-Generated Answer Dataset (Step 2)
3.3 — SLM Training Pipeline (Step 3)
3.4 — Specialized Model Serving
Technical Decisions
Why Not Old-School RAG?
Why Not GraphRAG?
Why MCP?
Why Tauri for Desktop?
Auto-Specialize IP
Competitive Landscape
Stack