AnyGPT — Open Source & Enterprise AI Platform (JacGPT Evolution)

## Vision

**AnyGPT** is the evolution of JacGPT — a platform where users upload their documents and get a fully customized AI assistant with a branded UI, API endpoint, and optionally an auto-specialized small language model trained on their data.

Built on the Jaseci stack. Two editions:
- **Open Source**: Self-hostable, community-driven
- **Enterprise**: Dashboard + configurator + integrations, on-prem tooling, white-label customizable desktop app

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                     AnyGPT Platform                                 │
├──────────────┬──────────────────┬───────────────┬───────────────────┤
│  Tenant UI   │ Config Dashboard │  API Gateway  │  Desktop App      │
│  (per-user   │ (enterprise)     │  (per-tenant  │  (Tauri + Jac)    │
│   branded)   │                  │   endpoints)  │                   │
├──────────────┴──────────────────┴───────────────┴───────────────────┤
│                     Orchestration Layer (Jac Walkers)               │
│  ┌──────────┐  ┌──────────────┐  ┌───────────┐  ┌───────────────┐  │
│  │ Agentic  │  │ MCP Context  │  │ Auto-     │  │ Tenant        │  │
│  │ RAG      │  │ Engine       │  │ Specialize│  │ Manager       │  │
│  └──────────┘  └──────────────┘  └───────────┘  └───────────────┘  │
├─────────────────────────────────────────────────────────────────────┤
│                     Knowledge Layer                                 │
│  ┌──────────────┐  ┌────────────┐  ┌──────────┐  ┌─────────────┐  │
│  │ Contextual   │  │ ColPali    │  │ Hybrid   │  │ RAPTOR       │  │
│  │ Retrieval    │  │ (Multimodal│  │ Search   │  │ (Hierarchical│  │
│  │ + Self-RAG   │  │  Retrieval)│  │ + Rerank │  │  Summaries)  │  │
│  └──────────────┘  └────────────┘  └──────────┘  └─────────────┘  │
├─────────────────────────────────────────────────────────────────────┤
│                     Storage Layer                                   │
│  pgvector / Qdrant          │          Object Store (S3/local)      │
└─────────────────────────────────────────────────────────────────────┘
```

---

## Roadmap

### Phase 1: Open Source AnyGPT MVP
**Goal**: Users upload docs → get an AI assistant that answers questions from those docs with cutting-edge retrieval.

**1.1 — Multi-tenant Document Ingestion**
- [ ] User document upload (PDF, Markdown, HTML, DOCX, code files)
- [ ] Per-tenant isolated document stores and vector indices
- [ ] Multimodal ingestion via **ColPali/ColQwen** — treat PDFs as screenshots, no parsing/OCR pipeline needed, VLM processes text + images together using late-interaction retrieval
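- [ ] Contextual chunking using **Anthropic's Contextual Retrieval**: an LLM generates a chunk-specific context preamble that is prepended to each chunk before embedding (Anthropic reports up to 67% fewer failed retrievals when contextual embeddings are combined with contextual BM25 and reranking)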
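
The contextual chunking step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `contextualize_chunks`, the prompt wording, and the `generate_context` callable (a stand-in for a real LLM call) are all hypothetical.

```python
from typing import Callable, List

# Hypothetical prompt; the wording is an assumption, not taken from this issue.
PROMPT = (
    "<document>\n{doc}\n</document>\n"
    "Here is a chunk from that document:\n<chunk>\n{chunk}\n</chunk>\n"
    "Write a short context that situates this chunk within the document."
)


def contextualize_chunks(
    doc: str,
    chunks: List[str],
    generate_context: Callable[[str], str],
) -> List[str]:
    """Prepend an LLM-written context preamble to each chunk before embedding."""
    out = []
    for chunk in chunks:
        preamble = generate_context(PROMPT.format(doc=doc, chunk=chunk)).strip()
        # The preamble and the original chunk are embedded (and BM25-indexed) together.
        out.append(f"{preamble}\n\n{chunk}")
    return out


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "From the Q3 report."


contextual = contextualize_chunks(
    "Q3 report. Revenue grew 3%.", ["Revenue grew 3%."], fake_llm
)
```

In a real pipeline the full document would likely be prompt-cached, since it is resent once per chunk.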

**1.2 — Agentic RAG Pipeline (replaces naive RAG)**
- [ ] **RAPTOR** hierarchical summarization tree: cluster chunks with GMMs and recursively summarize the clusters, enabling synthesis across long documents (the RAPTOR paper reports a 20% absolute accuracy gain on the QuALITY benchmark when paired with GPT-4)
- [ ] **Hybrid retrieval**: dense (contextual embeddings) + sparse (contextual BM25) + **ColBERT v2** late-interaction reranking (100x more efficient than cross-encoders at comparable accuracy)
- [ ] **Self-RAG** reflection loop — model decides mid-generation whether to fetch additional evidence or critique its own draft
- [ ] **Corrective RAG** retrieval evaluator — confidence scoring triggers fallback to broader search when initial retrieval quality is low
- [ ] Agentic orchestration via Jac walkers — decompose complex queries into sub-tasks, retrieve iteratively, verify claims before responding
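
The Corrective RAG fallback described above could look roughly like this. It is a sketch only: the `primary` and `fallback` retrievers, the score range, and the `threshold` default are assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (passage text, relevance score in [0, 1])


def corrective_retrieve(
    query: str,
    primary: Callable[[str], List[Hit]],
    fallback: Callable[[str], List[Hit]],
    threshold: float = 0.5,
) -> List[Hit]:
    """Return primary hits unless their best score is below the threshold."""
    hits = primary(query)
    best = max((score for _, score in hits), default=0.0)
    if best >= threshold:
        return hits
    # Low confidence: widen the search and merge, keeping the highest score per passage.
    merged = dict(hits)
    for text, score in fallback(query):
        merged[text] = max(score, merged.get(text, 0.0))
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```

A production evaluator would probably score relevance with a model rather than reuse raw retrieval scores; the control flow stays the same.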

**1.3 — MCP Integration**
- [ ] Build AnyGPT MCP server exposing document stores as MCP Resources and search as MCP Tools
- [ ] Support both MCP transports (Streamable HTTP for remote, STDIO for local/desktop)
- [ ] Users can connect external data sources (databases, APIs, wikis) as additional MCP servers — the agent queries across all sources
- [ ] MCP-based tool use for actions beyond Q&A (write to databases, trigger workflows, call APIs)
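
For orientation, MCP messages are JSON-RPC 2.0, and a client invokes a tool with the `tools/call` method. The sketch below only builds such a request; the tool name `search_documents` and its arguments are hypothetical placeholders for whatever AnyGPT's MCP server ends up exposing.

```python
import json
from typing import Any, Dict


def tool_call_request(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# `search_documents` and its arguments are hypothetical names for AnyGPT's search tool.
message = tool_call_request(1, "search_documents", {"query": "refund policy", "top_k": 5})
```

The same message shape travels over either transport: an HTTP POST body for Streamable HTTP, or a line on stdin for STDIO.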

**1.4 — Per-Tenant UI & API**
- [ ] Auto-generated branded chat UI per tenant (Jac Client + Mantine)
- [ ] REST API endpoint per tenant for programmatic access
- [ ] Embeddable widget (iframe/web component) for integration into existing sites
- [ ] SSE streaming responses with agent reasoning transparency
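
As an illustration of streaming with reasoning transparency, the sketch below encodes agent steps and answer tokens as Server-Sent Events frames. The event type names (`reasoning`, `token`) and the payload shape are assumptions, not a defined AnyGPT wire format.

```python
import json
from typing import Dict, Iterable, Iterator


def sse_frames(events: Iterable[Dict[str, str]]) -> Iterator[str]:
    """Encode events as SSE frames: one `event:` line, one `data:` line, blank line."""
    for ev in events:
        payload = json.dumps({"text": ev["text"]})
        yield f"event: {ev['type']}\ndata: {payload}\n\n"


# Hypothetical stream: one reasoning step followed by one answer token.
frames = list(sse_frames([
    {"type": "reasoning", "text": "Searching docs"},
    {"type": "token", "text": "Hello"},
]))
```

JSON-encoding the payload keeps newlines inside the text from breaking the frame, since a raw newline would end the `data:` line.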

**1.5 — Open Source Launch**
- [ ] Dockerized single-command deployment (`jac start`)
- [ ] Documentation and quickstart guide
- [ ] Community contribution guidelines

---

### Phase 2: Enterprise Edition MVP
**Goal**: Dashboard, configurator, integrations, on-prem deployment — no auto-specialize yet.

**2.1 — Configuration Dashboard**
- [ ] Web-based admin dashboard for managing tenants, documents, models, and integrations
- [ ] Visual pipeline configurator — toggle RAG strategies (RAPTOR, hybrid search, Self-RAG) per tenant
- [ ] Model selection — choose between cloud LLMs (Claude, GPT-4, etc.) or self-hosted models (Ollama, vLLM)
- [ ] Usage analytics, query logs, retrieval quality metrics per tenant

**2.2 — Integration Layer**
- [ ] Pre-built MCP connectors: Slack, Confluence, Notion, Google Drive, SharePoint, GitHub
- [ ] Webhook/event system for custom integrations
- [ ] SSO/SAML authentication
- [ ] Role-based access control (admin, editor, viewer per tenant)
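
A per-tenant role check could be as small as the following sketch. The three roles come from the item above; the action names and the minimum role assigned to each are assumptions for illustration.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Role(IntEnum):
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3


# Minimum role per action; this mapping is an assumption for illustration.
REQUIRED = {
    "query": Role.VIEWER,
    "upload_document": Role.EDITOR,
    "manage_tenant": Role.ADMIN,
}


def is_allowed(
    grants: Dict[Tuple[str, str], Role], user: str, tenant: str, action: str
) -> bool:
    """Check a user's role within one tenant; no grant for that tenant means no access."""
    role = grants.get((user, tenant))
    return role is not None and role >= REQUIRED[action]
```

Keying grants by `(user, tenant)` means a role in one tenant never carries over to another.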

**2.3 — White-Label Desktop App**
- [ ] **Tauri 2.0** desktop app (roughly 30-40MB idle memory vs 200MB+ for Electron, <0.5s startup)
- [ ] Rust backend + Jac Client frontend
- [ ] Local-first mode — bundle Ollama/llama.cpp as Tauri sidecar for fully offline operation
- [ ] Cross-platform: macOS, Windows, Linux (+ iOS/Android via Tauri 2.x)
- [ ] Customizable branding (logo, colors, name) via config file

**2.4 — On-Prem Tooling**
- [ ] Helm chart for Kubernetes deployment
- [ ] Air-gapped installation support (all models and dependencies bundled)
- [ ] Data residency controls — all data stays within customer infrastructure

---

### Phase 3: Auto-Specialize (Priority in AnyGPT Roadmap)
**Goal**: Push-button auto-specialization — train a small language model on user's docs that performs as well as Opus/GPT-4 on domain questions.

**3.1 — Synthetic Dataset Generation (Step 1)**
> *Use Opus to generate many questions from docs*

- [ ] Document analysis walker — crawl uploaded docs, identify key concepts, entities, relationships, edge cases
- [ ] **Multi-strategy question generation** using Claude Opus:
  - Factual questions (who/what/where/when)
  - Reasoning questions (why/how, multi-hop)
  - Comparison questions (contrast concepts)
  - Application questions (how to use X in scenario Y)
  - Edge case questions (what happens if...)
- [ ] Question quality filtering — deduplicate, validate answerability, ensure coverage of all document sections
- [ ] Target: 5,000-50,000 high-quality questions per document corpus (scales with corpus size), each answered in Step 2 to form a Q&A pair
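
The quality-filtering step can be sketched as below. This is a deliberately simple stand-in: it deduplicates on normalized text only and checks section coverage, while a real filter would likely also use embedding similarity and an answerability check. The `question` and `section` field names are assumptions.

```python
import re
from typing import Dict, List, Set, Tuple


def normalize(question: str) -> str:
    """Lowercase and strip punctuation so trivially different phrasings collide."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", question.lower()).split())


def filter_questions(
    candidates: List[Dict[str, str]], sections: Set[str]
) -> Tuple[List[Dict[str, str]], Set[str]]:
    """Drop duplicate questions and report sections with no surviving question."""
    seen: Set[str] = set()
    kept: List[Dict[str, str]] = []
    for cand in candidates:
        key = normalize(cand["question"])
        if key and key not in seen:
            seen.add(key)
            kept.append(cand)
    uncovered = sections - {c["section"] for c in kept}
    return kept, uncovered
```

The `uncovered` set can be fed back to the generator so it produces more questions for the sections that were missed.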

**3.2 — Agent-Generated Answer Dataset (Step 2)**
> *Use Opus to run agent on questions to generate dataset*

- [ ] Run each question through the full Agentic RAG pipeline with Opus as the backbone
- [ ] Capture complete reasoning traces (chain-of-thought, retrieval steps, self-correction)
- [ ] **Rejection sampling**: generate multiple answers per question (k=8), score each against ground truth, keep only the best reasoning paths
- [ ] Generate **preference pairs** for DPO: (good answer, bad answer) tuples per question
- [ ] Human-in-the-loop review interface for critical domain accuracy (enterprise tier)
- [ ] Output: SFT dataset (question → best answer with CoT) + DPO dataset (question → preferred vs rejected)
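
Rejection sampling and preference-pair construction might be wired together as in this sketch. The `sample` and `score` callables stand in for the Opus-backed agent and a grader, and `min_gap` is an assumed guard against pairing two near-identical answers. The `prompt`/`chosen`/`rejected` field names follow the convention commonly used for DPO datasets.

```python
from typing import Callable, Dict, Optional


def build_examples(
    question: str,
    reference: str,
    sample: Callable[[str], str],
    score: Callable[[str, str], float],
    k: int = 8,
    min_gap: float = 0.2,
) -> Dict[str, Optional[Dict[str, str]]]:
    """Sample k answers, keep the best for SFT, and pair best vs worst for DPO."""
    answers = [sample(question) for _ in range(k)]
    ranked = sorted(answers, key=lambda a: score(a, reference), reverse=True)
    best, worst = ranked[0], ranked[-1]
    sft = {"prompt": question, "completion": best}
    # Only emit a preference pair when the two answers clearly differ in quality.
    gap = score(best, reference) - score(worst, reference)
    dpo = (
        {"prompt": question, "chosen": best, "rejected": worst}
        if gap >= min_gap
        else None
    )
    return {"sft": sft, "dpo": dpo}
```

Questions where all k samples score alike yield an SFT example but no DPO pair, which avoids training on noise.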

**3.3 — SLM Training Pipeline (Step 3)**
> *Use dataset to train SLM to be as good as Opus on docs*

- [ ] **Base model selection** (configurable per tenant):
  - **Phi-4-Mini (3.8B)** — default, reasoning comparable to 7-9B models
  - **Qwen 2.5 (7B)** — strong multilingual, benefits most from distillation
  - **Llama 3.2 (3B)** — optimized for edge/mobile deployment
  - **Mistral (7B)** — strong general-purpose
- [ ] **Stage 1 — SFT with QLoRA**: Fine-tune on synthetic Q&A dataset using 4-bit quantization (runs on single 12GB GPU). LoRA rank 16-64, targeting attention + MLP layers
- [ ] **Stage 2 — DPO alignment**: Direct Preference Optimization on preference pairs (40-75% cheaper than RLHF, no reward model needed). Trains the model to prefer domain-accurate, well-reasoned responses
- [ ] **Stage 3 — Evaluation loop**:
  - Hold out 10% of synthetic data as eval set
  - Run automated benchmarks: answer accuracy, retrieval faithfulness, hallucination rate
  - Compare SLM outputs against Opus outputs on same questions
  - If below threshold → generate more training data on failure cases → retrain → re-evaluate
- [ ] **Stage 4 — Model export**: GGUF quantization (Q4_K_M) for local deployment, ONNX for edge, vLLM-compatible for cloud serving
- [ ] One-click training trigger from dashboard — user uploads docs → clicks "Auto-Specialize" → gets a trained model
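
The evaluation loop in Stage 3 can be sketched without any training code. Here `student` stands in for the fine-tuned SLM, `is_correct` for the automated grader, and the 0.9 threshold is an arbitrary assumption; only the 10% hold-out figure comes from the list above.

```python
import random
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]


def split_holdout(
    data: List[Example], frac: float = 0.1, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Shuffle deterministically and hold out `frac` of the examples for evaluation."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * frac))
    return shuffled[n_eval:], shuffled[:n_eval]


def evaluate(
    eval_set: List[Example],
    student: Callable[[str], str],
    is_correct: Callable[[str, str], bool],
    threshold: float = 0.9,
) -> Tuple[bool, List[Example]]:
    """Return (passed, failures); failures seed the next round of data generation."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    failures = [
        ex for ex in eval_set
        if not is_correct(student(ex["question"]), ex["answer"])
    ]
    accuracy = 1 - len(failures) / len(eval_set)
    return accuracy >= threshold, failures
```

When `passed` is false, the returned failures are the cases to generate more training data around before retraining.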

**3.4 — Specialized Model Serving**
- [ ] Serve trained SLM alongside RAG pipeline — model handles domain questions natively, RAG provides grounding and citations
- [ ] Automatic routing: use trained model for in-domain queries, fall back to large model for out-of-domain
- [ ] Model versioning — retrain when documents are updated, keep previous versions
- [ ] A/B testing — compare specialized SLM vs large model on live queries to validate quality
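
One possible routing signal is retrieval confidence: if the tenant's index returns strong matches, the query is probably in-domain. The sketch below assumes that heuristic; the threshold and the model labels are hypothetical.

```python
from typing import List


def choose_model(retrieval_scores: List[float], threshold: float = 0.6) -> str:
    """Route to the specialized SLM when retrieval finds strong in-domain evidence."""
    top = max(retrieval_scores, default=0.0)
    return "specialized-slm" if top >= threshold else "large-model"
```

For example, `choose_model([0.2, 0.8])` selects the specialized model, while an empty result list falls back to the large model.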

---

## Technical Decisions

### Why Not Old-School RAG?

| Traditional RAG | AnyGPT Approach |
|---|---|
| Naive chunking loses context | Contextual Retrieval prepends document-level context to each chunk |
| Single retrieval pass | Agentic RAG with Self-RAG reflection + Corrective RAG fallback |
| Text-only PDF parsing (lossy) | ColPali treats PDFs as images — no parsing pipeline, handles tables/figures natively |
| Single similarity metric | Hybrid search (dense + sparse + ColBERT reranking) |
| No document hierarchy awareness | RAPTOR builds recursive summary trees for synthesis across long docs |
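
To make the hybrid search row concrete, the sketch below merges a dense ranking and a sparse ranking with reciprocal rank fusion (RRF). RRF is a common fusion choice used here as an assumption, since this issue does not name a fusion method. The ColBERT reranker would then run on the fused candidates.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: each doc scores sum(1 / (k + rank)) over the lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["a", "b", "c"]   # hypothetical dense-embedding ranking
sparse = ["b", "d", "a"]  # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense, sparse])
```

Because RRF uses ranks rather than raw scores, the dense and sparse retrievers do not need comparable score scales.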

### Why Not GraphRAG?

GraphRAG (knowledge graph construction + community detection) was considered but **dropped** due to:
- **High cost** — requires LLM calls to extract entities/relationships from every document chunk at ingestion time, making it prohibitively expensive for large document corpora
- **High latency** — graph construction is slow (minutes to hours per corpus), and graph traversal queries add significant overhead vs direct vector search
- **Diminishing returns** — the combination of Contextual Retrieval + RAPTOR + Self-RAG achieves strong multi-hop reasoning without the graph infrastructure overhead

### Why MCP?

MCP (Model Context Protocol) is now an industry standard (Anthropic + OpenAI + Google, under Linux Foundation). Benefits:
- **Standardized context layer** — any MCP-compatible client can connect to AnyGPT's document stores
- **Extensible** — users plug in their own data sources (Slack, Confluence, databases) without custom code
- **Future-proof** — as the ecosystem grows, AnyGPT automatically gains compatibility with new tools and data sources

### Why Tauri for Desktop?

- **10x smaller** bundle (10MB vs 100MB+ Electron)
- **5x less memory** (30-40MB vs 200-300MB idle)
- **Sidecar support** — bundle llama.cpp/Ollama for fully offline AI
- **Mobile support** — iOS + Android via Tauri 2.x (Electron can't do this)
- **Rust backend** — fast, safe, system-level access for local model inference

### Auto-Specialize IP

The push-button auto-specialization pipeline is AnyGPT's core differentiator:

```
User Docs ──→ Opus generates questions ──→ Opus + Agentic RAG generates answers
                                                          │
                                              ┌───────────┴───────────┐
                                              │   SFT Dataset         │
                                              │   DPO Preference Pairs│
                                              └───────────┬───────────┘
                                                          │
                                              QLoRA Fine-tune SLM (Phi-4/Qwen/Llama)
                                                          │
                                              DPO Alignment
                                                          │
                                              Eval Loop (compare vs Opus)
                                                          │
                                              ┌───────────┴───────────┐
                                              │  Pass? → Deploy       │
                                              │  Fail? → More data    │
                                              │          → Retrain    │
                                              └───────────────────────┘
```

No competitor offers this as a push-button feature. Most white-label AI platforms (Dify, Flowise, Langflow, CustomGPT) stop at RAG — none auto-train specialized models. This is the moat.

---

## Competitive Landscape

| Platform | RAG | Custom UI | API | SLM Training | Desktop | Open Source |
|---|---|---|---|---|---|---|
| **AnyGPT** | Agentic + RAPTOR + Self-RAG | ✅ White-label | ✅ Per-tenant | ✅ Auto-specialize | ✅ Tauri | ✅ |
| Dify | Basic RAG | Limited | ✅ | ❌ | ❌ | ✅ |
| CustomGPT | Basic RAG | ✅ | ✅ | ❌ | ❌ | ❌ |
| Flowise | Basic RAG | ❌ | ✅ | ❌ | ❌ | ✅ |
| Open WebUI | Basic RAG | Limited | ❌ | ❌ | ❌ | ✅ |
| AnythingLLM | Basic RAG | Limited | ✅ | ❌ | ✅ Electron | ✅ |

---

## Stack

- **Language**: Jac (Jaseci stack)
- **Backend**: Jac walkers + byLLM plugin
- **Frontend**: Jac Client (React-based) + Mantine UI
- **Vector Store**: pgvector / Qdrant (replacing FAISS for multi-tenancy)
- **Multimodal Retrieval**: ColPali/ColQwen
- **Reranking**: ColBERT v2
- **LLM**: Claude Opus (teacher/generation), configurable per tenant
- **SLM Training**: QLoRA + DPO via HuggingFace Transformers + TRL
- **Desktop**: Tauri 2.0 + Rust + llama.cpp sidecar
- **MCP**: Streamable HTTP transport (remote) + STDIO transport (local/desktop)
- **Deployment**: Docker + Kubernetes (Helm charts)

