
AnyGPT — Open Source & Enterprise AI Platform (JacGPT Evolution) #533

@udithishanka

Vision

AnyGPT is the evolution of JacGPT — a platform where users upload their documents and get a fully customized AI assistant with a branded UI, API endpoint, and optionally an auto-specialized small language model trained on their data.

Built on the Jaseci stack. Two editions:

  • Open Source: Self-hostable, community-driven
  • Enterprise: admin dashboard with pipeline configurator, integrations, on-prem tooling, and a customizable white-label desktop app

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                     AnyGPT Platform                                 │
├──────────────┬──────────────────┬───────────────┬───────────────────┤
│  Tenant UI   │ Config Dashboard │  API Gateway  │  Desktop App      │
│  (per-user   │ (enterprise)     │  (per-tenant  │  (Tauri + Jac)    │
│   branded)   │                  │   endpoints)  │                   │
├──────────────┴──────────────────┴───────────────┴───────────────────┤
│                     Orchestration Layer (Jac Walkers)               │
│  ┌──────────┐  ┌──────────────┐  ┌────────────┐  ┌───────────────┐  │
│  │ Agentic  │  │ MCP Context  │  │ Auto-      │  │ Tenant        │  │
│  │ RAG      │  │ Engine       │  │ Specialize │  │ Manager       │  │
│  └──────────┘  └──────────────┘  └────────────┘  └───────────────┘  │
├─────────────────────────────────────────────────────────────────────┤
│                     Knowledge Layer                                 │
│  ┌──────────────┐  ┌────────────┐  ┌──────────┐  ┌───────────────┐  │
│  │ Contextual   │  │ ColPali    │  │ Hybrid   │  │ RAPTOR        │  │
│  │ Retrieval    │  │ (Multimodal│  │ Search   │  │ (Hierarchical │  │
│  │ + Self-RAG   │  │  Retrieval)│  │ + Rerank │  │  Summaries)   │  │
│  └──────────────┘  └────────────┘  └──────────┘  └───────────────┘  │
├─────────────────────────────────────────────────────────────────────┤
│                     Storage Layer                                   │
│  pgvector / Qdrant          │          Object Store (S3/local)      │
└─────────────────────────────────────────────────────────────────────┘

Roadmap

Phase 1: Open Source AnyGPT MVP

Goal: Users upload docs → get an AI assistant that answers questions from those docs with cutting-edge retrieval.

1.1 — Multi-tenant Document Ingestion

  • User document upload (PDF, Markdown, HTML, DOCX, code files)
  • Per-tenant isolated document stores and vector indices
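  • Multimodal ingestion via ColPali/ColQwen: PDF pages are embedded as page images, so no parsing/OCR pipeline is needed; a vision-language model encodes text, tables, and figures together, and pages are retrieved with late-interaction (multi-vector) scoring
  • Contextual chunking using Anthropic's Contextual Retrieval: an LLM writes a short, chunk-specific context preamble that is prepended to each chunk before embedding and BM25 indexing (Anthropic reports 35% fewer top-20 retrieval failures from contextual embeddings alone, and 67% fewer when combined with contextual BM25 and reranking, vs. naive chunking)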
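
A minimal sketch of the contextual chunking step, assuming a generic `llm(prompt) -> str` callable (any Claude client wrapper would do) and chunks that have already been split. The prompt wording, the `ContextualChunk` type, and the `contextualize` helper are illustrative, not Anthropic's reference implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative prompt: show the LLM the whole document plus one chunk and ask
# for a short situating context. Not Anthropic's official template.
CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n"
    "Here is a chunk from the document above:\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Write one or two sentences that situate this chunk within the overall "
    "document to improve search retrieval. Answer with the context only."
)


@dataclass
class ContextualChunk:
    doc_id: str
    index: int
    context: str  # LLM-generated preamble
    text: str     # original chunk text

    def indexed_text(self) -> str:
        # The preamble is prepended before embedding and BM25 indexing;
        # the original text is still what gets shown as a citation.
        return f"{self.context}\n\n{self.text}"


def contextualize(doc_id: str, document: str, chunks: List[str],
                  llm: Callable[[str], str]) -> List[ContextualChunk]:
    """Generate a context preamble for every chunk of one document."""
    result = []
    for i, chunk in enumerate(chunks):
        prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
        result.append(ContextualChunk(doc_id, i, llm(prompt).strip(), chunk))
    return result
```

Because the full document is resent for every chunk, caching the document prefix of the prompt is what keeps this affordable at ingestion time.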

1.2 — Agentic RAG Pipeline (replaces naive RAG)

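  • RAPTOR hierarchical summarization tree: cluster chunk embeddings with Gaussian mixture models (GMMs), recursively summarize each cluster, and index every level of the tree, enabling synthesis across long documents (the RAPTOR paper reports a 20% absolute accuracy gain on the QuALITY benchmark when paired with GPT-4)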
  • Hybrid retrieval: dense (contextual embeddings) + sparse (contextual BM25), followed by ColBERT v2 late-interaction reranking (the ColBERT authors report roughly two orders of magnitude faster query processing than BERT cross-encoder rerankers at competitive accuracy)
  • Self-RAG reflection loop — model decides mid-generation whether to fetch additional evidence or critique its own draft
  • Corrective RAG retrieval evaluator — confidence scoring triggers fallback to broader search when initial retrieval quality is low
  • Agentic orchestration via Jac walkers — decompose complex queries into sub-tasks, retrieve iteratively, verify claims before responding
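
The roadmap does not say how the dense and sparse result lists are merged. The sketch below assumes reciprocal rank fusion (RRF, with the commonly used constant k=60) and then reranks the fused candidates with ColBERT's MaxSim late-interaction score. Token embeddings are plain float lists here; a real deployment would take them from a ColBERTv2 checkpoint. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of chunk IDs (e.g. dense and BM25) by summing 1/(k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def maxsim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """ColBERT late interaction: for each query token embedding, take the best
    dot product over the document's token embeddings, then sum over query tokens."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )


def hybrid_search(dense: List[str], sparse: List[str],
                  query_vecs: List[Vector],
                  token_vecs: Dict[str, List[Vector]],
                  candidates: int = 50, top_k: int = 5) -> List[str]:
    """RRF over dense + sparse results, then late-interaction rerank of the head."""
    fused = reciprocal_rank_fusion([dense, sparse])[:candidates]
    reranked = sorted(fused, key=lambda cid: -maxsim(query_vecs, token_vecs[cid]))
    return reranked[:top_k]
```

RRF only looks at ranks, so BM25 scores and cosine similarities never need to be normalized onto a common scale, which is why it is a common default for hybrid search.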

1.3 — MCP Integration

  • Build AnyGPT MCP server exposing document stores as MCP Resources and search as MCP Tools
  • Support both MCP transports: Streamable HTTP for remote clients, STDIO for local/desktop clients
  • Users can connect external data sources (databases, APIs, wikis) as additional MCP servers — the agent queries across all sources
  • MCP-based tool use for actions beyond Q&A (write to databases, trigger workflows, call APIs)
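
To make the Resources/Tools split concrete, here is a stdlib-only sketch of the JSON-RPC 2.0 requests an AnyGPT MCP server might answer: `tools/list`, `tools/call`, `resources/list`, and `resources/read`. It deliberately omits the `initialize` handshake, capability negotiation, notifications, and the Streamable HTTP/STDIO transport, which an official MCP SDK would handle. The `anygpt://` URI scheme, the `search_docs` tool, and the keyword search are placeholders for the real document store and Agentic RAG pipeline.

```python
import json
from typing import Any, Dict

# Hypothetical per-tenant store: resource URI -> document text.
DOCS: Dict[str, str] = {
    "anygpt://acme/docs/handbook.md": "Employees accrue 20 vacation days per year.",
}


def search_docs(query: str, top_k: int = 3) -> str:
    """Toy keyword match standing in for the Agentic RAG pipeline."""
    words = query.lower().split()
    hits = [uri for uri, text in DOCS.items()
            if any(w in text.lower() for w in words)]
    return "\n".join(hits[:top_k]) or "no matches"


# Tool registry: MCP advertises each tool's name, description, and JSON Schema.
TOOLS: Dict[str, Dict[str, Any]] = {
    "search_docs": {
        "description": "Search this tenant's documents.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"},
                           "top_k": {"type": "integer"}},
            "required": ["query"],
        },
        "fn": search_docs,
    },
}


def _reply(req_id: Any, result: Dict[str, Any]) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


def _error(req_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})


def handle(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request for a subset of MCP methods."""
    req = json.loads(raw)
    req_id, method, params = req.get("id"), req.get("method"), req.get("params", {})
    if method == "tools/list":
        return _reply(req_id, {"tools": [
            {"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
            for n, t in TOOLS.items()]})
    if method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(req_id, -32602, "Unknown tool")
        text = tool["fn"](**params.get("arguments", {}))
        return _reply(req_id, {"content": [{"type": "text", "text": text}],
                               "isError": False})
    if method == "resources/list":
        return _reply(req_id, {"resources": [
            {"uri": u, "name": u.rsplit("/", 1)[-1], "mimeType": "text/markdown"}
            for u in DOCS]})
    if method == "resources/read":
        uri = params.get("uri")
        if uri not in DOCS:
            return _error(req_id, -32002, "Resource not found")
        return _reply(req_id, {"contents": [
            {"uri": uri, "mimeType": "text/markdown", "text": DOCS[uri]}]})
    return _error(req_id, -32601, "Method not found")
```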

1.4 — Per-Tenant UI & API

  • Auto-generated branded chat UI per tenant (Jac Client + Mantine)
  • REST API endpoint per tenant for programmatic access
  • Embeddable widget (iframe/web component) for integration into existing sites
  • SSE streaming responses with agent reasoning transparency
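
A sketch of how Server-Sent Events framing could carry both the agent's reasoning steps and the answer tokens over a single stream. The event names (`step`, `token`, `done`) and payload fields are assumptions, not a defined AnyGPT protocol.

```python
import json
from typing import Iterable, Iterator, Tuple


def sse_event(event: str, data: dict) -> str:
    """Format one SSE frame: an event line, a data line, then a blank line."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


def stream_answer(steps: Iterable[Tuple[str, str]],
                  tokens: Iterable[str]) -> Iterator[str]:
    """Emit reasoning steps first, then answer tokens, then a terminal marker."""
    for kind, detail in steps:
        yield sse_event("step", {"kind": kind, "detail": detail})
    for tok in tokens:
        yield sse_event("token", {"text": tok})
    yield sse_event("done", {})
```

Encoding each payload as JSON guarantees the `data:` line never contains a raw newline, which would otherwise split the event. A web handler would write these frames to a response served with `Content-Type: text/event-stream`.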

1.5 — Open Source Launch

  • Dockerized single-command deployment (jac start)
  • Documentation and quickstart guide
  • Community contribution guidelines

Phase 2: Enterprise Edition MVP

Goal: Dashboard, configurator, integrations, on-prem deployment — no auto-specialize yet.

2.1 — Configuration Dashboard

  • Web-based admin dashboard for managing tenants, documents, models, and integrations
  • Visual pipeline configurator — toggle RAG strategies (RAPTOR, hybrid search, Self-RAG) per tenant
  • Model selection — choose between cloud LLMs (Claude, GPT-4, etc.) or self-hosted models (Ollama, vLLM)
  • Usage analytics, query logs, retrieval quality metrics per tenant
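
One way the per-tenant toggles behind the visual configurator could be represented, assuming a typed config object that the dashboard edits and the Jac walkers read. Every field name, default, and validation rule below is hypothetical, including the placeholder model name.

```python
from dataclasses import asdict, dataclass, field
from typing import Literal, Optional

Provider = Literal["anthropic", "openai", "ollama", "vllm"]
SELF_HOSTED = ("ollama", "vllm")


@dataclass
class RetrievalToggles:
    contextual_retrieval: bool = True
    raptor: bool = True
    hybrid_search: bool = True
    self_rag: bool = False
    corrective_rag: bool = False
    top_k: int = 8


@dataclass
class ModelChoice:
    provider: Provider = "anthropic"
    model: str = "example-cloud-model"  # placeholder, not a real model ID
    base_url: Optional[str] = None      # endpoint for self-hosted servers


@dataclass
class TenantPipelineConfig:
    tenant_id: str
    retrieval: RetrievalToggles = field(default_factory=RetrievalToggles)
    model: ModelChoice = field(default_factory=ModelChoice)

    def validate(self) -> None:
        # Example rules only; real constraints would come from the pipeline.
        if self.model.provider in SELF_HOSTED and not self.model.base_url:
            raise ValueError("self-hosted providers need a base_url")
        if not 1 <= self.retrieval.top_k <= 100:
            raise ValueError("top_k must be between 1 and 100")


cfg = TenantPipelineConfig("acme")
cfg.retrieval.self_rag = True
cfg.validate()
print(asdict(cfg)["retrieval"])  # JSON-ready dict for the dashboard API
```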

2.2 — Integration Layer

  • Pre-built MCP connectors: Slack, Confluence, Notion, Google Drive, SharePoint, GitHub
  • Webhook/event system for custom integrations
  • SSO/SAML authentication
  • Role-based access control (admin, editor, viewer per tenant)
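
A minimal per-tenant RBAC check, assuming the three roles are strictly hierarchical (admin includes editor, editor includes viewer) and that role assignments are keyed by (user, tenant). The action names are illustrative; the roadmap only fixes the role names.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Role(IntEnum):
    # Ordered so that a higher role includes the permissions of lower ones.
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3


# Minimum role required per action (hypothetical action names).
REQUIRED: Dict[str, Role] = {
    "chat": Role.VIEWER,
    "view_analytics": Role.VIEWER,
    "upload_document": Role.EDITOR,
    "delete_document": Role.EDITOR,
    "edit_pipeline": Role.ADMIN,
    "manage_members": Role.ADMIN,
}

# Assignments are scoped per tenant: (user_id, tenant_id) -> Role.
Assignments = Dict[Tuple[str, str], Role]


def is_allowed(assignments: Assignments, user: str, tenant: str, action: str) -> bool:
    role = assignments.get((user, tenant))
    if role is None or action not in REQUIRED:
        return False  # deny by default: no role in this tenant, or unknown action
    return role >= REQUIRED[action]
```

Keying on (user, tenant) means an admin of one tenant has no implicit rights in another, which is the isolation property multi-tenancy needs.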

2.3 — White-Label Desktop App

  • Tauri 2.0 desktop app (~10MB bundle and 30-40MB idle memory vs 100MB+ and 200-300MB for Electron; <0.5s startup)
  • Rust backend + Jac Client frontend
  • Local-first mode — bundle Ollama/llama.cpp as Tauri sidecar for fully offline operation
  • Cross-platform: macOS, Windows, Linux (+ iOS/Android via Tauri 2.x)
  • Customizable branding (logo, colors, name) via config file

2.4 — On-Prem Tooling

  • Helm chart for Kubernetes deployment
  • Air-gapped installation support (all models and dependencies bundled)
  • Data residency controls — all data stays within customer infrastructure

Phase 3: Auto-Specialize (Priority in AnyGPT Roadmap)

Goal: Push-button auto-specialization: train a small language model on the user's docs that performs as well as Opus/GPT-4 on domain questions.

3.1 — Synthetic Dataset Generation (Step 1)

Use Opus to generate a large set of questions from the docs.

  • Document analysis walker — crawl uploaded docs, identify key concepts, entities, relationships, edge cases
  • Multi-strategy question generation using Claude Opus:
    • Factual questions (who/what/where/when)
    • Reasoning questions (why/how, multi-hop)
    • Comparison questions (contrast concepts)
    • Application questions (how to use X in scenario Y)
    • Edge case questions (what happens if...)
  • Question quality filtering — deduplicate, validate answerability, ensure coverage of all document sections
  • Target: 5,000-50,000 high-quality Q&A pairs per document corpus (scales with doc size)
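
The quality-filtering step could look roughly like this: deduplication after text normalization, an answerability check, and a coverage report listing document sections with no surviving question. The `is_answerable` callable stands in for an LLM judge (for example, asking Opus whether the source section actually answers the question); the `Question` type and helper names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Question:
    text: str
    strategy: str    # "factual", "reasoning", "comparison", "application", "edge_case"
    section_id: str  # document section the question was generated from


def normalize(text: str) -> str:
    # Lowercase, drop punctuation, collapse whitespace so trivial variants match.
    no_punct = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def filter_questions(questions: Iterable[Question],
                     sections: Iterable[str],
                     is_answerable: Callable[[Question], bool]
                     ) -> Tuple[List[Question], List[str]]:
    """Deduplicate, keep answerable questions, and report uncovered sections."""
    seen = set()
    kept: List[Question] = []
    for q in questions:
        key = normalize(q.text)
        if key in seen or not is_answerable(q):
            continue
        seen.add(key)
        kept.append(q)
    covered = {q.section_id for q in kept}
    uncovered = sorted(set(sections) - covered)
    return kept, uncovered
```

Uncovered sections can be fed back into another round of question generation. Normalized exact matching only catches near-identical phrasings; catching paraphrases would need embedding similarity on top.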

3.2 — Agent-Generated Answer Dataset (Step 2)

Use Opus to run the agent on those questions and build the answer dataset.

  • Run each question through the full Agentic RAG pipeline with Opus as the backbone
  • Capture complete reasoning traces (chain-of-thought, retrieval steps, self-correction)
  • Rejection sampling: generate multiple answers per question (k=8), score each against ground truth, keep only the best reasoning paths
  • Generate preference pairs for DPO: (good answer, bad answer) tuples per question
  • Human-in-the-loop review interface for critical domain accuracy (enterprise tier)
  • Output: SFT dataset (question → best answer with CoT) + DPO dataset (question → preferred vs rejected)
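
A sketch of the rejection-sampling and preference-pair step, assuming a `generate(question)` callable that runs the Opus-backed agent once and a `score(question, answer)` callable returning a value in [0, 1] (for example, a judge comparing against the reference answer from step 1). The `min_score` and `min_margin` thresholds are hypothetical. Records use the prompt/completion and prompt/chosen/rejected layouts that TRL's SFT and DPO trainers accept.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Dict[str, str]]


def build_datasets(questions: List[str],
                   generate: Callable[[str], str],
                   score: Callable[[str, str], float],
                   k: int = 8,
                   min_score: float = 0.7,
                   min_margin: float = 0.2) -> Tuple[Rows, Rows]:
    """Rejection sampling for SFT plus best-vs-worst preference pairs for DPO."""
    sft: Rows = []
    dpo: Rows = []
    for q in questions:
        # Sample k full agent runs and score each one exactly once.
        candidates = [generate(q) for _ in range(k)]
        scored = sorted(((score(q, a), a) for a in candidates),
                        key=lambda t: t[0], reverse=True)
        (best_s, best), (worst_s, worst) = scored[0], scored[-1]
        if best_s < min_score:
            continue  # nothing good enough: drop, or queue for human review
        sft.append({"prompt": q, "completion": best})
        if best_s - worst_s >= min_margin:
            dpo.append({"prompt": q, "chosen": best, "rejected": worst})
    return sft, dpo
```

Pairing the best sample with the worst is one simple choice; pairing the best with every sample below the margin yields more DPO pairs per question from the same k runs.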

3.3 — SLM Training Pipeline (Step 3)

Use the dataset to train an SLM that matches Opus on the docs.

  • Base model selection (configurable per tenant):
    • Phi-4-Mini (3.8B) — default, reasoning comparable to 7-9B models
    • Qwen 2.5 (7B) — strong multilingual, benefits most from distillation
    • Llama 3.2 (3B) — optimized for edge/mobile deployment
    • Mistral (7B) — strong general-purpose
  • Stage 1 — SFT with QLoRA: Fine-tune on the synthetic Q&A dataset using 4-bit quantization (runs on a single 12GB GPU). LoRA rank 16-64, targeting attention + MLP layers
  • Stage 2 — DPO alignment: Direct Preference Optimization on preference pairs (40-75% cheaper than RLHF, no reward model needed). Trains the model to prefer domain-accurate, well-reasoned responses
  • Stage 3 — Evaluation loop:
    • Hold out 10% of synthetic data as eval set
    • Run automated benchmarks: answer accuracy, retrieval faithfulness, hallucination rate
    • Compare SLM outputs against Opus outputs on same questions
    • If below threshold → generate more training data on failure cases → retrain → re-evaluate
  • Stage 4 — Model export: GGUF quantization (Q4_K_M) for local deployment, ONNX for edge, vLLM-compatible for cloud serving
  • One-click training trigger from dashboard — user uploads docs → clicks "Auto-Specialize" → gets a trained model
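
The Stage 3 evaluation loop can be sketched as plain control flow, assuming callables for training (`train`), the teacher model (`teacher`, i.e. Opus), an answer-equivalence judge (`judge`), and targeted data generation (`more_data`). The 0.9 pass threshold and the three-round cap are placeholders; the roadmap specifies neither.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
Model = Callable[[str], str]


def holdout_split(records: List[Record], frac: float = 0.1,
                  seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Shuffle deterministically and hold out `frac` of the records for eval."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * frac))
    return shuffled[n_eval:], shuffled[:n_eval]


def evaluate(eval_set: List[Record], slm: Model, teacher: Model,
             judge: Callable[[str, str, str], bool],
             threshold: float) -> Tuple[bool, float, List[str]]:
    """Agreement rate of SLM vs teacher answers on held-out questions."""
    failures = [r["prompt"] for r in eval_set
                if not judge(r["prompt"], slm(r["prompt"]), teacher(r["prompt"]))]
    rate = 1 - len(failures) / len(eval_set)
    return rate >= threshold, rate, failures


def specialize_loop(records: List[Record],
                    train: Callable[[List[Record]], Model],
                    teacher: Model,
                    judge: Callable[[str, str, str], bool],
                    more_data: Callable[[List[str]], List[Record]],
                    threshold: float = 0.9,
                    max_rounds: int = 3) -> Tuple[Optional[Model], float, int]:
    """Train, compare against the teacher, add targeted data, retrain."""
    train_set, eval_set = holdout_split(records)
    rate = 0.0
    for round_no in range(1, max_rounds + 1):
        slm = train(train_set)
        passed, rate, failures = evaluate(eval_set, slm, teacher, judge, threshold)
        if passed:
            return slm, rate, round_no
        train_set = train_set + more_data(failures)  # new examples, not eval copies
    return None, rate, max_rounds
```

`more_data` should write new questions about the concepts behind each failure rather than copy held-out questions into training; otherwise the eval score stops measuring generalization.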

3.4 — Specialized Model Serving

  • Serve trained SLM alongside RAG pipeline — model handles domain questions natively, RAG provides grounding and citations
  • Automatic routing: use trained model for in-domain queries, fall back to large model for out-of-domain
  • Model versioning — retrain when documents are updated, keep previous versions
  • A/B testing — compare specialized SLM vs large model on live queries to validate quality
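
Neither the routing signal nor the A/B assignment method is specified above. This sketch assumes routing on the top retrieval score (strong in-domain evidence goes to the SLM) and deterministic hash bucketing for A/B tests, so a given user always lands in the same arm. The threshold, arm names, and function names are hypothetical, and a sensible threshold depends on the retriever's score scale.

```python
import hashlib
from typing import Callable, List, Tuple

Hits = List[Tuple[str, float]]  # (chunk_id, retrieval score)


def route(query: str, retrieve: Callable[[str], Hits],
          in_domain_threshold: float = 0.5) -> str:
    """Send the query to the specialized SLM when retrieval finds strong
    in-domain evidence; otherwise fall back to the large model."""
    top = max((s for _, s in retrieve(query)), default=0.0)
    return "slm" if top >= in_domain_threshold else "large"


def ab_bucket(user_id: str, experiment: str, slm_share: float = 0.5) -> str:
    """Deterministic A/B arm assignment from a hash of experiment + user."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    # Keep 53 bits so the division yields an exact float in [0, 1).
    x = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "slm" if x < slm_share else "large"
```

Including the experiment name in the hash re-randomizes assignments between experiments while keeping each user stable within one.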

Technical Decisions

Why Not Old-School RAG?

| Traditional RAG | AnyGPT Approach |
|---|---|
| Naive chunking loses context | Contextual Retrieval prepends document-level context to each chunk |
| Single retrieval pass | Agentic RAG with Self-RAG reflection + Corrective RAG fallback |
| Text-only PDF parsing (lossy) | ColPali treats PDFs as images: no parsing pipeline, handles tables/figures natively |
| Single similarity metric | Hybrid search (dense + sparse + ColBERT reranking) |
| No document hierarchy awareness | RAPTOR builds recursive summary trees for synthesis across long docs |

Why Not GraphRAG?

GraphRAG (knowledge graph construction + community detection) was considered but dropped due to:

  • High cost — requires LLM calls to extract entities/relationships from every document chunk at ingestion time, making it prohibitively expensive for large document corpora
  • High latency — graph construction is slow (minutes to hours per corpus), and graph traversal queries add significant overhead vs direct vector search
  • Diminishing returns — the combination of Contextual Retrieval + RAPTOR + Self-RAG achieves strong multi-hop reasoning without the graph infrastructure overhead

Why MCP?

MCP (Model Context Protocol) is now an industry standard: introduced by Anthropic, adopted by OpenAI and Google, and now stewarded under the Linux Foundation. Benefits:

  • Standardized context layer — any MCP-compatible client can connect to AnyGPT's document stores
  • Extensible — users plug in their own data sources (Slack, Confluence, databases) without custom code
  • Future-proof — as the ecosystem grows, AnyGPT automatically gains compatibility with new tools and data sources

Why Tauri for Desktop?

  • 10x smaller bundle (10MB vs 100MB+ Electron)
  • 5x less memory (30-40MB vs 200-300MB idle)
  • Sidecar support — bundle llama.cpp/Ollama for fully offline AI
  • Mobile support — iOS + Android via Tauri 2.x (Electron can't do this)
  • Rust backend — fast, safe, system-level access for local model inference

Auto-Specialize IP

The push-button auto-specialization pipeline is AnyGPT's core differentiator:

User Docs ──→ Opus generates questions ──→ Opus + Agentic RAG generates answers
                                                          │
                                              ┌───────────┴───────────┐
                                              │   SFT Dataset         │
                                              │   DPO Preference Pairs│
                                              └───────────┬───────────┘
                                                          │
                                              QLoRA Fine-tune SLM (Phi-4/Qwen/Llama)
                                                          │
                                              DPO Alignment
                                                          │
                                              Eval Loop (compare vs Opus)
                                                          │
                                              ┌───────────┴───────────┐
                                              │  Pass? → Deploy       │
                                              │  Fail? → More data    │
                                              │          → Retrain    │
                                              └───────────────────────┘

No competitor offers this as a push-button feature. Most white-label AI platforms (Dify, Flowise, Langflow, CustomGPT) stop at RAG — none auto-train specialized models. This is the moat.


Competitive Landscape

Platform RAG Custom UI API SLM Training Desktop Open Source
AnyGPT Agentic + RAPTOR + Self-RAG ✅ White-label ✅ Per-tenant ✅ Auto-specialize ✅ Tauri
Dify Basic RAG Limited
CustomGPT Basic RAG
Flowise Basic RAG
Open WebUI Basic RAG Limited
AnythingLLM Basic RAG Limited ✅ Electron

Stack

  • Language: Jac (Jaseci stack)
  • Backend: Jac walkers + byLLM plugin
  • Frontend: Jac Client (React-based) + Mantine UI
  • Vector Store: pgvector / Qdrant (replacing FAISS for multi-tenancy)
  • Multimodal Retrieval: ColPali/ColQwen
  • Reranking: ColBERT v2
  • LLM: Claude Opus (teacher/generation), configurable per tenant
  • SLM Training: QLoRA + DPO via HuggingFace Transformers + TRL
  • Desktop: Tauri 2.0 + Rust + llama.cpp sidecar
  • MCP: Streamable HTTP transport (remote) + STDIO transport (local/desktop)
  • Deployment: Docker + Kubernetes (Helm charts)
