AnyGPT is a document-based AI assistant platform where users upload files and get accurate, cited answers — built on Jaseci (Jac + byLLM). The existing MVP already has a working RAG pipeline, FAISS vector search, CrossEncoder reranking, session management, and a full Jac-Client frontend.
This issue defines how we modularize the platform into 4 focused repos — one per team — designed to interconnect and allow standalone integration into any product.
Standalone usage of anygpt-rag:
from anygpt_rag import RagEngine

engine = RagEngine(backend="qdrant", tenant_id="acme")
engine.ingest_file("docs/manual.pdf")
results = engine.search("How do I reset my password?")
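The `backend="qdrant"` and `tenant_id="acme"` arguments above imply a backend-agnostic store with per-tenant isolation. The following is a minimal sketch of that idea, not the anygpt-rag API: the `VectorBackend` protocol, the `InMemoryBackend` stand-in, and the hand-written two-dimensional vectors are all assumptions made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Protocol


class VectorBackend(Protocol):
    """Hypothetical interface a FAISS, Qdrant, or pgvector adapter could satisfy."""

    def add(self, tenant_id: str, doc_id: str, vector: list[float]) -> None: ...

    def search(self, tenant_id: str, query: list[float], k: int) -> list[tuple[str, float]]: ...


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryBackend:
    """Toy stand-in for a real vector store: one dict of vectors per tenant."""

    _data: dict[str, dict[str, list[float]]] = field(default_factory=dict)

    def add(self, tenant_id: str, doc_id: str, vector: list[float]) -> None:
        self._data.setdefault(tenant_id, {})[doc_id] = vector

    def search(self, tenant_id: str, query: list[float], k: int) -> list[tuple[str, float]]:
        # Only this tenant's vectors are scanned, so results never cross tenants.
        tenant_vectors = self._data.get(tenant_id, {})
        scored = [(doc_id, cosine(query, vec)) for doc_id, vec in tenant_vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


backend: VectorBackend = InMemoryBackend()
backend.add("acme", "manual.pdf#p3", [1.0, 0.0])
backend.add("acme", "faq.md#reset", [0.6, 0.8])
backend.add("globex", "handbook.pdf#p1", [1.0, 0.0])
top_hits = backend.search("acme", [1.0, 0.0], k=2)
```

Keeping the tenant key in every call, rather than in global state, makes it harder to query the wrong tenant's index by accident. A real adapter would likely map the tenant to a separate collection or index.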
Repo 3: anygpt-mcp (MCP Server + Connectors 🔌)
Team: Integrations
What it owns: AnyGPT as an MCP server, connectors to external MCP servers, tool actions
Standalone: Run as an MCP server over Streamable HTTP or STDIO; works with Claude, Cursor, or any MCP client
Standalone usage — expose your AnyGPT knowledge base as MCP:
# Run as standalone MCP server (Streamable HTTP)
anygpt-mcp serve --port 3000 --docs ./my-docs/
# Or via STDIO (for Claude Desktop, Cursor)
anygpt-mcp stdio --docs ./my-docs/
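MCP clients invoke server tools with JSON-RPC 2.0 `tools/call` requests, whichever transport carries them. The sketch below shows that message shape with a toy handler. The tool name `search_docs`, its `query` argument, and the in-memory document set are assumptions for illustration and do not describe the anygpt-mcp interface.

```python
import json

# Hypothetical in-memory "knowledge base" standing in for the indexed docs.
DOCS = {
    "reset.md": "To reset your password, open Settings and choose Reset.",
    "billing.md": "Invoices are emailed on the first of each month.",
}


def search_docs(query: str) -> list[str]:
    """Return names of docs containing any query word (naive keyword match)."""
    words = [w.lower() for w in query.split()]
    return [name for name, text in DOCS.items() if any(w in text.lower() for w in words)]


def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request and return the serialized response."""
    request = json.loads(raw)
    params = request.get("params", {})
    if request.get("method") == "tools/call" and params.get("name") == "search_docs":
        hits = search_docs(params["arguments"]["query"])
        result = {"content": [{"type": "text", "text": ", ".join(hits)}], "isError": False}
        response = {"jsonrpc": "2.0", "id": request["id"], "result": result}
    else:
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        error = {"code": -32601, "message": "Method not found"}
        response = {"jsonrpc": "2.0", "id": request.get("id"), "error": error}
    return json.dumps(response)


call = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "search_docs", "arguments": {"query": "password"}},
})
reply = json.loads(handle_request(call))
```

A production server would normally use an MCP SDK for the handshake, tool listing, and transport instead of hand-rolled dispatch.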
Repo 4: anygpt-slm-trainer (SLM Auto-Specialization)
Team: AI/ML (GPU workloads)
What it owns: Auto-specialization pipeline (synthetic data generation, SFT+DPO training, eval, model export)
Why separate: Different infrastructure (GPU), different release cycle, different team skills (ML engineering vs app dev)
graph LR
subgraph PIPELINE["SLM Training Pipeline"]
T1["Step 1\nSynthetic Data Gen\nOpus crawls docs → 5K–50K Q&A pairs\n(multi-strategy)"]
T2["Step 2\nAgent Answer Gen\nOpus + Agentic RAG → answers\nRejection sampling k=8 + DPO pairs"]
T3["Step 3\nSFT + DPO Training\nQLoRA 4-bit → SFT on Q&A\n→ DPO alignment on preference pairs"]
T4["Step 4\nEval Loop\nCompare SLM vs Opus\nPass → Deploy | Fail → Retrain"]
T1 --> T2 --> T3 --> T4
end
subgraph BASE["Base Model Options"]
M1[Phi-4-Mini 3.8B]
M2[Qwen 2.5 7B]
M3[Llama 3.2 3B]
M4[Mistral 7B]
end
subgraph EXPORT["Model Export"]
E1[GGUF Q4_K_M → Local/Desktop]
E2[ONNX → Edge]
E3[vLLM → Cloud Serving]
end
BASE --> PIPELINE
T4 --> EXPORT
EXPORT -->|push to| REG[Model Registry\nin any-gpt]
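Step 2 of the pipeline pairs rejection sampling (k=8) with DPO preference pairs. One common way to combine them is to score every sampled answer, then keep the best as "chosen" and the worst as "rejected". The sketch below assumes that approach. The `overlap_score` scorer and the `min_margin` cutoff are hypothetical placeholders; the issue does not say how candidates are ranked.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_pair(
    prompt: str,
    candidates: list[str],
    score: Callable[[str, str], float],
    min_margin: float = 0.1,
) -> PreferencePair | None:
    """Pick best/worst of k sampled answers; skip prompts with no clear winner."""
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda answer: score(prompt, answer))
    worst, best = ranked[0], ranked[-1]
    if score(prompt, best) - score(prompt, worst) < min_margin:
        return None  # candidates are too similar to teach a preference
    return PreferencePair(prompt=prompt, chosen=best, rejected=worst)


def overlap_score(prompt: str, answer: str) -> float:
    """Toy scorer (assumption): fraction of prompt words that appear in the answer."""
    prompt_words = set(prompt.lower().split())
    answer_words = set(answer.lower().split())
    return len(prompt_words & answer_words) / len(prompt_words) if prompt_words else 0.0


samples = [
    "click reset on the login page",
    "to reset a password open settings",
    "i do not know",
]
pair = build_pair("reset password", samples, overlap_score)
```

In practice the scorer would more plausibly be a judge model or a comparison against the reference answer. The prompt/chosen/rejected layout is the record shape preference-tuning libraries typically expect.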
Trigger: Automatically invoked by any-gpt's Auto-Specialize Trigger walker when:
New documents are ingested beyond a threshold
SLM accuracy drops below a threshold vs LLM
Tenant explicitly requests a re-specialization
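The three trigger conditions above can be sketched as a single decision function. The threshold values, the `TenantStats` fields, and the reading of "accuracy drop" as a gap between LLM and SLM accuracy are assumptions; the issue does not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TenantStats:
    new_docs_since_training: int
    slm_accuracy: float  # accuracy of the specialized SLM on an eval set
    llm_accuracy: float  # accuracy of the reference LLM on the same set
    manual_request: bool = False


def should_respecialize(
    stats: TenantStats,
    doc_threshold: int = 100,        # assumed value, not from the issue
    max_accuracy_gap: float = 0.05,  # assumed value, not from the issue
) -> list[str]:
    """Return the reasons a retrain should fire; an empty list means no trigger."""
    reasons = []
    if stats.new_docs_since_training > doc_threshold:
        reasons.append("new_documents")
    if stats.llm_accuracy - stats.slm_accuracy > max_accuracy_gap:
        reasons.append("accuracy_drop")
    if stats.manual_request:
        reasons.append("tenant_request")
    return reasons


stable = should_respecialize(TenantStats(10, slm_accuracy=0.91, llm_accuracy=0.93))
drifted = should_respecialize(TenantStats(250, slm_accuracy=0.80, llm_accuracy=0.93))
requested = should_respecialize(TenantStats(0, 0.95, 0.95, manual_request=True))
```

Returning the reasons rather than a bare boolean lets the caller log why a GPU job was queued.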
How the Repos Connect
sequenceDiagram
participant U as User
participant CORE as any-gpt
participant RAG as anygpt-rag
participant MCP as anygpt-mcp
participant TRAINER as anygpt-slm-trainer
U->>CORE: Upload docs + chat
CORE->>RAG: ingest_file() / search()
RAG-->>CORE: results + citations
CORE->>MCP: fetch external context\n(Slack, Confluence, Notion)
MCP-->>CORE: additional context chunks
CORE-->>U: streamed answer (SSE)
Note over CORE,TRAINER: Background — triggered async
CORE->>TRAINER: auto_specialize(tenant_id, docs)
TRAINER->>TRAINER: data gen → train → eval
TRAINER-->>CORE: push GGUF model to registry
CORE->>CORE: route in-domain queries to SLM
Integration with External Products
Modules marked 🔌 are designed to be dropped into any product without AnyGPT:
| Module | How to use standalone |
| --- | --- |
| anygpt-rag | pip install anygpt-rag (RAG pipeline in any Python app) |
| anygpt-mcp | Run as MCP server (HTTP/STDIO); works with Claude, Cursor, any MCP client |
Multi-Tenant Isolation (Phase 2)
graph LR
subgraph TA["Tenant A — Branded"]
TA1[Custom UI · Logo · Colors]
TA2[API /api/tenant-a/*]
TA3[Isolated Vector Index]
TA4[SLM v2 · Phi-4-Mini]
TA5["RAPTOR=ON · Self-RAG=ON · Model=Claude"]
end
subgraph TB["Tenant B — White-Label"]
TB1[Custom UI · Full Rebrand]
TB2[API /api/tenant-b/*]
TB3[Isolated Vector Index]
TB4[SLM v1 · Qwen 2.5 7B]
TB5["RAPTOR=OFF · Self-RAG=ON · Model=GPT-4"]
end
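The per-tenant settings in the diagram (API prefix, isolated index, SLM, feature flags, cloud model) map naturally onto a small configuration record. This is a sketch under assumptions: the `TenantConfig` class, the `idx-` index naming scheme, and the `resolve_tenant` helper are hypothetical, while the values mirror the two tenants shown above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    slm: str
    llm: str
    raptor: bool
    self_rag: bool

    @property
    def api_prefix(self) -> str:
        return f"/api/{self.tenant_id}"

    @property
    def index_name(self) -> str:
        # One vector index per tenant; the naming scheme is an assumption.
        return f"idx-{self.tenant_id}"


TENANTS = {
    cfg.tenant_id: cfg
    for cfg in (
        TenantConfig("tenant-a", slm="Phi-4-Mini", llm="Claude", raptor=True, self_rag=True),
        TenantConfig("tenant-b", slm="Qwen 2.5 7B", llm="GPT-4", raptor=False, self_rag=True),
    )
}


def resolve_tenant(path: str) -> TenantConfig | None:
    """Map a request path such as /api/tenant-a/chat to its tenant config."""
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "api":
        return TENANTS.get(parts[1])
    return None


tenant_a = resolve_tenant("/api/tenant-a/chat")
tenant_b = resolve_tenant("/api/tenant-b/chat")
```

Making the record frozen prevents one request from mutating settings that another request for the same tenant is reading.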
Query Flow (Phase 3)
sequenceDiagram
participant U as User
participant GW as Auth + Tenant Router
participant DC as Domain Classifier
participant SLM as Specialized SLM
participant LLM as Cloud LLM
participant RAG as RAG Retrieval
participant SR as Self-RAG Reflection
participant SSE as SSE Stream
U->>GW: Query
GW->>DC: Classify domain
alt In-Domain (trained docs)
DC->>SLM: Route to SLM
else Out-of-Domain
DC->>LLM: Route to LLM
end
SLM->>RAG: Retrieve context
RAG->>SR: Self-RAG reflection check
SR->>SSE: Validated response
SSE->>U: Stream chunks
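The routing steps in the sequence above can be sketched as classify, pick a model, retrieve, then reflect. Everything here is a toy stand-in: the keyword-based classifier, the word-overlap retrieval, and the reflection check that only tests for non-empty context are assumptions. The diagram shows retrieval only after the SLM branch; the sketch applies it to both paths, which is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Routed:
    model: str
    context: list[str]
    validated: bool


def _words(text: str) -> set[str]:
    """Lowercase words with trailing punctuation stripped."""
    return {w.strip("?.,!") for w in text.lower().split()}


def classify_in_domain(query: str, domain_vocab: set[str], min_hits: int = 1) -> bool:
    """Toy domain classifier: count query words seen in the trained docs."""
    return len(_words(query) & domain_vocab) >= min_hits


def retrieve(query: str, chunks: list[str]) -> list[str]:
    """Naive retrieval: keep chunks sharing at least one word with the query."""
    query_words = _words(query)
    return [chunk for chunk in chunks if query_words & _words(chunk)]


def reflect(context: list[str]) -> bool:
    """Stand-in for the Self-RAG check: pass only if some context was retrieved."""
    return bool(context)


def route(query: str, domain_vocab: set[str], chunks: list[str]) -> Routed:
    model = "slm" if classify_in_domain(query, domain_vocab) else "llm"
    context = retrieve(query, chunks)
    return Routed(model=model, context=context, validated=reflect(context))


VOCAB = {"invoice", "refund", "password"}
CHUNKS = ["reset your password in settings", "refund requests take five days"]
in_domain = route("How do I reset my password?", VOCAB, CHUNKS)
out_of_domain = route("What is the capital of France?", VOCAB, CHUNKS)
```

A real classifier would more likely compare query embeddings against the tenant's trained corpus than match keywords.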
Development Phases
gantt
title AnyGPT — Development Roadmap
dateFormat YYYY-MM-DD
section Phase 1 MVP ✅
any-gpt core (done) :done, p1, 2025-01-01, 2025-04-01
section Phase 2 Multi-Tenant
anygpt-rag (extract + Qdrant) :p2a, 2025-04-01, 6w
any-gpt auth + tenant walkers :p2b, 2025-04-01, 6w
any-gpt admin dashboard :p2c, after p2b, 4w
section Phase 3 Advanced AI
anygpt-mcp server :p3a, after p2a, 5w
any-gpt query router + Self-RAG :p3b, after p2b, 5w
section Phase 4 SLM + Enterprise
anygpt-slm-trainer pipeline :p4a, after p2a, 8w
any-gpt Tauri desktop :p4b, after p3b, 6w
any-gpt Helm + K8s deploy :p4c, after p3b, 4w
Checklist
Phase 2 — Extract anygpt-rag from Any_GPT/services/ingestion + Any_GPT/services/retrieval
What Already Exists (MVP)
Stack: Jac · byLLM · Jac Client · Mantine · FAISS · OpenAI Embeddings · CrossEncoder · HF TRL (planned) · Tauri 2.0 (planned)
Current Architecture
graph TD
subgraph CLIENT["Frontend — Jac Client + Mantine"]
P1[LoginPage / RegisterPage]
P2[ChatPage]
C1[Sidebar]
C2[ChatInput + ChatMessage]
C3[FileUpload + DocumentList]
H1[useAuth / useChat hooks]
S1[anygptService]
end
subgraph BACKEND["Backend — Jac Walkers"]
W1[interact walker\nchat with ReAct + streaming]
W2[upload_document walker]
W3[list/delete/reindex walkers]
W4[get_user_sessions walker]
N1[Session node]
N2[ChatNode → byLLM ReAct]
end
subgraph RAG["RAG Engine"]
R1[RagEngine\nper-user, shared CrossEncoder]
R2[ingestion/\nloaders · chunker · file_store]
R3[retrieval/\nvector_store · reranker · retriever]
R4[FAISS Index\nOpenAI Embeddings]
end
subgraph STORAGE["Storage — Local FS"]
S2[anygpt-data/tenants/\<tenant>/\nusers/\<user>/uploads/]
S3[anygpt-data/tenants/\<tenant>/\nusers/\<user>/faiss_index/]
end
CLIENT --> BACKEND
BACKEND --> RAG
RAG --> STORAGE
Target: 4-Repo Modular Structure
The split is driven by team ownership, standalone reusability, and deployment independence — not by number of files.
graph TD
CORE["📦 any-gpt\nCore Platform\n(this repo, extended)"]
RAG["📦 anygpt-rag\nRAG as standalone package\n🔌 pip installable"]
MCP["📦 anygpt-mcp\nMCP Server + Connectors\n🔌 works with any MCP client"]
TRAINER["📦 anygpt-slm-trainer\nSLM Auto-Specialization\nGPU pipeline, different lifecycle"]
CORE -->|uses| RAG
CORE -->|uses| MCP
TRAINER -->|exports models to| CORE
RAG -.->|standalone in any product| EXTERNAL[3rd Party Products]
MCP -.->|standalone MCP server| EXTERNAL
Repo 1: any-gpt (Core Platform: this repo, extended)
This is the main product repo. All platform-level features go here. The RAG engine is consumed as a package from anygpt-rag.
graph TD
subgraph ANYG["any-gpt — Phase Roadmap"]
PH1["✅ Phase 1 MVP\nRAG chat · doc upload · sessions\nFAISS · single tenant"]
PH2["🔜 Phase 2 Multi-Tenant\nTenant isolation · auth (SSO/SAML)\nRBAC · per-tenant RAG config\nAdmin Dashboard"]
PH3["🔜 Phase 3 Advanced AI\nQuery Router (SLM vs LLM)\nAgentic RAG Orchestrator\nMCP Context Engine\nSelf-RAG reflection"]
PH4["🔜 Phase 4 Enterprise\nTauri desktop (offline llama.cpp)\nHelm / K8s deployment\nAir-gapped support\nSSE streaming"]
PH1 --> PH2 --> PH3 --> PH4
end
Phase 2 additions:
services/auth/: SSO/SAML, RBAC, JWT
services/tenant/: Tenant Manager walker (isolation · config · model selection)
pages/AdminPage.cl.jac: Enterprise config dashboard
anygpt-rag package (Qdrant/pgvector backend)
Phase 3 additions:
walkers/query_router.jac: Domain classifier (in-domain → SLM, out-domain → LLM)
walkers/agentic_rag.jac: Sub-task decomposition orchestrator
walkers/mcp_context.jac: MCP Context Engine walker
anygpt-mcp as a sidecar
Phase 4 additions:
desktop/: Tauri 2.0 app with local llama.cpp inference
deploy/: Docker Compose + Helm charts
Repo 2: anygpt-rag (Standalone RAG Package 🔌)
The ingestion + retrieval modules are already well-isolated in the current codebase. This repo extracts and evolves them independently.
graph LR
subgraph RAG_PKG["anygpt-rag package"]
I1[ingestion/\nloaders · chunker · file_store]
I2[retrieval/\nvector_store · reranker · retriever]
I3[RagEngine\norchestrator]
subgraph BACKENDS["Pluggable Backends"]
B1[FAISS\ncurrent Phase 1]
B2[Qdrant\nPhase 2]
B3[pgvector\nPhase 2]
end
subgraph STRATEGIES["Retrieval Strategies"]
S1[BM25 + Dense\ncurrent]
S2[RAPTOR\nPhase 2]
S3[Self-RAG\nPhase 2]
S4[Contextual Chunking\nPhase 2]
end
end
I1 --> I3
I2 --> I3
I3 --> BACKENDS
I3 --> STRATEGIES
Phase roadmap:
Standalone usage:
Repo 3: anygpt-mcp (MCP Server + Connectors 🔌)
graph TD
subgraph MCP_SERVER["AnyGPT MCP Server"]
R1[Resources\nDocs as MCP resources]
T1[Tools\nSearch-as-tool · Q&A-as-tool]
end
subgraph CONNECTORS["External MCP Connectors"]
E1[Slack]
E2[Confluence · Notion]
E3[GitHub · GitLab]
E4[Google Drive]
E5[Jira · Linear]
end
subgraph ACTIONS["MCP Tool Actions"]
A1[DB Writes]
A2[Webhook Triggers]
A3[Workflow Automation]
end
MCP_CLIENT[Claude · Cursor · Any MCP Client] --> MCP_SERVER
MCP_SERVER --> CONNECTORS
MCP_SERVER --> ACTIONS
Checklist
Phase 2 — Extract anygpt-rag from Any_GPT/services/ingestion + Any_GPT/services/retrieval
any-gpt … AdminPage.cl.jac)
anygpt-mcp repo: MCP server + Slack/Confluence/Notion connectors
any-gpt …
anygpt-slm-trainer repo: data gen → SFT+DPO → eval → export