AnyGPT Platform — Modular Multi-Repo Development Plan #544

@udithishanka

Overview

AnyGPT is a document-based AI assistant platform where users upload files and get accurate, cited answers — built on Jaseci (Jac + byLLM). The existing MVP already has a working RAG pipeline, FAISS vector search, CrossEncoder reranking, session management, and a full Jac-Client frontend.

This issue defines how we modularize the platform into four focused repos, each owned by a single team. The repos are designed to interconnect, and two of them (anygpt-rag and anygpt-mcp) can also be integrated standalone into other products.


What Already Exists (MVP)

Any_GPT/
├── main.jac                         ← Entry point
├── services/
│   ├── server.jac                   ← Jac walkers (chat, upload, session, docs)
│   ├── server.impl.jac
│   ├── rag_engine.jac               ← RAG orchestrator (FAISS + CrossEncoder)
│   ├── ingestion/                   ← loaders, chunker, file_store
│   ├── retrieval/                   ← vector_store, reranker, retriever
│   └── models/                      ← tenant, document, session nodes
├── components/                      ← Jac-Client UI components
├── pages/                           ← ChatPage, LoginPage, RegisterPage
├── hooks/                           ← useAuth, useChat
├── services/anygptService.cl.jac    ← Frontend API service
└── config/anygpt.json               ← Platform config

Stack: Jac · byLLM · Jac Client · Mantine · FAISS · OpenAI Embeddings · CrossEncoder · HF TRL (planned) · Tauri 2.0 (planned)


Current Architecture

graph TD
    subgraph CLIENT["Frontend  —  Jac Client + Mantine"]
        P1[LoginPage / RegisterPage]
        P2[ChatPage]
        C1[Sidebar]
        C2[ChatInput + ChatMessage]
        C3[FileUpload + DocumentList]
        H1[useAuth / useChat hooks]
        S1[anygptService]
    end

    subgraph BACKEND["Backend  —  Jac Walkers"]
        W1[interact walker\nchat with ReAct + streaming]
        W2[upload_document walker]
        W3[list/delete/reindex walkers]
        W4[get_user_sessions walker]
        N1[Session node]
        N2[ChatNode → byLLM ReAct]
    end

    subgraph RAG["RAG Engine"]
        R1[RagEngine\nper-user, shared CrossEncoder]
        R2[ingestion/\nloaders · chunker · file_store]
        R3[retrieval/\nvector_store · reranker · retriever]
        R4[FAISS Index\nOpenAI Embeddings]
    end

    subgraph STORAGE["Storage  —  Local FS"]
        S2[anygpt-data/tenants/\<tenant>/\nusers/\<user>/uploads/]
        S3[anygpt-data/tenants/\<tenant>/\nusers/\<user>/faiss_index/]
    end

    CLIENT --> BACKEND
    BACKEND --> RAG
    RAG --> STORAGE
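
The per-tenant, per-user storage layout in the diagram can be sketched as a small path helper. This is a minimal illustration, not the MVP's actual file_store code: the function names `user_dirs` and `_safe_segment` are hypothetical, and the ID validation is an added assumption to keep one tenant's IDs from escaping into another tenant's directory.

```python
from pathlib import Path

DATA_ROOT = Path("anygpt-data")  # root directory name taken from the diagram


def _safe_segment(value: str) -> str:
    """Reject IDs that could escape the tenant/user directory (assumed policy)."""
    if not value or value in {".", ".."} or "/" in value or "\\" in value:
        raise ValueError(f"unsafe path segment: {value!r}")
    return value


def user_dirs(tenant_id: str, user_id: str, root: Path = DATA_ROOT) -> dict[str, Path]:
    """Return the uploads and FAISS index directories for one user."""
    base = root / "tenants" / _safe_segment(tenant_id) / "users" / _safe_segment(user_id)
    return {"uploads": base / "uploads", "faiss_index": base / "faiss_index"}
```

Calling `user_dirs("acme", "alice")` yields paths under `anygpt-data/tenants/acme/users/alice/`, one for uploads and one for the index.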

Target: 4-Repo Modular Structure

The split is driven by team ownership, standalone reusability, and deployment independence — not by number of files.

graph TD
    CORE["📦 any-gpt\nCore Platform\n(this repo, extended)"]
    RAG["📦 anygpt-rag\nRAG as standalone package\n🔌 pip installable"]
    MCP["📦 anygpt-mcp\nMCP Server + Connectors\n🔌 works with any MCP client"]
    TRAINER["📦 anygpt-slm-trainer\nSLM Auto-Specialization\nGPU pipeline, different lifecycle"]

    CORE -->|uses| RAG
    CORE -->|uses| MCP
    TRAINER -->|exports models to| CORE
    RAG -.->|standalone in any product| EXTERNAL[3rd Party Products]
    MCP -.->|standalone MCP server| EXTERNAL

Repo 1: any-gpt (Core Platform — this repo, extended)

Team: Core + Frontend
What it owns: Platform walkers, UI, auth, multi-tenancy, admin dashboard, Tauri desktop, deployment configs

This is the main product repo. All platform-level features go here. The RAG engine is consumed as a package from anygpt-rag.

graph TD
    subgraph ANYG["any-gpt — Phase Roadmap"]
        PH1["✅ Phase 1  MVP\nRAG chat · doc upload · sessions\nFAISS · single tenant"]
        PH2["🔜 Phase 2  Multi-Tenant\nTenant isolation · auth (SSO/SAML)\nRBAC · per-tenant RAG config\nAdmin Dashboard"]
        PH3["🔜 Phase 3  Advanced AI\nQuery Router (SLM vs LLM)\nAgentic RAG Orchestrator\nMCP Context Engine\nSelf-RAG reflection"]
        PH4["🔜 Phase 4  Enterprise\nTauri desktop (offline llama.cpp)\nHelm / K8s deployment\nAir-gapped support\nSSE streaming"]
        PH1 --> PH2 --> PH3 --> PH4
    end

Phase 2 additions:

  • services/auth/ — SSO/SAML, RBAC, JWT
  • services/tenant/ — Tenant Manager walker (isolation · config · model selection)
  • pages/AdminPage.cl.jac — Enterprise config dashboard
  • Replace the in-repo FAISS-based RAG engine with the anygpt-rag package (Qdrant/pgvector backend)

Phase 3 additions:

  • walkers/query_router.jac — Domain classifier (in-domain → SLM, out-domain → LLM)
  • walkers/agentic_rag.jac — Sub-task decomposition orchestrator
  • walkers/mcp_context.jac — MCP Context Engine walker
  • Consume anygpt-mcp as a sidecar
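
The in-domain/out-of-domain routing decision could look roughly like the sketch below. The issue does not say how the domain classifier works, so the approach here (cosine similarity between a query embedding and a centroid of the tenant's document embeddings, with a fixed threshold) is purely an assumption for illustration, as are the names `cosine` and `route_query` and the threshold value.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route_query(
    query_vec: Sequence[float],
    domain_centroid: Sequence[float],
    threshold: float = 0.75,  # assumed value; would need tuning per tenant
) -> str:
    """Return "slm" for in-domain queries and "llm" for everything else."""
    return "slm" if cosine(query_vec, domain_centroid) >= threshold else "llm"
```

A trained classifier could replace the similarity check without changing the return contract, which is the only part the rest of the platform would depend on.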

Phase 4 additions:

  • desktop/ — Tauri 2.0 app with local llama.cpp inference
  • deploy/ — Docker Compose + Helm charts

Repo 2: anygpt-rag (Standalone RAG Package 🔌)

Team: AI/ML
What it owns: Document ingestion pipeline, vector search, reranking, advanced retrieval strategies
Standalone: pip install anygpt-rag — usable in any Python/Jac project

The ingestion + retrieval modules are already well-isolated in the current codebase. This repo extracts and evolves them independently.

graph LR
    subgraph RAG_PKG["anygpt-rag package"]
        I1[ingestion/\nloaders · chunker · file_store]
        I2[retrieval/\nvector_store · reranker · retriever]
        I3[RagEngine\norchestrator]

        subgraph BACKENDS["Pluggable Backends"]
            B1[FAISS\ncurrent Phase 1]
            B2[Qdrant\nPhase 2]
            B3[pgvector\nPhase 2]
        end

        subgraph STRATEGIES["Retrieval Strategies"]
            S1[BM25 + Dense\ncurrent]
            S2[RAPTOR\nPhase 2]
            S3[Self-RAG\nPhase 2]
            S4[Contextual Chunking\nPhase 2]
        end
    end

    I1 --> I3
    I2 --> I3
    I3 --> BACKENDS
    I3 --> STRATEGIES

Phase roadmap:

  • Phase 1: Extract from Any_GPT as-is (FAISS + CrossEncoder)
  • Phase 2: Pluggable backends (Qdrant, pgvector), RAPTOR, Self-RAG, BM25 hybrid
  • Phase 3: Per-tenant isolated indexes, multimodal (image/table extraction)
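
One way to make backends pluggable is a small interface plus a registry keyed by backend name. The sketch below is an assumption about how that could be structured, not the package's actual design: `VectorBackend`, `InMemoryBackend`, `BACKENDS`, and `make_backend` are hypothetical names, and the brute-force in-memory backend only stands in for FAISS, Qdrant, or pgvector.

```python
from typing import Protocol


class VectorBackend(Protocol):
    """Minimal interface a vector store would need to satisfy (assumed)."""

    def add(self, doc_id: str, vector: list[float]) -> None: ...

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]: ...


class InMemoryBackend:
    """Brute-force dot-product search, standing in for a real vector store."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._vectors[doc_id] = vector

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]:
        scored = [
            (doc_id, sum(q * v for q, v in zip(query, vec)))
            for doc_id, vec in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
        return scored[:k]


# Registry mapping a backend name to its class; real backends would register here.
BACKENDS: dict[str, type] = {"memory": InMemoryBackend}


def make_backend(name: str) -> VectorBackend:
    """Instantiate a backend by name, failing loudly on unknown names."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None
```

With this shape, an engine constructor that takes a `backend` string only needs to call `make_backend`, and adding a new store means adding one class and one registry entry.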

Standalone usage:

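# Intended usage once the package has been extracted from Any_GPT
from anygpt_rag import RagEngine

engine = RagEngine(backend="qdrant", tenant_id="acme")  # choose vector backend and tenant scope
engine.ingest_file("docs/manual.pdf")                   # load, chunk, and index the file
results = engine.search("How do I reset my password?")  # retrieve matching chunks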

Repo 3: anygpt-mcp (MCP Server + Connectors 🔌)

Team: Integrations
What it owns: AnyGPT as an MCP server, connectors to external MCP servers, tool actions
Standalone: Run as an MCP server over Streamable HTTP or STDIO — works with Claude, Cursor, or any MCP client

graph TD
    subgraph MCP_SERVER["AnyGPT MCP Server"]
        R1[Resources\nDocs as MCP resources]
        T1[Tools\nSearch-as-tool · Q&A-as-tool]
    end

    subgraph CONNECTORS["External MCP Connectors"]
        E1[Slack]
        E2[Confluence · Notion]
        E3[GitHub · GitLab]
        E4[Google Drive]
        E5[Jira · Linear]
    end

    subgraph ACTIONS["MCP Tool Actions"]
        A1[DB Writes]
        A2[Webhook Triggers]
        A3[Workflow Automation]
    end

    MCP_CLIENT[Claude · Cursor · Any MCP Client] --> MCP_SERVER
    MCP_SERVER --> CONNECTORS
    MCP_SERVER --> ACTIONS

Standalone usage — expose your AnyGPT knowledge base as MCP:

# Run as standalone MCP server (Streamable HTTP)
anygpt-mcp serve --port 3000 --docs ./my-docs/

# Or via STDIO (for Claude Desktop, Cursor)
anygpt-mcp stdio --docs ./my-docs/

Repo 4: anygpt-slm-trainer (SLM Auto-Specialization)

Team: AI/ML (GPU workloads)
What it owns: Auto-specialization pipeline — synthetic data generation, SFT+DPO training, eval, model export
Why separate: Different infrastructure (GPU), different release cycle, different team skills (ML engineering vs app dev)

graph LR
    subgraph PIPELINE["SLM Training Pipeline"]
        T1["Step 1\nSynthetic Data Gen\nOpus crawls docs → 5K–50K Q&A pairs\n(multi-strategy)"]
        T2["Step 2\nAgent Answer Gen\nOpus + Agentic RAG → answers\nRejection sampling k=8 + DPO pairs"]
        T3["Step 3\nSFT + DPO Training\nQLoRA 4-bit → SFT on Q&A\n→ DPO alignment on preference pairs"]
        T4["Step 4\nEval Loop\nCompare SLM vs Opus\nPass → Deploy  |  Fail → Retrain"]
        T1 --> T2 --> T3 --> T4
    end

    subgraph BASE["Base Model Options"]
        M1[Phi-4-Mini 3.8B]
        M2[Qwen 2.5 7B]
        M3[Llama 3.2 3B]
        M4[Mistral 7B]
    end

    subgraph EXPORT["Model Export"]
        E1[GGUF Q4_K_M → Local/Desktop]
        E2[ONNX → Edge]
        E3[vLLM → Cloud Serving]
    end

    BASE --> PIPELINE
    T4 --> EXPORT
    EXPORT -->|push to| REG[Model Registry\nin any-gpt]
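
Step 2's "rejection sampling k=8 + DPO pairs" can be illustrated with a short sketch: sample k candidate answers, score each one, and keep the best and worst as a preference pair. The `generate` and `score` callables are hypothetical placeholders (the issue does not specify the generator or the judge), and the `min_margin` filter is an added assumption to skip prompts where all candidates score about the same. The `prompt` / `chosen` / `rejected` field names follow the preference-dataset format that TRL's DPO trainer expects.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_pair(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: samples one candidate answer
    score: Callable[[str, str], float],  # hypothetical: judges (prompt, answer)
    k: int = 8,
    min_margin: float = 0.1,             # assumed cutoff for a useful pair
) -> Optional[PreferencePair]:
    """Sample k answers and return a (best, worst) pair, or None if too close."""
    scored = [(score(prompt, ans), ans) for ans in (generate(prompt) for _ in range(k))]
    best_score, best = max(scored, key=lambda pair: pair[0])
    worst_score, worst = min(scored, key=lambda pair: pair[0])
    if best_score - worst_score < min_margin:
        return None  # candidates are indistinguishable; no preference signal
    return PreferencePair(prompt=prompt, chosen=best, rejected=worst)
```

The best-scoring answer doubles as the SFT target for the same prompt, so one sampling pass can feed both training stages.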

Trigger: Automatically invoked by any-gpt's Auto-Specialize Trigger walker when:

  • New documents are ingested beyond a threshold
  • SLM accuracy drops below a threshold vs LLM
  • Tenant explicitly requests a re-specialization
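
The three trigger conditions above can be sketched as a single check that returns the reasons that fired. All names (`TenantStats`, `specialization_reasons`) and both default thresholds are assumptions for illustration; in particular, "accuracy drops below a threshold vs LLM" is interpreted here as the gap between LLM and SLM accuracy exceeding a limit.

```python
from dataclasses import dataclass


@dataclass
class TenantStats:
    new_docs_since_last_run: int
    slm_accuracy: float          # accuracy of the specialized SLM on the eval set
    llm_accuracy: float          # accuracy of the reference LLM on the same set
    manual_request: bool = False


def specialization_reasons(
    stats: TenantStats,
    doc_threshold: int = 100,        # assumed default
    max_accuracy_gap: float = 0.05,  # assumed default
) -> list[str]:
    """Return every trigger condition that currently holds (empty list = no run)."""
    reasons = []
    if stats.new_docs_since_last_run > doc_threshold:
        reasons.append("new_documents")
    if stats.llm_accuracy - stats.slm_accuracy > max_accuracy_gap:
        reasons.append("accuracy_drop")
    if stats.manual_request:
        reasons.append("tenant_request")
    return reasons
```

Returning the reasons rather than a bare boolean lets the walker log why a retraining run was started.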

How the Repos Connect

sequenceDiagram
    participant U as User
    participant CORE as any-gpt
    participant RAG as anygpt-rag
    participant MCP as anygpt-mcp
    participant TRAINER as anygpt-slm-trainer

    U->>CORE: Upload docs + chat
    CORE->>RAG: ingest_file() / search()
    RAG-->>CORE: results + citations
    CORE->>MCP: fetch external context\n(Slack, Confluence, Notion)
    MCP-->>CORE: additional context chunks
    CORE-->>U: streamed answer (SSE)

    Note over CORE,TRAINER: Background — triggered async
    CORE->>TRAINER: auto_specialize(tenant_id, docs)
    TRAINER->>TRAINER: data gen → train → eval
    TRAINER-->>CORE: push GGUF model to registry
    CORE->>CORE: route in-domain queries to SLM
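
The point in the sequence where any-gpt combines results from anygpt-rag with context from anygpt-mcp might be sketched as follows. The `Chunk` type and both functions are hypothetical, and the sketch assumes that scores from the two sources are comparable, which a real implementation would have to ensure (for example by reranking everything with the same CrossEncoder).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str   # e.g. a file name or connector name, used for citation
    score: float  # relevance score; assumed comparable across sources


def merge_context(rag_chunks: list[Chunk], mcp_chunks: list[Chunk], limit: int = 5) -> list[Chunk]:
    """Deduplicate by text, keep the higher-scoring copy, return the top chunks."""
    best: dict[str, Chunk] = {}
    for chunk in [*rag_chunks, *mcp_chunks]:
        if chunk.text not in best or chunk.score > best[chunk.text].score:
            best[chunk.text] = chunk
    return sorted(best.values(), key=lambda c: c.score, reverse=True)[:limit]


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each chunk so the model can cite it as [1], [2], ..."""
    lines = [f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, 1)]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"
```

Numbering the chunks in the prompt is one simple way to support the cited answers the platform promises, since each citation maps back to a `source`.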

Integration with External Products

Modules marked 🔌 are designed to be dropped into any product without AnyGPT:

| Module | How to use standalone |
| --- | --- |
| anygpt-rag | pip install anygpt-rag: RAG pipeline in any Python app |
| anygpt-mcp | Run as MCP server (HTTP/STDIO): works with Claude, Cursor, any MCP client |

Multi-Tenant Isolation (Phase 2)

graph LR
    subgraph TA["Tenant A  —  Branded"]
        TA1[Custom UI · Logo · Colors]
        TA2[API /api/tenant-a/*]
        TA3[Isolated Vector Index]
        TA4[SLM v2 · Phi-4-Mini]
        TA5["RAPTOR=ON · Self-RAG=ON · Model=Claude"]
    end
    subgraph TB["Tenant B  —  White-Label"]
        TB1[Custom UI · Full Rebrand]
        TB2[API /api/tenant-b/*]
        TB3[Isolated Vector Index]
        TB4[SLM v1 · Qwen 2.5 7B]
        TB5["RAPTOR=OFF · Self-RAG=ON · Model=GPT-4"]
    end
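
The per-tenant settings shown in the diagram could be captured in a small config record resolved from the request path. This is a sketch under assumptions: `TenantConfig`, `TENANTS`, and `config_for_path` are hypothetical names, the values are copied from the diagram, and a real deployment would load them from the tenant manager rather than a module-level dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    slm: str               # specialized small model for in-domain queries
    llm: str               # fallback cloud model
    raptor: bool = False
    self_rag: bool = False

    @property
    def api_prefix(self) -> str:
        return f"/api/{self.tenant_id}"


# Values mirror the two example tenants in the diagram.
TENANTS = {
    "tenant-a": TenantConfig("tenant-a", slm="Phi-4-Mini", llm="Claude", raptor=True, self_rag=True),
    "tenant-b": TenantConfig("tenant-b", slm="Qwen 2.5 7B", llm="GPT-4", raptor=False, self_rag=True),
}


def config_for_path(path: str) -> TenantConfig:
    """Resolve the tenant from a path such as /api/tenant-a/chat."""
    parts = path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "api" or parts[1] not in TENANTS:
        raise KeyError(f"no tenant for path: {path!r}")
    return TENANTS[parts[1]]
```

Making the record frozen keeps one request from mutating another tenant's settings by accident.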

Query Flow (Phase 3)

sequenceDiagram
    participant U as User
    participant GW as Auth + Tenant Router
    participant DC as Domain Classifier
    participant SLM as Specialized SLM
    participant LLM as Cloud LLM
    participant RAG as RAG Retrieval
    participant SR as Self-RAG Reflection
    participant SSE as SSE Stream

    U->>GW: Query
    GW->>DC: Classify domain
    alt In-Domain (trained docs)
        DC->>SLM: Route to SLM
    else Out-of-Domain
        DC->>LLM: Route to LLM
    end
    SLM->>RAG: Retrieve context
    RAG->>SR: Self-RAG reflection check
    SR->>SSE: Validated response
    SSE->>U: Stream chunks

Development Phases

gantt
    title AnyGPT — Development Roadmap
    dateFormat  YYYY-MM-DD
    section Phase 1  MVP  ✅
    any-gpt core (done)           :done, p1, 2025-01-01, 2025-04-01
    section Phase 2  Multi-Tenant
    anygpt-rag (extract + Qdrant) :p2a, 2025-04-01, 6w
    any-gpt auth + tenant walkers :p2b, 2025-04-01, 6w
    any-gpt admin dashboard       :p2c, after p2b, 4w
    section Phase 3  Advanced AI
    anygpt-mcp server             :p3a, after p2a, 5w
    any-gpt query router + Self-RAG :p3b, after p2b, 5w
    section Phase 4  SLM + Enterprise
    anygpt-slm-trainer pipeline   :p4a, after p2a, 8w
    any-gpt Tauri desktop         :p4b, after p3b, 6w
    any-gpt Helm + K8s deploy     :p4c, after p3b, 4w

Checklist

  • Phase 2 — Extract anygpt-rag from Any_GPT/services/ingestion + Any_GPT/services/retrieval
  • Phase 2 — Add auth, RBAC, tenant isolation walkers to any-gpt
  • Phase 2 — Build admin dashboard (AdminPage.cl.jac)
  • Phase 3 — Create anygpt-mcp repo: MCP server + Slack/Confluence/Notion connectors
  • Phase 3 — Add Query Router + Self-RAG walkers to any-gpt
  • Phase 4 — Create anygpt-slm-trainer repo: data gen → SFT+DPO → eval → export
  • Phase 4 — Tauri 2.0 desktop app (offline llama.cpp via GGUF export)
  • Phase 4 — Helm charts + K8s manifests, air-gapped deployment
