AI Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. Covers 10/10 OWASP Agentic Top 10.
Updated Jun 13, 2026 - Python
A curated list of awesome responsible machine learning resources.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection
Internal Safety Collapse (ISC): Turning the LLM or an AI Agent into a sensitive data generator.
A resource repository for machine unlearning in large language models
Agentlens is a trusted agent trading platform where you can quickly find an agent that meets your needs or publish your own agent as a digital asset. We encourage everyone to turn their areas of expertise into agents that showcase their unique strengths.
Deliver safe & effective language models
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
The open agent control plane. Govern autonomous AI agents with pre-execution policy enforcement, approval gates, and audit trails. Works with LangChain, CrewAI, MCP, and any framework.
Catch your AI's mistakes and blind spots before your customers or regulators do. iFixAi runs 45 inspections, 32 graded core plus 13 extended for frontier risks like sabotage, sandbagging, and oversight evasion. It returns a letter grade in under 5 minutes. Industry and model agnostic.
Stop AI agents from doing things you didn't ask for.
Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)
Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours
Runtime policy enforcement for AI agents. Cryptographic audit trail, human-in-the-loop approvals, kill switch. Zero code changes.
Aligning AI With Shared Human Values (ICLR 2021)
👾 Next-Gen Transparent Agent Architecture 🔍 Full behavior auditing | 🛡️ Two-stage safe invocation | 🧠 Dual-watermark memory | ⏰ Heartbeat tasks 📊 80% reduction in P0 incident rate | Compatible with the OpenClaw + Claude Code skill ecosystem
🦀 Prevents outdated Rust code suggestions from AI assistants. This MCP server fetches current crate docs, uses embeddings/LLMs, and provides accurate context via a tool call.