RBAC RAG Chatbot - FinTech AI Assistant

A comprehensive Retrieval-Augmented Generation (RAG) chatbot system built with LangChain, LangGraph, FastAPI, and Streamlit. This enterprise-grade solution provides role-based access control and domain-specific knowledge retrieval for different organizational departments.

[Screenshot: Chat UI]

🔒 Data Privacy & Security

In traditional RAG systems, users can craft malicious prompts to pull sensitive information across organizational silos, bypassing intended data boundaries. This is a significant security risk: through clever prompt engineering, an employee could gain unauthorized access to confidential documents belonging to other departments.

Our Solution: This system implements role-based retrieval and response generation at the vector database level, ensuring that users can only access documents relevant to their organizational role. The retrieval mechanism filters documents based on user permissions before any LLM processing occurs, preventing data leakage through prompt manipulation.
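For illustration, here is a minimal sketch of that idea. It assumes document chunks were ingested with a department metadata field and uses ChromaDB's $in filter through LangChain; the collection name, persistence path, and role-to-department mapping below are illustrative, not the repository's exact values.

# Illustrative sketch (not the repository's actual code): restrict retrieval
# by role with a ChromaDB metadata filter, before any LLM processing happens.
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Hypothetical mapping of user roles to the document sets they may see.
ROLE_ACCESS = {
    "engineering": ["engineering", "general"],
    "finance": ["finance", "general"],
    "c-level": ["engineering", "marketing", "finance", "hr", "general"],
}

def get_retriever(role: str):
    """Return a retriever that only searches departments allowed for `role`."""
    vector_store = Chroma(
        collection_name="company_docs",                # assumed collection name
        embedding_function=GoogleGenerativeAIEmbeddings(model="models/embedding-001"),
        persist_directory="./chroma_db",               # assumed persistence path
    )
    allowed = ROLE_ACCESS.get(role, ["general"])       # unknown roles fall back to general docs
    # Chroma's `$in` operator keeps only chunks whose `department` metadata is
    # in the allowed set, so out-of-scope documents are never retrieved,
    # no matter how the prompt is worded.
    return vector_store.as_retriever(
        search_kwargs={"k": 4, "filter": {"department": {"$in": allowed}}}
    )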

🚀 Features

  • Role-Based Access Control: Different access levels for various departments (Engineering, Marketing, Finance, HR, C-Level)
  • RAG Architecture: Combines retrieval and generation for accurate, context-aware responses
  • Vector Database: ChromaDB for efficient document storage and retrieval
  • Modern UI: Streamlit-based frontend with authentication
  • FastAPI Backend: RESTful API with HTTP Basic Authentication

🏗️ Architecture

The project consists of three main components:

1. Backend (/backend)

  • FastAPI server with authentication and chat endpoints
  • LangGraph agents for conversation flow management
  • ChromaDB vector store for document retrieval
  • Role-based access control system

2. Frontend (/frontend)

  • Streamlit web interface
  • User authentication and session management
  • Real-time chat interface

3. Training (/training)

  • Data processing and vector store creation
  • Jupyter notebooks for experimentation
  • Document ingestion pipeline

📋 Prerequisites

  • Python 3.12+
  • UV package manager (recommended) or pip
  • Google Gemini API key (for LLM integration)

🛠️ Installation

1. Clone the Repository

mkdir RAG-chatbot
cd RAG-chatbot
git clone https://github.com/shashanksrajak/RBAC-RAG-chatbot.git .

2. Environment Setup

Create a .env file in the root directory:
GOOGLE_API_KEY=your_google_gemini_api_key_here
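How the key is consumed depends on the backend code; a rough sketch of the usual pattern, assuming python-dotenv and the langchain-google-genai package (the model name here is illustrative):

import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI

load_dotenv()  # loads GOOGLE_API_KEY from .env into the process environment

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",                      # illustrative model name
    google_api_key=os.environ["GOOGLE_API_KEY"],   # key from the .env file
)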

3. Install Dependencies

Using UV (Recommended)

# Backend
cd backend
uv sync

# Frontend
cd ../frontend
uv sync

# Training (optional)
cd ../training
uv sync

🚀 Quick Start

1. Start the Backend Server

cd backend
uv run src/main.py
The backend will be available at http://localhost:6001

2. Launch the Frontend

cd frontend
uv run streamlit run main.py
The frontend will be available at http://localhost:8501

3. Login with Demo Credentials

Use any of these demo accounts:

Username  | Password     | Role         | Access Level
--------- | ------------ | ------------ | --------------------
Tony      | password123  | Engineering  | Technical documents
Bruce     | securepass   | Marketing    | Marketing reports
Sam       | financepass  | Finance      | Financial data
Natasha   | hrpass123    | HR           | HR policies
Shashank  | password123  | C-Level      | All documents

📁 Project Structure

RAG-chatbot/
├── backend/                    # FastAPI backend
│   ├── src/
│   │   ├── agents/            # LangGraph agents
│   │   │   ├── graph.py       # Agent workflow
│   │   │   ├── nodes.py       # Processing nodes
│   │   │   ├── prompts.py     # LLM prompts
│   │   │   └── states.py      # State management
│   │   ├── services/          # Business logic
│   │   │   └── chatbot.py     # Chatbot service
│   │   └── main.py            # FastAPI application
│   └── pyproject.toml
├── frontend/                   # Streamlit frontend
│   ├── main.py                # Streamlit app
│   └── pyproject.toml
├── training/                   # Data processing
│   ├── data/                  # Source documents
│   │   ├── engineering/       # Technical docs
│   │   ├── finance/           # Financial reports
│   │   ├── general/           # General policies
│   │   ├── hr/                # HR documents
│   │   └── marketing/         # Marketing materials
│   ├── chatbot_agent.ipynb    # Training notebook
│   └── RAG_intro.ipynb        # RAG introduction
└── docs/                      # Documentation

🔧 API Endpoints

Authentication

  • GET /login - Authenticate user and get role information
  • GET /test - Test authenticated access

Chat

  • POST /chat - Send message to chatbot (requires authentication)
    • Parameters: message (string)
    • Returns: AI response based on user's role and access level

Health Check

  • GET / - Server status check
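A rough sketch of exercising these endpoints with one of the demo accounts, using the requests library; the exact request and response shapes (JSON body vs. query parameter for message) depend on the backend code:

import requests

BASE_URL = "http://localhost:6001"
AUTH = ("Tony", "password123")          # demo Engineering account

# Health check (no authentication required)
print(requests.get(f"{BASE_URL}/").status_code)

# Authenticate and fetch role information via HTTP Basic Auth
print(requests.get(f"{BASE_URL}/login", auth=AUTH).json())

# Ask a question; assumed here to be a JSON body with a `message` field
resp = requests.post(
    f"{BASE_URL}/chat",
    auth=AUTH,
    json={"message": "Summarize our deployment architecture."},
)
print(resp.json())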

🧠 AI Components

LangChain Integration

  • Google Gemini: Primary LLM for response generation
  • ChromaDB: Vector database for document storage
  • Text Splitters: Intelligent document chunking
  • Retrieval Chain: Semantic search and context retrieval
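Put together, the ingestion side might look roughly like the sketch below, where the folder names under training/data/ double as the department metadata later used for role filtering; the paths, chunk sizes, and embedding model are assumptions, not the repository's exact values.

# Illustrative ingestion sketch: load each department folder, tag every chunk
# with a `department` metadata field, and persist the chunks in ChromaDB.
from pathlib import Path

from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = []
for md_file in Path("training/data").glob("*/*.md"):
    department = md_file.parent.name               # e.g. "engineering", "hr"
    for chunk in splitter.split_documents(TextLoader(str(md_file)).load()):
        chunk.metadata["department"] = department  # drives the role-based filter
        chunks.append(chunk)

Chroma.from_documents(
    chunks,
    embedding=GoogleGenerativeAIEmbeddings(model="models/embedding-001"),
    collection_name="company_docs",
    persist_directory="./chroma_db",
)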

LangGraph Workflow

  • Retrieve Node: Fetches relevant documents based on query and access level
  • Generate Node: Creates contextual responses using retrieved information
  • State Management: Maintains conversation context and user permissions
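A stripped-down sketch of such a two-node graph follows; the state fields and node names are inferred from the description above, not copied from the repository's graph.py, and get_retriever() is the hypothetical role-aware retriever sketched in the Data Privacy section.

from typing import TypedDict

from langchain_core.documents import Document
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.graph import StateGraph, START, END

class ChatState(TypedDict):
    question: str
    role: str                  # carried through the graph so retrieval respects permissions
    context: list[Document]
    answer: str

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")   # illustrative model name

def retrieve(state: ChatState) -> dict:
    # Role-aware retrieval; only documents the user may see reach the LLM.
    docs = get_retriever(state["role"]).invoke(state["question"])
    return {"context": docs}

def generate(state: ChatState) -> dict:
    context = "\n\n".join(doc.page_content for doc in state["context"])
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {state['question']}"
    )
    return {"answer": llm.invoke(prompt).content}

builder = StateGraph(ChatState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)
graph = builder.compile()

result = graph.invoke({"question": "What is our leave policy?", "role": "hr"})
print(result["answer"])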

📊 Supported Document Types

  • Markdown (.md)
  • CSV (.csv)
  • More formats can be added by extending the document loaders
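For example, a small extension registry along these lines could route additional formats into the same pipeline (the PDF loader is purely illustrative and would need the pypdf package):

from langchain_community.document_loaders import CSVLoader, PyPDFLoader, TextLoader

# Map file extensions to loader classes; add entries here to support new formats.
LOADERS = {
    ".md": TextLoader,
    ".csv": CSVLoader,
    ".pdf": PyPDFLoader,   # illustrative addition, requires `pypdf`
}

def load_file(path: str):
    """Pick a loader by file extension and return the loaded documents."""
    suffix = "." + path.rsplit(".", 1)[-1].lower()
    loader_cls = LOADERS.get(suffix)
    if loader_cls is None:
        raise ValueError(f"No loader registered for {suffix}")
    return loader_cls(path).load()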

🎯 Use Cases

  • Internal Knowledge Base: Company-wide information retrieval
  • Department-Specific Queries: Role-based information access
  • Document Q&A: Natural language queries over company documents
  • Compliance and Policy: Easy access to HR and legal documents
  • Technical Support: Engineering documentation assistance

Note: This is a demonstration project with hardcoded credentials. For production use, implement proper authentication and authorization mechanisms.

Interested in this project?

Check out the full source code and documentation on GitHub.