github-process-manager

🤖 GitHub Process Manager - AI-Powered Workflow and Documentation Assistant

A lightweight, local AI-powered assistant that combines Retrieval-Augmented Generation (RAG) with the Gemini API and GitHub repository integration. Upload reference documents, connect to your GitHub repositories, and get intelligent responses for process documentation, SOX compliance, MLOps workflows, DevOps pipelines, and more.

✨ Features

🎯 Use Cases

SOX Compliance & Auditing

MLOps Workflows

DevOps Pipelines

General Process Documentation

📋 Prerequisites

🚀 Quick Start

1. Clone the Repository

git clone <your-repo-url>
cd github-process-manager

2. Create Virtual Environment

# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment Variables

Copy the template and edit with your credentials:

# Windows
copy .env.template .env

# macOS/Linux
cp .env.template .env

Edit .env file:

# Required: Gemini API Key
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: GitHub Integration
GITHUB_TOKEN=your_github_personal_access_token_here
GITHUB_REPO_URL=https://github.com/username/repository

# Flask Configuration
FLASK_SECRET_KEY=your_secret_key_here
FLASK_DEBUG=True

Getting Your API Keys:

5. Run the Application

python app.py

The application will be available at: http://localhost:5000

🐳 Docker Setup

For a consistent, isolated environment, use Docker:

Quick Start with Docker Compose

# 1. Configure environment
cp .env.template .env
# Edit .env with your API keys

# 2. Start the application
docker-compose up -d

# 3. View logs
docker-compose logs -f app

# 4. Access at http://localhost:5000

Development with VS Code Dev Container

  1. Install Remote - Containers extension
  2. Open project in VS Code
  3. Press F1 → “Remote-Containers: Reopen in Container”
  4. Environment is automatically configured with all dependencies

Docker Commands

# Stop the application
docker-compose down

# Rebuild after changes
docker-compose up -d --build

# Production mode
docker-compose -f docker-compose.prod.yml up -d

# View container shell
docker-compose exec app /bin/bash

For detailed Docker setup, see README.docker.md

📖 Usage Guide

Upload Reference Documents

  1. Navigate to the main Chat page
  2. Click “Choose File” in the upload section
  3. Select a document (.txt, .pdf, or .docx)
  4. Click “Upload” to process the document
  5. The document will be chunked, embedded, and stored in ChromaDB
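The chunking step above can be sketched in a few lines. This is a simplified illustration using the CHUNK_SIZE (800) and CHUNK_OVERLAP (200) defaults from the configuration section, not the exact implementation in rag_engine.py:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks with overlap, so content that
    spans a chunk boundary still appears intact in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk is then embedded and stored in ChromaDB; the overlap means consecutive chunks share their last/first 200 characters.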

Connect to GitHub Repository

  1. Go to the Settings page
  2. Enter your GitHub repository URL (e.g., https://github.com/username/repo)
  3. Click “Connect Repository”
  4. Once connected, the chatbot can access PRs, issues, and workflows

Chat with the AI

  1. Type your question in the chat input
  2. The chatbot will:
    • Retrieve relevant document chunks from your uploaded files
    • Fetch related GitHub repository data (if connected)
    • Generate a response using Gemini AI with all context
  3. Responses cite sources from documents and GitHub data
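The "retrieve relevant document chunks" step is a similarity search over embeddings. In the application this is delegated to ChromaDB, but conceptually it is a cosine-similarity ranking like the sketch below (TOP_K_RESULTS defaults to 3):

```python
import math

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Rank stored chunk embeddings by cosine similarity to the query
    embedding and return the indices of the k best matches."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]
```

The text of the winning chunks is then prepended to the Gemini prompt as context.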

Trigger GitHub Actions

  1. Go to Settings → GitHub Actions
  2. Click “Load Workflows”
  3. Click “Trigger” on any workflow to manually start it

Customize AI Behavior

The application supports customizable system prompts to tailor AI responses to your needs:

Using Pre-defined Templates

  1. Go to Settings → AI System Prompt Configuration
  2. Select a template from the dropdown:
    • Default - Balanced assistant for general queries
    • Technical Expert - Deep technical explanations with code examples
    • Security Auditor - Security-focused analysis and compliance
    • Developer Assistant - Code-heavy responses with best practices
    • Data Analyst - Structured analysis with metrics and insights
    • Technical Educator - Clear explanations for learning purposes
  3. Click “Update Prompt” to apply (changes last for your session)
  4. See the preview to verify the selected template

Creating Custom Prompts

  1. Go to Settings → AI System Prompt Configuration
  2. Select “Custom Prompt” from the dropdown
  3. Write your own system instruction in the text editor
  4. Click “Update Prompt” to apply
  5. Example custom prompt:
    You are a helpful assistant specializing in cloud infrastructure.
    Focus on AWS best practices, security, and cost optimization.
    Provide actionable recommendations with specific service names.
    

Permanent Configuration (via .env)

For persistent customization across server restarts:

  1. Edit your .env file
  2. Set one of these variables:
    # Use a pre-defined template
    SYSTEM_PROMPT_TEMPLATE=technical_expert
       
    # Or set a custom prompt
    CUSTOM_SYSTEM_PROMPT="Your custom system instruction here"
    
  3. Restart the application

Available Templates: default, technical_expert, security_auditor, developer_assistant, data_analyst, technical_educator

Note: Session-based changes (via UI) take priority over .env settings until the server restarts.

Customize Document Templates

The application supports configurable Word document templates with custom branding:

Available Document Templates

  1. SOX Audit - 5-section compliance reports (Control Objective, Risks, Testing, Results, Conclusion)
  2. MLOps Workflow - ML pipeline documentation (Model Overview, Data Pipeline, Training, Validation, Deployment)
  3. DevOps Pipeline - CI/CD documentation (Pipeline Overview, Build Steps, Quality Gates, Deployment, Monitoring)
  4. Generic - General purpose documentation (Overview, Components, Procedures, Results, Recommendations)

Customize Branding

Edit your .env file to personalize generated documents:

# Project name for document headers
PROJECT_NAME=GitHub Process Manager

# Optional: Add company name to headers
COMPANY_NAME=Your Company Name

# Brand color (hex format #RRGGBB)
BRAND_COLOR=#4A90E2

# Optional: Add logo to document headers (.png, .jpg, .jpeg)
DOCUMENT_LOGO_PATH=/path/to/your/logo.png

# Default template type
DEFAULT_TEMPLATE_TYPE=generic
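BRAND_COLOR must be a #RRGGBB string, which is typically converted to an (R, G, B) triple before being handed to a document library such as python-docx. A quick validation/conversion sketch (not the project's actual parsing code):

```python
def parse_brand_color(value: str) -> tuple[int, int, int]:
    """Validate a #RRGGBB hex string and return its (R, G, B) components."""
    value = value.strip()
    if not (value.startswith("#") and len(value) == 7):
        raise ValueError(f"expected #RRGGBB, got {value!r}")
    return tuple(int(value[i:i + 2], 16) for i in (1, 3, 5))
```

For example, the default #4A90E2 parses to (74, 144, 226).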

Create Custom Templates

Modify document_templates.json to add new templates:

{
  "templates": {
    "your_template": {
      "name": "Your Template Name",
      "report_title": "Your Report Title",
      "sections": [
        {"number": 1, "title": "Section 1", "key": "Section 1"},
        {"number": 2, "title": "Section 2", "key": "Section 2"}
      ],
      "keywords": ["keyword1", "keyword2"]
    }
  }
}
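When adding a template it helps to sanity-check the file before restarting the app. The minimal validator below matches the shape of the example above; the required field names are taken from that example, not from any schema checks the application itself may perform:

```python
import json

def validate_templates(raw: str) -> list[str]:
    """Parse document_templates.json content and return the template
    keys, raising ValueError if a required field is missing."""
    data = json.loads(raw)
    required = {"name", "report_title", "sections", "keywords"}
    for key, tpl in data["templates"].items():
        missing = required - tpl.keys()
        if missing:
            raise ValueError(f"template {key!r} is missing {sorted(missing)}")
        for section in tpl["sections"]:
            if not {"number", "title", "key"} <= section.keys():
                raise ValueError(f"bad section in template {key!r}")
    return list(data["templates"])
```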

Template Features:

Using for MLOps

The application includes specialized MLOps templates and workflows for managing machine learning operations.

MLOps Documentation Templates

Located in templates/mlops/, these guides provide comprehensive MLOps best practices:

  1. mlops_guide.md - Complete MLOps lifecycle guide covering:
    • Model development and version control
    • Experiment tracking (MLflow, Weights & Biases)
    • Training best practices and reproducibility
    • Model validation strategies
    • Deployment strategies (Blue-Green, Canary, Shadow)
    • Monitoring and drift detection
    • Model retraining triggers
  2. model_validation_template.md - Structured validation report template:
    • Model overview and business context
    • Validation methodology (unit, integration, performance, regression)
    • Performance metrics and comparison with baseline
    • Bias and fairness analysis
    • Failure pattern analysis
    • Deployment recommendations
  3. deployment_checklist.md - Comprehensive pre-deployment checklist:
    • Model readiness verification
    • Security and compliance checks
    • Monitoring and observability setup
    • Testing requirements (functional, performance, integration)
    • Deployment strategy selection
    • Rollback procedures
  4. monitoring_guide.md - Production monitoring strategies:
    • Performance metrics tracking
    • Data drift detection methods
    • Infrastructure monitoring
    • Alert configuration
    • Incident response procedures
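As a concrete illustration of the drift-detection topic covered in monitoring_guide.md, here is a minimal check that flags a feature whose production mean has moved too far from its training baseline. Production setups usually rely on proper statistical tests or PSI; treat this purely as a sketch:

```python
import statistics

def mean_shift_alert(baseline: list[float], current: list[float],
                     threshold: float = 2.0) -> bool:
    """Flag drift when the current mean deviates from the baseline mean
    by more than `threshold` baseline standard deviations."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        return statistics.mean(current) != mu
    z = abs(statistics.mean(current) - mu) / sigma
    return z > threshold
```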

Using MLOps Templates as RAG Documents

  1. Navigate to the Chat page
  2. Upload MLOps template files from templates/mlops/
  3. Ask questions about ML workflows:
    • “What metrics should I track for a classification model?”
    • “How do I implement canary deployment for my model?”
    • “What are the best practices for detecting data drift?”
    • “Create a validation checklist for my model deployment”

MLOps GitHub Actions Workflows

Located in .github/workflows/mlops/, trigger workflows for automated documentation:

Model Validation Report (mlops-model-validation.yml):

Deployment Documentation (mlops-deployment-doc.yml):

Example MLOps Queries

Try these queries with MLOps templates uploaded:

Model Training:

"Document the training process for a fraud detection model with 95% accuracy"

Deployment Planning:

"Create a deployment checklist for deploying a recommendation model to production"

Monitoring Setup:

"What alerts should I configure for monitoring a prediction model in production?"

Validation Reporting:

"Generate a validation report for model version 2.1.0 with accuracy 94.2%, precision 93.8%, recall 94.5%"

Integration with ML Tools

The MLOps templates include guidance for integrating with popular ML platforms:

Export metrics from these tools and use the GitHub Actions workflows to generate documentation with your actual performance data.

MLOps Workflow Best Practices

  1. Version Everything: Code, data, models, configurations
  2. Track All Experiments: Log hyperparameters, metrics, and artifacts
  3. Validate Before Deploying: Run all tests (unit, integration, performance)
  4. Monitor Continuously: Set up drift detection and performance alerts
  5. Document Thoroughly: Use templates for consistency
  6. Plan Rollbacks: Always have a tested rollback strategy

🏗️ Project Structure

github-process-manager/
├── app.py                  # Main Flask application
├── config.py               # Configuration management
├── logger.py               # Logging setup
├── rag_engine.py           # RAG document processing
├── gemini_client.py        # Gemini API integration
├── github_client.py        # GitHub API integration
├── word_generator.py       # Word document generation
├── requirements.txt        # Python dependencies
├── .env.template           # Environment variable template
├── .gitignore             # Git ignore rules
├── document_templates.json # Document template configuration
├── templates/
│   ├── base.html          # Base template
│   ├── index.html         # Chat interface
│   └── settings.html      # Settings page
├── static/
│   └── css/
│       └── style.css      # Application styling
├── .github/
│   └── workflows/
│       ├── process-analysis-doc.yml  # Generic process workflow
│       └── sox-analysis-doc.yml      # SOX-specific workflow (legacy)
├── chroma_db/             # ChromaDB storage (auto-created)
├── uploads/               # Temporary upload folder (auto-created)
├── generated_reports/     # Generated Word documents (auto-created)
└── README.md              # This file

🔧 Configuration Options

Edit config.py or set environment variables:

| Variable | Description | Default |
|----------|-------------|---------|
| GEMINI_API_KEY | Google Gemini API key | Required |
| GEMINI_TEMPERATURE | AI response randomness (0.0-1.0) | 0.7 |
| GEMINI_MAX_TOKENS | Maximum response length | 2048 |
| SYSTEM_PROMPT_TEMPLATE | Pre-defined prompt template | default |
| CUSTOM_SYSTEM_PROMPT | Custom system instruction | None |
| PROJECT_NAME | Project name for documents | GitHub Process Manager |
| COMPANY_NAME | Company name for documents | None |
| BRAND_COLOR | Document brand color (hex) | #4A90E2 |
| DOCUMENT_LOGO_PATH | Path to logo for documents | None |
| DEFAULT_TEMPLATE_TYPE | Default document template | generic |
| DOCUMENT_TEMPLATES_PATH | Template config file path | document_templates.json |
| GITHUB_TOKEN | GitHub personal access token | Optional |
| GITHUB_REPO_URL | GitHub repository URL | Optional |
| FLASK_SECRET_KEY | Flask session secret | Auto-generated |
| CHROMA_DB_PATH | ChromaDB storage location | ./chroma_db |
| CHUNK_SIZE | Characters per document chunk | 800 |
| CHUNK_OVERLAP | Overlap between chunks | 200 |
| TOP_K_RESULTS | RAG chunks to retrieve | 3 |
| MLOPS_FEATURES_ENABLED | Enable MLOps features | false |
| MLOPS_TEMPLATES_DIR | MLOps templates directory | templates/mlops |
| MLOPS_WORKFLOWS_DIR | MLOps workflows directory | .github/workflows/mlops |

🛠️ API Endpoints

Chat

Document Management

GitHub Integration

AI Prompt Management

MLOps (Optional - requires MLOPS_FEATURES_ENABLED=true)

System

❗ Troubleshooting

“Configuration validation failed: GEMINI_API_KEY is not set”

Documents not being processed

GitHub connection failing

ChromaDB errors

📝 Features in Detail

RAG (Retrieval-Augmented Generation)

Gemini Integration

GitHub Features

🤝 Contributing

This is a personal project, but suggestions and improvements are welcome!

📄 License

This project is provided as-is for educational and personal use.

🙏 Acknowledgments

📧 Support

For issues or questions, please check the logs in app.log or review the troubleshooting section above.


Built with ❤️ using Python, Flask, and ChromaDB