
AI Deployment Manager


"Homebrew for AI Infrastructure": a simple CLI/UI tool that helps companies deploy and manage their own GPU infrastructure (bare metal, cloud, or hybrid) without deep DevOps expertise.

Features

- Automated GPU cluster deployment (bare metal, cloud, hybrid)
- Workload scheduling and queue management
- Resource monitoring and optimization
- AI framework integrations
- Multi-cloud provider support
- Cost tracking and allocation

Installation

From Source

git clone https://github.com/dewitt4/ai-deployment-manager.git
cd ai-deployment-manager
make build
make install

Quick Start

# Initialize configuration
aidm init

# Deploy a GPU cluster
aidm deploy create

# Submit a workload
aidm schedule submit

# Monitor resources
aidm monitor resources

# Check costs
aidm cost report

Usage

Configuration

Initialize the AI Deployment Manager configuration:

aidm init

This creates a configuration file at ~/.aidm/config.yaml with default settings.

Deployment Commands

# Create a new GPU cluster deployment
aidm deploy create

# List all deployments
aidm deploy list

# Check deployment status
aidm deploy status

# Delete a deployment
aidm deploy delete
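The subcommands above imply a simple create/list/status/delete lifecycle for deployments. As an illustrative in-memory sketch (all type and field names are hypothetical, not the tool's actual API):

```go
package main

import (
	"fmt"
	"sort"
)

// Status of a GPU cluster deployment.
type Status string

const (
	StatusCreating Status = "creating"
	StatusRunning  Status = "running"
	StatusDeleted  Status = "deleted"
)

type Deployment struct {
	Name     string
	GPUCount int
	Status   Status
}

// Manager tracks deployments by name, mirroring
// `aidm deploy create/list/status/delete`.
type Manager struct {
	deployments map[string]*Deployment
}

func NewManager() *Manager {
	return &Manager{deployments: map[string]*Deployment{}}
}

func (m *Manager) Create(name string, gpus int) (*Deployment, error) {
	if _, ok := m.deployments[name]; ok {
		return nil, fmt.Errorf("deployment %q already exists", name)
	}
	d := &Deployment{Name: name, GPUCount: gpus, Status: StatusCreating}
	m.deployments[name] = d
	return d, nil
}

// List returns deployment names in stable (sorted) order.
func (m *Manager) List() []string {
	names := make([]string, 0, len(m.deployments))
	for n := range m.deployments {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func (m *Manager) Delete(name string) error {
	d, ok := m.deployments[name]
	if !ok {
		return fmt.Errorf("no such deployment %q", name)
	}
	d.Status = StatusDeleted
	delete(m.deployments, name)
	return nil
}

func main() {
	m := NewManager()
	m.Create("train-cluster", 4)
	fmt.Println(m.List()) // [train-cluster]
}
```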

Workload Scheduling

# Submit a job to the queue
aidm schedule submit

# List all jobs
aidm schedule list

# Cancel a job
aidm schedule cancel <job-id>

# Check queue status
aidm schedule queue
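The scheduling commands suggest a queue that accepts submissions, lists pending jobs, and skips cancelled ones. A minimal FIFO sketch (hypothetical names; the real scheduler lives in `pkg/scheduler/`):

```go
package main

import (
	"errors"
	"fmt"
)

// Job is a queued workload.
type Job struct {
	ID        int
	Name      string
	Cancelled bool
}

// Queue is a simple FIFO mirroring `aidm schedule submit/list/cancel/queue`.
type Queue struct {
	nextID int
	jobs   []*Job
}

// Submit enqueues a job and assigns it an ID.
func (q *Queue) Submit(name string) *Job {
	q.nextID++
	j := &Job{ID: q.nextID, Name: name}
	q.jobs = append(q.jobs, j)
	return j
}

// Cancel marks a queued job cancelled so the scheduler will skip it.
func (q *Queue) Cancel(id int) error {
	for _, j := range q.jobs {
		if j.ID == id {
			j.Cancelled = true
			return nil
		}
	}
	return errors.New("job not found")
}

// Next pops the oldest job that has not been cancelled.
func (q *Queue) Next() (*Job, bool) {
	for len(q.jobs) > 0 {
		j := q.jobs[0]
		q.jobs = q.jobs[1:]
		if !j.Cancelled {
			return j, true
		}
	}
	return nil, false
}

func main() {
	q := &Queue{}
	a := q.Submit("finetune-llm")
	q.Submit("eval-run")
	q.Cancel(a.ID)
	if j, ok := q.Next(); ok {
		fmt.Println("running", j.Name) // running eval-run
	}
}
```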

Resource Monitoring

# View resource utilization
aidm monitor resources

# Check GPU status
aidm monitor gpu

# Run optimization
aidm monitor optimize
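`aidm monitor resources` presumably rolls raw readings up into per-GPU figures. A sketch of that rollup (the sample shape is hypothetical; real data would come from NVML or `nvidia-smi`):

```go
package main

import "fmt"

// GPUSample is one utilization reading for a GPU.
type GPUSample struct {
	GPU  string
	Util float64 // fraction in [0.0, 1.0]
}

// meanUtil averages utilization per GPU across samples.
func meanUtil(samples []GPUSample) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]int{}
	for _, s := range samples {
		sums[s.GPU] += s.Util
		counts[s.GPU]++
	}
	out := map[string]float64{}
	for gpu, sum := range sums {
		out[gpu] = sum / float64(counts[gpu])
	}
	return out
}

func main() {
	samples := []GPUSample{
		{"gpu0", 0.90}, {"gpu0", 0.70}, {"gpu1", 0.10},
	}
	for gpu, u := range meanUtil(samples) {
		fmt.Printf("%s: %.0f%%\n", gpu, u*100)
	}
}
```

A rollup like this is also the natural input for `aidm monitor optimize`: persistently under-utilized GPUs are candidates for packing more workloads or scaling down.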

Cost Management

# Generate cost report
aidm cost report

# Update cost tracking
aidm cost track

# View cost allocations
aidm cost allocate
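At its simplest, cost allocation multiplies tracked GPU-hours by an hourly rate and groups by owner. A sketch of that calculation (the record shape and flat rate are illustrative; real tracking would pull usage from the provider's billing data):

```go
package main

import "fmt"

// UsageRecord is one tracked slice of GPU usage.
type UsageRecord struct {
	Team     string
	GPUHours float64
}

// costReport totals spend per team at a flat hourly GPU rate, the kind of
// breakdown `aidm cost report` and `aidm cost allocate` might produce.
func costReport(records []UsageRecord, ratePerGPUHour float64) map[string]float64 {
	totals := map[string]float64{}
	for _, r := range records {
		totals[r.Team] += r.GPUHours * ratePerGPUHour
	}
	return totals
}

func main() {
	records := []UsageRecord{
		{"research", 120}, {"research", 30}, {"platform", 50},
	}
	for team, usd := range costReport(records, 2.50) {
		fmt.Printf("%s: $%.2f\n", team, usd)
	}
}
```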

Architecture

ai-deployment-manager/
├── cmd/
│   └── aidm/           # CLI entry point
├── pkg/
│   ├── deployment/     # GPU cluster deployment automation
│   ├── scheduler/      # Workload scheduling and queue management
│   ├── monitor/        # Resource monitoring and optimization
│   ├── integration/    # AI framework integrations
│   ├── cloud/          # Multi-cloud provider support
│   └── cost/           # Cost tracking and allocation
└── internal/
    ├── config/         # Configuration management
    └── utils/          # Utility functions

Supported Platforms

Cloud Providers

- AWS
- GCP
- Azure
- Local (bare metal)

AI Frameworks

- PyTorch

GPU Types

- NVIDIA

Configuration File

Example ~/.aidm/config.yaml:

provider: local
gpu_type: nvidia
framework: pytorch

cloud:
  aws:
    region: us-west-2
  gcp:
    project: ""
  azure:
    subscription: ""

deployment:
  cluster_size: 1
  gpu_count: 1

monitoring:
  enabled: true
  interval: 60s

cost:
  tracking_enabled: true
  currency: USD
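On the Go side, the YAML above maps naturally onto tagged structs. A partial sketch (field tags assume a decoder such as `gopkg.in/yaml.v3`; decoding itself is omitted, and only a few fields are shown). Note that the `interval: 60s` value is in Go duration syntax, so it parses directly with `time.ParseDuration`:

```go
package main

import (
	"fmt"
	"time"
)

// Config mirrors part of the example ~/.aidm/config.yaml above.
type Config struct {
	Provider   string `yaml:"provider"`
	GPUType    string `yaml:"gpu_type"`
	Framework  string `yaml:"framework"`
	Monitoring struct {
		Enabled  bool   `yaml:"enabled"`
		Interval string `yaml:"interval"`
	} `yaml:"monitoring"`
}

// pollInterval converts the monitoring interval string (e.g. "60s")
// into a time.Duration.
func pollInterval(c Config) (time.Duration, error) {
	return time.ParseDuration(c.Monitoring.Interval)
}

func main() {
	var c Config
	c.Provider = "local"
	c.Monitoring.Interval = "60s"
	d, err := pollInterval(c)
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // 1m0s
}
```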

Development

Build

make build

Test

make test

Format Code

make fmt

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Support

For issues, questions, or contributions, please open an issue on GitHub.