TaskPilot-AI

Author: Meet Jain

This project implements a scalable backend for the TaskPilot-AI agent using FastAPI, real-time WebSocket streaming, VNC integration, and persistent session management. A minimal HTML/JS frontend is provided for demonstration.

  • The system is fully functional but requires a valid Anthropic API key
  • All code is production-ready and follows best practices

Project Overview

TaskPilot-AI is an AI-powered computer automation system that lets Claude (Anthropic's AI model) control a virtual desktop environment through natural language commands. It provides real-time interaction, persistent session management, and visual feedback through VNC integration.


Core Features

1. AI-Powered Computer Control

  • Natural Language Processing: Users can describe tasks in plain English
  • Desktop Automation: Claude can perform complex computer tasks automatically
  • Tool Integration: Access to file system, web browsing, application control, and more
  • Real-time Execution: Immediate feedback and progress updates

2. Session Management System

  • Persistent Sessions: Save and resume conversations across browser sessions
  • Session History: Complete audit trail of all interactions and actions
  • Multi-session Support: Handle multiple concurrent user sessions
  • Database Persistence: SQLite database for reliable data storage
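
As a rough illustration of how persistent sessions and message history could be stored in SQLite, here is a minimal sketch using only Python's standard `sqlite3` module. The table layout and the helper names (`open_store`, `create_session`, `add_message`, `history`) are assumptions made for this example; the project itself lists SQLAlchemy, and its real schema may differ.

```python
import sqlite3
import uuid
from datetime import datetime, timezone


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def open_store(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    # Ask SQLite to enforce the messages -> sessions reference.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS sessions (
            id TEXT PRIMARY KEY,
            created_at TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL REFERENCES sessions(id),
            role TEXT NOT NULL,
            content TEXT NOT NULL,
            created_at TEXT NOT NULL
        );
        """
    )
    return conn


def create_session(conn):
    """Insert a new session row and return its generated id."""
    session_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO sessions (id, created_at) VALUES (?, ?)",
        (session_id, _now()),
    )
    conn.commit()
    return session_id


def add_message(conn, session_id, role, content):
    """Append one message to a session's history."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content, created_at) "
        "VALUES (?, ?, ?, ?)",
        (session_id, role, content, _now()),
    )
    conn.commit()


def history(conn, session_id):
    """Return a session's messages in the order they were stored."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return [{"role": role, "content": content} for role, content in rows]
```

Passing a file path instead of `":memory:"` keeps the data across restarts, which is what lets a session be resumed later.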

3. Real-time Communication

  • WebSocket Streaming: Live updates of agent progress and tool outputs
  • Bidirectional Communication: Real-time chat between user and AI
  • Progress Indicators: Visual feedback for ongoing operations
  • Error Handling: Graceful error reporting and recovery

4. Visual Desktop Integration

  • VNC Server: Virtual desktop environment for AI control
  • noVNC Client: Web-based VNC viewer accessible via browser
  • Live Desktop View: Real-time visualization of AI actions
  • Cross-platform Access: Works on any device with a web browser

5. Modern Web Interface

  • Responsive Design: Works on desktop, tablet, and mobile devices
  • Real-time Chat: Live messaging with AI agent
  • Session Management: Easy creation and switching between sessions
  • Visual Feedback: Status indicators and progress updates

Technology Stack

Backend Technologies

  • FastAPI: Modern, fast web framework for building APIs with Python
  • SQLAlchemy: SQL toolkit and Object-Relational Mapping (ORM) library
  • WebSockets: Real-time bidirectional communication
  • Pydantic: Data validation using Python type annotations
  • Uvicorn: Lightning-fast ASGI server implementation

Frontend Technologies

  • HTML5: Semantic markup for structure
  • CSS3: Modern styling with Flexbox and responsive design
  • JavaScript (ES6+): Vanilla JS for interactivity and WebSocket communication
  • Google Fonts: Inter font family for modern typography

Infrastructure & Deployment

  • Docker: Containerization for consistent deployment
  • Docker Compose: Multi-container orchestration
  • Nginx: Web server for frontend static files
  • SQLite: Lightweight database for session storage

AI & Automation

  • Anthropic Claude API: Advanced AI model for task understanding and execution
  • Computer Use Tools: Specialized tools for desktop automation
  • Background Task Processing: Asynchronous task execution

Virtual Desktop

  • Ubuntu Desktop LXDE: Lightweight desktop environment
  • TigerVNC: High-performance VNC server
  • noVNC: HTML5 VNC client for web browsers
  • Websockify: WebSocket to TCP proxy for VNC

Prerequisites

System Requirements

  • Operating System: macOS, Linux, or Windows with Docker support
  • Docker: Version 20.10 or higher
  • Docker Compose: Version 2.0 or higher
  • Memory: Minimum 4GB RAM (8GB recommended)
  • Storage: At least 2GB free disk space
  • Network: Internet connection for AI API access

API Requirements

  • Anthropic API Key: Valid API key for Claude access
  • API Access: Active subscription to Anthropic's Claude API
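
Because nothing works without the key, it can help to fail early with a clear message. The sketch below assumes the key is supplied through the `ANTHROPIC_API_KEY` environment variable, which is the name the official Anthropic Python SDK reads by default; whether this project uses the same name is an assumption, and `load_api_key` is a hypothetical helper.

```python
import os


def load_api_key(env=None):
    """Return the Anthropic API key, or raise a clear error if it is missing."""
    # Assumption: the key lives in ANTHROPIC_API_KEY; adjust if the project differs.
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; the agent cannot call the Claude API."
        )
    return key
```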

Streamlit-like UI Behavior Simulation

Real-time Progress Streaming

1. Task Submission Flow

sequenceDiagram
    participant User
    participant Frontend
    participant Backend
    participant Agent
    participant Tools
    participant VNC

    User->>Frontend: Submit task "Search weather in Dubai"
    Frontend->>Backend: POST /sessions/{id}/messages
    Backend->>Backend: Store message in database
    Backend->>Frontend: 202 Accepted (immediate response)
    
    Backend->>Agent: Start background task
    Agent->>Tools: Execute browser_open
    Backend->>Frontend: WebSocket: {"type": "tool_output", "tool_id": "browser_open"}
    Tools->>VNC: Open Firefox (visible in VNC)
    
    Agent->>Tools: Execute web_search
    Backend->>Frontend: WebSocket: {"type": "tool_output", "tool_id": "web_search"}
    Tools->>VNC: Navigate to Google (visible in VNC)
    
    Agent->>Backend: Complete task
    Backend->>Frontend: WebSocket: {"type": "agent_output", "data": "Task completed"}
    Frontend->>User: Display completion message
    Frontend->>User: Prompt for new task
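
The "respond immediately, work in the background" pattern in this diagram can be sketched with plain `asyncio`. This is not the project's FastAPI handler: `submit_message`, `run_agent`, the in-memory `store` dictionary, and the event queue are stand-ins invented for the example, and the event fields follow the WebSocket message types listed in this README.

```python
import asyncio


async def run_agent(task, events):
    """Stand-in for the agent: emits one event per tool, then a final message."""
    for tool_id in ("browser_open", "web_search"):
        await asyncio.sleep(0)  # placeholder for real tool execution
        await events.put(
            {"type": "tool_output", "tool_id": tool_id, "result": f"{tool_id} finished"}
        )
    await events.put({"type": "agent_output", "data": "Task completed"})


async def submit_message(session_id, task, store, events):
    """Store the message, schedule the agent, and return without waiting for it."""
    store.setdefault(session_id, []).append(task)
    background = asyncio.create_task(run_agent(task, events))
    # The caller gets 202 Accepted right away; progress arrives via `events`.
    return {"status_code": 202, "detail": "Accepted"}, background


async def demo():
    store, events = {}, asyncio.Queue()
    response, background = await submit_message(
        "s1", "Search weather in Dubai", store, events
    )
    await background  # a real server would keep serving requests here instead
    streamed = []
    while not events.empty():
        streamed.append(events.get_nowait())
    return response, streamed, store
```

Running `asyncio.run(demo())` returns the 202 response together with the events that would have been streamed to the frontend.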

2. Real-time Progress Monitoring

WebSocket Message Types:

// Agent thinking
{"type": "agent_output", "data": "I'll help you search for the weather in Dubai. Let me open a web browser and search for this information."}

// Tool execution start
{"type": "tool_output", "tool_id": "browser_open", "result": "Opening Firefox browser..."}

// Tool execution progress
{"type": "tool_output", "tool_id": "web_search", "result": "Navigating to Google search..."}

// Tool execution complete
{"type": "tool_output", "tool_id": "web_search", "result": "Search completed successfully"}

// Agent response
{"type": "agent_output", "data": "I found the current weather in Dubai. The temperature is 25°C with sunny conditions."}

// Task completion
{"type": "task_complete", "status": "success", "message": "Task completed successfully"}
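
A client has to branch on the `type` field to display these messages. The following sketch shows one way to do that in Python; `render_event` and its output format are hypothetical, and it assumes each WebSocket frame is plain JSON without the `//` comments used above for annotation.

```python
import json


def render_event(raw):
    """Turn one WebSocket frame (a JSON string) into a line for the chat log."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "agent_output":
        return f"Agent: {event['data']}"
    if kind == "tool_output":
        return f"[{event['tool_id']}] {event['result']}"
    if kind == "task_complete":
        return f"Done ({event['status']}): {event['message']}"
    # Unknown types are reported rather than silently dropped.
    return f"Unknown event type: {kind!r}"
```

For example, `render_event('{"type": "tool_output", "tool_id": "web_search", "result": "Search completed successfully"}')` yields `[web_search] Search completed successfully`.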

3. UI State Management

stateDiagram-v2
    [*] --> Idle
    Idle --> Loading: Submit Task
    Loading --> Streaming: Task Started
    Streaming --> Processing: Tool Execution
    Processing --> Streaming: Tool Complete
    Streaming --> Complete: Task Finished
    Complete --> Idle: New Task Prompt
    Complete --> Loading: Submit New Task
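
The state diagram maps directly onto a transition table. Here is a minimal sketch in Python; the state names come from the diagram, while the snake_case event names and the `next_state` and `run` helpers are made up for illustration.

```python
# (current state, event) -> next state, mirroring the diagram's arrows.
TRANSITIONS = {
    ("Idle", "submit_task"): "Loading",
    ("Loading", "task_started"): "Streaming",
    ("Streaming", "tool_execution"): "Processing",
    ("Processing", "tool_complete"): "Streaming",
    ("Streaming", "task_finished"): "Complete",
    ("Complete", "new_task_prompt"): "Idle",
    ("Complete", "submit_new_task"): "Loading",
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise if not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"Event {event!r} is not allowed in state {state!r}") from None


def run(events, state="Idle"):
    """Apply a sequence of events starting from `state` and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

Keeping the transitions in one table makes illegal moves, such as a tool finishing while the UI is idle, fail loudly instead of leaving the interface in an inconsistent state.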

Architecture Details

System Architecture

graph TB
    subgraph "Frontend Layer"
        A[HTML/JS Frontend]
        B[WebSocket Client]
        C[HTTP Client]
    end
    
    subgraph "Backend Layer"
        D[FastAPI Server]
        E[WebSocket Manager]
        F[Session Manager]
        G[Agent Runner]
    end
    
    subgraph "Data Layer"
        H[SQLite Database]
        I[Session Storage]
        J[Message History]
    end
    
    subgraph "AI Layer"
        K[Anthropic Claude API]
        L[Computer Use Tools]
        M[Task Execution]
    end
    
    subgraph "Virtual Desktop"
        N[Ubuntu Desktop]
        O[TigerVNC Server]
        P[noVNC Client]
    end
    
    A --> D
    B --> E
    C --> F
    D --> H
    G --> K
    G --> L
    L --> M
    M --> N
    O --> P
    P --> A

Data Flow

  1. User Interaction: User sends message via frontend
  2. API Processing: FastAPI receives and stores message
  3. Agent Execution: Background task starts Claude agent
  4. Tool Execution: Agent uses computer tools to perform tasks
  5. Real-time Updates: WebSocket streams progress to frontend
  6. Visual Feedback: VNC shows desktop changes in real-time

Message Flow Architecture

sequenceDiagram
    participant Client
    participant API
    participant Database
    participant Agent
    participant Tools
    participant WebSocket
    participant VNC

    Client->>API: POST /sessions/{id}/messages
    API->>Database: Store message
    API->>Client: 202 Accepted
    
    API->>Agent: Start background task
    Agent->>Tools: Execute tool
    Tools->>VNC: Perform action (visible)
    Agent->>WebSocket: Send progress update
    WebSocket->>Client: Real-time update
    
    Agent->>Tools: Execute next tool
    Tools->>VNC: Perform action (visible)
    Agent->>WebSocket: Send progress update
    WebSocket->>Client: Real-time update
    
    Agent->>Database: Store final response
    Agent->>WebSocket: Send completion
    WebSocket->>Client: Task complete
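
The ordering in this diagram (run a tool, report progress, repeat, then persist and announce the result) can be sketched as a small loop. This is a simplification under stated assumptions: in the real system Claude chooses each tool call, whereas here the `plan` is fixed up front, and `run_agent_loop`, `tools`, `send`, and `save` are hypothetical names standing in for the agent runner, the computer-use tools, the WebSocket sender, and the database write.

```python
def run_agent_loop(plan, tools, send, save):
    """Run each planned tool call, report progress, then persist and announce the result."""
    results = []
    for tool_id, argument in plan:
        result = tools[tool_id](argument)  # the action would be visible on the VNC desktop
        results.append(result)
        send({"type": "tool_output", "tool_id": tool_id, "result": result})
    final = "; ".join(results)
    save(final)  # store the final response before announcing completion
    send({"type": "task_complete", "status": "success", "message": final})
    return final


if __name__ == "__main__":
    # Fake tools that only describe what they would do.
    fake_tools = {
        "browser_open": lambda url: f"opened {url}",
        "web_search": lambda query: f"searched for {query}",
    }
    sent, saved = [], []
    run_agent_loop(
        [("browser_open", "about:blank"), ("web_search", "weather in Dubai")],
        fake_tools,
        sent.append,
        saved.append,
    )
    for event in sent:
        print(event)
```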

WebSocket Communication Flow

graph LR
    A[Client] -->|Connect| B[WebSocket Server]
    B -->|Accept| A
    A -->|Send Message| B
    B -->|Process| C[Agent Runner]
    C -->|Tool Execution| D[Computer Tools]
    D -->|Action| E[VNC Desktop]
    C -->|Progress| B
    B -->|Stream| A
    C -->|Complete| B
    B -->|Final Update| A
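
On the server side, streaming to the right clients requires tracking who is connected to which session. The sketch below models that with one `asyncio.Queue` per client instead of real WebSocket connections, so it runs with the standard library alone; `ConnectionHub` and its methods are hypothetical and not taken from the project's code.

```python
import asyncio


class ConnectionHub:
    """Tracks one outgoing queue per connected client, grouped by session."""

    def __init__(self):
        self._clients = {}  # session_id -> set of queues

    def connect(self, session_id):
        """Register a client for a session and return its outgoing queue."""
        queue = asyncio.Queue()
        self._clients.setdefault(session_id, set()).add(queue)
        return queue

    def disconnect(self, session_id, queue):
        """Remove a client; forget the session once nobody is listening."""
        clients = self._clients.get(session_id, set())
        clients.discard(queue)
        if not clients:
            self._clients.pop(session_id, None)

    async def broadcast(self, session_id, event):
        """Deliver an event to every client currently attached to the session."""
        for queue in list(self._clients.get(session_id, ())):
            await queue.put(event)


async def demo():
    hub = ConnectionHub()
    first = hub.connect("s1")
    second = hub.connect("s1")
    other = hub.connect("s2")
    await hub.broadcast("s1", {"type": "agent_output", "data": "Working on it"})
    hub.disconnect("s1", second)
    await hub.broadcast("s1", {"type": "task_complete", "status": "success"})
    return first.qsize(), second.qsize(), other.qsize()
```

In a real WebSocket endpoint, each connection handler would read from its queue and forward the events over the socket, calling `disconnect` when the client goes away.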

Acknowledgments

  • Anthropic: For providing the Claude API and Computer Use tools
  • FastAPI: For the excellent web framework
  • Docker: For containerization technology
  • noVNC: For web-based VNC client

Contact

Connect with me through the following platforms:

LinkedIn Twitter

Social Media and Platforms

Discord Instagram Stack Overflow Medium Hashnode
