A basic CI/CD platform that runs GitHub Actions workflows as Slurm jobs, providing scalable compute resources for your GitHub repositories.
- GitHub Actions Integration: Run GitHub Actions workflows on Slurm compute clusters
- Event Triggers: Supports push events and other GitHub webhook events
- Third-party Actions: Compatible with popular actions such as actions/checkout
- Scalable: Leverage Slurm's job scheduling and resource management
- Self-hosted: Run on your own infrastructure with full control
Before setting up the project, ensure you have the following installed:
- Docker - For running the RabbitMQ message queue
- Rust - For building and running the webhook API and worker services
- Terraform - For provisioning cloud infrastructure
- Ansible - For configuring Slurm cluster nodes
- ngrok - For exposing the API server to the public internet
- Hetzner Cloud Account - For cloud infrastructure (or adapt for your preferred provider)
Start a RabbitMQ container for message queuing:
```bash
docker run -it --rm --name rabbitmq -p 5552:5552 -p 15672:15672 -p 5672:5672 \
  -e RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS='-rabbitmq_stream advertised_host localhost' \
  rabbitmq:4-management
```
In a separate terminal, enable the required plugins:
```bash
docker exec rabbitmq rabbitmq-plugins enable rabbitmq_stream rabbitmq_stream_management
```
Create SSH key pairs for infrastructure access:
```bash
# Generate bastion key (for initial access)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/bastion -C "bastion-key"

# Generate compute node key (for inter-node communication)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/slurm-compute -C "slurm-compute-key"
```
Navigate to the Terraform directory:
```bash
cd infrastructure/terraform
```
Create a .tfvars file with your configuration:
```hcl
# .tfvars
hcloud_token        = "your-hetzner-cloud-token"
bastion_public_key  = "ssh-rsa AAAAB3NzaC1yc2E... bastion-key"        # Contents of ~/.ssh/bastion.pub
compute_private_key = "-----BEGIN OPENSSH PRIVATE KEY-----\n..."      # Contents of ~/.ssh/slurm-compute
compute_public_key  = "ssh-rsa AAAAB3NzaC1yc2E... slurm-compute-key"  # Contents of ~/.ssh/slurm-compute.pub
```
Deploy the infrastructure:
```bash
terraform init
terraform apply -var-file=.tfvars
```
Important: Note down the output values:
- controller_node_public_ip
- compute_node_public_ip
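If you need these values again later, Terraform can reprint them at any time (assuming the outputs keep the names listed above):
```bash
# Reprint the recorded outputs (run inside infrastructure/terraform)
terraform output controller_node_public_ip
terraform output compute_node_public_ip
```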
Create an inventory file:
```bash
cd ../ansible
```
Create inventory.ini:
```ini
[controller]
slurm-cluster-controller ansible_host=<controller_node_public_ip> ansible_user=slurm ansible_ssh_private_key_file=<path_to_bastion_private_key> ansible_ssh_common_args='-o StrictHostKeyChecking=no'

[compute]
slurm-cluster-compute-1 ansible_host=<compute_node_public_ip> ansible_user=slurm ansible_ssh_private_key_file=<path_to_bastion_private_key> ansible_ssh_common_args='-o StrictHostKeyChecking=no'
```
Replace placeholders with actual values:
- <controller_node_public_ip>: Output from Terraform
- <compute_node_public_ip>: Output from Terraform
- <path_to_bastion_private_key>: Path to your bastion private key (e.g., /Users/username/.ssh/bastion)
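To confirm that Ansible can reach both hosts before running any playbooks, a quick connectivity check against the inventory above can be run:
```bash
# Verify SSH connectivity to all hosts in the inventory
ansible all -i inventory.ini -m ping
```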
Before running the Ansible playbooks, update the Slurm configuration file:
- Edit infrastructure/ansible/files/slurm/slurm.conf and replace SLURM_COMPUTE_NODE_IP with the actual IP address of your compute node (the compute_node_public_ip from the Terraform output).
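This can also be done from the command line with a one-line substitution (a sketch; 203.0.113.10 stands in for your actual compute node IP, and on macOS use sed -i '' instead of sed -i):
```bash
# Replace the placeholder with the compute node's public IP from the Terraform output
sed -i 's/SLURM_COMPUTE_NODE_IP/203.0.113.10/' infrastructure/ansible/files/slurm/slurm.conf
```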
Set up the controller node:
```bash
ansible-playbook -i inventory.ini setup-slurm-controller.yaml
```
Note: If the controller setup fails due to MySQL package issues, SSH into the controller node and run:
```bash
ssh -i <path_to_bastion_private_key> slurm@<controller_node_public_ip>
sudo apt --fix-broken install -y
```
Then re-run the controller playbook.
Set up the compute node:
```bash
ansible-playbook -i inventory.ini setup-slurm-compute.yaml \
  --extra-vars slurm_controller_ip=<controller_node_public_ip> \
  --extra-vars slurm_controller_hostname=slurm-cluster-controller
```
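Once both playbooks have finished, you can optionally check from the controller that the compute node has registered with the cluster:
```bash
# Run sinfo on the controller; the compute node should be listed (typically in an idle state)
ssh -i <path_to_bastion_private_key> slurm@<controller_node_public_ip> sinfo
```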
Create a .env file in the ghwebhooks/crates/rabbitmq-worker/ directory with the following variables:
```bash
cd ../../ghwebhooks/crates/rabbitmq-worker
```
Create the .env file:
```
GHWEBHOOKS_RMQ_CONSUMER_GITHUB_TOKEN=your_github_token_here
GHWEBHOOKS_RMQ_CONSUMER_SLURMRESTD_HOST=<controller_node_public_ip>
GHWEBHOOKS_RMQ_CONSUMER_SLURMRESTD_PORT=6820
GHWEBHOOKS_RMQ_CONSUMER_SLURMRESTD_USER=slurm
GHWEBHOOKS_RMQ_CONSUMER_SLURMRESTD_TOKEN=your_slurm_token_here
```
To get the Slurm token:
- SSH into the controller node: ssh -i <path_to_bastion_private_key> slurm@<controller_node_public_ip>
- Run: scontrol token
- Copy the token value and paste it into the .env file
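Tokens issued by scontrol token expire after a default lifespan; if you need a longer-lived token for testing, a lifespan in seconds can be requested (assuming JWT authentication is enabled on the cluster, which slurmrestd requires anyway):
```bash
# Request a token that stays valid for 24 hours
scontrol token lifespan=86400
```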
Navigate to the webhook service directory:
```bash
cd ../../../ghwebhooks
```
Start the API server:
```bash
cargo run --bin api
```
In a separate terminal, start the RabbitMQ worker:
```bash
cargo run --bin rabbitmq-worker
```
To make your API accessible to GitHub webhooks, expose it using ngrok.
In a separate terminal, run:
```bash
ngrok http 8000
```
This will provide you with a public URL (e.g., https://abc123.ngrok.io) that forwards to your local API server.
Note: Copy the ngrok URL as you'll need it for configuring GitHub webhooks.
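Before wiring up GitHub, a quick local check can confirm the API is listening (a sketch: the /webhook path matches the GitHub App configuration below, and the server may reject the empty payload, but any HTTP response means it is reachable):
```bash
# Send a minimal ping-style request to the local webhook endpoint
curl -i -X POST \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: ping" \
  -d '{}' \
  http://localhost:8000/webhook
```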
```
GitHub Repository
    ↓ (webhook)
API Server
    ↓ (message queue)
RabbitMQ
    ↓ (job processing)
Worker Service
    ↓ (job submission)
Slurm Controller
    ↓ (job execution)
Compute Nodes
```
- Set up GitHub App:
  - Go to GitHub Settings → Developer settings → GitHub Apps
  - Click "New GitHub App"
  - Fill in the app details:
    - App name: Choose a unique name for your app
    - Homepage URL: Your ngrok URL (e.g., https://abc123.ngrok.io)
    - Webhook URL: Your ngrok URL with the /webhook endpoint (e.g., https://abc123.ngrok.io/webhook)
    - Webhook secret: Generate a secure secret (optional but recommended; an example command follows this list)
  - Under "Repository permissions", grant the necessary permissions (e.g., Contents: Read, Metadata: Read)
  - Under "Subscribe to events", select the relevant events (e.g., Push)
  - Create the GitHub App
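One simple way to generate a suitable webhook secret (any sufficiently random string works):
```bash
# Generate a random 32-byte hex string to use as the webhook secret
openssl rand -hex 32
```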
- Install the GitHub App:
  - After creating the app, install it on your target repository
  - Important: Note down the App Installation ID from the installation URL or settings
  - The installation ID will be needed for API authentication
- Push Code: Push commits to trigger workflow execution
- Monitor Jobs: Use Slurm commands (squeue, sacct) or the Slurm REST API to monitor job status (see the example below)
- View Logs: Check job outputs in Slurm log directories
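For the REST API route, jobs can be listed directly from slurmrestd using the token obtained earlier (a sketch; the version segment in the path, v0.0.40 here, depends on your Slurm release):
```bash
# List jobs via the Slurm REST API; SLURM_TOKEN holds the value from `scontrol token`
curl -H "X-SLURM-USER-NAME: slurm" \
     -H "X-SLURM-USER-TOKEN: $SLURM_TOKEN" \
     http://<controller_node_public_ip>:6820/slurm/v0.0.40/jobs
```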
The services can be configured using environment variables:
- RABBITMQ_URL: RabbitMQ connection string (default: amqp://localhost:5672)
- SLURM_CONTROLLER_HOST: Slurm controller hostname
- API_PORT: API server port (default: 8000)
- GITHUB_APP_ID: Your GitHub App ID
- GITHUB_APP_INSTALLATION_ID: The installation ID noted during app installation
- GITHUB_APP_PRIVATE_KEY: Path to your GitHub App's private key file
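For example, they can be exported in the shell before starting the services (all values below are placeholders):
```bash
# Example environment for the webhook services; adjust for your deployment
export RABBITMQ_URL="amqp://localhost:5672"
export SLURM_CONTROLLER_HOST="<controller_node_public_ip>"
export API_PORT="8000"
export GITHUB_APP_ID="123456"
export GITHUB_APP_INSTALLATION_ID="12345678"
export GITHUB_APP_PRIVATE_KEY="/path/to/github-app.private-key.pem"
```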
The Slurm cluster is configured via Ansible playbooks. Key configuration files:
- infrastructure/ansible/files/slurm/slurm.conf: Main Slurm configuration
- infrastructure/ansible/files/slurm/slurmdbd.conf: Slurm database daemon configuration
- Database Connection Errors: Ensure MySQL is running and slurmdbd can connect
- Authentication Failures: Check Munge key synchronization between nodes
- Job Submission Failures: Verify Slurm controller and compute nodes are communicating
```bash
# Check Slurm cluster status
sinfo

# View job queue
squeue

# Check job history
sacct

# Test Munge authentication
munge -n | unmunge

# Check service status
systemctl status slurmctld
systemctl status slurmd
systemctl status slurmdbd
```
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
For issues and questions:
- Check the troubleshooting section above
- Review Slurm documentation
- Create an issue in the GitHub repository