
Fix stale nginx domain, add upstream health checks, rate limiting, and WebSocket timeouts #117

Draft
Copilot wants to merge 2 commits into main from copilot/fix-nginx-domain-upstream-checks

Copilot AI commented Mar 4, 2026

The testnet nginx config referenced the stale testnetrpc.numbersprotocol.io domain (instead of testnetrpc.num.network, which is used everywhere else), lacked upstream failure handling, and had no rate limiting or request body caps. Idle WebSocket connections were dropped after nginx's default 60s proxy_read_timeout, and the upstream had no visibility into client IPs. No mainnet config existed at all.

Testnet config (rpc/testnet/etc/nginx/sites-available/default)

  • Domain: Replace all testnetrpc.numbersprotocol.io → testnetrpc.num.network (server_name, SSL cert paths, HTTP redirect block)
  • Upstream: Add max_fails=3 fail_timeout=30s per server; add keepalive 16
  • Rate limiting: Add limit_req_zone (30r/s per client IP, 10 MB shared zone rpc_limit) at top level; add limit_req burst=50 nodelay in location /
  • location /: Add client_max_body_size 1m, proxy_connect_timeout 10s, proxy_read_timeout 60s, proxy_send_timeout 60s
  • location /ws: Add proxy_read_timeout 3600s, proxy_send_timeout 3600s, proxy_buffering off, X-Real-IP, X-Forwarded-For
upstream validator {
    server <ip>:9650 max_fails=3 fail_timeout=30s;
    keepalive 16;
}

limit_req_zone $binary_remote_addr zone=rpc_limit:10m rate=30r/s;

location / {
    limit_req zone=rpc_limit burst=50 nodelay;
    client_max_body_size 1m;
    proxy_connect_timeout 10s;
    proxy_read_timeout 60s;
    proxy_send_timeout 60s;
}

location /ws {
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
    proxy_buffering off;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
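
To show where each fragment above lives, here is a rough sketch of how they might fit together in one file. It assumes sites-available/default is included inside the http { } block of nginx.conf (the usual Debian/Ubuntu layout), which is why limit_req_zone and upstream can sit at the top level of the file. The limit_req_status line and the two keepalive-related proxy directives are not listed in this PR; they are added here as assumptions. The nginx upstream module documentation states that HTTP keepalive to an upstream needs proxy_http_version 1.1 and a cleared Connection header.

```nginx
# Sketch only. <ip> and the elided proxy_pass path are copied from the PR
# text as-is and are not filled in.

# http context (top level of the included file)
limit_req_zone $binary_remote_addr zone=rpc_limit:10m rate=30r/s;

upstream validator {
    # After 3 failed attempts within 30s the server is skipped for 30s
    server <ip>:9650 max_fails=3 fail_timeout=30s;
    keepalive 16;   # idle upstream connections kept open per worker process
}

server {
    # listen / server_name / ssl_* directives omitted

    location / {
        limit_req zone=rpc_limit burst=50 nodelay;
        limit_req_status 429;            # assumption: not in the PR; default is 503

        client_max_body_size 1m;
        proxy_connect_timeout 10s;
        proxy_read_timeout 60s;
        proxy_send_timeout 60s;

        # Assumption: needed for "keepalive 16" to take effect on HTTP upstreams
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        proxy_pass http://validator/...;  # path elided in the source
    }
}
```

With burst=50 nodelay, up to 50 requests above the 30r/s rate are forwarded immediately, and anything beyond that is rejected with the limit_req_status code rather than queued.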

Mainnet config (rpc/mainnet/etc/nginx/sites-available/default) — new file

Mirrors the testnet config structure for mainnetrpc.num.network with all the same protections. Uses <internal-ip-val-m1..5> and <mainnet-chain-id> placeholders to be substituted at deploy time.
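
A possible outline of the new mainnet file, for orientation only. Everything in <...> is a deploy-time placeholder named in this PR. The expansion of <internal-ip-val-m1..5> into five server lines, port 9650, the certificate paths, and the shape of the redirect block are assumptions carried over from the testnet layout, not copied from the actual file.

```nginx
# Hypothetical outline, not the committed file.

upstream validator {
    server <internal-ip-val-m1>:9650 max_fails=3 fail_timeout=30s;
    server <internal-ip-val-m2>:9650 max_fails=3 fail_timeout=30s;
    server <internal-ip-val-m3>:9650 max_fails=3 fail_timeout=30s;
    server <internal-ip-val-m4>:9650 max_fails=3 fail_timeout=30s;
    server <internal-ip-val-m5>:9650 max_fails=3 fail_timeout=30s;
    keepalive 16;
}

# Plain HTTP only redirects to HTTPS (assumed form)
server {
    listen 80;
    server_name mainnetrpc.num.network;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name mainnetrpc.num.network;

    # Assumed paths, following the testnet letsencrypt pattern
    ssl_certificate     /etc/letsencrypt/live/mainnetrpc.num.network/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/mainnetrpc.num.network/privkey.pem;

    # location / and location /ws carry the same limits and timeouts as testnet.
    # <mainnet-chain-id> is substituted in the proxy_pass path, which the
    # source does not show.
}
```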

Original prompt

This section details the original issue you should resolve

<issue_title>[Feature][High] Fix stale nginx domain, add upstream health checks, rate limiting, and WebSocket timeouts</issue_title>
<issue_description>## Feature/Improvement Findings — High Priority

Two high-priority infrastructure improvements were identified that are not covered by existing issues (#96, #97, #103, #104, #108).


1. Nginx RPC Config Uses Stale Domain (numbersprotocol.io vs num.network)

File: rpc/testnet/etc/nginx/sites-available/default (lines 123, 163–164, 170, 177)

The nginx config references testnetrpc.numbersprotocol.io while the README and test files all reference testnetrpc.num.network. SSL certificates point to the old domain. Deploying from this config will serve on the wrong domain. Additionally, there is no mainnet nginx config in the repository at all.

ssl_certificate /etc/letsencrypt/live/testnetrpc.numbersprotocol.io/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/testnetrpc.numbersprotocol.io/privkey.pem;

Suggested fix:

  • Update domain to testnetrpc.num.network throughout
  • Add mainnet nginx config (mainnetrpc.num.network)
  • Consider templating configs to avoid domain hardcoding
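
One way the templating suggestion could look, as a minimal sketch: a single template rendered per network with envsubst. The file name default.template and the variable name RPC_DOMAIN are hypothetical and do not exist in the repository.

```nginx
# default.template (hypothetical file name)
# Render with, for example:
#   RPC_DOMAIN=testnetrpc.num.network \
#     envsubst '${RPC_DOMAIN}' < default.template > default

server {
    listen 443 ssl;
    server_name ${RPC_DOMAIN};

    ssl_certificate     /etc/letsencrypt/live/${RPC_DOMAIN}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/${RPC_DOMAIN}/privkey.pem;

    location / {
        # $host is an nginx runtime variable and must survive rendering
        proxy_set_header Host $host;
    }
}
```

Passing the explicit '${RPC_DOMAIN}' list to envsubst restricts substitution to that one variable, so nginx variables such as $host and $remote_addr are left untouched in the rendered file.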

2. Nginx Upstream Lacks Health Checks, Timeouts, Rate Limiting, and WebSocket Configuration

File: rpc/testnet/etc/nginx/sites-available/default (lines 19–25, 134–140)

The upstream block defines five backend validators with critical missing configurations:

Missing upstream protections (lines 19–25):

  • No max_fails or fail_timeout — dead validators stay in the pool
  • No proxy_read_timeout / proxy_connect_timeout — hung connections exhaust workers
  • No limit_req / limit_conn — public RPC endpoint vulnerable to DoS
  • No client_max_body_size — no request body limits
  • No keepalive connections to upstream — extra TCP overhead per request
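
The list above mentions limit_conn next to limit_req. A rough sketch of what a per-IP connection cap could look like follows; the zone name rpc_conn and the limit of 20 are hypothetical values chosen for illustration, not taken from this issue.

```nginx
# http context: shared zone keyed by client address
limit_conn_zone $binary_remote_addr zone=rpc_conn:10m;

server {
    location /ws {
        # At most 20 concurrent connections per client IP (hypothetical cap)
        limit_conn rpc_conn 20;
        limit_conn_status 429;   # default is 503
    }
}
```

limit_req bounds the request rate while limit_conn bounds concurrency, so a connection cap is the more natural fit for long-lived WebSocket subscriptions.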

Missing WebSocket configuration (lines 134–140):

location /ws {
    proxy_pass http://validator/...;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
}
  • No proxy_read_timeout — defaults to 60s, disconnecting idle WebSocket subscriptions
  • No proxy_buffering off — WebSocket messages should not be buffered
  • No X-Real-IP / X-Forwarded-For headers — upstream has no visibility into real client IP
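
For reference, the nginx WebSocket proxying documentation pairs the Upgrade headers with a map so that non-upgrade requests do not get a hard-coded Connection "Upgrade" header. A minimal sketch, assuming the same validator upstream and leaving the elided proxy_pass path as-is:

```nginx
# http context: derive the Connection header from the client's Upgrade header
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    location /ws {
        proxy_pass http://validator/...;   # path elided in the source
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;

        # Timeout applies between two successive reads, not to the whole session
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}
```

Because proxy_read_timeout measures the gap between successive reads from the upstream, periodic WebSocket ping frames from the backend would also keep a quiet subscription alive, independent of the timeout value.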

Suggested fix:

upstream validator {
    server <ip1>:9650 max_fails=3 fail_timeout=30s;
    # ... other servers
    keepalive 16;
}

# Rate limiting
limit_req_zone $binary_remote_addr zone=rpc_limit:10m rate=30r/s;

location / {
    limit_req zone=rpc_limit burst=50 nodelay;
    client_max_body_size 64k;
    proxy_connect_timeout 10s;
    proxy_read_timeout 30s;
    proxy_send_timeout 30s;
    proxy_pass http://validator/...;
}

location /ws {
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
    proxy_buffering off;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    # ... existing WebSocket headers
}

Generated by Health Monitor with Omni</issue_description>

Comments on the Issue (you are @copilot in this section)



…d WebSocket timeouts

Co-authored-by: numbers-official <181934381+numbers-official@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix stale nginx domain and add health checks and rate limiting" to "Fix stale nginx domain, add upstream health checks, rate limiting, and WebSocket timeouts" on Mar 4, 2026

Development

Successfully merging this pull request may close these issues.

[Feature][High] Fix stale nginx domain, add upstream health checks, rate limiting, and WebSocket timeouts
