Skip to content

some performance optimizations for tunnel-node#1414

Open
brightening-eyes wants to merge 1 commit into
therealaleph:mainfrom
brightening-eyes:opt/reader-notify-atomic-drain
Open

some performance optimizations for tunnel-node#1414
brightening-eyes wants to merge 1 commit into
therealaleph:mainfrom
brightening-eyes:opt/reader-notify-atomic-drain

Conversation

@brightening-eyes
Copy link
Copy Markdown
Contributor

perf(tunnel-node): event-driven backpressure, atomic settle, async udpgw DNS

High priority:

  • Replace reader_task busy-poll (50ms) with drain_notify Notify — reader now wakes only when bytes are consumed, cutting CPU waste under sustained backpressure (fast upstream / slow Apps Script).
  • Add buf_len / pkt_count atomics so the straggler settle loop reads buffer sizes lock-free (was 10k+ mutex acquisitions per batch).
  • Make DstAddr::to_socket_addr async, use tokio::net::lookup_host instead of blocking std::net::ToSocketAddrs inside async tasks.

Medium priority:

  • Collect stale sessions under global lock but abort them after releasing it — other handlers aren't blocked during task abort.
  • Cap udpgw accumulation buffer at 256 KiB; clear on overflow to prevent OOM from repeated corrupt frames.

Low priority:

  • Add sid_short(&str) helper, deduplicate nginx-404 DECOY_404_BODY const, remove redundant drop(packets), suppress dead_code on test-only pub consts.

…pgw DNS

High priority:
- therealaleph#1 Replace reader_task busy-poll (50ms) with drain_notify Notify —
  reader now wakes only when bytes are consumed, cutting CPU waste
  under sustained backpressure (fast upstream / slow Apps Script).
- therealaleph#4 Add buf_len / pkt_count atomics so the straggler settle loop
  reads buffer sizes lock-free (was 10k+ mutex acquisitions per batch).
- therealaleph#7 Make DstAddr::to_socket_addr async, use tokio::net::lookup_host
  instead of blocking std::net::ToSocketAddrs inside async tasks.

Medium priority:
- therealaleph#5 Collect stale sessions under global lock but abort them after
  releasing it — other handlers aren't blocked during task abort.
- therealaleph#8 Cap udpgw accumulation buffer at 256 KiB; clear on overflow to
  prevent OOM from repeated corrupt frames.

Low priority:
- Add sid_short(&str) helper, deduplicate nginx-404 DECOY_404_BODY
  const, remove redundant drop(packets), suppress dead_code on
  test-only pub consts.
@CaptainMirage
Copy link
Copy Markdown
Contributor

oh hey would you look at that another optimization edit

@brightening-eyes
Copy link
Copy Markdown
Contributor Author

@CaptainMirage which one do you mean?

@CaptainMirage
Copy link
Copy Markdown
Contributor

@CaptainMirage which one do you mean?

yours lol, to rephrase :
"oh hey would you look at that, another optimization pull request just like mine"

tho if u go see i did check, we have no conflicts, our PRs are complementary

@therealaleph
Copy link
Copy Markdown
Owner

Thanks for this. I pulled the PR head via tarball locally because git fetch is misbehaving on my side today, then ran the tunnel-node gate:

cd tunnel-node && cargo test
38 passed, 0 failed

I am not merging yet because I found one race in the new atomic buffer-length accounting.

In reader_task, this sequence updates the real buffer and then updates buf_len after the temporary mutex guard has already been dropped:

session.read_buf.lock().await.extend_from_slice(&buf[..n]);
session.buf_len.fetch_add(n, Ordering::Release);

Because the lock guard is not bound to a variable, it is dropped at the end of the first statement. A concurrent drain_now() can acquire the lock, drain the buffer, and store(0) before the reader's fetch_add(n) runs. That leaves buf_len greater than the actual buffer length until a later drain corrects it. Since the settle loop now trusts this atomic estimate, that can create false “bytes still buffered/growing” signals and hurt exactly the latency path this PR is trying to improve.

Please update this so the atomic is written while the read_buf lock is still held, ideally by storing the actual guard.len() after the append rather than fetch_adding from a potentially stale value. The UDP path already mostly does this correctly because the packets guard is held while pkt_count is stored.

After that change I can re-run cd tunnel-node && cargo test and look again.


Answered via LLM, Supervised @therealaleph

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants