virtio-net: fix UDP TX stall and improve TCP throughput#102
Open
agicy wants to merge 11 commits into
Open
Conversation
…leak
Replace static char pad[64] with stack-local char pad[64] = {0}.
A static variable is shared across all net devices; when per-vq
threads are introduced, concurrent TX would race on the shared
buffer. Zero-initialization prevents leaking previous stack
contents into the TAP device on the short-packet padding path.
…thmetic A malicious or buggy guest may provide iov[0].iov_len < header_len. Without this check, iov_len underflows (unsigned wrap to a huge value) and writev reads beyond the guest buffer — leaking host memory. Add a bounds check that returns the descriptor via update_used_ring with len=0 when the header doesn't fit, matching the existing pattern in rm_iov_header on the RX path.
When rm_iov_header fails (iov[0].iov_len < header_len), the descriptor has already been consumed from the avail ring by process_descriptor_chain but was never posted to the used ring. This leaks the descriptor permanently from the guest's perspective. Fix: call update_used_ring(vq, idx, 0) to return the descriptor before freeing the iov.
When readv returns a non-EAGAIN error (EIO, EFAULT, etc.), the descriptor was consumed from the avail ring but never returned. The code fell through to the success path which would call update_used_ring with a bogus length. Fix: return the descriptor via update_used_ring(vq, idx, 0) on any readv error that is not EWOULDBLOCK.
When the TAP device is closed or the bridge goes down, readv returns 0 (EOF). Without handling this case, the code falls through to the success path and calls update_used_ring with header_len bytes, plus EPOLLIN fires continuously on an EOF fd, creating a tight loop that drains guest RX buffers. Fix: detect len==0, return the descriptor via update_used_ring(vq,idx,0), set rx_ready=0 to stop further RX attempts, and break out.
The file-scope static uint8_t trashbuf[1600] is shared across all net device instances. When per-vq threads are introduced, concurrent RX drops from multiple devices would race on this single buffer. Move the buffer to a stack-local variable inside virtio_net_event_handler.
UDP streams lose descriptors because the guest may add new TX descriptors between the last empty check and enable_notify, and the suppressed notification never re-fires. Wrap the drain+enable in a loop with a post-enable re-check.
…criptor traversal Single-pass traversal that counts and fills in one loop. INDIRECT descriptors are handled inline: the indirect table size is known from vdesc->len/16, so no separate counting pass is needed. Single pass halves the cross-VM desc_table accesses for the common non-INDIRECT path (net), saving ~150-400 cycles per packet on A55. The original process_descriptor_chain() (malloc version) is unchanged.
…ket malloc/free Replace process_descriptor_chain() + free(iov) in rx_event_handler and virtq_tx_handle_one_request with process_descriptor_chain_buf() using stack-allocated iov arrays (NET_IOV_MAX=8 entries). Eliminates ~80-230 cycles of malloc/free overhead per packet on A55.
update_used_ring() emits two write_barrier() (DMB ishst) calls for every single used-ring entry. On in-order cores like A55 each barrier stalls the store buffer for ~30-50 cycles, so N entries cost 2N barriers. The new update_used_ring_batch() writes all N entries in a tight loop and emits only two barriers total — one before the idx update and one after — regardless of N. This reduces the per-packet barrier cost from 2N to 2. Original update_used_ring() is kept for callers that process one request at a time (blk, console, gpu).
Switch to batch used-ring updates and inject an IRQ after every batch flush so the guest can recycle completed TX descriptors. Without the IRQ, UDP sendto() blocks permanently after filling the virtqueue.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two virtio-net TX correctness bugs fixed:
virtqueue_enable_notify: the guest may add TX descriptors during theVRING_USED_F_NO_NOTIFYwindow. Without a re-check those descriptors are lost.sendto().Also included are per-packet CPU optimizations (
process_descriptor_chain_bufreplaces per-packetmalloc/free,update_used_ring_batchamortises DMB overhead) and six standalone bug fixes.Test results
RK3588, zone1 (
192.168.255.3/24) <-> root-linux (192.168.255.2/24), iperf3 60s:-RRXUDP TX and TCP throughput is improved.
References
Related issue #101.
Partially-fixes hvisor #240.