Skip to content

fix: multipart parsing for HTTP/2 support in ClamAV plugin#147

Open
rayshoo wants to merge 1 commit intobunkerity:mainfrom
rayshoo:main
Open

fix: multipart parsing for HTTP/2 support in ClamAV plugin#147
rayshoo wants to merge 1 commit intobunkerity:mainfrom
rayshoo:main

Conversation

@rayshoo
Copy link
Copy Markdown

@rayshoo rayshoo commented Jun 23, 2025

Description

This PR improves the ClamAV plugin's multipart form-data parsing to properly support HTTP/2 requests. The original implementation had issues parsing multipart data in HTTP/2 environments, causing file uploads to bypass ClamAV scanning.

Changes Made

  • Enhanced boundary extraction: Improved parsing of Content-Type header to handle quoted and unquoted boundary values
  • Fixed multipart parsing logic: Completely rewrote the multipart parsing function to properly handle HTTP/2 multipart data structure
  • Better section splitting: Used exact string matching instead of regex patterns for more reliable boundary detection
  • Improved header/data separation: Enhanced logic to separate headers from file data in each multipart section
  • Robust filename extraction: Better parsing of Content-Disposition headers to extract filenames accurately

Technical Details

I modified the code to support HTTP/2 because the original multipart parsing wasn't working properly. Since I'm not very familiar with Lua code, I got help from ChatGPT and Claude AI to implement these improvements. The updated code has been thoroughly tested and works excellently.

Testing

  • ✅ Tested with HTTP/2 multipart file uploads
  • ✅ Successfully detects and scans files with ClamAV
  • ✅ Properly blocks malware (tested with EICAR test file)
  • ✅ Allows clean files to pass through
  • ✅ Maintains backward compatibility with HTTP/1.1

Impact

This fix ensures that file uploads via HTTP/2 are properly scanned by ClamAV, closing a potential security gap where malicious files could bypass antivirus scanning in HTTP/2 environments.

@TheophileDiot
Copy link
Copy Markdown
Member

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

ClamAV plugin

Security impact: Fixes HTTP/2 multipart file upload handling to prevent scans being bypassed. Previous implementation used streaming multipart parsing incompatible with HTTP/2 request structure; redesigned to manually parse multipart sections and scan each file part via separate TCP connections using the INSTREAM protocol with 4-byte big-endian length framing.

Plugin version: Bumped to 1.9 in plugin.json.

Version drift: ClamAV 1.9 is NOT recorded in COMPATIBILITY.json (which only contains entries through 1.8). This version mismatch should be resolved.

Code changes:

  • Complete rewrite of multipart parsing logic (parse_multipart()) using exact string boundary matching instead of regex, with explicit header/body separation handling both \r\n\r\n and \n\n delimiters.
  • New helper functions: socket() (TCP connection with configured timeout), ping() (PING/PONG verification), extract_boundary() (quoted and unquoted boundary parsing), extract_filename() (Content-Disposition parsing).
  • Removed worker initialization, caching, and checksum logic; dropped init_worker(), command(), scan(), is_in_cache(), add_to_cache().
  • access() redesigned: gated on USE_CLAMAV=yes, requires POST + multipart/form-data, reads full request body into memory or from temp file, scans each parsed file part, denies on ClamAV FOUND response.
  • api() narrowed to POST /clamav/ping endpoint only, using ping() helper.

Settings: No schema changes; configuration remains: USE_CLAMAV (multisite), CLAMAV_HOST, CLAMAV_PORT, CLAMAV_TIMEOUT (all global).

Testing: Functional test suite (.tests/clamav.sh) with Docker Compose setup validates HTTP/2 multipart uploads, confirms EICAR test file detection and blocking, verifies clean file pass-through. Backward compatibility with HTTP/1.1 asserted.

Documentation: README.md provides setup instructions for Docker, Swarm, and Kubernetes deployments with environment variable examples. TODO items noted (clustered mode testing, custom ClamAV configuration, Linux integration documentation).

Note: Full implementation assisted by ChatGPT and Claude AI per author disclosure.

Walkthrough

The scanning mechanism has been redesigned to replace streaming multipart upload handling with a buffered approach: extract the multipart boundary from the Content-Type header, read the entire request body, parse multipart sections via string search, and scan each extracted part by establishing individual TCP connections to ClamAV using the nINSTREAM protocol.

Changes

Cohort / File(s) Summary
Removed Legacy Functions
clamav/clamav.lua
Deleted init_worker(), command(), the original scan() implementation, is_in_cache(), and add_to_cache(). Eliminated worker initialization, caching infrastructure, and streaming-based scanning logic.
Protocol & Socket Helpers
clamav/clamav.lua
Added socket() to establish TCP connections and ping() to verify ClamAV connectivity using simple request/response exchange.
Multipart Parsing Functions
clamav/clamav.lua
Added extract_boundary() to isolate multipart boundary from Content-Type, extract_filename() to retrieve filenames from part headers, and parse_multipart() to manually segment the request body using boundary markers and extract individual file parts.
Entry Point Modifications
clamav/clamav.lua
Rewrote access() to gate on enabled flag, enforce POST method, require multipart/form-data, parse parts, scan each via TCP, and deny requests when ClamAV returns FOUND. Narrowed api() to expose only /clamav/ping endpoint.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

Streams once flowed through the scanning gate,
Now buffered parts await their fate;
TCP sockets sing their song,
Boundaries parsed swift and strong,
ClamAV verdicts flow back bright—
Malware caught in the night. 🛡️

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main change: fixing multipart parsing for HTTP/2 support in the ClamAV plugin, which is the core objective of this PR.
Description check ✅ Passed The description comprehensively explains the multipart parsing improvements, technical changes, testing performed, and security impact, all directly related to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@TheophileDiot
Copy link
Copy Markdown
Member

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@TheophileDiot
Copy link
Copy Markdown
Member

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clamav/clamav.lua`:
- Around line 152-171: The extract_filename function (clamav:extract_filename)
should sanitize the extracted filename before returning it for logging:
strip/control characters (e.g., remove characters with byte < 32 and CR/LF),
trim surrounding whitespace, enforce a maximum length (e.g., 255 chars) and, if
resulting name is empty, return nil or a safe placeholder; also ensure the
method does not trigger an unused-self warning by explicitly acknowledging self
(either use function clamav.extract_filename(self, headers) and reference local
_ = self or change the signature to (_, headers)). Update the function to
perform these sanitisation steps and return the cleaned filename for use in
logs.
- Around line 173-186: In clamav:api replace the unused variable from the
has_variable call by either discarding it or using it: change the call site
where you currently do "local enabled, err = has_variable(...)" to "local
enabled, _ = has_variable(...)" to avoid the unused-variable warning, or
alternatively capture and log the error (e.g. use process or self logger to log
err) before returning; target the has_variable call inside the clamav:api
function to implement the change.
- Line 235: The loop uses an unused index variable `i` in the `for i, part in
ipairs(parts) do` iteration, which triggers static-analysis warnings; replace it
with the conventional `_` placeholder (i.e., `for _, part in ipairs(parts) do`)
inside the same scope (in the file clamav/clamav.lua where that loop appears)
and confirm there are no references to `i` within the loop body—if any exist,
either restore a meaningful name or use the actual name where needed instead of
`_`.
- Around line 23-31: In clamav:socket(), the CLAMAV_TIMEOUT fallback (used in
sock:settimeout) is 5000ms but plugin.json default is 1000ms; change the call to
apply tonumber only to self.variables["CLAMAV_TIMEOUT"] so the fallback of 1000
is used (e.g., use tonumber(self.variables["CLAMAV_TIMEOUT"]) or 1000) and pass
that to sock:settimeout to align the default timeout; update the CLAMAV_TIMEOUT
reference accordingly in clamav:socket().
- Around line 275-277: The log at self.logger:log(ERR, "[clamav] ClamAV result
for " .. part.filename .. ": " .. tostring(result)) is using the ERR level for
normal/clean scan results; change it to use the NOTICE level (which is declared
but unused) or ngx.INFO for successful scans, and reserve ERR only for malware
detections or explicit errors (i.e., update the logging call in the ClamAV
result handling to use NOTICE/ngx.INFO for clean results and log at ERR only
when result indicates malware or an error).
- Around line 116-147: The loop declares an unused variable `i` in `for i,
section in ipairs(sections) do`; change it to use the conventional
unused-variable name by replacing `i` with `_` (i.e., `for _, section in
ipairs(sections) do`) so static analysis no longer flags the unused binding,
leaving the rest of the loop body (checks, header parsing, extract_filename
call, and table.insert into `parts`) unchanged.
- Around line 267-270: Check the return value of sock:send when sending the
stream termination marker in the function that calls
sock:send(string.char(0,0,0,0)); if send returns nil (failure, err) handle it by
closing the socket (sock:close()), propagate or return an error instead of
proceeding to sock:receive, and only call sock:receive("*l") when send
succeeded; update any callers that rely on the local result variable to handle
the error path accordingly.
- Around line 48-67: The extract_boundary function has an unreachable fallback
pattern and triggers an unused `self` linter warning; replace the two-step match
with a single pattern that handles both quoted and unquoted values (e.g. match
'boundary=%s*"?([^";]+)"?' to capture the boundary without quotes) and then
return "--" .. boundary when found, and suppress the unused `self` warning by
either renaming the first parameter to _ (function clamav:extract_boundary(_)
...) or adding a luacheck directive (--luacheck: ignore 212/self) at the top of
the function.
- Around line 69-113: In function parse_multipart: remove the redundant initial
assignment current_pos = 1 (it's immediately overwritten after finding
first_boundary_pos) and introduce a named constant for the magic number used to
filter tiny sections (replace the literal 10 with e.g. local MIN_SECTION_LEN =
10 declared near the top of the function and use MIN_SECTION_LEN in the two
checks), optionally add a short comment documenting why that minimum length
exists; reference parse_multipart, current_pos, first_boundary_pos, and the two
section length checks.
- Around line 10-12: Remove the unused local constants NOTICE and HTTP_FORBIDDEN
from the top of clamav.lua, or if they were intended to be used, replace the
current ngx.ERR logging used for scan result messages (the scan-result logging
site currently using ERR) with ngx.log(NOTICE, ...) and replace any hard-coded
status usages with the HTTP_FORBIDDEN constant where appropriate; ensure only
the symbols NOTICE and HTTP_FORBIDDEN that are actually referenced elsewhere
(e.g., scan-result logging and forbidden responses) remain declared, otherwise
delete their declarations.
- Around line 212-229: The code buffers the entire request body into memory (the
block that reads ngx.req.get_body_data() and falls back to
ngx.req.get_body_file()) without size validation; add a configurable
CLAMAV_MAX_BODY_SIZE (default 50 * 1024 * 1024) and after the body is populated
but before calling parse_multipart() check the byte length (use `#body` or
equivalent) and return an error via self:ret(true, "Request body too large") and
log via self.logger:log(ERR, ...) when the body exceeds the limit; ensure the
new config key is read where the plugin config is loaded and referenced in this
function so deployments can override the default.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d9160d39-9a31-486f-a2d7-402d5d82030a

📥 Commits

Reviewing files that changed from the base of the PR and between 63de9b9 and f2ef5e4.

📒 Files selected for processing (1)
  • clamav/clamav.lua
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.lua

⚙️ CodeRabbit configuration file

**/*.lua: Lua code runs on OpenResty inside the BunkerWeb nginx container and sits on the request hot path:

  • Every plugin subclasses bunkerweb.plugin via middleclass: local <name> = class("<name>", plugin) and plugin.initialize(self, "<id>", ctx) in initialize. Hook methods (init_worker, access, log, preread, ...) must return self:ret(ok_bool, msg, [http_status]). To deny a request, return self:ret(true, "reason", utils.get_deny_status()).
  • Gate expensive work at init_worker with utils.has_variable("USE_<PLUGIN>", "yes") and skip when self.is_loading is true, matching the pattern used across clamav.lua, coraza.lua, and virustotal.lua.
  • Use local variables and local module tables; avoid globals. Cache ngx.var.* and ngx.req.* values in locals instead of re-reading them repeatedly.
  • Precompile regular expressions in module-level locals; never compile inside request loops. For ngx.re.match/find/gmatch/sub, pass the option string "jo" (j enables PCRE JIT, o compiles the pattern once and caches it), anchor patterns with ^...$ when a full match is intended, and cap input length before matching to prevent ReDoS.
  • Validate and sanitise all request-derived input. Never evaluate request-derived code via load, loadstring, or similar mechanisms.
  • Use ngx.socket for raw TCP (see the ClamAV INSTREAM pattern in clamav.lua) and resty.http for HTTP upstreams (see virustotal.lua and coraza.lua). Prefer resty.upload for streaming request bodies — clamav.lua is the reference.
  • Cache upstream scan results by a strong digest of the body (SHA-512 in virustotal.lua) rather than by filename or weaker hashes; mind cache key length.
  • Shared-dictionary read-modify-write sequences are race-prone; prefer atomic operations such as incr or explicit locking where correctness matters.
  • Never log request bodies, cookies, bearer tokens, webhook secrets, or API keys.
  • Use pcall or explicit error...

Files:

  • clamav/clamav.lua
🪛 Luacheck (1.2.0)
clamav/clamav.lua

[warning] 10-10: unused variable 'NOTICE'

(W211)


[warning] 12-12: unused variable 'HTTP_FORBIDDEN'

(W211)


[warning] 49-49: unused argument 'self'

(W212)


[warning] 79-79: value assigned to variable 'current_pos' is unused

(W311)


[warning] 116-116: unused loop variable 'i'

(W213)


[warning] 153-153: unused argument 'self'

(W212)


[warning] 175-175: unused variable 'err'

(W211)


[warning] 235-235: unused loop variable 'i'

(W213)

🔇 Additional comments (2)
clamav/clamav.lua (2)

33-46: LGTM!

The ping() function correctly handles socket lifecycle and error conditions.


279-286: Clarify deny status usage and verify return semantics.

Line 281 calls get_deny_status() to return the configured deny HTTP status (typically 403). The self:ret(true, ...) with a status code and extra data looks correct per the plugin pattern. However, the fourth parameter is nil explicitly—confirm this aligns with the bunkerweb.plugin contract.

Comment on lines +10 to +12
local NOTICE = ngx.NOTICE
local HTTP_INTERNAL_SERVER_ERROR = ngx.HTTP_INTERNAL_SERVER_ERROR
local HTTP_FORBIDDEN = ngx.HTTP_FORBIDDEN
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Remove unused variables NOTICE and HTTP_FORBIDDEN.

Static analysis correctly flags these as unused. If NOTICE was intended for logging scan results (see line 276), it should be used there instead of ERR.

🧹 Proposed fix
-local NOTICE = ngx.NOTICE
 local HTTP_INTERNAL_SERVER_ERROR = ngx.HTTP_INTERNAL_SERVER_ERROR
-local HTTP_FORBIDDEN = ngx.HTTP_FORBIDDEN
 local HTTP_OK = ngx.HTTP_OK
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
local NOTICE = ngx.NOTICE
local HTTP_INTERNAL_SERVER_ERROR = ngx.HTTP_INTERNAL_SERVER_ERROR
local HTTP_FORBIDDEN = ngx.HTTP_FORBIDDEN
local HTTP_INTERNAL_SERVER_ERROR = ngx.HTTP_INTERNAL_SERVER_ERROR
local HTTP_OK = ngx.HTTP_OK
🧰 Tools
🪛 Luacheck (1.2.0)

[warning] 10-10: unused variable 'NOTICE'

(W211)


[warning] 12-12: unused variable 'HTTP_FORBIDDEN'

(W211)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clamav/clamav.lua` around lines 10 - 12, Remove the unused local constants
NOTICE and HTTP_FORBIDDEN from the top of clamav.lua, or if they were intended
to be used, replace the current ngx.ERR logging used for scan result messages
(the scan-result logging site currently using ERR) with ngx.log(NOTICE, ...) and
replace any hard-coded status usages with the HTTP_FORBIDDEN constant where
appropriate; ensure only the symbols NOTICE and HTTP_FORBIDDEN that are actually
referenced elsewhere (e.g., scan-result logging and forbidden responses) remain
declared, otherwise delete their declarations.

Comment on lines +23 to 31
function clamav:socket()
local sock = ngx.socket.tcp()
sock:settimeout(tonumber(self.variables["CLAMAV_TIMEOUT"] or 5000))
local ok, err = sock:connect(self.variables["CLAMAV_HOST"], tonumber(self.variables["CLAMAV_PORT"]))
if not ok then
return nil, err
end
return sock
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Timeout fallback inconsistent with plugin.json default.

Line 25 falls back to 5000ms when CLAMAV_TIMEOUT is unset, but plugin.json specifies the default as 1000ms. Align these values.

🔧 Proposed fix
 function clamav:socket()
   local sock = ngx.socket.tcp()
-  sock:settimeout(tonumber(self.variables["CLAMAV_TIMEOUT"] or 5000))
+  sock:settimeout(tonumber(self.variables["CLAMAV_TIMEOUT"]) or 1000)
   local ok, err = sock:connect(self.variables["CLAMAV_HOST"], tonumber(self.variables["CLAMAV_PORT"]))

Note: tonumber() should wrap only the variable, not the fallback, so that tonumber(nil) returns nil and triggers the or 1000 fallback.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function clamav:socket()
local sock = ngx.socket.tcp()
sock:settimeout(tonumber(self.variables["CLAMAV_TIMEOUT"] or 5000))
local ok, err = sock:connect(self.variables["CLAMAV_HOST"], tonumber(self.variables["CLAMAV_PORT"]))
if not ok then
return nil, err
end
return sock
end
function clamav:socket()
local sock = ngx.socket.tcp()
sock:settimeout(tonumber(self.variables["CLAMAV_TIMEOUT"]) or 1000)
local ok, err = sock:connect(self.variables["CLAMAV_HOST"], tonumber(self.variables["CLAMAV_PORT"]))
if not ok then
return nil, err
end
return sock
end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clamav/clamav.lua` around lines 23 - 31, In clamav:socket(), the
CLAMAV_TIMEOUT fallback (used in sock:settimeout) is 5000ms but plugin.json
default is 1000ms; change the call to apply tonumber only to
self.variables["CLAMAV_TIMEOUT"] so the fallback of 1000 is used (e.g., use
tonumber(self.variables["CLAMAV_TIMEOUT"]) or 1000) and pass that to
sock:settimeout to align the default timeout; update the CLAMAV_TIMEOUT
reference accordingly in clamav:socket().

Comment on lines +48 to +67
-- Improved boundary extraction function with better quote handling
function clamav:extract_boundary(content_type)
if not content_type then
return nil
end

-- Handle both boundary=value and boundary="value" formats
local boundary = content_type:match('boundary=([^;%s]+)')
if not boundary then
boundary = content_type:match('boundary="([^"]+)"')
end

if boundary then
-- Remove quotes if present
boundary = boundary:gsub('^"', ''):gsub('"$', '')
return "--" .. boundary
end

return nil
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Simplify boundary extraction; suppress unused self warning.

The fallback pattern on line 57 is unreachable for standard quoted boundaries because line 55's pattern ([^;%s]+) captures "value" including quotes (whitespace/semicolon terminates, not quotes). Line 62 then strips them.

Consider a single, clearer pattern. For the static analysis warning, use the conventional _ prefix or --luacheck: ignore 212/self.

♻️ Proposed simplification
+-- luacheck: push ignore 212/self
 function clamav:extract_boundary(content_type)
   if not content_type then
     return nil
   end
-  
-  -- Handle both boundary=value and boundary="value" formats
-  local boundary = content_type:match('boundary=([^;%s]+)')
-  if not boundary then
-    boundary = content_type:match('boundary="([^"]+)"')
-  end
-  
-  if boundary then
-    -- Remove quotes if present
-    boundary = boundary:gsub('^"', ''):gsub('"$', '')
+
+  -- Capture boundary value, with or without quotes
+  local boundary = content_type:match('boundary="([^"]+)"')
+                or content_type:match('boundary=([^;%s]+)')
+  if boundary then
     return "--" .. boundary
   end
-  
   return nil
 end
+-- luacheck: pop
🧰 Tools
🪛 Luacheck (1.2.0)

[warning] 49-49: unused argument 'self'

(W212)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clamav/clamav.lua` around lines 48 - 67, The extract_boundary function has an
unreachable fallback pattern and triggers an unused `self` linter warning;
replace the two-step match with a single pattern that handles both quoted and
unquoted values (e.g. match 'boundary=%s*"?([^";]+)"?' to capture the boundary
without quotes) and then return "--" .. boundary when found, and suppress the
unused `self` warning by either renaming the first parameter to _ (function
clamav:extract_boundary(_) ...) or adding a luacheck directive (--luacheck:
ignore 212/self) at the top of the function.

Comment on lines +69 to +113
-- Enhanced multipart parsing function with proper HTTP/2 support
function clamav:parse_multipart(body, boundary)
if not body or not boundary then
return {}
end

local parts = {}

-- Split body by boundary markers
local sections = {}
local current_pos = 1

-- Find first boundary
local first_boundary_pos = body:find(boundary, 1, true)
if not first_boundary_pos then
return {}
end

current_pos = first_boundary_pos + #boundary

while true do
-- Find next boundary
local next_boundary_pos = body:find(boundary, current_pos, true)
if not next_boundary_pos then
-- Last section
local section = body:sub(current_pos)
if section and #section > 10 then -- Minimum length check
table.insert(sections, section)
end
break
end

-- Extract section from current position to next boundary
local section = body:sub(current_pos, next_boundary_pos - 1)
if section and #section > 10 then -- Minimum length check
table.insert(sections, section)
end

current_pos = next_boundary_pos + #boundary

-- Check for end boundary (ends with --)
if body:sub(current_pos, current_pos + 1) == "--" then
break
end
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Address static analysis warnings and document magic constant.

  1. Line 79: current_pos = 1 is immediately overwritten at line 87. Remove the initial assignment.
  2. Lines 95, 103: The magic number 10 (minimum section length) should be a named constant or documented.
🔧 Proposed fix
+local MIN_SECTION_LEN = 10  -- Minimum bytes for a valid multipart section (headers + CRLF)
+
 function clamav:parse_multipart(body, boundary)
   if not body or not boundary then
     return {}
   end
   
   local parts = {}
-  
-  -- Split body by boundary markers
   local sections = {}
-  local current_pos = 1
   
   -- Find first boundary
   local first_boundary_pos = body:find(boundary, 1, true)
   if not first_boundary_pos then
     return {}
   end
   
-  current_pos = first_boundary_pos + `#boundary`
+  local current_pos = first_boundary_pos + `#boundary`
   
   while true do
     local next_boundary_pos = body:find(boundary, current_pos, true)
     if not next_boundary_pos then
       local section = body:sub(current_pos)
-      if section and `#section` > 10 then
+      if section and `#section` > MIN_SECTION_LEN then
         table.insert(sections, section)
       end
       break
     end
     
     local section = body:sub(current_pos, next_boundary_pos - 1)
-    if section and `#section` > 10 then
+    if section and `#section` > MIN_SECTION_LEN then
       table.insert(sections, section)
     end
🧰 Tools
🪛 Luacheck (1.2.0)

[warning] 79-79: value assigned to variable 'current_pos' is unused

(W311)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clamav/clamav.lua` around lines 69 - 113, In function parse_multipart: remove
the redundant initial assignment current_pos = 1 (it's immediately overwritten
after finding first_boundary_pos) and introduce a named constant for the magic
number used to filter tiny sections (replace the literal 10 with e.g. local
MIN_SECTION_LEN = 10 declared near the top of the function and use
MIN_SECTION_LEN in the two checks), optionally add a short comment documenting
why that minimum length exists; reference parse_multipart, current_pos,
first_boundary_pos, and the two section length checks.

Comment on lines +116 to +147
for i, section in ipairs(sections) do
if section and #section > 0 then
-- Remove leading whitespace and newlines
section = section:gsub("^%s*\r?\n", "")

-- Separate headers and body (\r\n\r\n or \n\n)
local header_end = section:find("\r\n\r\n", 1, true)
local separator_len = 4
if not header_end then
header_end = section:find("\n\n", 1, true)
separator_len = 2
end

if header_end then
local headers = section:sub(1, header_end - 1)
local data = section:sub(header_end + separator_len)

-- Remove trailing \r\n
data = data:gsub("\r\n$", ""):gsub("\n$", "")

-- Extract filename from Content-Disposition header
local filename = self:extract_filename(headers)
if filename and #data > 0 then
table.insert(parts, {
headers = headers,
data = data,
filename = filename
})
end
end
end
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Use _ for unused loop variable.

Static analysis flags the unused i on line 116.

🧹 Proposed fix
-  for i, section in ipairs(sections) do
+  for _, section in ipairs(sections) do
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for i, section in ipairs(sections) do
if section and #section > 0 then
-- Remove leading whitespace and newlines
section = section:gsub("^%s*\r?\n", "")
-- Separate headers and body (\r\n\r\n or \n\n)
local header_end = section:find("\r\n\r\n", 1, true)
local separator_len = 4
if not header_end then
header_end = section:find("\n\n", 1, true)
separator_len = 2
end
if header_end then
local headers = section:sub(1, header_end - 1)
local data = section:sub(header_end + separator_len)
-- Remove trailing \r\n
data = data:gsub("\r\n$", ""):gsub("\n$", "")
-- Extract filename from Content-Disposition header
local filename = self:extract_filename(headers)
if filename and #data > 0 then
table.insert(parts, {
headers = headers,
data = data,
filename = filename
})
end
end
end
end
for _, section in ipairs(sections) do
if section and `#section` > 0 then
-- Remove leading whitespace and newlines
section = section:gsub("^%s*\r?\n", "")
-- Separate headers and body (\r\n\r\n or \n\n)
local header_end = section:find("\r\n\r\n", 1, true)
local separator_len = 4
if not header_end then
header_end = section:find("\n\n", 1, true)
separator_len = 2
end
if header_end then
local headers = section:sub(1, header_end - 1)
local data = section:sub(header_end + separator_len)
-- Remove trailing \r\n
data = data:gsub("\r\n$", ""):gsub("\n$", "")
-- Extract filename from Content-Disposition header
local filename = self:extract_filename(headers)
if filename and `#data` > 0 then
table.insert(parts, {
headers = headers,
data = data,
filename = filename
})
end
end
end
end
🧰 Tools
🪛 Luacheck (1.2.0)

[warning] 116-116: unused loop variable 'i'

(W213)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clamav/clamav.lua` around lines 116 - 147, The loop declares an unused
variable `i` in `for i, section in ipairs(sections) do`; change it to use the
conventional unused-variable name by replacing `i` with `_` (i.e., `for _,
section in ipairs(sections) do`) so static analysis no longer flags the unused
binding, leaving the rest of the loop body (checks, header parsing,
extract_filename call, and table.insert into `parts`) unchanged.

Comment on lines +173 to 186
function clamav:api()
if self.ctx.bw.uri == "/clamav/ping" and self.ctx.bw.request_method == "POST" then
local enabled, err = has_variable("USE_CLAMAV", "yes")
if not enabled then
return self:ret(true, "ClamAV plugin not enabled")
end
local ok, res = self:ping()
if not ok then
return self:ret(true, "ClamAV ping failed: " .. tostring(res), HTTP_INTERNAL_SERVER_ERROR)
end
return self:ret(true, "ClamAV ping successful", HTTP_OK)
end
return self:ret(false, "success")
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Address unused variable from has_variable call.

Line 175 captures err but never uses it. Either use _ or log the error.

🧹 Proposed fix
-    local enabled, err = has_variable("USE_CLAMAV", "yes")
+    local enabled = has_variable("USE_CLAMAV", "yes")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function clamav:api()
if self.ctx.bw.uri == "/clamav/ping" and self.ctx.bw.request_method == "POST" then
local enabled, err = has_variable("USE_CLAMAV", "yes")
if not enabled then
return self:ret(true, "ClamAV plugin not enabled")
end
local ok, res = self:ping()
if not ok then
return self:ret(true, "ClamAV ping failed: " .. tostring(res), HTTP_INTERNAL_SERVER_ERROR)
end
return self:ret(true, "ClamAV ping successful", HTTP_OK)
end
return self:ret(false, "success")
end
function clamav:api()
if self.ctx.bw.uri == "/clamav/ping" and self.ctx.bw.request_method == "POST" then
local enabled = has_variable("USE_CLAMAV", "yes")
if not enabled then
return self:ret(true, "ClamAV plugin not enabled")
end
local ok, res = self:ping()
if not ok then
return self:ret(true, "ClamAV ping failed: " .. tostring(res), HTTP_INTERNAL_SERVER_ERROR)
end
return self:ret(true, "ClamAV ping successful", HTTP_OK)
end
return self:ret(false, "success")
end
🧰 Tools
🪛 Luacheck (1.2.0)

[warning] 175-175: unused variable 'err'

(W211)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clamav/clamav.lua` around lines 173 - 186, In clamav:api replace the unused
variable from the has_variable call by either discarding it or using it: change
the call site where you currently do "local enabled, err = has_variable(...)" to
"local enabled, _ = has_variable(...)" to avoid the unused-variable warning, or
alternatively capture and log the error (e.g. use process or self logger to log
err) before returning; target the has_variable call inside the clamav:api
function to implement the change.

Comment on lines +212 to +229
-- Read request body
ngx.req.read_body()
local body = ngx.req.get_body_data()
if not body then
local file_path = ngx.req.get_body_file()
if file_path then
local f = io.open(file_path, "rb")
if f then
body = f:read("*a")
f:close()
end
end
end

function clamav:add_to_cache(checksum, value)
local ok, err = self.cachestore:set("plugin_clamav_" .. checksum, value, 86400)
if not ok then
return false, err
end
return true
end
if not body then
self.logger:log(ERR, "[clamav] Failed to read request body")
return self:ret(true, "Empty request body")
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if nginx client_max_body_size or similar limits are configured in the test setup
rg -n 'client_max_body_size|CLAMAV.*SIZE|MAX.*BODY' --type-add 'nginx:*.conf' --type=nginx --type=lua --type=json

Repository: bunkerity/bunkerweb-plugins

Length of output: 53


🏁 Script executed:

# Read the full access() function context around the flagged lines
cd clamav || exit 1
wc -l clamav.lua

Repository: bunkerity/bunkerweb-plugins

Length of output: 85


🏁 Script executed:

# View the actual implementation at lines 212-229 with surrounding context
cd clamav || exit 1
sed -n '200,240p' clamav.lua | cat -n

Repository: bunkerity/bunkerweb-plugins

Length of output: 1677


🏁 Script executed:

# Search for any size limit configurations or environment variables
cd clamav || exit 1
rg -n 'MAX.*SIZE|SIZE.*LIMIT|body.*size|size.*check' clamav.lua

Repository: bunkerity/bunkerweb-plugins

Length of output: 53


🏁 Script executed:

# Check if environment variables define body limits elsewhere in the plugin
cd clamav || exit 1
rg -n 'ngx\.var\.|CLAMAV' clamav.lua | head -30

Repository: bunkerity/bunkerweb-plugins

Length of output: 453


🏁 Script executed:

# Check parse_multipart implementation to see if it has size protections
cd clamav || exit 1
rg -A 30 'function.*parse_multipart|local.*parse_multipart' clamav.lua

Repository: bunkerity/bunkerweb-plugins

Length of output: 1855


🏁 Script executed:

# Search entire bunkerweb-plugins directory for body size limits
rg -n 'client_max_body_size|MAX_BODY|DEFAULT.*BODY.*SIZE' --type=lua --type=conf --type=json | head -20

Repository: bunkerity/bunkerweb-plugins

Length of output: 99


🏁 Script executed:

# Check the full access() function signature and any variable initialization
cd clamav || exit 1
sed -n '180,220p' clamav.lua | cat -n

Repository: bunkerity/bunkerweb-plugins

Length of output: 1687


🏁 Script executed:

# Look for any nginx config templates or defaults in the plugin directory
cd clamav || exit 1
find . -name '*.conf' -o -name '*.template' -o -name 'plugin.json' -o -name 'plugin.yaml' | xargs ls -la 2>/dev/null

Repository: bunkerity/bunkerweb-plugins

Length of output: 133


🏁 Script executed:

# Check plugin.json for body size configuration schema
cd clamav || exit 1
cat plugin.json

Repository: bunkerity/bunkerweb-plugins

Length of output: 1216


🏁 Script executed:

# Look at surrounding plugins to see how they handle body limits
find .. -name 'clamav' -o -name 'coraza' -o -name 'virustotal' | head -5

Repository: bunkerity/bunkerweb-plugins

Length of output: 165


🏁 Script executed:

# Check virustotal plugin as the coding guidelines mention it—see how it handles uploads
cd ../virustotal 2>/dev/null && sed -n '1,50p' virustotal.lua | cat -n

Repository: bunkerity/bunkerweb-plugins

Length of output: 53


🏁 Script executed:

# Check if there's an example of body size validation in other clamav or similar plugins
rg -n 'ngx.req.get_body_file\|io.open' --type=lua -B 5 -A 5 | head -50

Repository: bunkerity/bunkerweb-plugins

Length of output: 53


🏁 Script executed:

# Verify Lua io.read() error handling semantics by checking error case
cd clamav || exit 1
cat > /tmp/test_io_read.lua << 'EOF'
-- Test Lua io behavior
local f = io.open("/nonexistent", "rb")
if f then
  local data = f:read("*a")
  print("Data:", data)
  f:close()
else
  print("File not found - f is nil")
end
EOF
lua /tmp/test_io_read.lua 2>&1 || echo "Lua test completed"

Repository: bunkerity/bunkerweb-plugins

Length of output: 96


🏁 Script executed:

# Check if nginx config in BunkerWeb sets client_max_body_size defaults
find .. -name '*.conf' -o -name '*.template' | xargs grep -l 'client_max_body_size' 2>/dev/null | head -5

Repository: bunkerity/bunkerweb-plugins

Length of output: 53


🏁 Script executed:

# Verify the actual behaviour: does code distinguish between empty body and read error?
cd clamav || exit 1
sed -n '213,230p' clamav.lua | cat -n

Repository: bunkerity/bunkerweb-plugins

Length of output: 611


Add explicit body size validation before buffering entire request.

Reading the entire request body into memory without size checks (lines 214–222) risks memory exhaustion on large uploads. Whilst nginx client_max_body_size may provide implicit protection, this plugin should validate explicitly to be resilient across deployments.

Add a configurable CLAMAV_MAX_BODY_SIZE setting (e.g. 50 MiB default) and reject requests exceeding it after line 222, before passing the body to parse_multipart().

Note: The file handle at line 219 is closed correctly even if f:read() fails; no leak risk here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clamav/clamav.lua` around lines 212 - 229, The code buffers the entire
request body into memory (the block that reads ngx.req.get_body_data() and falls
back to ngx.req.get_body_file()) without size validation; add a configurable
CLAMAV_MAX_BODY_SIZE (default 50 * 1024 * 1024) and after the body is populated
but before calling parse_multipart() check the byte length (use `#body` or
equivalent) and return an error via self:ret(true, "Request body too large") and
log via self.logger:log(ERR, ...) when the body exceeds the limit; ensure the
new config key is read where the plugin config is loaded and referenced in this
function so deployments can override the default.

local parts = self:parse_multipart(body, boundary)

-- Scan each file part with ClamAV
for i, part in ipairs(parts) do
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Use _ for unused loop variable.

Static analysis flags i as unused.

🧹 Proposed fix
-  for i, part in ipairs(parts) do
+  for _, part in ipairs(parts) do
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for i, part in ipairs(parts) do
for _, part in ipairs(parts) do
🧰 Tools
🪛 Luacheck (1.2.0)

[warning] 235-235: unused loop variable 'i'

(W213)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clamav/clamav.lua` at line 235, The loop uses an unused index variable `i` in
the `for i, part in ipairs(parts) do` iteration, which triggers static-analysis
warnings; replace it with the conventional `_` placeholder (i.e., `for _, part
in ipairs(parts) do`) inside the same scope (in the file clamav/clamav.lua where
that loop appears) and confirm there are no references to `i` within the loop
body—if any exist, either restore a meaningful name or use the actual name where
needed instead of `_`.

Comment on lines +267 to +270
-- Send stream termination marker
sock:send(string.char(0, 0, 0, 0))
local result = sock:receive("*l")
sock:close()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Check return value of termination marker send; handle potential nil result.

Line 268 ignores the return value of sock:send() for the stream termination marker. If this fails, the subsequent receive() may hang or return unexpected data.

🔧 Proposed fix
       -- Send stream termination marker
-      sock:send(string.char(0, 0, 0, 0))
+      local term_ok, term_err = sock:send(string.char(0, 0, 0, 0))
+      if not term_ok then
+        self.logger:log(ERR, "[clamav] Failed to send termination marker: " .. tostring(term_err))
+        sock:close()
+        return self:ret(false, "Failed to send termination marker")
+      end
       local result = sock:receive("*l")
       sock:close()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- Send stream termination marker
sock:send(string.char(0, 0, 0, 0))
local result = sock:receive("*l")
sock:close()
-- Send stream termination marker
local term_ok, term_err = sock:send(string.char(0, 0, 0, 0))
if not term_ok then
self.logger:log(ERR, "[clamav] Failed to send termination marker: " .. tostring(term_err))
sock:close()
return self:ret(false, "Failed to send termination marker")
end
local result = sock:receive("*l")
sock:close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clamav/clamav.lua` around lines 267 - 270, Check the return value of
sock:send when sending the stream termination marker in the function that calls
sock:send(string.char(0,0,0,0)); if send returns nil (failure, err) handle it by
closing the socket (sock:close()), propagate or return an error instead of
proceeding to sock:receive, and only call sock:receive("*l") when send
succeeded; update any callers that rely on the local result variable to handle
the error path accordingly.

Comment on lines +275 to +277
else
self.logger:log(ERR, "[clamav] ClamAV result for " .. part.filename .. ": " .. tostring(result))
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use appropriate log level for successful scan results.

Line 276 logs successful ClamAV results at ERR level. This pollutes error logs during normal operation. Use NOTICE (which was declared but unused) or ngx.INFO.

🔧 Proposed fix

If retaining NOTICE:

-        self.logger:log(ERR, "[clamav] ClamAV result for " .. part.filename .. ": " .. tostring(result))
+        self.logger:log(NOTICE, "[clamav] ClamAV result for " .. part.filename .. ": " .. tostring(result))

Alternatively, log at ERR only when malware is found, and use NOTICE or omit logging for clean files.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
else
self.logger:log(ERR, "[clamav] ClamAV result for " .. part.filename .. ": " .. tostring(result))
end
else
self.logger:log(NOTICE, "[clamav] ClamAV result for " .. part.filename .. ": " .. tostring(result))
end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clamav/clamav.lua` around lines 275 - 277, The log at self.logger:log(ERR,
"[clamav] ClamAV result for " .. part.filename .. ": " .. tostring(result)) is
using the ERR level for normal/clean scan results; change it to use the NOTICE
level (which is declared but unused) or ngx.INFO for successful scans, and
reserve ERR only for malware detections or explicit errors (i.e., update the
logging call in the ClamAV result handling to use NOTICE/ngx.INFO for clean
results and log at ERR only when result indicates malware or an error).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants