
perf: improve compression speed and ratio in lzsst2_file_compression.pl #5

Closed

Copilot wants to merge 2 commits into master from copilot/improve-performance-compression

Conversation


Copilot AI commented Mar 4, 2026

The LZ77+Huffman compressor had O(n² log n) Huffman tree construction, no early exit in the match search, redundant lazy-match probes, and a suboptimal chunk size and chain length.

LZ77 (find_match / lz77_compression)

  • Early termination: last if ($best_n > $max_len) in the match loop — no point scanning the chain further once a maximum-length (258-byte) match is found
  • Selective hash updates: for matches > 32 bytes, only insert the first 4 and last 4 positions into the hash table instead of every position
  • Skip redundant lazy match: only probe la+1 when a match was already found at la ($n1 > 1)
  • max_chain_len: 48 → 64

Huffman (mktree_from_freq)

Replaced the O(n² log n) full re-sort loop with an O(n log n) insertion approach, using binary search on the already-sorted remainder:

my ($lo, $hi) = (0, scalar(@nodes));
while ($lo < $hi) {
    my $mid = ($lo + $hi) >> 1;
    if ($nodes[$mid][1] < $weight) { $lo = $mid + 1 }
    else                           { $hi = $mid     }
}
splice(@nodes, $lo, 0, $new_node);

Single-symbol frequency tables are explicitly wrapped so the symbol receives code '0' (≥1 bit), matching original behavior and preventing a zero-length encoding that breaks decompression.
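The wrapping can be sketched in isolation. The following is a hypothetical standalone reduction (the script's actual walk takes extra arguments, and its node layout may differ in detail): wrapping the lone leaf in a parent node makes the tree walk descend one level, so the symbol receives code '0' rather than the empty string.

```perl
use strict;
use warnings;

# Minimal sketch with a simplified walk(): assign codes by descending
# the tree; a leaf stores its symbol in slot 0.
sub walk {
    my ($node, $code, $dict) = @_;
    my ($children) = @$node;
    if (!ref $children) {                 # leaf: slot 0 is the symbol
        $dict->{$children} = $code;
        return $dict;
    }
    walk($children->[0], $code . '0', $dict);
    walk($children->[1], $code . '1', $dict) if defined $children->[1];
    return $dict;
}

my %freq  = (65 => 10);                   # degenerate table: one symbol
my @nodes = map { [$_, $freq{$_}] } keys %freq;

# Wrap the single leaf in a parent node so walk() emits '0', not ''
@nodes = ([[$nodes[0]], $nodes[0][1]]) if @nodes == 1;

my $codes = walk($nodes[0], '', {});
print "$codes->{65}\n";                   # prints: 0
```

Without the wrap, walk() would return immediately at the root leaf and assign the empty string, producing a zero-length code that the decompressor cannot consume.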

find_distance_index

Linear scan → binary search (O(n) → O(log n)).

CHUNK_SIZE

1 << 19 → 1 << 21 (512 KB → 2 MB) for a larger LZ77 window and a better compression ratio.

Original prompt

Objective

Improve both the runtime performance and compression ratio of Compression/lzsst2_file_compression.pl, which implements LZ77 compression (LZSS variant with hash tables and lazy matching) + Huffman coding, using a DEFLATE-like approach.

The changes must preserve full backward-compatible decompression — i.e., archives produced by the new code must still decompress correctly, and the decompressor should remain unchanged in behavior.

File to modify

Compression/lzsst2_file_compression.pl

Current bottlenecks and areas for improvement

After careful analysis, these are the specific improvements to make:

1. LZ77 Compression — Performance (lz77_compression and find_match)

a) Early termination in find_match when max_len is reached:

In find_match, the inner foreach loop iterates over all positions in the chain even after finding a maximum-length match (258). Add an early exit:

if ($n > $best_n) {
    $best_p = $p;
    $best_n = $n;
    last if ($best_n > $max_len);  # can't do better
}

b) Skip updating the hash table for positions already covered by a long match:

Currently, every position inside a match is inserted into the hash table. For very long matches this is wasteful. When $best_n is large (e.g., > 32), only insert the first few and last few positions into the hash table instead of all of them, to save time without significantly affecting compression ratio:

# Only insert boundary positions for long matches to save time
if ($best_n > 32) {
    # Insert first 4 and last 4 positions
    foreach my $i (0 .. 3, $best_n - $min_len - 3 .. $best_n - $min_len) {
        next if $i < 0 or $i > length($matched) - $min_len;
        my $key = substr($matched, $i, $min_len);
        unshift @{$table{$key}}, $la + $i;
        if (scalar(@{$table{$key}}) > $max_chain_len) {
            pop @{$table{$key}};
        }
    }
}
else {
    # Original behavior for short matches
    foreach my $i (0 .. length($matched) - $min_len) {
        my $key = substr($matched, $i, $min_len);
        unshift @{$table{$key}}, $la + $i;
        if (scalar(@{$table{$key}}) > $max_chain_len) {
            pop @{$table{$key}};
        }
    }
}

c) Avoid redundant find_match call at la+1 when no match exists at la:

The lazy matching currently always tries to find a match at la+1 even when it's unlikely to help. Only do the second find_match call when the first match was found (i.e., $n1 > 1):

if ($n1 > 1 and exists($table{$lookahead2})) {
    ($n2, $p2) = find_match(\$str, $la + 1, $min_len, $max_len, $end, \%table, \@symbols);
}

2. Increase default CHUNK_SIZE for better compression ratio

The current CHUNK_SIZE is 1 << 19 (512 KB). Increase it to 1 << 21 (2 MB). Larger chunks give the LZ77 engine a bigger window to find matches, improving compression ratio at the cost of slightly more memory (which is acceptable for modern systems). The script already says "higher value = better compression":

CHUNK_SIZE => 1 << 21,    # higher value = better compression

3. find_distance_index — Use binary search instead of linear scan

The find_distance_index function uses a linear scan through @DISTANCE_SYMBOLS to find the right bucket for a distance value. Replace this with a binary search for O(log n) lookup:

sub find_distance_index ($dist, $distance_symbols) {
    my ($lo, $hi) = (0, $#{$distance_symbols});
    while ($lo < $hi) {
        my $mid = ($lo + $hi + 1) >> 1;
        if ($distance_symbols->[$mid][0] <= $dist) {
            $lo = $mid;
        }
        else {
            $hi = $mid - 1;
        }
    }
    return $lo;
}
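As a sanity check, the binary search must return the largest index whose base distance is ≤ $dist, exactly as the linear scan did. The miniature table below is made up for illustration (the real @DISTANCE_SYMBOLS holds the DEFLATE-style distance buckets), and the sub unpacks @_ instead of using signatures so it runs on older perls:

```perl
use strict;
use warnings;
use feature 'say';

# Same binary-search logic as proposed, written without signatures.
sub find_distance_index {
    my ($dist, $distance_symbols) = @_;
    my ($lo, $hi) = (0, $#{$distance_symbols});
    while ($lo < $hi) {
        my $mid = ($lo + $hi + 1) >> 1;
        if ($distance_symbols->[$mid][0] <= $dist) { $lo = $mid }
        else                                       { $hi = $mid - 1 }
    }
    return $lo;
}

# Hypothetical [base_distance, extra_bits] pairs, bases sorted ascending
my @DISTANCE_SYMBOLS = ([1, 0], [2, 0], [3, 0], [4, 0], [5, 1], [7, 1], [9, 2]);

say find_distance_index(1,  \@DISTANCE_SYMBOLS);   # 0 (first bucket)
say find_distance_index(6,  \@DISTANCE_SYMBOLS);   # 4 (base 5 covers 5..6)
say find_distance_index(99, \@DISTANCE_SYMBOLS);   # 6 (last bucket)
```

The `($lo + $hi + 1) >> 1` midpoint biases upward, which is what makes the loop converge on the *last* index satisfying the `<=` test.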

4. Huffman tree construction — avoid repeated full sorts

In mktree_from_freq, the "poor man's priority queue" re-sorts the full @nodes array on every iteration. This is O(n² log n). Instead, use an insertion-sort approach: after splicing the two smallest nodes and creating a merged node, insert the new node into the already-sorted remainder at the correct position using binary search:

sub mktree_from_freq ($freq) {

    my @nodes = map { [$_, $freq->{$_}] } sort { $a <=> $b } keys %$freq;
    @nodes = sort { $a->[1] <=> $b->[1] } @nodes;

    while (@nodes > 1) {
        my ($x, $y) = splice(@nodes, 0, 2);
        my $new_node;
        if (defined($y)) {
            $new_node = [[$x, $y], $x->[1] + $y->[1]];
        }
        else {
            $new_node = [[$x], $x->[1]];
        }

        # Binary search for insertion position
        my $weight = $new_node->[1];
        my ($lo, $hi) = (0, scalar(@nodes));
        while ($lo < $hi) {
            my $mid = ($lo + $hi) >> 1;
            if ($nodes[$mid][1] < $weight) {
                $lo = $mid + 1;
            }
            else {
                $hi = $mid;
            }
        }
        splice(@nodes, $lo, 0, $new_node);
    }

    walk($nodes[0], '', {}, {});
}
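The insertion step can be exercised on its own. Below is a standalone sketch (the sample weights are an arbitrary textbook-style example, not taken from the script): after each merge, the re-inserted node must leave @nodes sorted by weight, which is the invariant the binary search depends on.

```perl
use strict;
use warnings;

# Repeatedly merge the two lightest nodes and re-insert the merged node
# via binary search; assert the array stays sorted by weight throughout.
my @nodes = map { [undef, $_] } sort { $a <=> $b } (5, 9, 12, 13, 16, 45);

while (@nodes > 1) {
    my ($x, $y) = splice(@nodes, 0, 2);
    my $new_node = [[$x, $y], $x->[1] + $y->[1]];

    # Binary search for the insertion position among the remaining nodes
    my $weight = $new_node->[1];
    my ($lo, $hi) = (0, scalar(@nodes));
    while ($lo < $hi) {
        my $mid = ($lo + $hi) >> 1;
        if ($nodes[$mid][1] < $weight) { $lo = $mid + 1 }
        else                           { $hi = $mid }
    }
    splice(@nodes, $lo, 0, $new_node);

    # Invariant: @nodes is still sorted by weight
    for my $i (1 .. $#nodes) {
        die "queue not sorted" if $nodes[$i - 1][1] > $nodes[$i][1];
    }
}

print $nodes[0][1], "\n";   # root weight = sum of inputs: 100
```

Each iteration does O(log n) comparisons plus an O(n) splice, versus a full O(n log n) sort per iteration in the original loop.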

5. Increase `max_cha...

This pull request was created from Copilot chat.




Co-authored-by: trizen <614513+trizen@users.noreply.github.com>
Copilot AI changed the title [WIP] Improve runtime performance and compression ratio in lzsst2 perf: improve compression speed and ratio in lzsst2_file_compression.pl Mar 4, 2026
@trizen trizen closed this Mar 4, 2026
@trizen trizen deleted the copilot/improve-performance-compression branch March 4, 2026 17:14
