The current implementations produce large binaries because they provide one specialized implementation per bit width and unroll their loops.
Add a flag-enabled alternative that uses a more compact scalar implementation. This would be useful for WebAssembly, for instance.
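
As a rough illustration of the idea, here is a minimal sketch that assumes the implementations in question are bit-packing routines. The names `pack` and `unpack` and the little-endian bit layout are hypothetical and not taken from the existing code. The bit width is a runtime parameter, so one small loop covers every width instead of one unrolled routine per width.

```rust
// Hypothetical sketch: in a real build these functions would sit behind
// the proposed flag; the flag name is left unspecified here.

/// Packs each value into `bit_width` bits, least significant bits first.
/// Bits of a value above `bit_width` are discarded.
fn pack(values: &[u32], bit_width: u8) -> Vec<u8> {
    assert!(bit_width <= 32);
    let total_bits = values.len() * bit_width as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    let mask: u64 = (1u64 << bit_width) - 1;
    let mut acc: u64 = 0; // bit accumulator
    let mut acc_bits: u32 = 0; // number of valid bits held in `acc`
    let mut pos = 0;
    for &v in values {
        acc |= ((v as u64) & mask) << acc_bits;
        acc_bits += bit_width as u32;
        // Flush every complete byte to the output.
        while acc_bits >= 8 {
            out[pos] = acc as u8;
            pos += 1;
            acc >>= 8;
            acc_bits -= 8;
        }
    }
    // Write the trailing partial byte, if any.
    if acc_bits > 0 {
        out[pos] = acc as u8;
    }
    out
}

/// Reads `count` values of `bit_width` bits each from `packed`.
/// Panics if `packed` is too short for the requested count.
fn unpack(packed: &[u8], bit_width: u8, count: usize) -> Vec<u32> {
    assert!(bit_width <= 32);
    let mask: u64 = (1u64 << bit_width) - 1;
    let mut out = Vec::with_capacity(count);
    let mut acc: u64 = 0;
    let mut acc_bits: u32 = 0;
    let mut pos = 0;
    for _ in 0..count {
        // Refill the accumulator until it holds one full value.
        while acc_bits < bit_width as u32 {
            acc |= (packed[pos] as u64) << acc_bits;
            pos += 1;
            acc_bits += 8;
        }
        out.push((acc & mask) as u32);
        acc >>= bit_width;
        acc_bits -= bit_width as u32;
    }
    out
}
```

The trade-off is the expected one: a loop like this is likely slower than per-width unrolled code, since the shifts and the mask are no longer compile-time constants, but it compiles to a small fraction of the code size.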