Skip to content

Conversation

@asagege
Copy link
Contributor

@asagege asagege commented Dec 3, 2025

Overview:

This PR introduces two data structures, FIFO and mutexQueue, as mentioned in #2878 , along with a significant refactoring of the existing bio utility that demonstrates their practical value.

New Creative Components:

  • FIFO - A high-performance queue that's significantly more space and time efficient than adlist.
  • mutexQueue - A thread-safe wrapper around FIFO that encapsulates synchronization primitives into a reusable abstraction. This creative design eliminates repetitive mutex/condition operations throughout the codebase.

Refactored Components:

  • bio.c - refactored using mutexQueue, with 56 lines of code eliminated while achieving more maintainable code.

These prepare the prerequisite data structures for bgIteration.

Details:

  1. FIFO - fifo.h, fifo.c, test_fifo.c

  2. mutexQueue - mutexqueue.h, mutexqueue.c, test_mutexqueue.c

  3. bio.c Refactoring:

    Before: Manual management of 3 synchronization primitives per queue
    After: Single mutexQueue abstraction handles everything

    Key improvements:

    • Simplified synchronization - mutexQueue encapsulates mutex locking and condition variable signaling, eliminating manual coordination in:

      • bioInit()
      • bioSubmitJob()
      • bioProcessBackgroundJobs()
      • bioPendingJobsOfType()
    • Thread-safe counters - Replaced manual locking of bio_jobs_counter with atomic operations

    • bioDrainWorker() now uses polling on the queue abstraction instead of managing low-level synchronization

Testing:

  • Unit tests for FIFO in test_fifo.c: all pass,
    COMPARE_PERFORMANCE_TO_ADLISTshows the improvement = 78.69% (List: 122 ms, FIFO: 26 ms)
  • Unit tests for mutexQueue in test_mutexqueue.c: all pass
  • bio.c functionality preserved with refactored implementation

@codecov
Copy link

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 49.41176% with 86 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.43%. Comparing base (70196ee) to head (fce4bd8).
⚠️ Report is 4 commits behind head on unstable.

Files with missing lines Patch % Lines
src/mutexqueue.c 39.72% 44 Missing ⚠️
src/fifo.c 52.27% 42 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2895      +/-   ##
============================================
- Coverage     72.45%   72.43%   -0.03%     
============================================
  Files           129      131       +2     
  Lines         70526    70664     +138     
============================================
+ Hits          51102    51182      +80     
- Misses        19424    19482      +58     
Files with missing lines Coverage Δ
src/bio.c 82.30% <100.00%> (-3.00%) ⬇️
src/fifo.c 52.27% <52.27%> (ø)
src/mutexqueue.c 39.72% <39.72%> (ø)

... and 14 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@JimB123 JimB123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments on the unit tests. Looks good overall.

Comment on lines +59 to +60
fifo *q = fifoCreate();
TEST_EXPECT(fifoLength(q) == 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These statements should be moved out of the DETECT_CRASH block.

You're trying to detect a crash inside of fifoPop. However, as is, the test will pass if there's a crash inside of fifoCreate.

Comment on lines +76 to +77
fifo *q = fifoCreate();
TEST_EXPECT(fifoLength(q) == 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these 2 statements outside of the DETECT_CRASH block.

for (int i = 0; i < iterations; i++) exerciseFifo();
long fifoMs = elapsedMs(timer);

TEST_EXPECT(listMs == fifoMs); /* This will fail, printing result */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line isn't needed, right?

@JimB123
Copy link
Member

JimB123 commented Dec 5, 2025

fifo is a handy data structure. We have a number of places in Valkey which use FIFO queues. This has much better performance characteristics (space and time) than adlist for cases just requiring a queue.

mutexQueue nicely abstracts the mutex locking and waiting functions. In the bio code, we can see how it streamlines the logic.

Note that both of these are also prerequisites to #2878.

@JimB123
Copy link
Member

JimB123 commented Dec 5, 2025

Re: the code coverage. I see that there are unit tests which cover the code. From my read of this, it looks like the Codecov process is measuring code coverage based only on the TCL (integration) tests. In this case, bio is being integration tested which results in code coverage for much (but not all) of fifo.c and mutexqueue.c.

I do see that the areas reported as uncovered by Codecov are all covered by the supplied unit tests. This looks ok to me.

* POSSIBILITY OF SUCH DAMAGE.
*/


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't remove unrelated line.

Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen this code before, and functionality is still consistent. Mostly some high level structural and naming things.

void *fifoPeek(fifo *q);

/* Return and remove the first item from the queue.
* NOTE: asserts if the queue is empty. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had previous comments in the past that I would rather return some fixed defined value instead of crashing. This behavior is unintuitive and might lead to bugs if there aren't explicit checks here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This alternate would work, but makes the API a little less pretty:

// Return true if an item exists.  If false, `item` is undefined.
bool fifoPeek(fifo *q, void **item);
bool fifoPop(fifo *q, void **item);

Either way, the calling code is forced to perform checking. In either case, if the calling code doesn't check, that's an error.

Given that the calling code has to check anyway, I'm not seeing an advantage to making the API a little more awkward.


typedef struct fifo fifo;

/* Create a new FIFO queue. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention for Valkey is that functions are documented primarily in the C function with a high level comment in the .h file. I know there is some contention here, but I think it's worse that we are documenting them in both places. I would rather them just be in the C file, there is no need to be inconsistent here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be moving towards documenting an API separate from the code. Providing strong focus on what's contained (and not contained) in the API improves modularity. When users wants to understand the API, they shouldn't have to sift through the entire C file to find the documentation for the API.

Unless the core team feels strongly that the header files should be left undocumented, I'd like to leave the documentation in the header.

Comment on lines +238 to +239
/* Blindly overwrites target from source. */
static void blindlyMoveFifoContents(fifo *target, fifo *source) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Blindly overwrites target from source. */
static void blindlyMoveFifoContents(fifo *target, fifo *source) {
/* Overwrites target from source. */
static void overwriteFifoContents(fifo *target, fifo *source) {

Blindly is a little confusing. Would rather use more inclusive language.

* SPDX-License-Identifier: BSD-3-Clause
*/

/* FIFO - A high-performance FIFO queue implementation */
Copy link
Member

@madolson madolson Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it mutex queue and fifo? Shouldn't this just be a queue? You use both queue and fifo interchangeably throughput the code without a clear distinction, let's just pick one term so all the code is clearer

Like:
/* Look at the first item in the queue (without removing it).

  • NOTE: asserts if the queue is empty. */
    void *fifoPeek(fifo *q) {

The code uses both fifo and queue.

Copy link
Member

@JimB123 JimB123 Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mutexQueue and fifo are 2 separate things, each with their own API and definition. We really don't need consistency between them. The mutex queue provides the concept of priority queuing, so it's not strictly FIFO. Under the hood, it's implemented using fifo, but that's not required - and could change without altering the API/functionality.

Regarding consistency within fifo.c, you're right, both terms are used. However since FIFO is a specific type of queue, it's not wrong or even (IMO) confusing to refer to a fifo as a queue, in the general sense.

Note that even the parameter is simply q in most cases. This should be clear to most.

pthread_cond_wait(&bwd->bio_newjob_cond, &bwd->bio_mutex);
void bioDrainWorker(int type) {
while (bioPendingJobsOfType(type) > 0) {
usleep(1000); /* Sleep for 1ms and check again*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why so long of a sleep? I feel like we could sleep much less here.

*
* ----------------------------------------------------------------------------
*
* Copyright (c) 2009-2012, Redis Ltd.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you can't remove this. The vast majority of the code is the original implementation. We need to keep the reference for their copyright.

@JimB123
Copy link
Member

JimB123 commented Dec 8, 2025

@ranshid Answering a question raised in the core team meeting 12/8/2025. The fifo and mutexQueue serve a different purpose than the lockless queue mechanism described in #2909. Notably, the lockless mechanism in #2909:

  • is custom built for high-performance for a specific use-case
  • combines both threading concerns with the queueing data structure
  • for the queuing part:
    • provides a fixed length queue, using a constant 128k of memory (not dynamic or resizeable)
    • is purpose-built, and not designed for general use (can't store null, API and documentation are specific to purpose)
  • for the threading part
    • doesn't support blocking wait operations

Alternatively,

  • fifo - is designed to be a general purpose data structure with multiple use cases
  • mutexQueue - is designed to be general purpose, supporting both blocking and non-blocking use cases

Note that for the IO threading use case of #2909, the general purpose queuing structures described here would likely have very different performance characteristics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants