Improve the performance when using enumeration#8395
Improve the performance when using enumeration#8395aplopez wants to merge 9 commits intoSSSD:masterfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request effectively improves performance by optimizing logging, removing an unused function, and correcting a condition related to enumeration. The changes are well-aligned with the stated goals of enhancing enumeration performance, especially for large user bases. The addition of a new test case for the general enumeration scenario ensures that the modified logic is adequately covered.
|
Mistype in the commit message: "We must look into de TS cache" |
Fixed. |
|
I think fix is correct in the sense it fixes a bug. But I think logic of In particular, if |
|
Note: Covsan is green so far. |
|
Hm, Looks relevant, but why f44 only... race condition? |
I reran the tests and a different test failed. 😮💨 |
|
And now all the tests passed. There is some instability in F44, but not related to this PR. |
It is very suspicious that it was |
I thought the same until I noticed this test failed once and never again. The second time a completely different test failed. The third time, the latest, none. |
|
Hi, I think I have no further comments to the code and it looks like the enumeration performance also improved. Did you run some enumeration test with and without the latest version patches as well. If yes, can you share the result here so that we have some value for future reference, if needed? If possible, results with sssd-2.7.3 would be nice as well. bye, |
The results are instantaneous when the cache is populated, that is, when the task
I don't have such an environment and since I'm running the tests on my own PC, switching to 2.7.3 is not a simple task. Using a VM with 2.7.3 might be a posibility, but I don't know to which extent those results are comparable. If you think this is a viable way, let me know and I will give it a try. |
weas - > were |
waisting -> wasting |
Create the filter to retrieve only the requested entries. Do not create a new filter and search for matches if there is no results from the previous search. The called functions handle this case correctly but why wasting time calling them?
Function cache_req_user_by_filter_lookup() will set or not the recent filter depending on whether data->name.attr is set or not. As mentioned in the comment, it should be done base on whether the refernced attribute is name or not.
The message said that sysdb_enumpwent() had failed, but it was actually sysdb_enumpwent_filter() which failed.
|
Fixed all the typos. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces several changes to improve user enumeration performance, particularly in environments with a large number of users. The key changes include removing an unused function, correcting a condition to enable an optimization for name-based filtering, preventing the logging of potentially very long filter strings, and updating tests accordingly. The changes appear sound and should deliver the described performance benefits. I have one suggestion to improve the accuracy of a debug log message.
|
Note: Covscan is green (barring mistype in the latest update). |
The "name" attribute was not being added to the TS cache, even though that it is part of the DN (ldb doesn't enforce it). This made the if-block in sysdb_enumpwent_filter() rather useless. In addition, once this if-block is executed, the fuction leaves without further processing.
Although ts_res.count is set to 0 when sysdb_search_ts_users() return ERR_NO_TS, before using it we make an extra check to verify that the returned code is EOK.
|
Thank you, ACK. |
This PR includes:
Enumeration, specially when there are 15,000+ users, is slow. This fix helps, but it doesn't work miracles.
In my test environment, the enumeration went from 8 minutes to about 1.
It is important to know that, with such an amount of users, many operations time out. It is necessary to increment the
timeoutin[nss]and for the domain, but also set large values forldap_enumeration_refresh_timeoutandldap_search_timeoutin the domain. I used these values to avoid any timeout (YMMV):