feat: support cpcsketch serde #84
Signed-off-by: tison <wander4096@gmail.com>
/// Computes and checks the 16-bit seed hash from the given long seed.
///
/// The seed hash may not be zero in order to maintain compatibility with older serialized
/// versions that did not have this concept.
Does this mean that we should check the return value to prevent 0?
I suppose so.
cc @leerho I can't see a similar requirement based on the Rust code alone. Could you provide more context on why 0 is not allowed here?
BTW this comment is copied from computeSeedHash's Java version.
It is exactly what the comment says. It is to remain compatible with older sketch versions (in other languages) that did not have the concept of the seedHash. Once you have serialized a sketch, it no longer retains any information about what language generated the serialized image. That is the whole idea and quite powerful! Once you have properly created this sketch in Rust, you will be able to import sketch images created years ago from Java, C++, or whatever.
The fact that "older versions of Rust" don't have this problem is irrelevant. :)
And yes, the method that generates the seed must check for 0, as it does in Java.
And, hmmm, it looks like C++ doesn't check for zero either. Which is a bug.
The likely reason this has not been noticed before is that we always use the DEFAULT_UPDATE_SEED, which has a non-zero seed_hash.
Looks like we need to return an error here when the result is 0, as the Java version does:
public static short computeSeedHash(final long seed) {
    final long[] seedArr = {seed};
    final short seedHash = (short)(hash(seedArr, 0L)[0] & 0xFFFFL);
    if (seedHash == 0) {
        throw new SketchesArgumentException(
            "The given seed: " + seed + " produced a seedHash of zero. "
            + "You must choose a different seed.");
    }
    return seedHash;
}
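For reference, a rough Rust sketch of the same check might look like the following. It is not the code from this patch: `ZeroSeedHash`, `seed_hash_from_word`, and the `hash` parameter are hypothetical names introduced here for illustration, and the hash function is passed in as a closure rather than implemented. Feeding the seed as little-endian bytes is also an assumption about how the Java `long[]` input maps to bytes.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; the crate's real error type may differ.
#[derive(Debug, PartialEq, Eq)]
pub struct ZeroSeedHash {
    pub seed: u64,
}

impl fmt::Display for ZeroSeedHash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "The given seed: {} produced a seedHash of zero. You must choose a different seed.",
            self.seed
        )
    }
}

impl std::error::Error for ZeroSeedHash {}

/// Keeps the low 16 bits of the first 64-bit hash word and rejects zero,
/// mirroring the `& 0xFFFFL` mask and the zero check in the Java version.
pub fn seed_hash_from_word(seed: u64, h0: u64) -> Result<u16, ZeroSeedHash> {
    let seed_hash = (h0 & 0xFFFF) as u16;
    if seed_hash == 0 {
        Err(ZeroSeedHash { seed })
    } else {
        Ok(seed_hash)
    }
}

/// Computes the seed hash using a caller-supplied 128-bit hash function.
/// `hash(bytes, hash_seed)` is assumed to return the two 64-bit output words;
/// the hash seed is 0, as in the Java call `hash(seedArr, 0L)`.
pub fn compute_seed_hash<H>(seed: u64, hash: H) -> Result<u16, ZeroSeedHash>
where
    H: Fn(&[u8], u64) -> (u64, u64),
{
    // Assumption: the seed is hashed as its 8 little-endian bytes.
    let (h0, _h1) = hash(&seed.to_le_bytes(), 0);
    seed_hash_from_word(seed, h0)
}
```

Returning a `Result` instead of panicking leaves it to the caller to pick a different seed, which is the Rust counterpart of the Java exception.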
I'm going to do the following tasks after this patch is merged:
For this patch, one open question is whether to include the decoding tables as static values or to build them at first access (using …).
I tend to keep the static decoding tables. They should not increase the binary size too much.
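To make the trade-off concrete, here is a minimal sketch of the two options on a toy table. The table contents, `build_table`, `STATIC_TABLE`, and `lazy_table` are all hypothetical and unrelated to the real CPC decoding tables, and `std::sync::OnceLock` is only one standard-library way to build a value on first access, not necessarily the mechanism meant above.

```rust
use std::sync::OnceLock;

/// Toy stand-in for a decoding table: entry `i` holds the number of trailing
/// zero bits in `i` (8 for `i == 0`). The real tables are different and larger.
const fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0usize;
    while i < 256 {
        table[i] = if i == 0 { 8 } else { (i as u8).trailing_zeros() as u8 };
        i += 1;
    }
    table
}

/// Option 1: the table is evaluated at compile time and stored in the binary.
static STATIC_TABLE: [u8; 256] = build_table();

/// Option 2: the table is built once, on first access, and cached afterwards.
fn lazy_table() -> &'static [u8; 256] {
    static TABLE: OnceLock<[u8; 256]> = OnceLock::new();
    TABLE.get_or_init(build_table)
}
```

The static variant costs binary size (256 bytes here) but needs no synchronization on access; the lazy variant keeps the data out of the binary at the price of a one-time build and an initialization check on each call.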
I'm going to merge this patch now. Review after commit is welcome. To reduce binary size, we'd follow #32 to exclude CpcSketch's code when users don't need it.
This closes #37