[WIP] Split arb_pow_fmpz_binexp into a ui and mpz version and implement proper arb_sqr#396
[WIP] Split arb_pow_fmpz_binexp into a ui and mpz version and implement proper arb_sqr#396albinahlback wants to merge 12 commits intoflintlib:masterfrom
arb_pow_fmpz_binexp into a ui and mpz version and implement proper arb_sqr#396Conversation
And inline arb_pow_[fmpz_binexp/fmpz/ui].
This is going to be used for adding exponents when squaring
And fill some gaps in the documentation for arf.h.
|
|
||
| mag_init(resr); | ||
| mag_fast_mul(resr, xm, arb_radref(x)); | ||
| if (resr->exp < COEFF_MAX && resr->exp >= COEFF_MIN) |
There was a problem hiding this comment.
These checks are not needed; the exponent can be increment with risk of overflow.
There was a problem hiding this comment.
I presume you meant without risk of overflow ;) Och god jul, Fredrik!
|
|
||
| mag_init(resr); | ||
| mag_mul(resr, xm, arb_radref(x)); | ||
| fmpz_add_ui(&(resr->exp), &(resr->exp), 1); |
| ? arf_mul_rnd_down(z, x, y, prec) \ | ||
| : arf_mul_rnd_any(z, x, y, prec, rnd)) | ||
|
|
||
| int arf_sqr_via_mpfr(arf_ptr res, arf_srcptr x, slong prec, arf_rnd_t rnd); |
There was a problem hiding this comment.
This function is pointless extra code; arf_mul_via_mpfr already does the job.
There was a problem hiding this comment.
It removes one branching and removes two/three useless assignments/calculations. My thinking with this PR is to strip all the "unnecessary" stuff away from squaring in order to make it as fast as possible. I know it's pretty small improvement, but you do not think that it's worth it?
There was a problem hiding this comment.
mul_via_mpfr is only used for products that take 1000+ cycles to compute so a few cycles here will not be measurable.
|
|
||
| .. macro:: nn_mul_2x1(r2, r1, r0, a1, a0, b0) | ||
|
|
||
| Sets `(r_2, r_1, r_0)` to `(a_1, a_0)` multiplied by `b_0`. |
There was a problem hiding this comment.
Might want to document that these are limbs. Also, I think these macros don't allow aliasing, which might be worth documenting.
|
|
||
| Sets *res* to `-x`. | ||
| Sets *res* to `-x`. Returns 0 if this operation was made exactly and 1 if | ||
| truncation occurred. |
There was a problem hiding this comment.
"truncation" is the wrong term (means rounding down) - should be "rounding".
| { | ||
| slong cx = *x; | ||
|
|
||
| if (!COEFF_IS_MPZ(*res) && (cx > COEFF_MIN / 2 && cx < COEFF_MAX / 2)) |
There was a problem hiding this comment.
These conditions are wrong. See _fmpz_add2_fast. Anyway, I think this function is not really needed; the compiler should be able to eliminate the redundant branches in _fmpz_add2_fast when passed the same operand twice.
There was a problem hiding this comment.
It has actually one property that I do not think the compiler can transform into, and that is that we now can allow cx to be up to size COEFF_MAX / 2 instead of COEFF_MAX / 4.
There was a problem hiding this comment.
_fmpz_add2_fast/_fmpz_sub2_fast need to work for values of c larger than +/- 1. I think it's confusing if _fmpz_2times_fast in contrast to the other functions only supports |c| <= 1.
In practice, it makes no difference to go up to COEFF_MAX / 2 instead of COEFF_MAX / 4 since such huge exponents are very rare. We only care about optimizing for small exponents here. Better to be uniform and conservative.
Considering this, I don't really think this extra function is needed.
Clearly these functions and their assumptions should be documented...
There was a problem hiding this comment.
Okay, will remove it then.
|
Did you do some profiling? |
I am done with the former part, with addition of adding
nn_sqr_2which simplifiesnn_mul_2x2by skipping oneumul_ppmm.I am going to add
arf_sqr_rnd_downand then implementarb_sqrfrom there.