[WIP] Split `arb_pow_fmpz_binexp` into a `ui` and `mpz` version and implement proper `arb_sqr` by albinahlback · Pull Request #396 · flintlib/arb

albinahlback · 2021-12-21T02:20:33Z

I am done with the former part, plus the addition of nn_sqr_2, which simplifies nn_mul_2x2 by skipping one umul_ppmm.
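As a rough illustration of where the saved multiplication comes from: when squaring a two-limb number, the cross term a1*a0 appears twice, so it can be computed once and added twice. The sketch below is not the PR's nn_sqr_2 macro. The names mul_wide, mul_2x2, sqr_2 and check_sqr_2 are hypothetical, limbs are assumed to be 64 bits wide, and unsigned __int128 (a GCC/Clang extension) stands in for umul_ppmm.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Full 64x64 -> 128 bit product, standing in for umul_ppmm. */
static void mul_wide(uint64_t *hi, uint64_t *lo, uint64_t a, uint64_t b)
{
    u128 p = (u128) a * b;
    *hi = (uint64_t) (p >> 64);
    *lo = (uint64_t) p;
}

/* r[0..3] = (a1, a0) * (b1, b0): four wide multiplications. */
static void mul_2x2(uint64_t r[4], uint64_t a1, uint64_t a0,
                    uint64_t b1, uint64_t b0)
{
    uint64_t h00, l00, h01, l01, h10, l10, h11, l11;
    u128 mid, top;

    mul_wide(&h00, &l00, a0, b0);
    mul_wide(&h01, &l01, a0, b1);
    mul_wide(&h10, &l10, a1, b0);
    mul_wide(&h11, &l11, a1, b1);

    r[0] = l00;
    mid = (u128) h00 + l01 + l10;          /* limb 1 plus carry */
    r[1] = (uint64_t) mid;
    top = (mid >> 64) + h01 + h10 + l11;   /* limb 2 plus carry */
    r[2] = (uint64_t) top;
    r[3] = h11 + (uint64_t) (top >> 64);
}

/* r[0..3] = (a1, a0)^2: three wide multiplications,
   because the cross term a1 * a0 is computed once and added twice. */
static void sqr_2(uint64_t r[4], uint64_t a1, uint64_t a0)
{
    uint64_t h00, l00, hx, lx, h11, l11;
    u128 mid, top;

    mul_wide(&h00, &l00, a0, a0);
    mul_wide(&hx, &lx, a1, a0);
    mul_wide(&h11, &l11, a1, a1);

    r[0] = l00;
    mid = (u128) h00 + lx + lx;
    r[1] = (uint64_t) mid;
    top = (mid >> 64) + hx + hx + l11;
    r[2] = (uint64_t) top;
    r[3] = h11 + (uint64_t) (top >> 64);
}

/* Usage: the specialised square should agree with the general product. */
static void check_sqr_2(uint64_t a1, uint64_t a0)
{
    uint64_t p[4], q[4];

    mul_2x2(p, a1, a0, a1, a0);
    sqr_2(q, a1, a0);
    for (int i = 0; i < 4; i++)
        assert(p[i] == q[i]);
}
```

The carries are accumulated in 128-bit temporaries for readability; a production macro would more likely use add-with-carry primitives.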

I am going to add arf_sqr_rnd_down and then implement arb_sqr from there.


fredrik-johansson · 2021-12-22T10:31:54Z

arb/sqr.c

+
+        mag_init(resr);
+        mag_fast_mul(resr, xm, arb_radref(x));
+        if (resr->exp < COEFF_MAX && resr->exp >= COEFF_MIN)


These checks are not needed; the exponent can be incremented with risk of overflow.

I presume you meant without risk of overflow ;) And merry Christmas, Fredrik!

fredrik-johansson · 2021-12-22T10:32:20Z

arb/sqr.c

+
+        mag_init(resr);
+        mag_mul(resr, xm, arb_radref(x));
+        fmpz_add_ui(&(resr->exp), &(resr->exp), 1);


Use MAG_EXPREF

fredrik-johansson · 2021-12-22T10:33:53Z

arf.h

        ? arf_mul_rnd_down(z, x, y, prec)        \
        : arf_mul_rnd_any(z, x, y, prec, rnd))

+int arf_sqr_via_mpfr(arf_ptr res, arf_srcptr x, slong prec, arf_rnd_t rnd);


This function is pointless extra code; arf_mul_via_mpfr already does the job.

It removes one branch and two or three useless assignments and calculations. My thinking with this PR is to strip all the "unnecessary" work away from squaring in order to make it as fast as possible. I know it's a pretty small improvement, but don't you think it's worth it?

mul_via_mpfr is only used for products that take 1000+ cycles to compute, so saving a few cycles here will not be measurable.

fredrik-johansson · 2021-12-22T10:36:00Z

doc/source/arf.rst


+.. macro:: nn_mul_2x1(r2, r1, r0, a1, a0, b0)
+
+    Sets `(r_2, r_1, r_0)` to `(a_1, a_0)` multiplied by `b_0`.


Might want to document that these are limbs. Also, I think these macros don't allow aliasing, which might be worth documenting.
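To illustrate the aliasing concern, here is a small sketch with a hypothetical macro MUL_2X1 that has the documented semantics, (r2, r1, r0) = (a1, a0) * b0, with every argument a single 64-bit limb. It is not arb's definition of nn_mul_2x1, and it assumes unsigned __int128 (GCC/Clang). Because the low output limbs are written before a1 is read, passing the same variables as inputs and outputs gives a wrong result.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical stand-in for a macro like nn_mul_2x1.
   r0 and r1 are assigned before a1 is read, so the output
   arguments must not alias the input arguments. */
#define MUL_2X1(r2, r1, r0, a1, a0, b0)                 \
    do {                                                \
        u128 lo_ = (u128) (a0) * (b0);                  \
        (r0) = (uint64_t) lo_;                          \
        (r1) = (uint64_t) (lo_ >> 64);                  \
        u128 hi_ = (u128) (a1) * (b0) + (r1);           \
        (r1) = (uint64_t) hi_;                          \
        (r2) = (uint64_t) (hi_ >> 64);                  \
    } while (0)

/* Correct use: outputs go to storage distinct from the inputs. */
static void mul_2x1_separate(uint64_t r[3], uint64_t a1, uint64_t a0,
                             uint64_t b0)
{
    MUL_2X1(r[2], r[1], r[0], a1, a0, b0);
}

/* Incorrect use: the low two output limbs overwrite the input limbs. */
static void mul_2x1_aliased(uint64_t r[3], uint64_t a1, uint64_t a0,
                            uint64_t b0)
{
    uint64_t x1 = a1, x0 = a0;

    MUL_2X1(r[2], x1, x0, x1, x0, b0);
    r[1] = x1;
    r[0] = x0;
}
```

With a1 = 3, a0 = 2^63 and b0 = 4, the separate call yields the limbs (0, 14, 0), while the aliased call yields (0, 10, 0), since a1 has already been overwritten with the high half of a0 * b0 by the time it is multiplied.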

fredrik-johansson · 2021-12-22T10:37:19Z

doc/source/arf.rst


-    Sets *res* to `-x`.
+    Sets *res* to `-x`. Returns 0 if this operation was made exactly and 1 if
+    truncation occurred.


"truncation" is the wrong term (means rounding down) - should be "rounding".

fredrik-johansson · 2021-12-22T10:41:20Z

mag.h

+{
+    slong cx = *x;
+
+    if (!COEFF_IS_MPZ(*res) && (cx > COEFF_MIN / 2 && cx < COEFF_MAX / 2))


These conditions are wrong. See _fmpz_add2_fast. Anyway, I think this function is not really needed; the compiler should be able to eliminate the redundant branches in _fmpz_add2_fast when passed the same operand twice.

It actually has one property that I do not think the compiler can derive on its own: it allows cx to be as large as COEFF_MAX / 2 instead of COEFF_MAX / 4.

_fmpz_add2_fast/_fmpz_sub2_fast need to work for values of c larger than 1 in absolute value. I think it's confusing if _fmpz_2times_fast, in contrast to the other functions, only supports |c| <= 1.

In practice, it makes no difference to go up to COEFF_MAX / 2 instead of COEFF_MAX / 4 since such huge exponents are very rare. We only care about optimizing for small exponents here. Better to be uniform and conservative.

Considering this, I don't really think this extra function is needed.

Clearly these functions and their assumptions should be documented...

Okay, will remove it then.
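The trade-off discussed in this thread can be sketched with plain 64-bit integers. Everything below is a hypothetical model, not code from arb or FLINT: SMALL_MAX is assumed to mirror COEFF_MAX on a 64-bit build (2^62 - 1), and the function names are invented for illustration. The point is that a bound of SMALL_MAX / 2 on x leaves almost no headroom for c in 2*x + c, while the conservative SMALL_MAX / 4 bound leaves room for a sizeable c.

```c
#include <stdint.h>

/* Assumed small-value range, modelled on COEFF_MAX / COEFF_MIN
   for 64-bit limbs. Hypothetical names. */
#define SMALL_MAX ((INT64_C(1) << 62) - 1)
#define SMALL_MIN (-SMALL_MAX)

/* Does v fit in the small-value range? */
static int fits_small(int64_t v)
{
    return v >= SMALL_MIN && v <= SMALL_MAX;
}

/* Conservative bound: |x| < SMALL_MAX / 4. */
static int in_quarter_range(int64_t x)
{
    return x > SMALL_MIN / 4 && x < SMALL_MAX / 4;
}

/* Tighter bound, as in the diff hunk: |x| < SMALL_MAX / 2. */
static int in_half_range(int64_t x)
{
    return x > SMALL_MIN / 2 && x < SMALL_MAX / 2;
}

/* Fast path for 2*x + c. Returns 1 and stores the result if the
   conservative bounds hold; returns 0 if the caller has to fall
   back to a general (slow) routine. With |x| < SMALL_MAX / 4 and
   |c| < SMALL_MAX / 2 the result always stays in the small range. */
static int twice_plus_c_fast(int64_t *res, int64_t x, int64_t c)
{
    if (!in_quarter_range(x) || !in_half_range(c))
        return 0;
    *res = 2 * x + c;
    return 1;
}
```

For example, x = SMALL_MAX / 2 - 1 passes the half-range test, yet 2*x + 100 no longer fits in the small range, so a fast path guarded only by that test would store an out-of-range value for such c.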

fredrik-johansson · 2021-12-22T10:42:09Z

Did you do some profiling?

albinahlback added 12 commits December 20, 2021 22:28

Add nn_sqr_2 and arf_sqr_special

500c3ee

Fix faulty comment

0bc4896

Add test for nn_sqr_2

135a520

Split up pow_fmpz_binexp into a mpz and ui version

5f9770c

And inline arb_pow_[fmpz_binexp/fmpz/ui].

Add _fmpz_2times_fast for setting 2*x+c

37fba37

This is going to be used for adding exponents when squaring

Remove unneccessary declaration

363527b

Add arf_sqr_via_mpfr and arf_sqr_rnd_down

241252f

Remove sgnbit in arf_sqr_rnd_down

56201d2

Add proper arb_sqr

872863b

Add docstrings for the new functions

487051b

And fill some gaps in the documentation for arf.h.

Fix test for arb_sqr

efd8f3b

Fix arf_sqr_special and de-inline it

780afe8
