diff --git a/cue-fix-clang-tidy e172ebf b/cue-fix-clang-tidy e172ebf new file mode 100644 index 000000000..333a0b576 --- /dev/null +++ b/cue-fix-clang-tidy e172ebf @@ -0,0 +1,258 @@ + + SSUUMMMMAARRYY OOFF LLEESSSS CCOOMMMMAANNDDSS + + Commands marked with * may be preceded by a number, _N. + Notes in parentheses indicate the behavior if _N is given. + A key preceded by a caret indicates the Ctrl key; thus ^K is ctrl-K. + + h H Display this help. + q :q Q :Q ZZ Exit. + --------------------------------------------------------------------------- + + MMOOVVIINNGG + + e ^E j ^N CR * Forward one line (or _N lines). + y ^Y k ^K ^P * Backward one line (or _N lines). + f ^F ^V SPACE * Forward one window (or _N lines). + b ^B ESC-v * Backward one window (or _N lines). + z * Forward one window (and set window to _N). + w * Backward one window (and set window to _N). + ESC-SPACE * Forward one window, but don't stop at end-of-file. + d ^D * Forward one half-window (and set half-window to _N). + u ^U * Backward one half-window (and set half-window to _N). + ESC-) RightArrow * Right one half screen width (or _N positions). + ESC-( LeftArrow * Left one half screen width (or _N positions). + ESC-} ^RightArrow Right to last column displayed. + ESC-{ ^LeftArrow Left to first column. + F Forward forever; like "tail -f". + ESC-F Like F but stop when search pattern is found. + r ^R ^L Repaint screen. + R Repaint screen, discarding buffered input. + --------------------------------------------------- + Default "window" is the screen height. + Default "half-window" is half of the screen height. + --------------------------------------------------------------------------- + + SSEEAARRCCHHIINNGG + + /_p_a_t_t_e_r_n * Search forward for (_N-th) matching line. + ?_p_a_t_t_e_r_n * Search backward for (_N-th) matching line. + n * Repeat previous search (for _N-th occurrence). + N * Repeat previous search in reverse direction. + ESC-n * Repeat previous search, spanning files. + ESC-N * Repeat previous search, reverse dir. & spanning files. + ESC-u Undo (toggle) search highlighting. + ESC-U Clear search highlighting. + &_p_a_t_t_e_r_n * Display only matching lines. + --------------------------------------------------- + A search pattern may begin with one or more of: + ^N or ! Search for NON-matching lines. + ^E or * Search multiple files (pass thru END OF FILE). + ^F or @ Start search at FIRST file (for /) or last file (for ?). + ^K Highlight matches, but don't move (KEEP position). + ^R Don't use REGULAR EXPRESSIONS. + ^W WRAP search if no match found. + --------------------------------------------------------------------------- + + JJUUMMPPIINNGG + + g < ESC-< * Go to first line in file (or line _N). + G > ESC-> * Go to last line in file (or line _N). + p % * Go to beginning of file (or _N percent into file). + t * Go to the (_N-th) next tag. + T * Go to the (_N-th) previous tag. + { ( [ * Find close bracket } ) ]. + } ) ] * Find open bracket { ( [. + ESC-^F _<_c_1_> _<_c_2_> * Find close bracket _<_c_2_>. + ESC-^B _<_c_1_> _<_c_2_> * Find open bracket _<_c_1_>. + --------------------------------------------------- + Each "find close bracket" command goes forward to the close bracket + matching the (_N-th) open bracket in the top line. + Each "find open bracket" command goes backward to the open bracket + matching the (_N-th) close bracket in the bottom line. + + m_<_l_e_t_t_e_r_> Mark the current top line with . + M_<_l_e_t_t_e_r_> Mark the current bottom line with . + '_<_l_e_t_t_e_r_> Go to a previously marked position. + '' Go to the previous position. + ^X^X Same as '. + ESC-M_<_l_e_t_t_e_r_> Clear a mark. + --------------------------------------------------- + A mark is any upper-case or lower-case letter. + Certain marks are predefined: + ^ means beginning of the file + $ means end of the file + --------------------------------------------------------------------------- + + CCHHAANNGGIINNGG FFIILLEESS + + :e [_f_i_l_e] Examine a new file. + ^X^V Same as :e. + :n * Examine the (_N-th) next file from the command line. + :p * Examine the (_N-th) previous file from the command line. + :x * Examine the first (or _N-th) file from the command line. + :d Delete the current file from the command line list. + = ^G :f Print current file name. + --------------------------------------------------------------------------- + + MMIISSCCEELLLLAANNEEOOUUSS CCOOMMMMAANNDDSS + + -_<_f_l_a_g_> Toggle a command line option [see OPTIONS below]. + --_<_n_a_m_e_> Toggle a command line option, by name. + __<_f_l_a_g_> Display the setting of a command line option. + ___<_n_a_m_e_> Display the setting of an option, by name. + +_c_m_d Execute the less cmd each time a new file is examined. + + !_c_o_m_m_a_n_d Execute the shell command with $SHELL. + |XX_c_o_m_m_a_n_d Pipe file between current pos & mark XX to shell command. + s _f_i_l_e Save input to a file. + v Edit the current file with $VISUAL or $EDITOR. + V Print version number of "less". + --------------------------------------------------------------------------- + + OOPPTTIIOONNSS + + Most options may be changed either on the command line, + or from within less by using the - or -- command. + Options may be given in one of two forms: either a single + character preceded by a -, or a name preceded by --. + + -? ........ --help + Display help (from command line). + -a ........ --search-skip-screen + Search skips current screen. + -A ........ --SEARCH-SKIP-SCREEN + Search starts just after target line. + -b [_N] .... --buffers=[_N] + Number of buffers. + -B ........ --auto-buffers + Don't automatically allocate buffers for pipes. + -c ........ --clear-screen + Repaint by clearing rather than scrolling. + -d ........ --dumb + Dumb terminal. + -D xx_c_o_l_o_r . --color=xx_c_o_l_o_r + Set screen colors. + -e -E .... --quit-at-eof --QUIT-AT-EOF + Quit at end of file. + -f ........ --force + Force open non-regular files. + -F ........ --quit-if-one-screen + Quit if entire file fits on first screen. + -g ........ --hilite-search + Highlight only last match for searches. + -G ........ --HILITE-SEARCH + Don't highlight any matches for searches. + -h [_N] .... --max-back-scroll=[_N] + Backward scroll limit. + -i ........ --ignore-case + Ignore case in searches that do not contain uppercase. + -I ........ --IGNORE-CASE + Ignore case in all searches. + -j [_N] .... --jump-target=[_N] + Screen position of target lines. + -J ........ --status-column + Display a status column at left edge of screen. + -k [_f_i_l_e] . --lesskey-file=[_f_i_l_e] + Use a lesskey file. + -K ........ --quit-on-intr + Exit less in response to ctrl-C. + -L ........ --no-lessopen + Ignore the LESSOPEN environment variable. + -m -M .... --long-prompt --LONG-PROMPT + Set prompt style. + -n -N .... --line-numbers --LINE-NUMBERS + Don't use line numbers. + -o [_f_i_l_e] . --log-file=[_f_i_l_e] + Copy to log file (standard input only). + -O [_f_i_l_e] . --LOG-FILE=[_f_i_l_e] + Copy to log file (unconditionally overwrite). + -p [_p_a_t_t_e_r_n] --pattern=[_p_a_t_t_e_r_n] + Start at pattern (from command line). + -P [_p_r_o_m_p_t] --prompt=[_p_r_o_m_p_t] + Define new prompt. + -q -Q .... --quiet --QUIET --silent --SILENT + Quiet the terminal bell. + -r -R .... --raw-control-chars --RAW-CONTROL-CHARS + Output "raw" control characters. + -s ........ --squeeze-blank-lines + Squeeze multiple blank lines. + -S ........ --chop-long-lines + Chop (truncate) long lines rather than wrapping. + -t [_t_a_g] .. --tag=[_t_a_g] + Find a tag. + -T [_t_a_g_s_f_i_l_e] --tag-file=[_t_a_g_s_f_i_l_e] + Use an alternate tags file. + -u -U .... --underline-special --UNDERLINE-SPECIAL + Change handling of backspaces. + -V ........ --version + Display the version number of "less". + -w ........ --hilite-unread + Highlight first new line after forward-screen. + -W ........ --HILITE-UNREAD + Highlight first new line after any forward movement. + -x [_N[,...]] --tabs=[_N[,...]] + Set tab stops. + -X ........ --no-init + Don't use termcap init/deinit strings. + -y [_N] .... --max-forw-scroll=[_N] + Forward scroll limit. + -z [_N] .... --window=[_N] + Set size of window. + -" [_c[_c]] . --quotes=[_c[_c]] + Set shell quote characters. + -~ ........ --tilde + Don't display tildes after end of file. + -# [_N] .... --shift=[_N] + Set horizontal scroll amount (0 = one half screen width). + --file-size + Automatically determine the size of the input file. + --follow-name + The F command changes files if the input file is renamed. + --incsearch + Search file as each pattern character is typed in. + --line-num-width=N + Set the width of the -N line number field to N characters. + --mouse + Enable mouse input. + --no-keypad + Don't send termcap keypad init/deinit strings. + --no-histdups + Remove duplicates from command history. + --rscroll=C + Set the character used to mark truncated lines. + --save-marks + Retain marks across invocations of less. + --status-col-width=N + Set the width of the -J status column to N characters. + --use-backslash + Subsequent options use backslash as escape char. + --use-color + Enables colored text. + --wheel-lines=N + Each click of the mouse wheel moves N lines. + + + --------------------------------------------------------------------------- + + LLIINNEE EEDDIITTIINNGG + + These keys can be used to edit text being entered + on the "command line" at the bottom of the screen. + + RightArrow ..................... ESC-l ... Move cursor right one character. + LeftArrow ...................... ESC-h ... Move cursor left one character. + ctrl-RightArrow ESC-RightArrow ESC-w ... Move cursor right one word. + ctrl-LeftArrow ESC-LeftArrow ESC-b ... Move cursor left one word. + HOME ........................... ESC-0 ... Move cursor to start of line. + END ............................ ESC-$ ... Move cursor to end of line. + BACKSPACE ................................ Delete char to left of cursor. + DELETE ......................... ESC-x ... Delete char under cursor. + ctrl-BACKSPACE ESC-BACKSPACE ........... Delete word to left of cursor. + ctrl-DELETE .... ESC-DELETE .... ESC-X ... Delete word under cursor. + ctrl-U ......... ESC (MS-DOS only) ....... Delete entire line. + UpArrow ........................ ESC-k ... Retrieve previous command line. + DownArrow ...................... ESC-j ... Retrieve next command line. + TAB ...................................... Complete filename & cycle. + SHIFT-TAB ...................... ESC-TAB Complete filename & reverse cycle. + ctrl-L ................................... Complete filename, list all. diff --git a/src/ailego/math/euclidean_distance_matrix.h b/src/ailego/math/euclidean_distance_matrix.h index e77409360..969c5532c 100644 --- a/src/ailego/math/euclidean_distance_matrix.h +++ b/src/ailego/math/euclidean_distance_matrix.h @@ -480,8 +480,6 @@ struct EuclideanDistanceMatrix { static void Compute(const ValueType *m, const ValueType *q, size_t dim, float *out); }; - - //-------------------------------------------------- // Sparse //-------------------------------------------------- diff --git a/src/ailego/math/euclidean_distance_matrix_fp16_dispatch.cc b/src/ailego/math/euclidean_distance_matrix_fp16_dispatch.cc index fb145265e..3b592bd1c 100644 --- a/src/ailego/math/euclidean_distance_matrix_fp16_dispatch.cc +++ b/src/ailego/math/euclidean_distance_matrix_fp16_dispatch.cc @@ -23,6 +23,12 @@ float SquaredEuclideanDistanceFp16NEON(const Float16 *lhs, const Float16 *rhs, size_t size); #endif +#if defined(__riscv_zvfh) +float SquaredEuclideanDistanceRVV(const Float16 *lhs, const Float16 *rhs, + size_t size); +float EuclideanDistanceRVV(const Float16 *lhs, const Float16 *rhs, size_t size); +#endif + #if defined(__AVX512FP16__) float SquaredEuclideanDistanceFp16AVX512FP16(const Float16 *lhs, const Float16 *rhs, size_t size); @@ -41,11 +47,19 @@ float SquaredEuclideanDistanceFp16AVX(const Float16 *lhs, const Float16 *rhs, float SquaredEuclideanDistanceFp16Scalar(const Float16 *lhs, const Float16 *rhs, size_t size); +#if (defined(__F16C__) && defined(__AVX__)) || \ + (defined(__ARM_NEON) && defined(__aarch64__)) || defined(__riscv_zvfh) //! Compute the distance between matrix and query (FP16, M=1, N=1) void SquaredEuclideanDistanceMatrix::Compute(const ValueType *m, const ValueType *q, size_t dim, float *out) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + *out = SquaredEuclideanDistanceRVV(m, q, dim); + return; + } +#endif #if defined(__ARM_NEON) *out = SquaredEuclideanDistanceFp16NEON(m, q, dim); #else @@ -69,7 +83,6 @@ void SquaredEuclideanDistanceMatrix::Compute(const ValueType *m, } #endif *out = SquaredEuclideanDistanceFp16Scalar(m, q, dim); - #endif //__ARM_NEON } @@ -77,9 +90,23 @@ void SquaredEuclideanDistanceMatrix::Compute(const ValueType *m, void EuclideanDistanceMatrix::Compute(const ValueType *m, const ValueType *q, size_t dim, float *out) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + *out = EuclideanDistanceRVV(m, q, dim); + return; + } +#if !defined(__ARM_NEON) && !defined(__AVX__) + float sum = 0.0f; + for (size_t i = 0; i < dim; ++i) { + sum += MathHelper::SquaredDifference(m[i], q[i]); + } + *out = std::sqrt(sum); + return; +#endif +#endif SquaredEuclideanDistanceMatrix::Compute(m, q, dim, out); *out = std::sqrt(*out); } } // namespace ailego -} // namespace zvec \ No newline at end of file +} // namespace zvec diff --git a/src/ailego/math/euclidean_distance_matrix_fp16_rvv.cc b/src/ailego/math/euclidean_distance_matrix_fp16_rvv.cc new file mode 100644 index 000000000..f88603d2d --- /dev/null +++ b/src/ailego/math/euclidean_distance_matrix_fp16_rvv.cc @@ -0,0 +1,64 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include +#include "euclidean_distance_matrix.h" + +namespace zvec { +namespace ailego { + +#if defined(__riscv_zvfh) +namespace { + +static inline float SquaredEuclideanDistanceRVVImpl(const Float16 *lhs, + const Float16 *rhs, + size_t size) { + const _Float16 *lhs_fp16 = reinterpret_cast(lhs); + const _Float16 *rhs_fp16 = reinterpret_cast(rhs); + const size_t vlmax = __riscv_vsetvlmax_e16m4(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (size != 0) { + const size_t vl = __riscv_vsetvl_e16m4(size); + vfloat16m4_t v_lhs = __riscv_vle16_v_f16m4(lhs_fp16, vl); + vfloat16m4_t v_rhs = __riscv_vle16_v_f16m4(rhs_fp16, vl); + vfloat32m8_t v_diff = __riscv_vfwsub_vv_f32m8(v_lhs, v_rhs, vl); + v_sum = __riscv_vfmacc_vv_f32m8_tu(v_sum, v_diff, v_diff, vl); + lhs_fp16 += vl; + rhs_fp16 += vl; + size -= vl; + } + + vfloat32m1_t v_zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t v_red = __riscv_vfredusum_vs_f32m8_f32m1(v_sum, v_zero, vlmax); + return __riscv_vfmv_f_s_f32m1_f32(v_red); +} + +} // namespace + +float SquaredEuclideanDistanceRVV(const Float16 *lhs, const Float16 *rhs, + size_t size) { + return SquaredEuclideanDistanceRVVImpl(lhs, rhs, size); +} + +float EuclideanDistanceRVV(const Float16 *lhs, const Float16 *rhs, + size_t size) { + return std::sqrt(SquaredEuclideanDistanceRVVImpl(lhs, rhs, size)); +} + +#endif // __riscv_zvfh + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/euclidean_distance_matrix_fp32_dispatch.cc b/src/ailego/math/euclidean_distance_matrix_fp32_dispatch.cc index cc3044389..76b679a2b 100644 --- a/src/ailego/math/euclidean_distance_matrix_fp32_dispatch.cc +++ b/src/ailego/math/euclidean_distance_matrix_fp32_dispatch.cc @@ -18,6 +18,12 @@ namespace zvec { namespace ailego { +#if defined(__riscv_vector) +float SquaredEuclideanDistanceRVV(const float *lhs, const float *rhs, + size_t size); +float EuclideanDistanceRVV(const float *lhs, const float *rhs, size_t size); +#endif + #if defined(__ARM_NEON) void SquaredEuclideanDistanceFp32NEON(const float *lhs, const float *rhs, size_t size, float *out); @@ -44,20 +50,31 @@ float SquaredEuclideanDistanceFp32Scalar(const float *lhs, const float *rhs, //----------------------------------------------------------- // SquaredEuclideanDistance //----------------------------------------------------------- +#if defined(__SSE__) || defined(__ARM_NEON) || defined(__riscv_vector) //! Compute the distance between matrix and query (FP32, M=1, N=1) void SquaredEuclideanDistanceMatrix::Compute(const ValueType *m, const ValueType *q, size_t dim, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = SquaredEuclideanDistanceRVV(m, q, dim); + return; + } +#endif // __riscv_vector + #if defined(__ARM_NEON) SquaredEuclideanDistanceFp32NEON(m, q, dim, out); -#else + return; +#endif // __ARM_NEON + #if defined(__AVX512F__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX512F) { *out = SquaredEuclideanDistanceFp32AVX512(m, q, dim); return; } #endif // __AVX512F__ + #if defined(__AVX__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX) { *out = SquaredEuclideanDistanceFp32AVX(m, q, dim); @@ -71,9 +88,10 @@ void SquaredEuclideanDistanceMatrix::Compute(const ValueType *m, return; } #endif // __SSE__ + *out = SquaredEuclideanDistanceFp32Scalar(m, q, dim); -#endif // __ARM_NEON } +#endif // __SSE__ || __ARM_NEON || __riscv_vector //----------------------------------------------------------- // EuclideanDistance @@ -82,9 +100,16 @@ void SquaredEuclideanDistanceMatrix::Compute(const ValueType *m, void EuclideanDistanceMatrix::Compute(const ValueType *m, const ValueType *q, size_t dim, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = EuclideanDistanceRVV(m, q, dim); + return; + } +#endif // __riscv_vector + SquaredEuclideanDistanceMatrix::Compute(m, q, dim, out); *out = std::sqrt(*out); } } // namespace ailego -} // namespace zvec \ No newline at end of file +} // namespace zvec diff --git a/src/ailego/math/euclidean_distance_matrix_fp32_rvv.cc b/src/ailego/math/euclidean_distance_matrix_fp32_rvv.cc new file mode 100644 index 000000000..c4833149d --- /dev/null +++ b/src/ailego/math/euclidean_distance_matrix_fp32_rvv.cc @@ -0,0 +1,64 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include +#include "distance_matrix_euclidean_utility.i" +#include "euclidean_distance_matrix.h" + +namespace zvec { +namespace ailego { + +#if defined(__riscv_vector) +namespace { + +static inline float SquaredEuclideanDistanceRVVImpl(const float *lhs, + const float *rhs, + size_t size) { + const size_t vlmax = __riscv_vsetvlmax_e32m8(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (size != 0) { + size_t vl = __riscv_vsetvl_e32m8(size); + vfloat32m8_t v_lhs = __riscv_vle32_v_f32m8(lhs, vl); + vfloat32m8_t v_rhs = __riscv_vle32_v_f32m8(rhs, vl); + vfloat32m8_t v_d = __riscv_vfsub_vv_f32m8(v_lhs, v_rhs, vl); + v_sum = __riscv_vfmacc_vv_f32m8_tu(v_sum, v_d, v_d, vl); + lhs += vl; + rhs += vl; + size -= vl; + } + + vfloat32m1_t v_zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t v_red = __riscv_vfredusum_vs_f32m8_f32m1(v_sum, v_zero, vlmax); + return __riscv_vfmv_f_s_f32m1_f32(v_red); +} + +} // namespace + +//! Squared Euclidean Distance +float SquaredEuclideanDistanceRVV(const float *lhs, const float *rhs, + size_t size) { + return SquaredEuclideanDistanceRVVImpl(lhs, rhs, size); +} + +//! Euclidean Distance +float EuclideanDistanceRVV(const float *lhs, const float *rhs, size_t size) { + return std::sqrt(SquaredEuclideanDistanceRVVImpl(lhs, rhs, size)); +} + +#endif // __riscv_vector + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/euclidean_distance_matrix_int8_dispatch.cc b/src/ailego/math/euclidean_distance_matrix_int8_dispatch.cc index d64ca1efc..b5705d528 100644 --- a/src/ailego/math/euclidean_distance_matrix_int8_dispatch.cc +++ b/src/ailego/math/euclidean_distance_matrix_int8_dispatch.cc @@ -18,6 +18,12 @@ namespace zvec { namespace ailego { +#if defined(__riscv_vector) +float SquaredEuclideanDistanceRVV(const int8_t *lhs, const int8_t *rhs, + size_t size); +float EuclideanDistanceRVV(const int8_t *lhs, const int8_t *rhs, size_t size); +#endif + #if defined(__AVX2__) float SquaredEuclideanDistanceInt8AVX2(const int8_t *lhs, const int8_t *rhs, size_t size); @@ -36,6 +42,12 @@ void SquaredEuclideanDistanceMatrix::Compute(const ValueType *m, const ValueType *q, size_t dim, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = SquaredEuclideanDistanceRVV(m, q, dim); + return; + } +#endif #if defined(__AVX2__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX2) { *out = SquaredEuclideanDistanceInt8AVX2(m, q, dim); @@ -57,9 +69,15 @@ void SquaredEuclideanDistanceMatrix::Compute(const ValueType *m, void EuclideanDistanceMatrix::Compute(const ValueType *m, const ValueType *q, size_t dim, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = EuclideanDistanceRVV(m, q, dim); + return; + } +#endif SquaredEuclideanDistanceMatrix::Compute(m, q, dim, out); *out = std::sqrt(*out); } } // namespace ailego -} // namespace zvec \ No newline at end of file +} // namespace zvec diff --git a/src/ailego/math/euclidean_distance_matrix_int8_rvv.cc b/src/ailego/math/euclidean_distance_matrix_int8_rvv.cc new file mode 100644 index 000000000..31ebe716a --- /dev/null +++ b/src/ailego/math/euclidean_distance_matrix_int8_rvv.cc @@ -0,0 +1,61 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include +#include "euclidean_distance_matrix.h" + +namespace zvec { +namespace ailego { + +#if defined(__riscv_vector) +namespace { + +static inline float SquaredEuclideanDistanceRVVImpl(const int8_t *lhs, + const int8_t *rhs, + size_t size) { + const size_t vlmax = __riscv_vsetvlmax_e8m2(); + vint32m8_t v_sum = __riscv_vmv_v_x_i32m8(0, vlmax); + + while (size != 0) { + const size_t vl = __riscv_vsetvl_e8m2(size); + vint8m2_t v_lhs = __riscv_vle8_v_i8m2(lhs, vl); + vint8m2_t v_rhs = __riscv_vle8_v_i8m2(rhs, vl); + vint16m4_t v_d = __riscv_vwsub_vv_i16m4(v_lhs, v_rhs, vl); + v_sum = __riscv_vwmacc_vv_i32m8_tu(v_sum, v_d, v_d, vl); + lhs += vl; + rhs += vl; + size -= vl; + } + + vint32m1_t v_zero = __riscv_vmv_v_x_i32m1(0, 1); + vint32m1_t v_red = __riscv_vredsum_vs_i32m8_i32m1(v_sum, v_zero, vlmax); + return static_cast(__riscv_vmv_x_s_i32m1_i32(v_red)); +} + +} // namespace + +float SquaredEuclideanDistanceRVV(const int8_t *lhs, const int8_t *rhs, + size_t size) { + return SquaredEuclideanDistanceRVVImpl(lhs, rhs, size); +} + +float EuclideanDistanceRVV(const int8_t *lhs, const int8_t *rhs, size_t size) { + return std::sqrt(SquaredEuclideanDistanceRVVImpl(lhs, rhs, size)); +} + +#endif // __riscv_vector + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/inner_product_matrix_fp16_dispatch.cc b/src/ailego/math/inner_product_matrix_fp16_dispatch.cc index 3c46bc32b..d45934e3c 100644 --- a/src/ailego/math/inner_product_matrix_fp16_dispatch.cc +++ b/src/ailego/math/inner_product_matrix_fp16_dispatch.cc @@ -27,6 +27,11 @@ float MinusInnerProductFp16NEON(const Float16 *lhs, const Float16 *rhs, size_t size); #endif +#if defined(__riscv_zvfh) +float InnerProductRVV(const Float16 *lhs, const Float16 *rhs, size_t size); +float MinusInnerProductRVV(const Float16 *lhs, const Float16 *rhs, size_t size); +#endif + #if defined(__AVX__) float InnerProductFp16AVX(const Float16 *lhs, const Float16 *rhs, size_t size); float MinusInnerProductFp16AVX(const Float16 *lhs, const Float16 *rhs, @@ -56,6 +61,13 @@ float MinusInnerProductFp16Scalar(const Float16 *lhs, const Float16 *rhs, void InnerProductMatrix::Compute(const ValueType *m, const ValueType *q, size_t dim, float *out) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + *out = InnerProductRVV(m, q, dim); + return; + } +#endif // __riscv_zvfh + #if defined(__ARM_NEON) *out = InnerProductFp16NEON(m, q, dim); #else @@ -64,28 +76,34 @@ void InnerProductMatrix::Compute(const ValueType *m, *out = InnerProductFp16AVX512FP16(m, q, dim); return; } -#endif //__AVX512FP16__ +#endif // __AVX512FP16__ #if defined(__AVX512F__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX512F) { *out = InnerProductFp16AVX512(m, q, dim); return; } -#endif //__AVX512F__ +#endif // __AVX512F__ #if defined(__AVX__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX) { *out = InnerProductFp16AVX(m, q, dim); return; } -#endif //__AVX__ +#endif // __AVX__ *out = InnerProductFp16Scalar(m, q, dim); - -#endif //__ARM_NEON +#endif // __ARM_NEON } //! Compute the distance between matrix and query (FP16, M=1, N=1) void MinusInnerProductMatrix::Compute(const ValueType *m, const ValueType *q, size_t dim, float *out) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + *out = MinusInnerProductRVV(m, q, dim); + return; + } +#endif // __riscv_zvfh + #if defined(__ARM_NEON) *out = MinusInnerProductFp16NEON(m, q, dim); #else @@ -94,23 +112,22 @@ void MinusInnerProductMatrix::Compute(const ValueType *m, *out = MinusInnerProductFp16AVX512FP16(m, q, dim); return; } -#endif //__AVX512FP16__ +#endif // __AVX512FP16__ #if defined(__AVX512F__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX512F) { *out = MinusInnerProductFp16AVX512(m, q, dim); return; } -#endif //__AVX512F__ +#endif // __AVX512F__ #if defined(__AVX__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX) { *out = MinusInnerProductFp16AVX(m, q, dim); return; } -#endif //__AVX__ +#endif // __AVX__ *out = MinusInnerProductFp16Scalar(m, q, dim); - -#endif //__ARM_NEON +#endif // __ARM_NEON } //-------------------------------------------------- @@ -123,7 +140,7 @@ float InnerProductSparseInSegmentFp16AVX512FP16(uint32_t m_sparse_count, uint32_t q_sparse_count, const uint16_t *q_sparse_index, const Float16 *q_sparse_value); -#endif //__AVX512FP16__ +#endif // __AVX512FP16__ #if defined(__AVX__) float InnerProductSparseInSegmentFp16AVX(uint32_t m_sparse_count, @@ -132,7 +149,7 @@ float InnerProductSparseInSegmentFp16AVX(uint32_t m_sparse_count, uint32_t q_sparse_count, const uint16_t *q_sparse_index, const Float16 *q_sparse_value); -#endif //__AVX__ +#endif // __AVX__ float InnerProductSparseInSegmentFp16Scalar(uint32_t m_sparse_count, const uint16_t *m_sparse_index, @@ -162,14 +179,14 @@ float ComputeInnerProductSparseInSegmentFp16(uint32_t m_sparse_count, m_sparse_count, m_sparse_index, m_sparse_value, q_sparse_count, q_sparse_index, q_sparse_value); } -#endif +#endif // __AVX512FP16__ #if defined(__AVX__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX) { return InnerProductSparseInSegmentFp16AVX(m_sparse_count, m_sparse_index, m_sparse_value, q_sparse_count, q_sparse_index, q_sparse_value); } -#endif +#endif // __AVX__ return InnerProductSparseInSegmentFp16Scalar(m_sparse_count, m_sparse_index, m_sparse_value, q_sparse_count, q_sparse_index, q_sparse_value); diff --git a/src/ailego/math/inner_product_matrix_fp16_rvv.cc b/src/ailego/math/inner_product_matrix_fp16_rvv.cc new file mode 100644 index 000000000..2f059a0f3 --- /dev/null +++ b/src/ailego/math/inner_product_matrix_fp16_rvv.cc @@ -0,0 +1,60 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include "inner_product_matrix.h" + +namespace zvec { +namespace ailego { + +#if defined(__riscv_zvfh) +namespace { + +static inline float InnerProductRVVImpl(const Float16 *lhs, const Float16 *rhs, + size_t size) { + const _Float16 *lhs_fp16 = reinterpret_cast(lhs); + const _Float16 *rhs_fp16 = reinterpret_cast(rhs); + const size_t vlmax = __riscv_vsetvlmax_e16m4(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (size != 0) { + const size_t vl = __riscv_vsetvl_e16m4(size); + vfloat16m4_t v_lhs = __riscv_vle16_v_f16m4(lhs_fp16, vl); + vfloat16m4_t v_rhs = __riscv_vle16_v_f16m4(rhs_fp16, vl); + v_sum = __riscv_vfwmacc_vv_f32m8_tu(v_sum, v_lhs, v_rhs, vl); + lhs_fp16 += vl; + rhs_fp16 += vl; + size -= vl; + } + + vfloat32m1_t v_zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t v_red = __riscv_vfredusum_vs_f32m8_f32m1(v_sum, v_zero, vlmax); + return __riscv_vfmv_f_s_f32m1_f32(v_red); +} + +} // namespace + +float InnerProductRVV(const Float16 *lhs, const Float16 *rhs, size_t size) { + return InnerProductRVVImpl(lhs, rhs, size); +} + +float MinusInnerProductRVV(const Float16 *lhs, const Float16 *rhs, + size_t size) { + return -InnerProductRVVImpl(lhs, rhs, size); +} + +#endif // __riscv_zvfh + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/inner_product_matrix_fp32_dispatch.cc b/src/ailego/math/inner_product_matrix_fp32_dispatch.cc index 8b289b6e6..90ef826e7 100644 --- a/src/ailego/math/inner_product_matrix_fp32_dispatch.cc +++ b/src/ailego/math/inner_product_matrix_fp32_dispatch.cc @@ -17,9 +17,15 @@ namespace zvec { namespace ailego { + //-------------------------------------------------- // Dense //-------------------------------------------------- +#if defined(__riscv_vector) +float InnerProductRVV(const float *lhs, const float *rhs, size_t size); +float MinusInnerProductRVV(const float *lhs, const float *rhs, size_t size); +#endif + #if defined(__ARM_NEON) float InnerProductFp32NEON(const float *lhs, const float *rhs, size_t size); float MinusInnerProductFp32NEON(const float *lhs, const float *rhs, @@ -49,6 +55,13 @@ float MinusInnerProductFp32Scalar(const float *lhs, const float *rhs, //! Compute the distance between matrix and query (FP32, M=1, N=1) void InnerProductMatrix::Compute(const float *m, const float *q, size_t dim, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = InnerProductRVV(m, q, dim); + return; + } +#endif // __riscv_vector + #if defined(__ARM_NEON) *out = InnerProductFp32NEON(m, q, dim); #else @@ -72,6 +85,7 @@ void InnerProductMatrix::Compute(const float *m, const float *q, return; } #endif // __SSE__ + *out = InnerProductFp32Scalar(m, q, dim); #endif // __ARM_NEON } @@ -80,6 +94,13 @@ void InnerProductMatrix::Compute(const float *m, const float *q, void MinusInnerProductMatrix::Compute(const float *m, const float *q, size_t dim, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = MinusInnerProductRVV(m, q, dim); + return; + } +#endif // __riscv_vector + #if defined(__ARM_NEON) *out = MinusInnerProductFp32NEON(m, q, dim); #else @@ -103,6 +124,7 @@ void MinusInnerProductMatrix::Compute(const float *m, return; } #endif // __SSE__ + *out = MinusInnerProductFp32Scalar(m, q, dim); #endif // __ARM_NEON } @@ -118,6 +140,7 @@ float InnerProductSparseInSegmentFp32SSE(uint32_t m_sparse_count, const uint16_t *q_sparse_index, const float *q_sparse_value); #endif + float InnerProductSparseInSegmentFp32Scalar(uint32_t m_sparse_count, const uint16_t *m_sparse_index, const float *m_sparse_value, @@ -147,9 +170,11 @@ float ComputeInnerProductSparseInSegmentFp32(uint32_t m_sparse_count, q_sparse_index, q_sparse_value); } #endif + return InnerProductSparseInSegmentFp32Scalar(m_sparse_count, m_sparse_index, m_sparse_value, q_sparse_count, q_sparse_index, q_sparse_value); } + } // namespace ailego } // namespace zvec diff --git a/src/ailego/math/inner_product_matrix_fp32_rvv.cc b/src/ailego/math/inner_product_matrix_fp32_rvv.cc new file mode 100644 index 000000000..c942f6871 --- /dev/null +++ b/src/ailego/math/inner_product_matrix_fp32_rvv.cc @@ -0,0 +1,59 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include "inner_product_matrix.h" + +namespace zvec { +namespace ailego { + +#if defined(__riscv_vector) +namespace { + +static inline float InnerProductRVVImpl(const float *lhs, const float *rhs, + size_t size) { + const size_t vlmax = __riscv_vsetvlmax_e32m8(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (size != 0) { + size_t vl = __riscv_vsetvl_e32m8(size); + vfloat32m8_t v_lhs = __riscv_vle32_v_f32m8(lhs, vl); + vfloat32m8_t v_rhs = __riscv_vle32_v_f32m8(rhs, vl); + v_sum = __riscv_vfmacc_vv_f32m8_tu(v_sum, v_lhs, v_rhs, vl); + lhs += vl; + rhs += vl; + size -= vl; + } + + vfloat32m1_t v_zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t v_red = __riscv_vfredusum_vs_f32m8_f32m1(v_sum, v_zero, vlmax); + return __riscv_vfmv_f_s_f32m1_f32(v_red); +} + +} // namespace + +//! Inner Product +float InnerProductRVV(const float *lhs, const float *rhs, size_t size) { + return InnerProductRVVImpl(lhs, rhs, size); +} + +//! Minus Inner Product +float MinusInnerProductRVV(const float *lhs, const float *rhs, size_t size) { + return -InnerProductRVVImpl(lhs, rhs, size); +} + +#endif // __riscv_vector + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/inner_product_matrix_int8_dispatch.cc b/src/ailego/math/inner_product_matrix_int8_dispatch.cc index d2faac29e..50516cd73 100644 --- a/src/ailego/math/inner_product_matrix_int8_dispatch.cc +++ b/src/ailego/math/inner_product_matrix_int8_dispatch.cc @@ -21,6 +21,11 @@ namespace ailego { //-------------------------------------------------- // Dense //-------------------------------------------------- +#if defined(__riscv_vector) +float InnerProductRVV(const int8_t *lhs, const int8_t *rhs, size_t size); +float MinusInnerProductRVV(const int8_t *lhs, const int8_t *rhs, size_t size); +#endif + #if defined(__AVX2__) float InnerProductInt8AVX2(const int8_t *lhs, const int8_t *rhs, size_t size); float MinusInnerProductInt8AVX2(const int8_t *lhs, const int8_t *rhs, @@ -39,6 +44,13 @@ float MinusInnerProductInt8Scalar(const int8_t *m, const int8_t *q, size_t dim); //! Compute the distance between matrix and query (INT8, M=1, N=1) void InnerProductMatrix::Compute(const int8_t *m, const int8_t *q, size_t dim, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = InnerProductRVV(m, q, dim); + return; + } +#endif // __riscv_vector + #if defined(__AVX2__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX2) { *out = InnerProductInt8AVX2(m, q, dim); @@ -51,8 +63,7 @@ void InnerProductMatrix::Compute(const int8_t *m, const int8_t *q, *out = InnerProductInt8SSE(m, q, dim); return; } - -#endif //__SSE4_1__ +#endif // __SSE4_1__ *out = InnerProductInt8Scalar(m, q, dim); } @@ -61,6 +72,13 @@ void InnerProductMatrix::Compute(const int8_t *m, const int8_t *q, void MinusInnerProductMatrix::Compute(const int8_t *m, const int8_t *q, size_t dim, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = MinusInnerProductRVV(m, q, dim); + return; + } +#endif // __riscv_vector + #if defined(__AVX2__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX2) { *out = MinusInnerProductInt8AVX2(m, q, dim); @@ -73,7 +91,7 @@ void MinusInnerProductMatrix::Compute(const int8_t *m, *out = MinusInnerProductInt8SSE(m, q, dim); return; } -#endif //__SSE4_1__ +#endif // __SSE4_1__ *out = MinusInnerProductInt8Scalar(m, q, dim); } diff --git a/src/ailego/math/inner_product_matrix_int8_rvv.cc b/src/ailego/math/inner_product_matrix_int8_rvv.cc new file mode 100644 index 000000000..4bb97006c --- /dev/null +++ b/src/ailego/math/inner_product_matrix_int8_rvv.cc @@ -0,0 +1,59 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include "inner_product_matrix.h" + +namespace zvec { +namespace ailego { + +#if defined(__riscv_vector) +namespace { + +static inline float InnerProductRVVImpl(const int8_t *lhs, const int8_t *rhs, + size_t size) { + const size_t vlmax = __riscv_vsetvlmax_e8m2(); + vint32m8_t v_sum = __riscv_vmv_v_x_i32m8(0, vlmax); + + while (size != 0) { + const size_t vl = __riscv_vsetvl_e8m2(size); + const vint8m2_t v_lhs8 = __riscv_vle8_v_i8m2(lhs, vl); + const vint8m2_t v_rhs8 = __riscv_vle8_v_i8m2(rhs, vl); + const vint16m4_t v_lhs16 = __riscv_vsext_vf2_i16m4(v_lhs8, vl); + const vint16m4_t v_rhs16 = __riscv_vsext_vf2_i16m4(v_rhs8, vl); + v_sum = __riscv_vwmacc_vv_i32m8_tu(v_sum, v_lhs16, v_rhs16, vl); + lhs += vl; + rhs += vl; + size -= vl; + } + + const vint32m1_t v_zero = __riscv_vmv_v_x_i32m1(0, 1); + const vint32m1_t v_red = __riscv_vredsum_vs_i32m8_i32m1(v_sum, v_zero, vlmax); + return static_cast(__riscv_vmv_x_s_i32m1_i32(v_red)); +} + +} // namespace + +float InnerProductRVV(const int8_t *lhs, const int8_t *rhs, size_t size) { + return InnerProductRVVImpl(lhs, rhs, size); +} + +float MinusInnerProductRVV(const int8_t *lhs, const int8_t *rhs, size_t size) { + return -InnerProductRVVImpl(lhs, rhs, size); +} + +#endif // __riscv_vector + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/mips_euclidean_distance_matrix_fp16_dispatch.cc b/src/ailego/math/mips_euclidean_distance_matrix_fp16_dispatch.cc index 8e40563cf..c537d2793 100644 --- a/src/ailego/math/mips_euclidean_distance_matrix_fp16_dispatch.cc +++ b/src/ailego/math/mips_euclidean_distance_matrix_fp16_dispatch.cc @@ -18,6 +18,16 @@ namespace zvec { namespace ailego { +#if defined(__riscv_zvfh) +float MipsEucldeanDistanceRepeatedQuadraticInjectionRVV(const Float16 *lhs, + const Float16 *rhs, + size_t size, size_t m, + float e2); +float MipsEucldeanDistanceSphericalInjectionRVV(const Float16 *lhs, + const Float16 *rhs, size_t size, + float e2); +#endif + #if defined(__ARM_NEON) float MipsEuclideanDistanceRepeatedQuadraticInjectionFp16NEON( const Float16 *lhs, const Float16 *rhs, size_t size, size_t m, float e2); @@ -47,10 +57,16 @@ float MipsEuclideanDistanceRepeatedQuadraticInjectionFp16Scalar( float MipsEuclideanDistanceSphericalInjectionFp16Scalar( const ailego::Float16 *p, const ailego::Float16 *q, size_t dim, float e2); - //! Compute the distance between matrix and query by SphericalInjection void MipsSquaredEuclideanDistanceMatrix::Compute( const ValueType *p, const ValueType *q, size_t dim, float e2, float *out) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + *out = MipsEucldeanDistanceSphericalInjectionRVV(p, q, dim, e2); + return; + } +#endif // __riscv_zvfh + #if defined(__ARM_NEON) *out = MipsEuclideanDistanceSphericalInjectionFp16NEON(p, q, dim, e2); #else @@ -59,22 +75,30 @@ void MipsSquaredEuclideanDistanceMatrix::Compute( *out = MipsEuclideanDistanceSphericalInjectionFp16AVX512(p, q, dim, e2); return; } -#endif +#endif // __AVX512F__ + #if defined(__AVX__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX) { *out = MipsEuclideanDistanceSphericalInjectionFp16AVX(p, q, dim, e2); return; } -#endif //__AVX__ +#endif // __AVX__ + *out = MipsEuclideanDistanceSphericalInjectionFp16Scalar(p, q, dim, e2); - return; -#endif //__ARM_NEON +#endif // __ARM_NEON } //! Compute the distance between matrix and query by RepeatedQuadraticInjection void MipsSquaredEuclideanDistanceMatrix::Compute( const ValueType *p, const ValueType *q, size_t dim, size_t m, float e2, float *out) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + *out = MipsEucldeanDistanceRepeatedQuadraticInjectionRVV(p, q, dim, m, e2); + return; + } +#endif // __riscv_zvfh + #if defined(__ARM_NEON) *out = MipsEuclideanDistanceRepeatedQuadraticInjectionFp16NEON(p, q, dim, m, e2); @@ -85,18 +109,19 @@ void MipsSquaredEuclideanDistanceMatrix::Compute( m, e2); return; } -#endif +#endif // __AVX512F__ + #if defined(__AVX__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX) { *out = MipsEuclideanDistanceRepeatedQuadraticInjectionFp16AVX(p, q, dim, m, e2); return; } -#endif //__AVX__ +#endif // __AVX__ + *out = MipsEuclideanDistanceRepeatedQuadraticInjectionFp16Scalar(p, q, dim, m, e2); - return; -#endif //__ARM_NEON +#endif // __ARM_NEON } } // namespace ailego diff --git a/src/ailego/math/mips_euclidean_distance_matrix_fp16_rvv.cc b/src/ailego/math/mips_euclidean_distance_matrix_fp16_rvv.cc new file mode 100644 index 000000000..49d5212cd --- /dev/null +++ b/src/ailego/math/mips_euclidean_distance_matrix_fp16_rvv.cc @@ -0,0 +1,100 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include "mips_euclidean_distance_matrix.h" + +namespace zvec { +namespace ailego { + +#if defined(__riscv_zvfh) +namespace { + +static inline float HorizontalReduceF32M8(vfloat32m8_t value, size_t vlmax) { + vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t red = __riscv_vfredusum_vs_f32m8_f32m1(value, zero, vlmax); + return __riscv_vfmv_f_s_f32m1_f32(red); +} + +static inline float InnerProductRVV(const Float16 *lhs, const Float16 *rhs, + size_t size) { + const _Float16 *lhs_fp16 = reinterpret_cast(lhs); + const _Float16 *rhs_fp16 = reinterpret_cast(rhs); + const size_t vlmax = __riscv_vsetvlmax_e16m4(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (size != 0) { + const size_t vl = __riscv_vsetvl_e16m4(size); + vfloat16m4_t v_lhs = __riscv_vle16_v_f16m4(lhs_fp16, vl); + vfloat16m4_t v_rhs = __riscv_vle16_v_f16m4(rhs_fp16, vl); + v_sum = __riscv_vfwmacc_vv_f32m8_tu(v_sum, v_lhs, v_rhs, vl); + lhs_fp16 += vl; + rhs_fp16 += vl; + size -= vl; + } + + return HorizontalReduceF32M8(v_sum, vlmax); +} + +static inline float SquaredNormRVV(const Float16 *src, size_t size) { + const _Float16 *src_fp16 = reinterpret_cast(src); + const size_t vlmax = __riscv_vsetvlmax_e16m4(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (size != 0) { + const size_t vl = __riscv_vsetvl_e16m4(size); + vfloat16m4_t v_src = __riscv_vle16_v_f16m4(src_fp16, vl); + v_sum = __riscv_vfwmacc_vv_f32m8_tu(v_sum, v_src, v_src, vl); + src_fp16 += vl; + size -= vl; + } + + return HorizontalReduceF32M8(v_sum, vlmax); +} + +} // namespace + +float MipsEucldeanDistanceSphericalInjectionRVV(const Float16 *lhs, + const Float16 *rhs, size_t size, + float e2) { + float sum = InnerProductRVV(lhs, rhs, size); + float u2 = SquaredNormRVV(lhs, size); + float v2 = SquaredNormRVV(rhs, size); + return ComputeSphericalInjection(sum, u2, v2, e2); +} + +float MipsEucldeanDistanceRepeatedQuadraticInjectionRVV(const Float16 *lhs, + const Float16 *rhs, + size_t size, size_t m, + float e2) { + float sum = InnerProductRVV(lhs, rhs, size); + float u2 = SquaredNormRVV(lhs, size); + float v2 = SquaredNormRVV(rhs, size); + + sum = e2 * (u2 + v2 - 2.0f * sum); + u2 *= e2; + v2 *= e2; + for (size_t i = 0; i < m; ++i) { + float d = u2 - v2; + sum += d * d; + u2 = u2 * u2; + v2 = v2 * v2; + } + return sum; +} + +#endif // __riscv_zvfh + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/mips_euclidean_distance_matrix_fp32_dispatch.cc b/src/ailego/math/mips_euclidean_distance_matrix_fp32_dispatch.cc index f48626a3f..ea71b8f06 100644 --- a/src/ailego/math/mips_euclidean_distance_matrix_fp32_dispatch.cc +++ b/src/ailego/math/mips_euclidean_distance_matrix_fp32_dispatch.cc @@ -18,6 +18,16 @@ namespace zvec { namespace ailego { +#if defined(__riscv_vector) +float MipsEucldeanDistanceRepeatedQuadraticInjectionRVV(const float *lhs, + const float *rhs, + size_t size, size_t m, + float e2); +float MipsEucldeanDistanceSphericalInjectionRVV(const float *lhs, + const float *rhs, size_t size, + float e2); +#endif + #if defined(__ARM_NEON) float InnerProductAndSquaredNormFp32NEON(const float *lhs, const float *rhs, size_t size, float *sql, float *sqr); @@ -63,7 +73,14 @@ float MipsInnerProductSparseInSegment(uint32_t m_sparse_count, //! Compute the distance between matrix and query by SphericalInjection void MipsSquaredEuclideanDistanceMatrix::Compute( const ValueType *p, const ValueType *q, size_t dim, float e2, float *out) { -#if __ARM_NEON +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = MipsEucldeanDistanceSphericalInjectionRVV(p, q, dim, e2); + return; + } +#endif // __riscv_vector + +#if defined(__ARM_NEON) float u2{0.0f}; float v2{0.0f}; float sum = InnerProductAndSquaredNormFp32NEON(p, q, dim, &u2, &v2); @@ -76,28 +93,37 @@ void MipsSquaredEuclideanDistanceMatrix::Compute( *out = MipsEuclideanDistanceSphericalInjectionFp32AVX512(p, q, dim, e2); return; } -#endif //__AVX512F__ +#endif // __AVX512F__ + #if defined(__AVX__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX) { *out = MipsEuclideanDistanceSphericalInjectionFp32AVX(p, q, dim, e2); return; } #endif // __AVX__ + #if defined(__SSE__) if (zvec::ailego::internal::CpuFeatures::static_flags_.SSE) { *out = MipsEuclideanDistanceSphericalInjectionFp32SSE(p, q, dim, e2); return; } #endif // __SSE__ + *out = MipsEuclideanDistanceSphericalInjectionFp32Scalar(p, q, dim, e2); - return; -#endif //__ARM_NEON +#endif // __ARM_NEON } //! Compute the distance between matrix and query by RepeatedQuadraticInjection void MipsSquaredEuclideanDistanceMatrix::Compute( const ValueType *p, const ValueType *q, size_t dim, size_t m, float e2, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = MipsEucldeanDistanceRepeatedQuadraticInjectionRVV(p, q, dim, m, e2); + return; + } +#endif // __riscv_vector + #if defined(__ARM_NEON) float u2{0.0f}; float v2{0.0f}; @@ -120,7 +146,8 @@ void MipsSquaredEuclideanDistanceMatrix::Compute( m, e2); return; } -#endif //__AVX512F__ +#endif // __AVX512F__ + #if defined(__AVX__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX) { *out = MipsEuclideanDistanceRepeatedQuadraticInjectionFp32AVX(p, q, dim, m, @@ -135,12 +162,11 @@ void MipsSquaredEuclideanDistanceMatrix::Compute( e2); return; } -#endif //__SSE__ +#endif // __SSE__ + *out = MipsEuclideanDistanceRepeatedQuadraticInjectionFp32Scalar(p, q, dim, m, e2); - - return; -#endif //__ARM_NEON +#endif // __ARM_NEON } // Sparse diff --git a/src/ailego/math/mips_euclidean_distance_matrix_fp32_rvv.cc b/src/ailego/math/mips_euclidean_distance_matrix_fp32_rvv.cc new file mode 100644 index 000000000..c9f9781f8 --- /dev/null +++ b/src/ailego/math/mips_euclidean_distance_matrix_fp32_rvv.cc @@ -0,0 +1,97 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include "mips_euclidean_distance_matrix.h" + +namespace zvec { +namespace ailego { + +#if defined(__riscv_vector) +namespace { + +static inline float HorizontalReduceF32M8(vfloat32m8_t value, size_t vlmax) { + vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t red = __riscv_vfredusum_vs_f32m8_f32m1(value, zero, vlmax); + return __riscv_vfmv_f_s_f32m1_f32(red); +} + +static inline float InnerProductRVV(const float *lhs, const float *rhs, + size_t size) { + const size_t vlmax = __riscv_vsetvlmax_e32m8(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (size != 0) { + size_t vl = __riscv_vsetvl_e32m8(size); + vfloat32m8_t v_lhs = __riscv_vle32_v_f32m8(lhs, vl); + vfloat32m8_t v_rhs = __riscv_vle32_v_f32m8(rhs, vl); + v_sum = __riscv_vfmacc_vv_f32m8_tu(v_sum, v_lhs, v_rhs, vl); + lhs += vl; + rhs += vl; + size -= vl; + } + + return HorizontalReduceF32M8(v_sum, vlmax); +} + +static inline float SquaredNormRVV(const float *src, size_t size) { + const size_t vlmax = __riscv_vsetvlmax_e32m8(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (size != 0) { + size_t vl = __riscv_vsetvl_e32m8(size); + vfloat32m8_t v_src = __riscv_vle32_v_f32m8(src, vl); + v_sum = __riscv_vfmacc_vv_f32m8_tu(v_sum, v_src, v_src, vl); + src += vl; + size -= vl; + } + + return HorizontalReduceF32M8(v_sum, vlmax); +} + +} // namespace + +float MipsEucldeanDistanceSphericalInjectionRVV(const float *lhs, + const float *rhs, size_t size, + float e2) { + float sum = InnerProductRVV(lhs, rhs, size); + float u2 = SquaredNormRVV(lhs, size); + float v2 = SquaredNormRVV(rhs, size); + return ComputeSphericalInjection(sum, u2, v2, e2); +} + +float MipsEucldeanDistanceRepeatedQuadraticInjectionRVV(const float *lhs, + const float *rhs, + size_t size, size_t m, + float e2) { + float sum = InnerProductRVV(lhs, rhs, size); + float u2 = SquaredNormRVV(lhs, size); + float v2 = SquaredNormRVV(rhs, size); + + sum = e2 * (u2 + v2 - 2.0f * sum); + u2 *= e2; + v2 *= e2; + for (size_t i = 0; i < m; ++i) { + float d = u2 - v2; + sum += d * d; + u2 = u2 * u2; + v2 = v2 * v2; + } + return sum; +} + +#endif // __riscv_vector + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/mips_euclidean_distance_matrix_int8_dispatch.cc b/src/ailego/math/mips_euclidean_distance_matrix_int8_dispatch.cc index f0f744940..223eb49bd 100644 --- a/src/ailego/math/mips_euclidean_distance_matrix_int8_dispatch.cc +++ b/src/ailego/math/mips_euclidean_distance_matrix_int8_dispatch.cc @@ -18,6 +18,16 @@ namespace zvec { namespace ailego { +#if defined(__riscv_vector) +float MipsEucldeanDistanceRepeatedQuadraticInjectionRVV(const int8_t *lhs, + const int8_t *rhs, + size_t size, size_t m, + float e2); +float MipsEucldeanDistanceSphericalInjectionRVV(const int8_t *lhs, + const int8_t *rhs, size_t size, + float e2); +#endif + #if defined(__AVX2__) float MipsEuclideanDistanceRepeatedQuadraticInjectionInt8AVX2( const int8_t *lhs, const int8_t *rhs, size_t size, size_t m, float e2); @@ -43,19 +53,26 @@ float MipsEuclideanDistanceSphericalInjectionInt8Scalar(const int8_t *lhs, //! Compute the distance between matrix and query by SphericalInjection void MipsSquaredEuclideanDistanceMatrix::Compute( const ValueType *p, const ValueType *q, size_t dim, float e2, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = MipsEucldeanDistanceSphericalInjectionRVV(p, q, dim, e2); + return; + } +#endif // __riscv_vector + #if defined(__AVX2__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX2) { *out = MipsEuclideanDistanceSphericalInjectionInt8AVX2(p, q, dim, e2); return; } -#endif +#endif // __AVX2__ #if defined(__SSE4_1__) if (zvec::ailego::internal::CpuFeatures::static_flags_.SSE4_1) { *out = MipsEuclideanDistanceSphericalInjectionInt8SSE(p, q, dim, e2); return; } -#endif //__SSE4_1__ +#endif // __SSE4_1__ *out = MipsEuclideanDistanceSphericalInjectionInt8Scalar(p, q, dim, e2); } @@ -64,20 +81,28 @@ void MipsSquaredEuclideanDistanceMatrix::Compute( void MipsSquaredEuclideanDistanceMatrix::Compute( const ValueType *p, const ValueType *q, size_t dim, size_t m, float e2, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = MipsEucldeanDistanceRepeatedQuadraticInjectionRVV(p, q, dim, m, e2); + return; + } +#endif // __riscv_vector + #if defined(__AVX2__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX2) { *out = MipsEuclideanDistanceRepeatedQuadraticInjectionInt8AVX2(p, q, dim, m, e2); return; } -#endif +#endif // __AVX2__ + #if defined(__SSE4_1__) if (zvec::ailego::internal::CpuFeatures::static_flags_.SSE4_1) { *out = MipsEuclideanDistanceRepeatedQuadraticInjectionInt8SSE(p, q, dim, m, e2); return; } -#endif //__SSE4_1__ +#endif // __SSE4_1__ *out = MipsEuclideanDistanceRepeatedQuadraticInjectionInt8Scalar(p, q, dim, m, e2); diff --git a/src/ailego/math/mips_euclidean_distance_matrix_int8_rvv.cc b/src/ailego/math/mips_euclidean_distance_matrix_int8_rvv.cc new file mode 100644 index 000000000..1268e4245 --- /dev/null +++ b/src/ailego/math/mips_euclidean_distance_matrix_int8_rvv.cc @@ -0,0 +1,100 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include "mips_euclidean_distance_matrix.h" + +namespace zvec { +namespace ailego { + +#if defined(__riscv_vector) +namespace { + +static inline float HorizontalReduceI32M8(vint32m8_t value, size_t vlmax) { + const vint32m1_t zero = __riscv_vmv_v_x_i32m1(0, 1); + const vint32m1_t red = __riscv_vredsum_vs_i32m8_i32m1(value, zero, vlmax); + return static_cast(__riscv_vmv_x_s_i32m1_i32(red)); +} + +static inline float InnerProductRVV(const int8_t *lhs, const int8_t *rhs, + size_t size) { + const size_t vlmax = __riscv_vsetvlmax_e8m2(); + vint32m8_t v_sum = __riscv_vmv_v_x_i32m8(0, vlmax); + + while (size != 0) { + const size_t vl = __riscv_vsetvl_e8m2(size); + const vint8m2_t v_lhs8 = __riscv_vle8_v_i8m2(lhs, vl); + const vint8m2_t v_rhs8 = __riscv_vle8_v_i8m2(rhs, vl); + const vint16m4_t v_lhs16 = __riscv_vsext_vf2_i16m4(v_lhs8, vl); + const vint16m4_t v_rhs16 = __riscv_vsext_vf2_i16m4(v_rhs8, vl); + v_sum = __riscv_vwmacc_vv_i32m8_tu(v_sum, v_lhs16, v_rhs16, vl); + lhs += vl; + rhs += vl; + size -= vl; + } + + return HorizontalReduceI32M8(v_sum, vlmax); +} + +static inline float SquaredNormRVV(const int8_t *src, size_t size) { + const size_t vlmax = __riscv_vsetvlmax_e8m2(); + vint32m8_t v_sum = __riscv_vmv_v_x_i32m8(0, vlmax); + + while (size != 0) { + const size_t vl = __riscv_vsetvl_e8m2(size); + const vint8m2_t v_src8 = __riscv_vle8_v_i8m2(src, vl); + const vint16m4_t v_src16 = __riscv_vsext_vf2_i16m4(v_src8, vl); + v_sum = __riscv_vwmacc_vv_i32m8_tu(v_sum, v_src16, v_src16, vl); + src += vl; + size -= vl; + } + + return HorizontalReduceI32M8(v_sum, vlmax); +} + +} // namespace + +float MipsEucldeanDistanceSphericalInjectionRVV(const int8_t *lhs, + const int8_t *rhs, size_t size, + float e2) { + const float sum = InnerProductRVV(lhs, rhs, size); + const float u2 = SquaredNormRVV(lhs, size); + const float v2 = SquaredNormRVV(rhs, size); + return ComputeSphericalInjection(sum, u2, v2, e2); +} + +float MipsEucldeanDistanceRepeatedQuadraticInjectionRVV(const int8_t *lhs, + const int8_t *rhs, + size_t size, size_t m, + float e2) { + float sum = InnerProductRVV(lhs, rhs, size); + float u2 = SquaredNormRVV(lhs, size); + float v2 = SquaredNormRVV(rhs, size); + + sum = e2 * (u2 + v2 - 2.0f * sum); + u2 *= e2; + v2 *= e2; + for (size_t i = 0; i < m; ++i) { + const float d = u2 - v2; + sum += d * d; + u2 = u2 * u2; + v2 = v2 * v2; + } + return sum; +} + +#endif // __riscv_vector + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/norm1_matrix.h b/src/ailego/math/norm1_matrix.h index 7e8d9cbc8..b06f3bb9b 100644 --- a/src/ailego/math/norm1_matrix.h +++ b/src/ailego/math/norm1_matrix.h @@ -116,7 +116,8 @@ struct Norm1Matrix< } }; -#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) +#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) || \ + defined(__riscv_vector) /*! L1-Norm Matrix (FP32, M=1) */ template <> @@ -127,10 +128,10 @@ struct Norm1Matrix { //! Compute the L1-norm of vectors static void Compute(const ValueType *m, size_t dim, float *out); }; -#endif // __SSE__ || (__ARM_NEON && __aarch64__) +#endif // __SSE__ || (__ARM_NEON && __aarch64__) || __riscv_vector #if (defined(__F16C__) && defined(__AVX__)) || \ - (defined(__ARM_NEON) && defined(__aarch64__)) + (defined(__ARM_NEON) && defined(__aarch64__)) || defined(__riscv_zvfh) /*! L1-Norm Matrix (FP16, M=1) */ template <> @@ -141,7 +142,7 @@ struct Norm1Matrix { //! Compute the L1-norm of vectors static void Compute(const ValueType *m, size_t dim, float *out); }; -#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) +#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) || __riscv_zvfh } // namespace ailego } // namespace zvec diff --git a/src/ailego/math/norm1_matrix_fp16.cc b/src/ailego/math/norm1_matrix_fp16.cc index e75b3e0a8..c0499a2ac 100644 --- a/src/ailego/math/norm1_matrix_fp16.cc +++ b/src/ailego/math/norm1_matrix_fp16.cc @@ -20,6 +20,10 @@ namespace zvec { namespace ailego { +#if defined(__riscv_zvfh) +float Norm1RVV(const Float16 *m, size_t dim); +#endif + #define NORM_FP32_STEP_GENERAL SA_FP32_GENERAL #define NORM_FP32_STEP_SSE SA_FP32_SSE #define NORM_FP32_STEP_AVX SA_FP32_AVX @@ -68,10 +72,16 @@ static const __m512 ABS_MASK_FP32_AVX512 = #define SA_FP16_NEON(v_m, v_sum) v_sum = vaddq_f16(vabsq_f16(v_m), v_sum); #if (defined(__F16C__) && defined(__AVX__)) || \ - (defined(__ARM_NEON) && defined(__aarch64__)) + (defined(__ARM_NEON) && defined(__aarch64__)) || defined(__riscv_zvfh) //! Compute the L1-norm of vectors (FP16, M=1) void Norm1Matrix::Compute(const ValueType *m, size_t dim, float *out) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + *out = Norm1RVV(m, dim); + return; + } +#endif #if defined(__ARM_NEON) NORM_FP16_1_NEON(m, dim, out, ) #else @@ -81,10 +91,19 @@ void Norm1Matrix::Compute(const ValueType *m, size_t dim, return; } #endif +#if defined(__AVX__) NORM_FP16_1_AVX(m, dim, out, ) +#else + float sum = 0.0f; + const ValueType *m_end = m + dim; + while (m != m_end) { + sum += Float16::Absolute(*m++); + } + *out = sum; +#endif #endif } -#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) +#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) || __riscv_zvfh } // namespace ailego } // namespace zvec diff --git a/src/ailego/math/norm1_matrix_fp16_rvv.cc b/src/ailego/math/norm1_matrix_fp16_rvv.cc new file mode 100644 index 000000000..9fe080503 --- /dev/null +++ b/src/ailego/math/norm1_matrix_fp16_rvv.cc @@ -0,0 +1,44 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include "norm1_matrix.h" + +namespace zvec { +namespace ailego { + +#if defined(__riscv_zvfh) +float Norm1RVV(const Float16 *m, size_t dim) { + const _Float16 *m_fp16 = reinterpret_cast(m); + const size_t vlmax = __riscv_vsetvlmax_e16m4(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (dim != 0) { + const size_t vl = __riscv_vsetvl_e16m4(dim); + vfloat16m4_t v_m = __riscv_vle16_v_f16m4(m_fp16, vl); + vfloat16m4_t v_abs = __riscv_vfabs_v_f16m4(v_m, vl); + v_sum = __riscv_vfwadd_wv_f32m8_tu(v_sum, v_sum, v_abs, vl); + m_fp16 += vl; + dim -= vl; + } + + vfloat32m1_t v_zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t v_reduce = + __riscv_vfredusum_vs_f32m8_f32m1(v_sum, v_zero, vlmax); + return __riscv_vfmv_f_s_f32m1_f32(v_reduce); +} +#endif // __riscv_zvfh + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/norm1_matrix_fp32.cc b/src/ailego/math/norm1_matrix_fp32.cc index 2e7279118..ec7d838b2 100644 --- a/src/ailego/math/norm1_matrix_fp32.cc +++ b/src/ailego/math/norm1_matrix_fp32.cc @@ -20,6 +20,10 @@ namespace zvec { namespace ailego { +#if defined(__riscv_vector) +float Norm1RVV(const float *m, size_t dim); +#endif + #define NORM_FP32_STEP_GENERAL SA_FP32_GENERAL #define NORM_FP32_STEP_SSE SA_FP32_SSE #define NORM_FP32_STEP_AVX SA_FP32_AVX @@ -56,13 +60,20 @@ namespace ailego { //! Calculate sum of absolute (NEON) #define SA_FP32_NEON(v_m, v_sum) v_sum = vaddq_f32(vabsq_f32(v_m), v_sum); -#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) +#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) || \ + defined(__riscv_vector) //! Compute the L1-norm of vectors (FP32, M=1) void Norm1Matrix::Compute(const ValueType *m, size_t dim, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = Norm1RVV(m, dim); + return; + } +#endif #if defined(__ARM_NEON) NORM_FP32_1_NEON(m, dim, out, ) -#else +#elif defined(__SSE__) #if defined(__AVX512F__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX512F) { NORM_FP32_1_AVX512(m, dim, out, ) @@ -76,9 +87,18 @@ void Norm1Matrix::Compute(const ValueType *m, size_t dim, } #endif NORM_FP32_1_SSE(m, dim, out, ) +#else + ailego_assert(m && dim && out); + const ValueType *m_end = m + dim; + if (m != m_end) { + *out = MathHelper::Absolute(*m++); + } + while (m != m_end) { + *out += MathHelper::Absolute(*m++); + } #endif } -#endif // __SSE__ || (__ARM_NEON && __aarch64__) +#endif // __SSE__ || (__ARM_NEON && __aarch64__) || __riscv_vector } // namespace ailego } // namespace zvec diff --git a/src/ailego/math/norm1_matrix_fp32_rvv.cc b/src/ailego/math/norm1_matrix_fp32_rvv.cc new file mode 100644 index 000000000..2baeeba41 --- /dev/null +++ b/src/ailego/math/norm1_matrix_fp32_rvv.cc @@ -0,0 +1,42 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include + +namespace zvec { +namespace ailego { + +#if defined(__riscv_vector) +float Norm1RVV(const float *m, size_t dim) { + const size_t vlmax = __riscv_vsetvlmax_e32m8(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (dim != 0) { + const size_t vl = __riscv_vsetvl_e32m8(dim); + vfloat32m8_t v_m = __riscv_vle32_v_f32m8(m, vl); + vfloat32m8_t v_abs = __riscv_vfabs_v_f32m8(v_m, vl); + v_sum = __riscv_vfadd_vv_f32m8_tu(v_sum, v_sum, v_abs, vl); + m += vl; + dim -= vl; + } + + vfloat32m1_t v_zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t v_reduce = + __riscv_vfredusum_vs_f32m8_f32m1(v_sum, v_zero, vlmax); + return __riscv_vfmv_f_s_f32m1_f32(v_reduce); +} +#endif // __riscv_vector + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/norm2_matrix.h b/src/ailego/math/norm2_matrix.h index 3c905147d..f4a598c53 100644 --- a/src/ailego/math/norm2_matrix.h +++ b/src/ailego/math/norm2_matrix.h @@ -371,7 +371,8 @@ struct SquaredNorm2Matrix= 2>::type> { } }; -#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) +#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) || \ + defined(__riscv_vector) /*! L2-Norm Matrix (FP32, M=1) */ template <> @@ -393,10 +394,10 @@ struct SquaredNorm2Matrix { //! Compute the squared L2-norm of vectors static void Compute(const ValueType *m, size_t dim, float *out); }; -#endif // __SSE__ || (__ARM_NEON && __aarch64__) +#endif // __SSE__ || (__ARM_NEON && __aarch64__) || __riscv_vector #if (defined(__F16C__) && defined(__AVX__)) || \ - (defined(__ARM_NEON) && defined(__aarch64__)) + (defined(__ARM_NEON) && defined(__aarch64__)) || defined(__riscv_zvfh) /*! L2-Norm Matrix (FP16, M=1) */ template <> @@ -418,7 +419,7 @@ struct SquaredNorm2Matrix { //! Compute the squared L2-norm of vectors static void Compute(const ValueType *m, size_t dim, float *out); }; -#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) +#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) || __riscv_zvfh } // namespace ailego } // namespace zvec diff --git a/src/ailego/math/norm2_matrix_fp16.cc b/src/ailego/math/norm2_matrix_fp16.cc index 6bb8dd06c..70a0f12b5 100644 --- a/src/ailego/math/norm2_matrix_fp16.cc +++ b/src/ailego/math/norm2_matrix_fp16.cc @@ -20,6 +20,11 @@ namespace zvec { namespace ailego { +#if defined(__riscv_zvfh) +float Norm2RVV(const Float16 *m, size_t dim); +float SquaredNorm2RVV(const Float16 *m, size_t dim); +#endif + #define NORM_FP32_STEP_GENERAL SS_FP32_GENERAL #define NORM_FP32_STEP_SSE SS_FP32_SSE #define NORM_FP32_STEP_AVX SS_FP32_AVX @@ -53,10 +58,16 @@ namespace ailego { #define SS_FP16_NEON(v_m, v_sum) v_sum = vfmaq_f16(v_sum, v_m, v_m); #if (defined(__F16C__) && defined(__AVX__)) || \ - (defined(__ARM_NEON) && defined(__aarch64__)) + (defined(__ARM_NEON) && defined(__aarch64__)) || defined(__riscv_zvfh) //! Compute the L2-norm of vectors (FP16, M=1) void Norm2Matrix::Compute(const ValueType *m, size_t dim, float *out) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + *out = Norm2RVV(m, dim); + return; + } +#endif #if defined(__ARM_NEON) NORM_FP16_1_NEON(m, dim, out, std::sqrt) #else @@ -66,13 +77,29 @@ void Norm2Matrix::Compute(const ValueType *m, size_t dim, return; } #endif +#if defined(__AVX__) NORM_FP16_1_AVX(m, dim, out, std::sqrt) +#else + float sum = 0.0f; + const ValueType *m_end = m + dim; + while (m != m_end) { + float v = static_cast(*m++); + sum += v * v; + } + *out = std::sqrt(sum); +#endif #endif } //! Compute the L2-norm of vectors (FP16, M=1) void SquaredNorm2Matrix::Compute(const ValueType *m, size_t dim, float *out) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + *out = SquaredNorm2RVV(m, dim); + return; + } +#endif #if defined(__ARM_NEON) NORM_FP16_1_NEON(m, dim, out, ) #else @@ -82,10 +109,20 @@ void SquaredNorm2Matrix::Compute(const ValueType *m, size_t dim, return; } #endif +#if defined(__AVX__) NORM_FP16_1_AVX(m, dim, out, ) +#else + float sum = 0.0f; + const ValueType *m_end = m + dim; + while (m != m_end) { + float v = static_cast(*m++); + sum += v * v; + } + *out = sum; +#endif #endif } -#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) +#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) || __riscv_zvfh } // namespace ailego -} // namespace zvec \ No newline at end of file +} // namespace zvec diff --git a/src/ailego/math/norm2_matrix_fp16_rvv.cc b/src/ailego/math/norm2_matrix_fp16_rvv.cc new file mode 100644 index 000000000..be1ba7e4b --- /dev/null +++ b/src/ailego/math/norm2_matrix_fp16_rvv.cc @@ -0,0 +1,56 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include +#include "norm2_matrix.h" + +namespace zvec { +namespace ailego { + +#if defined(__riscv_zvfh) +namespace { + +static inline float SquaredNorm2RVVImpl(const Float16 *m, size_t dim) { + const _Float16 *m_fp16 = reinterpret_cast(m); + const size_t vlmax = __riscv_vsetvlmax_e16m4(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (dim != 0) { + const size_t vl = __riscv_vsetvl_e16m4(dim); + vfloat16m4_t v_m = __riscv_vle16_v_f16m4(m_fp16, vl); + v_sum = __riscv_vfwmacc_vv_f32m8_tu(v_sum, v_m, v_m, vl); + m_fp16 += vl; + dim -= vl; + } + + vfloat32m1_t v_zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t v_reduce = + __riscv_vfredusum_vs_f32m8_f32m1(v_sum, v_zero, vlmax); + return __riscv_vfmv_f_s_f32m1_f32(v_reduce); +} + +} // namespace + +float SquaredNorm2RVV(const Float16 *m, size_t dim) { + return SquaredNorm2RVVImpl(m, dim); +} + +float Norm2RVV(const Float16 *m, size_t dim) { + return std::sqrt(SquaredNorm2RVVImpl(m, dim)); +} +#endif // __riscv_zvfh + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/norm2_matrix_fp32.cc b/src/ailego/math/norm2_matrix_fp32.cc index 8cc76c1f5..dbb7e2dc6 100644 --- a/src/ailego/math/norm2_matrix_fp32.cc +++ b/src/ailego/math/norm2_matrix_fp32.cc @@ -19,6 +19,11 @@ namespace zvec { namespace ailego { +#if defined(__riscv_vector) +float Norm2RVV(const float *m, size_t dim); +float SquaredNorm2RVV(const float *m, size_t dim); +#endif + #define NORM_FP32_STEP_GENERAL SS_FP32_GENERAL #define NORM_FP32_STEP_SSE SS_FP32_SSE #define NORM_FP32_STEP_AVX SS_FP32_AVX @@ -43,13 +48,20 @@ namespace ailego { //! Calculate sum of squared (NEON) #define SS_FP32_NEON(v_m, v_sum) v_sum = vfmaq_f32(v_sum, v_m, v_m); -#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) +#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) || \ + defined(__riscv_vector) //! Compute the L2-norm of vectors (FP32, M=1) void Norm2Matrix::Compute(const ValueType *m, size_t dim, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = Norm2RVV(m, dim); + return; + } +#endif #if defined(__ARM_NEON) NORM_FP32_1_NEON(m, dim, out, std::sqrt) -#else +#elif defined(__SSE__) #if defined(__AVX512F__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX512F) { NORM_FP32_1_AVX512(m, dim, out, std::sqrt) @@ -63,15 +75,24 @@ void Norm2Matrix::Compute(const ValueType *m, size_t dim, } #endif NORM_FP32_1_SSE(m, dim, out, std::sqrt) +#else + SquaredNorm2Matrix::Compute(m, dim, out); + *out = std::sqrt(*out); #endif } //! Compute the squared L2-norm of vectors (FP32, M=1) void SquaredNorm2Matrix::Compute(const ValueType *m, size_t dim, float *out) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + *out = SquaredNorm2RVV(m, dim); + return; + } +#endif #if defined(__ARM_NEON) NORM_FP32_1_NEON(m, dim, out, ) -#else +#elif defined(__SSE__) #if defined(__AVX512F__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX512F) { NORM_FP32_1_AVX512(m, dim, out, ) @@ -85,9 +106,20 @@ void SquaredNorm2Matrix::Compute(const ValueType *m, size_t dim, } #endif NORM_FP32_1_SSE(m, dim, out, ) +#else + ailego_assert(m && dim && out); + const ValueType *m_end = m + dim; + if (m != m_end) { + ValueType v = *m++; + *out = static_cast(v * v); + } + while (m != m_end) { + ValueType v = *m++; + *out += static_cast(v * v); + } #endif } -#endif // __SSE__ || (__ARM_NEON && __aarch64__) +#endif // __SSE__ || (__ARM_NEON && __aarch64__) || __riscv_vector } // namespace ailego } // namespace zvec diff --git a/src/ailego/math/norm2_matrix_fp32_rvv.cc b/src/ailego/math/norm2_matrix_fp32_rvv.cc new file mode 100644 index 000000000..d1677a434 --- /dev/null +++ b/src/ailego/math/norm2_matrix_fp32_rvv.cc @@ -0,0 +1,54 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include + +namespace zvec { +namespace ailego { + +#if defined(__riscv_vector) +namespace { + +float SquaredNorm2RVVImpl(const float *m, size_t dim) { + const size_t vlmax = __riscv_vsetvlmax_e32m8(); + vfloat32m8_t v_sum = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + while (dim != 0) { + const size_t vl = __riscv_vsetvl_e32m8(dim); + vfloat32m8_t v_m = __riscv_vle32_v_f32m8(m, vl); + v_sum = __riscv_vfmacc_vv_f32m8_tu(v_sum, v_m, v_m, vl); + m += vl; + dim -= vl; + } + + vfloat32m1_t v_zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t v_reduce = + __riscv_vfredusum_vs_f32m8_f32m1(v_sum, v_zero, vlmax); + return __riscv_vfmv_f_s_f32m1_f32(v_reduce); +} + +} // namespace + +float SquaredNorm2RVV(const float *m, size_t dim) { + return SquaredNorm2RVVImpl(m, dim); +} + +float Norm2RVV(const float *m, size_t dim) { + return std::sqrt(SquaredNorm2RVVImpl(m, dim)); +} +#endif // __riscv_vector + +} // namespace ailego +} // namespace zvec diff --git a/src/ailego/math/normalizer.cc b/src/ailego/math/normalizer.cc index a31a9f350..a90a0d8ae 100644 --- a/src/ailego/math/normalizer.cc +++ b/src/ailego/math/normalizer.cc @@ -13,10 +13,40 @@ // limitations under the License. #include "normalizer.h" +#include +#include "ailego/internal/cpu_features.h" namespace zvec { namespace ailego { +#if defined(__riscv_vector) +static inline void NormalizeRVV(float *arr, size_t dim, float norm) { + while (dim != 0) { + const size_t vl = __riscv_vsetvl_e32m8(dim); + vfloat32m8_t v_arr = __riscv_vle32_v_f32m8(arr, vl); + v_arr = __riscv_vfdiv_vf_f32m8(v_arr, norm, vl); + __riscv_vse32_v_f32m8(arr, v_arr, vl); + arr += vl; + dim -= vl; + } +} +#endif // __riscv_vector + +#if defined(__riscv_zvfh) +static inline void NormalizeRVV(Float16 *arr, size_t dim, float norm) { + _Float16 *arr_fp16 = reinterpret_cast<_Float16 *>(arr); + const _Float16 norm_fp16 = static_cast<_Float16>(norm); + while (dim != 0) { + const size_t vl = __riscv_vsetvl_e16m8(dim); + vfloat16m8_t v_arr = __riscv_vle16_v_f16m8(arr_fp16, vl); + v_arr = __riscv_vfdiv_vf_f16m8(v_arr, norm_fp16, vl); + __riscv_vse16_v_f16m8(arr_fp16, v_arr, vl); + arr_fp16 += vl; + dim -= vl; + } +} +#endif // __riscv_zvfh + #if (defined(__ARM_NEON) && defined(__aarch64__)) static inline void NormalizeNEON(float *arr, size_t dim, float norm) { float *last = arr + dim; @@ -392,9 +422,16 @@ static inline void NormalizeSSE(float *arr, size_t dim, float norm) { } #endif // __SSE__ -#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) +#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) || \ + defined(__riscv_vector) //! Compute the norm of vector void Normalizer::Compute(ValueType *arr, size_t dim, float norm) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + NormalizeRVV(arr, dim, norm); + return; + } +#endif #if defined(__ARM_NEON) NormalizeNEON(arr, dim, norm); #else @@ -410,15 +447,28 @@ void Normalizer::Compute(ValueType *arr, size_t dim, float norm) { return; } #endif // __AVX__ +#if !defined(__SSE__) + ailego_assert(arr && dim && norm); + for (size_t i = 0; i < dim; ++i) { + arr[i] /= norm; + } +#else NormalizeSSE(arr, dim, norm); +#endif #endif // __ARM_NEON } -#endif // __SSE__ || (__ARM_NEON && __aarch64__) +#endif // __SSE__ || (__ARM_NEON && __aarch64__) || __riscv_vector #if (defined(__F16C__) && defined(__AVX__)) || \ - (defined(__ARM_NEON) && defined(__aarch64__)) + (defined(__ARM_NEON) && defined(__aarch64__)) || defined(__riscv_zvfh) //! Compute the norm of vector void Normalizer::Compute(ValueType *arr, size_t dim, float norm) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + NormalizeRVV(arr, dim, norm); + return; + } +#endif #if defined(__ARM_NEON) NormalizeNEON(reinterpret_cast(arr), dim, norm); #else @@ -428,10 +478,17 @@ void Normalizer::Compute(ValueType *arr, size_t dim, float norm) { return; } #endif // __AVX512F__ +#if defined(__AVX__) NormalizeAVX(reinterpret_cast(arr), dim, norm); +#else + ailego_assert(arr && dim && norm); + for (size_t i = 0; i < dim; ++i) { + arr[i] /= norm; + } +#endif #endif // __ARM_NEON } -#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) +#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) || __riscv_zvfh } // namespace ailego -} // namespace zvec \ No newline at end of file +} // namespace zvec diff --git a/src/ailego/math/normalizer.h b/src/ailego/math/normalizer.h index 2c191b0e7..25acdce09 100644 --- a/src/ailego/math/normalizer.h +++ b/src/ailego/math/normalizer.h @@ -51,7 +51,8 @@ struct Normalizer { } }; -#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) +#if defined(__SSE__) || (defined(__ARM_NEON) && defined(__aarch64__)) || \ + defined(__riscv_vector) /*! Normalizer (FP32) */ template <> @@ -78,10 +79,10 @@ struct Normalizer { } } }; -#endif // __SSE__ || (__ARM_NEON && __aarch64__) +#endif // __SSE__ || (__ARM_NEON && __aarch64__) || __riscv_vector #if (defined(__F16C__) && defined(__AVX__)) || \ - (defined(__ARM_NEON) && defined(__aarch64__)) + (defined(__ARM_NEON) && defined(__aarch64__)) || defined(__riscv_zvfh) /*! Normalizer (FP16) */ template <> @@ -108,7 +109,7 @@ struct Normalizer { } } }; -#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) +#endif // (__F16C__ && __AVX__) || (__ARM_NEON && __aarch64__) || __riscv_zvfh } // namespace ailego } // namespace zvec diff --git a/src/ailego/math_batch/euclidean_distance_batch_dispatch.cc b/src/ailego/math_batch/euclidean_distance_batch_dispatch.cc index 5c8ffb254..5b0fa44ab 100644 --- a/src/ailego/math_batch/euclidean_distance_batch_dispatch.cc +++ b/src/ailego/math_batch/euclidean_distance_batch_dispatch.cc @@ -87,9 +87,39 @@ void compute_one_to_many_squared_euclidean_avx2_fp16_12( // float *results); #endif +#if defined(__riscv_vector) +void compute_one_to_many_squared_euclidean_rvv_fp32_1( + const float *query, const float **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results); + +void compute_one_to_many_squared_euclidean_rvv_fp32_12( + const float *query, const float **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results); +#endif + +#if defined(__riscv_zvfh) +void compute_one_to_many_squared_euclidean_rvv_fp16_1( + const ailego::Float16 *query, const ailego::Float16 **ptrs, + std::array &prefetch_ptrs, + size_t dimensionality, float *results); + +void compute_one_to_many_squared_euclidean_rvv_fp16_12( + const ailego::Float16 *query, const ailego::Float16 **ptrs, + std::array &prefetch_ptrs, + size_t dimensionality, float *results); +#endif + void SquaredEuclideanDistanceBatchImpl::compute_one_to_many( const ValueType *query, const ValueType **ptrs, std::array &prefetch_ptrs, size_t dim, float *sums) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + return compute_one_to_many_squared_euclidean_rvv_fp32_1( + query, ptrs, prefetch_ptrs, dim, sums); + } +#endif #if defined(__AVX2__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX2) { return compute_one_to_many_squared_euclidean_avx2_fp32_1( @@ -104,6 +134,12 @@ void SquaredEuclideanDistanceBatchImpl::compute_one_to_many( const ailego::Float16 *query, const ailego::Float16 **ptrs, std::array &prefetch_ptrs, size_t dim, float *sums) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + return compute_one_to_many_squared_euclidean_rvv_fp16_1( + query, ptrs, prefetch_ptrs, dim, sums); + } +#endif #if defined(__AVX512FP16__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX512_FP16) { return compute_one_to_many_squared_euclidean_avx512fp16_fp16_1( @@ -129,6 +165,12 @@ void SquaredEuclideanDistanceBatchImpl::compute_one_to_many( void SquaredEuclideanDistanceBatchImpl::compute_one_to_many( const float *query, const float **ptrs, std::array &prefetch_ptrs, size_t dim, float *sums) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + return compute_one_to_many_squared_euclidean_rvv_fp32_12( + query, ptrs, prefetch_ptrs, dim, sums); + } +#endif #if defined(__AVX512F__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX512F) { return compute_one_to_many_squared_euclidean_avx512f_fp32_12( @@ -151,6 +193,12 @@ void SquaredEuclideanDistanceBatchImpl:: const ailego::Float16 **ptrs, std::array &prefetch_ptrs, size_t dim, float *sums) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + return compute_one_to_many_squared_euclidean_rvv_fp16_12( + query, ptrs, prefetch_ptrs, dim, sums); + } +#endif #if defined(__AVX512FP16__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX512_FP16) { return compute_one_to_many_squared_euclidean_avx512fp16_fp16_12( diff --git a/src/ailego/math_batch/euclidean_distance_batch_impl_fp16_rvv.cc b/src/ailego/math_batch/euclidean_distance_batch_impl_fp16_rvv.cc new file mode 100644 index 000000000..612e561b2 --- /dev/null +++ b/src/ailego/math_batch/euclidean_distance_batch_impl_fp16_rvv.cc @@ -0,0 +1,97 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include +#include + +namespace zvec::ailego::DistanceBatch { + +#if defined(__riscv_zvfh) + +void compute_one_to_many_squared_euclidean_rvv_fp16_1( + const ailego::Float16 *query, const ailego::Float16 **ptrs, + std::array &prefetch_ptrs, + size_t dimensionality, float *results) { + const _Float16 *q_ptr = reinterpret_cast(query); + const _Float16 *m_ptr = reinterpret_cast(ptrs[0]); + + const size_t vlmax = __riscv_vsetvlmax_e16m4(); + vfloat32m8_t acc = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + size_t dim = 0; + while (dim < dimensionality) { + size_t vl = __riscv_vsetvl_e16m4(dimensionality - dim); + + vfloat16m4_t q = __riscv_vle16_v_f16m4(q_ptr + dim, vl); + vfloat16m4_t m = __riscv_vle16_v_f16m4(m_ptr + dim, vl); + + // Widen the fp16 difference to fp32 (matches the scalar/AVX reference, + // which converts to fp32 before subtracting), then square-accumulate. + vfloat32m8_t diff = __riscv_vfwsub_vv_f32m8(q, m, vl); + acc = __riscv_vfmacc_vv_f32m8_tu(acc, diff, diff, vl); + + if (prefetch_ptrs[0] != nullptr) { + ailego_prefetch(prefetch_ptrs[0] + dim); + } + + dim += vl; + } + + vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t red = __riscv_vfredusum_vs_f32m8_f32m1(acc, zero, vlmax); + results[0] = __riscv_vfmv_f_s_f32m1_f32(red); +} + +void compute_one_to_many_squared_euclidean_rvv_fp16_12( + const ailego::Float16 *query, const ailego::Float16 **ptrs, + std::array &prefetch_ptrs, + size_t dimensionality, float *results) { + const _Float16 *q_ptr = reinterpret_cast(query); + + const size_t vlmax = __riscv_vsetvlmax_e16m4(); + + for (size_t channel = 0; channel < 12; ++channel) { + const _Float16 *m_ptr = reinterpret_cast(ptrs[channel]); + + vfloat32m8_t acc = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + size_t dim = 0; + while (dim < dimensionality) { + size_t vl = __riscv_vsetvl_e16m4(dimensionality - dim); + + vfloat16m4_t q = __riscv_vle16_v_f16m4(q_ptr + dim, vl); + vfloat16m4_t m = __riscv_vle16_v_f16m4(m_ptr + dim, vl); + + // Widen the fp16 difference to fp32 (matches the scalar/AVX reference, + // which converts to fp32 before subtracting), then square-accumulate. + vfloat32m8_t diff = __riscv_vfwsub_vv_f32m8(q, m, vl); + acc = __riscv_vfmacc_vv_f32m8_tu(acc, diff, diff, vl); + + if (prefetch_ptrs[channel] != nullptr) { + ailego_prefetch(prefetch_ptrs[channel] + dim); + } + + dim += vl; + } + + vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t red = __riscv_vfredusum_vs_f32m8_f32m1(acc, zero, vlmax); + results[channel] = __riscv_vfmv_f_s_f32m1_f32(red); + } +} + +#endif + +} // namespace zvec::ailego::DistanceBatch diff --git a/src/ailego/math_batch/euclidean_distance_batch_impl_fp32_rvv.cc b/src/ailego/math_batch/euclidean_distance_batch_impl_fp32_rvv.cc new file mode 100644 index 000000000..45b6cc888 --- /dev/null +++ b/src/ailego/math_batch/euclidean_distance_batch_impl_fp32_rvv.cc @@ -0,0 +1,91 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include + +namespace zvec::ailego::DistanceBatch { + +#if defined(__riscv_vector) + +void compute_one_to_many_squared_euclidean_rvv_fp32_1( + const float *query, const float **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results) { + const float *q_ptr = query; + const float *m_ptr = ptrs[0]; + + const size_t vlmax = __riscv_vsetvlmax_e32m8(); + vfloat32m8_t acc = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + size_t dim = 0; + while (dim < dimensionality) { + size_t vl = __riscv_vsetvl_e32m8(dimensionality - dim); + + vfloat32m8_t q = __riscv_vle32_v_f32m8(q_ptr + dim, vl); + vfloat32m8_t m = __riscv_vle32_v_f32m8(m_ptr + dim, vl); + + vfloat32m8_t diff = __riscv_vfsub_vv_f32m8(q, m, vl); + acc = __riscv_vfmacc_vv_f32m8_tu(acc, diff, diff, vl); + + if (prefetch_ptrs[0] != nullptr) { + ailego_prefetch(prefetch_ptrs[0] + dim); + } + + dim += vl; + } + + vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t red = __riscv_vfredusum_vs_f32m8_f32m1(acc, zero, vlmax); + results[0] = __riscv_vfmv_f_s_f32m1_f32(red); +} + +void compute_one_to_many_squared_euclidean_rvv_fp32_12( + const float *query, const float **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results) { + const float *q_ptr = query; + + const size_t vlmax = __riscv_vsetvlmax_e32m8(); + + for (size_t channel = 0; channel < 12; ++channel) { + const float *m_ptr = ptrs[channel]; + vfloat32m8_t acc = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + size_t dim = 0; + while (dim < dimensionality) { + size_t vl = __riscv_vsetvl_e32m8(dimensionality - dim); + + vfloat32m8_t q = __riscv_vle32_v_f32m8(q_ptr + dim, vl); + vfloat32m8_t m = __riscv_vle32_v_f32m8(m_ptr + dim, vl); + + vfloat32m8_t diff = __riscv_vfsub_vv_f32m8(q, m, vl); + acc = __riscv_vfmacc_vv_f32m8_tu(acc, diff, diff, vl); + + if (prefetch_ptrs[channel] != nullptr) { + ailego_prefetch(prefetch_ptrs[channel] + dim); + } + + dim += vl; + } + + vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t red = __riscv_vfredusum_vs_f32m8_f32m1(acc, zero, vlmax); + results[channel] = __riscv_vfmv_f_s_f32m1_f32(red); + } +} + +#endif + +} // namespace zvec::ailego::DistanceBatch diff --git a/src/ailego/math_batch/inner_product_distance_batch_dispatch.cc b/src/ailego/math_batch/inner_product_distance_batch_dispatch.cc index 78376626a..6e2b2cf51 100644 --- a/src/ailego/math_batch/inner_product_distance_batch_dispatch.cc +++ b/src/ailego/math_batch/inner_product_distance_batch_dispatch.cc @@ -93,9 +93,51 @@ void compute_one_to_many_inner_product_avx2_int8_12( float *results); #endif +#if defined(__riscv_vector) +void compute_one_to_many_inner_product_rvv_fp32_1( + const float *query, const float **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results); + +void compute_one_to_many_inner_product_rvv_fp32_12( + const float *query, const float **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results); +#endif + +#if defined(__riscv_zvfh) +void compute_one_to_many_inner_product_rvv_fp16_1( + const ailego::Float16 *query, const ailego::Float16 **ptrs, + std::array &prefetch_ptrs, + size_t dimensionality, float *results); + +void compute_one_to_many_inner_product_rvv_fp16_12( + const ailego::Float16 *query, const ailego::Float16 **ptrs, + std::array &prefetch_ptrs, + size_t dimensionality, float *results); +#endif + +#if defined(__riscv_vector) +void compute_one_to_many_inner_product_rvv_int8_1( + const int8_t *query, const int8_t **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results); + +void compute_one_to_many_inner_product_rvv_int8_12( + const int8_t *query, const int8_t **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results); +#endif + void InnerProductDistanceBatchImpl::compute_one_to_many( const ValueType *query, const ValueType **ptrs, std::array &prefetch_ptrs, size_t dim, float *sums) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + return compute_one_to_many_inner_product_rvv_fp32_1( + query, ptrs, prefetch_ptrs, dim, sums); + } +#endif #if defined(__AVX2__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX2) { return compute_one_to_many_inner_product_avx2_fp32_1( @@ -110,6 +152,12 @@ void InnerProductDistanceBatchImpl::compute_one_to_many( const ailego::Float16 *query, const ailego::Float16 **ptrs, std::array &prefetch_ptrs, size_t dim, float *sums) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + return compute_one_to_many_inner_product_rvv_fp16_1( + query, ptrs, prefetch_ptrs, dim, sums); + } +#endif #if defined(__AVX512FP16__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX512_FP16) { return compute_one_to_many_inner_product_avx512fp16_fp16_1( @@ -135,6 +183,12 @@ void InnerProductDistanceBatchImpl::compute_one_to_many( void InnerProductDistanceBatchImpl::compute_one_to_many( const int8_t *query, const int8_t **ptrs, std::array &prefetch_ptrs, size_t dim, float *sums) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + return compute_one_to_many_inner_product_rvv_int8_1( + query, ptrs, prefetch_ptrs, dim, sums); + } +#endif // #if defined(__AVX512BW__) // TODO: this version is problematic // return compute_one_to_many_avx512_int8( // query, ptrs, prefetch_ptrs, dim, sums); @@ -167,6 +221,12 @@ InnerProductDistanceBatchImpl::GetQueryPreprocessFunc() { void InnerProductDistanceBatchImpl::compute_one_to_many( const ValueType *query, const ValueType **ptrs, std::array &prefetch_ptrs, size_t dim, float *sums) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + return compute_one_to_many_inner_product_rvv_fp32_12( + query, ptrs, prefetch_ptrs, dim, sums); + } +#endif #if defined(__AVX2__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX2) { return compute_one_to_many_inner_product_avx2_fp32_12( @@ -181,6 +241,12 @@ void InnerProductDistanceBatchImpl::compute_one_to_many( const ailego::Float16 *query, const ailego::Float16 **ptrs, std::array &prefetch_ptrs, size_t dim, float *sums) { +#if defined(__riscv_zvfh) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_ZVFH) { + return compute_one_to_many_inner_product_rvv_fp16_12( + query, ptrs, prefetch_ptrs, dim, sums); + } +#endif #if defined(__AVX512FP16__) if (zvec::ailego::internal::CpuFeatures::static_flags_.AVX512_FP16) { return compute_one_to_many_inner_product_avx512fp16_fp16_12( @@ -206,6 +272,12 @@ void InnerProductDistanceBatchImpl::compute_one_to_many( void InnerProductDistanceBatchImpl::compute_one_to_many( const int8_t *query, const int8_t **ptrs, std::array &prefetch_ptrs, size_t dim, float *sums) { +#if defined(__riscv_vector) + if (zvec::ailego::internal::CpuFeatures::static_flags_.RISCV_VECTOR) { + return compute_one_to_many_inner_product_rvv_int8_12( + query, ptrs, prefetch_ptrs, dim, sums); + } +#endif // #if defined(__AVX512BW__) // TODO: this version is problematic // return compute_one_to_many_avx512_int8( // query, ptrs, prefetch_ptrs, dim, sums); diff --git a/src/ailego/math_batch/inner_product_distance_batch_impl_fp16_rvv.cc b/src/ailego/math_batch/inner_product_distance_batch_impl_fp16_rvv.cc new file mode 100644 index 000000000..efff8ee56 --- /dev/null +++ b/src/ailego/math_batch/inner_product_distance_batch_impl_fp16_rvv.cc @@ -0,0 +1,91 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include +#include + +namespace zvec::ailego::DistanceBatch { + +#if defined(__riscv_zvfh) + +void compute_one_to_many_inner_product_rvv_fp16_1( + const ailego::Float16 *query, const ailego::Float16 **ptrs, + std::array &prefetch_ptrs, + size_t dimensionality, float *results) { + const _Float16 *q_ptr = reinterpret_cast(query); + const _Float16 *m_ptr = reinterpret_cast(ptrs[0]); + + const size_t vlmax = __riscv_vsetvlmax_e16m4(); + vfloat32m8_t acc = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + size_t dim = 0; + while (dim < dimensionality) { + size_t vl = __riscv_vsetvl_e16m4(dimensionality - dim); + + vfloat16m4_t q = __riscv_vle16_v_f16m4(q_ptr + dim, vl); + vfloat16m4_t m = __riscv_vle16_v_f16m4(m_ptr + dim, vl); + + acc = __riscv_vfwmacc_vv_f32m8_tu(acc, q, m, vl); + + if (prefetch_ptrs[0] != nullptr) { + ailego_prefetch(prefetch_ptrs[0] + dim); + } + + dim += vl; + } + + vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t red = __riscv_vfredusum_vs_f32m8_f32m1(acc, zero, vlmax); + results[0] = __riscv_vfmv_f_s_f32m1_f32(red); +} + +void compute_one_to_many_inner_product_rvv_fp16_12( + const ailego::Float16 *query, const ailego::Float16 **ptrs, + std::array &prefetch_ptrs, + size_t dimensionality, float *results) { + const _Float16 *q_ptr = reinterpret_cast(query); + + const size_t vlmax = __riscv_vsetvlmax_e16m4(); + + for (size_t channel = 0; channel < 12; ++channel) { + const _Float16 *m_ptr = reinterpret_cast(ptrs[channel]); + + vfloat32m8_t acc = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + size_t dim = 0; + while (dim < dimensionality) { + size_t vl = __riscv_vsetvl_e16m4(dimensionality - dim); + + vfloat16m4_t q = __riscv_vle16_v_f16m4(q_ptr + dim, vl); + vfloat16m4_t m = __riscv_vle16_v_f16m4(m_ptr + dim, vl); + + acc = __riscv_vfwmacc_vv_f32m8_tu(acc, q, m, vl); + + if (prefetch_ptrs[channel] != nullptr) { + ailego_prefetch(prefetch_ptrs[channel] + dim); + } + + dim += vl; + } + + vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t red = __riscv_vfredusum_vs_f32m8_f32m1(acc, zero, vlmax); + results[channel] = __riscv_vfmv_f_s_f32m1_f32(red); + } +} + +#endif + +} // namespace zvec::ailego::DistanceBatch diff --git a/src/ailego/math_batch/inner_product_distance_batch_impl_fp32_rvv.cc b/src/ailego/math_batch/inner_product_distance_batch_impl_fp32_rvv.cc new file mode 100644 index 000000000..72076567c --- /dev/null +++ b/src/ailego/math_batch/inner_product_distance_batch_impl_fp32_rvv.cc @@ -0,0 +1,89 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include + +namespace zvec::ailego::DistanceBatch { + +#if defined(__riscv_vector) + +void compute_one_to_many_inner_product_rvv_fp32_1( + const float *query, const float **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results) { + const float *q_ptr = query; + const float *m_ptr = ptrs[0]; + + const size_t vlmax = __riscv_vsetvlmax_e32m8(); + vfloat32m8_t acc = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + size_t dim = 0; + while (dim < dimensionality) { + size_t vl = __riscv_vsetvl_e32m8(dimensionality - dim); + + vfloat32m8_t q = __riscv_vle32_v_f32m8(q_ptr + dim, vl); + vfloat32m8_t m = __riscv_vle32_v_f32m8(m_ptr + dim, vl); + + acc = __riscv_vfmacc_vv_f32m8_tu(acc, q, m, vl); + + if (prefetch_ptrs[0] != nullptr) { + ailego_prefetch(prefetch_ptrs[0] + dim); + } + + dim += vl; + } + + vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t red = __riscv_vfredusum_vs_f32m8_f32m1(acc, zero, vlmax); + results[0] = __riscv_vfmv_f_s_f32m1_f32(red); +} + +void compute_one_to_many_inner_product_rvv_fp32_12( + const float *query, const float **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results) { + const float *q_ptr = query; + + const size_t vlmax = __riscv_vsetvlmax_e32m8(); + + for (size_t channel = 0; channel < 12; ++channel) { + const float *m_ptr = ptrs[channel]; + vfloat32m8_t acc = __riscv_vfmv_v_f_f32m8(0.0f, vlmax); + + size_t dim = 0; + while (dim < dimensionality) { + size_t vl = __riscv_vsetvl_e32m8(dimensionality - dim); + + vfloat32m8_t q = __riscv_vle32_v_f32m8(q_ptr + dim, vl); + vfloat32m8_t m = __riscv_vle32_v_f32m8(m_ptr + dim, vl); + + acc = __riscv_vfmacc_vv_f32m8_tu(acc, q, m, vl); + + if (prefetch_ptrs[channel] != nullptr) { + ailego_prefetch(prefetch_ptrs[channel] + dim); + } + + dim += vl; + } + + vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1); + vfloat32m1_t red = __riscv_vfredusum_vs_f32m8_f32m1(acc, zero, vlmax); + results[channel] = __riscv_vfmv_f_s_f32m1_f32(red); + } +} + +#endif + +} // namespace zvec::ailego::DistanceBatch diff --git a/src/ailego/math_batch/inner_product_distance_batch_impl_int8_rvv.cc b/src/ailego/math_batch/inner_product_distance_batch_impl_int8_rvv.cc new file mode 100644 index 000000000..5f23a6003 --- /dev/null +++ b/src/ailego/math_batch/inner_product_distance_batch_impl_int8_rvv.cc @@ -0,0 +1,93 @@ +// Copyright 2025-present the zvec project +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include + +namespace zvec::ailego::DistanceBatch { + +#if defined(__riscv_vector) + +void compute_one_to_many_inner_product_rvv_int8_1( + const int8_t *query, const int8_t **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results) { + const int8_t *q_ptr = query; + const int8_t *m_ptr = ptrs[0]; + + const size_t vlmax = __riscv_vsetvlmax_e8m2(); + vint32m8_t acc = __riscv_vmv_v_x_i32m8(0, vlmax); + + size_t dim = 0; + while (dim < dimensionality) { + size_t vl = __riscv_vsetvl_e8m2(dimensionality - dim); + + vint8m2_t q8 = __riscv_vle8_v_i8m2(q_ptr + dim, vl); + vint8m2_t m8 = __riscv_vle8_v_i8m2(m_ptr + dim, vl); + vint16m4_t q16 = __riscv_vsext_vf2_i16m4(q8, vl); + vint16m4_t m16 = __riscv_vsext_vf2_i16m4(m8, vl); + + acc = __riscv_vwmacc_vv_i32m8_tu(acc, q16, m16, vl); + + if (prefetch_ptrs[0] != nullptr) { + ailego_prefetch(prefetch_ptrs[0] + dim); + } + + dim += vl; + } + + vint32m1_t zero = __riscv_vmv_v_x_i32m1(0, 1); + vint32m1_t red = __riscv_vredsum_vs_i32m8_i32m1(acc, zero, vlmax); + results[0] = static_cast(__riscv_vmv_x_s_i32m1_i32(red)); +} + +void compute_one_to_many_inner_product_rvv_int8_12( + const int8_t *query, const int8_t **ptrs, + std::array &prefetch_ptrs, size_t dimensionality, + float *results) { + const int8_t *q_ptr = query; + + const size_t vlmax = __riscv_vsetvlmax_e8m2(); + + for (size_t channel = 0; channel < 12; ++channel) { + const int8_t *m_ptr = ptrs[channel]; + vint32m8_t acc = __riscv_vmv_v_x_i32m8(0, vlmax); + + size_t dim = 0; + while (dim < dimensionality) { + size_t vl = __riscv_vsetvl_e8m2(dimensionality - dim); + + vint8m2_t q8 = __riscv_vle8_v_i8m2(q_ptr + dim, vl); + vint8m2_t m8 = __riscv_vle8_v_i8m2(m_ptr + dim, vl); + vint16m4_t q16 = __riscv_vsext_vf2_i16m4(q8, vl); + vint16m4_t m16 = __riscv_vsext_vf2_i16m4(m8, vl); + + acc = __riscv_vwmacc_vv_i32m8_tu(acc, q16, m16, vl); + + if (prefetch_ptrs[channel] != nullptr) { + ailego_prefetch(prefetch_ptrs[channel] + dim); + } + + dim += vl; + } + + vint32m1_t zero = __riscv_vmv_v_x_i32m1(0, 1); + vint32m1_t red = __riscv_vredsum_vs_i32m8_i32m1(acc, zero, vlmax); + results[channel] = static_cast(__riscv_vmv_x_s_i32m1_i32(red)); + } +} + +#endif + +} // namespace zvec::ailego::DistanceBatch diff --git a/thirdparty/FastPFOR/FastPFOR-0.4.0 b/thirdparty/FastPFOR/FastPFOR-0.4.0 index 2be1f9769..d6890b32b 160000 --- a/thirdparty/FastPFOR/FastPFOR-0.4.0 +++ b/thirdparty/FastPFOR/FastPFOR-0.4.0 @@ -1 +1 @@ -Subproject commit 2be1f976935b8ff9296b029f574d7f964be9d35d +Subproject commit d6890b32b2d6e185f5a2470892df59f80fc20ed0 diff --git a/thirdparty/RaBitQ-Library/RaBitQ-Library-0.1 b/thirdparty/RaBitQ-Library/RaBitQ-Library-0.1 index 858b0d6c4..11923fdb6 160000 --- a/thirdparty/RaBitQ-Library/RaBitQ-Library-0.1 +++ b/thirdparty/RaBitQ-Library/RaBitQ-Library-0.1 @@ -1 +1 @@ -Subproject commit 858b0d6c480766d0e4f08fc5e02f34b53d698fad +Subproject commit 11923fdb6abe4c48895d1afe9df010a942d1f07b diff --git a/thirdparty/cppjieba/cppjieba-5.6.7 b/thirdparty/cppjieba/cppjieba-5.6.7 index b3602bef7..8fe7b9304 160000 --- a/thirdparty/cppjieba/cppjieba-5.6.7 +++ b/thirdparty/cppjieba/cppjieba-5.6.7 @@ -1 +1 @@ -Subproject commit b3602bef7d1f67521a61788a74fb5801a0e62cd3 +Subproject commit 8fe7b9304d2730320e2e275721b1b330056c581b diff --git a/thirdparty/gflags/gflags-2.2.2 b/thirdparty/gflags/gflags-2.2.2 index e171aa2d1..531935032 160000 --- a/thirdparty/gflags/gflags-2.2.2 +++ b/thirdparty/gflags/gflags-2.2.2 @@ -1 +1 @@ -Subproject commit e171aa2d15ed9eb17054558e0b3a6a413bb01067 +Subproject commit 5319350323577cff4c42ab59118531d04f13edf4 diff --git a/thirdparty/googletest/googletest-1.10.0 b/thirdparty/googletest/googletest-1.10.0 index 703bd9caa..a721f1b20 160000 --- a/thirdparty/googletest/googletest-1.10.0 +++ b/thirdparty/googletest/googletest-1.10.0 @@ -1 +1 @@ -Subproject commit 703bd9caab50b139428cea1aaff9974ebee5742e +Subproject commit a721f1b20c605f635413e22d8b7427a8b4f3956c diff --git a/thirdparty/limonp/limonp-v1.0.2 b/thirdparty/limonp/limonp-v1.0.2 index 9d74077df..ac17a7b8a 160000 --- a/thirdparty/limonp/limonp-v1.0.2 +++ b/thirdparty/limonp/limonp-v1.0.2 @@ -1 +1 @@ -Subproject commit 9d74077dfcdf8073536c97a00bb79d7a3c3fdaba +Subproject commit ac17a7b8a53332972bdc45b3090c974ed6fa2d92 diff --git a/thirdparty/lz4/lz4-1.9.4 b/thirdparty/lz4/lz4-1.9.4 index 5ff839680..0774d0553 160000 --- a/thirdparty/lz4/lz4-1.9.4 +++ b/thirdparty/lz4/lz4-1.9.4 @@ -1 +1 @@ -Subproject commit 5ff839680134437dbf4678f3d0c7b371d84f4964 +Subproject commit 0774d05537f9762f838f7ab541b7765f1a729cb5 diff --git a/thirdparty/magic_enum/magic_enum-0.9.7 b/thirdparty/magic_enum/magic_enum-0.9.7 index 83ab7f4f5..4c597d68a 160000 --- a/thirdparty/magic_enum/magic_enum-0.9.7 +++ b/thirdparty/magic_enum/magic_enum-0.9.7 @@ -1 +1 @@ -Subproject commit 83ab7f4f578bd00e1026a7cd9f7baa4f1a62cbeb +Subproject commit 4c597d68ae51ee4287dea8cd4fad81907974a5ea diff --git a/thirdparty/protobuf/protobuf-3.21.12 b/thirdparty/protobuf/protobuf-3.21.12 index f0dc78d7e..c88ae9951 160000 --- a/thirdparty/protobuf/protobuf-3.21.12 +++ b/thirdparty/protobuf/protobuf-3.21.12 @@ -1 +1 @@ -Subproject commit f0dc78d7e6e331b8c6bb2d5283e06aa26883ca7c +Subproject commit c88ae99513a3f3e74e324a50e9da45476c4cb0a2 diff --git a/thirdparty/rocksdb/rocksdb-8.1.1 b/thirdparty/rocksdb/rocksdb-8.1.1 index 6a4361504..3883a8d05 160000 --- a/thirdparty/rocksdb/rocksdb-8.1.1 +++ b/thirdparty/rocksdb/rocksdb-8.1.1 @@ -1 +1 @@ -Subproject commit 6a436150417120a3f9732d65a2a5c2b8d19b60fc +Subproject commit 3883a8d05ecda71c30f724558b91529cde19e3e8 diff --git a/thirdparty/yaml-cpp/yaml-cpp-0.6.3 b/thirdparty/yaml-cpp/yaml-cpp-0.6.3 index 9a3624205..2decf96e9 160000 --- a/thirdparty/yaml-cpp/yaml-cpp-0.6.3 +++ b/thirdparty/yaml-cpp/yaml-cpp-0.6.3 @@ -1 +1 @@ -Subproject commit 9a3624205e8774953ef18f57067b3426c1c5ada6 +Subproject commit 2decf96e915d2b0c26c68c1659665789dfef2633