feat: add diskann index #369

Open
richyreachy wants to merge 136 commits into alibaba:main from richyreachy:feat/diskann_index

@richyreachy
Collaborator

Adds a DiskANN index to Zvec to lower memory usage in vector search, as described in #325.

Comment thread src/db/index/common/doc.cc
-Wl,--whole-archive
$<TARGET_FILE:core_knn_flat_static>
$<TARGET_FILE:core_knn_flat_sparse_static>
$<TARGET_FILE:core_knn_hnsw_static>
Collaborator

This code is duplicated; could it be consolidated?

Collaborator Author

done

Collaborator

This doesn't seem to have been changed?

Comment thread .github/workflows/03-macos-linux-build.yml
Comment thread .github/workflows/03-macos-linux-build.yml
Comment thread src/core/algorithm/diskann/diskann_algorithm.cc Outdated

virtual ~DiskAnnQueryParams() = default;

int list_size() const {
Collaborator

Please add comments describing the parameters.

Collaborator Author

done

Collaborator

I don't see that any were added?

Comment thread src/core/algorithm/diskann/diskann_builder.cc Outdated
Comment thread src/core/algorithm/diskann/diskann_builder.cc Outdated
Comment thread src/core/algorithm/diskann/diskann_holder.h Outdated
Comment thread src/core/algorithm/diskann/diskann_indexer.h Outdated
Comment thread src/core/algorithm/diskann/CMakeLists.txt Outdated
Comment thread src/include/zvec/db/query_params.h
::memcpy(&(vec_value[0]), node_fp_coords_copy, meta_.element_size());

topk_heap.emplace(cached_neighbor.first,
VectorInfo(cur_expanded_dist, vec_value));
Collaborator

Use std::move(vec_value) here.


std::string vec_value;
vec_value.resize(meta_.element_size());
::memcpy(&(vec_value[0]), data_buf, meta_.element_size());
Collaborator

There are two copies here: first into data_buf, then into vec_value. Could we copy only into vec_value and use c_str() for the distance computation?
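
A minimal sketch of the single-copy idea from this comment, not the PR's actual code. `VectorInfo`, `l2_sqr`, and `make_candidate` are hypothetical stand-ins (the real code uses `dc.dist()` and `meta_.element_size()`), and the sketch assumes float vectors and a distance routine that tolerates unaligned input:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for the PR's VectorInfo: a distance plus the raw vector bytes.
struct VectorInfo {
  float dist;
  std::string bytes;
  VectorInfo(float d, std::string b) : dist(d), bytes(std::move(b)) {}
};

// Hypothetical squared-L2 distance over raw bytes, standing in for dc.dist().
// Each element is read with memcpy, so the buffer needs no particular alignment.
inline float l2_sqr(const float *query, const void *raw, std::size_t dim) {
  const char *p = static_cast<const char *>(raw);
  float sum = 0.0f;
  for (std::size_t i = 0; i < dim; ++i) {
    float v;
    std::memcpy(&v, p + i * sizeof(float), sizeof(float));
    const float d = query[i] - v;
    sum += d * d;
  }
  return sum;
}

// Copy the node's bytes once, compute the distance from that copy,
// then move the copy into the result instead of copying it again.
inline VectorInfo make_candidate(const float *query, const void *node_bytes,
                                 std::size_t dim) {
  assert(query != nullptr && node_bytes != nullptr);
  std::string vec_value(static_cast<const char *>(node_bytes),
                        dim * sizeof(float));
  const float dist = l2_sqr(query, vec_value.data(), dim);
  return VectorInfo(dist, std::move(vec_value));
}
```

If the real distance kernel requires aligned input, the copy into an aligned `data_buf` may still be needed, which is worth confirming before removing it.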

::memcpy(&(vec_value[0]), data_buf, meta_.element_size());

topk_heap.emplace(frontier_neighbor.first,
VectorInfo(cur_expanded_dist, vec_value));
Collaborator

Use std::move(vec_value) here.

std::vector<AlignedRead> frontier_read_reqs;
frontier_read_reqs.reserve(2 * beam_width_);

std::vector<std::pair<diskann_id_t, std::pair<uint32_t, diskann_id_t *>>>
Collaborator

Please also fix the issues here that are similar to those in the function above.

::memcpy(&(vec_value[0]), node_fp_coords_copy, meta_.element_size());

topk_heap.emplace(cached_neighbor.first,
VectorInfo(cur_expanded_dist, vec_value));
Collaborator

Use std::move here as well.

void *node_fp_coords = node_disk_buf;
memcpy(data_buf, node_fp_coords, disk_bytes_per_point_);

float cur_expanded_dist = dc.dist(ctx->query(), data_buf);
Collaborator

Could the computation use node_fp_coords directly? Is the copy into data_buf avoidable?
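
One way to frame this question, sketched under the assumption that `data_buf` exists only to satisfy an alignment requirement of the distance kernel (the thread does not confirm this). `is_aligned` and `aligned_view` are hypothetical helpers, not part of the PR:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true when `p` is aligned to `alignment` bytes (a power of two).
inline bool is_aligned(const void *p, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Returns a pointer usable by an alignment-sensitive kernel: the source itself
// when it is already aligned, otherwise a copy placed in the caller's scratch
// buffer. The caller must supply a scratch buffer that is itself aligned and
// at least `bytes` long.
inline const void *aligned_view(const void *src, void *scratch,
                                std::size_t bytes, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  if (is_aligned(src, alignment)) {
    return src;  // no copy needed
  }
  std::memcpy(scratch, src, bytes);
  return scratch;
}
```

With a helper like this, the copy happens only for nodes whose on-disk offset leaves the coordinates misaligned; if the kernel has no alignment requirement at all, the pointer can simply be passed through.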

::memcpy(&(vec_value[0]), data_buf, meta_.element_size());

topk_heap.emplace(frontier_neighbor.first,
VectorInfo(cur_expanded_dist, vec_value));
Collaborator

@egolearner egolearner Jun 2, 2026

Use std::move; going further, you could provide a helper:

std::string make_vector_copy(const void *data) {
  // Build the string directly from the raw bytes in a single copy.
  return std::string(static_cast<const char *>(data), meta_.element_size());
}
topk_heap.emplace(frontier_neighbor.first,
                  VectorInfo(cur_expanded_dist, make_vector_copy(data_buf)));


// Do Train
ailego::Params params;
params.set("zvec.cluster.multi_chunk_cluster.count", num_centers);
Collaborator

Define this key as a named constant?
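
A small sketch of what that could look like. The constant name `kMultiChunkClusterCount` and the `Params` struct are hypothetical (the latter is a stand-in for `ailego::Params` so the example compiles on its own); only the key string comes from the snippet above:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical constant name; the key string is the one used in the diff.
static constexpr const char *kMultiChunkClusterCount =
    "zvec.cluster.multi_chunk_cluster.count";

// Minimal stand-in for ailego::Params, for illustration only.
struct Params {
  std::map<std::string, int> values;
  void set(const std::string &key, int value) { values[key] = value; }
};

// Builds the training parameters using the named key instead of a literal.
inline Params make_train_params(int num_centers) {
  assert(num_centers > 0);
  Params params;
  params.set(kMultiChunkClusterCount, num_centers);
  return params;
}
```

Keeping the key in one place means a typo shows up as a compile error at the use site rather than as a silently ignored parameter.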

const VectorData &query, const BaseIndexQueryParam::Pointer &search_param,
core::IndexContext::Pointer &context) override;

virtual int Add(const VectorData &vector, uint32_t doc_id) override;
Collaborator

Remove virtual.
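
For context, a short sketch of why `virtual` is redundant next to `override`. `IndexBase` and `DiskAnnLikeIndex` are hypothetical types with a simplified `Add` signature, not the classes in this PR:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base interface with a simplified Add signature.
struct IndexBase {
  virtual ~IndexBase() = default;
  virtual int Add(const float *vector, std::uint32_t doc_id) = 0;
};

struct DiskAnnLikeIndex : IndexBase {
  // `override` only compiles on a function that overrides a virtual base
  // function, so the function is already virtual; repeating the keyword
  // adds nothing.
  int Add(const float *vector, std::uint32_t doc_id) override {
    last_doc_id = doc_id;
    return vector != nullptr ? 0 : -1;
  }
  std::uint32_t last_doc_id = 0;
};

// Usage: the call still dispatches dynamically through the base reference.
inline void demo_add() {
  DiskAnnLikeIndex index;
  IndexBase &base = index;
  const float v[1] = {0.5f};
  assert(base.Add(v, 7) == 0);
  assert(index.last_doc_id == 7);
}
```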

5 participants