
@lucifertrj

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New models submission:

  • Have you added an explanation of why it's important to include this model?
  • Have you added tests for the new model? Were canonical values for tests computed via the original model?
  • Have you added the code snippet for how canonical values were computed?
  • Have you successfully run tests with your changes locally?

New Model: BAAI/bge-m3

MIT-Licensed:

| Model Name | Dimension | Sequence Length | Introduction |
|---|---|---|---|
| BAAI/bge-m3 | 1024 | 8192 | multilingual; unified fine-tuning (dense, sparse, and colbert) from bge-m3-unsupervised |

🔗 Colab Notebook: Open In Colab

Code Snippet for Canonical Values:

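import numpy as np
from fastembed import TextEmbedding

embedding_model = TextEmbedding(model_name="BAAI/bge-m3")
docs = ["hello world", "flag embedding"]
# embed() yields one vector per document; stack them into a (2, 1024) array
embeddings = list(embedding_model.embed(docs))
embeddings = np.stack(embeddings, axis=0)

canonical = np.round(embeddings[0, :5], 4)  # first 5 dims of doc 0, rounded to 4 decimals
print(f"Canonical vector values: {canonical}")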

Output:

Canonical vector values: [-0.0404  0.037  -0.029   0.0161 -0.0357]

Added these values to tests/test_text_onnx_embeddings.py.
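For reviewers, here is a minimal sketch of how canonical values like these are typically compared in a test: the first few components of a freshly computed embedding are checked against the stored vector within an absolute tolerance. The dictionary name `CANONICAL_VECTOR_VALUES`, the helper `matches_canonical`, and the `atol=1e-3` tolerance are assumptions for illustration, not necessarily what the test file uses. A stand-in vector replaces real model output so the example runs without downloading the model.

```python
import numpy as np

# Hypothetical registry of reference values, keyed by model name.
# The bge-m3 entry uses the canonical values printed above.
CANONICAL_VECTOR_VALUES = {
    "BAAI/bge-m3": np.array([-0.0404, 0.037, -0.029, 0.0161, -0.0357]),
}


def matches_canonical(embedding: np.ndarray, model_name: str, atol: float = 1e-3) -> bool:
    """Compare the leading components of `embedding` with the stored canonical values."""
    canonical = CANONICAL_VECTOR_VALUES[model_name]
    # Only the first len(canonical) components are checked; the rest are ignored.
    head = embedding[: canonical.shape[0]]
    return bool(np.allclose(head, canonical, atol=atol))


# Usage with a stand-in 1024-dim vector (a real test would use the model's output).
fake_embedding = np.zeros(1024)
fake_embedding[:5] = [-0.04041, 0.03702, -0.02897, 0.01612, -0.03569]
print(matches_canonical(fake_embedding, "BAAI/bge-m3"))  # True
```

Checking only a short prefix with a tolerance keeps the stored reference small while still catching a wrong model file or pooling change, which would shift these components by far more than 1e-3.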

@coderabbitai

coderabbitai bot commented Feb 5, 2026

📝 Walkthrough


This pull request adds support for the BAAI/bge-m3 multilingual text embedding model to the FastEmbed library. A new model entry is registered in the supported_onnx_models list with metadata including embedding dimension (1024), model size (2.27 GB), license (MIT), and file paths for the ONNX model and associated resources. A corresponding canonical embedding vector is added to the test suite for verification purposes.
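To make the registration step concrete, here is a minimal, hypothetical sketch of a model-description record holding the metadata listed above (dimension, size, license) and a lookup by name. The `ModelDescription` class, its field names, and `find_model` are illustrative assumptions, not FastEmbed's actual types; the ONNX file paths are left out because they are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelDescription:
    """Hypothetical record mirroring the metadata described for a registered model."""
    model: str
    dim: int
    size_in_gb: float
    license: str
    description: str


# Hypothetical stand-in for the supported_onnx_models list (file paths omitted).
supported_onnx_models = [
    ModelDescription(
        model="BAAI/bge-m3",
        dim=1024,
        size_in_gb=2.27,
        license="MIT",
        description="multilingual; unified fine-tuning (dense, sparse, and colbert) from bge-m3-unsupervised",
    ),
]


def find_model(name: str) -> ModelDescription:
    """Return the entry whose name matches `name` (case-insensitive, an assumed convention)."""
    for entry in supported_onnx_models:
        if entry.model.lower() == name.lower():
            return entry
    raise ValueError(f"Model {name} is not supported")


entry = find_model("baai/bge-m3")
print(entry.dim, entry.size_in_gb)  # 1024 2.27
```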

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding support for the BAAI/BGE-M3 model with corresponding test coverage, which aligns perfectly with the changeset. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, providing model details, testing information, canonical values, and code snippets used to compute those values. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@lucifertrj
Author

lucifertrj commented Feb 5, 2026

@joein The Supported Models docs page needs to be updated.

Screenshot 2026-02-05 at 21 05 41
