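EfficientNetV2B2 (8M parameters) still outperforms SmolVLM-1.7B on every metric. The strongest SmolVLM configuration, SmolVLM-1.7B with zero-shot prompting, trails it by about 11 points in accuracy and 21 points in F1-score. See the details below:
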
| Model Name | Params | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| EfficientNetV2B2 | 8M | 90.51% | 92.11% | 88.18% | 89.62% |
| SmolVLM-1.7B (Zero-Shot Prompting) | 1.7B | 79.05% | 68.57% | 69.19% | 68.62% |
| SmolVLM-500M (Zero-Shot Prompting) | 500M | 62.85% | 62.25% | 54.98% | 55.32% |
| SmolVLM-1.7B (Few-Shot Prompting) | 1.7B | 54.55% | 62.68% | 50.68% | 48.70% |
| SmolVLM-500M (Few-Shot Prompting) | 500M | 53.36% | 51.34% | 50.73% | 45.13% |
| SmolVLM-256M (Zero-Shot Prompting) | 256M | 50.20% | 48.91% | 43.58% | 44.38% |
| SmolVLM-256M (Few-Shot Prompting) | 256M | 50.20% | 52.75% | 46.30% | 43.60% |

Detailed results for each model are provided in the attached file.

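
The section does not say how Precision, Recall and F1-Score are averaged, but the numbers are consistent with macro averaging. Recall differs from accuracy (88.18% vs 90.51% for EfficientNetV2B2), which cannot happen with micro or support-weighted recall on single-label data. F1 is also below the harmonic mean of the precision and recall columns (2 × 92.11 × 88.18 / (92.11 + 88.18) ≈ 90.10%, not 89.62%), as expected when F1 is computed per class and then averaged. Treat this as an assumption. The sketch below computes accuracy plus macro-averaged metrics in plain Python. The helper name `classification_metrics` and the toy labels `a`/`b` are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    Assumption: "macro" means each metric is computed per class and then
    averaged with equal weight per class (not confirmed by the source).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Classes seen in either the ground truth or the predictions.
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    pred_count = Counter(y_pred)  # how often each class was predicted
    true_count = Counter(y_true)  # how often each class actually occurs

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # An empty denominator yields 0.0 for that class's metric.
        p = tp[c] / pred_count[c] if pred_count[c] else 0.0
        r = tp[c] / true_count[c] if true_count[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,  # mean of per-class F1, not 2PR/(P+R) of the means
    }


m = classification_metrics(["a", "a", "a", "b"], ["a", "a", "b", "b"])
print({k: round(v, 4) for k, v in m.items()})
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.8333, 'f1': 0.7333}
```

On this toy input the macro F1 is 0.7333, while the harmonic mean of the macro precision and recall would be about 0.7895. Recall (0.8333) also differs from accuracy (0.75). Both patterns match the table. scikit-learn's `precision_recall_fscore_support(..., average="macro")` uses the same per-class-then-average definition and can serve as a cross-check.
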
References
- http://promptingguide.ai/
- https://keras.io/guides/transfer_learning/
- https://huggingface.co/blog/smolvlm
- https://keras.io/api/applications/efficientnet_v2/#efficientnetv2b2-function
- https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct
- https://huggingface.co/HuggingFaceTB/SmolVLM-500M-Instruct
- https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct