New ranking feature based on Word2Vec word embeddings and their cosine values #34

@martinreynaert

We currently have a ranking feature:
(skip[12]?0:(*vit)->cosine_rank);

This is based on the top 20 semantic nearest neighbours returned from the word2vec word embeddings, followed by a further check on the cosine values. This works, but is too slow for production work. The current request may warrant renaming this earlier feature.

I would like a new feature that, for each pair of a variant and a particular CC, retrieves the cosine value (as ticcltool W2V-dist does). Given the values for all the CCs of a variant, the smallest value should be ranked 'best', i.e. be assigned rank 1. Larger values are then assigned ranks 2, 3, 4, etc. Ties get the same rank.
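
A minimal sketch of the ranking step described above, not taken from the existing code. The function name `rank_cosine_values` is hypothetical, and it assumes the cosine values for all CCs of one variant have already been retrieved into a vector. It also assumes that tied values do not leave a gap in the numbering (so 1, 2, 2, 3 rather than 1, 2, 2, 4) and that ties are detected by exact equality of the values.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: given the cosine values of all CCs for one variant,
// return a rank per CC in the same order as the input.
// Smallest value -> rank 1; equal values share a rank (assumption: no gaps).
std::vector<int> rank_cosine_values(const std::vector<double>& values) {
  // Indices of the CCs, sorted by ascending cosine value.
  std::vector<std::size_t> order(values.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&values](std::size_t a, std::size_t b) {
              return values[a] < values[b];
            });

  std::vector<int> ranks(values.size(), 0);
  int rank = 0;
  for (std::size_t i = 0; i < order.size(); ++i) {
    // Only move to the next rank when the value actually changes.
    if (i == 0 || values[order[i]] != values[order[i - 1]]) {
      ++rank;
    }
    ranks[order[i]] = rank;
  }
  return ranks;
}
```

For example, the values 0.42, 0.10, 0.42, 0.73 would receive the ranks 2, 1, 2, 3. If ties should be detected within some tolerance rather than by exact equality, the comparison in the loop is the place to change.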

Many thanks!
