2 changes: 2 additions & 0 deletions src/SimMetrics.Net/Extensions.cs
@@ -15,7 +15,9 @@ public static List<string> ApproximatelyEquals(this List<string> list, string wo
var num = l.ApproximatelyEquals(word, simMetricType);
var thr = 1 - num;
if (thr <= threshold)
{
newList.Add(l);
}
}
return newList;
}
2 changes: 1 addition & 1 deletion src/SimMetrics.Net/Metric/ChapmanLengthDeviation.cs
@@ -35,7 +35,7 @@ public override double GetUnnormalisedSimilarity(string firstWord, string second
return GetSimilarity(firstWord, secondWord);
}

- public override string LongDescriptionString => "Implements the Chapman Length Deviation algorithm whereby the length deviation of the word strings is used to determine if the strings are similar in size - This apporach is not intended to be used single handedly but rather alongside other approaches";
+ public override string LongDescriptionString => "Implements the Chapman Length Deviation algorithm whereby the length deviation of the word strings is used to determine if the strings are similar in size - This approach is not intended to be used single handedly but rather alongside other approaches";

public override string ShortDescriptionString => "ChapmanLengthDeviation";
}
10 changes: 7 additions & 3 deletions src/SimMetrics.Net/Metric/ChapmanMeanLength.cs
@@ -3,6 +3,11 @@

namespace SimMetrics.Net.Metric
{
/// <summary>
/// This method only the lengths of the two words, not at the actual characters.
Copilot AI Dec 6, 2025

Grammar issue: The sentence is missing a verb. Should be "This method looks at only the lengths of the two words" or "This method uses only the lengths of the two words".

Suggested change
/// This method only the lengths of the two words, not at the actual characters.
/// This method uses only the lengths of the two words, not the actual characters.

Copilot uses AI. Check for mistakes.
/// It uses some cutoff(ChapmanMeanLengthMaxString) and a polynomial scaling(1 - num2^4) to produce a score.
Copilot AI Dec 6, 2025

Missing space before the opening parenthesis in both "cutoff(ChapmanMeanLengthMaxString)" and "scaling(1 - num2^4)". Should be "cutoff (ChapmanMeanLengthMaxString)" and "scaling (1 - num2^4)".

Suggested change
/// It uses some cutoff(ChapmanMeanLengthMaxString) and a polynomial scaling(1 - num2^4) to produce a score.
/// It uses some cutoff (ChapmanMeanLengthMaxString) and a polynomial scaling (1 - num2^4) to produce a score.

Copilot uses AI. Check for mistakes.
/// That means it's really a length-based heuristic similarity, not Chapman Mean Length.
/// </summary>
public sealed class ChapmanMeanLength : AbstractStringMetric
{
private const int ChapmanMeanLengthMaxString = 500;
@@ -39,9 +44,8 @@ public override double GetUnnormalisedSimilarity(string firstWord, string second
return GetSimilarity(firstWord, secondWord);
}

- public override string LongDescriptionString => "Implements the Chapman Mean Length algorithm provides a similarity measure between two strings from size of the mean length of the vectors - this approach is suppossed to be used to determine which metrics may be best to apply rather than giveing a valid response itself";
+ public override string LongDescriptionString => "Implements the Chapman Mean Length algorithm provides a similarity measure between two strings from size of the mean length of the vectors - this approach is suppossed to be used to determine which metrics may be best to apply rather than giving a valid response itself";

public override string ShortDescriptionString => "ChapmanMeanLength";
}
}

}
76 changes: 76 additions & 0 deletions src/SimMetrics.Net/Metric/ChapmanMeanLengthTrue.cs
@@ -0,0 +1,76 @@
using System;
using SimMetrics.Net.API;

namespace SimMetrics.Net.Metric;

/// <summary>
/// Correct Chapman Mean Length implementation.
Copilot AI Dec 6, 2025

The documentation says "Correct Chapman Mean Length implementation" but doesn't explain what makes this "correct" compared to the existing ChapmanMeanLength class, or what the algorithm actually does. Consider adding more context about the Chapman Mean Length algorithm and how this differs from the existing implementation.

Suggested change
/// Correct Chapman Mean Length implementation.
/// Implements the Chapman Mean Length similarity algorithm, which measures the similarity between two strings
/// based on the length of their longest common subsequence (LCS), normalized by the mean length of the input strings.
/// This implementation is considered "correct" as it strictly follows the original Chapman Mean Length definition,
/// addressing inaccuracies or deviations present in the existing <see cref="ChapmanMeanLength"/> class.
/// The similarity score is calculated as 2 * LCS(firstWord, secondWord) / (|firstWord| + |secondWord|).
/// For more details, see: Chapman, S. (1995). "String similarity metrics for information retrieval".

Copilot uses AI. Check for mistakes.
/// </summary>
public sealed class ChapmanMeanLengthTrue : AbstractStringMetric
{
private const double DefaultMismatchScore = 0.0;
private const double DefaultPerfectScore = 1.0;

public override double GetSimilarity(string firstWord, string secondWord)
{
if (string.IsNullOrEmpty(firstWord) || string.IsNullOrEmpty(secondWord))
{
return DefaultMismatchScore;
}
Comment on lines +16 to +19
Copilot AI Dec 6, 2025

The condition string.IsNullOrEmpty(firstWord) || string.IsNullOrEmpty(secondWord) returns 0.0 even when both strings are empty. Two empty strings should arguably return 1.0 (perfect match) since they are identical. Consider checking if (firstWord == null || secondWord == null) return DefaultMismatchScore; followed by a separate check for the case when both strings are empty: if (firstWord.Length == 0 && secondWord.Length == 0) return DefaultPerfectScore;

Suggested change
if (string.IsNullOrEmpty(firstWord) || string.IsNullOrEmpty(secondWord))
{
return DefaultMismatchScore;
}
if (firstWord == null || secondWord == null)
{
return DefaultMismatchScore;
}
if (firstWord.Length == 0 && secondWord.Length == 0)
{
return DefaultPerfectScore;
}
if (firstWord.Length == 0 || secondWord.Length == 0)
{
return DefaultMismatchScore;
}

Copilot uses AI. Check for mistakes.

// Compute LCS length
var lcs = LongestCommonSubsequence(firstWord, secondWord);

// Chapman Mean Length formula
var score = 2.0 * lcs / (firstWord.Length + secondWord.Length);

return score switch
{
< DefaultMismatchScore => DefaultMismatchScore,
> DefaultPerfectScore => DefaultPerfectScore,
_ => score
};
}

public override string GetSimilarityExplained(string firstWord, string secondWord)
{
throw new NotImplementedException();
}

public override double GetSimilarityTimingEstimated(string firstWord, string secondWord)
{
return 0.0;
Comment on lines +40 to +42
Copilot AI Dec 6, 2025

The GetSimilarityTimingEstimated method returns a constant 0.0, but this metric has O(m*n) time complexity due to the LCS algorithm. Consider implementing a proper timing estimate like firstWord.Length * secondWord.Length * estimatedTimingConstant (see Levenstein implementation for reference).

Copilot uses AI. Check for mistakes.
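The timing estimate suggested in this comment could look roughly like the following. This is a minimal sketch, not the library's implementation: `TimingSketch`, `EstimateLcsTiming` and the value of `EstimatedTimingConstant` are hypothetical, and the constant would need to be measured on real hardware before anyone relies on it.

```csharp
using System;

public static class TimingSketch
{
    // Hypothetical cost per DP cell in milliseconds (an assumption, not a measured value).
    private const double EstimatedTimingConstant = 0.0001;

    // Estimates the LCS cost as proportional to the m * n cells of the DP table.
    public static double EstimateLcsTiming(string firstWord, string secondWord)
    {
        if (firstWord == null || secondWord == null)
        {
            return 0.0;
        }

        // Cast to double first so very long inputs cannot overflow int.
        return (double)firstWord.Length * secondWord.Length * EstimatedTimingConstant;
    }
}
```

With these assumed values, `TimingSketch.EstimateLcsTiming("David", "Dave")` covers 5 * 4 = 20 cells and yields roughly 0.002.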
}

public override double GetUnnormalisedSimilarity(string firstWord, string secondWord)
{
return GetSimilarity(firstWord, secondWord);
}

public override string LongDescriptionString => "A true implementation of the Chapman Mean Length algorithm";

public override string ShortDescriptionString => nameof(ChapmanMeanLengthTrue);

private static int LongestCommonSubsequence(string s1, string s2)
{
int m = s1.Length, n = s2.Length;
int[,] dp = new int[m + 1, n + 1];

for (var i = 0; i < m; i++)
{
for (var j = 0; j < n; j++)
{
if (s1[i] == s2[j])
{
dp[i + 1, j + 1] = dp[i, j] + 1;
}
else
{
dp[i + 1, j + 1] = Math.Max(dp[i, j + 1], dp[i + 1, j]);
}
Comment on lines +63 to +70
Copilot AI Dec 6, 2025

Both branches of this 'if' statement write to the same variable - consider using '?' to express intent better.

Suggested change
if (s1[i] == s2[j])
{
dp[i + 1, j + 1] = dp[i, j] + 1;
}
else
{
dp[i + 1, j + 1] = Math.Max(dp[i, j + 1], dp[i + 1, j]);
}
dp[i + 1, j + 1] = (s1[i] == s2[j])
? dp[i, j] + 1
: Math.Max(dp[i, j + 1], dp[i + 1, j]);

Copilot uses AI. Check for mistakes.
}
}

return dp[m, n];
Comment on lines +4 to +74
Copilot AI Dec 6, 2025

This file uses file-scoped namespace declaration (with semicolon), while all other files in the Metric directory use block-scoped namespace declarations (with braces). Consider using block-scoped namespaces for consistency with the existing codebase.

Suggested change
namespace SimMetrics.Net.Metric;
/// <summary>
/// Correct Chapman Mean Length implementation.
/// </summary>
public sealed class ChapmanMeanLengthTrue : AbstractStringMetric
{
private const double DefaultMismatchScore = 0.0;
private const double DefaultPerfectScore = 1.0;
public override double GetSimilarity(string firstWord, string secondWord)
{
if (string.IsNullOrEmpty(firstWord) || string.IsNullOrEmpty(secondWord))
{
return DefaultMismatchScore;
}
// Compute LCS length
var lcs = LongestCommonSubsequence(firstWord, secondWord);
// Chapman Mean Length formula
var score = 2.0 * lcs / (firstWord.Length + secondWord.Length);
return score switch
{
< DefaultMismatchScore => DefaultMismatchScore,
> DefaultPerfectScore => DefaultPerfectScore,
_ => score
};
}
public override string GetSimilarityExplained(string firstWord, string secondWord)
{
throw new NotImplementedException();
}
public override double GetSimilarityTimingEstimated(string firstWord, string secondWord)
{
return 0.0;
}
public override double GetUnnormalisedSimilarity(string firstWord, string secondWord)
{
return GetSimilarity(firstWord, secondWord);
}
public override string LongDescriptionString => "A true implementation of the Chapman Mean Length algorithm";
public override string ShortDescriptionString => nameof(ChapmanMeanLengthTrue);
private static int LongestCommonSubsequence(string s1, string s2)
{
int m = s1.Length, n = s2.Length;
int[,] dp = new int[m + 1, n + 1];
for (var i = 0; i < m; i++)
{
for (var j = 0; j < n; j++)
{
if (s1[i] == s2[j])
{
dp[i + 1, j + 1] = dp[i, j] + 1;
}
else
{
dp[i + 1, j + 1] = Math.Max(dp[i, j + 1], dp[i + 1, j]);
}
}
}
return dp[m, n];
namespace SimMetrics.Net.Metric
{
/// <summary>
/// Correct Chapman Mean Length implementation.
/// </summary>
public sealed class ChapmanMeanLengthTrue : AbstractStringMetric
{
private const double DefaultMismatchScore = 0.0;
private const double DefaultPerfectScore = 1.0;
public override double GetSimilarity(string firstWord, string secondWord)
{
if (string.IsNullOrEmpty(firstWord) || string.IsNullOrEmpty(secondWord))
{
return DefaultMismatchScore;
}
// Compute LCS length
var lcs = LongestCommonSubsequence(firstWord, secondWord);
// Chapman Mean Length formula
var score = 2.0 * lcs / (firstWord.Length + secondWord.Length);
return score switch
{
< DefaultMismatchScore => DefaultMismatchScore,
> DefaultPerfectScore => DefaultPerfectScore,
_ => score
};
}
public override string GetSimilarityExplained(string firstWord, string secondWord)
{
throw new NotImplementedException();
}
public override double GetSimilarityTimingEstimated(string firstWord, string secondWord)
{
return 0.0;
}
public override double GetUnnormalisedSimilarity(string firstWord, string secondWord)
{
return GetSimilarity(firstWord, secondWord);
}
public override string LongDescriptionString => "A true implementation of the Chapman Mean Length algorithm";
public override string ShortDescriptionString => nameof(ChapmanMeanLengthTrue);
private static int LongestCommonSubsequence(string s1, string s2)
{
int m = s1.Length, n = s2.Length;
int[,] dp = new int[m + 1, n + 1];
for (var i = 0; i < m; i++)
{
for (var j = 0; j < n; j++)
{
if (s1[i] == s2[j])
{
dp[i + 1, j + 1] = dp[i, j] + 1;
}
else
{
dp[i + 1, j + 1] = Math.Max(dp[i, j + 1], dp[i + 1, j]);
}
}
}
return dp[m, n];
}

Copilot uses AI. Check for mistakes.
Comment on lines +57 to +74
Copilot AI Dec 6, 2025

[nitpick] The LCS implementation uses O(m*n) space complexity. For large strings, consider optimizing to use O(min(m, n)) space by only keeping the previous and current rows of the DP table, since you only need the final result and not the actual LCS itself.

Suggested change
int[,] dp = new int[m + 1, n + 1];
for (var i = 0; i < m; i++)
{
for (var j = 0; j < n; j++)
{
if (s1[i] == s2[j])
{
dp[i + 1, j + 1] = dp[i, j] + 1;
}
else
{
dp[i + 1, j + 1] = Math.Max(dp[i, j + 1], dp[i + 1, j]);
}
}
}
return dp[m, n];
// Ensure s2 is the shorter string to minimize space
if (n > m)
{
// Swap to always use less space
var temp = s1; s1 = s2; s2 = temp;
m = s1.Length; n = s2.Length;
}
int[] prev = new int[n + 1];
int[] curr = new int[n + 1];
for (int i = 0; i < m; i++)
{
for (int j = 0; j < n; j++)
{
if (s1[i] == s2[j])
{
curr[j + 1] = prev[j] + 1;
}
else
{
curr[j + 1] = Math.Max(prev[j + 1], curr[j]);
}
}
// Swap rows for next iteration
var tempArr = prev; prev = curr; curr = tempArr;
}
return prev[n];

Copilot uses AI. Check for mistakes.
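One way to gain confidence in the two-row variant is to run it side by side with the full-table version on the same inputs. The sketch below is an illustration under assumptions: `LcsSketch`, `FullTable` and `TwoRows` are hypothetical names, and the code is a standalone restatement of the two approaches rather than the PR's exact code.

```csharp
using System;

public static class LcsSketch
{
    // Full (m + 1) x (n + 1) table, as in the PR: O(m * n) space.
    public static int FullTable(string s1, string s2)
    {
        var dp = new int[s1.Length + 1, s2.Length + 1];
        for (var i = 0; i < s1.Length; i++)
        {
            for (var j = 0; j < s2.Length; j++)
            {
                dp[i + 1, j + 1] = s1[i] == s2[j]
                    ? dp[i, j] + 1
                    : Math.Max(dp[i, j + 1], dp[i + 1, j]);
            }
        }
        return dp[s1.Length, s2.Length];
    }

    // Two rolling rows sized by the shorter string: O(min(m, n)) space.
    public static int TwoRows(string s1, string s2)
    {
        if (s2.Length > s1.Length)
        {
            var tmp = s1; s1 = s2; s2 = tmp; // keep the shorter string as s2
        }

        var prev = new int[s2.Length + 1];
        var curr = new int[s2.Length + 1];
        for (var i = 0; i < s1.Length; i++)
        {
            for (var j = 0; j < s2.Length; j++)
            {
                curr[j + 1] = s1[i] == s2[j]
                    ? prev[j] + 1
                    : Math.Max(prev[j + 1], curr[j]);
            }
            var row = prev; prev = curr; curr = row; // reuse the old row next time
        }
        return prev[s2.Length];
    }
}
```

For "David" and "Dave" both methods return 3 (the common subsequence "Dav"), which matches the 0.666667 expected by the test data: 2 * 3 / (5 + 4).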
}
}
1 change: 1 addition & 0 deletions src/SimMetrics.Net/SimMetrics.Net.csproj
@@ -6,6 +6,7 @@
<VersionPrefix>1.0.5.0</VersionPrefix>
<Authors>Hamed Fathi;Stef Heyenrath</Authors>
<TargetFrameworks>net20;net35;net40;net45;netstandard1.0;netstandard2.0</TargetFrameworks>
<LangVersion>12</LangVersion>
Copilot AI Dec 6, 2025

Setting LangVersion to 12 (C# 12) may cause compatibility issues with the older target frameworks (net20, net35, net40). C# 12 features may not be fully supported by these legacy frameworks. Consider using a lower LangVersion that's compatible across all target frameworks, or only use language features that are supported by the oldest target framework.

Suggested change
<LangVersion>12</LangVersion>
<LangVersion>2</LangVersion>

Copilot uses AI. Check for mistakes.
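When weighing a lower `LangVersion`, it helps to list which language features the new code in this PR actually uses. The sketch below mirrors three of them with the minimum C# version each requires; `LangVersionSketch` and `Clamp` are hypothetical names used only for illustration. Under this reading, a `LangVersion` below 10 would not compile the file-scoped namespaces added here.

```csharp
using System;

namespace LangVersionSketch; // file-scoped namespace: requires C# 10 or later

public static class Clamp
{
    // Expression-bodied property: requires C# 6 or later.
    public static string Name => nameof(Clamp);

    // Switch expression with relational patterns: requires C# 9 or later.
    public static double ToUnitInterval(double score) => score switch
    {
        < 0.0 => 0.0,
        > 1.0 => 1.0,
        _ => score
    };
}
```

`Clamp.ToUnitInterval(1.5)` returns 1.0 and `Clamp.ToUnitInterval(-0.2)` returns 0.0, the same clamping shape used in `GetSimilarity`.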
<AssemblyName>SimMetrics.Net</AssemblyName>
<PackageId>SimMetrics.Net</PackageId>
<PackageTags>algorithms;artifical;intelligence</PackageTags>
19 changes: 9 additions & 10 deletions tests/SimMetrics.Net.Tests/AssertUtil.cs
@@ -1,17 +1,16 @@
using Xunit;

- namespace SimMetrics.Net.Tests
+ namespace SimMetrics.Net.Tests;
+
+ internal static class AssertUtil
{
- internal static class AssertUtil
+ public static void Equal<T>(T expected, T actual)
{
- public static void Equal<T>(T expected, T actual)
- {
- Assert.Equal(expected, actual);
- }
+ Assert.Equal(expected, actual);
+ }

- public static void Equal<T>(T expected, T actual, string message)
- {
- Assert.True(expected.Equals(actual), message);
- }
+ public static void Equal<T>(T expected, T actual, string message)
+ {
+ Assert.True(expected.Equals(actual), message);
}
}
4 changes: 0 additions & 4 deletions tests/SimMetrics.Net.Tests/SimMetrics.Net.Tests.csproj
@@ -3,8 +3,6 @@
<PropertyGroup>
<Authors>Stef Heyenrath</Authors>
<TargetFramework>net8.0</TargetFramework>
- <AssemblyName>SimMetrics.Net.Tests</AssemblyName>
- <PackageId>SimMetrics.Net.Tests</PackageId>
<GenerateRuntimeConfigurationFiles>true</GenerateRuntimeConfigurationFiles>
<DebugType>full</DebugType>
</PropertyGroup>
@@ -14,8 +12,6 @@
</ItemGroup>

<ItemGroup>
- <!--<Reference Include="System" />-->
- <!--<Reference Include="Microsoft.CSharp" />-->
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.14.1" />
<PackageReference Include="NFluent" Version="3.1.0" />
<PackageReference Include="xunit" Version="2.9.3" />
@@ -0,0 +1,32 @@
using SimMetrics.Net.Metric;
using Xunit;

namespace SimMetrics.Net.Tests.SimilarityClasses.LengthBased;
Copilot AI Dec 6, 2025

This test file uses file-scoped namespace declaration (with semicolon), while all other test files in the SimilarityClasses directory use block-scoped namespace declarations (with braces). Consider using block-scoped namespaces for consistency with the existing codebase.

Copilot uses AI. Check for mistakes.

public sealed class ChapmanMeanLengthTrueTests
{
private readonly ChapmanMeanLengthTrue _sut = new();

[Theory]
[InlineData("Davdi", 0.800000)]
[InlineData("david", 0.800000)]
[InlineData("David", 1.000000)]
[InlineData("Maday", 0.400000)]
[InlineData("Daves", 0.600000)]
[InlineData("divaD", 0.200000)]
[InlineData("Dave", 0.666667)]
[InlineData("Dovid", 0.800000)]
[InlineData("Dadiv", 0.600000)]
[InlineData("Da.v.id", 0.833333)]
[InlineData("Dav id", 0.909091)]
[InlineData("12345", 0.000000)]
[InlineData("Divad", 0.600000)]
[InlineData("D-avid", 0.909091)]
[InlineData("xxxxx", 0.000000)]
Comment on lines +10 to +25
Copilot AI Dec 6, 2025

Consider adding test cases for edge conditions like null strings, empty strings, and identical strings to ensure the metric handles these cases correctly (the implementation handles null/empty by returning 0.0).

Copilot uses AI. Check for mistakes.
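The edge cases mentioned in this comment can be pinned down with a small standalone sketch. It assumes the null/empty handling proposed in the earlier review comment (two empty strings score 1.0), which differs from the PR as written, where any empty input scores 0.0. `EdgeCaseSketch` and `Similarity` are hypothetical stand-ins for `ChapmanMeanLengthTrue.GetSimilarity`, not the library's code.

```csharp
using System;

public static class EdgeCaseSketch
{
    // Stand-in for GetSimilarity with the suggested null/empty handling (an assumption).
    public static double Similarity(string firstWord, string secondWord)
    {
        if (firstWord == null || secondWord == null)
        {
            return 0.0; // null never matches
        }
        if (firstWord.Length == 0 && secondWord.Length == 0)
        {
            return 1.0; // two empty strings are identical
        }
        if (firstWord.Length == 0 || secondWord.Length == 0)
        {
            return 0.0; // nothing in common with an empty string
        }

        // 2 * LCS / (|firstWord| + |secondWord|), as in the PR.
        return 2.0 * Lcs(firstWord, secondWord) / (firstWord.Length + secondWord.Length);
    }

    // Length of the longest common subsequence via the full DP table.
    private static int Lcs(string s1, string s2)
    {
        var dp = new int[s1.Length + 1, s2.Length + 1];
        for (var i = 0; i < s1.Length; i++)
        {
            for (var j = 0; j < s2.Length; j++)
            {
                dp[i + 1, j + 1] = s1[i] == s2[j]
                    ? dp[i, j] + 1
                    : Math.Max(dp[i, j + 1], dp[i + 1, j]);
            }
        }
        return dp[s1.Length, s2.Length];
    }
}
```

Under these assumptions, `Similarity(null, "David")` and `Similarity("David", "")` return 0.0, while `Similarity("", "")` and `Similarity("David", "David")` return 1.0. Equivalent `[InlineData]` rows would need to account for the fixed first argument "David" in the existing theory.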
public void GetSimilarity(string test, double expected)
{
var result = _sut.GetSimilarity("David", test);

Assert.Equal(expected, result, 5);
}
}