Description
This RFC proposes adding a function to calculate the Hamming distance between two strings.
The function should have the following signature: `(a: string, b: string): number`.
The function should take two strings as arguments and return the Hamming distance between them. The Hamming distance is defined as the number of positions at which the corresponding characters of the two strings differ, which equals the minimum number of substitutions needed to convert one string into the other. Since it only allows substitutions, it can only be used to compare strings of the same length.
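As a minimal sketch of the definition above, the following compares two strings position by position at the level of UTF-16 code units. The function name `hamming` and the choice to return `-1` for strings of unequal length are assumptions made for illustration; the RFC itself does not specify how unequal lengths should be handled.

```javascript
'use strict';

/**
* Computes the Hamming distance between two strings by comparing UTF-16 code units.
*
* Assumption of this sketch: strings of unequal length yield `-1`.
*
* @param {string} a - first string
* @param {string} b - second string
* @returns {number} Hamming distance, or -1 if the lengths differ
*/
function hamming( a, b ) {
	if ( typeof a !== 'string' || typeof b !== 'string' ) {
		throw new TypeError( 'invalid argument. Both arguments must be strings.' );
	}
	if ( a.length !== b.length ) {
		return -1;
	}
	let dist = 0;
	for ( let i = 0; i < a.length; i++ ) {
		// Count each position at which the code units differ:
		if ( a.charCodeAt( i ) !== b.charCodeAt( i ) ) {
			dist += 1;
		}
	}
	return dist;
}

console.log( hamming( 'karolin', 'kathrin' ) ); // => 3
console.log( hamming( 'abc', 'abcd' ) );        // => -1
```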
Additionally, in order to account for code points and grapheme clusters, we should add separate packages for dealing with each, as the underlying algorithms are likely to differ. We can then provide a more general API which unifies the underlying algorithms. Accordingly, we should create the following packages:
- `@stdlib/string/base/distances/hamming`: compares UTF-16 code units.
- `@stdlib/string/base/distances/hamming-code-points`: compares Unicode code points.
- `@stdlib/string/base/distances/hamming-grapheme-clusters`: compares grapheme clusters (i.e., visual characters).
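To illustrate why the three packages above can give different answers, here is a rough sketch of how the same string splits under each view. The helper names (`codeUnits`, `codePoints`, `graphemeClusters`) are hypothetical, and `Intl.Segmenter` is used only as a convenient stand-in for grapheme cluster segmentation; the proposed packages may well implement this differently.

```javascript
'use strict';

// Split a string into UTF-16 code units (surrogate pairs are separated):
function codeUnits( str ) {
	return str.split( '' );
}

// Split a string into Unicode code points (the string iterator walks by code point):
function codePoints( str ) {
	return Array.from( str );
}

// Split a string into grapheme clusters (stand-in: the built-in Intl.Segmenter):
function graphemeClusters( str ) {
	const seg = new Intl.Segmenter( undefined, { 'granularity': 'grapheme' } );
	return Array.from( seg.segment( str ), ( s ) => s.segment );
}

const globe = '\u{1F30D}'; // one code point outside the BMP (a surrogate pair)
const accented = 'e\u0301'; // "e" followed by a combining acute accent

console.log( codeUnits( globe ).length );           // => 2
console.log( codePoints( globe ).length );          // => 1
console.log( codePoints( accented ).length );       // => 2
console.log( graphemeClusters( accented ).length ); // => 1
```

Because the Hamming distance requires equal lengths, the choice of unit determines both whether two strings are comparable at all and how many positions count as different.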
Once the above are completed, we can add:
- `@stdlib/string/distances/hamming`: unifies the above "base" packages and provides an option for specifying the computation "mode" (i.e., `code_units`, `code_points`, or `grapheme_clusters`, with `grapheme_clusters` being the default).
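One possible shape for the unified API is sketched below. The `options.mode` property name, the `SPLITTERS` lookup table, and the `-1` return value for unequal lengths are all assumptions for illustration and are not specified by this RFC; `Intl.Segmenter` again stands in for whatever grapheme cluster algorithm the base package ends up using.

```javascript
'use strict';

// Hypothetical lookup table mapping each mode to a function which splits a string into units:
const SPLITTERS = {
	'code_units': ( s ) => s.split( '' ),
	'code_points': ( s ) => Array.from( s ),
	'grapheme_clusters': ( s ) => {
		const seg = new Intl.Segmenter( undefined, { 'granularity': 'grapheme' } );
		return Array.from( seg.segment( s ), ( x ) => x.segment );
	}
};

/**
* Computes the Hamming distance between two strings in the specified mode.
*
* @param {string} a - first string
* @param {string} b - second string
* @param {Object} [options] - options
* @param {string} [options.mode='grapheme_clusters'] - unit of comparison
* @returns {number} Hamming distance, or -1 if the unit counts differ (assumption)
*/
function hamming( a, b, options ) {
	const mode = ( options && options.mode ) || 'grapheme_clusters';
	if ( !Object.prototype.hasOwnProperty.call( SPLITTERS, mode ) ) {
		throw new TypeError( 'invalid option. Unrecognized mode: ' + mode + '.' );
	}
	const x = SPLITTERS[ mode ]( a );
	const y = SPLITTERS[ mode ]( b );
	if ( x.length !== y.length ) {
		return -1;
	}
	let dist = 0;
	for ( let i = 0; i < x.length; i++ ) {
		if ( x[ i ] !== y[ i ] ) {
			dist += 1;
		}
	}
	return dist;
}

// U+1F30D and U+1F600 differ in both surrogates, but each is a single code point:
console.log( hamming( '\u{1F30D}', '\u{1F600}', { 'mode': 'code_units' } ) );  // => 2
console.log( hamming( '\u{1F30D}', '\u{1F600}', { 'mode': 'code_points' } ) ); // => 1

// "e" + combining accent is one grapheme cluster but two code points:
console.log( hamming( 'e\u0301', 'e' ) );                            // => 1
console.log( hamming( 'e\u0301', 'e', { 'mode': 'code_points' } ) ); // => -1
```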
Related Issues
Related issues #151.
Questions
No.
Other
No.
Checklist
- I have read and understood the Code of Conduct.
- Searched for existing issues and pull requests.
- The issue name begins with `RFC:`.