[RFC]: Hamming distance between two strings #836

Open
@Planeshifter

Description

This RFC proposes adding a function to calculate the Hamming distance between two strings.

The function should have the following signature: (a: string, b: string): number.

The function should take two strings as arguments and return the Hamming distance between them. The Hamming distance is the number of positions at which the corresponding characters differ, i.e., the minimum number of substitutions needed to convert one string into the other. Because only substitutions are allowed, the distance is defined only for strings of the same length.
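To make the definition concrete, here is a minimal sketch that compares UTF-16 code units. The function name `hammingCodeUnits` is hypothetical, and returning -1 for strings of unequal length is an assumption made for illustration; the RFC does not specify that behavior.

```javascript
// Sketch only: the name and the -1 convention are assumptions, not a settled API.
function hammingCodeUnits( a, b ) {
	if ( typeof a !== 'string' || typeof b !== 'string' ) {
		throw new TypeError( 'invalid argument. Both arguments must be strings.' );
	}
	// The distance is undefined for strings of different lengths (assumed sentinel: -1).
	if ( a.length !== b.length ) {
		return -1;
	}
	let d = 0;
	for ( let i = 0; i < a.length; i++ ) {
		// Indexing a string yields a single UTF-16 code unit.
		if ( a[ i ] !== b[ i ] ) {
			d += 1;
		}
	}
	return d;
}
```

For example, `hammingCodeUnits( 'karolin', 'kathrin' )` returns 3, since the strings differ at the third, fourth, and fifth positions.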

Additionally, in order to account for code points and grapheme clusters, we should add separate packages for dealing with each, as the underlying algorithms are likely to differ. We then can provide a more general API which unifies the underlying algorithms. Accordingly, we should create the following packages:

  • @stdlib/string/base/distances/hamming: compares UTF-16 code units.
  • @stdlib/string/base/distances/hamming-code-points: compares Unicode code points.
  • @stdlib/string/base/distances/hamming-grapheme-clusters: compares grapheme clusters (i.e., visual characters).
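The following sketch illustrates why the three units of comparison can give different answers. The function names are hypothetical, the -1 result for unequal lengths is an assumption, and the grapheme variant relies on `Intl.Segmenter`, which may not be available in every environment; an actual package might use a different segmentation routine.

```javascript
// Sketch only: names and the -1 convention are assumptions, not the proposed packages.

// Counts positions at which two equal-length arrays differ; returns -1 otherwise.
function countMismatches( x, y ) {
	if ( x.length !== y.length ) {
		return -1;
	}
	let d = 0;
	for ( let i = 0; i < x.length; i++ ) {
		if ( x[ i ] !== y[ i ] ) {
			d += 1;
		}
	}
	return d;
}

// String iteration yields whole code points, so surrogate pairs stay together.
function hammingCodePoints( a, b ) {
	return countMismatches( Array.from( a ), Array.from( b ) );
}

// Intl.Segmenter groups code points into user-perceived characters.
function hammingGraphemeClusters( a, b ) {
	const seg = new Intl.Segmenter( undefined, { 'granularity': 'grapheme' } );
	const split = ( s ) => Array.from( seg.segment( s ), ( x ) => x.segment );
	return countMismatches( split( a ), split( b ) );
}
```

With these definitions, `hammingCodePoints( '\u{1F600}', '\u{1F916}' )` returns 1, whereas a code-unit comparison of the same strings would count 2 because both halves of the surrogate pairs differ. Similarly, `hammingCodePoints( 'e\u0301', 'a\u0300' )` returns 2, while `hammingGraphemeClusters( 'e\u0301', 'a\u0300' )` returns 1, since each string is a single visual character.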

Once the above are completed, we can add:

  • @stdlib/string/distances/hamming: unifies the above "base" packages and provides an option for specifying the computation "mode" (i.e., code_units, code_points, or grapheme_clusters, with grapheme_clusters being the default).
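One way the unified API could dispatch on the "mode" option is sketched below. The names `hamming` and `SPLITTERS`, the error message, and the -1 result for unequal lengths are illustrative assumptions; only the mode names and the grapheme_clusters default come from the proposal above. The grapheme mode again assumes `Intl.Segmenter` is available.

```javascript
// Sketch only: `hamming`, `SPLITTERS`, and the error handling are assumptions.
const SPLITTERS = {
	// Splitting on the empty string yields individual UTF-16 code units.
	'code_units': ( s ) => s.split( '' ),
	// String iteration yields whole code points.
	'code_points': ( s ) => Array.from( s ),
	// Intl.Segmenter yields user-perceived characters.
	'grapheme_clusters': ( s ) => {
		const seg = new Intl.Segmenter( undefined, { 'granularity': 'grapheme' } );
		return Array.from( seg.segment( s ), ( x ) => x.segment );
	}
};

function hamming( a, b, options ) {
	// Default to grapheme clusters, as the proposal suggests.
	const mode = ( options && options.mode ) || 'grapheme_clusters';
	if ( !Object.prototype.hasOwnProperty.call( SPLITTERS, mode ) ) {
		throw new TypeError( 'invalid option. Unrecognized mode: ' + mode );
	}
	const x = SPLITTERS[ mode ]( a );
	const y = SPLITTERS[ mode ]( b );
	if ( x.length !== y.length ) {
		return -1; // assumed sentinel for unequal lengths
	}
	let d = 0;
	for ( let i = 0; i < x.length; i++ ) {
		if ( x[ i ] !== y[ i ] ) {
			d += 1;
		}
	}
	return d;
}
```

For instance, `hamming( '\u{1F600}', '\u{1F916}', { 'mode': 'code_units' } )` returns 2, while the same call with `'mode': 'code_points'` returns 1.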

Related Issues

Related issue: #151.

Questions

No.

Other

No.

Checklist

  • I have read and understood the Code of Conduct.
  • Searched for existing issues and pull requests.
  • The issue name begins with RFC:.

Metadata

Assignees

No one assigned

Labels

  • Accepted: RFC feature request which has been accepted.
  • Feature: Issue or pull request for adding a new feature.
  • Good First Issue: A good first issue for new contributors!
  • JavaScript: Issue involves or relates to JavaScript.
  • RFC: Request for comments. Feature requests and proposed changes.
  • Utilities: Issue or pull request concerning general utilities.
  • difficulty: 3: Likely to be challenging but manageable.
  • priority: Low: Low priority concern or feature request.