Exports a function that looks for a subsequence of a given sequence of characters in an input string, ignoring other characters, and outputs what proportion of that sequence was found, as a number between 0 and 1.
Kind of like
/.*a.*b.*c.*/, while counting the longest a-b-c subsequence.
This works great to rank some text items from which a user wants to choose one really quickly by approximately typing it. The heuristic is simple, predictable and tolerates abbreviations and inaccurate typing.
For example, a user might type "nsfudt", and scatter-rank would find that it matches 100% with "node js foundation" (because all the letters are there in the right order), somewhat also with "apache foundation" (because "fudt" is in there) and hardly at all with "wikipedia" (because only the "d" would hit).
var rank = scatterRank("abcd"); rank("abcd") // => 1 all OK rank("abc") // => 0.75 "d" wasn't found rank("XYZ") // => 0 nothing found rank("gah") // => 0.25 only "a" found rank("_a___b_c_d_") // => 1 all OK rank("dcba") // => 0.25 only "d" found; rest were out of order rank("cab") // => 0.5 "ab" was longest matching subsequence
Alternatively, pass the string to search in as second parameter:
scatterRank("abcd", "i know my abcs") // => 0.75
fast-levenshtein is better for ranking mispellings and approximations, but too rigid for abbreviations to match well.
string_score is like a fuzzy, more complex version of this. It's better for natural language use-cases where you don't necessarily know much about the string you're looking for. Modifies the
fuzzy is more complex, can't produce a rank value, and to be honest I don't understand what it does.
fuzzy-filter can filter strings for matches and do fancy highlighting stuff, but also can't produce a similarity value.