    @tuofeng/node-simhash

    0.1.0 • Public • Published

    node-simhash

    A simple command line tool for comparing text files using the simhash algorithm and contrasting the result with the Jaccard index.

    References

    Near duplicate detection (moz.com)

    Installation

    If you have just cloned this repository, run the following:

    npm install
    npm link
    

    Or, if you would like to install globally:

    npm install https://github.com/sjhorn/node-simhash -g
    

    Command line tool usage

    Using node

    simhash file1.txt file2.txt
    
    simhash https://file.com/page1.html https://file.com/page2.html
    
    

    Using lib

    var simhash = require('node-simhash');
     
    simhash.compare(string1, string2);
     

    Methods

    .summary(file1, file2)

    Compare two text strings using both simhash and the Jaccard index, and print a summary.

    .compare(file1, file2)

    Compare two text strings using both simhash and the Jaccard index.
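
    As a rough illustration of the simhash side of the comparison, here is a minimal sketch that builds a 32-bit fingerprint from a list of 32-bit feature hashes and scores two fingerprints by Hamming distance. The names simhashSketch and simhashSimilaritySketch are hypothetical, and the similarity formula (1 minus distance over 32) is an assumption; the package may score fingerprints differently.

```javascript
// Hypothetical sketch, not the package's own code.
// Builds a 32-bit simhash fingerprint from an iterable of 32-bit feature hashes.
function simhashSketch(hashes) {
  const counts = new Array(32).fill(0);
  for (const h of hashes) {
    for (let bit = 0; bit < 32; bit++) {
      // Vote +1 if the bit is set in this hash, -1 otherwise.
      counts[bit] += (h >>> bit) & 1 ? 1 : -1;
    }
  }
  let fingerprint = 0;
  for (let bit = 0; bit < 32; bit++) {
    // A bit is set in the fingerprint when the votes for it are positive.
    if (counts[bit] > 0) fingerprint |= 1 << bit;
  }
  return fingerprint >>> 0; // normalize to unsigned 32-bit
}

// Assumed scoring: fraction of the 32 bit positions that agree.
function simhashSimilaritySketch(a, b) {
  let diff = (a ^ b) >>> 0;
  let distance = 0;
  while (diff !== 0) {
    diff &= diff - 1; // clear the lowest differing bit
    distance++;
  }
  return 1 - distance / 32;
}
```

    Near-duplicate texts share most of their features, so most bit votes agree and the fingerprints differ in only a few positions.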

    .hammingWeight(number)

    Count the binary ones in a number.
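
    One common way to count set bits is to clear the lowest set bit repeatedly until the value reaches zero. The sketch below assumes 32-bit inputs; hammingWeightSketch is a hypothetical name and this may not match the package's implementation.

```javascript
// Hypothetical sketch, not the package's own code.
// Counts the binary ones in a number treated as a 32-bit integer.
function hammingWeightSketch(n) {
  let v = n >>> 0; // coerce to unsigned 32-bit
  let count = 0;
  while (v !== 0) {
    v &= v - 1; // clears the lowest set bit
    count++;
  }
  return count;
}
```

    Applied to the XOR of two fingerprints, for example hammingWeightSketch(a ^ b), this gives the number of bit positions in which they differ.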

    .shingles(string, words_per_single=2)

    Convert a string to a set of shingles, using a default of 2 words per shingle. The string is tokenized with the natural library's default tokenizer.
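
    To make the idea of word shingles concrete, here is a minimal sketch. It assumes a simple lowercase alphanumeric tokenizer rather than the natural library's tokenizer, so its tokens may differ from the package's; shinglesSketch is a hypothetical name.

```javascript
// Hypothetical sketch, not the package's own code.
// Splits text into words and returns the set of overlapping word n-grams.
function shinglesSketch(text, wordsPerShingle = 2) {
  // Assumed tokenizer: runs of lowercase letters and digits.
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const result = new Set();
  for (let i = 0; i + wordsPerShingle <= words.length; i++) {
    result.add(words.slice(i, i + wordsPerShingle).join(' '));
  }
  return result;
}

// "the quick brown fox" yields: "the quick", "quick brown", "brown fox"
console.log([...shinglesSketch('the quick brown fox')]);
```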

    .jaccardIndex(string1, string2)

    Compare two strings by tokenizing them into shingles, then dividing the size of the intersection of the two shingle sets by the size of their union.
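
    The ratio itself can be sketched as follows. This version takes two ready-made Set objects instead of strings, and returning 1 for two empty sets is an assumed convention; jaccardSketch is a hypothetical name.

```javascript
// Hypothetical sketch, not the package's own code.
// Jaccard index of two sets: |A ∩ B| / |A ∪ B|.
function jaccardSketch(setA, setB) {
  // Assumed convention: two empty sets are treated as identical.
  if (setA.size === 0 && setB.size === 0) return 1;
  let intersection = 0;
  for (const item of setA) {
    if (setB.has(item)) intersection++;
  }
  const union = setA.size + setB.size - intersection;
  return intersection / union;
}
```

    For example, the sets {a, b, c} and {b, c, d} share 2 of 4 distinct items, giving an index of 0.5.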

    .createBinaryString(number)

    Print a 32-bit number as a binary string of 32 characters.
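
    A minimal sketch of this kind of formatting, assuming the result should be zero-padded on the left. Unlike the method described above, binaryStringSketch (a hypothetical name) returns the string rather than printing it.

```javascript
// Hypothetical sketch, not the package's own code.
// Formats a number as a 32-character binary string.
function binaryStringSketch(n) {
  // >>> 0 reinterprets the value as unsigned 32-bit, so negative inputs work.
  return (n >>> 0).toString(2).padStart(32, '0');
}
```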

    .shingleHashList(set)

    Convert a set of shingles to a set of CRC-32 hashes.
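
    For illustration, here is a sketch that hashes each shingle with a table-driven CRC-32 (the standard reflected polynomial 0xEDB88320) over its UTF-8 bytes. The names crc32Sketch and shingleHashListSketch are hypothetical, and the package may rely on a CRC library or a different string encoding.

```javascript
// Hypothetical sketch, not the package's own code.
// Precompute the CRC-32 lookup table for the reflected polynomial 0xEDB88320.
const CRC_TABLE = new Uint32Array(256);
for (let i = 0; i < 256; i++) {
  let c = i;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE[i] = c >>> 0;
}

// CRC-32 of a string, computed over its UTF-8 bytes (assumed encoding).
function crc32Sketch(str) {
  let crc = 0xFFFFFFFF;
  for (const byte of Buffer.from(str, 'utf8')) {
    crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0; // final XOR, normalized to unsigned
}

// Maps every shingle in the set to its CRC-32 hash.
function shingleHashListSketch(shingleSet) {
  return new Set([...shingleSet].map(crc32Sketch));
}
```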

    Install

    npm i @tuofeng/node-simhash

    Version

    0.1.0

    License

    none

    Unpacked Size

    21.2 kB

    Total Files

    12
