    load-balance-lines

    1.0.5 • Public • Published

    Parallelize newline-delimited data processing by load balancing lines between multiple processes

    Install

    # Make the executable accessible within your project npm scripts as load-balance-lines 
    # or, out of npm scripts, as ./node_modules/.bin/load-balance-lines 
    npm i load-balance-lines
    # or globally 
    npm i -g load-balance-lines

    Basic use

    Take a large dataset whose records each sit on their own line, typically NDJSON, and pipe it to load-balance-lines along with the executable that should process it.

    # Make sure your executable is... executable 
    chmod +x /path/to/my/executable
    # and let's go! 
    cat data.ndjson | load-balance-lines /path/to/my/executable some args

    or without the cat command, using <

    load-balance-lines /path/to/my/executable some args for the executable < data.ndjson
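
    To illustrate what the executable might look like, here is a minimal sketch of a worker written in shell. It assumes that load-balance-lines feeds each sub-process its share of the lines on stdin, which this README implies but does not state. process_lines is a hypothetical name, and the final printf only simulates the input.

```shell
#!/bin/sh
# Hypothetical worker: prints the length of every line it receives.
# Assumption: the lines assigned to this process arrive on stdin.
process_lines() {
  # IFS= and -r keep leading whitespace and backslashes intact;
  # the || clause also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${#line}"
  done
}

# Simulated input; a real worker would simply end with: process_lines
printf '%s\n' '{"id":"Q1"}' '{"id":"Q42"}' | process_lines
```

    Reading line by line keeps memory use flat however many lines the worker is handed.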

    Simple demo

    see test

    Real case demo

    wikidata-rank needs to parse a full dump of Wikidata:

    • get the latest dump (currently 31G gzipped)
    wget -c https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz
    • Use nice so the job takes as much CPU as is available while yielding priority to other processes
    • Use pigz to decompress the dump using threads (a drop-in replacement for the single-threaded gzip)
    nice pigz -d < latest-all.json.gz | nice load-balance-lines /path/to/wikidata-rank/scripts/calculate_base_scores
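
    As far as the Wikidata dump documentation describes it, the JSON dump is one large JSON array with one entity per line rather than strict NDJSON: a line holding only [, then the entities, each followed by a comma, then a line holding only ]. The sketch below shows one possible way to normalize such lines before they reach a worker. dump_to_ndjson is a hypothetical helper, and the calculate_base_scores script may well handle this on its own.

```shell
#!/bin/sh
# Hypothetical helper: turn "JSON array, one element per line" into NDJSON.
# Assumption: the input follows the dump layout described above.
dump_to_ndjson() {
  # Drop lines consisting only of [ or ], then strip one trailing comma per line.
  sed -e '/^\[$/d' -e '/^]$/d' -e 's/,$//'
}

# Simulated excerpt in the assumed dump layout
printf '%s\n' '[' '{"id":"Q1"},' '{"id":"Q2"}' ']' | dump_to_ndjson
```

    In the pipeline above, such a filter would sit between pigz and load-balance-lines.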

    Options

    Number of processes

    By default, there are as many processes as CPU cores. To change that number, set the LBL_PROCESSES environment variable

    export LBL_PROCESSES=4 ; cat data.ndjson | load-balance-lines ./my/script
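
    When the machine has to stay responsive, one option is to derive the value from the core count instead of hard-coding it. The sketch below leaves one core free. worker_count is a hypothetical helper, and it assumes getconf _NPROCESSORS_ONLN is available (it is on common Linux and macOS systems); if it is not, the helper falls back to 1.

```shell
#!/bin/sh
# Hypothetical helper: given a core count, leave one core free, never going below 1.
worker_count() {
  cores=${1:-1}
  if [ "$cores" -gt 1 ]; then
    echo $((cores - 1))
  else
    echo 1
  fi
}

# Assumption: getconf _NPROCESSORS_ONLN reports the number of online cores
LBL_PROCESSES=$(worker_count "$(getconf _NPROCESSORS_ONLN 2>/dev/null)")
export LBL_PROCESSES
echo "LBL_PROCESSES=$LBL_PROCESSES"
# Then run as usual: load-balance-lines ./my/script < data.ndjson
```

    Prefixing the command with the assignment, as in LBL_PROCESSES=4 load-balance-lines ./my/script < data.ndjson, limits the variable to that single command instead of exporting it for the rest of the shell session.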

    Verbose

    By default, the load balancer is silent, leaving stdout free for the sub-processes' output, but you can get some basic information by setting LBL_VERBOSE

    export LBL_VERBOSE=true ; cat data.ndjson | load-balance-lines ./my/script

    Version

    1.0.5

    License

    MIT

    Unpacked Size

    6.73 kB

    Total Files

    6
