
    words-n-numbers

    5.0.1 • Public • Published

    Words'n'numbers

    Tokenizing strings of text. Extracting arrays of words and, optionally, numbers, emojis, tags, usernames and email addresses from strings. For Node.js and the browser. When you need more than just [a-z] regular expressions. Part of document processing for search-index and nowsearch.xyz.

    Inspired by extractwords


    Initiating

    Node.js

    const wnn = require('words-n-numbers')
    // wnn available

    Browser

    <script src="wnn.js"></script>
    
    <script>
      //wnn available
    </script>

    Use

    The default regex should catch every Unicode character for every language.

    Only words

    let stringOfWords = 'A 1000000 dollars baby!'
    wnn.extract(stringOfWords)
    // returns ['A', 'dollars', 'baby']

    Only words, converted to lowercase

    let stringOfWords = 'A 1000000 dollars baby!'
    wnn.extract(stringOfWords, { toLowercase: true })
    // returns ['a', 'dollars', 'baby']

    Predefined regex for words and numbers, converted to lowercase

    let stringOfWords = 'A 1000000 dollars baby!'
    wnn.extract(stringOfWords, { regex: wnn.wordsNumbers, toLowercase: true })
    // returns ['a', '1000000', 'dollars', 'baby']

    Predefined regex for words and emojis, converted to lowercase

    let stringOfWords = 'A ticket to 大阪 costs ¥2000 👌😄 😢'
    wnn.extract(stringOfWords, { regex: wnn.wordsEmojis, toLowercase: true })
    // returns [ 'a', 'ticket', 'to', '大阪', 'costs', '👌😄', '😢' ]

    Predefined regex for numbers and emojis

    let stringOfWords = 'A ticket to 大阪 costs ¥2000 👌😄 😢'
    wnn.extract(stringOfWords, { regex: wnn.numbersEmojis, toLowercase: true })
    // returns [ '2000', '👌😄', '😢' ]

    Predefined regex for words, numbers and emojis, converted to lowercase

    let stringOfWords = 'A ticket to 大阪 costs ¥2000 👌😄 😢'
    wnn.extract(stringOfWords, { regex: wnn.wordsNumbersEmojis, toLowercase: true })
    // returns [ 'a', 'ticket', 'to', '大阪', 'costs', '2000', '👌😄', '😢' ]

    Predefined regex for #tags

    let stringOfWords = 'A #49ticket to #大阪 or two#tickets costs ¥2000 👌😄😄 😢'
    wnn.extract(stringOfWords, { regex: wnn.tags, toLowercase: true })
    // returns [ '#49ticket', '#大阪' ]

    Predefined regex for @usernames

    let stringOfWords = 'A #ticket to #大阪 costs bob@bob.com, @alice and @美林 ¥2000 👌😄😄 😢'
    wnn.extract(stringOfWords, { regex: wnn.usernames, toLowercase: true })
    // returns [ '@alice', '@美林' ]

    Predefined regex for email addresses

    let stringOfWords = 'A #ticket to #大阪 costs bob@bob.com, alice.allison@alice123.com, some-name.nameson.nameson@domain.org and @美林 ¥2000 👌😄😄 😢'
    wnn.extract(stringOfWords, { regex: wnn.email, toLowercase: true })
    // returns [ 'bob@bob.com', 'alice.allison@alice123.com', 'some-name.nameson.nameson@domain.org' ]

    Custom regex

    let stringOfWords = 'This happens at 5 o\'clock !!!'
    wnn.extract(stringOfWords, { regex: '[a-z\'0-9]+' })
    // returns ['This', 'happens', 'at', '5', 'o\'clock']
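The custom pattern above lists only lowercase `a-z`, yet the result keeps the capitalised `This`. One plausible explanation, stated here as an assumption rather than documented behaviour, is that the pattern is compiled with a case-insensitive flag. The sketch below reproduces the result with a plain `RegExp` and no library involved:

```javascript
const text = 'This happens at 5 o\'clock !!!'
const pattern = '[a-z\'0-9]+'

// Assumed flags: 'g' (find all matches) and 'i' (ignore case)
const caseInsensitive = text.match(new RegExp(pattern, 'gi'))
console.log(caseInsensitive)
// prints [ 'This', 'happens', 'at', '5', "o'clock" ]

// Without 'i', the capital T is not matched
const caseSensitive = text.match(new RegExp(pattern, 'g'))
console.log(caseSensitive)
// prints [ 'his', 'happens', 'at', '5', "o'clock" ]
```

If a custom pattern must be case-sensitive, it is worth checking the result against input that mixes upper and lower case.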

    API

    Extract function

    Returns an array of words and optionally numbers.

    wnn.extract(stringOfText, <options-object>)

    Options object

    {
      regex: '[custom or predefined regex]',  // defaults to wnn.words
      toLowercase: [true / false]             // defaults to false
    }
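To make the two options concrete, here is a minimal sketch of how an extract-style function could apply them using only built-in JavaScript. It is not the library's actual implementation: `extractSketch`, `defaultWords` and the `\p{L}+` pattern are assumptions made for illustration.

```javascript
// Hypothetical stand-in for the default word pattern (assumption):
// one or more Unicode letters.
const defaultWords = '\\p{L}+'

// Sketch of an extract-style function; not the words-n-numbers source.
function extractSketch (text, options = {}) {
  const { regex = defaultWords, toLowercase = false } = options
  // 'g' finds every match, 'i' ignores case, 'u' enables \p{...} classes
  const matches = text.match(new RegExp(regex, 'giu')) || []
  return toLowercase ? matches.map(token => token.toLowerCase()) : matches
}

console.log(extractSketch('A 1000000 dollars baby!'))
// prints [ 'A', 'dollars', 'baby' ]
console.log(extractSketch('A 1000000 dollars baby!', { toLowercase: true }))
// prints [ 'a', 'dollars', 'baby' ]
```

Falling back to an empty array when nothing matches keeps the return type consistent, since `String.prototype.match` returns `null` in that case.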

    Predefined regexes

    wnn.words              // only words, any language <-- default
    wnn.numbers            // only numbers
    wnn.emojis             // only emojis
    wnn.wordsNumbers       // words (any language) and numbers
    wnn.wordsEmojis        // words (any language) and emojis
    wnn.numbersEmojis      // numbers and emojis
    wnn.wordsNumbersEmojis // words (any language), numbers and emojis
    wnn.tags               // #tags (any language)
    wnn.usernames          // @usernames (any language)
    wnn.email              // email addresses. Most valid addresses,
                           //   but not to be used as a validator

    Languages supported

    Supports most languages supported by stopword, and others too. Some languages, like Japanese and Simplified Chinese, need to be tokenized first. Tokenizers may be added at a later stage.
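One way to pre-tokenize such text before extraction, assuming a runtime that ships `Intl.Segmenter` (recent Node.js versions and modern browsers), is sketched below. This is not part of words-n-numbers: `segmentWords` is a hypothetical helper, and the exact segmentation depends on the runtime's ICU data, so no output is shown.

```javascript
// Hypothetical helper (not part of words-n-numbers): split text into
// word-like segments using the built-in Intl.Segmenter.
function segmentWords (text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' })
  return Array.from(segmenter.segment(text))
    .filter(part => part.isWordLike) // drop spaces and punctuation
    .map(part => part.segment)
}

const japaneseWords = segmentWords('大阪に行きます。', 'ja')
console.log(japaneseWords)
```

The resulting segments could then be joined with spaces and passed on for extraction, though whether that suits a given search index is a separate design decision.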

    PRs welcome

    PRs and issues are more than welcome =)

    Install

    npm i words-n-numbers

    Weekly Downloads

    14

    Version

    5.0.1

    License

    MIT

    Unpacked Size

    27.6 kB

    Total Files

    12
