Detecting URLs, emails, and IP addresses for the Fin natural language processor.
The core lexer doesn't treat URLs, emails, and IP addresses any differently from ordinary text, so it will split them into separate tokens, as in
www.google. This is obviously inaccurate and will lead to many errors in the POS tagger and the dependency parser.
The solution to this problem is to have a preprocessor function that takes the URLs out of the sentence and a postprocessor function that puts them back once the lexer, the POS tagger, and the dependency parser are done with it.
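The take-out-and-put-back idea can be sketched roughly as follows. This is only an illustration, not the code of fin-urls itself: the names `PATTERN`, `preprocess`, `postprocess`, and `naiveLexer`, the `URLTOKEN` placeholder, and the simplified regular expression are all assumptions made for the example.

```javascript
// Hypothetical, simplified pattern: http(s)/www URLs, emails, and IPv4 addresses.
const PATTERN = /(?:https?:\/\/|www\.)\S+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\b(?:\d{1,3}\.){3}\d{1,3}\b/g;

// Preprocessor: swap every match for a placeholder and remember the original text.
function preprocess(sentence) {
  const found = [];
  const text = sentence.replace(PATTERN, (match) => {
    found.push(match);
    return `URLTOKEN${found.length - 1}`;
  });
  return { text, found };
}

// Postprocessor: put the original strings back into the token list.
function postprocess(tokens, found) {
  return tokens.map((token) => {
    const m = /^URLTOKEN(\d+)$/.exec(token);
    return m ? found[Number(m[1])] : token;
  });
}

// Stand-in for a real lexer (assumption): splits on whitespace and dots,
// which is exactly what breaks URLs apart when they are not protected.
function naiveLexer(text) {
  return text.split(/[\s.]+/).filter(Boolean);
}

const input = "email me@example.com or visit www.google.com today";
const { text, found } = preprocess(input);
console.log(postprocess(naiveLexer(text), found));
```

Because the placeholders contain no dots or spaces, the stand-in lexer leaves them intact, and each URL, email, or IP address comes back as a single token.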
And while we're at it, we can attach a detector to the prototype that gives you the URLs that have been detected.
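As a rough sketch of what attaching a detector to a prototype can look like: the `Result` constructor, the `urls` method name, and the `URL_LIKE` pattern below are hypothetical stand-ins chosen for illustration and are not taken from the fin-urls API.

```javascript
// Hypothetical stand-in for the object the processor returns (assumption).
function Result(raw) {
  this.raw = raw;
}

// Simplified, illustrative pattern: http(s)/www URLs, emails, and IPv4 addresses.
const URL_LIKE = /(?:https?:\/\/|www\.)\S+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\b(?:\d{1,3}\.){3}\d{1,3}\b/g;

// Detector attached to the prototype: returns every match in the raw input,
// or an empty array when nothing is found.
Result.prototype.urls = function () {
  return this.raw.match(URL_LIKE) || [];
};

const result = new Result("ping 192.168.1.1 or mail admin@example.org");
console.log(result.urls());
```

Putting the method on the prototype means every processed sentence exposes the detector without each instance carrying its own copy of the function.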
npm i --save fin-urls
The above example will give you: