28 packages found
Description
A corpus of technical books written in Japanese
Keywords
Publisher
Description
A wrapper for CETEMPúblico, a European Portuguese corpus of news extracts from the newspaper Público, with 180 million words tagged automatically using PALAVRAS.
Keywords
Publisher
Description
Corpus representation stored in JSON and wrapped in the Corpus CRUD API
Keywords
Publisher
Description
Corpus CRUD API wrapper
Keywords
Publisher
Description
Translate between languages using a statistical model
Keywords
Publisher
Description
A JavaScript (Node.js) library that converts a tagged (monolinear) text to DLx JSON format
Keywords
Publisher
Description
Corpus CRUD API wrapper
Keywords
Publisher
Description
A Node.js library for concordancing a corpus formatted according to the Data Format for Digital Linguistics (DaFoDiL)
Keywords
Publisher
Description
Text corpora from Project Gutenberg used by NLTK.
Keywords
Publisher
Description
State of the Union addresses by U.S. Presidents as a UMD bundle.
Keywords
- stdlib
- datasets
- dataset
- data
- speeches
- politics
- usa
- us
- president
- sotu
- state of the union
- addresses
- text
- corpus
Publisher
Description
Spam Assassin public mail corpus as a UMD bundle.
Keywords
- stdlib
- datasets
- dataset
- data
- spam
- spam assassin
- ham
- text
- classification
- classifier
- corpus
Publisher
Description
The text of Moby Dick by Herman Melville as a UMD bundle.
Keywords
Publisher
Description
Text mining library
Keywords
Publisher
Description
List of ~636,000 Spanish words
Keywords
Publisher
Description
A core type to handle CoNLL-U format
Keywords
Publisher
Description
Some classes to represent elements in a text corpus.
Keywords
Publisher
Description
List of ~336,000 French words
Keywords
Publisher
Description
A dashboard that visualizes a summary of a structured corpus using several chart types (pie, histogram, ...)
Keywords
Publisher
Description
Calculate how many documents in a list (`Array`) of text documents contain a given term.
Keywords
Publisher
Description
A CJK text tokenizer