cleanNLP - A Tidy Data Model for Natural Language Processing
Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may make use of the 'udpipe' back end with no external dependencies, or a Python back end with 'spaCy' <https://spacy.io>. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, and dependency parsing.
algorithms spatial-analysis text-analysis
8.46 score 218 stars 267 scripts 558 downloads
genlasso - Path Algorithm for Generalized Lasso Problems
Computes the solution path for generalized lasso problems. Important use cases are the fused lasso over an arbitrary graph, and trend fitting of any given polynomial order. Specialized implementations for the latter two subproblems are given to improve stability and speed. See Taylor Arnold and Ryan Tibshirani (2016) <doi:10.1080/10618600.2015.1008638>.
7.76 score 35 stars 5 dependents 219 scripts 881 downloads
tif - Text Interchange Format
Provides validation functions for common interchange formats for representing text data in R. Includes formats for corpus objects, document term matrices, and tokens. Other annotations can be stored by overloading the tokens structure.
corpus natural-language-processing term-frequency text-processing tokenizer
3.89 score 37 stars 21 scripts
ctrialsgov - Query Data from U.S. National Library of Medicine's Clinical Trials Database
Tools to create and query a database from the U.S. National Library of Medicine's Clinical Trials database <https://clinicaltrials.gov/>. Functions provide access to a variety of techniques for searching the data using range queries, categorical filtering, and full-text keyword search.
3.48 score 30 scripts 230 downloads
coreNLP - Wrappers Around Stanford CoreNLP Tools
Provides a minimal interface for applying annotators from the 'Stanford CoreNLP' Java library. Methods are provided for tasks such as tokenisation, part of speech tagging, lemmatisation, named entity recognition, coreference detection, and sentiment analysis.
openjdk
3.05 score 1 star 56 scripts 270 downloads