Text preprocessing
This module provides sentence splitting, tokenization, part-of-speech tagging, lemmatization, and dependency parsing. RadText offers two sub-modules for text preprocessing: one built on spaCy and one built on Stanza.
preprocess:spacy
spaCy is an open-source Python library for Natural Language Processing.
Options
| Option name | Default | Description |
|---|---|---|
| --spacy-model | | Name or path of the spaCy model to load with spacy.load() |
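The value of `--spacy-model` is handed to `spacy.load()`, which accepts either the name of an installed model package (for example `en_core_web_sm`, installed with `python -m spacy download en_core_web_sm`) or a path to a model directory. The sketch below uses only the standard library to check which of the two a given value looks like before spaCy is invoked. `classify_spacy_model` is a hypothetical helper, not part of RadText or spaCy, and any importable module it finds is only assumed to be a model package.

```python
import importlib.util
import os


def classify_spacy_model(value: str) -> str:
    """Hypothetical helper: guess how spacy.load() would resolve `value`.

    Returns "path" for an existing directory, "installed package" when an
    importable module of that name exists, and "missing" otherwise.
    """
    if os.path.isdir(value):
        return "path"
    # Path-like strings that are not existing directories cannot be package names.
    separators = [os.sep] + ([os.altsep] if os.altsep else [])
    if not value or any(sep in value for sep in separators):
        return "missing"
    try:
        spec = importlib.util.find_spec(value)
    except (ImportError, ValueError):
        # e.g. a dotted name whose parent package is not installed
        spec = None
    return "installed package" if spec is not None else "missing"
```

For example, `classify_spacy_model("en_core_web_sm")` reports "installed package" only in environments where that model has already been downloaded.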
Example Usage
$ radtext-preprocess spacy -i /path/to/input.xml -o /path/to/output.xml
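import spacy
from radtext.models.preprocess_spacy import BioCSpacy
# Load the model named by --spacy-model; 'en_core_web_sm' is used here only as an example
nlp = spacy.load('en_core_web_sm')
# Wrap the spaCy pipeline in RadText's BioC processor
processor = BioCSpacy(nlp)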
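BioC annotations locate text by a document-level offset plus a length, so a BioC wrapper around spaCy would need to translate spaCy's per-token character positions (`token.idx`, relative to the processed text) into that scheme. The following is a minimal sketch of that arithmetic, assuming each passage is processed on its own. `bioc_token_locations` is a hypothetical helper, and the token list is written by hand rather than produced by spaCy.

```python
from typing import List, Tuple


def bioc_token_locations(
    passage_offset: int, tokens: List[Tuple[int, str]]
) -> List[Tuple[int, int]]:
    """Hypothetical helper: turn (char_index, text) pairs into (offset, length).

    char_index is relative to the passage text (like spaCy's token.idx when
    the passage is processed on its own); BioC offsets are document-level,
    so the passage offset is added.
    """
    return [(passage_offset + idx, len(text)) for idx, text in tokens]


passage_offset = 100  # hypothetical position of the passage in its document
passage_text = "No pleural effusion."
# Hand-written; with spaCy this could be [(t.idx, t.text) for t in nlp(passage_text)]
tokens = [(0, "No"), (3, "pleural"), (11, "effusion"), (19, ".")]
locations = bioc_token_locations(passage_offset, tokens)
```

Each resulting location can be checked by slicing the passage text at `offset - passage_offset` for `length` characters.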
preprocess:stanza
Stanza is a collection of efficient tools for Natural Language Processing.
Example Usage
$ radtext-preprocess stanza -i /path/to/input.xml -o /path/to/output.xml
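import stanza
from radtext.models.preprocess_stanza import BioCStanza
# English pipeline: tokenize (which also splits sentences), POS tagging, lemmatization, dependency parsing
nlp = stanza.Pipeline('en', processors='tokenize,pos,lemma,depparse')
# Wrap the Stanza pipeline in RadText's BioC processor
processor = BioCStanza(nlp)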
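Stanza reports dependency parses per word: `word.id` is a 1-based index within the sentence, `word.head` is the id of the governing word, a head of 0 marks the root, and the relation label is in `word.deprel`. The sketch below turns such values into (governor, relation, dependent) triples. `dependency_triples` is a hypothetical helper, the parse is hand-written for illustration, and how RadText itself stores these relations in BioC is not shown here.

```python
from typing import List, Tuple

# (id, text, head, deprel) per word, following Stanza's conventions:
# ids are 1-based within a sentence and head == 0 marks the root.
Word = Tuple[int, str, int, str]


def dependency_triples(words: List[Word]) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: return (governor, relation, dependent) triples.

    The root word's governor is reported as the string "ROOT".
    """
    text_by_id = {word_id: text for word_id, text, _, _ in words}
    triples = []
    for word_id, text, head, deprel in words:
        governor = "ROOT" if head == 0 else text_by_id[head]
        triples.append((governor, deprel, text))
    return triples


# Hand-written Universal Dependencies parse, for illustration only;
# with Stanza: [(w.id, w.text, w.head, w.deprel) for w in sent.words]
sentence = [
    (1, "No", 3, "det"),
    (2, "pleural", 3, "amod"),
    (3, "effusion", 0, "root"),
    (4, ".", 3, "punct"),
]
triples = dependency_triples(sentence)
```

Keying governors by id rather than by list position keeps the conversion correct even if words are supplied out of order.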