Named entity recognition
The named entity recognition (NER) module recognizes mention spans of a particular entity type (e.g., abnormal findings) from the reports. RadText provides two sub-modules for NER.
ner:regex
The rule-based method uses regular expressions that combine information from terminological resources with characteristics of the entities of interest. The patterns are manually constructed by domain experts.
Options
| Option name | Default | Description |
|---|---|---|
| --phrase | | Phrase patterns |
Example Usage
$ radtext-ner regex --phrase /path/to/patterns.yml -i /path/to/input.xml -o /path/to/output.xml
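# `argv` is assumed to hold the parsed command-line options.
from pathlib import Path
from radtext.models.ner.ner_regex import NerRegExExtractor, BioCNerRegex
from radtext.cmd.ner import load_yml
# Load the phrase patterns from the YAML file.
patterns = load_yml(argv['--phrases'])
extractor = NerRegExExtractor(patterns)
# Name the processor after the pattern file (file name without its extension).
processor = BioCNerRegex(extractor, name=Path(argv['--phrases']).stem)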
Phrase patterns
The pattern file is in YAML format. It contains a set of concepts, where each key serves as the concept's preferred name. Each concept should contain three attributes: concept_id, include, and exclude.
include contains the regular expressions that the concept will match.
exclude contains the regular expressions that the concept will not match, even if a substring of the excluded text matches a regular expression in include.
With the following example, RadText recognizes “emphysema” but rejects “subcutaneous emphysema”, even though “emphysema” is part of “subcutaneous emphysema”.
Emphysema:
  concept_id: RID4799
  include:
    - emphysema
  exclude:
    - subcutaneous emphysema
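The include/exclude behavior described above can be illustrated with a minimal sketch that uses only Python's standard `re` module. This is not RadText's implementation: the `PATTERNS` dictionary, the `find_mentions` function, the case-insensitive matching, and the rule that an include match is dropped when it lies entirely inside an exclude match are all assumptions made for illustration.

```python
import re

# Hypothetical in-memory form of the pattern file shown above.
PATTERNS = {
    "Emphysema": {
        "concept_id": "RID4799",
        "include": ["emphysema"],
        "exclude": ["subcutaneous emphysema"],
    },
}


def find_mentions(text, patterns):
    """Return sorted (start, end, matched_text, concept_id) tuples.

    An include match is kept only if it does not lie inside an exclude match.
    """
    mentions = []
    for concept in patterns.values():
        # Collect the character spans this concept must not match.
        excluded = [
            m.span()
            for pattern in concept.get("exclude", [])
            for m in re.finditer(pattern, text, flags=re.IGNORECASE)
        ]
        for pattern in concept.get("include", []):
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                start, end = m.span()
                # Skip an include match that is covered by an excluded span.
                if any(s <= start and end <= e for s, e in excluded):
                    continue
                mentions.append((start, end, m.group(), concept["concept_id"]))
    return sorted(mentions)


report = "Subcutaneous emphysema in the chest wall. Mild centrilobular emphysema."
mentions = find_mentions(report, PATTERNS)
for start, end, matched, concept_id in mentions:
    print(start, end, matched, concept_id)
```

In this sketch, only the second “emphysema” in the sample sentence is reported, because the first one falls inside the span matched by the exclude pattern.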
ner:spacy
spaCy’s PhraseMatcher provides another way to efficiently match large terminology lists. RadText uses PhraseMatcher to recognize concepts in the RadLex ontology.
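To make the idea of matching a terminology list against tokenized text concrete, here is a small standard-library sketch. It is a simplified stand-in, not spaCy's PhraseMatcher and not RadText's code: the `tokenize`, `build_index`, and `match_phrases` helpers, the `\w+` tokenization, and the placeholder labels in `TERMS` are all hypothetical.

```python
import re


def tokenize(text):
    """Split text into (lowercased_token, start, end) triples."""
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def build_index(terms):
    """Map each term's lowercased token sequence to its label."""
    return {
        tuple(token for token, _, _ in tokenize(term)): label
        for term, label in terms.items()
    }


def match_phrases(text, index):
    """Return (start, end, matched_text, label) for every indexed phrase in text."""
    tokens = tokenize(text)
    max_len = max((len(key) for key in index), default=0)
    found = []
    for i in range(len(tokens)):
        # Try every window length up to the longest indexed phrase.
        for n in range(1, max_len + 1):
            window = tokens[i:i + n]
            if len(window) < n:
                break
            key = tuple(token for token, _, _ in window)
            if key in index:
                start, end = window[0][1], window[-1][2]
                found.append((start, end, text[start:end], index[key]))
    return found


# Placeholder labels; real identifiers would come from the ontology file.
TERMS = {"pleural effusion": "EXAMPLE_1", "pneumothorax": "EXAMPLE_2"}
index = build_index(TERMS)
matches = match_phrases("Small left Pleural Effusion. No pneumothorax.", index)
for match in matches:
    print(match)
```

Looking up token sequences in a dictionary keeps each lookup cheap regardless of how many terms are indexed, which is the property that makes this style of matching practical for large term lists.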
Options
| Option name | Default | Description |
|---|---|---|
| --radlex | | The RadLex ontology file |
| --spacy-model | | The spaCy model |
Example Usage
$ radtext-ner spacy --radlex /path/to/Radlex4.1.xlsx -i /path/to/input.xml -o /path/to/output.xml
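# `argv` is assumed to hold the parsed command-line options.
import spacy
from radtext.models.ner.ner_spacy import NerSpacyExtractor, BioCNerSpacy
from radtext.models.ner.radlex import RadLex4
# Load the spaCy model, skipping the ner, parser, and senter components.
nlp = spacy.load(argv['--spacy-model'], exclude=['ner', 'parser', 'senter'])
# Read the RadLex ontology file and build the matchers from it.
radlex = RadLex4(argv['--radlex'])
matchers = radlex.get_spacy_matchers(nlp)
extractor = NerSpacyExtractor(nlp, matchers)
processor = BioCNerSpacy(extractor, 'RadLex')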