# Frequently asked questions

## HuSpaCy is slow, what can I do?
No, it's not. :) You have several options to speed up your processing pipeline.

- If accuracy is not crucial, use a smaller model: `md` < `lg` < `trf`.
- Utilize the GPU: call `spacy.prefer_gpu()` before loading the model (and make sure all GPU related dependencies are installed). This simple notebook might help you get started.
- Batch processing of multiple documents is always faster: use the `Language.pipe()` method and increase the `batch_size` if needed. Additionally, the `n_process` parameter can be used to orchestrate multiprocessing when running models on CPU.
- Disable components you don't need. When mining documents for named entities, the default model unnecessarily computes lemmata, PoS tags and dependency trees. You can easily disable them during model loading (cf. `spacy.load()` or `huspacy.load()`) or by using `Language.disable_pipe()`.
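The batching and GPU tips above can be sketched as follows. A blank Hungarian pipeline (`spacy.blank("hu")`) stands in for a full HuSpaCy model here so the snippet runs without a downloaded model; in practice you would load e.g. `hu_core_news_lg` instead, and the example texts are made up.

```python
import spacy

# Made-up documents; replace with your own corpus.
texts = [
    "Budapest Magyarország fővárosa.",
    "A HuSpaCy egy magyar nyelvi elemző eszköz.",
]

# Use the GPU if one is available; silently falls back to CPU otherwise.
spacy.prefer_gpu()

# Stand-in for a real HuSpaCy model such as `hu_core_news_lg`.
nlp = spacy.blank("hu")

# Batch processing: `batch_size` groups documents per batch,
# `n_process` enables multiprocessing when running on CPU.
docs = list(nlp.pipe(texts, batch_size=64, n_process=1))
print(len(docs))  # one Doc per input text
```

Processing documents one by one with `nlp(text)` in a loop forfeits batching entirely; `nlp.pipe()` is the idiomatic way to process a collection.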
## Models require too much RAM, how can I reduce their memory footprint?

HuSpaCy models use distinct language models for almost each of their components. This architectural decision enables the models to achieve higher accuracy at the cost of increased memory usage. However, if you only need certain components, the others can be disabled as shown above.
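A minimal sketch of disabling a component at runtime, using a blank pipeline with a `sentencizer` as an illustrative component (a real HuSpaCy model would have components such as `tagger`, `parser` or `lemmatizer` instead; with `spacy.load()` you can also pass `exclude=[...]` to keep them out of memory entirely):

```python
import spacy

nlp = spacy.blank("hu")
nlp.add_pipe("sentencizer")  # stands in for a heavyweight component

# Disabling skips the component at runtime without removing it,
# so it can be re-enabled later with `nlp.enable_pipe(...)`.
nlp.disable_pipe("sentencizer")

print(nlp.pipe_names)  # active components only
print(nlp.disabled)    # components currently disabled
```

Note the difference: `disable` keeps the component loaded but inactive, while `exclude` at load time never loads it at all, which is what actually reduces the memory footprint.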
## The NER model usually confuses ORG and LOC entities, why is that?

The underlying model has been trained on corpora following the "tag-for-meaning" guideline, which yields context-dependent labels. For example, referring to "Budapest" in the context of the Hungarian government should yield the `ORG` label, while in other contexts it should be tagged as a `LOC`.
## Can I use HuSpaCy for my commercial software?

Yes, the tool is licensed under the Apache 2.0 license, while all the models are licensed under CC BY-SA 4.0.