Models overview

We provide several pretrained models:

  1. hu_core_news_lg is a CNN-based large model that strikes a good balance between accuracy and processing speed. This default model provides tokenization, sentence splitting, part-of-speech tagging (UD labels with detailed morphosyntactic features), lemmatization, dependency parsing, and named entity recognition, and it ships with pretrained word vectors.
  2. hu_core_news_trf is built on huBERT and provides the same functionality as the large model, except for the word vectors. It offers much higher accuracy at the cost of increased computational resource usage. We suggest using it with GPU support.
  3. hu_core_news_md offers much higher throughput than hu_core_news_lg at the cost of some accuracy. This model is a good choice when processing speed is crucial.
  4. hu_core_news_trf_xl is an experimental model built on XLM-RoBERTa-large. It provides the same functionality as the hu_core_news_trf model, with slightly higher accuracy at the cost of significantly increased computational resource usage. We suggest using it with GPU support.
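
The guidance in the list above can be sketched as a small selection helper. This is an illustration only: `choose_model` and `load_pipeline`, along with their parameters, are hypothetical names and not part of HuSpaCy. `spacy.load` is spaCy's standard loader, and the sketch assumes that spaCy and the chosen model package are already installed.

```python
def choose_model(gpu_available: bool,
                 speed_critical: bool = False,
                 allow_experimental: bool = False) -> str:
    """Pick a model name following the overview's guidance (illustrative only)."""
    if gpu_available:
        # The transformer models are suggested for GPU use; trf_xl is experimental.
        return "hu_core_news_trf_xl" if allow_experimental else "hu_core_news_trf"
    # On CPU, md trades some accuracy for throughput; lg is the default model.
    return "hu_core_news_md" if speed_critical else "hu_core_news_lg"


def load_pipeline(model_name: str):
    """Load a pipeline by name; assumes spaCy and the model are installed."""
    import spacy  # imported lazily so choose_model works without spaCy
    return spacy.load(model_name)
```

For example, `load_pipeline(choose_model(gpu_available=False))` would load the default hu_core_news_lg model on a CPU-only machine.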

HuSpaCy's model versions follow spaCy's versioning scheme.
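
In spaCy's scheme, a model version `a.b.c` uses `a.b` for the spaCy major and minor version and `c` for the model's own revision. A minimal sketch of a compatibility check under that reading follows. `is_compatible` is a hypothetical helper, not a spaCy or HuSpaCy function, and it only compares major and minor numbers, which simplifies the version constraints a real model package declares.

```python
def is_compatible(model_version: str, spacy_version: str) -> bool:
    """Return True when the model and spaCy share major and minor versions."""
    model_major_minor = model_version.split(".")[:2]
    spacy_major_minor = spacy_version.split(".")[:2]
    return model_major_minor == spacy_major_minor
```

Under this assumption, a model at version 3.7.0 would match any spaCy 3.7.x release, while a model at 3.5.2 would not.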

A demo of the models is available at Hugging Face Spaces.

To read more about the models' architecture, see the relevant sections of spaCy's documentation.

Comparison

| Models | md | lg | trf | trf_xl |
| --- | --- | --- | --- | --- |
| Embeddings | 100d floret | 300d floret | transformer: huBERT | transformer: XLM-RoBERTa-large |
| Target hardware | CPU | CPU | GPU | GPU |
| Accuracy | | | | |
| Resource usage | | | | |

Performance comparison

| Models | md | lg | trf | trf_xl |
| --- | --- | --- | --- | --- |
| Latest version | 3.7.0 | 3.7.0 | 3.7.0 | 3.5.2 |
| Token F1 | 99.89 | 99.89 | 99.89 | 99.89 |
| Sentence F1 | 97.22 | 98.21 | 99.00 | 99.33 |
| PoS Accuracy | 96.89 | 96.70 | 98.20 | 98.05 |
| Morph. Accuracy | 94.51 | 93.63 | 96.58 | 96.59 |
| Lemma Accuracy | 97.45 | 97.60 | 98.61 | 98.95 |
| LAS | 73.69 | 76.92 | 85.68 | 86.87 |
| UAS | 80.90 | 83.32 | 90.01 | 91.12 |
| NER F1 | 84.05 | 86.54 | 91.74 | 91.55 |
| Throughput (tokens/sec) | 2400 (CPU) | 848 (CPU) | 2822 (GPU) | 2318 (GPU) |
| Size | 127 MB | 401 MB | 1.27 GB | 5.55 GB |
| Memory usage | 2.4 GB | 3.3 GB | 4.8 GB | 18 GB |
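
To get a feel for what these figures mean in practice, the sketch below copies the throughput and memory rows from the table above and derives a rough processing-time estimate and a memory-budget filter. The names `MODELS`, `estimated_seconds`, and `models_within_memory` are hypothetical. Throughput varies with hardware and input, so the results are ballpark numbers only.

```python
# Figures copied from the performance table: throughput in tokens/sec
# (on the listed hardware) and memory usage in GB.
MODELS = {
    "md": {"throughput": 2400, "hardware": "CPU", "memory_gb": 2.4},
    "lg": {"throughput": 848, "hardware": "CPU", "memory_gb": 3.3},
    "trf": {"throughput": 2822, "hardware": "GPU", "memory_gb": 4.8},
    "trf_xl": {"throughput": 2318, "hardware": "GPU", "memory_gb": 18.0},
}


def estimated_seconds(model: str, n_tokens: int) -> float:
    """Rough processing time: token count divided by tokens per second."""
    return n_tokens / MODELS[model]["throughput"]


def models_within_memory(budget_gb: float) -> list[str]:
    """Names of models whose listed memory usage fits within the budget."""
    return [name for name, stats in MODELS.items()
            if stats["memory_gb"] <= budget_gb]
```

With these numbers, a corpus of one million tokens would take roughly 7 minutes with md and about 20 minutes with lg on comparable CPU hardware.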

Last update: January 3, 2024