Keyphrase Extraction

Keyphrase extraction is a well-studied problem in natural language processing, so many ready-made solutions exist. textacy is a higher-level NLP library (built on spaCy) that implements several keyword extraction methods. With it, we can easily build a simple solution to this problem.

First, load a HuSpaCy model and process the text you wish to analyze:

import huspacy

# Load the default HuSpaCy model
nlp = huspacy.load()

# `text` is a string holding the text you wish to analyze
doc = nlp(text)

Then, decide which key term extraction method to use, as textacy implements several. For the sake of simplicity, we rely on SGRank and fine-tune it through PoS and word n-gram filters.

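from typing import List, Tuple

from textacy.extract.keyterms.sgrank import sgrank as keywords

# Keep the 10 best-scoring terms, restricted to nouns and proper nouns, one to three words long
terms: List[Tuple[str, float]] = keywords(doc, topn=10, include_pos=("NOUN", "PROPN"), ngrams=(1, 2, 3))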
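To make the effect of the two filters more concrete, here is a minimal, library-free sketch of how a PoS filter and an n-gram length filter can narrow down candidate terms. It is an illustration only, not textacy's implementation: SGRank additionally scores and ranks the surviving candidates. The `candidate_ngrams` helper and the hand-tagged English tokens are hypothetical and stand in for a processed document.

```python
from typing import List, Sequence, Tuple

# Hypothetical (word, PoS tag) pairs standing in for a processed document.
Token = Tuple[str, str]


def candidate_ngrams(
    tokens: Sequence[Token],
    include_pos: Sequence[str] = ("NOUN", "PROPN"),
    ngrams: Sequence[int] = (1, 2, 3),
) -> List[str]:
    """Return every n-gram whose tokens all carry an allowed PoS tag."""
    allowed = set(include_pos)
    candidates = []
    for n in ngrams:
        # Slide a window of n tokens over the document.
        for start in range(len(tokens) - n + 1):
            window = tokens[start:start + n]
            # Keep the window only if every token passes the PoS filter.
            if all(pos in allowed for _, pos in window):
                candidates.append(" ".join(word for word, _ in window))
    return candidates


tokens = [
    ("Budapest", "PROPN"),
    ("is", "AUX"),
    ("the", "DET"),
    ("capital", "NOUN"),
    ("city", "NOUN"),
]
print(candidate_ngrams(tokens))
# → ['Budapest', 'capital', 'city', 'capital city']
```

In this sketch, narrowing `include_pos` or `ngrams` shrinks the candidate set, which is the kind of control the corresponding SGRank parameters are meant to give.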

This example is available on Hugging Face Spaces, while the full source code is on GitHub.

Last update: January 3, 2024