Severin Perez


Introduction to Search Relevance Models

October 13, 2020
Search relevance is a difficult problem in information retrieval. How do you ensure that you get the best results back from searching a collection of documents? Let's explore a few basic strategies, including simple searching, term-frequency searching, and TF-IDF searching.
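The strategies above can be contrasted with a minimal sketch. Simple (boolean) searching only checks whether a document contains the term at all, while term-frequency searching ranks documents by how often the term appears. The corpus and function names below are illustrative assumptions, not code from the post.

```python
# Toy corpus; each "document" is a short whitespace-delimited string.
corpus = [
    "the cat sat on the mat",
    "the cat and the cat napped",
    "dogs chase cats in the park",
]

def simple_search(term, docs):
    """Simple search: return indices of documents containing the term."""
    return [i for i, doc in enumerate(docs) if term in doc.split()]

def tf_search(term, docs):
    """Term-frequency search: rank matching documents by raw term count."""
    scores = [(i, doc.split().count(term)) for i, doc in enumerate(docs)]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: -s[1])

print(simple_search("cat", corpus))  # [0, 1] -- presence only, no ranking
print(tf_search("cat", corpus))      # [(1, 2), (0, 1)] -- doc 1 ranks higher
```

Note that the simple search treats both matching documents as equally relevant, while the term-frequency version surfaces the document that mentions the term most often.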

Reference: TF-IDF

October 11, 2020
Term frequency-inverse document frequency (TF-IDF) is a means of assigning weight to a search term when comparing individual documents within a corpus. It is an improvement on the bag-of-words model in that it considers the relative rarity of a term within the larger corpus.
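The weighting can be sketched in a few lines. There are several variants of the formula; this sketch assumes tf = count / document length and idf = log(N / document frequency), which are common but not the only choices.

```python
import math

# Toy corpus of pre-tokenized documents (illustrative, not from the post).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are pets".split(),
]

def tf_idf(term, doc, docs):
    """TF-IDF weight of a term in one document, relative to the corpus."""
    tf = doc.count(term) / len(doc)           # how frequent in this document
    df = sum(1 for d in docs if term in d)    # how many documents contain it
    idf = math.log(len(docs) / df)            # rarer terms get larger idf
    return tf * idf

# "the" appears in most documents, so its weight is heavily discounted;
# "cat" appears in only one, so it carries more weight in that document.
print(tf_idf("the", corpus[0], corpus))
print(tf_idf("cat", corpus[0], corpus))
```

Even though "the" occurs twice in the first document and "cat" only once, "cat" receives the higher weight, which is exactly the corpus-rarity correction that distinguishes TF-IDF from plain bag-of-words counts.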

Influential NLP Papers on Google Scholar

September 05, 2020
Natural language processing is a complex and evolving field. Part computer science, part linguistics, part statistics--it can be a challenge deciding where to begin. One starting place is to look at the most influential papers in academic literature--if you can master these papers, then you'll be well on the path to becoming an NLP expert.

Key Python Libraries for NLP

August 30, 2020
One of the great things about using Python for natural language processing (NLP) is the large ecosystem of tools and libraries. From tokenization, to machine learning, to data visualization--Python has something for every NLP task in your workflow. Of course, choosing the right tool isn't always so easy.

Reference: Stemming

August 23, 2020
In natural language processing, stemming is the process of reducing a word to its stem form. Typically, stemming is used as part of an NLP pipeline in order to reduce all words in a text to their stems so that they can be analyzed together.
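A toy suffix-stripping function illustrates the idea. Production stemmers such as the Porter algorithm apply ordered rule sets with conditions on the remaining stem; this crude version just strips a few common English suffixes and is an illustrative sketch only.

```python
def crude_stem(word):
    """Strip one common suffix, keeping at least three leading characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["running", "jumped", "cats", "analysis"]
print([crude_stem(w) for w in words])
# ['runn', 'jump', 'cat', 'analysi']
```

Note that stems need not be dictionary words ("runn", "analysi") -- the goal is only that related forms collapse to the same token so they can be analyzed together.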

Reference: Lemmatization

August 21, 2020
Lemmatization is the process of reducing a word to its lemma (canonical form). In natural language processing, a lemmatizer may be used to reduce all words in a given text to their lemmas, which makes comparative analysis possible based on canonical forms.
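A minimal dictionary-based sketch shows how lemmatization differs from stemming: instead of chopping suffixes, it maps each word to a known canonical form. Real lemmatizers rely on morphological rules, lexicons, and part-of-speech tags; the lookup table here is an illustrative assumption.

```python
# Tiny illustrative lookup table mapping inflected forms to lemmas.
LEMMA_TABLE = {
    "ran": "run", "running": "run",
    "better": "good", "geese": "goose",
    "was": "be", "is": "be",
}

def lemmatize(word):
    """Return the canonical form if known, else the word unchanged."""
    return LEMMA_TABLE.get(word.lower(), word.lower())

tokens = "The geese ran because the weather was better".split()
print([lemmatize(t) for t in tokens])
# ['the', 'goose', 'run', 'because', 'the', 'weather', 'be', 'good']
```

Unlike a stemmer, this always produces real words ("geese" becomes "goose", "better" becomes "good"), which is what makes comparative analysis on canonical forms possible.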

Tag: nlp (p. 1)
© Severin Perez, 2021