
What are text.similar() and text.common_contexts() in NLTK

Let's first define our input text. I will simply copy and paste the first paragraph of the Game of Thrones Wikipedia page:

input_text = "Game of Thrones is an American fantasy drama television series \
created by David Benioff and D. B. Weiss for HBO. It is an adaptation of A Song \
of Ice and Fire, George R. R. Martin's series of fantasy novels, the first of \
which is A Game of Thrones. The show was filmed in Belfast and elsewhere in the \
United Kingdom, Canada, Croatia, Iceland, Malta, Morocco, Spain, and the \
United States. The series premiered on HBO in the United States on April \
17, 2011, and concluded on May 19, 2019, with 73 episodes broadcast over \
eight seasons. Set on the fictional continents of Westeros and Essos, Game of \
Thrones has several plots and a large ensemble cast, and follows several story \
arcs. One arc is about the Iron Throne of the Seven Kingdoms, and follows a web \
of alliances and conflicts among the noble dynasties either vying to claim the \
throne or fighting for independence from it. Another focuses on the last \
descendant of the realm's deposed ruling dynasty, who has been exiled and is \
plotting a return to the throne, while another story arc follows the Night's \
Watch, a brotherhood defending the realm against the fierce peoples and \
legendary creatures of the North."

To apply NLTK's text methods, we need to convert our text from type 'str' to 'nltk.text.Text'. The Text constructor expects a sequence of tokens, so we split the string on whitespace first.

import nltk

text = nltk.Text( input_text.split() )
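Note that str.split() only breaks on whitespace, so punctuation stays attached to the neighbouring word. A quick check on a short sample makes this visible (the `sample` string is just an illustrative fragment of the paragraph above):

```python
# Whitespace splitting keeps punctuation glued to the preceding word.
sample = "the first of which is A Game of Thrones. The show was filmed in Belfast"
tokens = sample.split()

# The full stop stays attached to 'Thrones'.
print(tokens[7:10])  # ['of', 'Thrones.', 'The']
```

For this small example that is good enough. If it matters for your analysis, a dedicated tokenizer such as nltk.word_tokenize() could be used instead, at the cost of downloading its tokenizer data first.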

text.similar()

Distributional similarity: find other words which appear in the same contexts as the specified word; list most similar words first.

Parameters:
  • word (str) – The word used to seed the similarity search
  • num (int) – The number of words to generate (default=20)

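The similar() method takes an input word and prints the other words that appear in a similar range of contexts in the text. Here a context is the pair of words immediately before and after the word. Note that the result is printed to the console; the method itself returns None.

For example, let's see which words are used in contexts similar to those of the word 'game' in our text: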


text.similar('game')  # prints: song web
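To make the idea more concrete, here is a minimal pure-Python sketch of distributional similarity. It is not NLTK's implementation (NLTK also filters tokens and ranks by how often contexts are shared); the helper names `contexts_by_word` and `similar_words` and the short token list are assumptions made for illustration only:

```python
from collections import defaultdict

def contexts_by_word(tokens):
    """Map each lowercased word to the set of (left, right) neighbour pairs around it."""
    contexts = defaultdict(set)
    for i, token in enumerate(tokens):
        left = tokens[i - 1].lower() if i > 0 else "*START*"
        right = tokens[i + 1].lower() if i < len(tokens) - 1 else "*END*"
        contexts[token.lower()].add((left, right))
    return contexts

def similar_words(tokens, word):
    """Return words sharing at least one context with `word`, most shared contexts first."""
    contexts = contexts_by_word(tokens)
    target = contexts[word.lower()]
    scores = {
        other: len(target & ctx)
        for other, ctx in contexts.items()
        if other != word.lower() and target & ctx
    }
    # Sort by number of shared contexts (descending), then alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))

tokens = "A Game of Thrones and A Song of Ice and a web of alliances".split()
print(similar_words(tokens, "game"))  # ['song', 'web']
```

'song' and 'web' come out as similar to 'game' because all three occur between 'a' and 'of'.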

text.common_contexts()

Find contexts where the specified words appear; list most frequent common contexts first.

Parameters:
  • words (list of str) – The words used to seed the similarity search
  • num (int) – The number of words to generate (default=20)

The common_contexts() method lets you examine the contexts that are shared by two or more words. Let's see in which contexts both 'game' and 'web' are used in the text:

text.common_contexts(['game', 'web'])  # prints: a_of

The underscore marks the position of the word, so this means that in the text we'll find both 'a game of' and 'a web of'.
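The same lookup can be sketched by hand in plain Python. This is only an approximation of what common_contexts() does, not NLTK's own code; the helpers `word_contexts` and `shared_contexts` and the shortened token list are hypothetical names chosen for this example:

```python
def word_contexts(tokens, word):
    """Collect the (left, right) neighbour pairs around each occurrence of `word`."""
    word = word.lower()
    found = set()
    for i, token in enumerate(tokens):
        if token.lower() == word:
            left = tokens[i - 1].lower() if i > 0 else "*START*"
            right = tokens[i + 1].lower() if i < len(tokens) - 1 else "*END*"
            found.add((left, right))
    return found

def shared_contexts(tokens, words):
    """Return the contexts common to all `words`, formatted as left_right."""
    context_sets = [word_contexts(tokens, w) for w in words]
    common = set.intersection(*context_sets)
    return sorted(f"{left}_{right}" for left, right in common)

tokens = "which is A Game of Thrones and follows a web of alliances".split()
print(shared_contexts(tokens, ["game", "web"]))  # ['a_of']
```

Both words appear between 'a' and 'of', which is exactly the a_of context reported above.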

