What are text.similar() & text.common_contexts() in NLTK?

Let's first define our input text. I will just copy and paste the first paragraph of the Game of Thrones Wikipedia page:

input_text = "Game of Thrones is an American fantasy drama television series \
created by David Benioff and D. B. Weiss for HBO. It is an adaptation of A Song \
of Ice and Fire, George R. R. Martin's series of fantasy novels, the first of \
which is A Game of Thrones. The show was filmed in Belfast and elsewhere in the \
United Kingdom, Canada, Croatia, Iceland, Malta, Morocco, Spain, and the \
United States.[1] The series premiered on HBO in the United States on April \
17, 2011, and concluded on May 19, 2019, with 73 episodes broadcast over \
eight seasons. Set on the fictional continents of Westeros and Essos, Game of \
Thrones has several plots and a large ensemble cast, and follows several story \
arcs. One arc is about the Iron Throne of the Seven Kingdoms, and follows a web \
of alliances and conflicts among the noble dynasties either vying to claim the \
throne or fighting for independence from it. Another focuses on the last \
descendant of the realm's deposed ruling dynasty, who has been exiled and is \
plotting a return to the throne, while another story arc follows the Night's \
Watch, a brotherhood defending the realm against the fierce peoples and \
legendary creatures of the North."

To apply NLTK's functions, we need to convert our text from type 'str' to 'nltk.text.Text', which is built from a list of tokens:

import nltk

# nltk.Text wraps a list of tokens; str.split() splits on whitespace only
text = nltk.Text(input_text.split())
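Splitting on whitespace is the simplest tokenizer, but it leaves punctuation glued to words. The plain-Python check below (no NLTK needed) runs the same split on a short excerpt of the paragraph and lists the tokens that are not purely alphabetic. As far as I can tell from NLTK's source, similar() indexes only tokens for which str.isalpha() is true, so a token such as 'Thrones.' is skipped rather than treated as 'Thrones'. A real tokenizer such as nltk.word_tokenize() (which needs the punkt data) would split the punctuation off. The variable names here are only for illustration.

```python
# Excerpt of the paragraph above, tokenized the same way (whitespace only)
excerpt = ("It is an adaptation of A Song of Ice and Fire, George R. R. "
           "Martin's series of fantasy novels, the first of which is A Game of Thrones.")

tokens = excerpt.split()
print(tokens[-3:])  # ['Game', 'of', 'Thrones.']

# Tokens that are not purely alphabetic, because punctuation stays attached
non_alpha = [t for t in tokens if not t.isalpha()]
print(non_alpha)  # ['Fire,', 'R.', 'R.', "Martin's", 'novels,', 'Thrones.']
```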

text.similar()

Distributional similarity: find other words which appear in the same contexts as the specified word; list most similar words first.

Parameters:
  • word (str) – The word used to seed the similarity search
  • num (int) – The number of words to generate (default=20)

The similar() method takes a word and prints (rather than returns) the other words that appear in a similar range of contexts in the text, most similar first.
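To make "distributional similarity" concrete, here is a minimal plain-Python sketch of the idea; it is not NLTK's actual implementation. It assumes, as NLTK's default appears to, that a context is the pair (previous word, next word), that tokens are lowercased, and that only alphabetic tokens are kept. The helpers build_context_index and similar_words and the condensed sample sentence are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def build_context_index(tokens):
    """Map each word to the set of (left, right) contexts it appears in.

    Simplifying assumptions: tokens are lowercased, only purely alphabetic
    tokens are kept, and a context is one word on each side.
    """
    words = [t.lower() for t in tokens if t.isalpha()]
    index = defaultdict(set)
    for i, w in enumerate(words):
        left = words[i - 1] if i > 0 else "*START*"
        right = words[i + 1] if i < len(words) - 1 else "*END*"
        index[w].add((left, right))
    return index


def similar_words(tokens, word, num=20):
    """Rank other words by how many distinct contexts they share with `word`."""
    index = build_context_index(tokens)
    word = word.lower()
    if word not in index:
        return []
    seed = index[word]
    scores = Counter()
    for other, contexts in index.items():
        if other != word:
            shared = len(contexts & seed)
            if shared:
                scores[other] = shared
    # most_common keeps first-seen order for ties
    return [w for w, _ in scores.most_common(num)]


# Condensed sample built from phrases of the paragraph above
sample = ("Game of Thrones is an adaptation of A Song of Ice and Fire, "
          "the first of which is A Game of Thrones. "
          "One arc follows a web of alliances and conflicts.")
print(similar_words(sample.split(), "game"))  # ['song', 'web']
```

On this condensed sample the sketch ranks 'song' and 'web' for 'game', the same two words text.similar('game') prints for the full paragraph, because all three occur between 'a' and 'of'.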

For example, let's see which words are used in contexts similar to those of the word 'game' in our text:


text.similar('game') #output: song web

text.common_contexts()

Find contexts where the specified words appear; list most frequent common contexts first.

Parameters:
  • words (list of str) – The words whose shared contexts should be found
  • num (int) – The maximum number of contexts to display (default=20)

The common_contexts() method lets you examine the contexts that two or more words share. Let's see which contexts the words 'game' and 'web' have in common in the text:

text.common_contexts(['game', 'web']) #outputs a_of

The context a_of means 'a' on the left and 'of' on the right, so the text contains both 'a game of' and 'a web of'.
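The same context index can be turned around to find shared contexts. Below is a plain-Python sketch (not NLTK's code) that records the (left, right) neighbour pair around every token, intersects those sets for the query words, and formats each shared context as left_right, the way the output above is printed. It only lowercases tokens and does no filtering; the function name common_contexts here and the condensed sample are hypothetical.

```python
from collections import Counter, defaultdict


def common_contexts(tokens, words, num=20):
    """Return up to `num` (left, right) contexts shared by every word in `words`.

    Contexts are formatted 'left_right' and ranked by how often they occur
    with the query words. Tokens are only lowercased here.
    """
    lowered = [t.lower() for t in tokens]
    index = defaultdict(Counter)  # word -> Counter of (left, right) pairs
    for i, w in enumerate(lowered):
        left = lowered[i - 1] if i > 0 else "*START*"
        right = lowered[i + 1] if i < len(lowered) - 1 else "*END*"
        index[w][(left, right)] += 1

    query = [w.lower() for w in words]
    # Keep only the contexts that every query word appears in
    shared = set.intersection(*(set(index[w]) for w in query))
    freq = Counter({c: sum(index[w][c] for w in query) for c in shared})
    return [f"{left}_{right}" for (left, right), _ in freq.most_common(num)]


sample = ("Game of Thrones is an adaptation of A Song of Ice and Fire, "
          "the first of which is A Game of Thrones. "
          "One arc follows a web of alliances and conflicts.")
print(common_contexts(sample.split(), ["game", "web"]))  # ['a_of']
```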
