
What are text.similar() and text.common_contexts() in NLTK?

Let's first define our input text. I'll just copy/paste the first paragraph of the Game of Thrones Wikipedia page:

input_text = "Game of Thrones is an American fantasy drama television series \
created by David Benioff and D. B. Weiss for HBO. It is an adaptation of A Song \
of Ice and Fire, George R. R. Martin's series of fantasy novels, the first of \
which is A Game of Thrones. The show was filmed in Belfast and elsewhere in the \
United Kingdom, Canada, Croatia, Iceland, Malta, Morocco, Spain, and the \
United States.[1] The series premiered on HBO in the United States on April \
17, 2011, and concluded on May 19, 2019, with 73 episodes broadcast over \
eight seasons. Set on the fictional continents of Westeros and Essos, Game of \
Thrones has several plots and a large ensemble cast, and follows several story \
arcs. One arc is about the Iron Throne of the Seven Kingdoms, and follows a web \
of alliances and conflicts among the noble dynasties either vying to claim the \
throne or fighting for independence from it. Another focuses on the last \
descendant of the realm's deposed ruling dynasty, who has been exiled and is \
plotting a return to the throne, while another story arc follows the Night's \
Watch, a brotherhood defending the realm against the fierce peoples and \
legendary creatures of the North."

To apply NLTK's text functions, we first need to convert our input from a plain 'str' into an 'nltk.text.Text' object.

import nltk

text = nltk.Text(input_text.split())
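One thing to watch out for: str.split() keeps punctuation glued to neighboring words ('Thrones.' and 'Thrones' become different tokens), which can hide shared contexts. A minimal sketch of one way to strip it using only the standard library (this preprocessing step is my addition, not part of the original example):

```python
import string

raw = "the first of which is A Game of Thrones. The show was filmed in Belfast."

# Strip leading/trailing punctuation from each whitespace-separated token
# so that "Thrones." and "Thrones" count as the same word.
tokens = [w.strip(string.punctuation) for w in raw.split()]
print(tokens)
```

NLTK also ships its own tokenizers (e.g. nltk.word_tokenize), which handle punctuation more carefully but require downloading tokenizer models first.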

text.similar()

Distributional similarity: find other words which appear in the same contexts as the specified word; list most similar words first.

Parameters:
  • word (str) – The word used to seed the similarity search
  • num (int) – The number of words to generate (default=20)

The similar() method takes an input word and prints other words that appear in a similar range of contexts in the text.

For example, let's see which words are used in a similar context to the word 'game' in our text:


text.similar('game')  # output: song web
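Under the hood, similar() scores words by the contexts (left and right neighbors) they share with the seed word: 'song' and 'web' both occur in the context 'a _ of', just like 'game'. One caveat is that similar() prints its results instead of returning them. If you want the words back as a Python list, the ContextIndex helper that Text uses internally exposes similar_words(). A small sketch (API details may vary between NLTK versions):

```python
import nltk

# A toy token list where 'game', 'song', and 'web' all share the
# context ('a', 'of'), mirroring the article's example.
tokens = "a game of thrones a song of ice a web of alliances".split()

# ContextIndex is the helper behind Text.similar(); unlike similar(),
# its similar_words() method returns a list instead of printing.
idx = nltk.text.ContextIndex(tokens)
print(idx.similar_words("game"))  # words sharing contexts with 'game'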

text.common_contexts()

Find contexts where the specified words appear; list most frequent common contexts first.

Parameters:
  • words (list(str)) – The words whose common contexts to find
  • num (int) – The number of contexts to display (default=20)

The common_contexts() method lets you examine the contexts that are shared by two or more words. Let's see in which contexts the words 'game' and 'web' were used in the text:

text.common_contexts(['game', 'web'])  # output: a_of

This means that in the text we'll find both 'a game of' and 'a web of': the shared context is printed as a_of, with the matched word sitting where the underscore is.
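Like similar(), common_contexts() prints its results rather than returning them. To get the shared contexts programmatically, the internal ContextIndex helper returns them as a FreqDist of (left, right) tuples. A sketch (API details may vary between NLTK versions):

```python
import nltk

# A toy token list where 'game' and 'web' share the context ('a', 'of').
tokens = "a game of thrones and a web of alliances".split()
idx = nltk.text.ContextIndex(tokens)

# Returns a FreqDist of (left, right) context tuples shared by the words,
# rather than printing "left_right" strings like Text.common_contexts().
shared = idx.common_contexts(["game", "web"])
print(shared.most_common())
```

Here the only shared context is the pair ('a', 'of'), which Text.common_contexts() would print as a_of.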

