Suppose you are given a set of n documents numbered 1 through n. Suppose there is a list of m words numbered 1 through m that are of interest to you. Define vectors $x_1$ through $x_n$ as follows: the jth entry of $x_i$ is the number of times the jth word appears in the ith document. You can think of $x_i$ as a summary of the information in the ith document.
I selected three articles from Wikipedia and counted the number of times the words fur, blood, bone, feather appeared in each. Here are the results:
Article  fur  blood  bone  feather
Mammal     6      4    15        0
Reptile    1      5     0        3
Bird       0      7     5       43
Suppose you want to find the document that puts the most emphasis on word j. One way of doing this is to find the i that gives the largest value of $(x_i \cdot e_j)/\|x_i\|$. Do this with the word blood. Note that the document you obtain is not the one in which the word appears the most number of times.
I think I know how to approach this question, but I don't know how to define $e_j$. I thought it meant the vector for blood, but then that vector would contain only 3 entries (one per document), while each document vector contains 4 entries. Could somebody please help me? Thank you very much.
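
For anyone who wants to try the computation, here is a minimal sketch in Python. It rests on an assumption the exercise does not spell out: that e_j is the jth standard basis vector, with a 1 in position j and 0 everywhere else. Under that reading e_j has m = 4 entries (one per word), and the dot product of x_i with e_j simply picks out the jth entry of x_i. The names `basis_vector`, `dot`, and `emphasis` are made up for this illustration.

```python
import math

# Word order fixes the meaning of each vector entry.
words = ["fur", "blood", "bone", "feather"]

# Count vectors x_i taken from the table in the question.
counts = {
    "Mammal": [6, 4, 15, 0],
    "Reptile": [1, 5, 0, 3],
    "Bird": [0, 7, 5, 43],
}

def basis_vector(j, m):
    """Assumed meaning of e_j: 1 in position j (0-based here), 0 elsewhere."""
    return [1 if k == j else 0 for k in range(m)]

def dot(u, v):
    """Ordinary dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def emphasis(x, j):
    """Compute (x . e_j) / ||x|| using the Euclidean norm of x."""
    e_j = basis_vector(j, len(x))
    return dot(x, e_j) / math.sqrt(dot(x, x))

j = words.index("blood")
scores = {name: emphasis(x, j) for name, x in counts.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
print("Most emphasis on blood:", best)
```

With this interpretation the largest ratio comes from Reptile (5/sqrt(35), roughly 0.845), even though Bird has the highest raw count of 7, which matches the remark at the end of the exercise.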