Video Details

A Brief History of Information Retrieval

Keyword Density – The percentage of all the words in the document that are the keyword.

TF-IDF – Term frequency weighted by inverse document frequency: how often a term is used in the document, discounted by how frequent that term is in the corpus as a whole.

Co-Occurrence – Using terms or phrases that are often associated with the keyword.

Topic Models – A more advanced, scalable and nuanced system. For example, if a search includes the words gravity, space and planet, you could be looking for information about astronomy or Star Trek. However, if the search also includes William Shatner, it is more likely that the search is for Star Trek.
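The first three measures above are simple enough to sketch in a few lines of Python. This is only an illustration: the function names, the whitespace tokenizer and the smoothed IDF formula (one common variant among several) are assumptions, not something taken from the video.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    # Punctuation is not stripped.
    return text.lower().split()


def keyword_density(keyword, doc):
    # Fraction of the document's words that are the keyword.
    tokens = tokenize(doc)
    if not tokens:
        return 0.0
    return tokens.count(keyword.lower()) / len(tokens)


def tf_idf(term, doc, corpus):
    # Term frequency in this document, weighted up when the term
    # appears in few documents of the corpus.
    term = term.lower()
    tf = keyword_density(term, doc)
    docs_with_term = sum(1 for d in corpus if term in tokenize(d))
    # Smoothed inverse document frequency (one common variant).
    idf = math.log((1 + len(corpus)) / (1 + docs_with_term)) + 1
    return tf * idf


def co_occurrence(keyword, corpus):
    # For each other word, count the documents it shares with the keyword.
    keyword = keyword.lower()
    counts = Counter()
    for d in corpus:
        tokens = set(tokenize(d))
        if keyword in tokens:
            counts.update(tokens - {keyword})
    return counts


# Made-up three-document corpus for illustration.
corpus = ["space planet", "space station orbit", "captain kirk enterprise"]
print(keyword_density("space", "space planet space gravity"))
print(tf_idf("planet", corpus[0], corpus), tf_idf("space", corpus[0], corpus))
print(co_occurrence("space", corpus))
```

In this toy corpus "planet" and "space" have the same density in the first document, but "planet" gets the higher TF-IDF weight because it appears in fewer documents overall.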

The Correlation Process

It’s easy to run a controlled test with the keyword in the title tag on one page and not on another, but very difficult to run enough of them to reach any statistical significance. Instead, use this correlation process:

  • In LDA, the correlation between rankings and having the keyword in the title is very low. However, when you look at search results, almost all of them have the keyword in the title tag. The difference is between measuring how often a feature appears in the search results and measuring whether it is correlated with appearing higher in those results.

  • Raw prominence – which features appear more or less often on top results.

  • Correlation of features with higher rankings – which things (keywords, links, domain metrics) predict that one page will rank higher than another.

  • To compare two search engines, it can be interesting to look at raw prominence. It is less useful when you are trying to figure out what will help you rank well.

  • Links get you further than on-page factors.

  • Write good content that makes it clear what your topic is.
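The distinction between raw prominence and correlation with rankings can be made concrete with a small sketch. The ten results below are made-up numbers, and using Spearman rank correlation is an assumption made for illustration; the notes do not say which correlation measure was used.

```python
import math


def raw_prominence(flags):
    # Share of results that have the feature at all.
    return sum(flags) / len(flags)


def average_ranks(values):
    # Rank values from 1..n, giving tied values their average rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature cannot predict anything
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    # Spearman correlation is Pearson correlation computed on ranks.
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical top-10 results for one query (position 1 is best).
positions = list(range(1, 11))
keyword_in_title = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
link_counts = [950, 700, 720, 400, 380, 300, 150, 160, 90, 40]

# Negate positions so that a larger value means a better ranking.
rank_quality = [-p for p in positions]

print(raw_prominence(keyword_in_title))
print(spearman(keyword_in_title, rank_quality))
print(spearman(link_counts, rank_quality))
```

In this invented data the keyword is in nine of ten titles, so its raw prominence is high, yet it tells you almost nothing about which page ranks above another. Link counts, by contrast, track the ordering closely.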

The LDA Tool

  • This tool scores how topically relevant the entered content is to the entered word.

  • How to apply the LDA score – if you are trying to rank better and have a low LDA score, try to increase it. However, the evidence for this is anecdotal at the moment.

  • The LDA score can fluctuate considerably, depending on the sample.
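Because the score fluctuates, one cautious way to read it is to run the same content several times and compare averages rather than single readings. The sketch below assumes you have collected the scores by hand; the numbers, the function names and the "difference larger than the combined spread" rule are all illustrative assumptions, not part of the tool.

```python
import statistics


def summarize_runs(scores):
    # Mean and sample standard deviation of repeated scores.
    return statistics.mean(scores), statistics.stdev(scores)


def change_exceeds_noise(before, after):
    # Rough heuristic (assumption): treat a change as real only if the
    # means differ by more than the two standard deviations combined.
    mean_b, sd_b = summarize_runs(before)
    mean_a, sd_a = summarize_runs(after)
    return abs(mean_a - mean_b) > (sd_b + sd_a)


# Made-up scores from repeated runs, before and after a content edit.
before = [0.40, 0.44, 0.42]
after_small_edit = [0.43, 0.41, 0.45]
after_rewrite = [0.60, 0.62, 0.58]

print(change_exceeds_noise(before, after_small_edit))
print(change_exceeds_noise(before, after_rewrite))
```

With these invented numbers, the small edit is indistinguishable from run-to-run noise, while the rewrite moves the average well outside it.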