Recently, a post on the Moz blog ignited a particularly intriguing debate about Google’s famed list of the 200+ factors it uses to rank results. In the post, the author posited that Google has never relied on keyword density as a ranking factor. While this sparked a fiery exchange in the comments section, it also opens an important conversation that search marketers should keep in mind: one that touches on the merits of looking at correlation vs. causation, and one that treats the complexities of language as a looming variable in the world of search.
To answer the initial question: No, it is very unlikely that Google uses keyword density as a ranking factor. However, to say that keywords in content won’t influence your position in search is naive, at best. Descriptive keywords shape not only the way in which bots and search engines process and index your site, but also the way in which the public at large talks about your product or service, and both play a major role in search. Still, the early days of search seem to guide many strategies and tactics: exact-match keywords strategically dot a page, relentlessly reinforcing the terms for which you are attempting to rank.
Yet Google’s come a long way: from the very public introduction of the Hummingbird algorithm, to the publicly announced but less discussed addition of Ray Kurzweil to the Google Search team, to further explorations into AI, Google is becoming more fluid, adaptive, and intrinsically intelligent in how it understands and interacts with language. Today, we want to take a look at three complex ways in which Google processes queries and indexes information. Term frequency, semantic distance, Google’s evolving understanding of pronouns, synonyms and natural variants, and co-citation and co-occurrence all govern how Google understands language on the Web.
While many may think that term frequency is simply another name for keyword density, Google has made numerous references over the years to term frequency and inverse document frequency in patent applications, as well as in other documents. Term frequency and inverse document frequency focus less on keywords and how often they appear on a page, and more on the proportion of keywords to other lexical items within a document.
Expertly covered by Cyrus Shepard on the Moz blog, TF-IDF is a weighting that helps Google gauge the importance of particular keywords based on how often they appear in a document relative to how often they appear across the greater corpus of documents as a whole. Supported by Hummingbird, this gives Google a more complex understanding of the way in which natural language can support overarching topics from a top level. Using language in a way that’s natural, and in a way that resonates within your niche or industry, may be a better use of your time than trying to ensure your document includes your keywords a set number of times!
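To make the idea concrete, here is a minimal sketch of a textbook TF-IDF calculation. It is not Google’s implementation, which is not public; the `tf_idf` function, the whitespace tokenizer, and the three sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document (illustrative only)."""
    # Naive tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # Term frequency (share of the document) times
            # inverse document frequency (rarity across the corpus).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical mini-corpus.
docs = [
    "dog photos and dog pictures",
    "cat photos and cat pictures",
    "bank loans and bank rates",
]
scores = tf_idf(docs)
```

In this toy corpus, “and” appears in every document, so its score is zero no matter how often it is repeated, while “dog” scores highest in the first document because it is frequent there and absent elsewhere. That is the practical difference from raw keyword density.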
This goes without saying, but using synonyms and naturally occurring variants of your target keyword helps Google identify a natural match for the searcher. The previously referenced Moz post uses the example of “dog photos.” If a page is about dog photos, there’s a good chance that related phrases will also appear on it, including “pictures of dogs”, “dog pictures”, “puppy pics” or “canine shots”. When synonyms of your target keyword appear regularly, Google and other search engines are able to affirm the page’s intent and align it with that of the searcher by finding words with similar meanings that could potentially answer a user’s query.
Over 70% of searches rely on synonyms. According to Shepard, “To solve this problem, search engines possess vast corpuses of synonyms and close variants for billions of phrases, which allows them to match content to queries even when searchers use different words than your text.” Again, this is more incentive for marketers and webmasters alike to create copy that departs from a minimum requirement for keyword density, and instead uses natural language that refers to the target keyword and its potential variations.
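As a rough illustration of synonym matching, the sketch below expands each query term into a set of variants before comparing it with a page’s text. The `SYNONYMS` table, `expand`, and `matches` are hypothetical names invented for this example; a real search engine draws on a vastly larger, automatically built corpus of variants.

```python
# Hypothetical, hand-written synonym table (illustrative only).
SYNONYMS = {
    "dog": {"dogs", "puppy", "canine"},
    "photos": {"pictures", "pics", "shots"},
}

def expand(term):
    """Return the term together with its known variants."""
    return {term} | SYNONYMS.get(term, set())

def matches(query, text):
    """True if every query term, or one of its variants, appears in text."""
    words = set(text.lower().split())
    return all(expand(term) & words for term in query.lower().split())

print(matches("dog photos", "Adorable puppy pics from the park"))
print(matches("dog photos", "cat pictures"))
```

The first call matches even though neither “dog” nor “photos” appears in the text, which is the behavior Shepard describes; the second fails because nothing on the page stands in for “dog”.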
Related to the idea of synonyms and variants are the ideas of co-citation and co-occurrence. Bill Slawski, of SEO by the Sea, has stated that co-citation and co-occurrence are part and parcel of the Hummingbird algorithm, which uses co-citation to identify words that may be synonyms. The search engines rely on a corpus of linguistic rules and may even replace a query term with a synonym where co-citation and co-occurrence indicate a better match or a heightened probability of a better search result.
This also helps distinguish between queries that contain words with multiple meanings; in the example above, “dog picture” is a very different search than “dog motion picture”. In a more extreme scenario, a “plant” could refer to a tree, a shrub, or a factory, while a “bank” may refer to an institution that lends money, a store of thoughts or memories, or the land along either side of a river. A “trunk” may refer to an article of furniture or to part of a tree, a car, or an elephant. Contextual clues within the content help Google infer the intended meaning of on-site content and serve a page that’s relevant to the searcher.
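One simple way to picture how contextual clues separate the meanings of a word like “bank” is to count how many cue words for each sense appear in the surrounding text. The sketch below does exactly that. The `SENSE_CUES` table and `guess_sense` function are assumptions made for illustration and do not describe how Google actually disambiguates queries.

```python
# Hypothetical cue words for two senses of "bank" (illustrative only).
SENSE_CUES = {
    "bank": {
        "financial institution": {"loan", "money", "deposit", "account"},
        "river edge": {"river", "water", "shore", "fishing"},
    },
}

def guess_sense(word, text):
    """Pick the sense whose cue words overlap most with the text."""
    words = set(text.lower().split())
    senses = SENSE_CUES[word]
    best = max(senses, key=lambda sense: len(senses[sense] & words))
    # Return None when no cue word appears at all.
    return best if senses[best] & words else None

print(guess_sense("bank", "we went fishing on the river bank"))
print(guess_sense("bank", "the bank approved my loan"))
```

With no supporting context (“the bank”, say), the function returns `None`, which mirrors the point above: without surrounding clues, the intended meaning cannot be inferred.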
Co-occurrence also plays a significant role in off-site optimization. While keyword-rich anchor text is still valuable, it is noticeably declining in importance due to concerns about spam. In a different piece, Rand Fishkin noted that queries for “cell phone ratings” regularly returned first-page results that didn’t even contain the word “ratings” in the title, and instead used “reviews” or “reports”. This is a highly competitive query, yet Google used co-occurrence from both on-site and off-site content to determine that these sites were more relevant than those that contained the keyword.
One benefit of co-occurrence, from the search engines’ point of view, is that it is extremely hard to manipulate. It relies on a frequently updated corpus that draws on an amalgamation of sources talking about the keyword in a way that supports the surrounding co-occurring words or phrases. It is an incredible testament to the algorithm’s ability to understand and parse how language naturally sounds. While Latent Semantic Indexing was around long before Google or other search engines, co-occurrence is a part of the algorithm that works in much the same way, identifying relationships between phrases and lexical items to extract and assign meaning.
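At its simplest, co-occurrence can be thought of as counting which terms show up together across many documents. The sketch below tallies term pairs over a tiny, made-up corpus. The `cooccurrence` function and the sample documents are hypothetical, and real systems work at a vastly larger scale with more sophisticated statistics.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count how many documents each pair of terms appears in together."""
    pairs = Counter()
    for doc in docs:
        # Sort the unique terms so each pair has one canonical order.
        terms = sorted(set(doc.lower().split()))
        pairs.update(combinations(terms, 2))
    return pairs

# Hypothetical mini-corpus.
docs = [
    "cell phone ratings",
    "cell phone reviews",
    "cell phone reports",
    "restaurant reviews",
]
counts = cooccurrence(docs)
print(counts[("cell", "phone")])
print(counts[("phone", "reviews")])
```

Because “ratings”, “reviews”, and “reports” each appear alongside “cell” and “phone”, a system looking at these counts has evidence that the three words play a similar role in this context. That evidence comes from many independent sources, which is why it is hard for a single site to manipulate.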
The growing ability to detect and extract meaning from seemingly unrelated pieces of text illustrates Google’s increasing use of artificial intelligence to understand language. By drawing on a user’s personal search history to understand pronouns, as a recent Google patent demonstrates, Google continues to use the information available to it to make the search process more conversational and intuitive.
Similarly, by hiring Ray Kurzweil and acquiring DeepMind, Google continues to leverage some of the sharpest minds in artificial intelligence to truly understand and engage with a user’s language.
Language is an incredibly dynamic and fundamental component of society, and Google and other search engines continue to expand their indices to ensure that they provide the best experience possible. As a result, marketers need to forget about manipulating Google’s search results, and instead engage with their community in their own voice. Worry less about keyword density, and instead look at how to present something in a way that is engaging and natural. Relevant, unique, and natural content, both on-site and within the online community, will help strengthen your position as an influencer and industry leader.