Tuesday, April 17, 2012

6 Miscellaneous information on search engines (1)


6.1 Google SandBox

   At the beginning of 2004, a new and mysterious term appeared among SEO specialists: Google SandBox. This is the name of a new Google spam filter that excludes new sites from search results. Because of the SandBox filter, new sites are absent from search results for virtually any phrase. This happens even to sites that have high-quality unique content and are promoted using legitimate techniques.

   The SandBox is currently applied only to the English segment of the Internet; sites in other languages are not yet affected by this filter. However, the filter may extend its reach. It is assumed that the aim of the SandBox filter is to exclude spam sites: indeed, no search spammer can afford to wait months for results. However, many perfectly valid new sites suffer the consequences. So far, there is no precise information about what the SandBox filter actually is. Here are some assumptions based on practical SEO experience:

   - SandBox is a filter that is applied to new sites. A new site is put in the sandbox and is kept there for some time until the search engine starts treating it as a normal site.

   - SandBox is a filter applied to new inbound links to new sites. There is a fundamental difference between this and the previous assumption: the filter is not based on the age of the site, but on the age of the inbound links to the site. In other words, Google treats the site normally but refuses to acknowledge any inbound links to it unless they have existed for several months. Since inbound links are one of the main ranking factors, ignoring them is equivalent to the site being absent from search results. It is difficult to say which of these assumptions is true; it is quite possible that both are.

   - A site may be kept in the sandbox for anywhere from 3 months to a year or more. It has also been noticed that sites are released from the sandbox in batches. This means that the time sites spend in the sandbox is not calculated individually for each site, but for groups of sites. All sites created within a certain time period are put into the same group, and they are eventually all released at the same time. Thus, individual sites in a group can spend different amounts of time in the sandbox, depending on where they were in the group's capture-release cycle.
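   The batch-release observation can be illustrated with a small Python model. It is only a sketch under hypothetical assumptions: new sites are grouped into fixed 90-day creation windows, and each group is released a fixed 120 days after its window closes. Neither number comes from Google; the point is simply that two sites in the same group leave the sandbox on the same date yet spend very different times in it.

```python
from datetime import date, timedelta

# Hypothetical parameters, chosen for illustration only; they are NOT observed values.
COHORT_START = date(2004, 1, 1)  # start of the first creation window
COHORT_LENGTH_DAYS = 90          # sites created within one 90-day window form a group
HOLD_DAYS = 120                  # each group is released 120 days after its window closes


def release_date(created: date) -> date:
    """Date on which the whole group containing this site leaves the sandbox."""
    cohort_index = (created - COHORT_START).days // COHORT_LENGTH_DAYS
    window_end = COHORT_START + timedelta(days=(cohort_index + 1) * COHORT_LENGTH_DAYS)
    return window_end + timedelta(days=HOLD_DAYS)


def days_in_sandbox(created: date) -> int:
    """Time an individual site waits depends on where it fell in its group's window."""
    return (release_date(created) - created).days


for created in (date(2004, 1, 5), date(2004, 3, 25)):
    print(created, "->", release_date(created), days_in_sandbox(created), "days")
```

   Under these assumed parameters, a site created on 5 January 2004 and one created on 25 March 2004 fall into the same group and are both released on 29 July 2004, after 206 and 126 days respectively.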


   Typical indications that your site is in the sandbox include:

   - Google indexes your site normally, and the search robot visits it regularly.
   - Your site has a PageRank; the search engine knows about and correctly displays inbound links to your site.
   - A search by site address (www.site.com) displays correct results, with the correct title, snippet (resource description), etc.
   - Your site can be found by searching for rare and unique word combinations present in the text of its pages.
   - Your site does not appear in the first thousand results for any other query, even for the queries it was created to target. Occasionally there are exceptions, and the site appears at positions 500-600 for some queries. This does not change the sandbox situation, of course.
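   Taken together, these indications form a simple checklist: indexing looks healthy, yet the site is buried for the queries it targets. A minimal Python sketch of that check follows. The field names are hypothetical, and treating a best position of 500 or worse (or no position in the first thousand) as "buried" is an assumption based on the 500-600 exception described above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold, based on the "positions 500-600" exception in the text.
BURIED_FROM_POSITION = 500


@dataclass
class SiteObservations:
    """Manual observations about a site (field names are illustrative only)."""
    indexed_and_crawled: bool            # indexed normally, robot visits regularly
    has_pagerank_and_links: bool         # PageRank present, inbound links shown correctly
    site_query_ok: bool                  # search by site address shows correct title/snippet
    found_by_unique_phrases: bool        # found by rare word combinations from its pages
    best_target_position: Optional[int]  # None if not in the first 1000 results


def looks_sandboxed(obs: SiteObservations) -> bool:
    # The first four indications: Google knows the site and indexes it correctly.
    indexing_ok = all((
        obs.indexed_and_crawled,
        obs.has_pagerank_and_links,
        obs.site_query_ok,
        obs.found_by_unique_phrases,
    ))
    # The fifth indication: the site is buried for its target queries.
    pos = obs.best_target_position
    buried = pos is None or pos >= BURIED_FROM_POSITION
    return indexing_ok and buried


print(looks_sandboxed(SiteObservations(True, True, True, True, None)))  # True
print(looks_sandboxed(SiteObservations(True, True, True, True, 12)))    # False
```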

   There are no practical ways to bypass the SandBox filter. There have been some suggestions about how it may be done, but they are no more than suggestions and are of little use to a regular webmaster. The best course of action is to continue SEO work on the site content and structure and wait patiently until the site is released from the sandbox, after which you can expect a dramatic improvement in rankings, by as much as 400-500 positions.

6.2 Google LocalRank

   On February 25, 2003, Google was granted a patent for a new page-ranking algorithm called LocalRank. It is based on the idea that pages should be ranked not by their global link citations, but by how they are cited among pages dealing with topics related to the particular query. The LocalRank algorithm is not used in practice (at least, not in the form described in the patent). However, the patent contains several interesting innovations that we think any SEO specialist should know about. Nearly all search engines already take into account the topics of referring pages. Although they seem to use algorithms rather different from LocalRank, studying the patent gives us a general idea of how such topic-aware ranking may be implemented.

   While reading this section, please bear in mind that it contains theoretical information rather than practical guidelines.

   The following three items comprise the main idea of the LocalRank algorithm:

   1. An algorithm is used to select a certain number of documents relevant to the search query (let it be N). These documents are initially sorted by some criterion (this may be PageRank, relevance, or a combination of other criteria). Let us call the numeric value of this criterion OldScore.

   2. Each of the N selected pages goes through a new ranking procedure and receives a new score. Let us call it LocalScore.

   3. The OldScore and LocalScore values for each page are multiplied to yield a new value, NewScore. The pages are finally ranked by NewScore.
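   The three steps can be sketched in Python. This is an illustration, not the patented procedure: the section does not say how LocalScore is computed, so the sketch takes it as a pluggable function. The demo uses a hypothetical stand-in (the number of other pages in the selected set that link to the page), in the spirit of ranking by citations among pages relevant to the query. The page names, scores, and link graph are invented.

```python
from typing import Callable, Dict, List, Tuple


def select_top_n(old_scores: Dict[str, float], n: int) -> Dict[str, float]:
    """Step 1: keep the N pages with the highest OldScore."""
    ranked = sorted(old_scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])


def rerank(candidates: Dict[str, float],
           local_score: Callable[[str, List[str]], float]) -> List[Tuple[str, float]]:
    """Steps 2 and 3: NewScore = OldScore * LocalScore, then sort by NewScore."""
    pages = list(candidates)
    new_scores = {page: candidates[page] * local_score(page, pages) for page in pages}
    return sorted(new_scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical link graph: page -> pages it links to.
LINKS: Dict[str, List[str]] = {
    "a.example": ["c.example"],
    "b.example": ["a.example", "c.example"],
    "c.example": ["a.example"],
    "d.example": ["b.example"],
}


def in_set_citations(page: str, pages: List[str]) -> float:
    """Hypothetical LocalScore: links to `page` from other pages in the selected set."""
    return float(sum(1 for src in pages if src != page and page in LINKS.get(src, [])))


old_scores = {"a.example": 0.9, "b.example": 0.8, "c.example": 0.6, "d.example": 0.5}
top = select_top_n(old_scores, n=3)
for page, score in rerank(top, in_set_citations):
    print(page, score)
```

   This prints a.example 1.8, c.example 1.2, b.example 0.0. Here b.example drops from second to last: its only citation comes from d.example, which is outside the selected set of N = 3 pages. With plain multiplication, any page that no other candidate cites ends up with a NewScore of 0 in this sketch.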

