Search Engine Spying

Search Engine Honesty


Each time you view a web page, your browser sends some information to one or more servers. Among other things, those servers can see your Internet address and host name.


If you have a dial-up account, your computer’s Internet address and host name typically change every time you connect. However, if you use a broadband “always on” service or an Internet connection at your workplace, you probably have a more or less permanently assigned Internet address. Every time you do a search, the search engine could record the keywords you used and associate them with your computer’s Internet address. If you then visit a site that runs that search engine’s targeted ads, the engine can track your progress through the site and record how many minutes or seconds you spent on each page. (Targeted ads are actually served from the search engine’s own servers.) The more sites that run a particular engine’s ads, and the more searches performed on that engine, the more tracking information it can accumulate. This is one of the benefits of size: Google has an enormous tracking advantage because such a large portion of web sites run Google ads.
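To make the mechanism concrete, here is a minimal Python sketch of how a server log could be turned into per-address profiles without any cookies. It assumes a hypothetical log format (the `Request` fields and the `"search"`/`"page_view"` labels are illustrative, not any engine's actual schema) and estimates the time spent on a page as the gap until the next request from the same address.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    ip: str          # client Internet address as seen by the server
    timestamp: float # seconds since some reference time
    kind: str        # hypothetical label: "search" or "page_view"
    detail: str      # search keywords, or the URL of the page viewed


def build_profiles(requests):
    """Group search keywords by address and estimate seconds spent per page."""
    searches = defaultdict(list)
    dwell = defaultdict(dict)
    last_view = {}  # ip -> (url, start time) of the page currently being read
    for r in sorted(requests, key=lambda r: r.timestamp):
        # The next request from the same address ends the previous page view.
        if r.ip in last_view:
            url, start = last_view.pop(r.ip)
            dwell[r.ip][url] = dwell[r.ip].get(url, 0.0) + (r.timestamp - start)
        if r.kind == "search":
            searches[r.ip].append(r.detail)
        else:
            last_view[r.ip] = (r.detail, r.timestamp)
    # A final page view with no later request has no measurable dwell time.
    return dict(searches), dict(dwell)


# Example: one broadband user with a stable (documentation-range) address.
log = [
    Request("203.0.113.7", 0, "search", "used cars"),
    Request("203.0.113.7", 5, "page_view", "dealer.example/home"),
    Request("203.0.113.7", 65, "page_view", "dealer.example/prices"),
    Request("203.0.113.7", 245, "search", "car loans"),
]
searches, dwell = build_profiles(log)
print(searches)  # {'203.0.113.7': ['used cars', 'car loans']}
print(dwell)     # {'203.0.113.7': {'dealer.example/home': 60.0, 'dealer.example/prices': 180.0}}
```

Keying everything on the address is exactly why a stable broadband or workplace address makes the profile so much richer than a dial-up address that changes with each connection.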


Yahoo Search has an additional tracking feature. A Yahoo results page lists site titles, URLs, and descriptions in the usual manner. However, if you click on one of the listed sites, you are actually taken momentarily to Yahoo’s server, which can record the click before “redirecting” your browser to the site. If you back up to the results page and click on another site, Yahoo knows that too.
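A redirect of this kind is simple to build. The sketch below is a hypothetical illustration, not Yahoo's actual implementation: the tracker URL and the `u`, `q`, and `r` parameter names are invented. Each results link points at the engine's own server, which logs the click and then answers with an HTTP 302 response whose `Location` header sends the browser on to the real site.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def make_tracking_link(tracker_base, target_url, query, rank):
    """Build a results-page link that points at the engine's redirect server."""
    return tracker_base + "?" + urlencode({"u": target_url, "q": query, "r": rank})


def handle_click(request_url, client_ip, click_log):
    """Log the click, then return an HTTP 302 status and headers for the redirect."""
    params = parse_qs(urlparse(request_url).query)
    target = params["u"][0]
    click_log.append({
        "ip": client_ip,
        "query": params["q"][0],
        "rank": int(params["r"][0]),
        "target": target,
    })
    # The browser follows the Location header to the real site almost instantly.
    return 302, {"Location": target}


clicks = []
link = make_tracking_link("https://tracker.example/click",
                          "https://site.example/page", "garden tools", 3)
status, headers = handle_click(link, "198.51.100.4", clicks)
print(status, headers["Location"])            # 302 https://site.example/page
print(clicks[0]["query"], clicks[0]["rank"])  # garden tools 3
```

Because the redirect happens in a fraction of a second, the viewer normally never notices the detour, yet the log now records which query led to which click and at what position in the results.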


Search engines can therefore accumulate very extensive information on the detailed search and browsing habits of individuals without using cookies, which users can disable. This information is critical to controlling click fraud. The click pattern generated by a robot program clicking on targeted ads, or even by a human clicking on ads to generate fraudulent income, is presumably measurably different from that of a legitimate web user, especially if users are tracked very extensively.
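As an illustration of the kind of pattern test involved, the following sketch flags an address whose ad clicks come too fast or too regularly. Both rules and their thresholds (`max_clicks_per_hour`, `min_jitter`) are assumptions chosen for the example; real click-fraud systems are not public and almost certainly combine many more signals.

```python
from statistics import pstdev


def looks_automated(click_times, max_clicks_per_hour=20, min_jitter=1.0):
    """Flag ad-click timestamps (in seconds) from one address as suspicious."""
    times = sorted(click_times)

    # Rule 1: too many clicks inside any one-hour window (two-pointer sweep).
    start = 0
    for end in range(len(times)):
        while times[end] - times[start] >= 3600:
            start += 1
        if end - start + 1 > max_clicks_per_hour:
            return True

    # Rule 2: near-constant spacing between clicks suggests a timer-driven robot.
    if len(times) >= 5:
        gaps = [b - a for a, b in zip(times, times[1:])]
        if pstdev(gaps) < min_jitter:
            return True

    return False


print(looks_automated([i * 30.0 for i in range(10)]))  # True: one click every 30 s
print(looks_automated([0, 130, 900, 4000, 4100]))      # False: irregular, human-like
```

The more history an engine holds for an address, the more reliably tests like these separate a robot from an ordinary viewer.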


Tracking information can also be used to sharpen the targeting of advertising. You may have noticed that some of the search engine text ads you see are targeted to terms you previously searched for or sites you previously viewed, rather than to the subject of the site you are currently viewing. This is another reason that targeted advertising works regardless of the quality or subject of the site displaying it, and another advantage of size.
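One simple way to blend the two signals is to score each ad against both the current page and the viewer's past searches. This is a hypothetical scoring rule (the ad format and `history_weight` are invented for illustration), but it shows how a viewer's history can outweigh the subject of the page.

```python
def choose_ad(ads, page_keywords, past_searches, history_weight=0.5):
    """Pick the ad whose keywords best match the page plus the viewer's history."""
    page = {w.lower() for w in page_keywords}
    history = {w.lower() for query in past_searches for w in query.split()}

    def score(ad):
        kws = {k.lower() for k in ad["keywords"]}
        # Page matches count fully; history matches count at a reduced weight.
        return len(kws & page) + history_weight * len(kws & history)

    return max(ads, key=score)


ads = [
    {"name": "recipes", "keywords": ["cooking", "recipes"]},
    {"name": "mortgages", "keywords": ["mortgage", "loan", "rates"]},
]
print(choose_ad(ads, ["cooking", "pasta"], ["mortgage rates", "home loan"])["name"])
```

Under this rule, a viewer on a cooking page who earlier searched for "mortgage rates" and "home loan" is shown the mortgage ad (score 1.5) rather than the recipe ad (score 1), even though the page itself is about cooking.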


Tracking is also helpful for site “merit” computations. If a viewer clicks through to a site and immediately backs up to the results page, this might indicate a low-quality page. And if a site runs an engine’s targeted ads, the engine gets information on every page viewed on that site.
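A quick-return signal of this kind could be computed as the fraction of click-throughs to each site that come back to the results page within a few seconds. The 10-second threshold and the event format below are assumptions for illustration; no engine's actual merit formula is implied.

```python
def quick_return_rates(events, threshold=10.0):
    """Fraction of click-throughs per site that bounce back within `threshold` seconds.

    Each event is (site, seconds_away), where seconds_away is the time between
    the click-through and the return to the results page, or None if the
    viewer never came back.
    """
    totals, quick = {}, {}
    for site, seconds_away in events:
        totals[site] = totals.get(site, 0) + 1
        if seconds_away is not None and seconds_away < threshold:
            quick[site] = quick.get(site, 0) + 1
    return {site: quick.get(site, 0) / n for site, n in totals.items()}


events = [
    ("a.example", 3.0), ("a.example", 4.0), ("a.example", None), ("a.example", 120.0),
    ("b.example", None), ("b.example", 300.0),
]
print(quick_return_rates(events))  # {'a.example': 0.5, 'b.example': 0.0}
```

A high rate for a site would be one hint, not proof, that viewers did not find what they wanted there.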


Google’s free toolbar program (if set to display PageRank) provides Google with tracking information on every page visited, regardless of whether the site runs Google ads.


Any time a viewer supplies an email address, name, or street address to a web site, that site could associate that information with his Internet address and therefore with all the tracking information previously or subsequently collected by that organization or any cooperating organization.
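Mechanically, this association is just a join on the Internet address. The record formats below are hypothetical, and the sketch assumes the address stayed the same for the whole period, which is most likely for the broadband and workplace connections described earlier.

```python
def link_identity(identity_records, tracking_log):
    """Attach every tracked page view from the same Internet address to a person."""
    # Index the anonymous tracking log by address.
    by_ip = {}
    for event in tracking_log:
        by_ip.setdefault(event["ip"], []).append(event["url"])
    # A single form submission now "names" the whole browsing history.
    return {rec["email"]: by_ip.get(rec["ip"], []) for rec in identity_records}


identity = [{"email": "pat@example.com", "ip": "192.0.2.10"}]
log = [
    {"ip": "192.0.2.10", "url": "/a"},
    {"ip": "192.0.2.99", "url": "/b"},
    {"ip": "192.0.2.10", "url": "/c"},
]
print(link_identity(identity, log))  # {'pat@example.com': ['/a', '/c']}
```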


Tracking could also be used to help a search engine identify and suppress competitors, or for other editorial purposes.


Google AdSense Advertising Network

Internet advertising has historically been rather labor-intensive. Web sites, individual pages, and ad placements must be reviewed and monitored. As a result, most of the more lucrative web advertising has been confined to a relatively small number of larger web sites where economies of scale made sense. Google, partly as a result of its dominant position, has been able to develop an advertising system ("AdSense®") that replaces human management with automation to the point where cost-per-click "targeted" text ads can be displayed successfully and economically on essentially any web site, including millions of smaller sites. No competitor has a similar capability.

In the AdSense network, Google partners with web sites to co-produce the material seen by a viewer. The site organization designs, produces, and maintains the editorial material in the site's pages, while Google designs, produces, and manages the ad content. When a viewer displays a page from the partner site, the site's server sends the editorial content to the viewer's screen and Google's servers simultaneously send the Google ads. The ads are targeted to the content of the page on which they appear by Google systems that read each partner page and perform automated keyword analysis. Advertisers pay Google only when viewers click on an ad and jump to the advertiser's web site, and Google shares this click income with its AdSense partners.
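The details of Google's page analysis are not public. As a rough stand-in, the sketch below uses plain word-frequency counting with a small stopword list to pick a page's top keywords, then selects ads whose keywords overlap them. The stopword list, the ad format, and the function names are all hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use far richer language models.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "is", "are", "with"}


def top_keywords(page_text, n=3):
    """Return the n most frequent non-stopword words on the page."""
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]


def match_ads(page_text, ads, n=3):
    """Select the ads that share at least one keyword with the page."""
    keywords = set(top_keywords(page_text, n))
    return [ad["name"] for ad in ads if keywords & set(ad["keywords"])]


page = "Trout fishing tips: the best trout flies and fishing knots for river trout."
ads = [
    {"name": "fly shop", "keywords": ["trout", "flies"]},
    {"name": "bank", "keywords": ["loans"]},
]
print(top_keywords(page, 2))  # ['trout', 'fishing']
print(match_ads(page, ads))   # ['fly shop']
```

The point of automation is that nothing in this loop needs a human: the same code can run over millions of small partner pages.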

Google's dominant position in both search and advertising allows its servers to obtain tracking information of a magnitude and scope not available to competitors. This tracking data helps Google detect click fraud and otherwise achieve the high level of automated management that the AdSense system requires.

Government Use of Search Engine Data


Title I, Section 126 of the USA PATRIOT Improvement and Reauthorization Act of 2005 (signed in March 2006), which reauthorized the Patriot Act of 2001, assumes that the U.S. Justice Department will be conducting “data mining” activities using private company electronic databases. Data mining is defined there as a query, search, or other analysis of electronic databases conducted to “find a pattern indicating terrorist or other criminal activity.”


The data collected by major search engines is an obvious candidate for data mining. The same sort of pattern analysis of search and tracking data that is used to find spam or click fraud applies directly to the terrorism and “other criminal activity” problem: “Gee, the same guy who did a search for ‘bomb making manual’ visited site ‘X’ and site ‘Y’. Two weeks ago he also visited site ‘W’, posted two messages on ‘Q’, and used Gmail to send six emails.” The government would only have to deal with three companies to obtain 80 percent of worldwide search data, and probably even more extensive tracking data.


In an apparent trial balloon for such programs, the Justice Department in 2005 issued subpoenas directing search engines to provide search data, supposedly to study children's access to pornography, an innocuous if unlikely application. Presumably this task would require tracking intrusive enough to distinguish children from adults. Google is resisting the request in court; the other major search engines have apparently already complied.


If this trial balloon succeeds, the government can quietly issue further subpoenas for much more extensive data, based on much more powerful national security justifications.


Newsweek recently (May 22, 2006 issue) reported comments by an AT&T technician to the effect that the National Security Agency (NSA) has its own "secret room" at an AT&T facility for tapping into Internet data. Raw data collected in this manner could be correlated with the Internet address data in search engine databases in order to intercept emails and web communications from individuals identified through search engine data mining.


Copyright © 2006 Azinet LLC