Introducing web site statistics

Published on 2001-07-13 by John Collins.

What are web site statistics

Web site stats show the performance of your web site in terms of how much traffic (or visitors) it is attracting. They also provide more detailed information about those visitors, which helps you understand where your traffic is coming from. In the early days of the Internet, having a simple 'hit counter' on your site was an achievement, but these days statistics tools go way beyond simply counting the number of page views a site has received.

How the statistics are collected

CGI (Common Gateway Interface) is a standard that lets the web server hosting your site run programs on the server itself. This is unlike HTML, which is interpreted in your visitors' browser windows. CGI programs can be written in a number of programming languages. The most popular is Perl, and most site statistics applications are written in it.

When a CGI program runs, the web server passes it a set of 'environment variables' describing the request. These include the visitor's browser (for example Internet Explorer or Netscape Navigator) and the visitor's IP address or host name, which may be used to estimate the visitor's geographical location. A statistics program reads these variables and writes them to a log file. The log can then be analysed by an interpretation program, or analysed manually by the web designer, to build a picture of what is happening on the site.
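To make this concrete, here is a minimal sketch written in Python rather than Perl. The idea carries over directly. It assumes a CGI-style environment containing the standard variables REMOTE_ADDR, REMOTE_HOST, HTTP_USER_AGENT and HTTP_REFERER. For each visit, it appends one comma-delimited line to a log file. The file name visits.log and the function names are hypothetical.

```python
import csv
import os
from datetime import datetime, timezone

# Standard CGI meta-variables the web server sets for each request.
# REMOTE_HOST is often empty if the server does not resolve host names.
FIELDS = ("REMOTE_ADDR", "REMOTE_HOST", "HTTP_USER_AGENT", "HTTP_REFERER")


def log_record(environ):
    """Build one log row: a UTC timestamp followed by the chosen variables."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    # Missing variables are recorded as empty strings rather than failing.
    return [timestamp] + [environ.get(name, "") for name in FIELDS]


def append_to_log(environ, path="visits.log"):
    """Append one comma-delimited row to the (hypothetical) log file."""
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerow(log_record(environ))


# Only log when actually running under a CGI-capable web server.
if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    append_to_log(os.environ)
    print("Content-Type: text/plain")
    print()
    print("Visit logged.")
```

The IP address column is what a later analysis step would map to an approximate location.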

What can be recorded

The following details about your site visitors may be accessed:

Some of the above require further explanation:-

Sorting your bots from your humans

This is a classic pitfall in compiling site statistics. A 'bot' is an automated program that search engines use to list your site in their index. It comes back to your site repeatedly to check that the site is still there. Each of these visits increments a traditional simple hit counter, which inflates the apparent number of visitors to your site. In some of my past projects, bots have accounted for as much as 40-50% of total hits, so care must be taken to separate the humans from the non-humans!
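A simple way to start separating bots from humans is to look at the browser (user-agent) string each visitor sends. The sketch below uses a hypothetical and deliberately incomplete list of substrings that commonly appear in crawler names. Real bot detection needs a maintained list. Even then, it will miss crawlers that disguise themselves as ordinary browsers.

```python
# Hypothetical, incomplete list of substrings found in crawler user-agent strings.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_bot(user_agent):
    """Heuristic: treat a visit as a bot if its user agent contains a known marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def split_hits(user_agents):
    """Split a list of user-agent strings into (humans, bots)."""
    humans, bots = [], []
    for ua in user_agents:
        (bots if is_bot(ua) else humans).append(ua)
    return humans, bots


def bot_share(user_agents):
    """Fraction of hits attributed to bots (0.0 for an empty list)."""
    if not user_agents:
        return 0.0
    return sum(is_bot(ua) for ua in user_agents) / len(user_agents)
```

Running bot_share over a period's log gives a rough figure to compare against the proportion of bot hits mentioned above.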

Implementing changes to your site based on its statistics

It is important to listen to your audience. Visitors to your site will make up their own minds about which parts of it they prefer, and this should show in your stats. If your site has five sections and one of them hardly gets any hits, there is no point in doing lots of work updating and maintaining that section: take it off! This may not always sit well with you, and it may be possible to turn such a situation around, but ultimately it is a case of accepting and acting on the wishes of your site's audience. As a general rule, I would not advise any rash reactions until your site has been up and running for six months. This ensures that you are not simply seeing a 'blip' in the stats.
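To see which sections are pulling their weight, you can count hits per section using the page paths recorded in your log. This sketch assumes that each section lives in its own top-level directory (for example /news/ or /links/). Pages in the root directory are grouped as '(home)'. The function names and the min_share threshold are hypothetical, and the threshold should be tuned to your site.

```python
from collections import Counter


def section_of(path):
    """Map a requested path like '/news/today.html?x=1' to its section name."""
    clean = path.split("?", 1)[0]
    parts = [p for p in clean.split("/") if p]
    if not parts:
        return "(home)"  # the bare '/' home page
    if len(parts) == 1 and not clean.endswith("/"):
        return "(home)"  # a file in the root directory, e.g. /index.html
    return parts[0]


def hits_per_section(paths):
    """Count hits for each top-level section."""
    return Counter(section_of(path) for path in paths)


def quiet_sections(paths, min_share=0.05):
    """Sections whose share of all hits is below min_share, quietest first."""
    counts = hits_per_section(paths)
    total = sum(counts.values())
    if total == 0:
        return []
    quiet = [(n, name) for name, n in counts.items() if n / total < min_share]
    return [name for n, name in sorted(quiet)]
```

Running this over several months of logs, rather than a few weeks, fits the advice above about not reacting to a short-term blip.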

Site surveys

Another way to respond to your site's audience is to engage them in a survey or poll. Most visitors will be willing to take part, provided the survey is short and quick and does not ask for too many personal details. If you plan to make major changes to your site and you have had a steady flow of traffic for the past year, you do not want to jeopardise this audience by forcing changes on them that they may not like.

The key here again is to listen to your audience, and to balance that with the design and commercial needs of running the site. Site surveys can generally also be programmed in Perl, and tend to be relatively simple to implement.
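To show how little server-side work a simple poll needs, this sketch tallies answers from URL-encoded form submissions, which is the format a browser sends when a survey form is submitted. The question name q1 and the answer values are hypothetical. The code uses Python's standard urllib.parse.parse_qs in place of a Perl form parser.

```python
from collections import Counter
from urllib.parse import parse_qs


def tally(submissions, question):
    """Count the answers given to one question across many form submissions.

    Each submission is a URL-encoded string such as 'q1=yes&q2=no'.
    """
    counts = Counter()
    for body in submissions:
        # parse_qs returns {name: [values]}; unanswered questions are simply absent.
        counts.update(parse_qs(body).get(question, []))
    return counts
```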

Working with your log files

As the web administrator, you will want to decide how to make sense of the large amounts of data in your site's log files. Many site statistics applications have built-in analysers designed to do the work for you. Some even produce reports with charts and stats that can be shown to the site's owner. However, these more advanced packages have their drawbacks: they are generally more expensive to buy, and they can misinterpret the collected data.

My personal preference is to analyse the data manually, by eye, to ensure the correct information is passed on to the reports. This is not always practical when a site may be receiving thousands of hits a week, but there are ways to work around this. You can easily modify your stats tool to e-mail your log file to you in a comma-delimited format, which can be imported into most database programs such as Microsoft Access. The Perl programming involved is beyond the scope of this article. Once it is in place, though, it lets you use your database's powerful data analysis tools to create impressive reports, with bar charts and pie charts that make the data easy to understand.
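The conversion to a comma-delimited format is easy to get slightly wrong, because browser strings and referring URLs can themselves contain commas. This Python sketch is not the Perl the article refers to. It uses the standard csv module, which quotes such fields so that a database import keeps the columns aligned. The column names are hypothetical, and the e-mailing step is left out.

```python
import csv
import io

# Hypothetical column layout for the exported log.
COLUMNS = ["timestamp", "ip", "user_agent", "referrer", "page"]


def to_csv(records):
    """Render log records (dicts) as comma-delimited text with a header row.

    Missing fields become empty cells; unknown keys are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

The header row gives the database import meaningful field names, so the reports can refer to columns such as user_agent or page directly.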

Compiling accurate reports

As stated above, the only way to be 100% sure of the data passed to your reports is to go through it yourself by eye, and even then there is the inevitable human error. I would advise quarterly reports. Depending on whether you are producing them for yourself or for presentation to someone else, you may decide to do more work on the presentation side of things.

For the web site designer and administrator, the report is an essential tool for finding out who you are designing for, e.g. the most common browsers, screen resolutions or favourite sections. For the site owner or marketer, the report is essential for deciding which sections are most cost effective, finding out where the traffic is coming from and attracting potential advertisers onto the site (they will want to see your stats!).
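As an example of the kind of summary such a report might contain, this sketch groups user-agent strings into browser families and lists the most common ones with their percentage share. The matching rules are a rough, hypothetical heuristic aimed at browsers of that era. Opera is checked first because it often identified itself as MSIE. This is not a complete user-agent parser.

```python
from collections import Counter


def browser_family(user_agent):
    """Rough, order-sensitive classification of a user-agent string (heuristic)."""
    if "Opera" in user_agent:
        return "Opera"
    if "MSIE" in user_agent:
        return "Internet Explorer"
    # Netscape sent 'Mozilla/x.y' without the 'compatible' token used by others.
    if user_agent.startswith("Mozilla/") and "compatible" not in user_agent:
        return "Netscape Navigator"
    return "Other"


def browser_report(user_agents, top=5):
    """Return (family, hits, percent) tuples for the most common families."""
    counts = Counter(browser_family(ua) for ua in user_agents)
    total = sum(counts.values())
    return [(name, n, round(100.0 * n / total, 1)) for name, n in counts.most_common(top)]
```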

As you can see, this is a complicated and broad area, and should be given careful consideration for any future Internet development.

Updated 2020: note that the above post is out of date, as it was originally published in 2001, but it is left here for archival purposes.