parselog converts a webserver logfile into Comma Separated Value (CSV) format, which can then be loaded into a number of database and spreadsheet applications for further analysis. The output is best treated as a basis for further research; how useful the numbers are depends on who is looking for what. While processing, parselog also calculates several statistics.

parselog supports the common logfile format (CLF), but it prefers the Extended NCSA/combined format, which adds referrer and user agent fields. If user agent and referral information is missing, parselog treats all traffic as human; the resulting figures are of limited value, because these two fields are central to further analysis.

parselog skips email headers, so it can process many logfiles that were exported from an email reader into a single file, To: and From: lines included.

parselog routes referrals by search engine, webmailer, or URL. If a user arrived from a search engine, the visit is logged to the search engine logfile; if they clicked a link inside their web-based inbox, it is logged to the webmail logfile; and if they followed a link from a static page, it is logged to the referring URL logfile. parselog can also route referrals from selected "local" referring URLs to a separate file.

parselog filters spiders, monitors, harvesters and unrecognised user agents to separate logfiles; all other user agents are treated as human. parselog currently recognises over 200 distinct user agents, search engines and webmailers. The lists are fully user-definable in a text-based INI file, up to a maximum of 200 entries each.

parselog filters HTTP POST operations to a separate logfile. To make reports and statistics based on POST meaningful, it first removes calls to FrontPage Server Extensions (which also use POST) from the logfile.
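A minimal Python sketch of the core conversion, assuming the standard CLF/combined field layout: each line is matched against a regular expression, mail-header lines are skipped, and every remaining hit becomes one CSV row. The names `parse_line` and `logfile_to_csv`, the header list and the column names are illustrative assumptions, not parselog's own.

```python
import csv
import io
import re
from typing import Iterable, Optional, TextIO

# CLF fields, optionally followed by the quoted referrer and user agent
# that the Extended NCSA/combined format appends.
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?\s*$'
)

# Illustrative (hypothetical) set of headers found in exported mailboxes,
# including the mbox "From " separator line.
HEADER_RE = re.compile(
    r"^(?:From |(?:From|To|Cc|Subject|Date|Received|Message-ID):)", re.IGNORECASE
)

FIELDS = ["host", "ident", "user", "time", "request", "status", "size", "referrer", "agent"]

def parse_line(line: str) -> Optional[dict]:
    """Return one hit as a dict, or None for mail headers, blank or unparseable lines."""
    line = line.rstrip("\r\n")
    if not line.strip() or HEADER_RE.match(line):
        return None
    match = LOG_RE.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    # Plain CLF lines have no referrer or user agent; store empty strings.
    hit["referrer"] = hit["referrer"] or ""
    hit["agent"] = hit["agent"] or ""
    return hit

def logfile_to_csv(lines: Iterable[str], out: TextIO) -> int:
    """Write every parseable hit to `out` as CSV and return how many were written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for line in lines:
        hit = parse_line(line)
        if hit is not None:
            writer.writerow(hit)
            written += 1
    return written

if __name__ == "__main__":
    sample = [
        "From: webmaster@example.com",
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    ]
    buf = io.StringIO()
    print(logfile_to_csv(sample, buf), "hit(s)")
    print(buf.getvalue())
```

Non-matching lines are dropped as well, so the header check mainly documents intent; a stricter tool might log them instead.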
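The user agent filtering can be sketched as case-insensitive substring matching against lists read from an INI file. The section names (`[spiders]`, `[monitors]`, `[harvesters]`, `[browsers]`), the sample patterns and the function names are hypothetical; only the 200-entry cap per list comes from the description of parselog.

```python
import configparser

MAX_ENTRIES = 200  # the description caps each user-definable list at 200 entries

# Hypothetical INI layout: one substring pattern per line.
# parselog's real section names are not documented here.
SAMPLE_INI = """
[spiders]
googlebot
slurp
[monitors]
internetseer
[harvesters]
emailsiphon
[browsers]
mozilla
opera
lynx
"""

def load_lists(ini_text: str) -> dict:
    """Read each INI section into a list of lowercase patterns."""
    # allow_no_value lets each line be a bare pattern; '=' is the only
    # delimiter so patterns containing ':' survive intact.
    parser = configparser.ConfigParser(allow_no_value=True, delimiters=("=",))
    parser.read_string(ini_text)
    # ConfigParser lowercases option names, which suits case-insensitive matching.
    return {name: list(parser[name])[:MAX_ENTRIES] for name in parser.sections()}

def classify_agent(agent: str, lists: dict) -> str:
    """Return the logfile category for a user agent string."""
    ua = agent.lower()
    if ua in ("", "-"):
        return "human"  # no user agent supplied: treated as human
    # Robots are checked first because many of them also claim to be "Mozilla".
    for category in ("spiders", "monitors", "harvesters"):
        if any(pattern in ua for pattern in lists.get(category, [])):
            return category
    if any(pattern in ua for pattern in lists.get("browsers", [])):
        return "human"
    return "unrecognised"
```

For example, `classify_agent("Mozilla/5.0 (compatible; Googlebot/2.1)", load_lists(SAMPLE_INI))` returns `"spiders"` because the robot lists are consulted before the browser list.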
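Referral routing can be sketched the same way, keyed on the referrer's host name. Checking "local" hosts first, the sample patterns, the category names and the handling of hits without a referrer are all assumptions for illustration; the description does not state parselog's matching order or rules.

```python
from urllib.parse import urlparse

# Illustrative host patterns; in parselog these lists live in the INI file.
SEARCH_ENGINES = ("google.", "altavista.", "search.yahoo.")
WEBMAILERS = ("mail.yahoo.", "hotmail.")
LOCAL_HOSTS = ("intranet.example.com",)

def classify_referrer(referrer: str,
                      search_engines=SEARCH_ENGINES,
                      webmailers=WEBMAILERS,
                      local_hosts=LOCAL_HOSTS) -> str:
    """Return which referral logfile a hit belongs to (hypothetical category names)."""
    if referrer in ("", "-"):
        return "none"  # no referrer recorded, so not a referral at all
    host = (urlparse(referrer).hostname or "").lower()
    # "Local" referrers are split off first so they never count as external.
    if any(host == h or host.endswith("." + h) for h in local_hosts):
        return "local"
    if any(p in host for p in search_engines):
        return "searchengine"
    if any(p in host for p in webmailers):
        return "webmail"
    return "url"  # any other referring page
```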
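One way to apply the "FrontPage first, then POST" rule, assuming FrontPage Server Extensions traffic can be recognised by its `/_vti_` path prefix (as in `/_vti_bin/`); whether parselog uses this exact test is not stated. `split_posts` is a hypothetical helper that takes hits shaped like the parsed log lines, with a `request` field.

```python
from typing import Iterable, List, Tuple

def split_posts(hits: Iterable[dict]) -> Tuple[List[dict], List[dict], List[dict]]:
    """Split hits into (frontpage, posts, rest), removing FrontPage calls first."""
    frontpage, posts, rest = [], [], []
    for hit in hits:
        parts = hit["request"].split()
        method = parts[0].upper() if parts else ""
        path = parts[1] if len(parts) > 1 else ""
        # Assumption: FrontPage Server Extensions calls live under /_vti_...
        # They are set aside before POST counting so they do not inflate it.
        if path.startswith("/_vti_"):
            frontpage.append(hit)
        elif method == "POST":
            posts.append(hit)
        else:
            rest.append(hit)
    return frontpage, posts, rest
```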
parselog filters 403, 404 and 405 errors to separate logfiles, and all other errors to the errors logfile.

parselog extrapolates bookmarks from a predefined statistic: the market share of Internet Explorer. This is an approximate figure at best, but other browsers certainly account for some of the remaining traffic. parselog therefore applies a configurable extrapolation factor to the measured Internet Explorer 4 and 5 bookmarking activity to estimate total bookmarking activity. The default market share is 70%, so the measured count is increased by the remaining 30%: 100 Internet Explorer favorites suggest that about 130 users bookmarked the site in total.

parselog uses a similar approach to estimate how often the website was reached from an inbox. It counts referrals from web-based inboxes and increases that count by a configurable extrapolation factor to estimate total email referral activity. The default factor is 50%, so 100 webmail referrals suggest that about 150 users in total visited the site by following a link in an email.

parselog calculates several other statistics (see below); these are streamed to the stats logfile every n hits. parselog also generates a text-based report. It runs in "automatic" mode only; runtime options are configured in the INI file.
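The error routing amounts to a small lookup. This sketch assumes an "error" is any status code of 400 or above; `error_logfile` and the logfile names are hypothetical.

```python
from typing import Optional

def error_logfile(status: int) -> Optional[str]:
    """Name of the error logfile for a status code, or None if it is not an error."""
    if status in (403, 404, 405):
        return f"{status}.log"  # hypothetical naming for the dedicated logfiles
    if status >= 400:
        return "errors.log"     # every other 4xx/5xx response
    return None
```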
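The two extrapolations can be written out as arithmetic. The worked figures in the description (100 becoming 130, and 100 becoming 150) correspond to increasing the measured count by a percentage, which is what this sketch reproduces; the function names and the exact formula are assumptions inferred from those examples. Dividing by the market share instead (100 / 0.70) would give about 143, so the two readings give noticeably different estimates.

```python
def extrapolate_bookmarks(ie_bookmarks: int, ie_share_pct: float = 70) -> float:
    """Estimate total bookmarks from Internet Explorer 4/5 bookmarks.

    Adds the non-IE share (100 - ie_share_pct) on top of the measured count,
    which reproduces the documented example: 100 at 70% -> 130.
    """
    return ie_bookmarks * (100 + (100 - ie_share_pct)) / 100

def extrapolate_webmail(webmail_referrals: int, factor_pct: float = 50) -> float:
    """Estimate total email-driven visits from web-based inbox referrals.

    Adds factor_pct percent on top of the measured count: 100 at 50% -> 150.
    """
    return webmail_referrals * (100 + factor_pct) / 100

if __name__ == "__main__":
    print(extrapolate_bookmarks(100))  # 130.0
    print(extrapolate_webmail(100))    # 150.0
```

Integer percentages are used so the documented defaults produce exact results rather than floating-point approximations.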
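Streaming statistics every n hits can be sketched as running counters written out as a CSV row whenever the hit count reaches a multiple of n. The column set and `stream_stats` are hypothetical; the description does not list which statistics parselog writes to the stats logfile.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO

COLUMNS = ["human", "spiders", "errors"]  # hypothetical statistic columns

def stream_stats(categories: Iterable[str], every_n: int, out: TextIO) -> None:
    """Write cumulative counts to `out` as a CSV row after every `every_n` hits."""
    counts = Counter()
    writer = csv.writer(out)
    writer.writerow(["hits"] + COLUMNS)
    for hits, category in enumerate(categories, start=1):
        counts[category] += 1
        if hits % every_n == 0:
            writer.writerow([hits] + [counts[c] for c in COLUMNS])

if __name__ == "__main__":
    buf = io.StringIO()
    stream_stats(["human", "spiders", "human", "errors", "human"], every_n=2, out=buf)
    print(buf.getvalue())
```

Counts are cumulative, so each row is a snapshot of the totals so far; hits after the last multiple of n are not written in this sketch.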