While reading Jeremy Keith’s blog entry “Safari Askew” I remembered that I had looked at this just before Christmas, and found an answer.
The problem is to do with Safari 4’s “Top Sites” feature, which shows you a pretty grid of thumbnails for the sites you visit most regularly (or that you have pinned in place). The interesting thing is that these thumbnails are live (ish) previews of what those pages currently look like. If you don’t happen to have a Top Sites page open in a tab, and if Safari considers that the current thumbnail is sufficiently out of date, it will automatically go and retrieve the latest version.

This can cause headaches for site owners, because in order to show the actual state of the page, Safari relies on full page requests: it downloads all the HTML, CSS, images, and JavaScript for the page, and then displays everything exactly as if the user were viewing the page in a standard tab. Adverts are rendered, page tracking scripts are executed, and to the server it looks just like a regular page hit. This can lead to the site recording unnecessary actions, and your site analytics being all messed up.
At Skyscanner, for example, we noticed this because Google Analytics was showing an unusually high number of Safari users (8.5%) with an abnormally high bounce rate (the proportion of sessions where users view a single page, then walk away with no further interaction): Safari 4 users were twice as likely to bounce as users of other browsers. Useless sessions generated by Top Sites were the problem.
As Jeremy noted, the user agent that Safari 4 reports for a Top Sites request is exactly the same as for a normal page request. Fortunately, there is a way to distinguish the two types of request: in the current version of Safari 4 (4.0.4) the Top Sites request for the base page (but not its JS/CSS/image resources) carries an additional HTTP header, namely “X-Purpose: preview”.
An easy way to verify this is to use an HTTP debugging proxy like Fiddler or Charles to watch what happens when Top Sites makes a request — see the screen grabs below:

If your pages are dynamically generated, you can adjust your server-side code to examine the HTTP headers of the incoming request, and take appropriate action if this is a “preview” request. Here’s some sample PHP code:
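<?php
// Safari 4's Top Sites sends "X-Purpose: preview" with the base page request.
// isset() avoids an "undefined index" notice when the header is absent.
if (isset($_SERVER["HTTP_X_PURPOSE"]) && $_SERVER["HTTP_X_PURPOSE"] == "preview") {
    echo "preview";
} else {
    echo "normal";
}
?>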
(“X-Purpose” is not a standard HTTP header, and you won’t find “HTTP_X_PURPOSE” in the PHP documentation. It’s the CGI specification that defines how HTTP headers should be handled: each one should be made into an environment variable with an “HTTP_” prefix followed by the header name in upper case, with dashes replaced by underscores. Hence, the value of the “X-Purpose” header is placed in the “HTTP_X_PURPOSE” environment variable, and retrieved as $_SERVER["HTTP_X_PURPOSE"].)
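To make that naming rule concrete, here is a minimal sketch of the mapping. The function names header_to_server_key and request_header are made up for illustration and are not part of PHP. Note that a couple of headers, such as Content-Type and Content-Length, are exposed by CGI without the “HTTP_” prefix, so this only applies to ordinary request headers like the one discussed here.

```php
<?php
// Hypothetical helper: map an HTTP header name to the CGI-style key
// under which PHP exposes it in $_SERVER.
function header_to_server_key(string $headerName): string
{
    // Swap dashes for underscores, upper-case the name, add the HTTP_ prefix.
    return 'HTTP_' . strtoupper(str_replace('-', '_', $headerName));
}

// Hypothetical helper: read a header from a $_SERVER-like array,
// returning null when the header was not sent.
function request_header(array $server, string $headerName): ?string
{
    $key = header_to_server_key($headerName);
    return isset($server[$key]) ? (string) $server[$key] : null;
}

echo header_to_server_key('X-Purpose'), "\n"; // prints HTTP_X_PURPOSE
```

Passing the array in as a parameter, rather than reading $_SERVER directly, is just a convenience that makes the helper easy to try out from the command line.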
If all you’re looking to do is fix your site stats in Google Analytics, then you should just make sure that you don’t write out the GA tracking code for preview requests. If you are concerned about excessive load on your servers, unwanted user actions, or spurious advert impressions, you can take more aggressive action, perhaps by rendering a lightweight version of the page. An extreme possibility I considered was generating a completely different version of the page, specifically designed to look good in the thumbnail format of the Top Sites preview page:

However, doing this runs counter to the notion that these thumbnails represent previews, and I don’t know how your users would react. More importantly, Google might consider this cloaking, and come round your house in the middle of the night with a baseball bat. Just because it’s possible, doesn’t mean it’s a good idea…
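For the much milder fix of leaving the tracking code out of preview requests, something along these lines might do. This is only a sketch: is_preview_request and tracking_snippet are hypothetical names, the script tag is a placeholder rather than real Google Analytics markup, and trimming and lower-casing the header value is a defensive assumption on my part (Safari 4.0.4 was only observed sending the exact value “preview”).

```php
<?php
// Hypothetical helper: true when the request carries "X-Purpose: preview".
function is_preview_request(array $server): bool
{
    // Assumption: tolerate stray whitespace and different letter case.
    return isset($server['HTTP_X_PURPOSE'])
        && strtolower(trim($server['HTTP_X_PURPOSE'])) === 'preview';
}

// Hypothetical helper: return the analytics markup for normal requests,
// or an empty string for Top Sites preview requests.
function tracking_snippet(array $server, string $snippet): string
{
    return is_preview_request($server) ? '' : $snippet;
}

// Placeholder markup; substitute your site's real tracking code.
$snippet = '<script src="/js/analytics.js"></script>';
echo tracking_snippet($_SERVER, $snippet);
```

The rest of the page is rendered as usual, so the thumbnail still shows what the user would actually see; only the tracking call is skipped.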






