How to detect a page request from Safari 4’s Top Sites feature

While reading Jeremy Keith’s blog entry “Safari Askew” I remembered that I had looked at this just before Christmas, and found an answer.

The problem is to do with Safari 4’s “Top Sites” feature, which shows you a pretty grid of thumbnails for the sites you visit most regularly (or that you have pinned in place). The interesting thing is that these thumbnails are live (ish) previews of what those pages currently look like. If you don’t happen to have a Top Sites page open in a tab, and if Safari considers that the current thumbnail is sufficiently out of date, it will automatically go and retrieve the latest version.

Safari 4's Top Sites feature

This can cause headaches for site owners, because in order to show the actual state of the page, Safari relies on full page requests: it downloads all the HTML, CSS, images, and JavaScript for the page, and then displays everything exactly as if the user were viewing the page in a standard tab. Adverts are rendered, page tracking scripts are executed, and to the server it looks just like a regular page hit. This can lead to the site recording unnecessary actions, and your site analytics being all messed up.

At Skyscanner, for example, we noticed this because Google Analytics was showing an unusually high number of Safari users (8.5%) with an abnormally high bounce rate (the proportion of sessions where users view a single page, then walk away with no further interaction): Safari 4 users were twice as likely to bounce as other browsers. Useless sessions generated by Top Sites were the problem.

As Jeremy noted, the user agent that Safari 4 reports for a Top Sites request is exactly the same as for a normal page request. Fortunately, there is a way to distinguish the two types of request: in the current version of Safari 4 (4.0.4) the Top Sites request for the base page (but not its JS/CSS/image resources) carries an additional HTTP header, namely “X-Purpose: preview”.

An easy way to verify this is to use an HTTP debugging proxy like Fiddler or Charles to watch what happens when Top Sites makes a request — see the screen grabs below:

Normal and Top Sites HTTP requests from Safari 4

If your pages are dynamically generated, you can adjust your server-side code to examine the HTTP headers of the incoming request, and take appropriate action if this is a “preview” request. Here’s some sample PHP code:

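<?php
// Top Sites requests carry an extra "X-Purpose: preview" header.
// isset() avoids an "undefined index" notice on normal requests, which lack it.
if (isset($_SERVER["HTTP_X_PURPOSE"]) && $_SERVER["HTTP_X_PURPOSE"] == "preview") {
	echo "preview";
} else {
	echo "normal";
}
?>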

(“X-Purpose” is not a standard HTTP header, and you won’t find “HTTP_X_PURPOSE” in the PHP documentation. The CGI specification defines how HTTP headers are passed to a script: each header becomes an environment variable whose name is “HTTP_” followed by the header name in upper case, with dashes replaced by underscores. Hence, the value of the “X-Purpose” header is placed in the “HTTP_X_PURPOSE” environment variable, and retrieved as $_SERVER["HTTP_X_PURPOSE"].)
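The header-to-variable mapping described above can be sketched as a pair of small helpers. This is only an illustration: the function names headerToServerKey and isTopSitesPreview are hypothetical, and the check assumes the header value is exactly “preview”, as observed in Safari 4.0.4.

```php
<?php
// Hypothetical helper: converts an HTTP header name into the key that
// CGI-style environments expose in $_SERVER,
// e.g. "X-Purpose" becomes "HTTP_X_PURPOSE".
function headerToServerKey($headerName) {
    return 'HTTP_' . strtoupper(str_replace('-', '_', $headerName));
}

// Hypothetical helper: returns true when the request carries the
// "X-Purpose: preview" header. The array is passed in (normally $_SERVER)
// so the function can be exercised without a real request.
function isTopSitesPreview(array $server) {
    $key = headerToServerKey('X-Purpose');
    return isset($server[$key]) && $server[$key] === 'preview';
}
```

In a page you would call isTopSitesPreview($_SERVER) and branch on the result.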

If all you’re looking to do is fix your site stats in Google Analytics, then you should just make sure that you don’t write out the GA tracking code for preview requests. If you are concerned about excessive load on your servers, unwanted user actions, or spurious advert impressions, you can take more aggressive action, perhaps by rendering a lightweight version of the page. An extreme possibility I considered was generating a completely different version of the page, specifically designed to look good in the thumbnail format of the Top Sites preview page:

Safari 4 Top Sites with custom preview thumbnail: PROBABLY A BAD IDEA

However, doing this runs counter to the notion that these thumbnails represent previews, and I don’t know how your users would react. More importantly, Google might consider this cloaking, and come round your house in the middle of the night with a baseball bat. Just because it’s possible, doesn’t mean it’s a good idea…
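For the much simpler fix of keeping Top Sites previews out of Google Analytics, a minimal sketch might look like the following. The function name renderTrackingCode and the $snippet parameter are hypothetical; $snippet stands in for whatever tracking markup your pages normally write out.

```php
<?php
// Hypothetical helper: returns the tracking markup for normal requests and
// an empty string for Top Sites preview requests.
// $server is normally $_SERVER; $snippet is your usual tracking code.
function renderTrackingCode(array $server, $snippet) {
    $isPreview = isset($server['HTTP_X_PURPOSE'])
        && $server['HTTP_X_PURPOSE'] === 'preview';
    return $isPreview ? '' : $snippet;
}
```

In a page template you would then echo renderTrackingCode($_SERVER, $yourTrackingSnippet) at the point where the tracking code normally goes. The page itself is still served in full, so this only addresses the analytics problem, not server load or unwanted user actions.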

10 Replies to “How to detect a page request from Safari 4’s Top Sites feature”

  1. I’d think this should be fixed on the GA side for you. File a bug.

    And as far as substituting the page for preview requests – you have to remember that Safari 4 will use your latest actual visit to the page as preview when you do visit it. So you’ll get intermingled inconsistent previews for your site, the more it is popular with a given user – the less he’ll see of your customized preview rendering.

    Personally I’d hate that, too. I kind of glance over this whole thing to find a layout I want to get back to, exactly the way I browse the History cover-flow of previews. It might be intended to be used that way, I don’t know. That’s how I use it.

  2. Just curious, but I was wondering if this is the same as the page request for the Google Chrome Top-sites preview as well, or if there’s a different way to detect it?

  3. @godDLL: This can’t be fixed by Google Analytics right now, because the X-Purpose: preview header is only sent with the request for the base page. The header is not sent as part of requests for images, CSS, or JavaScript that have to be downloaded and executed as part of the Top Sites preview. GA relies on you downloading a tracking script, and then requesting a tracking pixel image. These requests don’t have the header, and so can’t be detected.

    If there’s a bug report to be filed, it would be with Apple, to get them to attach the X-Purpose: preview header to all HTTP requests caused by Top Sites. Only then would GA — and other analytics providers — be able to take action.

    However, just solving the analytics matter doesn’t make the problem of unwanted user actions go away. To fix that, a site owner must adjust their own code to take account of Top Sites requests.

    @E Scott: Chrome does it differently — this is a Safari thing rather than a Webkit thing. I haven’t investigated it thoroughly, but it looks like Chrome shows a thumbnail image of the last time you used the favourite page. If you don’t happen to have the page open somewhere, and the thumbnail image is out of date, Chrome does not make any additional requests to show you a live preview.

    Personally, I think that’s better behaviour. You don’t get the “live preview” effect as in Safari, but personally I don’t find that necessary. And as a site operator, I much prefer not to have to deal with filtering out these spurious sessions.

  4. Personally, the irritating bit of top sites is that it gobbles disk space for the cache (even if top sites is off). There isn’t a thing you can do about it other than lock down the cache directory so it can’t write there or similar nasty hacks.

    My hackintosh SSD isn’t that big and the macbook pro disk is getting tight-ish too. Sometimes the cache runs away and fills a gig…

    And besides, I don’t want it taking up costly bandwidth on the 3G dongle and just ought to have control generally.

  5. Thanks for this article, I noticed in the last couple of weeks a rather high amount of bounces and was wondering what it was. Going to incorporate your php into my Google Analytics code immediately.

  6. Surely it’s a feature, not a bug, to have this header sent. If I were a web master and my page was in people’s Top Sites in Safari, I’d consider that a good problem to have, not a headache.

  7. Thanks for the helpful writeup. I am surprised by the lack of attention this is getting. While it is nice to be in someone’s “top sites”, it is frustrating to have it inflate your visits and bounce rate in GA. I have been hoping that Safari would take notice of the issue and start using a distinct user agent for the preview request. Still hoping for that. In the meantime you’ve proposed a good workaround.

  8. Ah yes, I just noted X-PURPOSE: preview
    My first thought was, “Oh no, not again”.
    This is an interesting feature from Safari 4, but why can’t browsers all just get along.
    Mozilla has a similar feature with its X-MOZ: prefetch.
    And not to be out-done, I note HTML5 has a documented rel prefetch link attribute.
    This can also be used in HTML 4.01 as it doesn’t break any standards.

  9. There is something even more interesting here:
    Chromium sends the “preview” header for each request using the Instant feature. But! when I finish typing and press Enter, I’m left with a page served with a previewer in ‘mind’ (does a server have one? :P)
