On URLs, keywords, and memorability

As I explained in my last article, URL Shortening is Bonkers, a “resource” on the internet is a thing: a photo, a news article, a video, a user profile. We reach these resources by means of URLs (links), but bear in mind that the URL is not the thing itself; it is just a signpost to it.

These ubiquitous signposts of the modern internet are based on a system computer scientists developed in the 1970s to work with files on magnetic disks. This is why reading them out loud (“aitch-tee-tee-pee, colon slash slash example dot com slash monkey slash blah dot see-gee-eye”) makes you sound like a massive geek: to be using these things in the 1970s you probably were a massive geek.

It may seem a bit daft that we’re still doing it that way, but that’s just how evolving systems work. It’s why 60% of the world’s railway tracks are approximately two horses wide. Once the ruts have formed, it’s hard to break out of them.

The great benefit of URLs is that they are, as it says in the name, uniform. At web scale, you need a common standard for everyone to unambiguously signpost their own resources, without stepping on anyone else’s toes. URLs do that. But they are not intrinsically optimized for humans to understand, remember, or use directly.

A well-constructed, descriptive permalink, with a date and title in the URL, is a delight: it tells you when the article was written, and gives you a hint of its content (information scent). But remembering that URL so you can go to that article at some point in the future (two minutes or two weeks from now) takes effort. Did it contain the full title of the article, or was it truncated? Did it use dashes for spaces, or underscores? Did it have “.html” at the end? This is why bookmarks were invented.

You might think that short URLs, just by virtue of being shorter than a typical URL, would be easier to remember than full links, but that’s not necessarily the case. The payload of most short URLs is an apparently random sequence of upper- and lower-case letters and numbers, and our brains find it easier to cope with structure than with randomness.

It doesn’t have to be that bad, though. Bit.ly allows you to customize your short links with meaningful words and phrases instead of randomness. The short URL http://bit.ly/monkeys is nice and easy to remember. But customized URLs are handed out on a first-come, first-served basis, so just as with internet domains and user names on popular services, as more and more short words are taken, customized links will become longer (http://j.mp/weirdalbinopygmymonkeys) or more random (http://j.mp/monkeys71).

The key idea here is actually no different from what AOL was doing with keywords a decade ago (remember keywords?), or what Google does with search. Instead of going directly to a resource with a single, long, unmemorable identifier, you use two shorter, much more memorable parts in a hub-and-spoke fashion:

1. Go to the well-known hub (Bit.ly, Google, AOL)
2. Use the hub’s human-optimized naming scheme to access your target resource

Different hubs take different approaches to their “human-optimized naming scheme”. Bit.ly hands out names for free, but on a first-come, first-served basis. AOL used to monetize their names by making companies pay to claim their keywords. And Google, of course, doesn’t really use keywords: they use search phrases. Search phrases are completely freeform and flexible, but are less precise in leading you to exactly where you want to go. A company might want their flagship product to be top of the search results when you type its name in Google, but that doesn’t actually mean it will happen. (Like, any search for consumer electronics, ever.)
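To make the first-come, first-served scheme concrete, here is a minimal sketch of a keyword hub in Python. Everything in it is an assumption made up for illustration: the KeywordHub class, the hub.example domain, the target URLs, and the choice to treat keywords as case-insensitive. It is not how Bit.ly or any other real service is implemented.

```python
class KeywordHub:
    """A toy hub that maps short, memorable keywords to full URLs."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._targets = {}  # keyword -> long URL

    def claim(self, keyword, target_url):
        """Register a keyword. The first claimant wins; later ones get None."""
        key = keyword.lower()  # assumption: keywords are case-insensitive
        if key in self._targets:
            return None
        self._targets[key] = target_url
        return f"{self.base_url}/{key}"

    def resolve(self, keyword):
        """Return the long URL for a keyword, or None if it is unclaimed."""
        return self._targets.get(keyword.lower())


# Hypothetical hub and targets, purely for illustration.
hub = KeywordHub("http://hub.example")
first = hub.claim("monkeys", "http://example.com/photos/2009/monkeys.html")
second = hub.claim("monkeys", "http://example.org/another-monkey-page")
fallback = hub.claim("monkeys71", "http://example.org/another-monkey-page")
```

The second claim on “monkeys” is refused, so the latecomer has to settle for something like “monkeys71”, which is exactly how the pleasant short names run out.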

As a method for reaching a resource that you know is there, hub-and-spoke works pretty well, especially when the hub is so deeply embedded in many people’s everyday lives (e.g. Twitter, Facebook, Google) that they have the hub hard-wired in their memory already. What it lacks in directness, it makes up for with memorability and low mental overhead. If you see the link “facebook.com/vauxhallcorsa” on a billboard while you’re driving home from work, you don’t have to try to remember whether the advert said “vauxhall.com” or “vauxhall.co.uk”, and then figure out the company’s site navigation to get to the information page for the car.

Companies sometimes make use of this on their own sites, for example with promotional codes: “Enter the code XEPO29 to see exclusive deals!” More commonly, though, the typical corporate home page offers three ways of reaching content: headline links, general site navigation, and search. Say you want to “send someone a link” to the product page of a really neat waistcoat you have found. You tell them to go to the example.com home page, and:

…click on the “steampunk satin waistcoat” link. Only works if there is a direct link to that specific product on the home page. Doesn’t work for large sites.
…click on the link for “clothing”. On the Clothing page, click on the link for “jackets, waistcoats”. On the Jackets and Waistcoats page, click on… (etc.)
…type “steampunk satin waistcoat” in the search box. The product will probably be top of the search results. (Unless it’s part of a whole range of similar products.)

To supplement these three traditional paths to a resource, sites could use internal keywords. If the site recognizes that you have entered a keyword in the search box, it could take you directly to the exact resource instead of to a page of search results. (The “I’m feeling lucky” path.) Or it could show a shortcut to the exact resource prominently above the general search results.
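As a rough sketch of how a search box might treat internal keywords, the Python below checks the query against a keyword table before falling back to ordinary search. The page paths, the “brassbuttons” keyword, and the function names are all hypothetical, and the search itself is deliberately naive.

```python
# Hypothetical site data: page paths mapped to their titles.
PAGES = {
    "/clothing/waistcoats/steampunk-satin": "Steampunk satin waistcoat",
    "/clothing/waistcoats/steampunk-tweed": "Steampunk tweed waistcoat",
    "/clothing/jackets/satin-bomber": "Satin bomber jacket",
}

# Hypothetical internal keywords, each pointing at exactly one page.
KEYWORDS = {"brassbuttons": "/clothing/waistcoats/steampunk-satin"}


def search_pages(query):
    """Naive search: pages whose title contains every word of the query."""
    words = query.lower().split()
    return [path for path, title in PAGES.items()
            if all(word in title.lower() for word in words)]


def handle_search(query, jump=True):
    """Return ("redirect", path) for a keyword hit, else ("results", paths)."""
    target = KEYWORDS.get(query.strip().lower())
    if target and jump:
        # The "I'm feeling lucky" path: go straight to the resource.
        return ("redirect", target)
    results = search_pages(query)
    if target:
        # Show the keyword's page as a shortcut above the ordinary results.
        results = [target] + [path for path in results if path != target]
    return ("results", results)
```

With jump left at its default, handle_search("BrassButtons") sends the visitor straight to the satin waistcoat; with jump=False the same page is listed first, above whatever the ordinary search turns up.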

Unfortunately, for large sites this still suffers from problems of scale: if you have thousands of resources (many of them similar), the list of pleasingly short, meaningful unique keywords will be quickly exhausted, and you will end up with keywords like “VampireRedSteampunkSatinWaistcoat” or “waistcoat227”.

However, if you relax the restriction that keywords have to be meaningful, and require only that they be memorable, there are all kinds of interesting things you can do. I’ll show a fun little example in my next article: Donkey Bridges.

One Reply to “On URLs, keywords, and memorability”

One problem with short URLs is that the keywords may well be chosen to be very memorable to the person creating them, but not to anyone else. If I take your “monkeys71” example, that requires me to remember a number that has absolutely no meaning to me.

Well-crafted long URLs are formed along a hub-and-spoke-and-hub-and-spoke-ad-infinitum pattern, which can offer you a greater chance of remembering them by trial and error and/or convention.

Remember the “www”-prefixed domain names? Nowadays, they’re a little less commonly used, though still all around the ‘web. Back in the day when the internet and web weren’t exactly synonymous, it was entirely common to find an “ftp” host name for a website, too. Today, a “blog” hostname is fairly common.

If you remember, for example, that you read a blog post on “sunpig.com” and can’t remember “www.sunpig.com/martin/archives” as the prefix for blog posts, “blog.sunpig.com” may be your best bet.

Extending that concept, “blog.sunpig.com/archive” is a fairly intuitive place to look for older blog posts.

I’m a great fan of hub-and-spoke-type navigation. What I don’t quite see is how even human-readable, memorable short links are going to solve problems, though. Their main benefit seems to be to start from a blank slate with regard to what spokes exist, and that benefit is going to dwindle by design the more popular the hub becomes.

What I really like, though, are servers that handle mismatching URLs by making suggestions. There’s an Apache module that works for static files at least, and I think there was a WordPress plugin for doing the same with blog posts.

Would be nice if that tech was commonplace!
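For what it’s worth, the suggestion idea can be sketched in a few lines of Python using the standard library’s difflib.get_close_matches. This is only an illustration of the general technique, not how the Apache module or any WordPress plugin actually works; the paths and the suggest function are made up.

```python
import difflib

# Hypothetical list of paths the server knows about.
KNOWN_PATHS = [
    "/archives/donkey-bridges.html",
    "/archives/url-shortening-is-bonkers.html",
    "/about.html",
]


def suggest(path, known_paths=KNOWN_PATHS, limit=3, cutoff=0.6):
    """Return up to `limit` known paths that look similar to a missing one."""
    return difflib.get_close_matches(path, known_paths, n=limit, cutoff=cutoff)


# A visitor guesses underscores and ".htm" instead of dashes and ".html".
guesses = suggest("/archives/donkey_bridges.htm")
```

A 404 page could then list whatever comes back in guesses as “did you mean…?” links, and show nothing extra when the list is empty.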
