{"id":938,"date":"2004-02-04T14:28:32","date_gmt":"2004-02-04T14:28:32","guid":{"rendered":"http:\/\/sunpig.com\/mt-entry-938.html"},"modified":"2006-09-23T19:30:11","modified_gmt":"2006-09-23T19:30:11","slug":"personal-search","status":"publish","type":"post","link":"https:\/\/sunpig.com\/martin\/2004\/02\/04\/personal-search\/","title":{"rendered":"Personal Search"},"content":{"rendered":"<p>I regularly find myself thinking, &#8220;I know I read a web page about [XYZ] last month, but <em>where the hell was it?<\/em>&#8221;  I may be able to remember certain key phrases, and these sometimes help me find it again by using <a href=\"http:\/\/www.google.com\/\">Google<\/a> or some other search engine.  Sometimes I can also find the page by doing a full-text search on my <a href=\"http:\/\/www.mozilla.org\/products\/firebird\/\">browser<\/a> cache.  (I use the &#8220;Find in Files&#8221; functionality of <a href=\"http:\/\/www.textpad.com\/\">TextPad<\/a>, because Windows&#8217; own search is too slow.)  But that doesn&#8217;t help if I was looking at the page more than a week or so ago, because it will have dropped out of the cache.  (I have my cache set to 1GB.)<\/p>\n<p>What I would <em>really<\/em> like is &#8220;Personal Search.&#8221;  This would take the form of an extra option on a search engine that would allow me to restrict my searches to <em>only<\/em> the pages I have visited.<\/p>\n<p>I don&#8217;t think it would be too difficult, technically.  First of all, you would need some mechanism for reporting to the Search Engine Company (SEC) whenever you visit a page on the web.  I think the <a href=\"http:\/\/toolbar.google.com\/\">Google Toolbar<\/a> might already do this.  Likewise, it shouldn&#8217;t be too hard to build something for Mozilla that would perform this task.<\/p>\n<p>The Search Engine Company would then have to record this page view in a database and associate it with your personal browsing history.  
It wouldn&#8217;t have to store the whole page itself, because chances are good that the page has already been spidered and is present in its main index.  If the page <em>is<\/em> new to the index, it will have to be added.  (No big deal, and this even adds value to the main index as a whole.)  Because the SEC only needs to store a list of URLs (and probably timestamps, too) against a user ID, this wouldn&#8217;t take up an impossible amount of disk space.<\/p>\n<p>Next, the SEC has to implement the search filter:  whenever I do a search with the &#8220;only show results for pages I&#8217;ve visited&#8221; checkbox ticked, the results should be limited to pages that appear in my browsing history.  And voil&agrave;!  My own Personal Search results.<\/p>\n<p>There are a couple of downsides to this idea, though.  For one, it requires the SEC to keep a complete record of my browsing activity.  Depending on the legal jurisdiction, this history could be used in ways I&#8217;m not entirely happy with.  The scheme would need some way of turning off indexing, either completely or just for the duration of a browser session.<\/p>\n<p>Secondly, not all web pages <em>can<\/em> be indexed by the SEC, and not all pages <em>should<\/em> be indexed by them, either.  (For example, newspaper or magazine archives that require subscriptions.)  It isn&#8217;t just the preference of the end user (me) that has to be taken into account, but also that of the web site owner.  As a result, I may find that there are still gaps in my Personal Search.  However, I think these gaps would still be less annoying than not being able to get back to web page XYZ that I remember from last month.<\/p>\n<p>Finally, there&#8217;s the question of cost.  To a certain extent, search engines perform a public service for the population of the Internet.  &#8220;Personal Search&#8221; would be a service that I imagine people might be willing to pay for.  
After all, it means you don&#8217;t have to manage an enormous search index on your own computer.  I <em>could<\/em> keep all the pages I&#8217;ve ever visited in a cache somewhere, but I really don&#8217;t want to spend a couple of hundred pounds on disk space every year.<\/p>\n<p>It all sounds too easy.  Can someone tell me why this wouldn&#8217;t work?  Alternatively, can you tell me if there are any search engines out there that do this already?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I regularly find myself thinking, &#8220;I know I read a web page about [XYZ] last month, but <em>where the hell was it?<\/em>&#8221;  I may be able to remember certain key phrases, and these sometimes help me find it again by using <a href=\"http:\/\/www.google.com\/\">Google<\/a> or some other search engine.  Sometimes I can also find the page by doing a full-text search on my <a href=\"http:\/\/www.mozilla.org\/products\/firebird\/\">browser<\/a> cache.  (I use the &#8220;Find in Files&#8221; functionality of <a href=\"http:\/\/www.textpad.com\/\">TextPad<\/a>, because Windows&#8217; own search is too slow.)  But that doesn&#8217;t help if I was looking at the page more than a week or so ago, because it will have dropped out of the cache.  
(I have my cache set to 1GB.)<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"class_list":["post-938","post","type-post","status-publish","format-standard","hentry","category-techie"],"_links":{"self":[{"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/posts\/938","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/comments?post=938"}],"version-history":[{"count":0,"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/posts\/938\/revisions"}],"wp:attachment":[{"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/media?parent=938"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/categories?post=938"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/tags?post=938"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}