
# Human Augmentation in Web Search

## 1.1 Please

- Take a few moments to read the opening README.md file, which provides a brief overview of this project.
- Interact with this and other articles I'll post here. I'm looking forward to hearing your thoughts, questions, and concerns.

Discussion Link: https://hachyderm.io/@davidshq/109814269551861134

## 1.2 What is Human Augmentation?

In the early days of the web (and even today) there were directories of websites, organized and curated by humans. Some of the most popular included Tim Berners-Lee's World Wide Web Virtual Library, Yahoo!, the Open Directory Project (ODP, aka DMOZ), and the Global Network Navigator (GNN, by O'Reilly).

As the web grew it became impossible for even hundreds of humans to manually curate the entire web, and so there was a shift toward search engines, which used automated means of finding and ranking web pages. Today Google dominates web search, Bing holds a distant second, and various innovative upstarts (such as DuckDuckGo) are trying to carve out a niche for themselves.

Now it is time for another fundamental shift in the way we find information on the web. It is not a return to the completely human-curated directory, nor further innovation on the automated search engine - but the best of both.

Human augmentation involves an underlying search index built by automated means that is augmented with active human curation and collaboration.
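
To make that architecture concrete, here is a minimal sketch in Python of how an automated index entry might carry human curation alongside its machine-generated score. All names here (`IndexEntry`, `CuratorNote`) and the 50/50 blending weight are illustrative assumptions, not a description of any existing system.

```python
from dataclasses import dataclass, field

@dataclass
class CuratorNote:
    """An annotation a human curator attaches to a crawled page."""
    curator: str
    summary: str   # human-written description of the page
    quality: int   # curator's rating, e.g. 1 (poor) to 5 (excellent)

@dataclass
class IndexEntry:
    """A page found by an automated crawler, open to human augmentation."""
    url: str
    machine_score: float   # automated ranker's score, assumed 0.0-1.0
    notes: list[CuratorNote] = field(default_factory=list)

    def curated_score(self) -> float:
        """Blend the automated score with the average curator rating."""
        if not self.notes:
            return self.machine_score
        human = sum(n.quality for n in self.notes) / (5 * len(self.notes))
        return 0.5 * self.machine_score + 0.5 * human  # assumed 50/50 blend

# Example: a single curator note lifts a page's effective ranking score.
entry = IndexEntry(url="https://example.com/guide", machine_score=0.7)
entry.notes.append(CuratorNote("alice", "Clear, current, well-sourced", 5))
print(entry.curated_score())  # 0.85
```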

## 1.3 Why Human Augmentation?

Search engines using automated methods are powerful and useful - but they are also limited. They provide us with access to a greater variety of material but often fail to lead us to the best material.

1. Results are often dominated by those with the most financial resources. Ranking highly often depends on Search Engine Optimization (SEO), and the ability to produce SEO-optimized content is closely tied to one's financial resources.

2. Content is often written to rank highly with machines rather than to deliver maximal readability and usefulness to humans.

3. Highly ranked content is often stale, as various "signals" used in automatically ranking results grant a higher ranking to older content.

4. Paradoxically, newer content (or content published frequently) is often ranked higher than more accurate, understandable, and useful content published years ago.

5. Results are often highly redundant. Search for a medical condition and you are likely to encounter numerous encyclopedic entries from various organizations (hospitals, commercial entities, etc.) that contain similar information. Each contains good information, but ranked one after another they fail to reflect the diversity of information actually available.

These are all areas in which human curators can easily outperform automated search engines (a sketch follows the list below). Humans can:

- Determine the readability and usefulness of content apart from mechanical, manipulable signals.
- Analyze whether newer material is more useful or correct than older material.
- Ensure that a diversity of sources and perspectives is available.
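
Here is a hedged sketch, again in Python, of how those human judgments might re-rank automated results: a curator quality rating re-orders the list, and a per-source cap addresses the redundancy problem from point 5 above. The data shapes, field names, and the cap of one result per source are all assumptions made for illustration.

```python
def rerank(results: list[dict], curator_ratings: dict[str, float],
           max_per_source: int = 1) -> list[dict]:
    """Re-order automated results by human quality ratings, then cap
    how many results any single source contributes near the top."""
    # Sort by curator rating, highest first; unrated pages sink to 0.0.
    ranked = sorted(results,
                    key=lambda r: curator_ratings.get(r["url"], 0.0),
                    reverse=True)
    seen: dict[str, int] = {}
    preferred, overflow = [], []
    for r in ranked:
        count = seen.get(r["source"], 0)
        if count < max_per_source:
            preferred.append(r)
            seen[r["source"]] = count + 1
        else:
            overflow.append(r)  # redundant entries drop below diverse ones
    return preferred + overflow

# Example: two near-identical hospital pages no longer crowd out a forum.
results = [
    {"url": "https://a.example/condition", "source": "hospital-a"},
    {"url": "https://b.example/condition", "source": "hospital-a"},
    {"url": "https://c.example/condition", "source": "patient-forum"},
]
ratings = {"https://c.example/condition": 0.9,
           "https://a.example/condition": 0.8}
print([r["url"] for r in rerank(results, ratings)])
# ['https://c.example/condition', 'https://a.example/condition',
#  'https://b.example/condition']
```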

## 1.4 More to Come

I have much more to say on this and other topics, but I don't want to release massive, overwhelming reams of text - and I want to revise and enhance my thoughts based on the feedback, criticism, and questions I receive.

Discussion Link: https://hachyderm.io/@davidshq/109814269551861134