Richard Rogers 1 Jan 1998

Playing with Search Engines

and making lowly web information into knowledge

Climate change is global: it occurs nowhere in particular and everywhere at once. The same could be said of the information supply about this ecological crisis: on the World Wide Web, scientific facts seem to emerge from nothingness; their sources are hardly locatable. A virtual geography of Global Climate Change on the Web...

Scientists, scholars and artists have long sought reputation here by publishing and showing there. In order to be given space at home, you first have to have been granted it abroad. Only by working, publishing and showing elsewhere could you gain admission to the societies and museums, and the print journals and demonstration halls at home. Many have arrived by first leaving.

While more fluid than I describe it, such a reputational model remains in place today. But now one no longer has to surface-mail a text to India in order to be able later to drop off another across town at the Royal Society. There is a new abroad, a new colony, a new space for reputation-seeking scientists, scholars, artists, even businessmen. Just as distant journals and museums once served as potential reputational resources for the scholar and artist in waiting (or for those needing to polish off the tarnish), now the net holds out the same promise. The new feathers in one's cap needn't be off-prints of articles from far-flung journals or a coloured handbill from a Senegalese playhouse, but a URL or three.

Anchored by Ink or Tainted by the Net

There's a difference between ink and digiprint, however. A net publication does not have the standing of, say, a colonial journal of old. There are some elementary out-of-web reasons for its lower epistemological status, its low knowledge value. (The in-web reasons relate to recently evolving search engine logics, described presently.)

From regurgitated reports of the downing of TWA flight 800 by the US Navy, through cold fusion announcements, to the Drudge Report's Washingtonian whispers, purely net-based 'information' is regarded as 'floating', like jetsam or hearsay. It washes up on screen, so to speak. Of course the digital can be montaged at will - forwarded, anonymously remailed, pulled down and reloaded. But more than that, it is self-publication that lowers its status. (In this sense, the net can be said to disempower.) Unless the piece is by an established name, many print journals won't touch even the most well-considered contribution to a discussion list. The list piece hasn't been routed through the proper channels. Call it renegade prose tainted by the net.

By its very nature net information remains dubious unless anchored by a recognised knowledge maker - an intellectual editor, as the International Herald Tribune calls it. Net information may become knowledge, in other words, when it is vetted by and attached to a credible source - one with a reputation for following and checking 'method', be it the journalist's (independent or third-party confirmation) or the scientist's (reproducibility or non-falsification). With the bona fides, it is fit to print or upload.

Strides have been made that may raise the status of web-based information. Respected journals and newspapers - with articles refereed, independently confirmed or otherwise stamped with approval - have been coming on-line for a few years now. Anchored by ink, counterpart net journals and papers do have the capacity to raise the status of net information more generally, but many of the articles are stored in databases out of reach of the web crawling spiders that feed search engines. In the main, open-ended search engine queries won't return these deep articles. Only shallow information is fished onto your plate.

Web as Resource versus (Own) Source

The low epistemological status of web-based information notwithstanding, day in and day out both the journalist and the scientist turn there first for a snapshot of what's what. The web is used as a resource. The search engine is the tool and the ranked URL list the finished product. Generally speaking, search engines return results according to the queried keyword's location on the web page and its recurrence towards the top of that page. (They like redundancy.) Newer engines such as WebCrawler boost a page's ranking in your returned URL list if the site is frequently linked to. (They like the hyperlinked.)

Thus the determined 'relevancy' (as it's called) of the site for the queried word or phrase is a matter of not only that phrase's location and frequency in the page but also the page's 'popularity'. While it's still the case that earlier engines such as Yahoo! have actual human editors verifying and arbitrating the status of self-reported URLs,

See S. Steinberg, 'Seek and Ye Shall Find (Maybe)' in Wired 4.05

the newer fully automated engines (with those web crawling spiders) continue to move in the direction of ranking according to these new measures of popularity. The more the site is inter-hyperlinked (and the better the site is 'metatagged' or labelled for the roving spider to spot), the more 'reliable' and 'relevant' the search engine believes the site and presumably the information to be. In turn the engine's interface delivers attractive reliability percentages and star symbols, providing us with a sense of security, 'value certainty' even.
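
To make this ranking logic concrete, a toy scoring routine might look as follows - a minimal sketch only, in which the weights, the example pages and the popularity bonus are illustrative assumptions rather than any engine's actual formula:

# Toy relevance score in the spirit of late-1990s engines: reward the
# queried phrase appearing early and often on a page, and add a bonus
# for how often the page is linked to. All weights are illustrative only.

def relevance(page_text: str, query: str, inlink_count: int) -> float:
    text = page_text.lower()
    term = query.lower()
    frequency = text.count(term)                       # "they like redundancy"
    if frequency == 0:
        return 0.0
    first = text.find(term)
    location_bonus = 1.0 - first / max(len(text), 1)   # earlier on the page scores higher
    popularity_bonus = inlink_count ** 0.5             # "they like the hyperlinked"
    return frequency * location_bonus + popularity_bonus

# Hypothetical pages: (page text, number of sites linking in).
pages = {
    "http://www.example-science.gov/climate": ("Global climate change assessment ...", 40),
    "http://www.example-lobby.org/climate":   ("Climate change? Climate change is uncertain ...", 5),
}
for url, (text, inlinks) in pages.items():
    print(url, round(relevance(text, "climate change", inlinks), 2))

Run against the two hypothetical pages, the lobby page scores on redundancy while the frequently linked page scores on popularity; which one tops the list depends entirely on the chosen weights, which is precisely the point.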

Significantly, the web is thus becoming self-referential, a domain unto itself, its own source, feeding itself, if you will. Once the loop with the out-of-net is closed, once the human relevance arbiters have been unplugged, one can begin to speak (softly) of a pure web context: information generated for the sole benefit of the medium. Unavoidably the dictum must run: the medium is the medium. (I speak of Manuel Castells' historical progression. If TV drove 'the medium is the message', if the VCR put forward 'the message is the medium' and if multimedia implies 'the message is the message', then self-referential web network computing means 'the medium is the medium'.)

See M. Castells, The Rise of the Network Society, Oxford 1996, pp. 327-375

The searcher does not generally comprehend such self-referential web truth, unless it's viewed (as it must be) from the perspective of web commerce. Most search engine commentary revolves around making your own site more popular and thus more findable. Tips to get more hits.

See Danny Sullivan's www.searchenginewatch.com

(Not surprisingly, search engines score towards the top here.) While such tips are always good to know, site relevancy logics geared to web commerce do not help to raise the general epistemological value of web-delivered information. Daily the web is queried, and commercial information popularity logic is returned.

In order to raise the epistemological value of returned web information, one may wish to design a logic informed not by web commerce, but by information geographies - a relational logic which at the same time takes account of the web's self-referentiality and emergent information evolution.

Knowledge Mapping, or Authoring the Context of Web Information

The principle behind 'knowledge mapping' is simple, or should be. Once information is purposively plotted or visualised in reference to other information, and once one learns to read such a rendering, the rendered information becomes knowledge. In this case, the map provides the necessary context for otherwise disparate, lonely and lowly pieces of information returned by current search engines. The pieces of information themselves gain no higher status; rather, the map of information (with legend or interface) becomes both a source of knowledge (for multiple interpretations of the map) and knowledge itself (from the authoring cartographer's built-in interpretation). It is also knowledge in the sense that it provides the context from which the field of information can be further queried.

To understand the context of web information and to transform the information into knowledge, the information cartographer explores and renders geographies of the web, that is, the basic relationships on the web between the specific sources of specific information.

The most obvious and indeed rather telling relationship between sources of information on a particular subject on the web is the sources' hyperlinking choices. Who links to whom? Who reciprocates?

For an illustration of some of the hyperlinking choices of organisations involved in the global climate change debate, look on the following screen. Note that the governmental scientific bodies hyperlink only with one another, the corporations mainly stand alone, the corporate lobby group (GCC) links to governmental scientific bodies and not to its backers, and the non-governmental organisations link to one another and to most every other party in the debate. As with city maps, where one may ask whether the layout of streets follows some particular historical logic of flow, with knowledge maps one is invited to interpret the strategic choices behind hyperlinking.

Noortje Marres has made such interpretations.
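
In code, the cartographer's first pass reduces to extracting each party's outgoing links and checking which are reciprocated. The sketch below assumes the parties' pages have already been fetched; the party names, hosts and link targets are invented for illustration:

import re

# Raw HTML of each party's site (assumed already fetched).
# Party names and link targets are invented for illustration.
party_pages = {
    "ipcc":       '<a href="http://www.wmo.org/">WMO</a>',
    "wmo":        '<a href="http://www.ipcc.ch/">IPCC</a>',
    "gcc":        '<a href="http://www.ipcc.ch/">IPCC</a>',
    "exxon":      'No outgoing links on this page.',
    "greenpeace": '<a href="http://www.ipcc.ch/">IPCC</a> <a href="http://www.gcc.org/">GCC</a>',
}
party_hosts = {"ipcc": "ipcc.ch", "wmo": "wmo.org", "gcc": "gcc.org",
               "exxon": "exxon.com", "greenpeace": "greenpeace.org"}

def outgoing(html):
    """Return the set of parties a page links to."""
    hosts = re.findall(r'href="http://www\.([^/"]+)', html)
    return {party for party, host in party_hosts.items() if host in hosts}

links = {party: outgoing(html) for party, html in party_pages.items()}

# Who links to whom, and who reciprocates?
for a, targets in sorted(links.items()):
    for b in sorted(targets):
        mutual = a in links.get(b, set())
        print(f"{a} -> {b}" + ("  (reciprocated)" if mutual else ""))

Even this crude pass reproduces the pattern described above: the scientific bodies reciprocate, the lobby group points upward without being pointed back at, and the corporation stands alone.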

A second relationship between sources of information concerns semantic or discursive choices. Who's speaking the same language? What's their angle on a certain phrase, or how do the parties frame a key quotation, referred to by all or many parties to the debate?

Figure 5 portrays the parties' commentary upon a principal knowledge claim made by governmental scientists at the IPCC. The phrase has a high level of recurrence across the parties' sites, according to a textual analysis tool (TACT), better known for aiding in the interpretation of the works of James Joyce. While resulting from scientific research and appearing in print, on the web the .gov scientists' otherwise floating quotation becomes anchored in a map showing its application by different .coms and .orgs. Mapping the various appropriations of the quotation renders the web information context at a glance.
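
A concordance-style count of the kind TACT performs can be approximated in a few lines. In this sketch the key claim echoes the well-known IPCC wording but stands in as a placeholder, as do the parties' page texts:

# Count how often each party's site repeats a key knowledge claim and keep
# a snippet of surrounding text to see how the claim is framed.
# The claim and the page texts below are placeholders, not actual site content.
claim = "discernible human influence on global climate"

party_texts = {
    "ipcc.example.gov":       "... the balance of evidence suggests a discernible human influence on global climate ...",
    "lobby.example.org":      "... the alleged discernible human influence on global climate remains unproven ...",
    "greenpeace.example.org": "... scientists now agree on a discernible human influence on global climate ...",
}

for party, text in party_texts.items():
    lowered = text.lower()
    count = lowered.count(claim)
    position = lowered.find(claim)
    snippet = text[max(position - 25, 0): position + len(claim) + 25] if count else ""
    print(f"{party}: {count} occurrence(s)  ...{snippet}...")

The counts locate the quotation across the .govs, .coms and .orgs; the snippets show whether it is affirmed, doubted or otherwise appropriated.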

Taking the two sets of maps together, one may inquire into relationships between hyperlinking choices and discursive affinities. Do those who use the principal knowledge claim in a favourable light tend to link to one another?
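
Mechanically, the question amounts to crossing the two renderings. A sketch under invented assumptions - in practice the 'favourable' labels would come from the cartographer's reading of the snippets, and the link sets from the hyperlink pass above:

from itertools import combinations

# Among parties labelled as using the claim favourably, which pairs also
# link to one another reciprocally? Labels and link sets are invented.
favourable = {"ipcc", "greenpeace"}            # cartographer's reading of each snippet
links = {"ipcc": {"wmo"}, "wmo": {"ipcc"},
         "greenpeace": {"ipcc", "gcc"}, "gcc": {"ipcc"}}

for a, b in combinations(sorted(favourable), 2):
    reciprocal = b in links.get(a, set()) and a in links.get(b, set())
    print(f"{a} & {b}: favourable use of the claim, reciprocal link = {reciprocal}")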

Authoring Web Knowledge

In all, the aim is to create a web-based, socio-epistemological tool for the knowledgeable understanding of otherwise floating web-based information. It would allow the web navigator to view significant relationships between major parties (in this case) to a debate, as represented on the web and only on the web - that new renegade and self-referential space in need of an epistemological boost through re-rendering.

The built-in interpretation of the knowledge map, dubbed geographies of knowledge and power, is as follows. Mapping the sources' or parties' interlocking hyperlinks renders geographies of power, or socio-political alliance. Mapping the parties' shared terminology renders geographies of knowledge, or knowledge claim affinities. These are agreements or disagreements about the certainty value of scientific statements.

By querying a Californian internet archive and refreshing the searches, one could show the evolution of these relationships over time, thus making the web debate rendering tool dynamic, and open to new interpretative knowledge claims by cartographers and readers alike.
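
As a rough sketch of such a refreshing search, one could ask the archive for dated captures of each party's page and re-run the link and phrase analyses on every capture. The endpoint and parameters below are assumptions based on the Internet Archive's present-day CDX query interface and should be checked against its documentation:

import json
import urllib.request

# Ask the archive for dated captures of a page so the analyses above can be
# re-run per capture and compared over time. The CDX endpoint and parameters
# are assumptions to verify against archive.org's documentation.
def snapshots(url, year_from, year_to):
    query = ("http://web.archive.org/cdx/search/cdx"
             f"?url={url}&output=json&from={year_from}&to={year_to}"
             "&fl=timestamp,original&filter=statuscode:200")
    with urllib.request.urlopen(query) as response:
        rows = json.load(response)
    return [(ts, orig) for ts, orig in rows[1:]]   # first row is the header

for timestamp, original in snapshots("www.ipcc.ch", "1998", "2000"):
    capture = f"http://web.archive.org/web/{timestamp}/{original}"
    print(timestamp, capture)    # each capture can be fetched and re-mapped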

Thanks to:

The Geographies Research Team in Amsterdam and in London: Noortje Marres, Alexander Wilkie, Milo Grootjen, Noel Douglas, and Alex Somers,

and Janet Abrams at the Netherlands Design Institute, reader.