Given the explosion in the number of documents resulting from computer networks, the role of libraries in preserving, conserving and making documents available remains fundamental. Unlike the technical illusion of a world of documents that is “auto-organized”, open to all and where information moves freely, tackling the issue using the experience of libraries as a starting-point enables us to imagine, on the contrary, a balance between technological innovation and social conditions for the creation and dissemination of documents. Access to knowledge for all requires “libraries” in the digital world ... and libraries change in nature and function when faced with the need to discharge their social and cultural missions in cyberspace.

A socio-technological definition

The traditional meaning of “library” is a place in which books are:

- held for future generations so as to give continuity to cultural production and constitute the collective memory;

- organized, with a “classification” that evolves as new skills emerge and that makes it easier to locate documents;

- and open to reading by all, anywhere. A library user can consult books “on the spot” or use the “library network” to obtain in his or her home-town documents that are not held in the local collection.

Libraries have “networked” for a long time now, to share out work by creating “union catalogs” and to ensure Universal Access to Publications. Ever since computers were invented, documentation centers have used them to set up “databases”, which are in effect access points to references, in particular in the sciences.

Libraries, then, have responsibilities to readers (enabling them to access all the information in the world) and to documents (ensuring that they will be legible tomorrow, and described in a catalogued data bank). We can use their experience as a framework in order to think about “digital documents” in terms of duration and in terms of the organization of access to knowledge.

It was in the 1990s that libraries engendered the concept of “digital libraries” and decided to assign to them technological concerns as well as social and cultural missions. In 1999, Christine Borgman [1] noted two separate approaches used by library professionals. She described “a set of electronic resources and associated technical capabilities for creating, searching, and using information” but also emphasized that “digital libraries are constructed, collected and organized, by (and for) a community of users, and their functional capabilities support the information needs and uses of that community”, an approach that was also proposed by the Virginia Tech Institute in 1998: “The digital library is not merely equivalent to a digitised collection with information management tools. It is also a series of activities that brings together collections, services, and people in support of the full life cycle of creation, dissemination, use, and preservation of data, information, and knowledge.” [2]

The use of information technology in digital libraries is no longer confined to the production of catalogs or access portals, but encompasses the storage, retrieval, and supply, in a format that remains legible, of the documents themselves in all their diversity. For the Association of Research Libraries, “digital library collections are not limited to document surrogates: they extend to digital artefacts that cannot be represented or distributed in print formats.” [3] The digital library is also a “multimedia” library.

Digitizing and archiving

With the development of networks, documents were increasingly being read in digital form. Libraries soon started wondering about the “digitization” of print documents and the conversion of films and sound recordings, so as to facilitate their dissemination. They were also quick to realize that there was a rapidly growing set of documents that were born digital, in particular websites, so another aspect of their work is to organize their preservation and ensure their duplication.

The digitization of print works was launched in the early 1990s in many libraries and archive centers throughout the world. For instance, the Gallica programme of the Bibliothèque nationale de France (the National Library of France) offers more than 70,000 nineteenth-century works, 80,000 images and dozens of hours of sound recordings [5]. The “Google Print” programme [6], announced to the media in December 2004, aims to digitize hundreds of thousands of works in five libraries in the United States. The announcement was widely reported, in particular in France, where the President of the Bibliothèque nationale de France seized the opportunity to expand Gallica towards a multilingual European digitization programme [7]. Access to the culture of the past has become, through digitization, not only of economic but also of “geopolitical” importance [8]: there are several ways of looking at the world and they are contained in books. In order to build peace, all linguistic forms and the various historic paths must be made to coexist in the digital world. China and India, by launching a partnership with the Internet Archive and the University of Michigan [9]; the Arab world, despite the looting and destruction of the Library of Baghdad, the place where writing was invented; and Africa, for the thirteenth-century manuscripts found in Timbuktu [10], all have digitization projects that might restore the balance from an overly “western” vision of culture and knowledge.

There is a grave danger however that funding from developed countries and international organizations will lead to a one-way flow of culture and knowledge, in particular because the legal status of digitized works has not been clearly established. Digitization apparently gives new rights to the company that undertakes it, which is in fact a new way of grabbing the heritage. The example of Leonardo da Vinci’s Leicester Codex is significant: bought by Bill Gates, the manuscript is in a bank safe room, the only version available is digital, and the copyright has been attributed to Corbis. Unless we are careful, digitisation may lead to a new privatisation of the public domain.

The other aspect of the work of building digital libraries consists in “archiving the web”. Like the Internet Archive [4], many public and private programmes are aiming to build “collections” of digital images of what is nevertheless the recent past of the Internet. This raises several problems:

- How to select the part of the web to be archived (sampling);

- Whether such archiving should take place in conjunction with the publishers of websites, or whether a library can consider that it can archive these available documents in the name of fair use;

- How to gather the documents dispersed on millions of machines;

- How to transform these documents (web pages) so that the reader of tomorrow can experience them in more or less the same way as a reader might at the time of their publication (reproduce as faithfully as possible not only site content but also appearance);

- How to enable a reader to read documents that no longer exist on the Internet, due to sites disappearing, but which are not yet in the public domain. One might imagine that many authors would like to see the works that they leave freely on the Internet when they are created remaining there in library archives.

The question of archiving the web also raises a more fundamental problem, that of the definition of a document in its conversion to digital [11]. One of the myths of the Internet consists in replacing the “document”, that can be read again and again, with the “information flow”, constantly renewed, and more like “audiovisual communication”. The myth is rooted in a very real evolution in social practices around reading/writing: blogs, mail, video sequences, podcasts, evolving sites, wiki... The status of authors is changing. How can we keep track of all this upheaval? How can we make the ideas and actions of the previous years available? In short, how can we transform today’s Internet flow into documents that will still be legible tomorrow?

Two strategies for finding digital documents

Because computers have become the preferred tool in the creation of new documents, as much on the writer’s table as in academia, and in video editing as much as in musical composition, the number of documents published (here meaning put on the web) is growing as never before. The question of identifying the documents that meet the need of a reader, whether that need be scientific, political or cultural, then takes a dominant place. How is it possible to find one’s bearings in the proliferation of information?

Two strategies have been put in place:

- “Search engines” (Google, Yahoo!, MSN) use the content of documents to carry out the search. This favours specific searches, where the question contains many words (for instance, the search for a quotation), but makes looking for concepts harder.

- The classifications of digital libraries and, increasingly, the tools proposed as part of the “Semantic Web” [12], which aim to elaborate “documentary languages” in which one can “browse” to look for documents, which are grouped according to proximity of meaning.

The two strategies complement each other [13]. Whilst the first is based on calculations and thus computer power, the second requires human intervention. The first is subject to the imprecision of language, manipulations by “referencing” services, and the hidden choices of algorithms; the second suffers from visions of the classification of knowledge that are often too specific and subjective.
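The contrast between the two strategies can be illustrated with a minimal sketch. It assumes a tiny, hypothetical collection in which each document has a text and a subject heading assigned by a cataloguer; the names `DOCUMENTS`, `build_word_index`, `search_by_content` and `browse_by_subject` are invented for the illustration and do not describe any real search engine or library system.

```python
from collections import defaultdict

# Hypothetical mini-collection: id -> (full text, subject heading chosen by a person).
DOCUMENTS = {
    "d1": ("the library preserves books for future generations", "Libraries"),
    "d2": ("search engines index the words of web pages", "Information retrieval"),
    "d3": ("archiving web pages keeps them legible tomorrow", "Libraries"),
}


def build_word_index(documents):
    """Map every word to the set of documents whose text contains it."""
    index = defaultdict(set)
    for doc_id, (text, _subject) in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_by_content(index, query):
    """First strategy: return documents containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def browse_by_subject(documents, subject):
    """Second strategy: return documents filed under a given subject heading."""
    return {doc_id for doc_id, (_text, s) in documents.items() if s == subject}


index = build_word_index(DOCUMENTS)
exact_words = search_by_content(index, "web pages")      # words present in the texts
concept_word = search_by_content(index, "preservation")  # a concept, not a word used
by_subject = browse_by_subject(DOCUMENTS, "Libraries")    # relies on human cataloguing
```

In this toy collection the query on exact words finds documents, the query on a concept that no text spells out finds nothing, and browsing by subject depends entirely on the heading a person chose.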

Each type of classification (from Dewey in libraries, to the Yahoo! directory) reflects the ways of seeing the world, the “current” preoccupations of the group that creates, develops and uses it. The classifications used on the Internet are today mainly linked to the needs and conceptions of developed countries. Having a classification system that is evolving, comprehensive, multifaceted and genuinely global demands a great deal of human investment. With the Internet we have the capacity for many people to work together on these tasks. Cooperative projects such as the Open Directory Project [14] and folksonomy [15] allow readers themselves to become involved in the classification of digital documents. A new role for digital libraries is then to find the technical and human means to stimulate that dynamic, to ensure that the points of view of the whole world are duly respected, and to facilitate the translations of concepts.
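As a rough illustration of how readers' own keywords can build up a classification, the sketch below counts the tags that different readers attach to the same document. The data, the tuple layout and the function name `tags_by_document` are assumptions made for the example; real folksonomy services work on a far larger scale and with their own rules.

```python
from collections import Counter, defaultdict

# Hypothetical tagging events: (reader, document, tag freely chosen by the reader).
TAGGINGS = [
    ("reader1", "doc-42", "libraries"),
    ("reader2", "doc-42", "bibliothèques"),
    ("reader3", "doc-42", "Libraries"),
    ("reader1", "doc-7", "archiving"),
    ("reader1", "doc-42", "libraries"),  # same reader repeating a tag: counted once
]


def tags_by_document(taggings):
    """Count, for each document, how many distinct readers used each tag."""
    counts = defaultdict(Counter)
    seen = set()
    for reader, doc_id, tag in taggings:
        key = (reader, doc_id, tag.lower())
        if key in seen:
            continue
        seen.add(key)
        counts[doc_id][tag.lower()] += 1
    return counts


folksonomy = tags_by_document(TAGGINGS)
top_tag = folksonomy["doc-42"].most_common(1)
```

Note that the tag in another language stays separate from its English equivalent here: bringing such tags together is precisely the kind of translation of concepts that calls for human work.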

The computational model of search engines is for its part skewed by the economic constraints weighing on the firms running them. Tying document searches to advertising revenue has become a necessity, which in turn affects the relative prominence of the documents found. The documents that appear first in a list of responses are in turn cited, which makes them even better known. It is a “media” type effect which tends to divide documents into a few that are widely read and quoted, and the rest that stay virtually unknown. The question of cultural and linguistic diversity and that of the qualification of science (peer review) cannot be taken into account by the algorithmic model of search engines, even less so with long documents such as books [16].

With the main search engines, we can see the emergence of a genuine “new medium” on the Internet. This medium, a tool for promotion, for selling advertising space and amplifying audience “success”, is presented only in the perspective of a “technical” tool intended to make better use of the resources on the web. Underneath this seeming cliché, we can however already note trends that will favour documents produced in English, in developed countries; documents for the “general public” will be prioritized by the system of counting links (Google’s PageRank) to the detriment of research and critical works... In short, far from constituting a means of accessing all information, there is a great risk that only some information will be favoured, namely, the information that has the means to garner an audience by using various marketing techniques which aim to ensure that the sites appear at the top of the list. This new emergent media domination is grounds for concern for developing countries, especially since no rules to limit concentration, and no anti-monopoly law have been established for this sector.
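The effect of counting links can be made concrete with a simplified sketch in the spirit of PageRank. It is not Google's actual implementation: the function name `link_rank`, the five-page graph and the number of iterations are assumptions chosen for the illustration, and the damping factor of 0.85 is simply the commonly cited value.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score, whatever its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: three sites and a thesis all link to a portal,
# the portal links back to one site, and nobody links to the thesis.
LINKS = {
    "site_a": ["portal"],
    "site_b": ["portal"],
    "site_c": ["portal"],
    "thesis": ["portal"],
    "portal": ["site_a"],
}

scores = link_rank(LINKS)
best = max(scores, key=scores.get)
```

In this toy graph the portal that everyone links to ends up far ahead of the thesis that nobody links to, whatever the quality of either document: the score measures attention, not content.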

Access to knowledge

Because libraries enable people to read documents that have been located and classified outside the rules of the market and free from religious and ideological pressure, they are essential tools for spreading access to knowledge to the world as a whole and in particular to women. It is because they are services open to all that libraries have always sought to promote reading and thinking in all sectors of the population. In order to carry out these missions, and ultimately to improve the standard of living and degree of awareness of individuals and countries, libraries have relied on “limitations and exceptions” under intellectual property rights legislation. Public reading, the use of works under copyright in schools and universities, the dissemination of science... have been made possible by numerous rules of use in legislation and jurisprudence on copyright. This is the case for the notion of “fair use” which enables libraries to provide the public, on the spot or at home, with the books, music, films and documentaries that they have acquired on a regular basis. Library acquisitions provide an essential economic impetus for many works, in particular documents that are critical, specialized, high-level, or in minority languages in a given country.

These exceptions and limitations have been deeply affected by the transition to digital and especially by the dissemination of documents on electronic networks.

The International Federation of Library Associations (IFLA) has for instance noted [14]:

- New layers of rights on digital information, e.g. database right (the organization of information in databases confers ownership, even if the information itself is not subject to copyright);

- Digital Rights Management (DRM), which prevents readers from using legal exceptions (for example, private copy);

- Non-negotiable licences that override fair use provisions (every digital document proposes a “licence”, a private contract the terms of which, drafted by the publisher only, take precedence over the law).

It should be added that the preservation of electronic documents has been harmed by the practices of publishers. Only independent bodies, which have been specifically assigned to the task, can ensure the impartiality and completeness of archiving and preserving documents. History abounds in examples of documents that disappeared once their use was no longer deemed to be economic.

We should note too that libraries are involved in the extension of collective uses of the Internet. They are places that house telecenters or multimedia creation centers. Libraries are tools for popular education enabling many people to engage in the collective learning of how to read electronic documents. However, the rules of law, like trade practices, only consider the “individual” uses of documents. Such a restricted concept affects in particular women in countries where they are subject to pressure that limits their access to education and knowledge, and for whom libraries are cultural havens.

Given the above, how can we maintain the service rendered by libraries in the digital world and extend it to developing countries and to those parts of the community that have little access to reading? This question is a major challenge in sustainable world development. It is also a public health issue (access to knowledge that can help in coping with pandemics), one of peace-building (through the mutual understanding of peoples and cultures), and one of the extension of democracy and human rights. It is one of the reasons that have led librarians, notably IFLA, to take part in the global civil society movement to draft a “Treaty on access to knowledge” [18].

The three challenges of digital libraries

We have noted three main lines around which it seems we should envisage the construction of digital libraries. They cover the traditional activities of libraries and in so doing show that the experience acquired with books and journals in recent decades can also be of service in a situation that is evolving extremely rapidly, more marked by communication than by the management of documentary information, like that of the Internet today.

- Conservation and digitization: How to select the documents to digitize? How to ensure that all ideas and all languages are covered? How to archive the flow of information circulating on the Internet for the benefit of future generations? How to preserve as common goods documents in the public domain that have been digitized?

- Document searches: How to develop models of search engines and classification in order to avoid knowledge becoming merely a reflection of the “popularity” of an idea or concept? How to develop multilingualism and browsing by concept promised by the “semantic web” by involving all Internet users from all over the world?

- Access to documents: How to maintain the limitations and exceptions to intellectual property that enable libraries to take part in the free movement of knowledge in the digital world? How to prevent new rights and ownership techniques associated with digital documents from shrinking the capacity of all to access knowledge?

By looking at the Internet as librarians do, we are in a better position to understand the need to keep track of its dynamic activity. We can situate reflection better in terms of duration and are less subject to the whims of the media. We can at last gauge in the field of ideas the importance of implementing the standards for description (metadata) and interoperability (translation, cooperation in document description and the need to constantly reformat documents so that they remain legible as technology evolves) that are at the basis of Internet technology.

In so doing, we come across a social conception of information and knowledge, which at one and the same time builds the heritage (works of the past) and access to the most recent information (scientific publications). We can put into perspective the strictly commercial visions of the production of culture and knowledge by looking from the perspective of information common goods and their effect on the development of individuals and countries.

[...] together based on documentary languages (thesauri, ontologies, dictionaries...) spread out on the Web. Automatically extracting knowledge from documents, classifying, translating, exchanging information... are certainly myths that are difficult to achieve. However, the effects in the management of digital documents of the tools of the semantic web will soon be significant.

27 March 2006

This text is an extract from the book Word Matters: multicultural perspectives on information societies. This book, which has been coordinated by Alain Ambrosi, Valérie Peugeot and Daniel Pimienta was released on November 5, 2005 by C & F Éditions.

The text is under the Creative Commons licence, by, non commercial.
