El.pub Analytic Issue Number 15
Analytic 15 - Looking back - seven years of El.pub (Part 2)
Page 5 of 6
Accessing knowledge and the arrival of digital libraries
For most of the fifty or so years that we have been using computers, information retrieval has meant getting records from structured files and databases. Much of the information in those files was indexing designed to identify unambiguously the data items that could be retrieved. In the last ten years, and particularly since the advent of the web, this has changed, and any type of information can be made available: books, maps, music, pictures, films, etc. The earlier record-type data was created by entering information manually and was indexed at source; much of the new content is entered by scanning in information from paper, film or magnetic media and carries no indexing information of its own. Most of the rest is entered as free text from word processors. People may have thought that free text search would find the information easily, but they soon found that the sheer volume of the web and its volatility prevented this. In fact, experience with free text search in limited document stores had already identified formidable problems before the web appeared.
It is surprising in some ways how little of the information stored in books is freely available on the web. The US Library of Congress has more than 18 million books, whilst Project Gutenberg, which started in 1971 and aims to create online copies of the world's out-of-copyright books, has about 7000 books online so far. Another project to make 1 million books available online by 2005 has reached about 1500 to date. A similar situation exists with images and sounds, and as far as 3D images of objects (such as buildings or statues) are concerned we have not really started yet.
It is tempting to blame this on the controllers of copyright. It is more likely that the lack of progress on transferring information from traditional repositories to digital media is due to a combination of factors that include lack of demand, technical difficulties, a lack of common purpose and thus co-operation, and problems of use.
As far as lack of demand is concerned, it is clear that there is no lack of new information on the web and in digital libraries. The web is actually considerably larger in storage terms than the 18 million books noted above. It is estimated that those books would need about 20 terabytes of storage, whereas the web currently carries more than 100 terabytes of information. Scientific and statistical data is available faster and to more people than ever before. Digital libraries of scientific papers are growing rapidly and are easily (if not freely) available from almost any location. The same applies to information about products and services for sale and, as e-commerce grows, to their delivery. It is also true that large amounts of news on politics, society and sport, together with new literature and music, are created directly for the web and other digital media without ever appearing on traditional media. What is left to be captured is the record of human thought and achievement that is more than about 50 years old and that was expressed either in writing or in some other physical form (buildings, artefacts, etc.). The most important of those records (religious works, writings of genius, masterworks) are available. The demand for the rest comes from historians and intellectuals who are studying very specific ideas in depth and who have rarely had a lot of help in finding the relevant records. Their demand counts for little in economic terms.
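The storage comparison above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (18 million books, about 20 terabytes, more than 100 terabytes on the web). The function names and the choice of decimal units (1 terabyte = 10^12 bytes) are assumptions made for illustration, so the results are rough orders of magnitude rather than measurements.

```python
# Figures quoted in the text. Decimal units (1 TB = 10**12 bytes) are an
# assumption; the article does not say which convention it uses.
BYTES_PER_TB = 10**12
BOOKS_IN_COLLECTION = 18_000_000   # "more than 18 million books"
BOOK_STORAGE_TB = 20               # "about 20 terabytes"
WEB_STORAGE_TB = 100               # "more than 100 terabytes", so a lower bound


def bytes_per_book(total_tb, n_books):
    """Average storage per book implied by a total size and a book count."""
    return total_tb * BYTES_PER_TB / n_books


def size_ratio(web_tb, books_tb):
    """How many times larger the web is than the book collection."""
    return web_tb / books_tb


if __name__ == "__main__":
    per_book_mb = bytes_per_book(BOOK_STORAGE_TB, BOOKS_IN_COLLECTION) / 10**6
    ratio = size_ratio(WEB_STORAGE_TB, BOOK_STORAGE_TB)
    print(f"Implied average size per book: about {per_book_mb:.1f} MB")
    print(f"The web is at least {ratio:.0f} times the size of the collection")
```

On these assumptions the 20 terabyte estimate works out at roughly one megabyte per book, which is plausible for plain text but not for page images, so the estimate presumably refers to text alone.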
There are two principal technical difficulties with digital archiving of existing sources. The first is that advances in scanning technology seem to have ground to a halt once the demands of office document copying were met. Anyone who has tried to scan a book into a computer without damaging the book will know that the process is time-consuming and of uncertain outcome. This is particularly the case with old books. Organisations (such as national museums and libraries) that are concerned with the professional preservation and archiving of artefacts such as early manuscripts can afford to employ resources beyond the reach of the average historical researcher. The second technical difficulty lies in the tools for organising and processing the resulting digital content. Here there is no lack of either will or research, only of results.
It is well known that researchers in the humanities have not embraced the use of computers to the same extent as those working in science and economics. It is even fashionable in some circles to disparage their use. This error affects everyone, as the skills of those working in the humanities, particularly in the analysis and understanding of language and behaviour, are exactly the skills required to solve problems in areas such as artificial intelligence that have proved intractable for workers with predominantly technical training and outlook. The computer, and particularly its use in the building of annotated digital libraries, opens the possibility of co-operative working amongst humanities researchers that has been technically impossible in the past. There seems to be little appreciation of the value that might be added by such co-operation. Attempts by technologists to build structured frameworks (such as the semantic web) to analyse documents would be greatly enhanced if more research workers in the humanities were engaged in parallel actions in their own areas of expertise. There are of course some areas, such as art history, where this is happening. However, realisation of the value of cross-fertilisation here seems to escape many R&D funding bodies.
The last of the difficulties noted above is problems of use. The acknowledged difficulty is in processing large amounts of information so that the item required can be easily identified and presented. Over the last few years the business community has discovered that, in the modern world, 'knowledge' is becoming the most important factor in success in the market place. Knowledge is a property of people. While some knowledge can be captured and applied in an automatic way, the majority is held (in an intrinsic sense) by individuals. It is certain that only people can generate and apply new knowledge. The consequence of this is that investment needs to be made in equipping people with knowledge (education and training) and in enhancing their ability to process knowledge. There are a number of RTD themes being followed to address the problem. Unfortunately they reduce in the main to attempts to solve problems of language manipulation that have been studied for fifty years with little practical success (understanding, disambiguation, translation, summarisation). As Noam Chomsky has continually pointed out, there are many observed aspects of language for which we have no explanation whatsoever, such as the way in which children acquire the ability to understand complex syntactic variations without tuition. The lack of tools to help knowledge workers use digital content that captures the ideas and experience of other workers (whether written or visual material, or direct networking) is a major obstacle to progress, and providing such tools should be an RTD priority.
Edited by: Logical Events Limited - electronic marketing, search engine marketing, pay per click advertising, search engine optimisation, website optimisation consultants in London, UK. Visit our website at: www.logicalevents.co.uk
Last up-dated: 1 December 2016
© 2016 Copyright and disclaimer El.pub and www.elpub.org are brand names owned by Logical Events Limited - no unauthorised use of them or the contents of this website is permitted without prior permission.