El.pub Analytic Issue Number 6
Contents: The problem | Hypertext, SGML and search engines | Mixing tagging and linking | Navigating islands of information | Problem summary | The semantic web: what is it? | what will it do? | what do they say about it? | will it work? | Conclusions
So that computers can display web pages that bear some resemblance to nicely printed pages, tags are embedded in the text but do not appear on the screen. (If you are not familiar with this, click on View and then Source or Page source in your browser menu bar. You will see the tagged version of the page and appreciate how much tagging is actually embedded.) The tags embedded in web pages fall into two classes. First, there are layout / processing tags that tell the computer what typefaces to use, where on the screen to place text and pictures, what plug-ins to use to render animations and so on. Second, there are metadata tags that say something about the page, such as who wrote it, and may include keywords describing the content; these can be used by other programmes such as search engines.
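The two classes of tag can be illustrated with a minimal sketch in Python using the standard library's html.parser module. The sample page, the class name MetaTagCollector and the author name are invented for illustration; the sketch treats only meta tags carrying a name attribute as metadata and simply counts everything else as layout / processing tags.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs and count all other start tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}    # metadata tags: name -> content
        self.other_tags = 0   # everything else (layout / processing)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name"):
            self.metadata[attributes["name"].lower()] = attributes.get("content") or ""
        else:
            self.other_tags += 1


# Hypothetical page, written only for this example.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Clinic opening hours</title>
    <meta name="author" content="A. N. Author">
    <meta name="keywords" content="treatment, medicine, therapy">
  </head>
  <body>
    <h1>Opening hours</h1>
    <p><font face="Arial">Monday to Friday</font></p>
  </body>
</html>
"""

collector = MetaTagCollector()
collector.feed(SAMPLE_PAGE)
print(collector.metadata)    # what a search engine might read
print(collector.other_tags)  # number of non-metadata start tags
```

None of the metadata collected here appears on the screen, which is the point made above: it is there for programmes, not for the reader.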
The main tagging system used at present is HTML. However, HTML has a restricted set of tags and is being replaced by XML, which has no such inherent limitation but works only in the presence of a browser or other programme that recognises the tags used and has instructions on what to do with them.
In essence the semantic web is a move to vastly increase the use of metadata tags in web pages, so as to expand the processing that can be carried out on the tag contents. The semantic web is not an attempt to use the actual content that appears on the screen for further processing. The tags related to a particular page do not necessarily have to be stored with the web page, nor to appear in its source document. The information is intended to be used by computer programmes, not by humans. In particular, the reader of a page is not intended to see the tag information, although what they see may well be influenced by the tagging. (This is already the case where, for example, the tagging provides a text alternative to a picture, for use when a particular browser is unable to display the picture format used.)
At one level the semantic web can be viewed as an attempt to add information to web pages, consisting to all intents and purposes of structured database records, in order to improve the use of the pages. Indeed, one might say that this is the dual of the view that web pages are simply poorly structured database records with a lot of comments (the content) embedded in them.
The success of the approach will ultimately be judged on the value of the applications built to use the metadata that is created.
It is when we get to some ideas about the applications that are proposed that some of the protagonists ascend into a cloud of hype.
An immediate application that is proposed is to aid interoperability between sets of metadata. It is recognised that XML tags can be created for different purposes by different groups, yet may represent the same meaning (which is one of the sources of the description "semantic web"). At the simplest level this might allow the translation of dates in tags from one format to another. The metadata would identify the format in the tag, and this information might be used by a query programme to compare dates held in different web pages using different XML tag sets. At a more complex level, translation between different e-commerce systems might be possible, for example translating between different national standards used in tendering. (Translation is meant in the technical sense, not language translation: for example, whether safety requirement 96 in country A corresponds to environmental requirement x-25 in country B.)
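The date example can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than any proposed standard: the format attribute, the tag names (closing, deadline), the two XML fragments and the FORMAT_PATTERNS table are all hypothetical. Only the standard library's xml.etree.ElementTree and datetime modules are used.

```python
from datetime import date, datetime
import xml.etree.ElementTree as ET

# Hypothetical mapping from format labels (as declared in the tags)
# to the patterns understood by datetime.strptime.
FORMAT_PATTERNS = {
    "yyyy-mm-dd": "%Y-%m-%d",
    "dd/mm/yyyy": "%d/%m/%Y",
    "mm/dd/yyyy": "%m/%d/%Y",
}


def read_date(xml_text: str, tag: str) -> date:
    """Return the date held in the first <tag> descendant, using its 'format' attribute."""
    element = ET.fromstring(xml_text).find(f".//{tag}")
    if element is None:
        raise ValueError(f"no <{tag}> element found")
    pattern = FORMAT_PATTERNS[element.get("format")]
    return datetime.strptime(element.text.strip(), pattern).date()


# Two pages using different (invented) tag sets and different date formats.
page_a = '<tender><closing format="dd/mm/yyyy">03/07/2001</closing></tender>'
page_b = '<offer><deadline format="yyyy-mm-dd">2001-07-03</deadline></offer>'

closing = read_date(page_a, "closing")
deadline = read_date(page_b, "deadline")
print(closing == deadline)  # both tags denote the same day
```

The query programme still has to be told that closing and deadline are comparable tags; declaring the format solves only the simplest part of the interoperability problem.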
The interoperability applications spread into improving search engine performance. Keywords are used in metadata at present without any very clear means of deciding whether a taxonomy used by one set of authors bears much resemblance to that used by another. If the taxonomies were identified and codified elsewhere (such as the Dewey classification, or NASDAQ codes) then some interoperability might be achieved.
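A minimal sketch of what codifying taxonomies elsewhere might mean in practice: each group of authors publishes a crosswalk from its own keywords to codes in a shared scheme, and pages are compared on the codes rather than on the words. The keywords, the codes (HEALTH-01 and so on) and the function names are invented for illustration and do not correspond to Dewey, NASDAQ or any real scheme.

```python
# Hypothetical crosswalks: each site's local keywords -> codes in a shared scheme.
SITE_A_TO_SHARED = {
    "physiotherapy": "HEALTH-01",
    "e-publishing": "PUB-07",
}
SITE_B_TO_SHARED = {
    "physical therapy": "HEALTH-01",
    "electronic publishing": "PUB-07",
    "tendering": "COM-03",
}


def shared_codes(keywords, crosswalk):
    """Map local keywords to shared codes, ignoring keywords with no known mapping."""
    return {crosswalk[k.lower()] for k in keywords if k.lower() in crosswalk}


def common_subjects(keywords_a, keywords_b):
    """Return the shared codes that a site A page and a site B page have in common."""
    return shared_codes(keywords_a, SITE_A_TO_SHARED) & shared_codes(keywords_b, SITE_B_TO_SHARED)


overlap = common_subjects(["Physiotherapy", "clinic"], ["physical therapy", "tendering"])
print(overlap)
```

The two pages share no keyword, yet the comparison finds a common subject. Note that "clinic" is silently dropped because nobody mapped it, which is where the cost of maintaining such crosswalks begins to show.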
Another type of application is to improve such systems as access control. The W3C proposes to use semantic web techniques to control access to individual web documents (on their site) by individual users.
All this sounds amazingly rational and low key. Here are some quotes from recent documents on the semantic web and related subjects (see further reading for the refs.):
On the Semantic Web, the target audience is the machines rather than humans. To satisfy the demands of this audience, information needs to be available in machine-processable form rather than as unstructured text. [ref. 4]
The Semantic Web concept is to do for data what HTML did for textual information systems: to provide sufficient flexibility to be able to represent all databases, and logic rules to link them together to great added value. ... The Semantic Web is primarily a tool for interoperability. [ref. 5]
So, when we talk about extra data that is meaningful to tools, we sometimes describe that information as being semantic (semantic just means "meaning").... So now that we have defined "semantic metadata" as being extra information about a page that allows tools to do interesting things, let's discuss the different ways that this meaningful information gets created. [ref. 8]
To date, the Web has developed most rapidly as a medium of documents for people rather than for data and information that can be processed automatically. The Semantic Web aims to make up for this. [ref. 6]
The problems start to arise when people let their imagination go and then we get quotes like:
An important feature provided by the Semantic Web is the ability to respond to the query "What is the basis for that statement?"; to help a user to answer the question "Do I believe that?", or alternatively, "How much risk is there to my achieving my objectives if I act on the basis of that statement?". The language of the Semantic Web must thus be able to represent information destined for automated processing. [ref. 5]
The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. [ref. 6]
Until anyone can create metadata about any page and share that metadata with everyone, there will not truly be a "semantic web". [ref. 8]
Note that these more futuristic views are largely drawn from other sections of the same documents quoted earlier.
One could say that the semantic web is just the planning for the real work with XML. There is no doubt that there will be applications of XML in which the web is extended by the well-thought-out addition of metadata, and that some of those applications will make a major contribution to improving the power of the web. The critical factors, however, are scalability and generality.
"As the volume and complexity of digital information increases, human labour is not scalable to most aspects of a KM [knowledge management] solution. On the other hand, the available KM and search technology is simply not sophisticated enough to permit a fully automated answer," says Katherine Adams [ref. 1].
The semantic web will rely on the addition of metadata on a huge scale. Substituting 'metadata' for 'KM' in this quotation seems to me to be reasonable, and the scalability problem it then describes is not being addressed by the proponents of the semantic web.
A question that is also not addressed is whether the web is the right place to look up database style information. In the recent Scientific American article on the semantic web [ref. 6] the authors say:
The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. Such an agent coming to the clinic's Web page will know not just that the page has keywords such as "treatment, medicine, physical, therapy" (as might be encoded today) but also that Dr. Hartman works at this clinic on Mondays, Wednesdays and Fridays and that the script takes a date range in yyyy-mm-dd format and returns appointment times. ... Instead these semantics were encoded into the Web page when the clinic's office manager (who never took Comp Sci 101) massaged it into shape using off-the-shelf software for writing Semantic Web pages along with resources listed on the Physical Therapy Association's site.
The trouble with this example is that if the web page includes information about Dr. Hartman it presumably also includes information about a lot of other staff, about treatments, about the location, etc. etc. Is this replicated on all the clinic pages? Who keeps it up-to-date? Should this sort of information even be in web pages? It is well to remember that one of the main reasons for introducing databases back in the late 60s was to reduce the number of copies of a data item to one, so that there were not lots of out-of-date versions of a fact such as a customer's address floating around the company's computer files.
It is doubtful whether anyone would really add this level of metadata to a web page. As Joshua Allen says [ref. 8]:
The final scalability challenge threatening to cloud metadata's bright future is the probability that the amount of metadata being generated will be far greater than the data being tagged, and the number of queries made against metadata could exceed the number of queries made to request the related web pages.
The other problem is generality. Will there be a mega-browser supporting all these different XML metadata systems, or will there be a never-ending list of new plug-ins? It is more likely that the applications will run against metadata on a specific set of web sites connected with a particular institution like IBM, MIT, or W3C. The question that then arises is whether those institutions will consider it valuable to make their systems interoperate. A standard for storing metadata about databases, IRDS, was created in the 80s to support cross-database access but, as far as I know, it was never widely used.
The XML community are keen to blame the failure of SGML on its complexity and contrast the "simplicity" of XML. In fact anything that can be done with XML could have been done with SGML. The real problem with SGML was that the cost of marking up information and inserting it into documents was too high to be covered by the added value generated by the results. Publishers found it cheaper to use people than computers to generate page layouts.
Comments on the content, style and analysis are welcome and may be published; send them to: mailto:firstname.lastname@example.org
URL: download the WinWord document, from: http://www.elpub.org/analytic/analytic06.doc
Edited by: Logical Events Limited - electronic marketing, search engine marketing, pay per click advertising, search engine optimisation, website optimisation consultants in London, UK. Visit our website at: www.logicalevents.org
Last up-dated: 29 June 2018
© 2018 Copyright and disclaimer El.pub and www.elpub.org are brand names owned by Logical Events Limited - no unauthorised use of them or the contents of this website is permitted without prior permission.