XBRL and the Semantic Web
Written by Andy Greener Posted on February 26, 2009
Andy Greener is a software architect at Her Majesty’s Revenue & Customs with responsibility for, among other things, interoperability and the use of standards. He is also strategy architect for HMRC’s Company Tax online service, which has mandated the use of XBRL for the accounts and tax computation components of all company tax returns filed after March 31, 2011. He is a Chartered Engineer and Chartered Information Technology Professional with over 27 years’ experience as a software engineer.
In Part 1 of this three-part post, Andy looked at why it’s much more difficult for computers to discern meaning than it is for humans. In this second installment, he discusses how display- and meaning-oriented markup languages culminate in the Semantic Web, which opens the information content in web pages to intelligent applications. (In Part 3 published subsequently, Andy discusses how, in the financial world, Inline XBRL serves both human and computer in ways best tailored to each of them.)
Web pages are primarily meant for human consumption. The mark-up language HTML (and its cousin XHTML) was deliberately designed to let web page authors describe the look, feel and visual layout of a page for a web browser to render. But just as with our own visual analysis of a document, working out the meaning of the words or figures on the page, or their relationship to embedded images or other documents, requires deeper analysis, and a level of mark-up beyond what HTML was designed to provide.
At the other end of the mark-up spectrum, XML (and therefore XBRL) is regularly used to mark up the deeper meaning (or "semantics") of documents, though usually data-rich documents rather than prose. Individual data items can be identified and related to the concepts they represent, and entire complex structures of information can be described in a way that a computer can make use of. The real "meaning" of the information may still not be fully apparent from the mark-up, but it gives more complex software systems a leg-up in understanding the information and applying ever more intelligent analysis or manipulation to it. Of course, most XML documents are actually generated by software applications for other software applications to consume (the ability of XML to serialise complex data structures for communication is one of its most useful traits), but at least the meaning is preserved in transit.
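To make the contrast between the two ends of the spectrum concrete, here is a minimal Python sketch using only the standard library. It parses the same figure twice: once as a display-oriented HTML table row, and once as a trimmed, XBRL-style fact. The `ex` namespace, the `Turnover` concept and the `FY2008`/`GBP` context and unit identifiers are hypothetical, and a real XBRL instance would also need a schema reference plus the context and unit definitions those identifiers point to.

```python
import xml.etree.ElementTree as ET

# Display-oriented mark-up: a browser can render this row, but nothing in it
# says what the number is, which period it covers or what currency it is in.
HTML_ROW = "<tr><td>Turnover</td><td>1,250,000</td></tr>"

# Meaning-oriented, XBRL-style mark-up (trimmed for illustration).
# The "ex" namespace and the Turnover concept are hypothetical.
EX_NS = "http://example.com/hypothetical-taxonomy"
XBRL_FRAGMENT = f"""
<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:ex="{EX_NS}">
  <ex:Turnover contextRef="FY2008" unitRef="GBP" decimals="0">1250000</ex:Turnover>
</xbrli:xbrl>
"""


def html_cells(html_text):
    """Return the raw text of each cell: display strings, no concepts."""
    row = ET.fromstring(html_text)  # this particular row is well-formed XML
    return [cell.text for cell in row]


def extract_facts(xml_text):
    """Return (concept, context, unit, value) for each fact in EX_NS."""
    root = ET.fromstring(xml_text.strip())
    facts = []
    for elem in root.iter():
        if elem.tag.startswith("{" + EX_NS + "}"):
            concept = elem.tag.split("}", 1)[1]  # drop the namespace URI
            facts.append((concept, elem.get("contextRef"),
                          elem.get("unitRef"), int(elem.text)))
    return facts


print(html_cells(HTML_ROW))          # ['Turnover', '1,250,000']
print(extract_facts(XBRL_FRAGMENT))  # [('Turnover', 'FY2008', 'GBP', 1250000)]
```

The HTML parse yields only strings that a human would interpret; the XBRL-style parse yields a value already tied to a concept, a reporting context and a unit, which is the "leg-up" described above.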
Why might it be useful to combine the two ends of the mark-up spectrum? Well, there is now a vast and mind-boggling variety of information available on the web, but much of it is marked up only as HTML and is therefore largely inaccessible to intelligent software applications, and most of it will remain so. Yet it is easy to see where more intelligent searching, or more intelligent aggregation of information, would be of huge benefit. If only intelligent software applications could read, understand and traverse the web as easily as human beings, but at lightning speed! The barrier is not the mechanics of retrieving web pages and traversing hyperlinks, but understanding the page content in the first place and knowing which links to follow.
This “spectral fusion” is already happening. It’s called the Semantic Web, and various standards have emerged that allow the mark-up of web pages to be extended so that semantic concepts and relationships can be identified and described. Human readers browsing a page remain oblivious to all this extra (hidden) mark-up, but its presence opens up the information content of web pages to intelligent applications, which in turn can largely ignore the HTML mark-up aimed solely at aiding human comprehension.
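As an illustration of how such hidden mark-up works, here is a minimal sketch in Python using the standard library's `html.parser`. It reads attributes in the style of RDFa (one of the W3C standards for this purpose): where an element has a `property` attribute, the sketch takes its machine-readable value from a `content` attribute if present, otherwise from the element's visible text. The page fragment and the `dc:` prefixes are illustrative assumptions, and the sketch does not resolve prefixes to full IRIs, handle nested properties, or implement the rest of RDFa processing.

```python
from html.parser import HTMLParser

# A fragment a browser renders as ordinary text; the property/content
# attributes are invisible to the reader. The "dc:" prefixes are illustrative.
PAGE = """
<p>Posted on <span property="dc:date" content="2009-02-26">February 26, 2009</span>
   by <span property="dc:creator">Andy Greener</span>.</p>
"""


class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None  # (property, tag) of the element currently being read
        self._depth = 0    # nesting depth of that tag, to find its real end tag
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._open is not None:
            if tag == self._open[1]:
                self._depth += 1
            return  # nested properties are ignored in this sketch
        a = dict(attrs)
        prop = a.get("property")
        if prop is None:
            return
        if "content" in a:
            # Explicit machine-readable value; the visible text is for humans.
            self.pairs.append((prop, a["content"]))
        else:
            self._open, self._depth, self._text = (prop, tag), 1, []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[1]:
            self._depth -= 1
            if self._depth == 0:
                self.pairs.append((self._open[0], "".join(self._text).strip()))
                self._open = None


def extract_properties(html_text):
    parser = PropertyExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.pairs


print(extract_properties(PAGE))
# [('dc:date', '2009-02-26'), ('dc:creator', 'Andy Greener')]
```

Note how the human-friendly date "February 26, 2009" and the machine-friendly "2009-02-26" coexist in the same element, each aimed at its own audience.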
It is probably fair to say that the techniques and standards of the Semantic Web are still in their infancy, and that many "specialist areas" have yet to be properly explored. One specialist area dear to our hearts here, of course, is business and financial information. When it comes to web publishing, a financial report is on an equal footing with a weather report, a news report, a scientific paper, a cooking recipe or even a blog. It contains information of interest to the human reader, and therefore, by definition, to any intelligent software application acting ultimately on that reader's behalf, or indeed autonomously.
The eXtensible Business Reporting (mark-up) Language (XBRL) was invented specifically to address the data-representation end of the spectrum for business and financial data. HTML, being subject-neutral, covers the publishing end of the spectrum for business and financial information just as it does for any other kind of content. How do we get the two to meet in the middle and provide the business and finance pieces of the Semantic Web jigsaw puzzle?
Step forward, Inline XBRL – the topic of my final post in this series, to be published next week.


March 4th, 2009 at 12:08 am
Great blog Andy…
Looking forward to your piece on Inline XBRL. I aim to demonstrate XBRL tagging (and Inline XBRL output) against the current HMRC Corporation Tax taxonomy next Wednesday (11th March) at the Clarity Systems Breakfast Seminar at The Royal Society. See link for details…
It would be great if you could make it.