XBRL: The Meaning of Information
Written by Andy Greener. Posted on February 19, 2009.
Andy Greener is a software architect at Her Majesty’s Revenue & Customs with responsibility for, among other things, interoperability and the use of standards. He is also strategy architect for HMRC’s Company Tax online service, which has mandated the use of XBRL for the accounts and tax computation components of all company tax returns filed after March 31, 2011. He is a Chartered Engineer and Chartered Information Technology Professional with over 27 years’ experience as a software engineer. You can contact Andy by email.
This is the first of a three-part post on XBRL and the Semantic Web. (In Part 2, published subsequently, Andy looks at how display- and meaning-oriented markup languages culminate in the Semantic Web, which opens the information content in web pages to intelligent applications. In Part 3, Andy discusses how, in the financial world, Inline XBRL serves both human and computer in ways best tailored to each of them.)
What is the meaning of information?
We’re all used to absorbing information from documents and drawing our own conclusions based on the apparent facts or patterns of information we see there. Indeed, many professions are founded on this useful ability of the human brain, honed to near perfection by years of education or training. But what are we actually seeing and understanding when we look at a document designed for human consumption? How do we actually get at the meaning? Before we examine this last question, let’s take a small diversion back to some "first principles."
Fundamentally, when we look at a document we’re looking at an organised collection of symbols laid out in two-dimensional form. We can discern some meaning just from the style, size, or position of these symbols. We hope at least some of the symbols are in an alphabet we recognise, and we make an assumption or educated guess about the direction of travel of our eyes in absorbing symbol sequences. There should be collections of symbols (words, numbers, or emoticons), collections of collections of symbols (sentences and paragraphs), punctuation marks, and so on. Some sentences may stand on their own in a larger or bolder font (titles or headings), some (numeric) symbols may sit alone, denoting a page or chapter number. The absence of symbols ("white space") may be as significant as some of the symbols themselves. All of this visual information provides cues for our attention and draws our eye (and therefore our brain) to the deeper and more meaningful information that the document is trying to convey.
Speaking of deeper meaning, the collections of alphabetic symbols probably represent words (which stand for concepts) in a familiar human language: words that are defined in a dictionary so that you can learn to associate meaning with those ordered collections of symbols. Some words may appear in a different dictionary altogether, and they may be subtly different visually as a result; we are all familiar with the italicised Latin phrases that pepper the text of the erudite (or pretentious!). Human beings are innately capable of organising collections of concepts (at least verbally) into structures that are governed by a set of rules (i.e., a grammar) and which can convey complex and subtle layers of deeper meaning as a result.
For a human being, then, understanding the meaning of a document involves many layers of (sometimes unconscious) information analysis, both visual and conceptual.
Imagine now that you are rendered blind. The two-dimensional nature of written documents is no longer apparent to you — those parts of your brain that unconsciously or consciously manage all the visual cues, from font size to paragraph layout, from section titles to numbers, are now bereft of input and, as a result, defunct. Instead, a colleague is going to read the document to you from start to finish, in serial form. What you are going to hear is a one-dimensional stream of concepts (words) and some supporting descriptions, thus: "page 1" – "start paragraph" – "What" – "is" – "the" – "meaning" – "of" – "information" – "question mark" – "end paragraph" – "start paragraph" … and so on.
Of course, you no longer need to discern some things from visual cues — the order of the words and punctuation is now self-evident, whether you’re listening to Chinese, Arabic, or English, and you probably no longer care about the physical page structure of the document. But you do need to know, for instance, which words need emphasis, how the words have been collected together into sentences and paragraphs, and which sentences are actually headings, sub-headings, or quotes. Your colleague may “adorn” the text with explicit instructions, such as new paragraph, and may use audible cues (such as raising or changing his voice) to imply emphasised words or quotations.
Run-of-the-mill computers are of course devoid of human senses (particularly the one we call "common") and need an especially pedantic form of the document "serialisation" illustrated above to make any sense of prose, even if only to re-create the two-dimensional visual form we humans take for granted. It is still beyond the capabilities of most computers to divine any kind of meaning from a stream of words, let alone the deeper meaning we regularly infer.
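The read-aloud serialisation in the thought experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `read_aloud`, the punctuation table, and the "start paragraph" / "end paragraph" wording are invented here to mirror the example above, and the tokenising rule is deliberately naive (it would split a word such as "We’re" at the apostrophe).

```python
import re

# Hypothetical spoken names for a few punctuation marks (an assumption,
# chosen to match the "question mark" in the thought experiment).
PUNCTUATION_NAMES = {
    "?": "question mark",
    ".": "full stop",
    ",": "comma",
    "!": "exclamation mark",
}


def read_aloud(paragraphs, page=1):
    """Flatten a list of paragraph strings into a one-dimensional stream."""
    stream = [f"page {page}"]
    for paragraph in paragraphs:
        stream.append("start paragraph")
        # Split into runs of word characters and single punctuation marks.
        for token in re.findall(r"\w+|[^\w\s]", paragraph):
            stream.append(PUNCTUATION_NAMES.get(token, token))
        stream.append("end paragraph")
    return stream


if __name__ == "__main__":
    for item in read_aloud(["What is the meaning of information?"]):
        print(item)
```

Everything the eye would have picked up from layout (the page, the paragraph boundaries, the question mark) has to be spoken explicitly, which is exactly the pedantry a computer needs.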
Our thought experiment illustrates what early typesetters in the printing industry referred to generically as "mark-up", a term that has found its way into the world of computer-based document rendering, most prominently as the ‘M’ in "XML" and "HTML." Interspersed in a stream of words are "instructions" that put the two-dimensional information back into the document, allowing a computer to "render" the serialised document onto a screen or a piece of paper in a form that, visually at least, we humans are familiar with.
I’m not going to delve into the minutiae of mark-up here, but suffice it to say that when you take a peek at the source of a web page, or any XML or XBRL document, all that stuff inside the angle-brackets is mark-up: instructions for a computer that make some sense of the document content. But, just like the layers of meaning that we perceive when we read a document, there are different kinds of mark-up, each with a different job to do.
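To make the split between mark-up and content concrete, here is a minimal sketch using the `HTMLParser` class from Python’s standard library. The class name `MarkupSplitter`, the helper `split_markup`, and the sample fragment are all made up for illustration; a real web page, XML, or XBRL document would need a good deal more care.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Separate the 'instructions' (tags) from the words they surround."""

    def __init__(self):
        super().__init__()
        self.instructions = []  # what was inside the angle-brackets
        self.words = []         # the document content itself

    def handle_starttag(self, tag, attrs):
        self.instructions.append(f"start {tag}")

    def handle_endtag(self, tag):
        self.instructions.append(f"end {tag}")

    def handle_data(self, data):
        self.words.extend(data.split())


def split_markup(document):
    """Return (instructions, words) for a marked-up string."""
    splitter = MarkupSplitter()
    splitter.feed(document)
    splitter.close()
    return splitter.instructions, splitter.words


if __name__ == "__main__":
    # Invented sample fragment, not taken from any real page.
    sample = "<h1>What is the meaning of information?</h1><p>We are <em>all</em> used to it.</p>"
    instructions, words = split_markup(sample)
    print(instructions)
    print(words)
```

The first list holds the instructions a renderer would act on (a heading, a paragraph, an emphasised word); the second holds the stream of words that a listener in the thought experiment would hear.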
In Part 2 of this post, to be published next week, Andy will discuss how display- and meaning-oriented markup languages culminate in the Semantic Web, which opens the information content in web pages to intelligent applications.

