Considering Alternate Representations of XBRL

Written by Kurt Cagle     Posted on June 9, 2009

Kurt Cagle is the managing editor of XML Today and is a contributing editor to O’Reilly Media, DevX, and TechNewsWorld.  He is working with Diane Mueller on a general XBRL book for O’Reilly Media to be published in early 2010. Mr. Cagle runs a consulting firm, Metaphorical Web, dealing with future technology issues, XML, distributed computing, and the like. He can be reached by email.  

I’ve been working with a fair number of raw XBRL files recently, both as research for a book and because my interest in the technology comes not from its use as an accounting format but from the fact that it is an XML format. I suspect, given everything else, that this is a somewhat unusual perspective. An accountant sees an XBRL "document" as a set of "facts" about a given company, and is largely unconcerned about the internal representation of the format because, all other things being equal, this representation is simply "code."

An XML programmer, on the other hand, sees the world a little bit differently. Most XML programmers are not domain experts (save in their own domain of working with code). For them, XML is something that can be queried, transformed, refactored, and bound to all kinds of external representations…something that can be served up on the Web and stored within specialized kinds of databases. The specific element names are pretty much unimportant (for the most part); it’s the broader structural characteristics of the XML that they tend to work with most heavily.

Given that perspective, the XML developer’s view of XBRL is, to put it bluntly, rather grim. XBRL was designed deliberately to have as little structure as possible, with most of this structure added in via linked lists. Structure does exist, of course, but it’s all referential. For instance, XBRL has the notion of a context, in which all data items from a given report each have a context attribute which points to a corresponding data structure that gives details about the context: the reporting period, reporting agency, any specifics about the preparer, and so forth.  Any XML designer would likely fold up all of the key information about that context into a contained element, with an appropriate identifier, and any elements that span contexts could then reference the primary context as an IDREF.
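
A minimal sketch of that folding, using Python's standard-library ElementTree. The instance is hand-written for illustration: the `ex` namespace, its concepts, the `ACME` identifier, and the `<report>`/`<fact>` output layout are all assumptions, and the sketch only handles contexts with an instant period.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"

# A tiny hand-written instance; the "ex" namespace and its concepts are made up.
INSTANCE = """\
<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="FY2008">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/ids">ACME</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:instant>2008-12-31</xbrli:instant>
    </xbrli:period>
  </xbrli:context>
  <ex:Assets contextRef="FY2008" decimals="0">1000</ex:Assets>
  <ex:Liabilities contextRef="FY2008" decimals="0">400</ex:Liabilities>
</xbrli:xbrl>
"""

def fold_by_context(instance_xml):
    """Return a <report> tree in which each fact is a child of its context."""
    root = ET.fromstring(instance_xml)
    report = ET.Element("report")
    groups = {}
    for ctx in root.findall(f"{{{XBRLI}}}context"):
        group = ET.SubElement(report, "context", id=ctx.get("id"))
        group.set("entity", ctx.findtext(f".//{{{XBRLI}}}identifier"))
        group.set("instant", ctx.findtext(f".//{{{XBRLI}}}instant"))
        groups[ctx.get("id")] = group
    for fact in root:
        ref = fact.get("contextRef")
        if ref is None:
            continue  # contexts, units and the like carry no contextRef
        item = ET.SubElement(groups[ref], "fact", name=fact.tag.split("}")[1])
        item.text = fact.text
    return report

folded = fold_by_context(INSTANCE)
print(ET.tostring(folded, encoding="unicode"))
```

In the folded form a consumer reads a fact's period and entity from its parent element instead of chasing a reference elsewhere in the document.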

Similarly, linkbases, one of the central characteristics of XBRL, intermix a huge amount of information that needs to be processed and stored in memory in a far more efficient form, typically one that involves resolving three to four orders of links. Yet for some areas such as labels, the most likely putative cases for having more than one label per schema, localization, is generally better solved by having multiple localization files, each of a different language, with the mappings contained in a direct one-to-one manner between schema name and label all the terms in that language. When multiple instances do arise, the XML can be modified accordingly with secondary attributes to discern this information.
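
To make the localization idea concrete, here is a rough sketch of one flat label file per language, each mapping a schema name directly to a label. The `<labels>`/`<label>` layout and the `label_for` helper are hypothetical, invented for this illustration; they are not part of XBRL.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-language label files: one flat <label> per schema name.
LABEL_FILES = {
    "en": '<labels><label name="Assets">Assets</label>'
          '<label name="Liabilities">Liabilities</label></labels>',
    "de": '<labels><label name="Assets">Aktiva</label>'
          '<label name="Liabilities">Verbindlichkeiten</label></labels>',
}

def load_labels(label_xml):
    """Build a direct schema-name -> label mapping from one label file."""
    return {el.get("name"): el.text for el in ET.fromstring(label_xml).iter("label")}

def label_for(name, lang, default_lang="en"):
    """Look up a label; fall back to the default language, then to the name."""
    for candidate in (lang, default_lang):
        labels = load_labels(LABEL_FILES.get(candidate, "<labels/>"))
        if name in labels:
            return labels[name]
    return name

print(label_for("Assets", "de"))   # German label
print(label_for("Assets", "fr"))   # no French file, falls back to English
```

The lookup is a single dictionary access per language, with no link resolution in between.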

The idea in both of these cases is the same. If XBRL.org was to sanction multiple potential acceptable alternate representations of XBRL, each of which could, with acceptable XSLT transformations, be rectified into the canonical format with no loss of representation, such alternate representations would dramatically reduce both the size and the complexity of XBRL documents from the standpoint of the Web developer. In some cases, especially those involving interbank transfer of information in Europe, the multidimensional nature of linkbases satisfies a very real need for hypercube-oriented processing. In situations like this, the alternative formats may not be tenable. But in most cases (such as the case of the SEC filing data), the hyperdimensional linkages used by canonical XBRL are overkill, and don’t in fact work terribly well on the Web.
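
The transformation back to canonical form would normally be an XSLT stylesheet. Purely as an illustration, the sketch below does the same job in Python with ElementTree standing in for XSLT. The nested "alternate" layout, the `ex` namespace, and the `to_canonical` function are all assumptions, and the output omits pieces a real instance needs (schema reference, units), so treat it as the shape of the idea and not a valid filing.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"
EX = "http://example.com/taxonomy"  # made-up taxonomy namespace
ET.register_namespace("xbrli", XBRLI)
ET.register_namespace("ex", EX)

# Hypothetical alternate representation: facts nested inside their context.
ALTERNATE = """\
<report entity="ACME" scheme="http://example.com/ids">
  <context id="FY2008" instant="2008-12-31">
    <fact name="Assets" decimals="0">1000</fact>
    <fact name="Liabilities" decimals="0">400</fact>
  </context>
</report>
"""

def to_canonical(alternate_xml):
    """Unfold the nested form into contexts plus facts that point at them."""
    report = ET.fromstring(alternate_xml)
    xbrl = ET.Element(f"{{{XBRLI}}}xbrl")
    facts = []
    for group in report.findall("context"):
        ctx = ET.SubElement(xbrl, f"{{{XBRLI}}}context", id=group.get("id"))
        entity = ET.SubElement(ctx, f"{{{XBRLI}}}entity")
        ident = ET.SubElement(entity, f"{{{XBRLI}}}identifier",
                              scheme=report.get("scheme"))
        ident.text = report.get("entity")
        period = ET.SubElement(ctx, f"{{{XBRLI}}}period")
        ET.SubElement(period, f"{{{XBRLI}}}instant").text = group.get("instant")
        facts.extend((group.get("id"), f) for f in group.findall("fact"))
    # Emit the facts after all contexts, each carrying a contextRef.
    for ctx_id, fact in facts:
        item = ET.SubElement(xbrl, f"{{{EX}}}{fact.get('name')}",
                             contextRef=ctx_id, decimals=fact.get("decimals"))
        item.text = fact.text
    return xbrl

canonical = to_canonical(ALTERNATE)
print(ET.tostring(canonical, encoding="unicode"))
```

Every value in the nested form (context id, entity, period, fact name, value) reappears in the flat form, which is the "no loss of representation" condition for this small subset.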

This would have additional implications. Functions, which are currently hideously complex, would end up being made much simpler in alternative formats, especially if you can bind them to specific representations that would be appropriate to the format at hand. An XQuery representation, for instance, would work especially well for XBRL stored within XML databases. In a similar manner, such an approach could be used to build JSON representations of data with JavaScript functional layers, something that would go a long way to making XBRL appealing to the Web 2.0 crowd.
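
As a rough idea of what a JSON representation might look like, the sketch below flattens the facts of a tiny hand-written instance into a JSON array. The key names (`concept`, `context`, `value`), the `ex` namespace, and the `facts_to_json` helper are my own assumptions for illustration; no standard JSON layout is implied.

```python
import json
import xml.etree.ElementTree as ET

# A tiny hand-written instance; the "ex" namespace and its concepts are made up.
INSTANCE = """\
<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="FY2008">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/ids">ACME</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2008-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <ex:Assets contextRef="FY2008" decimals="0">1000</ex:Assets>
  <ex:Liabilities contextRef="FY2008" decimals="0">400</ex:Liabilities>
</xbrli:xbrl>
"""

def facts_to_json(instance_xml):
    """Serialize every element that carries a contextRef as a JSON object."""
    root = ET.fromstring(instance_xml)
    facts = [
        {
            "concept": el.tag.split("}")[1],  # local name without namespace
            "context": el.get("contextRef"),
            "value": el.text,
        }
        for el in root
        if el.get("contextRef") is not None
    ]
    return json.dumps(facts, indent=2)

print(facts_to_json(INSTANCE))
```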

The language certainly has most of the mechanisms in place to handle the most complex use cases, but it needs to be able to scale down in those cases where it has less need for the complex processing. Without that ability, other potential standards (such as the US National Information Exchange Model, or NIEM) may render it obsolete quickly. (I plan a discussion of a NIEM-based XBRL for a future post.)

Like many standards, XBRL needs to be cognizant of changing technologies and requirements, or it runs the risk of becoming ossified, something which benefits no one in the long run.


6 Responses to “Considering Alternate Representations of XBRL”

  1. bex Says:

    “If XBRL.org was to sanction multiple potential acceptable alternate representations of XBRL, each of which could, with acceptable XSLT transformations, be rectified into the canonical format with no loss of representation”

    If we can render any limited form into the canonical form, why is there a need to allow for limited formats? It seems, at first glance, that preparers are not benefited by this rule. You are adding the complexity of format choice (and the potential for different reports or periods to be in different formats) while requiring the ability to always be able to produce a different format. Given that there are no restrictions on the intermediate steps that lead to an XBRL filing why shouldn’t a preparer choose whatever intermediate format they want, but still be required to submit canonically?

    Consumers also do not seem to get much benefit. Consumers would now have to have engines capable of reading all formats possible, or worse have to enter into format negotiations (with the end-game always being the canonical form).

    I understand the desire to fix the programmatic aspects of XBRL (working with the files can get tiring); however, something with so few “stakes in the ground” must wind up over-engineered. Even XBRLS, as I understand it, only has the goal of simplifying XBRL, not trying to create alternative formats. (I have only lightly read the XBRLS documents – so take this statement with a large grain of salt.)

  2. Kurt Cagle Says:

    The key in the above statement is the word “transformations”. This is one of the things that non-XML people working with XBRL often fail to understand – perhaps one of the most significant facets of XML is the fact that there exists a language that is specifically designed to transform one XML format to another – XSLT. An XSLT can be thought of as a filter that can be applied programmatically. The key point here is that if you establish both a canonical and one or more alternative XBRL formats, the alternatives should each be clearly defined such that, when an appropriate XSLT transformation (or stylesheet, as such documents are known) is applied, it can produce the canonical XBRL.

    I actually do this all the time with my own work – take the SEC XBRLs (which can be large, complex and hyperlinked) and run them through a transformation that will generate a format that is much more readily useful. Moreover, the transformation can perform additional operations, such as validating the content against established business rules in other formats (such as Schematron), creating visual presentations of the content, and so on. What’s more, you can also extend XSLT, so that if you add new elements into an established XBRL, you can import the XSLT that supports these new elements as a subordinate file.

    So, to put this into a workflow – company A works with alternate XBRL-A internally in order to work as efficiently as possible with their underlying data models. When the time comes to submit their XBRLs, they could apply the requisite transformations that will generate the output either as a series of XBRL documents or as a zipped resource containing same, and either post or send off the appropriate output.

    Similarly, if a given person needed company A’s financials in canonical format, then all they would need to do would be to submit the XBRL-A to an open transformation web service that would return the results in an XBRL-core document. Similarly, the SEC or other regulatory body could apply the same set of transformations on XBRL-A to get back their canonical format. The XSLT tools are freely available and open source, the stylesheets themselves would be the responsibility of the format designers to provide. Moreover, by going with an XSLT approach, existing tools could incorporate an XSLT engine (if it doesn’t already have one – many already do) and could then load the associated stylesheets as appropriate to handle the processing without changing the underlying XBRL processing engine.

    I’d also raise questions about consumers not profiting. First, the question really comes down to which consumers you’re talking about. Suppose you can create a format that encodes information more efficiently for the 95% of companies that do not require special use cases, that binds readily to different presentation outputs, and that needs nothing more than open source XSLT tools that can readily be run on a server. An analyst could then use a web browser to view the appropriate output, and with XForms (another open standard tool) could even create useful XBRL instances from that same browser. XML database users could store and search this information more efficiently if there’s more underlying schematic structure, and could generate reports analysing multiple XBRLs as appropriate.

    The big point here is that a decision was made about ten years ago to express this information in XML, but it was XML that at the time lacked most supporting tools such as XSLT, XQuery or XForms (not to mention the whole raft of Semantic Web related tools such as RDF, RDFa and SPARQL), and it was also informed heavily by database vendors that wanted a piece of the XBRL action. Now, most of those same vendors have XML Databases that are much more readily suited for handling structured content as well as these tools, there’s a decade of revised methodology concerning the best use of XML, and processing power has improved by a factor of a hundred or more.

    Many of the big headache issues that have faced the XBRL.org committee could be readily solved if they simply admitted the obvious – that XBRL IS XML, not some weird variant, and that the tools exist, are freely available, are powerful, and are preferable to the cumbersome solutions put in place to try to turn XBRL into its own meta-language.

  3. Evan Lenz Says:

    Nice article, Kurt. I couldn’t agree more. I haven’t heard from XML-in-Practice 2009 yet, but I’m hoping to present on this very topic. Here are the last two paragraphs of my conference submission (since I’m lazy):

    “While XBRL uses XML technologies through and through (XML for instance documents, XML Schemas for defining taxonomies, and XLink for linkbases), it uses such a high degree of indirection and abstraction that much of XML’s lauded human-readability is missing. This is a problem: XBRL’s complexity runs the risk of limiting its promise for promoting financial transparency and innovative uses of public financial data.

    I will introduce a set of XSLT stylesheets that convert XBRL taxonomies and linkbases into an intermediate XML format that is more amenable to various kinds of processing, including display rendering and queries over an XML database. XML ‘views’ such as these may pave the way for XML developers to more easily leverage XBRL in their applications.”

    In 2001, I explored this briefly using an old version of XBRL, but I haven’t yet published anything for XBRL 2.1 (the idea being that I’d do that for the conference in September).

  4. bex Says:

    Kurt, I agree that XBRL could be rendered in a simpler XML form for many organizations. However, that issue and the issue of what forms should be blessed by XBRL.org are not necessarily related.

    The heart of the issue is in these statements:

    “[A] decision was made about ten years ago to express this information in XML, but it was XML that at the time lacked most supporting tools such as XSLT, XQuery or XForms …

    “Many of the big headache issues that have faced the XBRL.org committee could be readily solved if they simply admitted the obvious – that XBRL IS XML, not some weird variant, and that the tools exist, are freely available, are powerful, and are preferable to the cumbersome solutions put in place to try to turn XBRL into its own meta-language.”

    These statements call upon XBRL.org to perform a major overhaul. While this would clean up the coding, I believe it would push XBRL to think about the information from a different paradigm, one that is more fact-oriented and less transformational (from the point of view of paper filings).

    However, this is not the same as simply adding additional formats. I think having multiple formats is a mistake if they are all functionally the same. Nothing stops the company in your example from working in whatever intermediate form they wish. [Taken to an extreme, one could argue that the accounting system of a company is their pre-XBRL intermediate form.] However, when they communicate with the outside world they should use the standard format. Similar to the various lingua franca that have been used over time, a single XBRL standard allows companies to learn two things (XBRL and their intermediate internal form), not many things (XBRL.org’s plethora of proposed/adopted forms).

  5. Kurt Cagle Says:

    beX,

    I’ve been thinking about this for some time after I wrote the initial post, especially with the comments given here and additional conversations that I had at the Semantic Technology Conference this week. Dave Raggett, one of the luminaries of the XML world and someone quite skilled in Semantic Web technologies, had attempted to express the XBRL in RDF/OWL, which satisfies at least some of the linkage issues involved, but has found that the language had some fundamental problems even within that expression.

    From an ontological standpoint, this raises red flags to me, as it says that there are aspects of XBRL that have emerged that are conceptually ill-founded. Moreover, one of the more fascinating things I find is that there are few people that I have talked to within the XBRL.org community itself who are ultimately happy with the specification as it stands. Now to a certain extent unhappiness with specifications (even data definition specifications, which are by their very nature contentious) is par for the course … standards are compromise creations in which no aspect of the language will satisfy everyone. Yet what I hear of late is that most people dislike the spec, but are unwilling to change it because it has been too widely adopted.

    The XBRL and Semantic Web communities will be meeting in July 2009 (a couple of weeks away as I write this) in order to establish a dialog for trying to come up with a form of XBRL that can readily be expressed in OWL. I’ll be raising in a post soon a fairly dramatic notion that this provides a perfect opportunity – if it is handled well by the XML, Semantic Web and XBRL communities – to start down the path of a 3.0 version of the XBRL language, one that is RDF/OWL compliant, that has a clear reducible path to XML, and that still satisfies the needs of the accountants and regulators who are having to deal with the language. It may be backwards compatible with the 2.1 version, though I’m not really sure that this is a necessity.

    It would also be a good chance to open up the language to ontologists who have had to deal with similar issues, so that they can attempt to come up with solutions that better reflect the subtle differences between national accounting standards languages, align the presentation layer better with existing XML and W3C standards, and formalize what has already been occurring informally – the use of XPath/XQuery as a mechanism for functional definition.

  6. Eugene Weinstein Says:

    “… XBRL has the notion of a context, in which all data items from a given report each have a context attribute which points to a corresponding data structure that gives details about the context: the reporting period, reporting agency, any specifics about the preparer, and so forth. Any XML designer would likely fold up all of the key information about that context into a contained element, with an appropriate identifier, and any elements that span contexts could then reference the primary context as an IDREF.”

    Perhaps I am misinterpreting what you mean by “fold up”, but the context element’s @id is of type ID, while @contextRef on facts is of type IDREF.

    “Similarly, linkbases, one of the central characteristics of XBRL, intermix a huge amount of information that needs to be processed and stored in memory in a far more efficient form, typically one that involves resolving three to four orders of links. Yet for some areas such as labels, the most likely putative cases for having more than one label per schema, localization, is generally better solved by having multiple localization files, each of a different language, with the mappings contained in a direct one-to-one manner between schema name and label all the terms in that language. When multiple instances do arise, the XML can be modified accordingly with secondary attributes to discern this information.”

    Language is not the only variation of labels for the same element. Another factor is label role. For example, for a “Profit or Loss” element you can have a “positive” label (“Profit”) and a “negative” label (“Loss”). You can use terse labels for display on mobile applications. You can even create your own label roles. And even switching display between languages dynamically (without having to re-request or re-process anything) is more easily achieved with a single label linkbase. Nothing prevents you from creating a separate label linkbase file per language, role, or their combination, and only including those you are interested in in the discoverable taxonomy set.

    As far as other types of linkbases go, they are not necessarily restricted to linked lists (trees), as some allow cycles. The generic linkbase allows you to reference a lot more than just concepts and resources. It is used, for example, for the Formula linkbase, where you can find references to filters, variables, variable sets, etc.

    Of course, XBRL files that make up taxonomies and instances can be processed in XSLT and at times it is very useful. However, there are many other kinds of transformations or queries that cannot be done in XSLT. Standards for expressing business and workflow rules in XML are only beginning to emerge. Of course, there can be a simplified XBRL model, more suitable for XSLT or for running XQuery on, but I doubt it would be the canonical model (that is, roundtrip-transformable).
