How to Approach Learning XBRL

Written by Michal Piechocki     Posted on May 22, 2009

Michal Piechocki is CEO of the Business Reporting Advisory Group (BR-AG), an international XBRL advisory company. He is an XBRL International Steering Committee At Large Member and also serves as a Member to the XBRL Quality Review Team of the IASC Foundation. He will be conducting Taxonomy Development Training at the 19th XBRL International Conference in Paris. He can be reached by email.

Suppose learning XBRL was like baking a cake. What would be the recipe?

Start with taxonomies, extensions, instance documents, FRTA, FRIS, business reporting supply chain, dimensions, formulas, versioning, rendering, arcs, extended links, layers, references, contexts, units, and footnotes. Mix in modularization strategies, design approaches, and architecture variations. Spice it up with the technical jargon of the XBRL specifications, and top it off with the inherent complexity of the accompanying accounting standards.

The result: a ten-foot-high gastronomic wonder that XBRL experts have learned to consume.

But… for most users, must XBRL really be so complex?

All students of XBRL, whether eager to learn about the standard or obligated to prepare electronic financial reports, must contend with the labyrinth of baffling concepts used by XBRL experts. The complexity multiplies if you are not proficient in English and need to absorb the technical terms while digesting and applying the underlying knowledge at the same time.

Fortunately, there are several levels of understanding of XBRL, and, depending on your needs, you may not be required to tackle many of its thornier problems at all.

Let’s try to organize the XBRL Knowledge Domain.

First, we have several sources of information:

1.    Official XBRL specifications: recommendations, candidate and proposed recommendations, public working drafts, and requirements documents;
2.    Official XBRL best practices announcements;
3.    Publicly available white papers and educational materials;
4.    Official XBRL taxonomy websites (e.g. IFRS, US-GAAP, Basel II);
5.    Official XBRL project websites;
6.    Presentations and course materials from XBRL conferences;
7.    Commercial resources and training material.

Key Point: There is an abundance of XBRL explanatory and guidance materials. The majority of them are publicly available. Search and familiarize yourself with available resources.

Second, analyze knowledge requirements from the perspective of the user. The software vendor willing to build XBRL into its products will have demands distinct from those of an SEC filer obligated to prepare a financial report.

Here are the most common user groups from the business reporting supply chain (BRSC):

•    Preparers of reports (e.g., corporate entities);
•    Regulatory bodies (e.g., Securities and Exchange Commission);
•    Software providers (e.g., ERP system providers);
•    Financial services providers (e.g., auditors);
•    Data vendors (e.g., data agencies);
•    Investors and financial analysts (e.g., CFAs).

Each requires different levels of XBRL awareness, and the existing knowledge base provides public and commercial resources for all levels.

For example, preparers do not necessarily need to dive into the deep waters of XBRL specifications. On the contrary, they should be more focused on the scope of the data to be provided within the XBRL report. The complexity of the XBRL standard might therefore be hidden in appropriately constructed software.

The opposite approach should be taken by software vendors, who are encouraged to explore not only the specifications but also the reasoning behind them (to decipher this, historical mailing list discussions can be helpful). This approach should help developers prepare more accurate, valid, and efficient solutions.

Key Point: Place yourself on the BRSC and approach learning about XBRL from that perspective.

Third, get involved in international discussions and directly contact XBRL experts. The XBRL standard has been here for the past 10 years, and public mailing lists have thousands of individuals who are able to provide answers to your questions. XBRL International, together with regional jurisdictions and over 600 member entities, provides multiple opportunities to understand the standard comprehensively and apply practical solutions to streamline your daily financial routine.

Key Point: Don’t be afraid to reach out to the community or the organization itself. You may encounter individuals who will politely guide you simply to read the manual in case of extremely basic questions, but there is a much higher probability that you will find supportive members pointing you in helpful directions.

The XBRL knowledge domain is admittedly broad. However, step by step, you will find that the initially indigestible cake consists of hot raspberries at the bottom, vanilla ice cream in the middle, and chocolate at the top. It may actually be very tasty.

Welcome to the XBRL emporium!

Semantic XBRL Data Search Using SPARQL

Written by Ashu Bhatnagar     Posted on May 19, 2009

Ashu Bhatnagar is CEO of Good Morning Research, a Softpark company that specializes in building Semantic XBRL technology. The GoodMorningResearch.com machine automates XBRL tagging of Excel data in RDF format with one-click Save As XBRL functionality. Mr. Bhatnagar also moderates the Semantic XBRL group on LinkedIn. His earlier posts on the Semantic Web and XBRL are Introducing Semantic XBRL and Semantic XBRL Transparency, Verification, and ‘Raw Data Now’.

We all use Google for Web searches on a daily basis and admire the simplicity of its front-end user interface. It’s nice, clean, fast, and simple.

Behind this simplicity lie sophisticated index databases and advanced search technologies, but we as users don’t need to know or understand these. All we need to know are smart keywords that help direct our searches from hundreds of billions of marked-up HTML pages scattered across the global Internet.

When we try to search using regular SQL database search technologies, though, we run into difficulties. Why? Because most of this web content is in distributed HTML flat files and isn’t organized in any centralized database with well-defined data structures and schema. It’s like a world full of roads with no roadmaps. Go discover!

Search engines like Google, Ask, and others find the content that matches our queries by building and employing centralized databases that contain metadata, where every keyword acts as a tag and has fast and efficient links to corresponding websites. In other words, a search engine acts like a very knowledgeable guide for us, responding to our queries with found/not found answers based on the Internet roads it has access to and has crawled before.

Why not use such a powerful search front-end to query financial research data? In my experience working with both sell-side and buy-side research analysts, there has been a long-standing request to build such a tool, but until recently, the short answer to this request has been “No!”

No, because it’s technically too difficult or it’s too expensive.

No, because Google deals with text and not data, which has both context and meaning. Data is far more challenging to search, because even when it’s on the Web, it is marked up with HTML as text, not as data, thereby losing its context for meaningful search.

No, because there are no generally accepted standard financial dictionaries, or taxonomies, that define terms such as revenue, sales, or net income as synonyms.

Until recently this list of No’s has been long. The good news is that the list is now shrinking quickly with the increasing adoption of XBRL and EDGAR standard taxonomies and the release of several XBRL tools.

All that is needed to accomplish powerful search of financial research data is to subscribe to the SEC’s XBRL filings as free RSS feeds, extract XBRL data into our own relational or Google-like index databases, and use SQL to find answers to our queries. As an alternative, we could subscribe to third-party data services firms like Bloomberg, Thomson Reuters, Factset and others that would add XBRL data to their current aggregate data and continue to offer this as a service.
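The relational route described above can be sketched in a few lines. This is a minimal illustration, not a production design: the company names, concept names, and values are invented, the single `fact` table is an assumed schema, and the step that downloads and parses filings from the RSS feed is left out.

```python
import sqlite3

# Hypothetical facts, as they might look after extraction from XBRL filings.
# Entities, concepts, and values are invented for illustration only.
FACTS = [
    ("ExampleCo", "us-gaap:Revenues", "2008-12-31", "USD", 1200.0),
    ("ExampleCo", "us-gaap:NetIncomeLoss", "2008-12-31", "USD", 150.0),
    ("SampleInc", "us-gaap:Revenues", "2008-12-31", "USD", 800.0),
    ("SampleInc", "us-gaap:NetIncomeLoss", "2008-12-31", "USD", -40.0),
]


def build_fact_store(facts):
    """Load extracted facts into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact ("
        "entity TEXT, concept TEXT, period_end TEXT, unit TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?, ?)", facts)
    conn.commit()
    return conn


def net_margin_by_entity(conn, period_end):
    """Return (entity, net income / revenue) rows, highest margin first."""
    sql = """
        SELECT r.entity, n.value / r.value AS margin
        FROM fact AS r
        JOIN fact AS n
          ON n.entity = r.entity AND n.period_end = r.period_end
        WHERE r.concept = 'us-gaap:Revenues'
          AND n.concept = 'us-gaap:NetIncomeLoss'
          AND r.period_end = ?
        ORDER BY margin DESC
    """
    return conn.execute(sql, (period_end,)).fetchall()


conn = build_fact_store(FACTS)
rows = net_margin_by_entity(conn, "2008-12-31")
```

Because every company tags revenue with the same taxonomy concept, one query compares them all; without a shared taxonomy, the `WHERE` clause would need a list of synonyms per filer.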

The news gets even better when we add SPARQL, a W3C-specified query language for RDF, to XBRL and Linked Data.

Jim Rapoza, Chief Technology Analyst of eWeek, explains:

Called SPARQL (pronounced "sparkle"), this standard brings about a standardized SQL-like query language for the Semantic Web. And, like most Semantic Web standards, it is heavily based on RDF (Resource Description Framework), although it also makes use of many Web services standards, such as WSDL (Web Services Description Language).

SPARQL essentially consists of a standard query language, a data access protocol and a data model (which is basically RDF).

Some people out there are probably thinking, So what? Sounds like just another search tool—big whoop. But there’s a big difference between blindly searching the entire Web and querying actual data models.

The ability of database queries to pull data from giant databases is pretty much the basis of a large number of enterprise applications. No one argues about the value of being able to write a query in an application that can pull relevant customer and product data.

Now, imagine writing a similarly small application that does the same thing—only with data stored across the entire World Wide Web.

That would include all the companies that not only file in XBRL but also, in conformance with SEC requirements, will be posting XBRL data on their own company websites.

In essence, with SPARQL, we can choose to build centralized databases to query XBRL data, but we don’t have to. We can simply point our queries to so-called SPARQL endpoints. Unlike traditional database requests, which must stay under one administrative control, these queries can span thousands of company websites with XBRL data and obtain results as if they came from one centralized database. Imagine the cost savings in not having to build and maintain a huge and growing centralized database.
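To make the idea of pointing one query at many endpoints concrete, here is a small sketch. The endpoint URLs and the `ex:` vocabulary are hypothetical; whether a given company site exposes an endpoint at all is an assumption. The only standard piece relied on is that the SPARQL protocol accepts a query over HTTP GET in a parameter named `query`. The sketch only builds the request URLs and does not send them.

```python
from urllib.parse import urlencode

# Hypothetical vocabulary: ex:concept, ex:entity, and ex:value are invented
# property names, not part of any published XBRL-to-RDF mapping.
QUERY = """
PREFIX ex: <http://example.org/xbrl#>
SELECT ?company ?value
WHERE {
  ?fact ex:concept ex:Revenues ;
        ex:entity ?company ;
        ex:value ?value .
}
"""

# Hypothetical endpoints, one per company website.
ENDPOINTS = [
    "http://companya.example/sparql",
    "http://companyb.example/sparql",
]


def build_request_urls(endpoints, query):
    """Return one GET URL per endpoint, each carrying the same query."""
    encoded = urlencode({"query": query})
    return [f"{endpoint}?{encoded}" for endpoint in endpoints]


urls = build_request_urls(ENDPOINTS, QUERY)
```

A client would fetch each URL and merge the result sets, so the same query text serves every site and no central copy of the data is needed.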

Applications for publishing XBRL as Linked Open Data are limited at this time, but they are emerging. As one example, Roberto García and Rosa Gil describe their work undertaken at a Research Group at Universitat de Lleida, Spain, which extracted 1.34 million triples from 612 XBRL filings. (Triples are semantic data elements in RDF format.) The process of extraction is machine automated and results in transforming XBRL data into Semantic Web formatted RDF data.
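For readers new to RDF, the sketch below shows how a single reported fact might break down into triples of subject, predicate, and object. It does not reproduce the mapping used in the Lleida work; the `http://example.org/xbrl#` vocabulary and the fact values are invented, and every object is written as a plain literal for simplicity.

```python
from typing import NamedTuple

EX = "http://example.org/xbrl#"  # hypothetical vocabulary, not a published one


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def fact_to_triples(fact_id, entity, concept, period_end, unit, value):
    """Describe one reported fact as a list of triples sharing one subject."""
    subject = f"{EX}fact-{fact_id}"
    return [
        Triple(subject, EX + "entity", entity),
        Triple(subject, EX + "concept", concept),
        Triple(subject, EX + "periodEnd", period_end),
        Triple(subject, EX + "unit", unit),
        Triple(subject, EX + "value", str(value)),
    ]


def to_ntriples(triple):
    """Serialize a triple as one N-Triples line with a plain literal object."""
    literal = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{triple.subject}> <{triple.predicate}> "{literal}" .'


triples = fact_to_triples(1, "ExampleCo", "Revenues", "2008-12-31", "USD", 1200)
lines = [to_ntriples(t) for t in triples]
```

One fact yields five triples in this sketch, which gives a feel for how a few hundred filings can expand into more than a million triples.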

In addition, sufficient examples in the current Web exist to give us insight into how the user experience might look when Semantic XBRL applications go into production use. Next time you search for the best flight for your air travel on sites such as Orbitz, Kayak, or FareCompare, take a pause and observe that the flight schedules, prices and airline details are being pulled not from any one centralized database but from a variety of airline databases, in real time, to match your exact itinerary requirements, thanks to some very specialized and complex technologies.

In summary, SPARQL makes Semantic XBRL searches possible on-demand across a distributed web space while simplifying front-end design, and keeping the complexity of technology hidden and out of sight from end users.

A Google-like experience of searchable financial research data is coming. The future looks bright.

Background Information and Future Plans of the Bank of Japan’s XBRL Project

Written by Yoshiaki Wada     Posted on May 11, 2009

Yoshiaki Wada is currently Director and the head of the financial data center section of the Bank of Japan’s Financial Systems and Bank Examination Department. In this capacity, he is responsible for the operation of the bank’s financial database system and leading the development of the XBRL-based data collection system.

Almost six years have passed since I found XBRL: I joined the XBRL world at the 8th International XBRL Conference in Seattle in November 2003. Since then, I’ve discovered, as I am sure many readers of this blog have, that it hasn’t been easy to win broad appreciation of XBRL in the wider business community. I hope our experience in Japan may give you some ideas on ways to overcome the difficulties you may encounter in your individual projects.

Outline of the Project

The Bank of Japan (BOJ) is the central bank of Japan. Our prime missions are maintaining price stability and supporting the sound development of the Japanese economy. To achieve these missions, we closely monitor the soundness of financial service institutions (FSIs) in Japan. For timely monitoring, efficient data gathering from FSIs is essential, and XBRL has good potential for that purpose.

Although we decided to adopt XBRL, it was not clear to us what kind of business flow XBRL suits best, nor how much work implementing it would involve. In addition, the following three major issues had to be solved before actual implementation:

  • How to check whether XBRL is suitable for the BOJ’s data-gathering framework
  • How to make people aware of the merits of implementing XBRL
  • How to encourage people to use XBRL

The XBRL Pilot: A Step-by-Step Approach
In the first stage, only a few people on our team were available for the XBRL project. Because of limited budget and human resources, there were few options open to us, so we chose a step-by-step approach.

First, we started the pilot project with FSIs on a voluntary basis. In July 2004, an announcement soliciting voluntary pilot test participants was made on the BOJ’s website. We were lucky that 31 cooperative and courageous FSIs applied for this test of unfamiliar technology, because XBRL was almost unknown in the finance industry in Japan at that time.

To help them understand XBRL and the purpose of the test, members of our XBRL team visited the candidate FSIs, which were located all over Japan, one by one and gave hours of instruction on XBRL. The summer of 2004 was unforgettably hot for us, not only because of the elevated temperatures but also because of the psychological pressure to complete the first step of the project.

The contents of the test were rather simple. FSIs were requested to read the manual, install the necessary tools on their PCs, and then convert their monthly balance sheet data from Excel format into XBRL. Although limited manpower meant we could offer assistance only by telephone, all the FSIs completed the test program as planned.

As a consequence, we felt that any FSI could produce XBRL data provided they were given a user-friendly manual and easy-to-use tools. In addition, the IP-VPN system which was under construction for different purposes seemed to be suitable for sending metadata to FSIs and gathering XBRL data from them.

The most important result of the pilot test was that we found we could design the basic reporting scheme based on XBRL with complete confidence that XBRL was suitable for us.

How the BOJ’s XBRL Scheme Works
The BOJ’s XBRL reporting scheme is based upon an interactive network system called the BOJ-Info system. The BOJ-Info system connects the BOJ and FSIs via IP-VPN, enabling both sides to send and receive any electronic data set smoothly and securely.

The resulting business flow works as follows.

(1) The BOJ prepares metadata such as the taxonomies required for reporting work. This metadata is uploaded to the BOJ-Info system.

(2) The FSI downloads the metadata via BOJ-Info and stores it on a PC which contains the BOJ’s tools for producing XBRL data.

(3) The FSI then imports report data previously created with Excel into the PC and creates data in XBRL format.

(4) Having checked and corrected any errors using formula linkbase, the FSI sends the XBRL data to the BOJ via the BOJ-Info system.

(5) The BOJ stores the XBRL data sent by the FSIs in a database. After the data is checked again for errors in the database, it is used for monitoring and producing statistics.

This reporting scheme has the following unique features.

(1) All necessary tools, the taxonomy, and the BOJ-Info system are supplied by the BOJ for free. The tools are designed to be easily added to the FSIs’ current systems and do not cause any unexpected implementation cost for FSIs.

(2) The XBRL data generating tool is very easy to use, so FSIs can generate the XBRL instance file without any outside help. Consequently, data security can be maintained.

(3) By using formula linkbase, data accuracy is improved. The cost of data handling is minimized for both FSIs and the BOJ.

One of the most important features of our scheme is formula linkbase. When we decided to adopt formula linkbase, its specifications had not been finalized and only the FDIC was implementing it. Why did we decide to use formula linkbase?

XBRL is a powerful tool for efficient data exchange. On its own, however, it is not useful enough for FSIs, because the benefit does not necessarily justify the work of generating XBRL data. To enhance the acceptance of XBRL in society, FSIs needed some benefit to offset the cost of generating the data. We searched for a solution and found the answer in formula linkbase.

Formula linkbase enables FSIs to validate data before submitting it to the BOJ, thus reducing the workload of correcting errors both for FSIs and the BOJ. Formula linkbase thus became core to the BOJ’s scheme in order to enhance and ensure the efficiency of XBRL.
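The kind of check that runs before submission can be illustrated with a plain Python sketch. This is not XBRL Formula syntax and not the BOJ’s actual rule set: the rule (assets equal liabilities plus equity), the item names, and the rounding tolerance are all assumptions chosen for illustration.

```python
def check_balance_sheet(facts, tolerance=0.5):
    """Return a list of error messages; an empty list means the check passed.

    `facts` maps hypothetical item names to reported amounts.
    """
    required = ("Assets", "Liabilities", "Equity")
    missing = [name for name in required if name not in facts]
    if missing:
        return [f"missing item: {name}" for name in missing]

    expected = facts["Liabilities"] + facts["Equity"]
    difference = facts["Assets"] - expected
    if abs(difference) > tolerance:
        return [
            f"Assets ({facts['Assets']}) differ from "
            f"Liabilities + Equity ({expected}) by {difference}"
        ]
    return []


good_report = {"Assets": 1000, "Liabilities": 900, "Equity": 100}
bad_report = {"Assets": 1000, "Liabilities": 900, "Equity": 90}

good_errors = check_balance_sheet(good_report)
bad_errors = check_balance_sheet(bad_report)
```

Running the same rules on the FSI’s PC and again at the receiving end means most errors are caught by the party best placed to correct them, which is where the workload saving comes from.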

What We Have Learned

Since February 2006, our XBRL scheme has been working without any major trouble. In addition, we have successfully revised and redistributed the current taxonomy and released a new range of the taxonomy.

Simple revision and distribution of taxonomies is a key factor for using XBRL. Please remember that once you start using XBRL, you must maintain the taxonomy until you decide to shift to a better technology than XBRL. It is therefore crucial to build a workable and efficient overall reporting scheme which includes developing taxonomies in a timely fashion, notifying FSIs of revisions and new releases of taxonomies, delivering taxonomies, and receiving reported XBRL data.

One of the most important successes in the past three years has been that, although XBRL was not mandatory in our reporting scheme, all FSIs submitted in XBRL format voluntarily. In other words, once XBRL reporting was ready, 100% of FSIs used it. A colleague said he could not believe it, because some banks in his country would never submit XBRL data even if required to do so by law.

What made it possible to achieve 100% XBRL-based submission? One of the key factors is undoubtedly formula linkbase. In a questionnaire survey on the usability of the BOJ’s XBRL tools and reporting scheme, conducted in August 2008, more than 70% of the FSIs recognized the merit of formula linkbase. In addition, many banks wanted formula linkbase to be extended to other reports.

This reveals an important aspect of technology implementation. We must always pay attention to benefits for users and try to build a popular scheme.

Looking back on our project, the following factors seem to be critical for the smooth implementation of XBRL:
-    Taxonomy with high maintainability
-    User-friendly tool
-    Well-designed reporting scheme
-    Well-organized project team

If any of these factors had been missing, our project could not have succeeded. However, to me, the last factor seems to be the most important.

BOJ’s XBRL team consists of five persons, three of whom have built all the taxonomies. They are always studying the latest technologies and trying to update their taxonomy library. In addition to the XBRL team, more than 20 people work together and cooperate to operate the database system, maintain the IT infrastructure such as BOJ-Info, and produce statistics. Without good teamwork, our breakthroughs could never have been achieved.

Future Plans
Having achieved a goal, one becomes greedy for new challenges. In our case, the XBRL-based data gathering scheme has been built. Although there is some room for improvement, basic functionality has been stable and the XBRL-based “bridge for data exchange” between the BOJ and FSIs has been realized. What, then, is the next target? We have two major objectives in mind:

First is an XBRL-based database. So far we have stored XBRL instance data in flat files, not in a native XBRL database. Currently, received XBRL instances are shredded into simple CSV that a conventional relational database can read, so much of the metadata goes unused. Our new database is a hybrid capable of supporting both XBRL and conventional data. The project started this spring, and live use is expected in September 2011.
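Shredding an instance into CSV might look like the sketch below. It is an illustration rather than the BOJ’s tool: the sample instance, the `ex` taxonomy namespace, and the entity identifier scheme are invented, and only contexts with an instant period are handled. Note what is dropped along the way (the concept’s namespace, the `decimals` attribute, and everything held in the taxonomy), which is the metadata loss described above.

```python
import csv
import io
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"

# A tiny, invented instance document; the ex: taxonomy is hypothetical.
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
            xmlns:ex="http://example.org/sample-taxonomy">
  <xbrli:context id="c1">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.org/fsi">0001</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2009-03-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="jpy"><xbrli:measure>iso4217:JPY</xbrli:measure></xbrli:unit>
  <ex:Assets contextRef="c1" unitRef="jpy" decimals="0">1000</ex:Assets>
  <ex:Liabilities contextRef="c1" unitRef="jpy" decimals="0">900</ex:Liabilities>
</xbrli:xbrl>
"""


def shred_instance(xml_text):
    """Flatten the facts of an instance into [entity, period, concept, unit, value]."""
    root = ET.fromstring(xml_text)

    # Index contexts by id so each fact can be joined to its entity and date.
    contexts = {}
    for ctx in root.findall(f"{{{XBRLI}}}context"):
        identifier = ctx.find(f"{{{XBRLI}}}entity/{{{XBRLI}}}identifier")
        instant = ctx.find(f"{{{XBRLI}}}period/{{{XBRLI}}}instant")
        contexts[ctx.get("id")] = (
            identifier.text.strip(),
            instant.text.strip() if instant is not None else "",
        )

    rows = []
    for elem in root:
        context_ref = elem.get("contextRef")
        if context_ref is None:
            continue  # contexts and units are not facts
        entity, period = contexts[context_ref]
        concept = elem.tag.split("}", 1)[1]  # keep the local name only
        rows.append([entity, period, concept, elem.get("unitRef", ""), elem.text.strip()])
    return rows


def to_csv(rows):
    """Write the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["entity", "period", "concept", "unit", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = shred_instance(SAMPLE)
csv_text = to_csv(rows)
```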

Second is a handy and capable XBRL data analysis tool. Using raw XBRL instances, various types of data analysis, such as cross-section analysis and the panel approach, can be carried out more easily. We expect that experts both in Japan and around the world will be working on such an innovative tool.

In my humble opinion, it is just this kind of global cooperation that will continue to move XBRL forward, making it a vital and necessary tool not only in Japan, but throughout the world. I look forward to our working together and sharing the knowledge we gain. 

Semantic XBRL Transparency, Verification, and ‘Raw Data Now’

Written by Ashu Bhatnagar     Posted on May 7, 2009

Ashu Bhatnagar is CEO of Good Morning Research, a Softpark company that specializes in building Semantic XBRL technology. The GoodMorningResearch.com machine automates XBRL tagging of Excel data in RDF format with one-click Save As XBRL functionality. Mr. Bhatnagar moderates the Semantic XBRL group on LinkedIn.

Is anyone interested in the transparency of financial data?

That’s obviously a silly question: institutional investors, market regulators, analysts, financial advisors, and even some sophisticated individual investors are very interested in greater transparency of financial data than what is currently available. This interest in transparency is driven not only by the need for keener insight but also by an interest in managing risks — particularly when such data is used to drive significant investment and trading decisions that affect trillions of dollars in wealth on global capital markets.

However, trust often ends up used as a proxy for transparency, replacing it either by choice (e.g., to protect company secrets) or because of practical considerations (like limited resources of time, money, or reporting skills). In addition, compared to transparency, trust is simple, convenient, and less work!

Trust can be broken, however, in cases of fraud like Bernie Madoff’s or inadequate financial models like those used to value and rate complex instruments, such as credit default swaps and mortgage-backed securities, before the real estate bubble burst. That’s when we are reminded of Ronald Reagan’s old maxim: trust but verify.

While Semantic Web and XBRL technology cannot provide a cure-all solution, they go a long way to integrating verifiable data into every step of financial reporting and organizing it for analysis and insight. Verifying financial data requires exposing transparency at every stage of the information supply chain as the data moves through it. The requirements are twofold:

(a) Make source data transparent in its raw format; and

(b) Make every step of any value-add process that normalizes and modifies this data transparent as well.

Examples of such value-add processes include normalizing and tagging raw data with metadata and taxonomy labels for comparability, applying relevant accounting adjustments, normalizing currencies and reporting periods, and adding other business rules and assumptions.
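One way to picture the two requirements together is a value that carries its raw source figure and a log of every adjustment applied to it. The sketch below is only an illustration of that idea, not a Semantic Web implementation: the class name, the step descriptions, and the exchange rate are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TracedValue:
    """A reported figure that keeps its raw value and every value-add step."""

    raw: float
    raw_unit: str
    steps: list = field(default_factory=list)

    @property
    def current(self):
        """The value after the most recent step, or the raw value if none."""
        return self.steps[-1]["after"] if self.steps else self.raw

    @property
    def unit(self):
        """The unit after the most recent step, or the raw unit if none."""
        return self.steps[-1]["unit"] if self.steps else self.raw_unit

    def apply(self, description, func, unit):
        """Apply one adjustment and record what it did."""
        before = self.current
        after = func(before)
        self.steps.append(
            {"description": description, "before": before, "after": after, "unit": unit}
        )
        return self


# Raw figure as filed: 250 thousand JPY (invented numbers).
revenue = TracedValue(raw=250.0, raw_unit="JPY thousands")
revenue.apply("scale thousands to units", lambda v: v * 1000, unit="JPY")
# 0.01 is a made-up exchange rate used only for this example.
revenue.apply("convert JPY to USD at assumed rate 0.01", lambda v: v * 0.01, unit="USD")
```

A consumer who disagrees with an adjustment can read `revenue.steps`, discard what they do not accept, and start again from `revenue.raw`.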

The first part, achieving transparency in publicly available raw financial data, is comparatively easy to attain. The second part, however — achieving transparency across value-add processes — may not be so. This is because value-add processes may be highly prized, and business rules and calculation models may be proprietary and closely guarded by data providers. In such cases, trust generally substitutes for transparency, but this requires an architecture that affords appropriate verifiability. The Semantic Web is designed to meet this challenge.

In my post last week, I discussed how Tim Berners-Lee coined the term Semantic Web in a roadmap for future Web design. A quick look at his Semantic Web architecture diagram, popularly known as the “layer cake,” clearly demonstrates that the issue of trust (and, by extension, transparency) is addressed at a fairly high level in the Semantic Web’s stack. This architecture diagram has undergone several refinements; the most recent version is on the W3C website.

To crystallize for our purposes: while building trusted systems, it’s necessary to go back not only to the source, i.e., raw data, but also to metadata about the source. The single most important point to derive is that the Semantic Web architecture explicitly addresses the issue of trust (and therefore transparency) at a much higher level than XBRL alone.

In the Semantic XBRL worldview, transparency extends not only to data and its associated taxonomy, but also to its logic, business rules, portability, and security, as well. This affords greater opportunity for verifying data when stakes are high and trust matters.

Raw data is in great demand, though, for reasons beyond transparency. Let’s examine two users of financial data who demand a data structure that includes raw data: buy-side institutional investors, and market regulators who represent taxpayers.

Buy-Side Institutional Investors
At O’Reilly’s Money:Tech 2008 conference in New York, a session asked, What Do Hedge Fund Managers Really Want? Here, some hedge fund managers explained that they wanted “Raw Data Now” with fully transparent source data before it was modified by sell-side analysts’ proprietary secret sauces or data aggregators’ value-add processes.

Why? Some simply disagreed with the value-added adjustments and forecast assumptions. Others wanted to be able to apply their own adjustments to raw data before building their own valuation, forecast, and risk models. I found this to be the case in my own personal experience at a major sell-side firm, where buy-side research analyst clients wanted to co-mingle data from multiple sources and needed to apply their own taxonomy tags and adjustments to the raw data.

Market Regulators
In testimony to the Domestic Policy Subcommittee of the Oversight and Government Reform Committee, Mark Bolgiano, President and CEO of XBRL US, observed:

Taxpayers want to know how their money is being used to fund the financial bailout. XBRL is a standard that promotes transparency and accountability and can be used by regulators to perform oversight functions more effectively and efficiently.

On the subject of XBRL’s impact on the transparency of financial transactions, specifically Mortgage Backed Securities (MBS), Bolgiano noted that:

The lack of reporting standards has made it difficult to understand the simple fundamental value of the mortgages in these loan pools. Information collected about borrowers, loans, ongoing surveillance, settlement and clearance information is reported in differing data and reporting formats. The identity of individual loans is lost when the pool is securitized and value becomes based on a rating and essentially what the market will bear.

With an agreed-upon data standard and XBRL, issuers, investors, rating agencies and regulators could forecast actual discounted cash flows of the individual loans, making it significantly easier to value each security – effectively “normalizing” the data so that the security can be valued using a recognized valuation method.

In the parallel universe of Semantic Web, Tim Berners-Lee is championing a grassroots movement to call for “Raw Data Now!” where raw data is stored in the RDF-based Linked Data Format to greatly increase data transparency. Speaking at this year’s TED Conference, he said:

Often you find that the people are used to database hugging. You don’t let it go before you have made a beautiful website…. Before making a beautiful website, first give us the unadulterated data. We want the data. We want the unadulterated data.

Berners-Lee went so far as to encourage the audience to join him in a chant for “Raw Data Now” and noted:

Practice that. It’s important because you have no idea the number of excuses people come up with to hang on to their data even though you paid for it as a taxpayer. And it’s not just America but it’s all over the world, and that’s not only the governments but the enterprises as well.

Over the next several weeks I intend to explore more in the areas of Semantic wiki-tagging and Semantic XBRL data quality.