• Welcome to HPSquared’s Blog!!

    Our blog offers lively and at times controversial perspectives on key issues within information management and related domains.

    We encourage posts and comments from our visitors!!!

    HPSquared provides advisory services in areas such as: data governance, master data management, information management strategy, enterprise data management and architecture.

    Contact marc.hurst@hpsquaredllc.com

    Website: www.hpsquaredllc.com


Data Governance is Iterative and Selective: Play Smart, Not Huge!!

The need to manage the data life cycle and improve information quality is indisputable. Yet the probability of initiating and executing an effective program is low. Factors such as investment, technology, organizational readiness, complexity and ambiguity of objectives are obstacles. The scope of a data governance initiative frequently tends to be broad; at least broad enough to delay positive results.

Industry talking heads go on and on about change management being the major critical success factor. What we do know is that change of process and organization is challenging, especially when the scope spans organizational silos and structures. My view is that it takes more effort to syndicate and sell data governance and the practices that drive it than it does to develop the tools such as workflow, policies, data profiling, etc.

Recognizing this dilemma, an iterative approach is worth considering. Revolution never really works; evolution is a more organic process for an entity. This holds not only from a process view but also from a political view, as the realization of a success drives investment for larger, more complex follow-on efforts.

Our suggestion is to be strategic in defining a data governance program. Apply it to high-value information sources which align with a specific business process. There must be a return to the business, and measurement must be positive. Doing so entails an understanding of the data to ascertain what is valuable and what can be overlooked. Understanding information flow and lineage is the vehicle which drives this assessment. Dogmatism has no place in the 21st century business model; being lean and adaptive is the driver. A multiyear effort to improve accuracy and consistency of data will not fly.

This may resonate with you, as this position is logical. Yet defining the scope and context of the data governance initiative is only as good as the high-level problem definition and understanding of the enabling business processes. Going forward, keep it small, make sure it is targeted and, most importantly, achievable.

Marc Hurst, a Managing Director from HPSquared LLC, offers insight based on his systems and data architecture experience.  He has consulted across many industries as a management consultant and a systems integrator.  Marc can be reached at marc.hurst@hpsquaredllc.com. 

Waiting for Standards: The Dilemma of Data Architecture

Marc Hurst/Managing Director  HPSquared LLC

September 22, 2011

Patience is a virtue. That expression works in certain disciplines, generally not in the domains I travel within to make a living, or something close to a living. Over the past several years the confusion and complexity of the financial industry has resulted in enormous initiatives for not so enlightened consultancies and contractors. Applications are literally being constructed in silos within a single organization, even within a department or business unit. Little or no effort is going into addressing or planning for the adoption of external standards to tie different entities and partners together. In fact, many initiatives are not addressing data standards which can decrease the risk associated with integrating applications internal to an enterprise.

Where are the standards? At the technical level there are standard platforms, such as web services, J2EE, relational data and frameworks, which enable the use of common and well-known toolsets for construction. The challenge of introducing and promulgating standards for data management and integration is massive, and the data reengineering investment lacks a solid business case in view of the time and resources it would entail. Arguably this is where patience comes in.

There are groups funded to develop semantic models and logical data models. There are industry organizations such as CUSIP and SWIFT which have obtained a level of success across the diverse financial universe. Industry-wide models are being addressed at the semantic and logical levels. There are solution providers who offer best of breed data standards, which are influenced by these ongoing industry initiatives. Yet concurrently major financial organizations are developing a proprietary view of financial data objects. These views are not derived entirely independently of other sources, yet they still do not address cross-entity communications.

The benefits of standard data representation are enormous and indisputable. Efficiencies, communications between parties and reduced algorithmic complexity are all byproducts of standards. Standardization enables competition in the solution market, thus keeping vendors honest knowing that they can be unplugged if they do not deliver. Yet the challenges are right in front of us every day, and are not going away.

There is no need for this post to enumerate the challenges.  Businesses are well acquainted with the inconsistency of data representation within their walls.  Further compounding this is the vast number of acquisitions which have been assimilated into most firms and the wide range of third party application solutions as well as custom developed applications which meet operational requirements.

I am not advocating enterprise-wide single vendor solutions. This establishes a dependency which does not serve the business well as requirements and solution frameworks change continuously. In addition these broad solutions are often a loosely coupled group of applications not based on data standards or common patterns of data integration. Nor does this tack offer a guarantee of compliance with industry data standards as they evolve.

To stand still and perpetuate this state is an option.  Not a good option but an option.  To reengineer hundreds of applications and interfaces is another option,  with an incredibly low probability of completion and success.

I do believe there are alternatives which, while they will not remediate the entire standards problem, are worthy of consideration as a means to meet significant business objectives.

Our point of view is to establish standards within a business domain. Ideally this targets a domain where diversification of data providers is significant; for example multiple settlement houses, or multiple advisory agents, etc. There may also be multiple front office trading desk solutions and no consolidation of orders across applications. The solution approach is to establish data services which map business activity from unique data formats to a common semantic. This data is published and subscription services are made available for access and reuse of the conformed data.

By adopting a hub or almost master data type approach, business-wide data integration and consistency in information start to be addressed. This is independent of an asset or investment accounting type solution, for example. It simplifies data distribution as a single publication data set may now meet downstream requirements of multiple business processes and users. It also reduces workload on feeding applications as data interfaces are reduced to a single mapping via predefined services rather than a unique solution for every consumer.
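As a minimal sketch of the hub approach described above, the following Python example maps two provider-specific settlement formats onto one common record and publishes the conformed data to every subscriber. The provider names, the field names, the `Settlement` record and the `DataHub` class are all hypothetical, chosen purely for illustration; a real implementation would sit on whatever messaging and data services the business already has in place.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Settlement:
    """Hypothetical common (conformed) representation of a settlement."""
    trade_id: str
    amount: float
    currency: str


# One mapping per provider. Each converts that provider's native record
# into the common Settlement format. All field names are assumptions.
def map_house_a(raw: dict) -> Settlement:
    return Settlement(raw["id"], float(raw["amt"]), raw["ccy"].upper())


def map_house_b(raw: dict) -> Settlement:
    return Settlement(raw["tradeRef"], float(raw["value"]), raw["currencyCode"].upper())


class DataHub:
    """Hypothetical hub: conforms provider data once, then publishes it."""

    def __init__(self) -> None:
        self._mappers = {}
        self._subscribers = []

    def register_provider(self, name: str, mapper: Callable[[dict], Settlement]) -> None:
        self._mappers[name] = mapper

    def subscribe(self, handler: Callable[[Settlement], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, provider: str, raw: dict) -> Settlement:
        # A single mapping per provider replaces a unique interface per consumer.
        record = self._mappers[provider](raw)
        for handler in self._subscribers:
            handler(record)
        return record


hub = DataHub()
hub.register_provider("house_a", map_house_a)
hub.register_provider("house_b", map_house_b)

# Two downstream consumers receive the same conformed data set.
risk_feed: List[Settlement] = []
reporting_feed: List[Settlement] = []
hub.subscribe(risk_feed.append)
hub.subscribe(reporting_feed.append)

hub.publish("house_a", {"id": "T1", "amt": "100.50", "ccy": "USD"})
hub.publish("house_b", {"tradeRef": "T2", "value": 75, "currencyCode": "eur"})
```

Adding a third consumer here means one more `subscribe` call, and adding a third provider means one more mapping function; neither change touches the existing feeds, which is the reduction in interface workload the hub is meant to deliver.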

In summary, standards are relative. Industry-wide standards will not be adopted for point solution application environments, yet a business can transition to them via a data architecture solution. Essentially data distribution becomes a business-wide capability, not an application responsibility. Enterprise level application solutions no longer hold the business hostage, as they become merely data providers transmitting processing results for publishing and lose their role as the core element of the data distribution scheme.

Marc Hurst is a management consultant in the technology space. Marc has developed large scale database solutions for operational and analytic purposes. He has spanned technologies over multiple decades, yet understands that the principles of database design and quality remain intact. With HPSquared LLC he offers advisory services for data and information management including strategy and architecture efforts. He is a frequent contributor of points of view for this space, when time permits. Enjoy!!

marc.hurst@hpsquaredllc.com 

Challenge: Provisioning Data for Analytics

Overview

Organizations across industry domains struggle with the data provisioning process to support detailed statistical analysis. This analysis, performed via specialized data analysis professionals and software, is subject to constraints due to the structure of the sourcing applications and databases. Problems with data quality, integrity and consistency are costly and cannot be remediated solely by implementation of “best of breed” software solutions. Provisioning Data for Analytics takes a closer look at common root cause issues and offers a qualified perspective on addressing this subject area.

Business Context

Many businesses are grappling with the problem of getting correct, complete and accurate data for analysis. Due to the events of the past decade and the evolution of the financial services industry, data analytics has become the “linchpin” for managing risk, meeting compliance standards and enabling regulatory reporting. Not only does back-end data analysis address oversight requirements, it contributes significantly to the future of the enterprise. The analytical tools have unique statistical reporting capabilities which have also become an integral part of Customer Relationship Management (CRM) and Product/Service Marketing functions. In fact, due to the lack of reliable databases to support diverse business needs, data captured for a specific purpose such as risk management is frequently repurposed for marketing or financial reporting needs. To address information needs we have observed that businesses have developed stand-alone, independent extract, transform and load (ETL) processes specific to each analytic function. This approach compensates for factors such as:

  • lack of an Enterprise Data Warehouse (EDW).
  • inability to rely upon the EDW they have created.
  • need to supplement or enrich the data from the EDW per each analytic process.

The impact to the business is huge. Operational and strategic functions become limited in their ability to drive business decision making and meet third party and compliance reporting requirements. Frustrated data and business analysts report:

  • An inability to get the information needed to make decisions
  • Different decision support processes generate inconsistent data results
  • Failure or inability to readily locate the data they need to meet business requirements
  • Data Transformation processing takes too long and costs too much
  • Difficulty loading the data you need into the analytic engine

Due to an ever competitive business climate and significant pressure on operational costs and investment it has become critical to address the overall data processes which support the business analysis functions. Major concerns are not only the direct costs but the timings and quality of the information delivery processes.

Challenges to Reengineering the Analytics Process

Many analytical functions are supported by a myriad of feeds, conversions, manual and automated data remediation processes, applications and software products. As stated earlier, these have been deployed without an architecture-based design approach. Reengineering these data processes is non-trivial and is not purely an architecture-based project, but the complexity may be significantly reduced by architecture directions. In summary the business objectives of a reengineering effort include:

  • Ability to get the answers you need to make decisions in a timely way
  • Reduction in costs to maintain multiple ETL processes and tools as well as versions of the data
  • Eliminate the probability of analysis producing erroneous results leading to bad decisions
  • Data Quality:
    • Achieve acceptable data integrity, accuracy and precision
    • Reconcile or eliminate timing issues between components
    • Implement a consistent view of information
    • Eliminate inconsistent information from the sources due to loss of accuracy during transformation

HPSquared recognizes the business needs and the challenges these problems pose. We are able to assist in defining an approach and implementing solutions which will, over a series of releases, deliver improvements to the performance of your firm’s Analytics function. Our objective is to fast track the redesign of the enabling data architecture and to simplify ETL processing with the end result of improving business decision making capabilities. To serve you we offer expertise in:

  • Data and Information Architecture
  • Development of Data Warehouse Solutions
  • Implementation of Data Governance and Stewardship Processes and Organization
  • Development of data quality improvement programs
  • Subject matter expertise with leading analytic vendor product offerings, ETL solutions, and master data management applications and strategies

 

Data Quality, When is Good Enough, Enough?

Quality is relative, not absolute.  During a presentation offered at the MIT Data Quality Symposium, Phil Teplitzky facilitated a discussion offering a unique perspective on the objectives of data quality.  This presentation is attached to this post.

MIT Data Quality Symposium: When is Quality Good Enough?

Phil Teplitzky is a Managing Director with HPSquared LLC. He has been a thought leader in the data management field for over three decades as a consultant, educator and corporate executive. An active member in many industry associations, Phil is a trusted adviser to many executives with overall responsibility for enabling quality data management.

There is a reason it is called Data Management

We live in a world looking for instant gratification, demanding immediate service from web applications, sites, call centers, set-top boxes, restaurants, grocery stores, etc. Now that quick-fix mentality has infiltrated many businesses. As a consultant it is rare that a client will consider a proposal of mine which requires more than three months of dedicated staffing. Yet many clients have not enabled an infrastructure which is flexible and robust to the degree that a short-term effort will yield positive results. At best it will produce shelfware, and those shelves are ready to buckle from the weight of decades of hardcopy reports, and soon hard drives will feel the strain.

Yet back to the concept of infrastructure.  Infrastructure is viewed in broad terms.  It includes methodology, services, standards, metrics, quality processes,  skilled resources, center of excellence and of course the more traditional components of hardware and software.  Not only do these components need to be in place, but they need to be managed.   Yes, management is the key.  Thus I link back to the focus of this blog and my business which is about Data Management.

Technology solutions will not guarantee success in achieving business objectives for information quality. Although there is so much activity developing tools for data governance, metadata management, master data management, data quality, error correction and workflow, success lies elsewhere.

Management of information, regardless of the business size or specific industry, can only be achieved by a well-defined set of business processes. That is what master data management is all about. It is a major challenge, yet to enable a business to be truly agile and able to deliver relatively rapid returns on IT investment, the business must get on board. As you are all quite aware, data is owned by the business. Arguably, it is the business’ most valuable asset and thus must be invested in and have standard practices in place. Management of data has its own life cycle analogous to any Systems Development Life Cycle. It is continuous, as data consistency and error identification and remediation are business processes, comparable to product development, trading, asset management, shipping and human resources.

Thus the gist of this brief post is about management of data and the importance of getting it right in parallel or in tandem with the introduction of advanced and expensive tool sets, which are enablers, yet will not yield expected returns unless the preconditions for data management are successfully addressed.

***********************************************************************

The author, Marc Hurst, practices information management consulting with HPSquared LLC. Marc offers insight from several decades of providing technical and management consulting across a diverse set of clients and across industries. He has developed data planning methodologies, designed databases, and orchestrated innovative data warehouse and business intelligence solutions. Currently he focuses on strategic information and data management services with HPSquared.

Data Management Transformation: Evolution or Revolution?

“Life is So Complicated”

The challenge of managing information continues to escalate. There are a myriad of forces complicating an enterprise’s effort to establish a consistent, authorized, available data store. These include: a diverse set of third party functional operational applications, mergers and acquisitions, in-depth regulatory and compliance needs, inconsistent data representation across business units and departments, and the management style and philosophy of the business.

The principles underlying data management are almost universally accepted. However the path to establishing a well designed and high quality environment is not clear and carries a high level of risk. It is politically controversial, as consensus across a business domain must be achieved, and it may not provide a rapid return on investment.

“I whole-heartedly support your findings, so give me a plan to accomplish your goals within a three month window”.   Three months is a long time for companies in highly competitive industries.  Not only are they competing for customers, developing new products (frequently quite abstract such as in the financial services space), satisfying compliance and regulatory demands but they are subject to scrutiny and measurement in highly politicized organizations.  Yet for systemic change this window closes rapidly, particularly in large bureaucratic organizations where just the overhead of project initiation may not be completed within a three month period.

Revolution works when an organization’s systems have failed or the business is so robust that the risk of the investment is acceptable.  Transformation requiring business change, philosophical change and significant investment is definitely not acceptable in a functional entity.  A functional entity may not be optimal and efficient yet is viable.  Arguments such as improving data quality and consistency, reducing manual error correction and establishing a golden source of data resonate in presentations but flounder in management committee sessions due to “sticker shock”.  There must be a better roadmap for progress.

Yet patchwork does not always succeed. Look at the New York Mets as they established a core team and continually procured components (players) to fill needs to address perceived weaknesses. This has been going on for several years with negative results. There was no integrated enterprise-wide planning, no underlying vision and questionable execution. Sound familiar? Enterprises across industry are comfortable and somewhat successful in addressing point solutions yet avoid big picture problems.

Service Based Architecture and Master Data Management

A complex web of point-to-point interfaces generally accompanies a highly decentralized systems environment.  To a degree this is an artifact of point application solutions and the organizational business model.   It is highly unlikely that a data centralization or consolidation initiative will impact the way the business works or its management structures.  With an underlying architecture of this nature moving towards SOA is a path full of business and technical obstacles.   Indeed without a solid high quality enterprise wide data store establishing SOA makes no sense at all.  For example a product definition, customer representation and credit levels must be consistent across the broad array of applications to justify development of workflow and BPM which invokes well defined services.

Do the productivity and quality benefits achievable by treating data as a service, accompanied by a master data management program which establishes governance and enables business rules, justify a high risk and complex transformation program? This question may be irrelevant due to business culture and politics. Yet that does not preclude proposing a tactical approach for incrementally enabling functional improvements. In fact an iterative approach may be preferable as it dictates a high degree of planning in defining its phases and imposes project management standards as well as infrastructure and integration decision making very early in the process.

Transformation is High Risk

Indeed an organization will have limitations in adopting new technology and accompanying business processes, and not all levels of a SOA based architecture may be a fit for a business. Architecture establishes a framework; the business case establishes a justification and is a precondition for a well defined approach; detailed program planning confirms the scope. All of these activities address and mitigate risk.

The Data Maturity Model over the past several years has become a valuable tool for management. By providing a framework for determining the current state of data management capabilities, it positions a company along the grid of data management optimization. By applying the model and understanding the business, it can be determined what end goal state of maturity is appropriate. Rarely will it be the highest level; our experience is that a Level 3 or mid-level positioning is an applicable goal for customers. This model is important as a metaphor for planning a data management evolution. It is not prescriptive, yet it is a framework which is well thought out and applicable to data management.

Building the Transition Plan

What this comes down to is business sensitive planning. Each iteration which is defined should endeavor to provide tactical business value and contribute to the target end state. For example data governance is key to effective data management. A project enabling data governance should not be limited to the derivation of logical data models. Rather it should establish data governance processes and capabilities on a specific business area, e.g. Foreign Exchange or Vendor Management. By being treated as a pilot it establishes a framework for evolution and it also achieves incremental benefits to specific business areas.

This is non-trivial.  It is theoretical.  It is realistic.  The business case for each phase establishes justification and benefits for each piece of the puzzle.  If execution is successful the business views the transformation as less risky and the opportunities to accelerate the program become more probable.  Some may argue that it will take longer by choosing the evolutionary approach and that may or may not be valid.  More importantly this approach is a learning experience making subsequent iterations better defined and increasing the probability of success.  Another consideration is that it may offer the only chance to embark on this journey as it embraces the culture of the business for better or worse.


Marc Hurst is a Managing Director for HPSquared LLC.  HPSquared offers advisory services for optimizing the management and use of information at a strategic level as well as being expert in architecture and related disciplines such as: master data management, data governance and business intelligence.

MDM and Its Relationship to the Information Management Maturity Model

A Point of View Provided By:

Philip Teplitzky (phil.teplitzky@hpsquaredllc.com)

Marc Hurst  (marc.hurst@hpsquaredllc.com)

March, 2010

Key Words: methodology, data quality, data governance, analytics, enterprise data, information management


Introduction

Master Data Management (MDM) initiatives have become commonplace as enterprises attempt to establish a higher quality basis for business performance analytics. A myriad of approaches have been taken to establish higher quality reference data. Dominant database players such as Oracle, IBM and SAP have jumped in and have offered solutions or have acquired MDM software companies. MDM is often viewed as a relatively rapidly applied band-aid to cover up severe and persistent wounds. When management buys into the quick fix the problem does not go away. An overall information management environment is not a single dimension; it is a multitude of systems, including automated and manual processes. These components are critical to an effective MDM initiative.

Our analysis is intended to facilitate discussion about how to make MDM successful, by offering a high-level view into what it is and linking it to criteria incorporated into the evolving Information Maturity Models. Our objective is to discuss the need for a holistic information management plan which will better position the organization for the benefits which MDM offers. Thus this is in part an attempt to bring forth a reality check for those who are embarking on this journey.

Objectives:  Why This Matters

Having led teams in the data space for several decades we have seen a myriad of circumstances around data management. Initially the trend was the development of enterprise operational databases, custom designed and supporting custom applications. This transitioned into point solutions with little if any data sharing across applications. ERPs then came into dominance, enforcing their view of information.

The result was a high degree of variation in defining the business rules, format and context of the primary business objects. Each application became its own universe, and to enable a broader view large-scale data warehouses or ERPs were implemented. These did not resolve data problems; they just met different business needs. As data quality and MDM solutions became prevalent, IT groups pitched that a solution to data problems was available. These solutions provide high quality technology to provide insight into business performance and to a degree address data quality issues. Yet the MDM/DQ initiatives are frequently treated as a single self-contained event: augment the staff by hiring outside consultants, purchase software, develop data transformation modules and build new data stores.

What has been learned is that these processes are not discrete one-time initiatives. They are continuous in nature, not only adding domains of subject areas in subsequent phases but also establishing a management system which monitors activity. In addition, root causes for the problems are traceable to source applications, variation in business processes and a lack of oversight or governance. What is clear is that MDM is more than a solution; it is a catalyst for new business processes, workflows and data governance for information management. Information is a critical asset to the business, subject to internal and regulatory standards and compliance, and thus the appropriate environment or ecosystem should be provided.

MDM is not an IT solution; it is a business-wide program. It is complex, supporting multiple consumers of information and charged with establishing a degree of order amidst chaos. In addition it offers an opportunity to simplify dataflow and architecture supporting the linkage between front office and back office information needs.

DQ and IMM are related, with dependencies; the evolution of IMM best practices creates the ecosystem in which DQ and MDM benefits can be achieved.

Getting on the Same Page:  Address Terminology

Everyone has to be on the same page, or at least in the same area code. There is no standard formula for programs of this magnitude. Every business has its unique business priorities and needs. When addressing MDM, the effort to establish a corporate vocabulary and a common view of the objectives and the end state goals is mandatory.

This is one of the problems of discussing this subject: everyone has their own definition and understanding, so we need “Definitional Governance.” This is non-trivial as the ramifications of an MDM program are immense. From an IT view the variation between the pundits, vendors, analysts and thought-leaders on the meaning of terms is significant. Thus there is a need for a proprietary definition within the business itself, moderated by the realities of what the technologies do, what infrastructure and standards are in place, and the propensity of the overall IT organization and business to readily adapt to change.

Here is just a subset of the terminology applicable to an MDM program:

  • Master Data Management
  • Meta Data Management
  • Data Quality
  • Information Quality
  • Operational Data Store
  • Enterprise Data Warehouse
  • Information (Data)  Management Maturity

Additional definition and clarification address “conceptual” concepts and terms such as:

  • Data vs. Meta Data
  • System vs. Meta System
  • Continuous / Business Process Improvement
  • Control Systems and Feedback
  • Self Correcting systems
  • Information/Data Quality: Definition and Measurement

Once the objectives, scope and goals are established around key terminology you will have standard definitions. As the initiative commences we recommend that the project:

  • Establish expectations
  • Develop business cases
  • Syndicate goals and progress continuously

An Interaction Model for Information Management

Performance and quality depend on a well-defined set of relationships, establishing what we are calling the Interaction Model for this analysis. We propose the following components as significant to interdependencies and to meeting MDM expectations:

  • Operational Systems: process the inputs/transactions
  • Data Storage: the databases that maintain the semantic model
  • Data Cleansing Systems: apply edit, validation and business rules
  • Data Governance and Stewardship Functions: oversee the application of rules and processes as well as the acceptance and remediation of information
  • Meta Management System: monitors the operation of the Operational Systems
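To make the relationships between these components concrete, here is a minimal Python sketch, under stated assumptions, of how a record might flow through them. The two validation rules, the record fields and the `InteractionModel` class are hypothetical and exist only for illustration: records from an operational system are cleansed against rules, failures are routed to a stewardship queue for remediation, accepted records land in data storage, and simple counters stand in for the meta management system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict], bool]

# Hypothetical edit and validation rules applied by the cleansing step.
RULES: Dict[str, Rule] = {
    "customer_id present": lambda r: bool(r.get("customer_id")),
    "credit_limit non-negative": lambda r: r.get("credit_limit", 0) >= 0,
}


@dataclass
class InteractionModel:
    # Data Storage: accepted records only.
    data_store: List[dict] = field(default_factory=list)
    # Governance and Stewardship: rejected records plus the rules they failed.
    steward_queue: List[Tuple[dict, List[str]]] = field(default_factory=list)
    # Meta Management: counters that monitor the operational flow.
    metrics: Dict[str, int] = field(
        default_factory=lambda: {"accepted": 0, "rejected": 0}
    )

    def process(self, record: dict) -> bool:
        """Cleanse one operational record and route it accordingly."""
        failed = [name for name, rule in RULES.items() if not rule(record)]
        if failed:
            self.steward_queue.append((record, failed))
            self.metrics["rejected"] += 1
            return False
        self.data_store.append(record)
        self.metrics["accepted"] += 1
        return True


model = InteractionModel()
model.process({"customer_id": "C-100", "credit_limit": 2500})
model.process({"customer_id": "", "credit_limit": 500})
```

After these two calls, one record sits in `data_store` and the other waits in `steward_queue` along with the name of the rule it failed, which gives a data steward what is needed to trace the problem back to its source application.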

Implications of the Interaction Model lead into a discussion of the Information Management Maturity Model.  This model, also known as the Data Maturity Model, has become a major focus across industry.  There is no single standard definition of the phases and capabilities but significant progress has been made by industry organizations and consultancies.  We have stayed close to the ongoing work of the Enterprise Data Management Council working closely with Carnegie Mellon University on incorporating and integrating extensive field research for its Data Maturity Model.

The Information Management Maturity Model (IMM) is a work in progress. It is by no means prescriptive, yet it is evolving rapidly. The Enterprise Data Management Council has taken a broad scale approach to establishing the paradigm as a framework for making data and information more reliable, consistent and useful. Other consortiums and groups addressing the IMM include: DAMA, MIKE 2.0, and MAG.

Lakefront Data, a leading consultancy and thought leader, offers its view of IMM, describing an evolution of capabilities up to maximum maturity and factoring in the following dimensions or sectors:

Master Data Management (MDM) and Information Management Maturity

MDM cannot be created in a vacuum; it is part of a larger Information Process – an Information Ecosystem. The IMM outlines the capabilities across Information Management domains thus establishing the ability of the company to successfully address major initiatives.  MDM is a major initiative.

MDM is also a continuous process.  It happens each day in response to business events and transactions.  Its success is predicated on the effectiveness of many systems.  These systems are all embedded into the IMM which will rate an entity’s ability to execute effectively in information management.  This relationship leads to the following assumptions:

  • The higher the level of Data Maturity, the higher the probability of having a more effective and efficient MDM environment
  • By inference, if you ignore the Data Maturity domains you will not have an effective MDM process
  • The higher your level of Data Maturity, the greater the impact of the MDM functions

I came across this today: according to industry pundit Phil Simon (in his post “Data Quality Lip Service” from March 4, 2010 on Dataflux’s blog site), management gives lip service to data quality.

“They (management) realize that they are often powerless to effect standards and consistent DQ practices throughout the organization. The bigger the organization, the more difficult it is for change agents and DQ proponents to put their ideas into action. Building consensus within a department is often difficult enough. Throw multiple departments and multiple countries into the equation and your job just got exponentially harder”.

MDM requires fundamental change to many complex processes across multiple business units, geographies and functional departments. This argues for establishing a scope for MDM programs that is achievable and offers a reasonable return on investment while also being a building block for further MDM projects.

Conclusions: Making an MDM Program Successful

We offer the following set of objective based activities to aid an organization in its journey into an MDM program.

  • Create a formal Taxonomy and Ontology of terms to limit ambiguities – avoid the “I thought you meant” syndrome
  • Benchmark against the EDM Council and DAMA Maturity Models – Understand target state and the roadmap to get there
  • Define MDM goals and objectives based on business priorities
  • Determine what your target level of Information Maturity is to provide an acceptable level of MDM quality
  • Establish an achievable remediation program
  • Establish a Business Process Redesign function to implement the remediation plan
  • Establish an appropriate MDM performance management process (KPIs/KPMs and tracking methods)
  • And most importantly, going forward: measure, fix and re-measure the processes, systems, organization and information management ecosystem that enable MDM

Thank You Visit Us at hpsquaredllc.com

Information Quality Assessment: An Approach


Philip Teplitzky (phil.teplitzky@hpsquaredllc.com)

Marc Hurst  (marc.hurst@hpsquaredllc.com)

March, 2010

Abstract: Data Quality is a critical factor in an Enterprise’s ability to direct its operations based on business patterns and transaction history. Although perceived as a responsibility of the information technology practices, it is in reality a responsibility that spans operations and back end data analysis functions as well. This paper discusses the breadth and scope of data quality metrics and characteristics. It suggests pragmatic treatment of metrics and presents guidelines for establishing effective measurements for data quality. It offers three dimensions for discussing and analyzing data quality. It also discusses the difficulties of implementing a successful data quality program and offers suggestions for addressing them.

Key Words: methodology, data quality, data governance, analytics, enterprise data, information management


Introduction

Data Quality is one of the foundations of Information.  Information is composed of Data plus structure and context.  However, if the Data is wrong then you no longer have Information but noise and without Information it is not possible to run a modern competitive company.  Often is heard in the hallowed halls of American business the cry:

The data is wrong

Why don’t the reports agree?

Why does Marketing have different numbers than Sales?

How many customers do we really have?

What are the inventory numbers?

And on and on the differences and disputes go! It is not possible to make well-considered, analytic judgments in the absence of reliable, consistent and dependable Information. The result is that many important business decisions are made on gut feel and the best available information; the results can be, and often are, less than optimal and in some cases fatal. The companies with the best Analytics and the best Information win; the ones without lose and perish. Knowing what you have done and with whom, knowing what is hot and what is not, and knowing who your best customers are can mean success or failure in today’s highly competitive world market. Perhaps the best example of this rule was the Official Airline Guide. At the peak of its popularity it was worth more than the airlines it reported on; the information was more valuable than the planes. But it became obsolete and irrelevant when the internet made airline schedules and flight information available instantaneously and for free. The Information, its speed of delivery and its context became the winner. The key lesson: without Quality, Context and delivery, today’s information is yesterday’s success.

The lack of quality, and by inference the decline in the usefulness of the Information, is an unacceptable fact of life. Why are the numbers wrong? Why do the operational systems and the analytic applications have different values? Why are production and marketing out of sync with their projections? The answer is simple and yet profound in its implications and solution – the problem is THE DATA and specifically the Quality of the Data!

If the data is wrong then everything is wrong. It does not matter how elegant your analytic models are or how insightful your predictive models are; if the data is wrong the Information is wrong. Without an acceptable level of data quality there is no information. The problem, dear Brutus, is not in our stars but in our DATA! If the problem is Data Quality, how do we make it better? That is the raison d’être of this White Paper.

Lord Kelvin (Sir William Thomson, Baron Kelvin of Largs), the great 19th century scientist, said it best:

To Measure is to Know, if you cannot measure it, you cannot improve it

How do we in fact measure the Quality of Data? What are the metrics? And what are the issues with measuring it?

Let us examine three key dimensions of the problem:

  1. What are the criteria that make up the measures of data quality?
  2. What problems or challenges exist in using the metrics?
  3. How do you ensure that the metrics are being consistently applied, with a level of accuracy and precision that is appropriate for the environment they are operating in?

The Three Questions

What are the criteria that make up the measures of Information Quality?

Leo L. Pipino, Yang W. Lee, and Richard Y. Wang, in their article “Data Quality Assessment” (Communications of the ACM, April 2002, Vol. 45, No. 4), have identified a set of criteria that is as comprehensive and appropriate as any I have seen. They are:

| Dimension | Definition | Discussion |
| --- | --- | --- |
| Accessibility | The extent to which data is available or easily and quickly retrievable | How does one define quickly: a day, an hour, real time? Where is it retrievable from? How did it get there? |
| Appropriate Amount of Data | The extent to which the volume of data is appropriate to the task at hand | Who sets the volume? What is the cost-benefit ratio of the storage vs. the volume? Is there a difference by domain in the volume? If so, which do you select? Based on ROI or need? |
| Believability | The extent to which the data is regarded as true and credible | Do back office business analysts view the data with a degree of confidence comparable to business operations? |
| Completeness | The extent to which data is not missing and is of sufficient breadth and depth for the task at hand | Information is frequently provided for the minimal number of fields to complete a transaction. Yet this may not be sufficient for repurposing the data for other business functions. |
| Concise | The extent to which data is compactly represented | The use of standard reference data and enterprise-wide accepted values (such as code values) will enhance data consistency and understanding. |
| Consistent Representation | The extent to which data is presented in the same format | Do parallel or silo applications represent key data identically? This is the driver behind MDM: to reconcile those variations and remediate. |
| Ease of Manipulation | The extent to which data is easy to manipulate and apply to different tasks | Data structures, timings and business rules may act to restrict or inhibit broad distribution across the enterprise. |
| Free of errors | The extent to which data is correct and reliable | Often data quality does not impact all its usages in an enterprise. Errors are specific to a business need and thus may be acceptable or not acceptable depending on context. |
| Interpretability | The extent to which data is in appropriate languages, symbols and units, and the definitions are clear | Metadata definitions as well as formats need to be published and, if possible, consistent across databases. |
| Objectivity | The extent to which data is unbiased, unprejudiced and impartial | Strict controls over data manipulation frequently are not in place. We have seen desktop analytics where analysts revise data extracted from operational systems, thus impacting the integrity and objectivity of the specific data. |
| Relevancy | The extent to which the data is applicable and helpful for the task at hand | Data may not use the same business rules, taxonomies, etc. when repurposed. This may be unavoidable due to implementation factors such as package solutions. |
| Reputation | The extent to which data is highly regarded in terms of its source or content | Data quality across an enterprise is rarely viewed comparably by multiple users. The impact of efforts to remediate data problems should be reflected in its reputation over time. |
| Security | The extent to which access to data is restricted appropriately to maintain its security | Are specific business rules around data access established? Are they maintained? Are they realistic due to compliance and regulatory needs, or artificial? |
| Timeliness | The extent to which the data is sufficiently up to date for the task at hand | Often data does not meet timing requirements due to architecture constraints or business timings, particularly when analysis functions and methodologies become more sophisticated and closer to current day or even real time. |
| Understandability | The extent to which the data is easily comprehended | Effort to analyze and comprehend source system data is affected by the degree to which the data needs to be interpreted, due to factors such as lack of availability of metadata and business rules. |
| Value Added | The extent to which data is beneficial and provides advantages from its use | Again, the data may or may not add value to other applications outside of its originating source system. |
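
As a rough illustration, a few of the dimensions above can be turned into simple ratios between 0 and 1. The sketch below is ours, not taken from the Pipino, Lee and Wang article; the record layout, the field names, the set of allowed country codes and the 24-hour freshness window are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical customer records; field names are assumptions for illustration.
records = [
    {"id": 1, "name": "Acme", "country": "US", "updated": datetime(2010, 3, 1, 9, 0)},
    {"id": 2, "name": "", "country": "USA", "updated": datetime(2010, 3, 1, 8, 0)},
    {"id": 3, "name": "Globex", "country": "US", "updated": datetime(2010, 2, 20, 8, 0)},
    {"id": 4, "name": "Initech", "country": None, "updated": datetime(2010, 3, 1, 7, 0)},
]


def completeness(rows, field):
    """Share of rows where the field is populated (not None or empty)."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def consistent_representation(rows, field, allowed):
    """Share of populated values that use an agreed reference code."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    return sum(1 for v in values if v in allowed) / len(values)


def timeliness(rows, as_of, max_age):
    """Share of rows updated within the window the task requires."""
    return sum(1 for r in rows if as_of - r["updated"] <= max_age) / len(rows)


as_of = datetime(2010, 3, 1, 12, 0)
scores = {
    "completeness(name)": completeness(records, "name"),
    "consistent_representation(country)": consistent_representation(
        records, "country", allowed={"US", "GB"}
    ),
    "timeliness(24h)": timeliness(records, as_of, timedelta(hours=24)),
}

for dimension, score in scores.items():
    print(f"{dimension}: {score:.2f}")
```

Dimensions such as Believability or Reputation do not reduce to a ratio like this; they would more likely be gathered by surveying the people who use the data.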

To the list above I would add the following:

  • Context – as we discussed earlier, data becomes information when structure and context are added. Information is unfortunately subject to the subjective and frequently personalized interpretations of the person evaluating it. Their individual experiences and knowledge are prone to introduce a filter that can change meaning. Therefore it is important to know who is using the data. For example, sales volume can be measured in different units, and therefore to say that sales for August were 100 we must add whether this is units, gross, cases, etc. The context adds meaning.
  • Accuracy and Precision – this will be apparent to the engineers reading this. Allow me the classical definition. If I have a foot ruler that is marked as 12 inches long, but because of an error in manufacturing each inch is 1/12 of an inch too long, then I will have a foot ruler that is 13 inches long; my measurements will be precise but inaccurate. By the same token, if my ruler is divided into tenths of an inch I cannot make a measurement that is more precise than a tenth of an inch. Therefore my measurement can only be as precise as the ruler is. The same is true for corporate data. It can only be as accurate and precise as the most inaccurate and imprecise source. Introducing additional precision in representation is of no use and introduces false confidence in the numbers.
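
The ruler example can be worked through in a few lines. This is a minimal sketch under the assumptions stated in the text: every marked inch on the faulty ruler is really 1 + 1/12 inches long, and the second ruler is divided into tenths of an inch. The function names are ours.

```python
from fractions import Fraction

# Assumption from the text: each marked "inch" is really 13/12 of an inch.
TRUE_INCHES_PER_MARK = Fraction(13, 12)


def faulty_ruler_reading(true_length_in):
    """Reading, in marked inches, shown by the miscalibrated ruler."""
    return true_length_in / TRUE_INCHES_PER_MARK


def read_with_resolution(true_length_in, divisions_per_inch=10):
    """Round a length to the smallest division the ruler can resolve."""
    return round(true_length_in * divisions_per_inch) / divisions_per_inch


# Measure the same 13 inch object three times with the faulty ruler.
true_length = 13
readings = [faulty_ruler_reading(true_length) for _ in range(3)]

spread = max(readings) - min(readings)  # 0: the readings agree, so they are precise
bias = readings[0] - true_length        # -1: every reading is an inch short, so inaccurate

print("readings:", [float(r) for r in readings])
print("spread:", float(spread), "bias:", float(bias))

# A ruler marked in tenths cannot report hundredths, however the value is stored.
print("tenths ruler reads:", read_with_resolution(3.14159))
```

Storing the last value as 3.14159 in a database would not make it any more trustworthy than the 3.1 the instrument could actually resolve.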

What problems or challenges exist in using the metrics?

Regardless of the elegance of the software engineering and the precision of the processes which calculate data quality metrics, any measurement system within a complex system is subject to interpretation. As alluded to in this paper, data quality that is inadequate for one business process is not necessarily inadequate for every process that consumes that information. Thus generalizations about data quality and correctness are suspect. For example, an engineering group requires resolution at the thousandths of an inch, while procurement, inventory and distribution use the same data at a level where this precision is not material. Yet the part has to carry consistent attribute values across all of these uses.
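
To make the engineering versus procurement example concrete, the sketch below checks one recorded part dimension against a different tolerance for each consuming function. The tolerance values and the part measurement are hypothetical figures chosen for illustration.

```python
# Hypothetical tolerances, in inches, per consuming function.
TOLERANCE_IN = {
    "engineering": 0.001,
    "procurement": 0.1,
    "distribution": 0.1,
}


def fit_for_use(recorded, reference, tolerances=TOLERANCE_IN):
    """Return, per consumer, whether the recorded value is within its tolerance."""
    error = abs(recorded - reference)
    return {consumer: error <= limit for consumer, limit in tolerances.items()}


# The same stored value is judged separately for each consumer.
result = fit_for_use(recorded=2.503, reference=2.500)
for consumer, ok in result.items():
    print(f"{consumer}: {'acceptable' if ok else 'not acceptable'}")
```

The same stored value fails for one consumer and passes for the others, which is why a single enterprise-wide verdict of "good" or "bad" data can mislead.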

In defining metrics we offer the following guidelines:

  • Relevance to the Business Process: The metric must be defined within a business context that clearly explains how the metric score correlates to enhanced execution of the associated business process
  • Resolution of Measurement:  An enabling measurement process must quantify a quality metric within a discrete range and be subject to applying minimum quality value standards (may be specific to a business process or at an enterprise scope)
  • Controllability: The metric must reflect a controllable or manageable aspect of a business process thus supporting a well defined method for remediation or communication and tracking.
  • Communications: Workflow mechanisms need to gather and transmit the appropriate information to a data steward (assuming that concept is enabled) when the measured value falls below acceptable quality levels
  • Historical: Tracking the historical patterns and values  of the metric must be compatible with determining whether the remediation process is effective as well as supporting statistical process controls
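
A minimal sketch of how several of these guidelines might sit together in one structure follows. The class, its field names, the 0 to 1 scoring scale and the sample scores are assumptions for illustration; Controllability is a property of the business process and is not something the code can enforce.

```python
from dataclasses import dataclass, field


@dataclass
class QualityMetric:
    name: str
    business_process: str  # Relevance: the process the score is meant to support
    minimum_score: float   # Resolution: minimum acceptable value on a 0..1 scale
    steward: str           # Communications: who is told when the score falls short
    history: list = field(default_factory=list)  # Historical: past scores, oldest first

    def record(self, score):
        """Store a score and return an alert message if it is below the minimum."""
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be within the range 0..1")
        self.history.append(score)
        if score < self.minimum_score:
            return (
                f"notify {self.steward}: {self.name} at {score:.2f}, "
                f"below {self.minimum_score:.2f}"
            )
        return None

    def improving(self):
        """True if the latest score is higher than the first one recorded."""
        return len(self.history) >= 2 and self.history[-1] > self.history[0]


metric = QualityMetric(
    name="address completeness",
    business_process="order fulfilment",
    minimum_score=0.95,
    steward="customer data steward",
)

# Three hypothetical measurement cycles while remediation is under way.
alerts = [metric.record(score) for score in (0.90, 0.93, 0.97)]
for alert in alerts:
    print(alert or "within tolerance")
print("remediation improving:", metric.improving())
```

In practice the alert would feed a workflow tool rather than a print statement, and the history would support proper statistical process control rather than a first-versus-last comparison.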

Even with well defined and designed measurement processes there are further considerations.  How does the data quality process establish what is the acceptable tolerance in accuracy and precision?  What exactly is accuracy?  Are quality tolerance levels global or specific to a business need?  The message is that one solution does not fit multiple enterprises; in fact it may not even be able to span multiple business units or departments within a business unit.   Data Quality is inherently subjective and must be specific to business requirements and prioritization.

Complexity is further compounded when source application data is determined not to meet quality standards by a downstream process. For example, order entry application data is rejected by a transformation process for an enterprise data warehouse. What is the remediation approach? Is it viable to have an automatic or manual workflow feedback loop, a suspense process filtering data problems back to specific business departments or applications? Or can remediation be handled by a mechanism such as a master data management process or data quality process within the EDW environment, and thus not communicated back to the originating source system?
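
One of the options raised above, a suspense process that holds rejected records and groups them by originating system, could look roughly like this. The validation rules, field names and source system names are hypothetical; whether the queue is fed back to the source or resolved inside the EDW environment remains the business decision discussed in the text.

```python
from collections import defaultdict


def validate(order):
    """Return a list of rule violations; the rules are hypothetical examples."""
    problems = []
    if not order.get("customer_id"):
        problems.append("missing customer_id")
    if order.get("quantity", 0) <= 0:
        problems.append("non-positive quantity")
    return problems


def load(orders):
    """Split orders into rows accepted by the warehouse and a suspense queue per source."""
    accepted, suspense = [], defaultdict(list)
    for order in orders:
        problems = validate(order)
        if problems:
            # Keep the rejected record with its reasons so a steward can act on it.
            suspense[order["source"]].append({"order": order, "problems": problems})
        else:
            accepted.append(order)
    return accepted, suspense


orders = [
    {"source": "order_entry", "order_id": 1, "customer_id": "C1", "quantity": 5},
    {"source": "order_entry", "order_id": 2, "customer_id": "", "quantity": 3},
    {"source": "web_shop", "order_id": 3, "customer_id": "C2", "quantity": 0},
]

accepted, suspense = load(orders)
print("accepted order ids:", [o["order_id"] for o in accepted])
for source, items in suspense.items():
    for item in items:
        print(f"return to {source}: order {item['order']['order_id']} {item['problems']}")
```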

David Loshin writing about “Data Quality and Cost Reduction” (a Dataflux white paper) states: “There are many potential areas that may be impacted by poor data quality, but computing a precise cost is often a challenge. Classifying those impacts helps to identify discrete issues and relate business value to high quality data. Evaluating the impact of any type of recurring data issue is easier when there is a process for classification that depends on a taxonomy of impact categories and subcategories. Impacts can be assessed within each subcategory, and the quantification and reporting of the value of high quality data can be rolled up as a combination of separate measures associated with how specific data flaws prevent the achievement of business goals.”

How to ensure that the metrics are being consistently applied, with a level of accuracy and precision appropriate for the environment they are operating in?

Without formal data governance in place, ensuring consistent and appropriate metrics usage will not be achievable. Even with this in place there must be an explicit understanding that not all data quality needs span the enterprise. Yet I would contend there does need to be a minimum acceptable quality threshold at the aggregate level. It is becoming clear that data standards are inherently decentralized in a modern enterprise. Implied by this condition is the accountability of data stewards, architects and other data quality stakeholders to embrace heterogeneous business requirements based on the specific needs of the data consumption business processes.

Policies and workflow to handle the data quality remediation case (governance process) need to be in place and continuous. These processes themselves would benefit from applicable metrics to validate that they are being performed, and then that the need is diminishing as the enterprise reaps the results of its data remediation programs.

Data consumers have different needs, e.g. manufacturing/engineering needs versus marketing needs. Precision requirements may also be based on a variety of factors such as format or timing, e.g. inventory requires real-time data for product reservations while purchasing looks at aggregations across transactions. A business consists of varying needs, and consistency is relevant to the consuming application. An upfront data collection or order entry function may not care whether a customer is new or existing, and therefore does not embed a function to reconcile master data. Meanwhile that data moves to sales, marketing and customer service, where the uniqueness of that customer becomes critical.

Summary

The quality of an enterprise’s data impacts its ability to execute, as deficiencies compromise its understanding of its performance.   Measuring data quality is a non-trivial undertaking.  This paper identifies a comprehensive set of criteria on which metrics can be established, measured and managed.   These establish a breadth of impact far beyond traditional data quality viewpoints, thus reflecting the dependency on accurate and precise data across the enterprise.

There are differences in the context of how data is used across departments or business units, which implies a variety of quality requirements. Processes need to be established which serve the business in facilitating the definition, usage and management of data, and which address these conflicts and others. Processes and enabling workflows to remedy data errors due to quality issues must take into account what is appropriate for the business and be requirements driven. For example, should source operational system data be overwritten or deleted, or should the corrected data be resident in a different architecture layer such as an enterprise data warehouse?

These decisions and other significant conflicts and issues in executing a data quality program are not necessarily technical.  They are closely aligned with the establishment of formal and proactive data ecosystem infrastructure, specifically data governance, stewardship and continuous data quality processing.

The assessment and institution of an effective data quality program goes well beyond the accuracy and precision of data.  Characteristics such as context, extensibility and completeness come into play. Enabling solutions go well beyond the implementation of automated processes which cleanse data and isolate potential data errors. Metrics which need to be considered should address a specific set of requirements, be historical in nature and be monitored continuously.

It is the Singer Not the Song..Saga of Information Management

Prepared by Marc Hurst,  Managing Director HPSquared LLC

When I was in high school I heard Mick Jagger and the Stones going on about how it is the singer not the song. This became a haunting melody. This message has stayed with me; in fact, it has resonated with me for over four decades.

“It’s not the way you give in willingly
Others do it without thrilling me
Given me, that same old
Feeling inside that
 I know must be right
It’s the singer not the song”

Over and over again this message has emerged in my life. Now you ask, what does this have to do with information management? A former graduate school colleague, having served many years with the legendary Digital Equipment Corporation (DEC) DECnet development team, commented in a discussion about methodologies that it was not the guidelines of the SDLC or the framework of the methodology which made his group successful; it was the capabilities of the craftsmen (or craftspeople). It is the singer not the song.

I held onto this message continuously, and it put me in a dilemma of major magnitude. As a management consultant who frequently had the opportunity to customize and implement data planning methodologies for customers as far back as the 1980s, was I fooling myself and, even worse, propelling my clients down a path with a high probability of failure?

Yet empirical experience returned a different message.  First of all there was a lack of well disciplined, pragmatic and qualified designers and developers.   This still is the case.  Mechanics yes, craftsmen not at all.

Further compounding the issue was the failure of methodologies. They were too bureaucratic, slow moving, lacking in iteration and short on quality attributes, and they were worked around rather than worked with. I am guilty of this also. I found methodologies to be loaded with ever-increasing bureaucratic structure, inadequate or unknown techniques and a lack of enabling tools; they were just an impediment to my success.

Information management over the past decade has been deluged by methodologies, techniques, tools and all sorts of templates and standards. Data was never the key of a build; it was the process that ruled. Object based development could not care less about data; just build it as you go. Data became a highly customized fuel for a specific engine; only compatible with a Ford and Lincoln/Mercury, sorry Subaru and Audi!!!

The positive result of this short-sightedness was the creation of a market for those of an ilk similar to mine. We peddle data governance, deploy the miracle of data quality technology, introduce the wonders of Master Data Management and espouse the miracle of metadata to our clients, who are just attempting to get their financial reporting and operational reporting to agree on how to define a customer or, even better, a sale. Let’s not even deal with multiple product lines and more complex business objects.

Once again Mick Jagger’s words spur us on. There are only so many singers who can carry a tune from the studio to the club, to the stage, to the arena, to the internet. Congruently, there are few who have developed data models and the appropriate business rules which can enable more than their original purpose, the application. (I dare not venture into the ERP space in this post.)

What is my point? The need for methods, tools and techniques is more critical than ever. People are the most valuable asset of a business, yet the data and information need to be of a quality and in a state that is good enough to leverage the skills of a qualified business analyst productively. In a world of complexity where business operational applications and databases are not integrated, where application designers do not have the objective of holistic enterprise development interests, and where my clients live, the need for add-on structure is no longer optional; it is mandatory.

Marc Hurst has over a quarter century as a management and technical consultant specializing in information management.  For major consultancies he has been a practice leader, chief architect, engagement manager and methodologist.  Marc offers significant capabilities in data warehousing, application and data integration, data governance, IT Strategy, SDLC and Development methodologies.  He has consulted across many industries most notably Financial Services, Communications and Media.  He has performed as a senior leader for BearingPoint, KPMG, CSC, TCS and PWC.  Currently he has embarked on a mission to establish HPSquaredLLC as a reputable consultancy in the information management domain.

Marc can be contacted at  marc.hurst@hpsquaredllc.com

A Method for Improving Information Quality and Maturity

 

-BETTER INFORMATION RESULTS IN BETTER DECISIONS

By Philip Teplitzky, HPSquared, LLC

phil.teplitzky@hpsquaredllc.com

This analysis offers a description of an Information Quality & Maturity assessment model and methodology. The objective is to identify the current state of Information Quality within an organization, or part of an organization, and to use it as a way of developing a quality improvement plan. It is a component of our overall Information Maturity and Information Governance Business Process Review and Improvement offering, and one may consider it an Information Quality Business Process Redesign (IQBPR). We have found, based on our implementation experience, that the methodology provides great insight into the root causes of poor Information Quality and supports the development of qualified remedial action programs. It is a good example of the transition from theory into pragmatic and operational methodology: the transformation from where the rubber meets the sky to where the rubber meets the road.

In response to the requests and demands of many organizations we have developed an Information Quality Business Process Redesign (IQBPR) methodology.  The objective is to provide operational Information and Data Management groups with a mechanism to:

  1. Determine, based on a set of objective metrics and structured categories, the state of an organization’s Data Quality, Information Maturity, and issues that have a detrimental impact on either and by inference Business Decision making capabilities.
  2. Identify the Root Causes of Information and Data Quality issues.
  3. Create a Plan for upgrading and improving Information Quality, Information Maturity and Business Decision making capabilities.
  4. Provide an ongoing method for monitoring and measuring Information Quality and determining Information Maturity.

In our experience we have not come across a widely available and well formulated methodology that provides a means of diagnosing, and an approach for addressing, root cause deficiencies in data and information quality. What do exist are:

  • Several excellent research efforts underway, which focus on creating taxonomies of data but not Information Quality deficiencies and issues.
  • Firms who will administer diagnostic reviews and document their observations and identify Data but not Information Quality issues.
  • Consultants who will provide advice on how to improve data quality and integrity but not overall Information Quality.

Our evolving approach is to establish a well-formulated, customizable and operationally available Information Maturity assessment methodology which defines and assesses business behavior patterns across key Information Management functional categories. There are Data Maturity Models (DAMA, EDM Council) and Data Quality Models, but no Information Maturity or Quality models. However, and perhaps most importantly, we are not aware of a comprehensive model and methodology that links Data Quality and Data Maturity to Information Maturity and in turn relates it all to increased quality of business decisions! It is our intent to create and deploy this methodology and to make it as easy to use and as widely accepted as a Systems Development Life Cycle (SDLC).

Information Quality, like good software engineering practices and a taste for fine Scotch whisky, is an acquired taste. To that end, we have created the Information Quality Business Process Redesign Methodology. It is our basic lemma that what is required is better business decisions! To make better decisions you need better Information, and to have better Information you need a more mature information environment. Data Quality and Data Maturity are components of better information, but only two of many components, albeit important and necessary ones.

It is our intention to extend our approach and methodology to cover other areas of Information including, but not limited to:

  • Information Maturity Business Process Redesign (IMBPR).
  • Information Systems Life Cycle.
  • Information related Audit, Risk and Control.

We are basing our approach and related methodologies on:

  • Industry standards such as the Information Maturity Model being developed by the EDM Council
  • The great work done by DAMA (Data Administration Management Association) with their DMBOK
  • Our credentials  of developing information-centric applications and methodologies as well as the work we have done on Information related Audit and Control.

It is our belief that Information Quality cannot be improved in isolation, and by inference better business decisions cannot be made, at least on a long-term basis. It must be part, albeit an important part, of a larger information improvement process. Processes must be monitored, measured and constantly improved, as one cannot manage what cannot be measured. Like any machine, the Information Quality machine will develop errors and will deviate from spec. This is attributable to:

  • Changes in the operational environment (e.g. in the MDM (Master Data Management) space, the metadata definitions from the data aggregators change, and if the machine is not re-calibrated there will be errors).
  • Deficiencies in the original design; only one designer was omnipotent and omniscient, and we are not sure he/she got it right!
  • Changes to the downstream applications: new requirements result in functional changes, applications are retired and new ones are implemented. Applications are in a constant state of change.
  • Entropy: over time things just break; just ask Murphy (of Law fame).

This is a work in progress; to date we have implemented some parts of the methodology. These components have been deployed as part of our Information Resource Management and Information Quality practices, and we are confident that over time we will be able to create appropriate sets of components and integrate them into a comprehensive Information Quality and Maturity Methodology and Life Cycle. The goal of our approach is to help organizations make better decisions.

Phil Teplitzky is a Managing Director with HPSquared LLC. He has been a thought leader in the data management field for over three decades as a consultant, educator and corporate executive. An active member of many industry associations, Phil is a trusted adviser to many executives with overall responsibility for enabling quality data management.

