Introduction

When viewed from a high level, the cost of poor-quality data can affect a company's bottom line in two ways: first, the cost of scrap and rework; second, missed opportunities. An example of scrap-and-rework costs: an agent errs in recording a customer's address details, and consequently a marketing premium is sent to the wrong address. Later, the customer calls to complain. The complaint must be handled (extra call center time), the address details must be entered a second time (rework), and a second premium must be sent. The initial premium is scrapped. An example of missed-opportunity costs: a credit card is not granted because the calculated credit score (erroneously) falls below the cutoff score, and the customer is rejected. The opportunity to make a sale is lost, even though the marketing costs were already incurred. In this whitepaper, I attempt to supply a comprehensive list of potential data quality costs.

Cost Categories of Information Quality

The costs of data quality can be broken down into three categories:

1. Immediate costs of non-quality data. These arise when the primary process breaks down as a result of erroneous data, or from information scrap and rework, when immediately apparent errors or omissions in the data must be circumvented to support the primary business process. For example, entry of an invalid ZIP code requires back-office staff to look it up and correct it before a product can be sent out.
2. Information quality assessment or inspection costs. These are the costs and efforts expended to (re)assure that processes work properly. Every time a 'suspect' data source is handled, the time spent seeking reassurance about its quality is an irrecoverable expense.
3. Information quality process improvement and defect prevention costs. Broken business processes need to be improved to eliminate unnecessary information costs.
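The invalid-ZIP-code example in category 1 can be sketched as an entry-time check. This is a minimal illustration in Python; the five-digit US ZIP pattern and the function name are assumptions for the sketch, not part of the whitepaper:

```python
import re

# Assumed pattern for a US ZIP code: five digits, optionally followed
# by a four-digit extension (ZIP+4). Illustration only.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

def validate_zip(zip_code: str) -> bool:
    """Return True if the entered ZIP code is syntactically valid."""
    return bool(ZIP_RE.match(zip_code.strip()))

# Rejecting the value at entry time avoids the back-office rework of
# looking the code up and correcting it before the product ships.
assert validate_zip("90210")
assert validate_zip("90210-1234")
assert not validate_zip("9021O")   # letter O instead of zero
```

A syntactic check like this catches only format errors, of course; verifying that a well-formed ZIP code actually exists would require a lookup against a reference table.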
When a data capture or processing operation malfunctions, it requires fixing. This is the long-term investment needed to avoid further losses.

1. Immediate costs of non-quality data

Process failure

For example, capturing erroneous customer data such as address, contact information, or account details.
- Irrecoverable costs; e.g. premiums mailed in vain to non-existent customer addresses.
- Liability and exposure costs; for instance, credit losses when data quality problems result in erroneously granting credit to a customer who would not be considered creditworthy on the basis of self-supplied information.
- Recovery costs of dissatisfied customers; time spent handling complaints.

Information Scrap and Rework

- Redundant data handling; because many processes are 'known' to rely on inaccurate data, it is customary for front-line and back-office staff to maintain small private "lists" of all kinds. These serve only as a backup or improved version of what is available in the main database. Apart from further problems, such as 'maintenance' and 'recovery' of these private lists not being possible, these activities are redundant and non-value-adding.
- Costs of chasing missing information; a field that has not been filled out correctly, or not at all, has to be looked up later in the process. This means excess time and costs, inefficiency, and not least an aggravation factor. Time spent looking up missing information is not spent serving the customer better.
- Business rework costs; e.g. reissuing a credit card that was sent out with a misspelled customer name.
- Workaround costs; whenever a primary key is missing or faulty, laborious fuzzy matches must be performed to match records. This work is challenging, and eats up precious time of the most highly experienced database employees.
- Data verification costs; e.g. the costs of reworking data entry.
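The fuzzy-matching workaround described under "Workaround costs" can be sketched as follows. This minimal illustration uses Python's standard-library difflib; the similarity threshold and the example records are assumptions, not taken from the whitepaper:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of how alike two strings are (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_match(record: str, candidates: list[str], threshold: float = 0.85):
    """Return the best-matching candidate above the threshold, or None.

    Needed when a primary key is missing and records must be matched
    on inexact attributes such as name or address.
    """
    best = max(candidates, key=lambda c: similarity(record, c), default=None)
    if best is not None and similarity(record, best) >= threshold:
        return best
    return None

# A misspelled record is still matched to the correct customer.
match = fuzzy_match("Jon Smiht, 12 Main St",
                    ["John Smith, 12 Main St", "Jane Doe, 9 Elm Rd"])
# match == "John Smith, 12 Main St"
```

Even in this toy form, the labor-cost point is visible: the threshold has to be tuned per dataset, near-threshold matches still need human review, and none of this work would exist if the primary key had been captured correctly.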
In the same vein, analyses by knowledge workers must start by checking the correctness of the available data before the actual analysis can begin.
- Program rewrite costs; rewriting programs that fail to run because of invalid records in the data. E.g., pre- or post-conversion scripts that occasionally need to be written to deal with the content of source systems before loading into a Data Warehouse environment.
- Data cleaning and correction costs; whenever feeds are processed for loading into the Data Warehouse, the data must be transformed for reasons that stem from quality problems. Any data cleaning and scrubbing performed in the ETL process is essentially redundant insofar as it is caused by faulty initial data entry. For example, when a mailing is done on the basis of the existing customer file, dedicated scripts must be coded to deal with the (known!) errors in the address fields. This process has to be repeated for every mailing. Since such customer files are often shared across departments and systems, source changes must be negotiated with all users of these data.
- Data cleaning software costs; data cleaning software (like Vality, Ascential, etc.) is typically very expensive. There is a trade-off between scarce labor doing this work 'by hand' and the high license costs of ETL data quality software that helps with these tasks. A purchase may nevertheless prove economical when weighed against the (often unseen) labor costs of manually improving data quality.

Lost and missed opportunity costs

- Lost opportunity costs; when, for example, a misspelled customer name on a card causes the customer to never use the card (instead of phoning up to complain about it), the company loses that future revenue.
- Missed opportunity costs; dissatisfied customers directly influence their social environment, generating unfavorable publicity.
This makes it harder to market to people in the social network of displeased customers.
- Lost shareholder value; non-quality information puts a drain on precious resources (scarce database experts), preventing knowledge workers from performing value-added work towards market share growth. Scarce human resources are often a bottleneck to progress, such as running one more marketing campaign, delivering insight into a product portfolio's performance, etcetera.

2. Information quality assessment or inspection costs

- People spend time in assessment processes whenever they are aware of questionable data quality; in any database project, every file of questionable quality has to be inspected for data quality problems first. This time is irreplaceable, forever lost and never recouped in any way. Merely assessing whether data is of sufficient quality is specialist work, requiring access to scarce resources that are often a bottleneck to progress.

3. Information quality process improvement and defect prevention costs

- Development costs to rework existing front-end applications; data entry applications should enforce data quality by performing validity checks, and by minimizing keystrokes and eye-hand movements. Such program improvements tend to yield both higher efficiency and greater data quality.
- Management attention to define accountabilities and monitor improved information quality; steering the organization towards higher data quality requires changing accountabilities and constantly monitoring improvement. This topic should stay high on management's agenda to create permanent improvement.

Conclusion

Problems in data quality often go unnoticed. Poor data quality can be both a source of process inefficiencies (timeliness) and of operational costs (direct and indirect losses). In neither case is it immediately apparent that improvement can be gained from enhancing data quality.
One of the pernicious consequences of suboptimal data quality is that its cost is usually hidden. Lack of data quality is not obvious to those not deliberately looking for it, and quantifying the costs isn't always easy. What makes the indirect costs of poor data quality so pernicious is that the relation between data quality problems and their consequences is non-obvious, and often manifests only after a substantial time delay. Therefore, the connection between downstream consequences and poor-quality data is often not made, and the problems are not attributed to their true cause. The cause of many downstream costs can thus remain largely hidden, and therefore insufficiently subject to management attention and intervention. Also, progress after improvement efforts is gradual, relatively slow, in large part 'cultural', and therefore difficult to monitor and track.

Another, and probably the most significant, problem caused by poor-quality information is that it frustrates the most valuable resource of the company: its employees. Non-quality information prevents knowledge workers from performing their jobs effectively. On top of that, it alienates customers because of wrong information about them, and to them. Customer data is the raw material that needs to be managed for what it is: a strategic resource.

Data quality is far more than accurate data entry. It stems from monitoring downstream data usage, maintaining comprehensive and up-to-date metadata, and nurturing a corporate culture of naturally doing things right at the first attempt. Only then will knowledge workers learn to expect data quality, and enforce it because it is the natural thing to do. Letting data quality slide will promote a culture of negligence, and disdain for one of the company's most precious assets: customer information.
The case for accurate source data is further underlined when one realizes that the source in and of itself does little more than support primary processes, which is fine. The greater value to the organization, however, comes from enhancing these data, from deriving new information from source data. The investment in improving information quality is recouped several times over in decreased costs and in the improved value of information for accomplishing strategic business goals. Rapid access to high-quality data is a decisive factor in an organization's ability to assess and adapt its business model to changing market conditions. As corporations become ever more 'digitized', those that get a grip on their data quality assurance processes can reap great rewards. In a highly turbulent market this may well be the critical factor determining the survivors in a competitive business, and therefore prove to be ultimately priceless.

Resources

Larry P. English (1999) Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. Wiley, ISBN 0-471-25383-9
Jack E. Olson (2003) Data Quality: The Accuracy Dimension. Morgan Kaufmann, ISBN 1-55860-891-5
Sid Adelman, Larissa Moss & Majid Abai (2005) Data Strategy. Addison-Wesley, ISBN 0-321-24099-5

Article: "How Non-Quality Data Can Cost Money" — XLNT Consulting, Turning Data Into Dollars.