“How much should my organization spend to clean my customer data?” This question comes up whenever the subject of data cleansing is raised. Cleaning up your data is one of the recommended steps in preparing for a Salesforce integration and virtually any other software implementation. Unfortunately, although the question seems simple, there is no simple answer. Both the cost of cleaning your customer data and the amount you should invest in cleaning it depend on a number of factors. Only by analyzing your specific situation can you decide on the proper amount to allocate for the task.
Certain factors that must be analyzed are specific to the data that exists in your files. These include:
- The number of records: Obviously, it is a much larger job to clean 10 million records than to clean 100,000. Therefore, the first consideration is the number of existing records.
- Age of the records: Customer information seldom remains accurate forever. Business customers and consumers can change phone numbers or relocate, and contacts, such as purchasing agents, can change as well. If records have not been maintained and updated frequently, there are likely to be many more inaccurate records.
- Location of records: In many organizations, customer records are “siloed” in a number of databases. Cleansing is easier if all records are found in a single location.
- Number of fields in each record: Although the number of records is a critical factor, the number of fields in each record is also important. For example, a record that contains customer name, address and phone number will require less effort than one that contains these fields plus email address, spouse’s name, employer, occupation, education, income and age.
- Degree of cleaning needed: You must also determine how clean is “clean enough.” The more accurate the data must be, the more difficult (and costly) the cleansing will be.
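One way to put rough numbers on the factors above is to profile the records before asking for a quote. The following is only a minimal sketch, not a definitive method: the record layout (a dict with `source` and `last_updated` keys), the function name `profile_records`, and the one-year staleness threshold are all assumptions made for illustration.

```python
from datetime import date


def profile_records(records, today, stale_after_days=365):
    """Summarize the sizing factors for a list of customer records.

    Assumed (hypothetical) layout: each record is a dict with a 'source'
    key, a 'last_updated' key holding a datetime.date, and any number of
    additional data fields.
    """
    bookkeeping = {"source", "last_updated"}
    total = len(records)

    # Number of fields: the largest count of data fields in any record.
    max_fields = max((len(set(r) - bookkeeping) for r in records), default=0)

    # Age: share of records not updated within the assumed threshold.
    stale = sum(
        1 for r in records
        if (today - r["last_updated"]).days > stale_after_days
    )

    # Location: how many separate systems ("silos") the records come from.
    sources = {r["source"] for r in records}

    return {
        "record_count": total,
        "max_fields_per_record": max_fields,
        "stale_fraction": stale / total if total else 0.0,
        "source_count": len(sources),
    }


sample = [
    {"source": "crm", "last_updated": date(2024, 1, 1),
     "name": "A. Smith", "phone": "555-0100"},
    {"source": "billing", "last_updated": date(2020, 1, 1),
     "name": "B. Jones", "phone": "555-0101", "email": "b@example.com"},
]
profile = profile_records(sample, today=date(2024, 6, 1))
```

A summary like this does not produce a price, but it gives you and any vendor the same starting figures for volume, age, silos and field count. How clean is “clean enough” remains a business decision that no profile can answer.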
Factors Related to Business Impact
Perhaps the best way to determine the amount that should be spent to cleanse data is to analyze the impact that dirty data has on the business. Based on this analysis, the answer to the question, “How much should I spend?” might be “very little,” “as much as it takes,” or something in between.
- What happens if your data is dirty? Will shipments be delivered to the wrong address, or will your outbound sales representatives simply get a “no longer in service” recording? Will the customer receive a duplicate flyer, or duplicates of a costly full-color catalog? Will a customer who currently resides in Hawaii receive an offer to rebate transportation costs to a seminar in New York? Is the dirty data likely to cost you sales or negatively impact your relationship with the customer? The more accurate your data must be to conduct your business efficiently while maintaining good customer relations, the more you should spend to cleanse data.
- What is the return on your investment? Although some benefits are not easily translated into monetary terms, other benefits can be analyzed for financial impact. Unless your business will be severely harmed by dirty data, the amount you spend should typically not exceed the financial benefits you will receive. In other words, if you estimate that dirty data is costing you $100 in excess labor costs, it does not make sense to spend $10,000 to clean up your data.
- How long will you be able to keep the data clean? The quality of your data depends on its source. If you regularly amass “big data” from sources that are less than “top-tier,” you will soon have much more dirty data. If you need both more data and cleaner data, you can address the issue with the source or perhaps find another source. Alternatively, you can dedicate in-house personnel to cleanse data before passing it forward. On the other hand, if you need more data rather than cleaner data, you might prefer to limit your cleansing costs by performing just a rudimentary cleaning.
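The return-on-investment comparison above can be sketched as a simple calculation. This is an illustration under stated assumptions rather than a costing model: the function name `cleansing_budget_check` and the `years_data_stays_clean` parameter are hypothetical, and the dirty-data cost is treated as a yearly figure, which the $100 example does not specify.

```python
def cleansing_budget_check(cost_of_dirty_data_per_year,
                           cleansing_cost,
                           years_data_stays_clean=1.0):
    """Compare the cost of a cleansing project with the savings it buys.

    Assumption: the savings equal the yearly cost of dirty data multiplied
    by the number of years the data is expected to stay clean.
    """
    benefit = cost_of_dirty_data_per_year * years_data_stays_clean
    return {
        "benefit": benefit,
        "net": benefit - cleansing_cost,
        "worth_it": benefit >= cleansing_cost,
    }


# The example from the text: $100 of excess labor versus a $10,000 cleanup.
check = cleansing_budget_check(100, 10_000)
```

With these figures the net result is a loss of $9,900, so the project would not pay for itself. The calculation covers only benefits you can express in money; harm to customer relationships still has to be weighed separately.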
It is impossible to provide a fixed cost for cleaning data, such as an average cost-per-record, because too many variables must be considered. The better approach is to determine the impact that dirty data has on your business. The greater the impact, the more you should spend to guarantee the accuracy of your customer records.