11/20/2014

6 Tips for Scrubbing Your Data Clean

In ever-increasing numbers, enterprises are discovering that Salesforce is far more than just a contact database. It is a powerful tool that can assist virtually every department in your organization. However, like any software program, Salesforce can be subject to the “garbage in, garbage out” principle.

There are times when the cleanliness of your data is not of primary importance, but if you plan to use your data for critical applications, data mining or important analysis projects, you need data that is scrubbed as clean as possible. Cleaning your data can be accomplished in a variety of ways. The following are suggested practices for eliminating dirty data for critical applications.

1. Start with the Cleanest Data Available

This may seem obvious to some, but many people overlook the importance of choosing sources with the highest quality of data. Examine how the source obtains the data, how current the data is and how reliable the source has been in the past. Check for issues such as minor spelling variations that cause duplicate records. One simple way to find them is to run a frequency count by attribute.
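
As a rough illustration of a frequency count by attribute, the Python sketch below counts the raw values of one field and then groups values that differ only in case, punctuation or spacing. The `company` field, the sample records and the `normalize` rule are assumptions made for this example, not part of any particular tool.

```python
from collections import Counter, defaultdict

def frequency_count(records, attribute):
    """Count how often each raw value of `attribute` appears."""
    return Counter(rec.get(attribute, "") for rec in records)

def normalize(value):
    """Illustrative rule: ignore case, punctuation and extra whitespace."""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def suspected_variants(records, attribute):
    """Group raw values sharing a normalized key; keep keys with 2+ spellings."""
    groups = defaultdict(set)
    for rec in records:
        raw = rec.get(attribute, "")
        groups[normalize(raw)].add(raw)
    return {key: sorted(raws) for key, raws in groups.items() if len(raws) > 1}

# Hypothetical sample data
records = [
    {"company": "Acme Corp."},
    {"company": "ACME Corp"},
    {"company": "Acme  corp"},
    {"company": "Globex"},
]

counts = frequency_count(records, "company")
variants = suspected_variants(records, "company")

for value, n in counts.most_common():
    print(n, repr(value))
print(variants)
```

Values that appear only once or twice in the count are often the misspellings, and the grouped variants give a short list to review by hand before merging anything.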

2. Address Issues with the Source

Cleaning data downstream may be your only option, but if at all possible, attempt to have problems fixed at the source. Discuss the issues calmly and rationally. You might discover that the source has no idea that the data passed was dirty, or that the source system merely lacks the resources to provide data that is of better quality. Perhaps you can arrange to work closely with the source system, such as agreeing to clean up the data downstream for a set period while the source builds the support to improve the quality of data.

3. Establish Business Rules for Cleaning Data

Preparing data to be integrated with Salesforce or any other platform requires input from those who are most invested in the data. Ask for help in determining the appropriate business rules to apply. For example, certain ranges of product codes may be obsolete, or the decision might be to purge delinquent receivables below a certain amount.
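
Once the rules are agreed on, it can help to write them down as code so they are applied the same way every time. The Python sketch below assumes a hypothetical obsolete code range (1000 to 1999), a hypothetical 5.00 threshold for delinquent receivables and made-up field names; the real values would come from the people who own the data.

```python
from decimal import Decimal

# Hypothetical rules; actual ranges and thresholds come from the data owners.
OBSOLETE_CODE_RANGES = [(1000, 1999)]
MIN_RECEIVABLE = Decimal("5.00")

def is_obsolete(product_code):
    """True if the code falls inside any range marked obsolete."""
    return any(low <= product_code <= high for low, high in OBSOLETE_CODE_RANGES)

def apply_rules(rows):
    """Split rows into those kept and those rejected, with a reason for each rejection."""
    kept, rejected = [], []
    for row in rows:
        if is_obsolete(row["product_code"]):
            rejected.append((row, "obsolete product code"))
        elif row["delinquent"] and row["amount"] < MIN_RECEIVABLE:
            rejected.append((row, "delinquent receivable below threshold"))
        else:
            kept.append(row)
    return kept, rejected

# Hypothetical sample rows
rows = [
    {"product_code": 1500, "delinquent": False, "amount": Decimal("120.00")},
    {"product_code": 2500, "delinquent": True, "amount": Decimal("3.25")},
    {"product_code": 2501, "delinquent": True, "amount": Decimal("80.00")},
    {"product_code": 3000, "delinquent": False, "amount": Decimal("1.00")},
]

kept, rejected = apply_rules(rows)
for row, reason in rejected:
    print(row["product_code"], reason)
```

Recording a reason with each rejected row gives the business owners something concrete to review, which makes it easier to adjust a rule later.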

4. Decide on the Extent of Your Fixes

Choosing to let some problems pass to the business users can be risky. It may not always be feasible, but letting dirty data through can sometimes be the catalyst that management needs to address issues arising at the source. Of course, it can also backfire and cause you to lose credibility. If you choose to ignore certain problems with data, exercise caution and make sure that you fully understand the potential repercussions.

5. Employ the Proper Tools

There are a number of tools available for cleansing data. Choose the solution that is best for your particular situation. Be sure to test data against reliable sources, such as when you need to verify addresses, emails or phone numbers. However, do not be afraid to “go low-tech” if the situation warrants it. Perhaps you have only a few records that show discrepancies, such as duplicate names with different phone numbers. Your best solution might be to simply have an employee call each number, ask for the person and determine the correct number for each name.
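
Even the low-tech approach needs a list of whom to call. The Python sketch below is one possible way to build that list: it groups contacts by name and flags any name that carries more than one distinct phone number once formatting is stripped. The field names and sample contacts are hypothetical.

```python
from collections import defaultdict

def digits_only(phone):
    """Strip formatting so '(555) 010-0003' and '555.010.0003' compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())

def conflicting_phones(contacts):
    """Return names that appear with more than one distinct phone number."""
    phones_by_name = defaultdict(set)
    for contact in contacts:
        name = " ".join(contact["name"].lower().split())
        phones_by_name[name].add(digits_only(contact["phone"]))
    return {
        name: sorted(phones)
        for name, phones in phones_by_name.items()
        if len(phones) > 1
    }

# Hypothetical sample contacts
contacts = [
    {"name": "Jane Doe", "phone": "(555) 010-0001"},
    {"name": "jane  doe", "phone": "555-010-0002"},
    {"name": "John Roe", "phone": "555.010.0003"},
    {"name": "John Roe", "phone": "(555) 010-0003"},
]

to_call = conflicting_phones(contacts)
print(to_call)
```

Matching on name alone is a deliberate simplification here. Two different people can share a name, so the output is a list of candidates for a person to check, not a list of confirmed duplicates.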

6. Develop a Plan for Future Data

Unless you plan to dedicate staff to cleaning data repeatedly, your source needs to ensure that future extracts are clean. How you approach the issue depends on your company’s political atmosphere, whether the source is irreplaceable and the extent of the problems. Perhaps you can require the source to assume all responsibility for the data, meaning that the source must examine and clean the data before passing it to you. Perhaps the solution requires a collaborative effort, or the algorithms for extracting data need to be revised. The key is to take whatever actions you can to ensure that the data you receive is as clean as it can possibly be.

In Summation

Even when your data comes from the most reliable source available, you will inevitably discover that some of it is dirty. Before it can be used in any sort of meaningful manner by your data warehouse, the data will need to be cleansed. How you accomplish this depends on a number of factors. Hopefully, these tips can help you deal with the issue, now and in the future.