Monday, April 3, 2017
05:15 PM - 06:00 PM
What is data profiling?:
- Definition of data profiling
- Process of doing it
- How does it relate to Data Quality Management?
- Why is data quality important?
- Vern the business guy - He cares about getting his information and getting it quickly and accurately.
- Vernon the IT guy - He cares about maintaining the security of the data (reliable) and keeping his infrastructure scalable (when there’s more data) and available.
Data Profiling and Data Quality Management:
- Doesn’t directly add data quality
- Data Profiling is just a tool that can help meet your data quality objectives!
- Has to be combined with your business users (usually who owns the data)
- Not using a technical fix to solve what might be a business problem
- Example: ERP System with a Customer table and Address Table
- Customer with Address line deleted
- First IT thought: Let’s go back and add an address to the Address table or remove the Customer Table.
- First Business User thought: Is this no longer a valid Address or Customer?
- Generally only thought about during a data migration
- Legacy to New (CRM, ERP, QMS, etc.) system
- Creating a data mart/warehouse
- But, the reality is…..
- Data Quality Management is an ongoing process!
- Something that is easily forgotten
- How often should I be stewarding my data?
Data Profiling Job:
- Aggregate functions
- Min, Max values (both on Strings and numeric values)
- Min, Max value count
- Average, Median values (both on Strings and numeric values)
- Min, Max string length
- Distinct Values
- Null values
- Null value count
- Null percentage
- Zero percentage
- Blanks percentage
- Pattern analysis
- Do values maintain a certain format?
- yyyy-mm-dd, mm-dd-yyyy, etc. for dates
- Should using a consistent pattern be a business rule?
- 5 Digit Zip Code vs 5 Digit + 4 (30096-4682)
- Casing format for names (Vernon Gomes vs VERNON GOMES)
- European vs. American format for Dates (09/24/2016 vs 24/09/2016)
- Can test for integrity of data
- Referential Integrity Test
- Check if a foreign key exists in a parent table.
- Remember the ERP Customer-Address example used before.
- How does this information help me?
- You can make key business decisions that can help you maintain your data quality initiatives.
- “To prevent having so many empty Address 1 fields, we’ll make it a required field in our ERP and CRM.”
- “I have users adding information like email addresses into an Address 2 field. This might be a training opportunity or an opportunity to add a new field.”
Various Data Profiling Tools:
- SAP Business Objects Data Services
Points to Take Home:
- Data Profiling is a powerful tool that can assist you in improving your data quality.
- But it requires the collaboration with the business to improve your data quality.
- It’s not a “one-and-done.” This is an ongoing process!
- Can be used for more than just your Data Warehousing or migration projects
- Data Profiling involves using tools to quickly analyze table(s) and determine the type of data and quality of said data.
- You can use this information to make business decisions to maintain your data quality.
Vernon Gomes is a former IT industry Systems Administrator turned Consultant. He is currently a Data and Analytics Consultant at Protiviti and specializes in ETL, data modeling, and systems architecture.