| 
 |  | Monday, April 3, 2017  05:15 PM - 06:00 PM
 
 
 	What is data profiling?:   Definition of data profiling Process of doing itHow does it relate to Data Quality Management?Why is data quality important?Vern the business guy - He cares about getting his information and getting it quickly and accurately.Vernon the IT guy - He cares about maintaining the security of the data (reliable) and keeping his infrastructure scalable (when there’s more data) and available. 
Data Profiling and Data Quality Management:   Doesn’t directly add data qualityData Profiling is just a tool that can help meet your data quality objectives!Has to be combined with your business users (usually who owns the data)Not using a technical fix to solve what might be a business problemExample: ERP System with a Customer table and Address TableCustomer with Address line deletedFirst IT thought: Let’s go back and add an address to the Address table or remove the Customer Table.First Business User thought: Is this no longer a valid Address or Customer?Generally only thought about during a data migrationLegacy to New (CRM, ERP, QMS, etc.) systemCreating a data mart/warehouseBut, the reality is…..Data Quality Management is an ongoing process!Something that is easily forgottenHow often should I be stewarding my data? 
Data Profiling Job:   Aggregate functionsMin, Max values (both on Strings and numeric values)Min, Max value countAverage, Median values (both on Strings and numeric values)Min, Max string lengthDistinct ValuesNull valuesNull value countNull percentageZero percentageBlanks percentagePattern analysisDo values maintain a certain format? yyyy-mm-dd, mm-dd-yyyy, etc. for datesShould using a consistent pattern be a business rule?5 Digit Zip Code vs 5 Digit + 4 (30096-4682)Casing format for names (Vernon Gomes vs VERNON GOMES)European vs. American format for Dates (09/24/2016 vs 24/09/2016)Can test for integrity of dataReferential Integrity TestCheck if a foreign key exists in a parent table.Remember the ERP Customer-Address example used before.How does this information help me?You can make key business decisions that can help you maintain your data quality initiatives.“To prevent having so many empty Address 1 fields, we’ll make it a required field in our ERP and CRM.”“I have users adding information like email addresses into an Address 2 field. This might be a training opportunity or an opportunity to add a new field.” 
Various Data Profiling Tools:   SAP Business Objects Data ServicesInformationStewardTalend 
Points to Take Home:   Data Profiling is a powerful tool that can assist you in improving your data quality.But it requires the collaboration with the business to improve your data quality.It’s not a “one-and-done.” This is an ongoing process!Can be used for more than just your Data Warehousing or migration projectsData Profiling involves using tools to quickly analyze table(s) and determine the type of data and quality of said data.You can use this information to make business decisions to maintain your data quality.     
 
              Vernon Gomes is a former IT industry Systems Administrator turned Consultant. He is currently a Data and Analytics Consultant at Protiviti and specializes in ETL, data modeling, and systems architecture.
 
 |  |  |