Tuesday, April 4, 2017
11:15 AM - 12:00 PM
Compliance Data Warehouse (CDW) users are skilled in data analysis and analytics. Their research depends on the quality of the data. Data Quality issues identified by users via the CDW Help Desk include: ID masking gone awry, dollar amounts too high, many missing records, recurring duplicate records, columns inconsistent with definitions, key fields with different data types, and fields with different names.
When a researcher reports a possible data quality issue, CDW staff attempt to replicate the problem. Some problems are relatively easy to fix, others are more complex, and some are beyond fixing. The best time to identify quality issues is before the data is in production.
This session will cover:
- Importance of testing and validation during Extract-Transform-Load (ETL)
- Importance of checking record counts against the source system
- Importance of a record layout from the source system that matches the data
- Importance of standards across tables for both data names and data types
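As a rough illustration of two of the checks listed above, the following Python sketch compares record counts between a source system and the warehouse, and flags columns whose data type differs from table to table. It is a minimal sketch under stated assumptions, not the CDW's actual tooling: the function names, table names, column names, and types are all hypothetical, and the counts and schemas are assumed to have been gathered already.

```python
from collections import defaultdict


def count_mismatches(source_counts, warehouse_counts):
    """Return {table: (source_count, loaded_count)} for tables whose counts differ.

    Both arguments are hypothetical {table_name: record_count} mappings.
    A table missing from one side is treated as having zero records there.
    """
    tables = set(source_counts) | set(warehouse_counts)
    return {
        table: (source_counts.get(table, 0), warehouse_counts.get(table, 0))
        for table in sorted(tables)
        if source_counts.get(table, 0) != warehouse_counts.get(table, 0)
    }


def inconsistent_types(schemas):
    """Return {column: {data_type: [tables]}} for columns declared with more than one type.

    `schemas` is a hypothetical {table_name: {column_name: data_type}} mapping.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for table, columns in schemas.items():
        for column, dtype in columns.items():
            seen[column][dtype].append(table)
    return {
        column: dict(types)
        for column, types in seen.items()
        if len(types) > 1
    }


# Hypothetical example data, for illustration only.
source = {"returns": 100, "payments": 50}
loaded = {"returns": 100, "payments": 48}
schemas = {
    "returns": {"tin": "CHAR(9)", "amount": "DECIMAL"},
    "payments": {"tin": "INTEGER", "amount": "DECIMAL"},
}

count_report = count_mismatches(source, loaded)
type_report = inconsistent_types(schemas)
```

Run during ETL validation, checks of this kind could surface missing records and key fields with different data types before the data reaches production, rather than after a researcher reports them.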
Robin Rappaport is the Data Quality Team Leader responsible for delivery of the Data Quality Initiative for Research Databases at the Internal Revenue Service (IRS). Her work and that of her team contributed to the IRS being awarded The Data Warehousing Institute (TDWI) 2011 Best Practices Award, a Computerworld Honor, and a Government Computer News (GCN) Gala Award.
Spanning both private (six years) and public sectors (since 1990), she has over 25 years of experience as a Data Quality practitioner. An undergraduate degree in Economics with Computer Science led to graduate work in Operations Research with a concentration in Mathematical Modeling in Information Systems. Her positions include Computer Programmer, Systems Analyst, and Operations Research Analyst.
She facilitates webinars for IQ International (IAIDQ) and is a member of the Certified Analytics Professional (CAP) Exam Committee for the Institute for Operations Research and the Management Sciences (INFORMS).