Tuesday, April 4, 2017
11:15 AM - 12:00 PM
Compliance Data Warehouse (CDW) users are skilled in data analysis and analytics. Their research depends on the quality of the data. Data Quality issues identified by users via the CDW Help Desk include: ID masking gone awry, dollar amounts too high, many missing records, recurring duplicate records, columns inconsistent with definitions, key fields with different data types, and fields with different names.
When a researcher reports a possible data quality issue, CDW staff attempt to replicate the problem. Some problems are relatively easy to fix, others are more complex, and some are beyond fixing. The best time to identify quality issues is before the data is in production.
This session will cover:
- Importance of testing and validation during Extract-Transform-Load (ETL)
- Importance of checking record counts against the source system
- Importance of a record layout from the source system that matches the data
- Importance of standards across tables for both data names and data types
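As a rough illustration of two of the checks listed above (reconciling record counts against the source system, and confirming that a field keeps one data type across tables), here is a minimal Python sketch. It is not the CDW's actual tooling. The function names, table names, column names, and counts are all hypothetical and chosen only for the example.

```python
from collections import defaultdict


def count_mismatches(source_counts, warehouse_counts):
    """Return tables whose loaded row count differs from the source count.

    Both arguments are dicts mapping table name -> row count.
    A table missing from the warehouse is treated as having 0 rows.
    """
    mismatches = {}
    for table, expected in source_counts.items():
        loaded = warehouse_counts.get(table, 0)
        if loaded != expected:
            mismatches[table] = (expected, loaded)
    return mismatches


def inconsistent_types(schemas):
    """Return column names declared with more than one data type.

    `schemas` maps table name -> {column name -> declared type}.
    """
    types_by_column = defaultdict(set)
    for columns in schemas.values():
        for column, dtype in columns.items():
            types_by_column[column].add(dtype)
    return {c: sorted(t) for c, t in types_by_column.items() if len(t) > 1}


# Hypothetical example data (not taken from any real system).
source_counts = {"returns": 1000, "payments": 500}
warehouse_counts = {"returns": 1000, "payments": 498}
schemas = {
    "returns": {"taxpayer_id": "CHAR(9)", "amount": "DECIMAL(15,2)"},
    "payments": {"taxpayer_id": "INTEGER", "amount": "DECIMAL(15,2)"},
}

print(count_mismatches(source_counts, warehouse_counts))
print(inconsistent_types(schemas))
```

In this made-up data, the `payments` table is two records short of the source, and `taxpayer_id` is declared with two different types. These are the kinds of discrepancies that are cheaper to catch during ETL than after a researcher reports them.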
Robin Rappaport is the Data Quality Team Leader responsible for delivery of the Data Quality Initiative for Research Databases at the Internal Revenue Service (IRS). Her work and that of her team contributed to the IRS being awarded The Data Warehousing Institute (TDWI) 2011 Best Practices Award, a Computerworld Honor, and a Government Computer News (GCN) Gala Award.
She has over 25 years of experience as a Data Quality practitioner, spanning both the private sector (six years) and the public sector (since 1990). Her positions include Computer Programmer, Systems Analyst, and Operations Research Analyst.
She was awarded the 2017 Distinguished Service Award for her service as webinar coordinator (facilitator) for IQ International, the International Association for Information & Data Quality, hosting over 60 webinars between February 2011 and August 2017.