Faster Big Data Testing and Migration with Metadata-driven Framework
Our client provides cloud-based survey development and marketing analytics solutions for a wide range of industries, including 98% of the Fortune 500. As part of a data infrastructure modernization initiative, the company wanted to migrate the legacy SQL server data warehouse to SaaS-based Snowflake. Xoriant‚Äôs expertise in database migration and big data testing was leveraged to migrate the data and ETL scripts with optimal speed and accuracy.
The key objectives included:
- Migrate 500+ ETL scripts, 1300+ DB objects, ~9TB data to Snowflake.
- Ensure testing accuracy up to two decimal points for all migrated data and ETL scripts.
- Expedite data migration and testing process to meet stringent timelines.
- Create a parallel Snowflake environment to compare/validate SQL and Snowflake data.
Xoriant Solution | Key Contributions
The Xoriant team assessed the migration needs and scope with respect to timelines. The analysis revealed that although the SQL to Snowflake code migration utilized multiple accelerators, data migration and continuous data testing between the SQL server and Snowflake would require test automation to achieve the desired speed and accuracy. To accelerate migration and testing, the Xoriant team came up with a flexible, metadata-driven solution framework which acted as a data validation accelerator.
Solution features included:
- Test Automation Framework
- The Test Automation framework was designed to ensure the table metadata and test criteria were captured in the framework tables.
- Metadata enabled manual and scheduled execution of test cases.
- For ongoing regressions, the readily available metadata enabled faster execution of scheduled automated testing.
- Objects and Data Migration
- Migrating table structures from SQL to Snowflake (with/without data).
- Scheduling to enable migration of data during off-hours.
- Quality Assurance ‚Äď Table Structure and Data Validation
- Metadata-driven parameters facilitated testing based on data types (counts, distinct, aggregate).
- Compared the table structure and data using the automation solution with two decimal-point accuracy.
- Scheduled and automated data validation during off-hours.
- Executed continuous regression testing on schedule.
- Published a data validation report with sample records of unmatched datasets for analysis.
- Parallel Environment for Data Comparison
- Ran data flows in parallel for a specific period to compare the quality and accuracy of migrated history data and daily delta validation.
- After migration, automation tools are still being used for faster change requests and enhancement data and structure validation.
High Level Architecture
- Delivered 50% faster migration and test coverage with metadata-based testing.
- Accelerated validations by testing 1300+ objects in a parallel environment.
- Ensured seamless conversion and testing of 500+ SQL scripts to Snowflake queries.
- Migrated 1000+ table structures and history data to Snowflake.
- Improved efficiency by 60% using test automation, resulting in meeting strict migration timelines.
- The custom test automation framework enabled the client to achieve its strategic migration goals.
SQL Server | Snowflake | Python 3.8 | Airflow