
Why Are Enterprises Using Policy-Based Data Remediation During Data Migration?

According to O'Reilly's 2020 survey on the state of data quality, more than 60% of respondents cited data integration issues as a major cause of data quality problems. Today, the key challenge companies face is maintaining the overall health of their existing data. Data quality deteriorates for several reasons: aging applications, multiple iterations of the data model, and frequent changes in the data itself all cause data to become ungoverned and inaccurate.

When data is ungoverned, the Data Quality Index drops to a point where insights drawn from the data are no longer credible, and stakeholders cannot confidently take that data forward to their customers. To get faster, more streamlined data access, companies are adopting big data technologies and cloud platforms. These migrations have become a major requirement, and the key challenge during them is ensuring the authenticity and accuracy of the data.

When it comes to data, enterprises face two major challenges: managing data migration while incremental data keeps arriving, and ensuring data hygiene and governance. In this blog, let us look at how policy-based remediation addresses these challenges. We will also discuss a unique framework proposed by Xoriant experts for policy-based remediation, along with its benefits.

Importance of data governance during data migration

Enterprises struggle to manage client data that resides across multiple systems and applications. Trillions of rows of related data sit in isolated silos, and this data needs to be collated, analyzed and reported on. To understand the data, get a historical view and forecast data trends and business outcomes, it is essential that the data reflects the right information.

Data that is not cleansed, governed or valuable lowers the overall potential of the data estate. Gathering complete data and cleansing it for enrichment is therefore a significant undertaking, but data enrichment also yields better insights. Assessing the risks of data management can help enterprises and ISVs determine the value of the data they hold.

With data remediation, enterprises can –

  • Ensure compliance with regulatory and legal obligations
  • Minimize costs linked with storage footprints
  • Identify sensitive data and implement suitable security measures
  • Allow the end-user to access meaningful real-time data that is appropriately cleansed, sorted and integrated

Understanding Policy-based Remediation

What is policy-based remediation? "Policy-based" means that companies set up rules or policies, such as data validation checks, data integrity checks or business-driven rules, to correct data and enrich its quality. Once stakeholders have defined these policies, they can be implemented as a rules engine that runs on existing client data to ensure data governance. A minimal sketch of such a rule abstraction follows.
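
To make this concrete, here is a minimal sketch in Java of how validation and business rules might sit behind a single interface. The names (`Rule`, `NotNullRule`, `RangeRule`) are illustrative assumptions, not the actual implementation:

```java
import java.util.Map;

// Hypothetical rule abstraction: every policy, whether a validation,
// integrity, or business rule, exposes the same evaluate() contract.
interface Rule {
    String name();
    boolean evaluate(Map<String, String> row);
}

// Validation rule: a mandatory column must be present and non-empty.
class NotNullRule implements Rule {
    private final String column;
    NotNullRule(String column) { this.column = column; }
    public String name() { return "NOT_NULL(" + column + ")"; }
    public boolean evaluate(Map<String, String> row) {
        String value = row.get(column);
        return value != null && !value.isBlank();
    }
}

// Business rule: a numeric column must fall inside a domain-specific range.
class RangeRule implements Rule {
    private final String column;
    private final double min, max;
    RangeRule(String column, double min, double max) {
        this.column = column; this.min = min; this.max = max;
    }
    public String name() { return "RANGE(" + column + ")"; }
    public boolean evaluate(Map<String, String> row) {
        try {
            double v = Double.parseDouble(row.get(column));
            return v >= min && v <= max;
        } catch (NullPointerException | NumberFormatException e) {
            return false;
        }
    }
}
```

A rules engine then simply iterates over the configured rules for each record, routing failures to a rejection or correction path.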

So, how does it work? With data volumes growing into terabytes and petabytes, moving data to Big Data or cloud platforms becomes a necessity. As the data is migrated, companies can cleanse and govern it based on the governing policies defined by the user. These policies can be generic, applicable to any data, or business-specific, pertinent to a particular domain.

Exploring a remediation framework for data migration

Now, let us look into the details of an effective remediation approach proposed by Xoriant experts. As per the experts, a remediation framework should be configurable, injectable and extensible to support migration from disparate traditional legacy systems to Big Data platforms. The framework should have a mature, extensive rule engine that enforces quality checks and governs the data as it is extracted, transformed and loaded into the Big Data lake.

Our experts propose a Big Data solution that works on data collated from different source systems and applies validation, integrity and business rules to turn it into quality data. The solution also provides source connectors for different kinds of sources, be it files, databases or APIs, as sketched below.
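
As an illustration, assuming the framework is built on Apache Spark (part of the technology stack mentioned in the success story below), file and database connectors might look like this; the paths, JDBC URL, credentials and table names are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SourceConnectors {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("remediation-ingest")
                .getOrCreate();

        // File connector: read a raw CSV landed from a legacy extract.
        Dataset<Row> fileData = spark.read()
                .option("header", "true")
                .csv("/landing/legacy_extract.csv");   // placeholder path

        // Database connector: pull a table over JDBC from a source system.
        Dataset<Row> dbData = spark.read()
                .format("jdbc")
                .option("url", "jdbc:postgresql://source-host:5432/crm") // placeholder
                .option("dbtable", "public.accounts")                    // placeholder
                .option("user", "reader")
                .option("password", "secret")
                .load();

        // Both connectors yield the same Dataset<Row> abstraction,
        // so the downstream rule engine is source-agnostic.
        fileData.printSchema();
        dbData.printSchema();
    }
}
```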

As in any traditional ETL, the data flows through different stages to reach the final target, and different policies and checks should be enforced at each stage for data governance. The data is cleansed and regularized in phases. The stages of data remediation can be as follows (a sketch of this staged flow appears after the list):

  1. First stage: A schema validation check is applied to the incoming data; any deviation from the expected schema is captured and recorded for further correction.
  2. Second stage: Multiple rule engines for integrity and business validation are applied to reject erroneous and redundant data.
  3. Third stage: The data that reaches the final Big Data lake is of the highest quality and is governed as per the business user's governance policies.
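
Here is a minimal sketch of this staged flow, again assuming Spark; the expected columns, rule expressions and lake paths are illustrative assumptions, not the framework's actual configuration:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import java.util.Arrays;
import java.util.List;

public class StagedRemediationPipeline {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("staged-remediation")
                .getOrCreate();

        Dataset<Row> raw = spark.read()
                .option("header", "true")
                .csv("/landing/legacy_extract.csv");   // placeholder path

        // Stage 1: schema validation. Record any drift from the expected
        // columns before letting the data proceed.
        List<String> expected = Arrays.asList("account_id", "balance", "region");
        List<String> actual = Arrays.asList(raw.columns());
        if (!actual.containsAll(expected)) {
            // A real pipeline would write this to an audit store.
            System.err.println("Schema drift detected: " + actual);
        }

        // Stage 2: integrity and business rules. Rows failing any rule are
        // split off for rejection rather than silently dropped.
        Dataset<Row> valid = raw.filter(
                "account_id IS NOT NULL AND balance >= 0");   // illustrative rules
        Dataset<Row> rejected = raw.exceptAll(valid);

        // Stage 3: only governed, quality data lands in the Big Data lake.
        valid.write().mode("overwrite").parquet("/lake/accounts");        // placeholder
        rejected.write().mode("overwrite").parquet("/quarantine/accounts");
    }
}
```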

Designing the rules for data remediation

A comprehensive integrity rule engine is defined that can be configured per table. Each configuration file is kept separate from the base code and can target any schema, table and column combination.

Rules are grouped by type, giving data validation rules, metadata validation rules and business rules. Each rule is flexible and can be configured dynamically, and any rule can be disabled so that processing proceeds as required. A sketch of such a configuration-driven rule registry follows.
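
As an illustration (the structure and field names are assumptions, not the framework's actual format), a configuration entry might bind a rule type to a schema, table and column and carry an enabled flag, so rules can be toggled without touching the base code:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical configuration entry: one rule bound to one
// schema.table.column target, with an on/off switch.
record RuleConfig(String schema, String table, String column,
                  String ruleType, boolean enabled) {}

public class RuleRegistry {
    public static void main(String[] args) {
        List<RuleConfig> configs = List.of(
            new RuleConfig("sales", "accounts", "account_id", "NOT_NULL", true),
            new RuleConfig("sales", "accounts", "balance",    "RANGE",    true),
            new RuleConfig("sales", "accounts", "region",     "LOOKUP",   false) // disabled
        );

        // Only enabled rules for the requested table are handed to the engine.
        List<RuleConfig> active = configs.stream()
                .filter(RuleConfig::enabled)
                .filter(c -> c.schema().equals("sales") && c.table().equals("accounts"))
                .collect(Collectors.toList());

        active.forEach(c -> System.out.println(
                c.ruleType() + " on " + c.schema() + "." + c.table() + "." + c.column()));
    }
}
```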

Now, let's talk about audit and monitoring. At every stage, each rule failure generates an error record and an audit record, giving users important feedback and an opportunity to nip the issue at the source itself. For key stakeholders, both the historical and current Data Quality Index can then be gauged through any downstream reporting system. A sketch of such an audit record is shown below.
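
Here is a minimal sketch of an audit record; the fields are assumptions about what a downstream data quality report would need, and a real sink would write to Hive or a message queue rather than standard output:

```java
import java.time.Instant;

// Hypothetical audit record emitted whenever a rule fails.
record AuditRecord(Instant timestamp, String stage, String ruleName,
                   String target, String rowKey, String failureDetail) {}

public class AuditSink {
    // Emit one record per failure; printing keeps the sketch self-contained.
    public static void emit(AuditRecord record) {
        System.out.println(record);
    }

    public static void main(String[] args) {
        emit(new AuditRecord(Instant.now(), "STAGE_2", "NOT_NULL(account_id)",
                "sales.accounts.account_id", "row-42",
                "account_id was null"));
    }
}
```

Aggregating these records per run yields the counts from which a Data Quality Index (for example, passed rows over total rows) can be reported over time.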

Key benefits of using a data remediation framework

The key benefits of a policy-driven data remediation framework are:

  1. The framework can be platform-independent and can be configured at every stage of data.
  2. Rules can be set up at the most granular level, so a single rule can target a specific schema, table and column combination.
  3. The rules engine can be extended to support more generic as well as business-specific data, making it more robust.

Data Governance Success Story

A leading multinational investment bank's critical inventory and infrastructure data was spread across multiple applications. This diffusion resulted in data decentralization, duplication and integrity issues. Xoriant developed a big data lake data governance solution applying a policy-based data remediation process built on an Apache Spark, Hive and Java technology stack. The solution reduced the client's technology costs by 30% and enhanced governance with a single source of trusted data and business value.

Read the Data Governance Success Story

Connect with Xoriant experts to discuss suitable data governance techniques for your enterprise scenario. If you would like to explore Xoriant's solutions, services and offerings across Data Governance, click here.

References

O'Reilly Radar, "The State of Data Quality in 2020": https://www.oreilly.com/radar/the-state-of-data-quality-in-2020/