Data Integration Is Not a Stand-Alone Project

February 17, 2022 | Data Management | Slavko Kastelic

Data integration is not a stand-alone project. It depends on and includes other aspects of modern data management, such as data quality, data modeling, and data architecture.

1. Data Integration Introduction

To run their business successfully, modern companies increasingly rely on commercial information solutions such as ERP, CRM, online store, and warehousing applications. The complexity of data integration grows with the number of data sources in the organization.

Companies are increasingly purchasing commercial information systems rather than developing their own information solutions. Furthermore, each purchased application comes with its own set of master data, transaction data, and reporting data that must be integrated with the company’s other data. Even ERP systems that run the organization’s common functions rarely include all of the necessary data and must integrate with other data sources within the company.

Companies must also integrate data from external sources. This includes data from suppliers and customers, government agencies, commercial data providers (weather, traffic), and government institutions (demographic data, purchasing power, GDP, etc.).

As a result, companies now have hundreds or even thousands of databases, and integration processes have become a critical, central responsibility of every IT department.

According to the Talend and Cohesity Data Health Survey, midsize and large businesses use over 900 different applications and over 1,400 integrations that link them.

Data integration encompasses all types of data exchange, both within and outside the company, in various forms and locations.

If not properly managed, the process can overburden IT resources and capabilities.

2. Data Integration Definition

Data integration encompasses the practices, architectures, and infrastructure required to achieve consistent data access and delivery across all business processes and associated applications.

To put it simply, modern data integration is the process of combining data from various sources into a single, unified view of the data.
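To make "a single, unified view" concrete, here is a minimal Python sketch that merges customer records from two sources on a shared key. The source lists, field names and the `unified_view` helper are hypothetical. The merge rule used here (the first non-empty value wins) is only one possible rule. Real integrations usually need far more involved key matching and conflict resolution.

```python
# Hypothetical extracts: the same customers as seen by two different systems.
crm_customers = [
    {"customer_id": 1, "name": "Acme d.o.o.", "email": "info@acme.example"},
    {"customer_id": 2, "name": "Beta Ltd", "email": None},
]
erp_customers = [
    {"customer_id": 1, "credit_limit": 5000},
    {"customer_id": 3, "credit_limit": 1200},
]

def unified_view(*sources):
    """Merge records from several sources into one record per customer_id.

    Fields are filled in source order: a field that already holds a non-None
    value is kept, and a missing or None field is taken from the next source.
    """
    merged = {}
    for source in sources:
        for record in source:
            key = record["customer_id"]
            target = merged.setdefault(key, {"customer_id": key})
            for field, value in record.items():
                if target.get(field) is None:
                    target[field] = value
    # Return one combined record per customer, ordered by key.
    return [merged[k] for k in sorted(merged)]

for row in unified_view(crm_customers, erp_customers):
    print(row)
```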

3. Common Data Integration Use Cases

Common use-cases include:

  • Data warehousing and data lakes
    This is probably the best-known data integration use case. It provides operational intelligence and management decision support. It is typically implemented with ETL (extract, transform, load) or ELT (extract, load, transform) processes, which extract structured and unstructured data, transform it, and load it into data warehouses or data lakes.
  • Data migration and conversion
    A data migration project involves moving or copying data from System A to System B and subsequently removing or decommissioning System A. Examples include application migration, storage replacement, system or application upgrades, and disaster recovery.
  • Enterprise application integration
    Application integration, also known as interoperability, synchronizes operational data between two or more applications in real time. For example, when a customer places an order in an online store, an integration flow is triggered that updates and enriches data in other applications (warehouse, ERP, etc.) in real time.
  • Master data management (MDM)
    MDM is a process for managing non-transactional corporate data consistently. Customers, clients, products, accounts, and internal reference data are common MDM candidates. Modern MDM implementations commonly use MDM hubs.
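The data warehousing use case can be illustrated with a minimal extract-transform-load sketch in Python. It uses the standard `csv` and `sqlite3` modules, with an in-memory SQLite database standing in for the warehouse. The CSV content, the `fact_orders` table and the cleaning rules are hypothetical. The aim is only to show the three steps as separate, testable functions.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system.
SOURCE_CSV = """order_id,customer,amount_eur,order_date
1001, acme ,120.50,2022-02-01
1002,Beta,80,2022-02-03
1003,acme,,2022-02-04
"""

def extract(text):
    """Extract: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: standardize values and skip rows without a required amount."""
    out = []
    for row in rows:
        if not row["amount_eur"]:
            continue  # a real pipeline would route this row to an error table
        out.append((
            int(row["order_id"]),
            row["customer"].strip().upper(),  # trim and normalize case
            float(row["amount_eur"]),
            row["order_date"],
        ))
    return out

def load(conn, rows):
    """Load: write the transformed rows into a warehouse fact table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS fact_orders (
        order_id INTEGER PRIMARY KEY, customer TEXT,
        amount_eur REAL, order_date TEXT)""")
    conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
```

In an ELT variant, the raw rows would be loaded first, and the cleaning would be done afterwards with SQL inside the target platform.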

4. Four Phases Of Data Integration Implementation

Data integration (DI) ensures that data is delivered on time, to the right place, and in the required format.

DI follows a development life cycle that begins with planning and continues with development, testing and implementation. Once implemented, integrated systems must be managed, monitored and enhanced. 

The technical implementation of data integration may seem to be the biggest challenge. However, the individual phases of data integration raise other challenges that are equally big or even bigger.

Data integration steps are:

1. Plan and Analyze Data Integration Requirements
2. Design
3. Implementation
4. Testing

Business users – not IT – should initiate a data integration initiative in a company. An internal champion who understands the company’s data assets can lead the project and keep it consistent, successful, and beneficial.

Teams of business analysts, data stewards, architects and IT must cooperate to get data in a certain place, in a certain format and integrated with other data. 

Data Migration activities, tools, technologies, techniques and deliverables according to DAMA DMBOK

1. Plan and Analyze Data Integration Requirements

1.1. Define Data Integration

Data integration requirements must consider business objectives, internal data retention policy, all relevant laws and regulations and other parts of the data lifecycle.
The requirements will define the type of data integration model, technology, and services necessary to fulfill them.
Defining requirements will also reveal and create corresponding metadata. This metadata should then be managed along the complete data lifecycle, from discovery through operations to final retirement. Complete and accurate metadata significantly reduces the risks and costs of data integration.

1.2. Perform Data Discovery 

Data discovery identifies data sources as well as the method by which the data should be merged. It combines metadata searches with analysis of the actual data content. It also includes a high-level assessment of data quality through data profiling or other analyses. Typically, there will be a discrepancy between what the data is supposed to look like and what is actually discovered.
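Part of the gap between what is documented and what is actually found can be checked mechanically. The following Python sketch compares a documented column list and its expected types against a sample of real records. The `documented` metadata, the sample rows and the report fields are hypothetical.

```python
# Hypothetical documented metadata for a source table: column -> expected Python type.
documented = {"customer_id": int, "name": str, "vat_number": str, "country": str}

# Hypothetical sample of records actually found in the source system.
sample = [
    {"customer_id": 1, "name": "Acme", "country": "SI", "segment": "B2B"},
    {"customer_id": "2", "name": "Beta", "country": "AT", "segment": "B2C"},
]

def discovery_report(documented, sample):
    """Compare what the metadata says with what the sampled data contains."""
    actual_cols = set().union(*(record.keys() for record in sample))
    # A column is a mismatch if any sampled value is not of the documented type.
    type_mismatches = sorted(
        col for col, expected in documented.items()
        if any(col in r and not isinstance(r[col], expected) for r in sample)
    )
    return {
        "missing_in_data": sorted(set(documented) - actual_cols),
        "undocumented_in_data": sorted(actual_cols - set(documented)),
        "type_mismatches": type_mismatches,
    }

print(discovery_report(documented, sample))
```

For this sample, the report flags `vat_number` as documented but absent. It flags `segment` as present but undocumented, and `customer_id` as holding a value of an unexpected type.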

1.3. Document Data Lineage

High-level data lineage describes how data is obtained or created, where it moves and is changed, and how it is used for analytics, decision-making, or triggering events.
Detailed data lineage includes the rules for data transformation and the frequency of changes.
Lineage also makes it possible to analyze the impact of any change on the data flow.
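Impact analysis on lineage is essentially a graph traversal: if a source changes, everything downstream of it may be affected. This Python sketch stores a hypothetical lineage graph as an adjacency list. It then uses breadth-first search to list the affected objects. The node names are illustrative only.

```python
from collections import deque

# Hypothetical lineage graph: each node maps to the nodes that consume its data.
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "erp.invoices": ["staging.invoices"],
    "staging.customers": ["dw.dim_customer"],
    "staging.invoices": ["dw.fact_sales"],
    "dw.dim_customer": ["dw.fact_sales", "report.customer_360"],
    "dw.fact_sales": ["report.monthly_revenue"],
}

def downstream_impact(graph, changed):
    """Return every node reachable from `changed`, i.e. potentially affected by it."""
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)

print(downstream_impact(LINEAGE, "crm.customers"))
```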

1.4. Data Profiling 

Data profiling allows us to understand the content and structure of the data, so it is essential for successful data integration. In reality, profiling often shows that the actual data content and structure differ from what is expected. Sometimes these differences are small, but if they are large enough, they can cause the whole data integration project to fail.
Profiling reveals these differences so that they can be taken into account during design. If profiling is skipped, the differences are discovered only at the end, in the testing or even the operations phase.

Profiling analyses:

  • Data formats, both as defined in data structures (e.g., table structures) and as found in the actual data
  • Anomalies: null, blank, or default values and deviations from the set of valid values
  • Patterns and relationships between data sets
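A few of the profiling analyses listed above can be sketched in plain Python for a single column. The sketch counts nulls, blanks and default placeholders, finds values outside a valid set, and summarises format patterns. The sample values, the valid-code set and the placeholder list are assumptions for illustration. The pattern function only recognises ASCII letters and digits, and dedicated profiling tools do much more.

```python
import re
from collections import Counter

# Hypothetical values of a "country" column pulled from a source table.
values = ["SI", "AT", "", None, "N/A", "Slovenia", "HR", "SI"]
VALID_COUNTRY_CODES = {"SI", "AT", "HR", "DE"}  # assumed reference set
DEFAULTS = {"N/A", "UNKNOWN"}                   # assumed placeholder values

def shape(value):
    """Reduce a value to a format pattern: ASCII letters -> A, digits -> 9."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))

def profile(values, valid, defaults):
    non_null = [v for v in values if v is not None]
    non_blank = [v for v in non_null if v.strip()]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "blanks": len(non_null) - len(non_blank),
        "defaults": sum(1 for v in non_blank if v in defaults),
        # Non-blank values that are neither valid codes nor known placeholders.
        "invalid": sorted({v for v in non_blank if v not in valid and v not in defaults}),
        "patterns": Counter(shape(v) for v in non_blank),
    }

print(profile(values, VALID_COUNTRY_CODES, DEFAULTS))
```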

Assessing data quality is also one of the goals of profiling. It requires documenting business rules and measuring how well the data conforms to them. Data accuracy is assessed by comparing the data against verified and confirmed data sets.
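Measuring how well data fits documented business rules, and how accurate it is against a verified data set, can be reduced to simple ratios. In this Python sketch, the records, the rules and the "verified" postcode list are all hypothetical.

```python
# Hypothetical records and a verified reference data set (e.g. from an official registry).
records = [
    {"id": 1, "vat": "SI12345678", "postcode": "1000"},
    {"id": 2, "vat": "12345", "postcode": "2000"},
    {"id": 3, "vat": "SI87654321", "postcode": "9999"},
]
verified_postcodes = {1: "1000", 2: "2000", 3: "3000"}

# Hypothetical documented business rules, expressed as predicates on a record.
QUALITY_RULES = {
    "vat_has_country_prefix": lambda r: r["vat"][:2].isalpha(),
    "vat_length_10": lambda r: len(r["vat"]) == 10,
}

def conformance(records, rules):
    """Share of records that satisfy each business rule."""
    return {name: sum(rule(r) for r in records) / len(records) for name, rule in rules.items()}

def accuracy(records, verified, field):
    """Share of records whose value matches the verified data set."""
    matches = sum(1 for r in records if verified.get(r["id"]) == r[field])
    return matches / len(records)

print(conformance(records, QUALITY_RULES))
print(accuracy(records, verified_postcodes, "postcode"))
```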

1.5. Collect Business Rules 

Business rules are an important part of data integration requirements. They include definitions of business terms, facts that relate terms to one another, and definitions of constraints, actions, and derivations.

In data integration and interoperability, business rules are used to:

  • evaluate data in data sources and data targets
  • manage data flows
  • monitor operational data
  • manage automatic event triggers and alerts
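Business rules that manage data flows and trigger alerts are often expressed as declarative checks, each with an associated action. The following Python sketch shows one way to do this for an order feed. The `Rule` class, the rule names, the threshold and the two actions (`reject`, `alert`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record complies
    action: str                    # hypothetical actions: "reject" or "alert"

# Hypothetical rules applied to an order feed flowing between applications.
FLOW_RULES = [
    Rule("amount_positive", lambda o: o["amount"] > 0, "reject"),
    Rule("large_order_review", lambda o: o["amount"] <= 10_000, "alert"),
]

def route(orders, rules):
    """Pass compliant orders on, reject invalid ones, and collect alerts."""
    passed, rejected, alerts = [], [], []
    for order in orders:
        failed = [r for r in rules if not r.check(order)]
        if any(r.action == "reject" for r in failed):
            rejected.append(order["id"])
            continue
        # Alert rules do not stop the order; they only raise a notification.
        alerts.extend((order["id"], r.name) for r in failed if r.action == "alert")
        passed.append(order["id"])
    return passed, rejected, alerts

orders = [{"id": 1, "amount": 500}, {"id": 2, "amount": -5}, {"id": 3, "amount": 25_000}]
print(route(orders, FLOW_RULES))
```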

2. Design

Design is the most important part of the data integration initiative and includes the following steps:

  • Analysis of the business requirements, documented in a Business Requirements Specification (BRS) that answers:
    • the reasons for data integration, its objectives, and its deliverables
    • which source/target systems are included in the data integration
    • whether the required data is available
    • which business rules must be followed
    • which SLA is required
  • Analysis of the source systems, answering:
    • who owns the system
    • whether documentation is available
    • which data extraction options exist (important when choosing between incremental and full extracts)
    • the required/available frequency and volume of extracts
    • whether the quality of the data is as expected
    • whether data fields are consistent across the data pipelines
  • Analysis of any other non-functional requirements like: available data processing window, system response time, number of users, data security policy, backup policy, SLA requirements.
  • And the final and (most) important one – who will pay the bill for implementation, maintenance and future upgrades?

The Software Requirements Specification (SRS) document includes the results of the steps above. All stakeholders involved in the data integration project must sign it before implementation.
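The source-system question about incremental versus full extracts is often answered with a high-water mark: each run extracts only the rows changed since the last recorded timestamp. This Python sketch uses an in-memory SQLite table. It assumes that the source has a reliable `updated_at` column in ISO-8601 format, and the table and function names are hypothetical.

```python
import sqlite3

def full_extract(conn):
    """Full extract: read every row, regardless of when it changed."""
    return conn.execute("SELECT id, name, updated_at FROM customers ORDER BY id").fetchall()

def incremental_extract(conn, last_watermark):
    """Incremental extract: rows changed after the watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Keep the old watermark if nothing changed since the last run.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

# Demo source table; ISO-8601 strings compare in chronological order.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
src.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Acme", "2022-02-01T10:00:00"),
    (2, "Beta", "2022-02-10T09:30:00"),
    (3, "Gamma", "2022-02-15T16:45:00"),
])

changed, watermark = incremental_extract(src, "2022-02-05T00:00:00")
print(changed, watermark)
```

Using `>` on the watermark assumes that timestamps are unique and never written late. Sources without that guarantee usually need an overlap window or change data capture instead.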

3. Implementation

Based on the BRS and SRS, a feasibility study should be performed to select the tools for implementing the data integration system. Companies that are just starting with data warehousing must also decide which tools they will need to implement the solution. Companies that have already run DI projects are in an easier position: they have experience and can leverage their existing systems and knowledge to implement DI more effectively. In some cases, however, a new, better-suited platform or technology makes the system more effective than staying with existing company standards. Examples include a tool that scales better for future growth, a solution that lowers implementation, support, or license costs, or a move to a new, modern platform.
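One common way to make such a tool decision traceable is a weighted scoring matrix built from the BRS/SRS criteria. The Python sketch below is illustrative only. The criteria, weights, candidate names and scores are hypothetical and would come from the feasibility study itself.

```python
# Hypothetical weights agreed with stakeholders; they must sum to 1.0.
WEIGHTS = {"scalability": 0.30, "implementation_cost": 0.25,
           "license_cost": 0.25, "fit_with_standards": 0.20}

# Hypothetical 1-5 scores per candidate (5 = best, so a cheaper option scores higher on cost).
CANDIDATES = {
    "existing_platform": {"scalability": 2, "implementation_cost": 5,
                          "license_cost": 4, "fit_with_standards": 5},
    "new_platform": {"scalability": 5, "implementation_cost": 3,
                     "license_cost": 4, "fit_with_standards": 2},
}

def weighted_score(scores, weights):
    """Sum of weight * score over all criteria."""
    return sum(weights[criterion] * score for criterion, score in scores.items())

def rank(candidates, weights):
    """Candidates ordered from the highest to the lowest weighted score."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    scored = ((name, weighted_score(s, weights)) for name, s in candidates.items())
    return sorted(scored, key=lambda item: item[1], reverse=True)

for name, score in rank(CANDIDATES, WEIGHTS):
    print(f"{name}: {score:.2f}")
```

Small changes to the weights can change the ranking, so the weights are best agreed before any candidate is scored.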

4. Testing

Along with the implementation, proper testing is a must to ensure that the unified data is correct, complete and up-to-date.

Both IT and the business need to participate in testing to ensure that the results are as expected. Testing should therefore include at least performance stress testing (PST), technical acceptance testing (TAT), and user acceptance testing (UAT).
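Part of technical acceptance testing can be automated as a source-to-target reconciliation. The check first compares row counts and then compares a hash of each row, keyed by its business key. This Python sketch uses two in-memory SQLite databases. The `customers` table, its columns and the `reconcile` helper are hypothetical. Table and column names are interpolated into the SQL, so they must come from trusted configuration, never from user input.

```python
import hashlib
import sqlite3

def fingerprint(conn, table, key, cols):
    """Return (row count, {business key: SHA-256 of the remaining columns})."""
    # Identifiers are interpolated, so they must come from trusted configuration.
    rows = conn.execute(f"SELECT {key}, {', '.join(cols)} FROM {table}").fetchall()
    hashes = {r[0]: hashlib.sha256(repr(r[1:]).encode("utf-8")).hexdigest() for r in rows}
    return len(rows), hashes

def reconcile(src, tgt, table, key, cols):
    """Compare counts and per-row hashes between source and target."""
    n_src, h_src = fingerprint(src, table, key, cols)
    n_tgt, h_tgt = fingerprint(tgt, table, key, cols)
    return {
        "count_match": n_src == n_tgt,
        "missing_in_target": sorted(h_src.keys() - h_tgt.keys()),
        "unexpected_in_target": sorted(h_tgt.keys() - h_src.keys()),
        "changed": sorted(k for k in h_src.keys() & h_tgt.keys() if h_src[k] != h_tgt[k]),
    }

def make_db(rows):
    """Build a small in-memory demo database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn

source = make_db([(1, "Acme", 100.0), (2, "Beta", 50.0), (3, "Gamma", 70.0)])
target = make_db([(1, "Acme", 100.0), (2, "Beta", 55.0), (4, "Delta", 10.0)])
result = reconcile(source, target, "customers", "id", ["name", "balance"])
print(result)
```

In this example the row counts match even though the data does not. That is why counts alone are a weak test.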

5. Data Integration Techniques

There are several organizational levels on which the integration can be performed:

  • Manual Integration or Common User Interface – users work with all the relevant information by accessing each source system or web interface directly. No unified view of the data exists.
  • Application Based Integration – requires the individual applications to implement all of the integration logic. This approach is manageable only for a very limited number of applications.
  • Middleware Data Integration – moves the integration logic from the individual applications to a new middleware layer. Although the integration logic is no longer implemented in the applications, the applications still need to participate partially in the data integration.
  • Uniform Data Access or Virtual Integration – leaves data in the source systems and defines a set of views that provide a unified view of the data across the whole enterprise. For example, customer details may be spread across different systems, and virtualisation brings them together in one unified virtual view. The main benefits of virtualisation are near-zero latency in propagating data updates from the source systems to the consolidated view and no need for a separate store for the consolidated data. The drawbacks are limited support for data history and version management, and extra load on the source systems whenever users access the data.
  • Common Data Storage or Physical Data Integration – usually means creating a new system that keeps a copy of the data from the source systems and stores and manages it independently of the original systems. The best-known example of this approach is the Data Warehouse (DW). The benefits include data version management and the ability to combine data from very different sources (mainframes, databases, flat files, etc.). Physical integration, however, requires a separate system to handle the vast volumes of data.
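Application-based integration stays manageable only for a few applications for a combinatorial reason. If every application may need a direct interface to every other application, the number of interfaces grows quadratically. A middleware hub, by contrast, needs roughly one connection per application. This Python sketch computes both counts under a simplifying assumption: one interface per pair of applications, with direction ignored.

```python
def point_to_point_links(n):
    """Worst case: every pair of applications has its own direct interface."""
    return n * (n - 1) // 2

def hub_links(n):
    """Middleware hub: each application connects once to the hub."""
    return n

for n in (5, 20, 100):
    # e.g. n=100: 4950 point-to-point interfaces versus 100 connections to a hub
    print(f"{n} applications: {point_to_point_links(n)} point-to-point vs {hub_links(n)} via hub")
```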

6. Data Integration Dependencies

Data integration is not a stand-alone project but is rather heavily dependent on other areas of data management like:

  • Data Governance: For governing the transformation rules and message structures
  • Architecture: For designing solutions
  • Data Security: For ensuring solutions appropriately protect the security of data, whether it is persistent, virtual or in motion between applications and organizations
  • Metadata: For tracking the technical inventory of data (persistent, virtual, and in motion), the business meaning of the data, the business rules for transforming the data, and the operational history and lineage of the data
  • Data Storage and Operations: For managing the physical instantiation of the solutions
  • Modelling and Design: For designing the data structures including physical persistence in databases, virtual data structures, and messages passing information between applications and organisations


Conclusion

Data integration and interoperability is a complex process that is crucial for running a modern business supported by hundreds or even thousands of applications. It requires careful analysis, design, implementation, and monitoring of data integration processes. We encourage you to get in touch with our experts, who can help you with every step of this journey.

Need Help?

We can help you prepare a data management strategy and connect the dots between different data management areas such as data quality (DQ), MDM, and DI, since data integration is not a stand-alone project. Many organisations regard CRMT as a reliable, highly experienced, and technologically independent partner for:

  • preparation of strategic plans for data integration, 
  • finding appropriate data integration solutions for architecture definitions, 
  • analysis of data integration infrastructure gaps and finding best-fit missing products 
  • preparing an implementation data integration blueprint and 
  • implementing data integration solutions
