This four-part series discusses various methods and considerations for keeping multiple applications in sync, including one-way, bidirectional, real-time/event-based, and hub-and-spoke.
- Sync Strategies Part 1: One-Way Syncs
- Sync Strategies Part 2: Two-Way Syncs
- Sync Strategies Part 3: Real-Time Syncs
- Sync Strategies Part 4: Syncing Multiple Applications
In this fourth and final Sync Strategies article, we look at some of the challenges that arise when trying to share data across more than two applications, and at ideas for keeping records in sync using master data management (MDM).
Sticky Spider Webs
In the previous installments, we primarily focused on different approaches and considerations for syncing data between two endpoints. Point-to-point integrations serve a purpose and keep things simple for interfaces between a couple of applications. However, things get complicated quickly when you need to keep records in sync across three or more applications. If each application must exchange its changes with every other application, introducing a new application requires building two integrations (one to and one from) with every existing application. For example, adding a fourth application could require up to six new integrations (two for each of the three existing applications); adding a fifth could require up to eight, and so on.
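To make the growth concrete, here is a minimal sketch of the arithmetic, assuming the worst case in which every pair of applications needs a separate integration in each direction. The function names are illustrative only.

```python
def new_integrations(existing_apps: int) -> int:
    """One-way integrations needed to add one more application
    when it must sync to and from every existing application."""
    return 2 * existing_apps


def total_integrations(apps: int) -> int:
    """Total one-way integrations in a fully connected set of applications."""
    return apps * (apps - 1)


# Print: number of applications, integrations added by the newest one, running total.
for apps in range(2, 7):
    print(apps, new_integrations(apps - 1), total_integrations(apps))
```

The total grows with the square of the number of applications, which is why the approach stops scaling so quickly.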
It doesn’t take long--really beyond two or three applications--for the “spider web” effect of all those point-to-point integrations to become unmanageable. Although the sheer number of interfaces would keep a team of integration developers gainfully employed, it’s time to consider a different approach.
Hub and Spoke
Instead of each application having to know about every other application (and its data format, mapping rules, API communication, etc.), a better approach is to implement the hub-and-spoke pattern. This pattern uses a central, common entity to which each application connects to send and receive messages. Each application only needs to be concerned with integrating with the common hub. When a new application is introduced, it only needs to integrate with the hub; once plugged in, it can readily exchange messages with any other application.
You probably recognize this as the basic pattern behind the traditional enterprise service bus (ESB) architecture. Hub-and-spoke and ESB architectures provide a number of application integration features that are critical to a sophisticated, scalable middleware implementation, including:
- Sender/receiver decoupling
- Dynamic routing
- Message translation
- Message publishing and delivery
However, when you consider the needs of a data synchronization use case specifically, the hub must take on additional responsibilities beyond the fundamental ESB features, such as:
- Defining a common message format (or “canonical model”)
- Persisting a central copy of the records themselves to match and compare against incoming messages
- Determining the specific fields that changed
- Maintaining the correlation of internal record IDs across applications
- Providing a means to detect and resolve data issues such as duplicates
To accommodate these needs for application logic, record persistence, and even end user interaction, you need more than a messaging hub. You need a data synchronization engine.
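As one illustration of these responsibilities, ID correlation can be sketched as a cross-reference table that links each application's internal record ID to a single hub-side ID. This is a minimal in-memory sketch, not how any particular product stores its links; the class name, method names, and sample IDs are all hypothetical.

```python
class IdCrossReference:
    """Links each application's internal record ID to one hub-side record ID."""

    def __init__(self):
        self._by_hub = {}    # hub_id -> {app_name: local_id}
        self._by_local = {}  # (app_name, local_id) -> hub_id

    def link(self, hub_id, app, local_id):
        """Record that `local_id` in `app` refers to hub record `hub_id`."""
        self._by_hub.setdefault(hub_id, {})[app] = local_id
        self._by_local[(app, local_id)] = hub_id

    def hub_id(self, app, local_id):
        """Return the hub ID for an application's record, or None if unlinked."""
        return self._by_local.get((app, local_id))

    def local_id(self, hub_id, app):
        """Return the application's own ID for a hub record, or None if unlinked."""
        return self._by_hub.get(hub_id, {}).get(app)


xref = IdCrossReference()
xref.link("hub-1", "crm", "0015000ABC")
xref.link("hub-1", "erp", "CUST-42")

# Translate a CRM record ID into the matching ERP record ID via the hub ID.
print(xref.local_id(xref.hub_id("crm", "0015000ABC"), "erp"))  # CUST-42
```

With a table like this in the hub, no application needs to store any other application's IDs.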
Master Data Management and Dell Boomi MDM
The combination of software and business processes that govern the synchronization of data across a number of applications is collectively referred to as Master Data Management (MDM). The new Dell Boomi MDM solution serves as the central hub repository and performs the functions listed above and more:
- Managing subscribers
- Publishing messages
- Ensuring delivery of messages to integration clients
- Performing rules-based matching
- Performing data quality and enrichment to stop the proliferation of “bad” data
- Defining the common model for each record
- Providing field-level updates
- Coalescing record updates
- Linking source record internal IDs
- Determining create vs. update action
- Providing a user interface to review conflicts/quarantined records
Unlike other MDM solutions, Dell Boomi MDM enables bidirectional syncs (data flows both into the MDM and back out to source applications--more on that below) and is not tied to a specific domain such as customers or products.
Integrating with Dell Boomi MDM
Dell Boomi MDM serves as the central hub, but don’t forget about the spokes: integrations between MDM and the various source applications are still needed. Dell Boomi MDM provides a web services API for exchanging data with the repository. The API can be used independently of AtomSphere, but of course it works seamlessly when integrating with AtomSphere.
Instead of being synced directly to the destination application, records are synced to the central MDM repository, which keeps a cached version of each record called the “golden record”. When a new message is received, MDM compares the record to its repository to determine whether the record is new or already exists and which field values have changed. If a new record or changes are found, MDM publishes a notification for the other subscribed applications to retrieve.
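The compare-and-publish behavior described above can be sketched in a few lines of Python. This is a simplified illustration and not the Dell Boomi MDM implementation: it assumes records are already matched by a shared key (a real hub would apply matching rules), and the `Hub` class, its methods, and the notification layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Hub:
    golden: dict = field(default_factory=dict)       # record key -> golden record fields
    subscribers: dict = field(default_factory=dict)  # app name -> pending notifications

    def subscribe(self, app):
        self.subscribers.setdefault(app, [])

    def upsert(self, source_app, key, incoming):
        """Compare an incoming record to the golden record and notify other apps.

        Returns "CREATE", "UPDATE", or None when no field value changed.
        """
        current = self.golden.get(key)
        if current is None:
            op, changed = "CREATE", dict(incoming)
        else:
            # Keep only the fields whose values differ from the golden record.
            changed = {f: v for f, v in incoming.items() if current.get(f) != v}
            if not changed:
                return None  # nothing really changed, so nothing is published
            op = "UPDATE"
        self.golden[key] = {**(current or {}), **incoming}
        for app, queue in self.subscribers.items():
            if app != source_app:  # do not echo the change back to its source
                queue.append({"op": op, "key": key, "fields": changed})
        return op


hub = Hub()
hub.subscribe("crm")
hub.subscribe("erp")
hub.upsert("crm", "cust-1", {"name": "Acme", "city": "Austin"})
hub.upsert("erp", "cust-1", {"city": "Dallas"})
print(hub.subscribers["crm"])  # only the changed field is queued for the CRM
```

Each subscriber then retrieves its own pending notifications, which is the job of the spoke integrations.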
But let’s talk a little bit about process development. As far as the integration processes go, you will simply build a bidirectional sync with the common MDM repository instead of the destination application.
Source Application to MDM
When syncing from the source application to MDM, you will build an AtomSphere process for each source application to extract the new and modified records, ideally using one of the incremental or event-based approaches discussed in the previous articles. The process then maps those records to the MDM upsert API and writes them to the MDM repository using the MDM Connector.
MDM to Source Application
To sync updates from MDM back to the source applications, you will build a process for each application that extracts the pending record updates from its respective source “queue” using the MDM Connector and then performs the appropriate action (e.g. create or update) in the source application.
The MDM response message contains several attributes that make syncing back to the source application easy, most notably @op (operation: CREATE, UPDATE, DELETE) and id (the source application’s internal ID for the record). With this information you can route and perform the appropriate mapping and connector calls to the source application.
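The routing step might look something like the following sketch. The message here is a plain dictionary with `op`, `id`, and `fields` keys, a simplified stand-in for the real MDM response rather than its actual format, and `FakeApp` is a hypothetical in-memory substitute for the source application's connector.

```python
class FakeApp:
    """In-memory stand-in for a source application's create/update/delete calls."""

    def __init__(self):
        self.records = {}
        self._next_id = 1

    def create(self, fields):
        record_id = str(self._next_id)
        self._next_id += 1
        self.records[record_id] = dict(fields)
        return record_id

    def update(self, record_id, fields):
        self.records[record_id].update(fields)

    def delete(self, record_id):
        self.records.pop(record_id, None)


def apply_message(app, message):
    """Route one pending update to the matching call, based on its operation."""
    op = message["op"]
    if op == "CREATE":
        # No local ID exists yet; the application assigns one.
        return app.create(message["fields"])
    if op == "UPDATE":
        app.update(message["id"], message["fields"])
        return message["id"]
    if op == "DELETE":
        app.delete(message["id"])
        return message["id"]
    raise ValueError(f"unsupported operation: {op!r}")


app = FakeApp()
new_id = apply_message(app, {"op": "CREATE", "fields": {"name": "Acme"}})
apply_message(app, {"op": "UPDATE", "id": new_id, "fields": {"city": "Austin"}})
print(app.records)
```

In a real process, the ID returned by a create would typically be reported back to the hub so that the new record can be linked to its golden record.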
Solving the Dreaded “Data Bounce” with MDM
The data bounce issue (introduced in Part 2) is a concern for bidirectional integrations. As a refresher, the data bounce issue is the scenario where the same record is synced back and forth endlessly because the record is extracted incrementally from each application based on last modified date.
MDM prevents this problem because it only publishes a change event to subscribers when there is a “real” change, that is, when some value actually changed. At worst there is a “half bounce” from MDM to the end application and then back to MDM. For example:
- User makes change to Record 1 in App A.
- Recent changes are extracted from App A (which includes Record 1) and sent to MDM.
- That update is published to and updated in App B.
- User makes no other changes to Record 1 in App B.
- Recent changes are extracted from App B (which includes the update just made to Record 1) and sent back to MDM.
- However, MDM detects that Record 1 has all the same values and therefore does NOT publish an update to App A.
- Cycle is broken!
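The sequence above can be simulated with a short sketch. It assumes the hub compares whole records by value and publishes only on a difference; `send_to_mdm` and its parameters are hypothetical names, not part of any MDM API.

```python
def send_to_mdm(golden, source_app, record_id, values, subscribers):
    """Store the record and return the apps to notify (empty if nothing changed)."""
    if golden.get(record_id) == values:
        return []  # same values as the golden record: not a "real" change
    golden[record_id] = dict(values)
    return [app for app in subscribers if app != source_app]


golden = {}
apps = ["A", "B"]

# The user's change in App A is extracted and sent to MDM, which publishes it to App B.
print(send_to_mdm(golden, "A", "record-1", {"name": "Acme Corp"}, apps))  # ['B']

# App B's next extract picks up the update it just received and sends it back to MDM.
# The values match the golden record, so nothing is published to App A.
print(send_to_mdm(golden, "B", "record-1", {"name": "Acme Corp"}, apps))  # []
```

The second call is the “half bounce”: the record makes one trip back to the hub and stops there.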
From the integrator’s perspective, one of the best things about using the MDM solution is that it handles all the heavy matching and record persistence logic that would otherwise complicate the integration processes with the source applications. With this work done behind the scenes, your integrations become much simpler, which reduces development and maintenance costs.