How to sync records incrementally by Last Modified Date

Document created by Adam Arrowsmith Employee on Aug 31, 2015Last modified by Adam Arrowsmith Employee on Sep 20, 2016
Version 5Show Document
  • View in full screen mode

This article describes a best practice approach for syncing records incrementally between applications based on a last modified date value captured from the source application itself and persisted.

 

 

Use Case

You want to incrementally sync/extract records from a source system that have been created or modified since the last time the process ran.

 

Approach

The general approach for incremental syncs by last mod date is:

  1. Query records greater than a persisted process property containing the most recent last mod date encountered during the last execution.
  2. Capture in a temporary process property the most recent last mod date value from the records (the cleanest way to do this is a simple scripting map function in a Business Rules step before you routing/mapping records. The function gets the current variable value, compares against the current property value, and if greater updates the property with the new value. See How to manipulate properties without a Map shape.) In this approach sorting doesn’t matter.
  3. Process the records.
  4. Upon successful completion, persist the temp process property to save for next time.
  5. If failure, don’t update the persisted value and re-pull all the records next time.

 

Always use the last modified date from the source records themselves. Do not use Boomi’s built-in Last [Successful] Run Date values for this purpose. Minor variations in server time stamps can cause records to be missed.

 

Considerations

  • If there are errors halfway through, records are reprocessed and can get backed up if there are a lot of records.
  • The integration design and destination application must be idempotent—able to reasonably process the same record twice. This typically means leveraging “upsert” APIs if available in the destination app or duplicate-checking within the workflow (can be slow).


Alternative Approaches

  • If reprocessing all the records is impractical, the records would need to be sorted and processed in sequence. This would allow the process to persist the last mod date in between records. Search for the KB article "How to Sort a group of Documents using Groovy”). Note records must be split into the documents at the granularity by which you want to sort.
  • However be mindful that while processing records one at a time (using the Flow Control Run Individually step) would allow for maximum precision of persisting the most recent last mod date, it will be VERY inefficient for all but the smallest data sets. Instead find a reasonable compromise of batch size (e.g. several hundred documents).
  • Note that even with the individual or batching alternative, the integration should be idempotent.
  • Also consider some document failures may be due to “bad data” and therefore will never process successfully without correction in the source app. These failures should be logged appropriately but the last mod date moved forward to avoid endlessly reprocessing the bad record.

 

See also Sync Strategies - One-Way, Two-Way, Real-Time, Multiple Applications.

6 people found this helpful

Attachments

    Outcomes