Salesforce to DWH Integration - Concerns for initial full data load and future delta loads

Question asked by ErikIvarsson3431 on Aug 29, 2016
Latest reply on Sep 2, 2016 by vreddy

Hi all!

 

I am currently working on a Salesforce to data warehouse integration for a client. I took over the project shortly before the start of the testing period, so I have limited insight into the early design decisions.

 

The integration extracts ~20 Salesforce objects (both standard and custom) on a daily basis. Records are selected by comparing the Last Successful Run Date parameter in Boomi with the Last Modified Date in Salesforce, so that we retrieve only the records that have changed since the process last ran. My first question concerns this parameter: how is the Last Successful Run Date set? I assume it is tracked at the environment level, meaning that we should not expect any issues if we run the tests in the Test environment and later deploy to the Production environment, where the Last Successful Run Date would start out "reset"?
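To make the selection logic concrete, here is a minimal Python sketch of the kind of delta filter I mean. It is not how Boomi implements it internally; `build_delta_query`, the `Account` object, the field list and the example timestamp are all made up for illustration, and treating a missing run date as "do a full extract" is my assumption about what a "reset" value would amount to.

```python
from datetime import datetime, timezone
from typing import Optional


def soql_datetime(ts: datetime) -> str:
    """Format a timezone-aware datetime as an unquoted SOQL dateTime literal in UTC."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_delta_query(sobject: str, fields: list[str],
                      last_successful_run: Optional[datetime]) -> str:
    """Build a SOQL query for records changed since the last successful run.

    Hypothetical helper: None stands for "never run in this environment",
    which in this sketch means no filter at all, i.e. a full extract.
    """
    query = f"SELECT {', '.join(fields)} FROM {sobject}"
    if last_successful_run is not None:
        query += f" WHERE LastModifiedDate > {soql_datetime(last_successful_run)}"
    return query


# Example values only; not taken from the real process.
fields = ["Id", "Name", "LastModifiedDate"]
last_run = datetime(2016, 8, 29, 6, 0, 0, tzinfo=timezone.utc)

delta_query = build_delta_query("Account", fields, last_run)
full_query = build_delta_query("Account", fields, None)
print(delta_query)
print(full_query)
```

If the run date really is kept per environment, the first Production run would behave like the `None` case here and pull everything, which is exactly the initial full load I am worried about.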

 

In Production we currently have 2 objects in Salesforce that have close to a million records, 1 object with ~600 000 records, 5 with 100 000 - 500 000 records, and the rest with ~100 - 50 000 records. In Test we have ~50% of that data volume for the objects. The regular Salesforce Get operation batches the results, and the query limit is set to -1, i.e. unlimited. My second question concerns the Salesforce limits for querying data: what can we expect in terms of performance, exceptions etc. with the batch results option? Has anyone done a similar integration and can share some insights?
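For a rough sense of scale, the following Python sketch estimates how many result batches the three largest objects would produce during the full load. The counts are rounded from the figures above (I am assuming roughly a million records for each of the two largest objects), the object names are placeholders, and the batch sizes are illustrative assumptions rather than values I have confirmed for the Boomi connector.

```python
import math

# Approximate Production record counts, rounded from the figures in this post.
# The keys are placeholders, not the real object names.
object_counts = {
    "large_object_1": 1_000_000,
    "large_object_2": 1_000_000,
    "medium_object": 600_000,
}


def round_trips(record_count: int, batch_size: int) -> int:
    """Number of result batches needed to page through record_count records."""
    return math.ceil(record_count / batch_size)


# Assumed batch sizes, for comparison only.
for batch_size in (200, 500, 2000):
    total = sum(round_trips(n, batch_size) for n in object_counts.values())
    print(f"batch size {batch_size}: {total} batches for the three largest objects")
```

So depending on the batch size, the initial load is somewhere between roughly a thousand and over ten thousand batches for these three objects alone, which is why I am asking about limits and exceptions.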

 

Once the initial full load is completed, the day-to-day data volume will not be high, so I am only concerned about the initial data load.

 

If I were to split this up in some way, for example to run it in smaller batches, I would have to change the architecture of the processes and also implement something on the Salesforce side to select records by a filter instead of by the Last Successful Run Date, so I do not want to do this unless it is necessary.
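Just to illustrate what such a split could look like (not something that exists in the current processes), here is a Python sketch that cuts a date range into windows and builds one filter per window. Filtering on `CreatedDate`, the 30-day window size and the date range are all assumptions for the example; `date_windows` and `window_filter` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

SOQL_FMT = "%Y-%m-%dT%H:%M:%SZ"


def date_windows(start: datetime, end: datetime, step: timedelta):
    """Yield consecutive half-open [lo, hi) windows that together cover [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def window_filter(lo: datetime, hi: datetime) -> str:
    """Build a SOQL WHERE fragment selecting records created inside one window."""
    lo_s = lo.astimezone(timezone.utc).strftime(SOQL_FMT)
    hi_s = hi.astimezone(timezone.utc).strftime(SOQL_FMT)
    return f"CreatedDate >= {lo_s} AND CreatedDate < {hi_s}"


# Example range and window size only.
start = datetime(2016, 1, 1, tzinfo=timezone.utc)
end = datetime(2016, 4, 1, tzinfo=timezone.utc)
filters = [window_filter(lo, hi)
           for lo, hi in date_windows(start, end, timedelta(days=30))]

for f in filters:
    print(f)
```

Each window would then be one run of the extract. The half-open ranges mean no record falls into two windows, but it is this extra filter handling that I would rather avoid building if the single full load is feasible.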

 

So any experiences or suggestions here are highly appreciated.
