While executing a process that queries a database, maps the record set to a flat file, and then creates a .csv file at an FTP location, the execution time was found to be high when the record count is large, i.e. more than 100K.
The output files generated (as shown here) in both cases are the same, implying that there is no data loss.
The process flow is shown below, where a Data Process shape is connected after the database operation that is set with batching.
The question is: is it a recommended practice to apply a Data Process shape when a batch count is used? It appears that when the batch count is set, the query is split into multiple result documents, and the Data Process shape combines them back into a single file. This feature was found to be very useful; can others share similar experiences, especially with retrieving records from complex DB queries? Additionally, can anyone explain the underlying flow, especially the role of the Data Process shape?
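For readers trying to picture what the split-then-combine flow is doing, here is a minimal, purely illustrative sketch in Python (not Boomi internals; all names and data are hypothetical). It shows why batching the query into several smaller documents and then combining them can still produce a byte-identical single CSV:

```python
# Illustrative sketch only: how splitting a result set into batches and then
# combining the batches (like a Data Process "combine" step) can yield the
# same single output file as processing everything at once.
import csv
import io

# Hypothetical stand-in for a database result set.
records = [{"id": i, "name": f"row{i}"} for i in range(10)]

def split_into_batches(rows, batch_size):
    # Mimics a query operation configured with a batch count:
    # the full result set becomes several smaller documents.
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

def combine_to_csv(batches):
    # Mimics combining the batched documents back into one CSV file.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name"])
    writer.writeheader()
    for batch in batches:
        writer.writerows(batch)
    return buf.getvalue()

whole = combine_to_csv([records])                          # no batching
batched = combine_to_csv(split_into_batches(records, 3))   # batch size 3
print(whole == batched)  # the combined output matches the unbatched one
```

The point of the sketch is only the equivalence of the outputs; the performance benefit in the real process presumably comes from the platform handling several smaller documents instead of one very large one.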