
Performance improvement

Question asked by anu.bhardwaj044204 on Jun 21, 2018
Latest reply on Jun 21, 2018 by anu.bhardwaj044204

Hi,

I have a scenario I need to handle to improve my process's performance. The process currently reads around 90k records from a flat file, and these need to be upserted into an on-premises database.

Scenario: the flat file contains duplicate records, for example:

3RD PARTY UNSPEC        0051210001         3RD PARTY UNSPEC PLAN 0001   US          N/A - No match in the FF database

3RD PARTY UNSPEC        0031210001         3RD PARTY UNSPEC PLAN 0001   US          N/A - No match in the FF database

3RD PARTY UNSPEC        0031210001         3RD PARTY UNSPEC PLAN 0003   US          N/A - No match in the FF database

The process needs to keep only the last duplicate record for each key (the last occurrence is assumed to be the most up-to-date one), while also ensuring that a null value in that record does not overwrite the last known non-null value for the field. For example:

3RD PARTY UNSPEC        0031210001         3RD PARTY UNSPEC PLAN 0001   US          N/A - No match in the FF database

3RD PARTY UNSPEC        0031210001         3RD PARTY UNSPEC PLAN 0003             N/A - No match in the FF database

Here the country code has been removed from the last record. The data that ends up in the database should then be (a sketch of this merge rule follows the example):

3RD PARTY UNSPEC        0031210001         3RD PARTY UNSPEC PLAN 0003     US        N/A - No match in the FF database
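
For clarity, here is a minimal sketch of this merge rule in Python (outside Boomi, purely illustrative), assuming the second column (e.g. 0031210001) alone identifies a record and that each row has already been parsed into a field/value map; the field names are made up for the example, not taken from the actual file layout:

def merge_records(rows, key_field="plan_id"):
    """Collapse duplicates in one pass, keeping the last occurrence of each
    key while preserving the last known non-null value for every field."""
    merged = {}  # key -> merged record, in first-seen order
    for row in rows:  # rows are dicts parsed from the flat file
        key = row[key_field]
        if key not in merged:
            merged[key] = dict(row)
        else:
            for field, value in row.items():
                # A later null must not clobber an earlier non-null value.
                if value not in (None, ""):
                    merged[key][field] = value
    return list(merged.values())

rows = [
    {"name": "3RD PARTY UNSPEC", "plan_id": "0031210001",
     "plan_name": "3RD PARTY UNSPEC PLAN 0001", "country": "US"},
    {"name": "3RD PARTY UNSPEC", "plan_id": "0031210001",
     "plan_name": "3RD PARTY UNSPEC PLAN 0003", "country": None},
]
print(merge_records(rows))
# -> one record with plan_name "...PLAN 0003" and country "US"

Because this is a single hash-keyed pass, it handles the whole file in O(n) without comparing documents pairwise.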


Approach being followed:

I am checking for duplicates in the flat file and saving the duplicate and unique records in separate caches (see attachment). I then carry out a dynamic insert and a dynamic update separately (the insert/update split is sketched below).

This approach takes a lot of time because, in order to separate out the duplicate records, the documents have to be processed individually.
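For reference, a sketch of that insert/update split over the merged output, assuming the keys already present in the database can be fetched in one bulk query beforehand (split_for_upsert, existing_keys, and plan_id are illustrative names, not Boomi shapes):

def split_for_upsert(merged_rows, existing_keys, key_field="plan_id"):
    """Partition merged records into inserts (key not in the DB yet)
    and updates (key already present)."""
    inserts, updates = [], []
    for row in merged_rows:
        (updates if row[key_field] in existing_keys else inserts).append(row)
    return inserts, updates

One bulk key lookup plus this single pass avoids a per-document round trip to the database.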


Can anyone suggest another approach? Also, just to add: the records in the flat file are pulled from a history table, which is why we see so many anomalies in the data in the file, and this, I presume, is what makes the processing difficult and affects performance.
