This article describes how to design a process to retrieve multiple pages of results from an API call.
Get the example process used below from the Process Library here.
This pattern is used when calling a web service query or search API that paginates results. Pagination occurs when a query returns more results than the API allows in a single response ("page"), so you must make subsequent calls to request additional pages until there are none left. This pattern is typically used with the generic HTTP Client or Web Services SOAP Client connectors, because application-specific connectors generally perform this paging logic "behind the scenes" as value-added functionality.
This is an advanced pattern and can result in run-away process executions if not designed and configured carefully. You should be familiar with process execution, design, and troubleshooting before implementing.
The following is a generic pagination flow found in many modern APIs:
1. Initiate the first call with the desired query criteria
2. Return the first page of results
3. If there are more results available, provide a token or page number to be used in the next call
4. Make a subsequent call referencing the given token
5. Repeat steps 2-4 until there are no more results left
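The generic flow above can be sketched as a simple loop. Here `fetch_page` is a hypothetical stand-in for the real API call (simulated with a small fixed data set), and the `results`/`nextToken` field names are assumptions; your API's field names and token mechanics will differ:

```python
# Minimal sketch of the token-based pagination flow above.
def fetch_page(query, token=None):
    # Hypothetical API: returns 2 results per page from a fixed data set.
    data = ["rec1", "rec2", "rec3", "rec4", "rec5"]
    start = int(token) if token else 0
    page = data[start:start + 2]
    next_token = str(start + 2) if start + 2 < len(data) else None
    return {"results": page, "nextToken": next_token}

def get_all(query):
    results, token = [], None
    while True:
        page = fetch_page(query, token)      # steps 1 and 4: invoke the call
        results.extend(page["results"])      # step 2: collect this page
        token = page.get("nextToken")        # step 3: token for the next call
        if not token:                        # step 5: stop when no token remains
            break
    return results
```

For example, `get_all("any query")` walks all three pages and returns the five records as one list, which is the behavior the subprocess pattern below emulates.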
However, APIs vary in their specific implementations. You will need to refer to the API's technical documentation for specific details on how to:
- Invoke the first call
- Detect if there are results
- Detect if there are more results
- Invoke subsequent calls
The overall objective is to emulate the behavior of application-specific connectors. This is best accomplished by encapsulating the paging logic within a Data Passthrough subprocess that can be called by a "main" process. The primary logic includes:
- Paging through all the results before returning the entire result set to the main process
- Splitting the result set into one logical record per document
- If no results are found, do not return any documents so the main process will not continue
- You can also consider using dynamic document properties to return custom "metadata" values
Using a subprocess is desirable because it:
- Enables the multiple pages of results to be returned and executed as a single group of documents in the main process
- Simplifies the main process by encapsulating the connector and paging logic
- Allows for reusability if you need to perform similar query calls in other processes
The general approach is:
- The main process calls the subprocess that performs the following:
- Initialize the query parameters, generally via document properties.
- Call the API via generic connector.
- If no results are returned, stop. Else continue.
- Branch 1:
- Send to a Return Documents shape.
- Branch 2:
- If there are no more results, stop. Else continue.
- Set the parameters for the next call.
- Connect the process path to the same Connector shape in Step 3 above.
- The subprocess will continue to execute the shapes recursively, accumulating results on the Return Documents shape until no more results are returned.
- Once complete, the subprocess will then return the entire group of result documents to the main process.
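The recursive shape of this approach can be sketched in Python. This is a sketch only, assuming a hypothetical `fetch_page` callable in place of the Connector shape and assumed `results`/`nextToken` response fields; the comments map each step back to the flow above:

```python
def get_all_records(fetch_page, params, accumulated=None):
    # Recursive sketch of the subprocess flow described above.
    # fetch_page is a hypothetical callable standing in for the Connector
    # shape; it returns {"results": [...], "nextToken": <str or None>}.
    if accumulated is None:
        accumulated = []
    response = fetch_page(params)            # Connector Call
    if not response["results"]:              # "Found result?" -- stop if empty
        return accumulated
    accumulated.extend(response["results"])  # Branch 1: accumulate (Return Documents)
    token = response.get("nextToken")
    if token is None:                        # Branch 2: "Has more results?"
        return accumulated
    next_params = dict(params, token=token)  # Prepare next request
    return get_all_records(fetch_page, next_params, accumulated)

def demo_fetch(params):
    # Hypothetical two-page API used only to exercise the sketch.
    pages = {None: {"results": ["r1", "r2"], "nextToken": "t1"},
             "t1": {"results": ["r3"], "nextToken": None}}
    return pages[params.get("token")]
```

Calling `get_all_records(demo_fetch, {})` accumulates both pages and returns them together, just as the subprocess returns all pages to the main process as one document group.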
The following process screenshots illustrate a representative flow and are not fully configured. You will need to determine the specific mechanics of the API and configure the process shapes accordingly.
The main process:
- Initiates the workflow.
- Calls the "Get all Records" subprocess to retrieve all the records from the source application.
- Receives the results of the subprocess and continues executing as needed, for example mapping and sending to a destination application.
"Get all Records" Subprocess
The subprocess performs all the paging logic, accumulates results, and finally returns the documents to the main process. This example uses an HTTP Client connector and assumes the query criteria and paging logic are specified as query string parameters on the URL; however, this can be adapted to meet your specific application.
Let's take a closer look at some of the individual shapes.
- Initialize - Establish the request values for the first query. This typically involves setting the initial URL parameters or even a JSON or XML payload but without any reference to a "next token" value.
- Connector Call - When using the HTTP Client connector to call a GET endpoint, the connector operation will typically be configured with replacement variables for the URL resource path and/or HTTP headers to accept dynamic values. This will be defined even if not actually used in the initial call. When using the HTTP Client or Web Services Client connector to call a POST endpoint with a payload, there may not be any replacement variables configured in the connector operation.
- Found result? - Inspect the response document to determine if any results were returned and if not, stop. If this is the first iteration, no documents will be returned and the subprocess and main process will stop. If this is a subsequent iteration, the subprocess will complete and return the previously accumulated documents on the Return Documents shape to the main process. Again the exact logic will be specific to your API but several common techniques include:
- The response includes an explicit field for the number of results (for example, numResults = 0)
- Check for the existence of a known field in the first record in the result list, such as an internal ID or other primary key (for example, internalId != "" without the quotes)
- If no results are found the connector call will fail or return a non-HTTP 200 response code
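The first two detection techniques can be sketched together. This assumes a hypothetical JSON API; the `numResults`, `results`, and `internalId` field names are taken from the examples above and must be adapted to your API's actual response:

```python
import json

def has_results(response_body):
    # Sketch of the "Found result?" decision for a hypothetical JSON API.
    # Field names (numResults, results, internalId) are assumptions from
    # the examples above; adapt them to your API's actual response.
    doc = json.loads(response_body)
    if doc.get("numResults") == 0:          # explicit result-count field
        return False
    first = (doc.get("results") or [{}])[0]
    return bool(first.get("internalId"))    # known field in the first record
```

In a process this decision would typically live in a Decision or Business Rules shape inspecting the response profile rather than a script, but the logic is the same.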
- Split by logical record (optional) - If desired, emulate the behavior of most application-specific connectors by splitting the result into individual logical records, one per document. Note that the feasibility of this may vary when attempting to reuse the same subprocess for different record types if the API returns structurally different responses.
- Return Documents - Documents will accumulate here until this subprocess completes, that is, when there are no more pages left. Then all documents will be returned to the main process as a single group of documents.
- Has more results? - Inspect the response document to determine if there are more results to be retrieved. If not, the subprocess will stop and return the accumulated documents to the main process. Again the exact logic will be specific to your API but several common techniques include:
- The response includes an explicit field whose value indicates more results are available, such as a "token" for the next page to use in the subsequent query, or a Boolean value such as "hasMore=true"
- The response includes an explicit field whose value is a counter that must be incremented before making the next call, for example "page X of Y" or an offset number of records. To increment the value for the next call, you can capture the value in a process property and increment it using a simple script, as illustrated in How to Create a Do-While Loop in the Process using a Custom Counter.
- There is no explicit indication: you simply query again, and if no results are returned you know you're done
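The counter/offset technique (and the "no explicit indication" case) can be sketched as follows. `fetch_page(offset, limit)` is a hypothetical API callable, here simulated against a small fixed data set:

```python
def fetch_all_by_offset(fetch_page, page_size=3):
    # Counter/offset pagination sketch: increment an offset each iteration
    # and stop at the first empty page (this also covers the "no explicit
    # indication" case, where only an empty result signals completion).
    # fetch_page(offset, limit) is a hypothetical API callable.
    results, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:                # empty page -> no more results
            return results
        results.extend(page)
        offset += page_size         # increment the counter for the next call

DATA = list(range(7))               # hypothetical backing data set

def demo_fetch(offset, limit):
    return DATA[offset:offset + limit]
```

In a process, the offset would be held in a process property and incremented by a small Data Process script, as described in the do-while loop article referenced above.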
- Prepare next request - Set document properties or construct a payload accordingly for the subsequent request. This often includes referencing the "next page token" from the current response. Depending on the specifics of your API, you may need to parse, increment, or otherwise manipulate the response values using a Data Process Custom Scripting shape or Map shape.
- Reconnect back to the same connector call - Simply connect the path back to the same Connector shape.
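The optional "Split by logical record" step above amounts to turning one page of results into one document per record. A minimal sketch for a hypothetical JSON response, where `records` is an assumed field name holding the result array:

```python
import json

def split_records(response_body, list_field="records"):
    # Split one page of results into individual per-record documents.
    # list_field names the hypothetical response field holding the array;
    # in a process this would typically be a Data Process Split shape
    # configured against your actual response profile.
    doc = json.loads(response_body)
    return [json.dumps(rec) for rec in doc.get(list_field, [])]
```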
- This is a recursive pattern. As with recursion in any type of programming, you must carefully design and configure the boundary conditions to avoid getting stuck in an infinite loop. Here are a few tips to help mitigate risk during development:
- When first developing, test on a local Atom so you can manually terminate the Atom service if you accidentally fall into an infinite loop.
- Be sure to conduct positive and negative tests to account for all possible outcomes. For example, test when there are 0 results, exactly 1 result, more than 1 result, connection exceptions, and so on.
- Consider maintaining a simple "counter" in a document or process property that is incremented (using a simple Data Process custom script) for each iteration. Output this value to the process log for debugging. You can also use the property to enforce a maximum number of iterations if desired.
- IMPORTANT: Keep the number of shapes executed recursively to an absolute minimum (this is another reason for using a subprocess). After several hundred shape executions (the exact number will vary based on Atom memory configuration), the process will fail with a StackOverflowException.
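The counter safeguard can be sketched as follows, assuming a hypothetical `fetch_page(token)` callable and assumed `results`/`nextToken` fields; the `print` stands in for writing to the process log:

```python
class MaxIterationsExceeded(Exception):
    pass

def paginate_with_guard(fetch_page, max_iterations=100):
    # Pagination with the safeguards suggested above: count iterations,
    # log each one, and abort instead of looping forever if the stop
    # condition never triggers. fetch_page(token) is hypothetical.
    results, token, iteration = [], None, 0
    while True:
        iteration += 1
        print(f"iteration {iteration}")      # stand-in for process log output
        if iteration > max_iterations:       # hard ceiling on the loop
            raise MaxIterationsExceeded(f"stopped after {max_iterations} calls")
        page = fetch_page(token)
        results.extend(page["results"])
        token = page.get("nextToken")
        if token is None:
            return results
```

A misbehaving API that always returns a next token then fails fast with a clear error instead of running away.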
- Remember that document properties will continue to propagate through the recursive connector calls as long as you do not set parameters directly on the connector shape.
- If looking to create a generic subprocess for any time you need to query a list of results from the application (and if technically feasible given the API mechanics), you can "parameterize" it by setting document or process properties in the main process for values like record type, search criteria, page size, and more.
- This pattern should NOT be used to simply loop through documents individually. Remember process shapes inherently loop through and execute for each document within a group. For situations in which documents should be executed individually (e.g. one at a time), use the Flow Control shape.
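The parameterization idea mentioned above can be sketched as a single reusable function whose arguments stand in for the document or process properties the main process would set. The `type`/`q`/`limit`/`token` parameter names are hypothetical; real APIs define their own:

```python
def query_records(fetch_page, record_type, criteria, page_size=100):
    # Reusable "parameterized" pagination sketch: the caller supplies the
    # record type, search criteria, and page size, mirroring properties a
    # main process would set before calling a generic subprocess.
    results, token = [], None
    while True:
        params = {"type": record_type, "q": criteria,
                  "limit": page_size, "token": token}
        page = fetch_page(params)
        results.extend(page["results"])
        token = page.get("nextToken")
        if token is None:
            return results

def demo_fetch(params):
    # Hypothetical API that echoes the record type into its results.
    pages = {None: {"results": [params["type"] + "-1"], "nextToken": "t"},
             "t": {"results": [params["type"] + "-2"], "nextToken": None}}
    return pages[params["token"]]
```

The same function (subprocess) can then serve invoices, customers, or any other record type without duplicating the paging logic.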