Does Boomi provide a built-in connector for Azure Data Lake? I can see connectors for Azure SQL Database, SQL Data Warehouse, and Blob Storage, but not for Data Lake.
We don't currently have an Azure Data Lake Store connector, but it's something we would like to build. If it's something you would like to see, you should create an Idea. In the meantime, you should be able to connect to the REST API using the HTTP Client connector.
Thanks Jason. I've created an Idea (Azure Data Lake connector). Is this already on the product roadmap, and can I expect it in coming releases?
This purely depends on the number of votes and the implementation effort.
So you can expect it to be done eventually, but there is no timeline for it as of now.
I have tried to implement this myself using the HTTP Client connector, following the Azure Data Lake documentation.
STEP 1 - Getting the authorization token
Service-to-service authentication: REST API with Data Lake Store using Azure Active Directory | Microsoft Docs
curl -X POST https://login.microsoftonline.com/<TENANT-ID>/oauth2/token \
  -F grant_type=client_credentials \
  -F resource=https://management.core.windows.net/ \
  -F client_id=<CLIENT-ID> \
  -F client_secret=<AUTH-KEY>
Getting the Authorization Bearer token with an HTTP POST method.
This succeeds, so I proceed to STEP 2.
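For reference, the token request above is just a URL-encoded form POST; here is a minimal Python sketch of the body it sends (the tenant, client id, and key placeholders are hypothetical, and this only builds the body rather than calling Azure):

```python
from urllib.parse import urlencode

# Placeholder values -- substitute your real tenant, client id, and key.
tenant_id = "<TENANT-ID>"
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"

# The same fields the curl -F flags send, as a form-encoded body.
form = {
    "grant_type": "client_credentials",
    "resource": "https://management.core.windows.net/",
    "client_id": "<CLIENT-ID>",
    "client_secret": "<AUTH-KEY>",
}
body = urlencode(form)
# POSTing `body` to token_url with Content-Type
# application/x-www-form-urlencoded returns JSON containing
# access_token, which is then used as the Bearer token.
```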
STEP 2 - Creating a file in Azure Data Lake
Using the access_token, I am trying to create a file in Azure Data Lake with the cURL command below:
REST API: Filesystem operations on Azure Data Lake Store | Microsoft Docs
curl -i -X PUT -L -T 'C:\temp\list.txt' -H "Authorization: Bearer <REDACTED>" 'https://<yourstorename>.azuredatalakestore.net/webhdfs/v1/mytempdir/list.txt?op=CREATE'
I am stuck getting the file into the HTTP Client connector.
I tried placing the content into the HTTP Client connector's body (creating the data with a Message shape and sending the content to the HTTP connector).
I can see the file being created in Azure Data Lake, but its content is zero (0 bytes).
I need help getting the cURL command working.
I'd appreciate ideas, thoughts, or help from anyone who has implemented the Azure Data Lake REST API using the HTTP Client connector.
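Outside Boomi, the zero-byte symptom usually means the file bytes never became the request body: in the curl command above, `-T 'C:\temp\list.txt'` makes the file content the raw body of the PUT. A small Python sketch of the equivalent request (hypothetical store name, token, and content; the request is only constructed here, not sent):

```python
import urllib.request

# Hypothetical store name and path; the Bearer token comes from step 1.
url = "https://mystore.azuredatalakestore.net/webhdfs/v1/mytempdir/list.txt?op=CREATE"
file_bytes = b"line one\nline two\n"  # the document content Boomi would send

req = urllib.request.Request(
    url,
    data=file_bytes,  # the raw file content IS the request body
    method="PUT",
    headers={"Authorization": "Bearer <ACCESS-TOKEN>"},
)
# If data were empty or None here, the server would still create the
# file but with zero bytes -- the symptom described above.
```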
We were able to upload files to Azure Data Lake using the HTTP connector. There are two REST calls to Azure Data Lake. In the first call, the resource path is the Data Lake path, including the folder and file name. In the second call, the resource path comes from the response to the first call. This link was helpful in getting it working for Data Lake:
How to Upload a File to Box.com
Let me know if you have any questions and I can elaborate further if needed.
First call, following the documentation:
Apache Hadoop 2.9.0 – WebHDFS REST API
Step 1: Submit an HTTP PUT request without automatically following redirects and without sending the file data.
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE [&overwrite=<true |false>][&blocksize=<LONG>][&replication=<SHORT>] [&permission=<OCTAL>][&buffersize=<INT>][&noredirect=<true|false>]"
Usually the request is redirected to a datanode where the file data is to be written.
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
Content-Length: 0
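The Location header of that 307 response is what carries the URL for the second call. A minimal sketch of pulling it out, assuming the response headers are exposed as a dict (the datanode URL here is hypothetical):

```python
# Headers of the 307 response, as an HTTP client library might expose them.
# The Location value below is a made-up example for illustration.
headers = {
    "Location": "http://datanode.example:50075/webhdfs/v1/mytempdir/list.txt?op=CREATE&write=true",
    "Content-Length": "0",
}

# The second PUT, carrying the actual file bytes, goes to this URL.
redirect_url = headers.get("Location")
```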
I tried sending the first call with the resource path (including folder and file name), for example: https://host:port/webhdfs/v1/Lab/Boomi/list.txt?op=CREATE
When executing the Boomi process with the HTTP Client connector operation's Return HTTP Errors option unchecked, I got this error:
Test execution of MarketoAzure completed with errors. Embedded message: (307) - Error message received from Http Server, Code 307: Temporary Redirect; Caused by: Error message received from Http Server, Code 307: Temporary Redirect
With Return HTTP Errors checked, there is no response for the HTTP call.
Could you please elaborate on, or share details of, the first and second calls you mentioned?
Here are the details of both REST calls and other information:
1. Content Type: multipart/related; boundary=BOOMI
HTTP Method: PUT
This will result in the error "Http Server, Code 307: Temporary Redirect".
Capture the output and store it in a document property. It will be something like https://host/webhdfs/v1/folder/file?op=CREATE&write=true
Use a Data Process shape to remove https://host/ from that value and store the result in another property. This is done because https://host/ is already specified as the URL in the connector object.
2. In the second call, use Content Type: multipart/related; boundary=BOOMI
In the resource path, provide the property created from the call above. If successful, you should receive HTTP code 201.
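The Data Process step that strips https://host/ from the redirect URL can be sketched in plain code: given the Location value from the first call, keep only the path and query (the host and folder names below are hypothetical):

```python
from urllib.parse import urlparse

# Location value returned by the first call (hypothetical values).
location = "https://host/webhdfs/v1/folder/file?op=CREATE&write=true"

parsed = urlparse(location)
# Drop scheme and host, keep path + query without the leading slash,
# since https://host/ is already configured as the connector's base URL.
resource_path = parsed.path.lstrip("/") + "?" + parsed.query
```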
There were a few posts suggesting you include a carriage return in the document sent to the HTTP connector, but in my case it worked without that.
There may be better ways to achieve this. I only did it for evaluation purposes and didn't use Boomi for ingesting data. Let me know if it doesn't work.
Thank you Srivastava
The issue I am facing is with the first call: as you described, I am getting "Error message received from Http Server, Code 307: Temporary Redirect",
and I don't see any output from that connector call to capture.
You mentioned capturing the output and storing it in a document property, something like https://host/webhdfs/v1/folder/file?op=CREATE&write=true
Could you please help with how to get the output for that request?
Is there a reason you're using the WebHDFS API instead of the Data Lake API?
My requirement is to place a file into Azure Data Lake, and when I browsed the documentation I found this information and started trying to implement it.
Also, when I checked, the Data Lake API for the file system itself uses the WebHDFS FileSystem APIs.
Reference link - Azure Data Lake Store REST API | Microsoft Docs
Please let me know if there is another way or reference to achieve my requirement (placing a file into Data Lake Store).
Sorry, yes, that's correct.
Thank you all for your inputs
The issue was with the parameters I used as Dynamic Process Properties for the operation, which override the current inbound data. I changed the HTTP connector to have no parameters and used Dynamic Document Properties to expose the "FileName" parameter, and it works fine.
Glad it worked!
You don't need to manually handle the OAuth flow; the connector can do this for you.