In this post we will use the Azure Data Factory (ADF) platform to move data from an Azure SQL DB to Azure File Storage. We will work through the same cloud integration example we built with Logic Apps, but this time with the specialized data integration service.
Overview of Azure Data Factory
Azure Data Factory is a cloud-based ETL and data integration service that allows you to create data-driven workflows (pipelines) for orchestrating data movement and transforming data at scale.
ADF provides a visual environment to easily construct ETL and ELT (extract-load-transform) workflows, or pipelines, that move data using out-of-the-box (OOB) data connectors.
Ingest, move, prepare, transform, and process your data in a few clicks, and complete your data modeling within the accessible visual environment. This serverless service does all the heavy lifting for you. You, as the data integrator, just need to focus on the data.
Don’t worry: if needed, you can also write custom logic or call Azure Functions, Logic Apps, external APIs, and so on.
Concepts & Components
ADF is composed of the following key components. You can use these components to build data-driven workflows with steps to transform and move data.
Data factory location
You need to specify the location for the data factory at the time of its creation. The data factory location is where the factory's metadata is stored and from where pipeline triggering is initiated.
Pipeline
You create data factory workflows using pipelines, which define the movement or transformation of data from one location to another. A pipeline is a logical grouping of activities that performs a unit of work. Together, the activities in a pipeline perform a task.
Activity
An activity defines a processing step in the pipeline, i.e., the action to be performed. Activities come in three types: data movement activities, data transformation activities, and control activities.
Pipeline run
When you trigger the execution of a pipeline, it creates a pipeline instance represented by a pipeline run.
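These relationships are easiest to see in the JSON that ADF stores behind the visual designer. Below is a minimal sketch of a pipeline definition with a single control activity, expressed as a Python dict; "DemoPipeline" and "WaitTenSeconds" are made-up illustrative names, not anything from this solution.

```python
import json

# A pipeline is a logical grouping of activities; each activity entry
# names a processing step, its type, and its type-specific properties.
pipeline = {
    "name": "DemoPipeline",  # placeholder name
    "properties": {
        "activities": [
            {
                "name": "WaitTenSeconds",            # placeholder name
                "type": "Wait",                      # a simple control activity
                "typeProperties": {"waitTimeInSeconds": 10},
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```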
Data flows
Data flows are used to build data transformation logic that can transform data of any size.
Linked services
A linked service is a connection to a data store or a compute resource. Linked services are like connection strings: they define the connection information that Data Factory needs to connect to external resources.
Datasets
Datasets represent the data structures within a data store. You use datasets in your activities as inputs or outputs for the data you want to work on.
Integration runtime
The Azure integration runtime provides fully managed, serverless compute to move data between cloud data stores in a secure, reliable, and high-performance manner. It acts as a bridge between activities and linked services: it is referenced by the linked service or activity, and provides the compute environment where the activity either runs or from which it gets dispatched.
The SSIS integration runtime lets you lift and shift existing on-premises SSIS packages to run in the cloud using familiar SSIS tools.
The IR location defines the location of its back-end compute, which is essentially where data movement, activity dispatching, and SSIS package execution are performed. The IR location can differ from the location of the data factory it belongs to.
Create Azure Data Factory pipeline to move data from Azure SQL DB to Azure File Share
Now that we have some basic knowledge about Data Factory, we can get started with our data integration project.
In this solution, we will create a Data Factory with a pipeline that will orchestrate the movement of data from an Azure SQL DB to an Azure File Share.
You can also refer to the Logic Apps project for the same data integration scenario here: Logic App for Azure SQL DB to Azure File Storage Workflow.
Create Data Factory
Go to the Azure Portal and search for “Data Factory” to create a new data factory instance. Select a resource group, a unique name, and the factory location where the metadata will be stored.
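If you prefer to script this step instead of using the portal, the same factory can be created through the ARM REST API. A sketch of the request body follows; the subscription, resource group, and factory names in the URL are placeholders you must substitute, and "eastus" is just an example region.

```python
import json

# PUT https://management.azure.com/subscriptions/{subscriptionId}
#     /resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory
#     /factories/{factoryName}?api-version=2018-06-01
# The request body only needs the location where the factory's
# metadata will be stored.
factory_body = {"location": "eastus"}  # example region, substitute your own

print(json.dumps(factory_body))
```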
Once it is deployed, open the ADF portal from the “Author & Monitor” link to create the pipeline and its related components.
Create Linked Services
We will create two linked services to hold connection info for our source Azure SQL DB and target Azure File Share.
Click on Manage > New to add linked services.
Select the data stores (Azure SQL / Azure File Storage) and provide appropriate connection details.
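Behind the portal forms, each linked service is stored as a small JSON document. Below is a sketch of the two definitions as Python dicts; the linked service names, connection strings, and share name are all placeholders, not real credentials.

```python
import json

# Linked service for the source Azure SQL DB; the connection string
# is a placeholder, not a real credential.
sql_linked_service = {
    "name": "AzureSqlSourceLS",  # placeholder name
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:<server>.database.windows.net;Database=<db>;"
        },
    },
}

# Linked service for the target Azure File Share; account and share
# names are placeholders.
file_linked_service = {
    "name": "AzureFileTargetLS",  # placeholder name
    "properties": {
        "type": "AzureFileStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;",
            "fileShare": "<share>",
        },
    },
}

print(json.dumps([sql_linked_service, file_linked_service], indent=2))
```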
Create Datasets
Now create two datasets, one for each of our linked services. These datasets will represent the data structures of our source and target.
Click on Author pencil icon and then Datasets.
For file dataset, also specify the file format and location.
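The dataset JSON references a linked service by name and describes the shape of the data. Here is a sketch assuming a hypothetical dbo.Orders source table and a CSV output file; every name below (datasets, linked service references, folder, file) is a placeholder for what you configure in the portal.

```python
import json

# Source dataset: a table in the Azure SQL DB, tied to its linked service.
sql_dataset = {
    "name": "OrdersTable",  # placeholder name
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "AzureSqlSourceLS",  # placeholder reference
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"schema": "dbo", "table": "Orders"},
    },
}

# Target dataset: a delimited text file on the Azure File Share,
# with the file format and location specified as described above.
file_dataset = {
    "name": "OrdersCsv",  # placeholder name
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureFileTargetLS",  # placeholder reference
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureFileStorageLocation",
                "folderPath": "exports",      # placeholder folder
                "fileName": "orders.csv",     # placeholder file
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}

print(json.dumps([sql_dataset, file_dataset], indent=2))
```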
Create Pipeline
We can now create the workflow with the activities required to complete the data movement or transformation task.
Add a Copy data activity to move the data from source to sink (target). On the Source tab, select the SQL dataset we created and choose the table, view, or stored procedure that fetches the data.
On the Sink tab, select the file dataset we created as the target.
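Put together, the Copy activity wires a source dataset and a sink dataset into one pipeline. A sketch of the resulting pipeline JSON, with every name (pipeline, activity, dataset references, query) a placeholder standing in for what you configured in the designer:

```python
import json

# Copy activity moving rows from the SQL dataset (source) to the
# delimited-text dataset on the file share (sink).
copy_pipeline = {
    "name": "SqlToFileSharePipeline",  # placeholder name
    "properties": {
        "activities": [
            {
                "name": "CopyOrders",  # placeholder name
                "type": "Copy",
                "inputs": [
                    {"referenceName": "OrdersTable", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "OrdersCsv", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {
                        "type": "AzureSqlSource",
                        "sqlReaderQuery": "SELECT * FROM dbo.Orders",  # placeholder query
                    },
                    "sink": {"type": "DelimitedTextSink"},
                },
            }
        ]
    },
}

print(json.dumps(copy_pipeline, indent=2))
```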
Trigger the Pipeline
You can run your pipeline immediately or set up a recurring trigger.
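A recurring run is defined by a schedule trigger that references the pipeline. Below is a sketch of a daily trigger; the trigger name, pipeline reference, and start time are all placeholders.

```python
import json

# Schedule trigger that runs the referenced pipeline once a day,
# starting from an arbitrary placeholder date.
daily_trigger = {
    "name": "DailyCopyTrigger",  # placeholder name
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2021-01-01T00:00:00Z",  # placeholder start
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "SqlToFileSharePipeline",  # placeholder
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

print(json.dumps(daily_trigger, indent=2))
```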
Monitor the Pipeline
You can monitor the pipeline from the monitor tab.
Publish and Deploy Pipeline
At this point, we are done with our ADF solution and it is ready to be saved and deployed.
In order to save the changes to ADF, select Publish all.
For instructions on how to deploy to other environments (UAT, test, prod), check the article Azure Data Factory – Manually Deploy to other Environments using ARM Templates.
That should be all for today!