Azure Blob Storage

Overview

This destination writes data to Azure Blob Storage.

The Airbyte Azure Blob Storage destination allows you to sync data to Azure Blob Storage. Each stream is written to its own blob under the container, named `<stream_namespace>/<stream_name>/yyyy_mm_dd_<unix_epoch>_<part_number>.<file_extension>`.
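For example, a `users` stream in the `public` namespace synced on 2021-08-30 might produce a blob named `public/users/2021_08_30_1630300000000_0.csv`, followed by `public/users/2021_08_30_1630300000000_1.csv` once the first blob exceeds the configured target size (these names are illustrative).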

Sync Mode

| Feature | Support |
| :--- | :---: |
| Full Refresh Sync | ✅ |
| Incremental - Append Sync | ✅ |
| Incremental - Append + Deduped | ❌ |

Configuration

| Parameter | Type | Notes |
| :--- | :--- | :--- |
| Endpoint Domain Name | string | The Azure Blob Storage endpoint domain name. Leave the default value (or leave it empty if running the container from the command line) to use the Microsoft-native endpoint. |
| Azure Blob Storage container (Bucket) name | string | The name of the Azure Blob Storage container. If it doesn't exist, it will be created automatically. If left empty, a container named `airbytecontainer+timestamp` will be created automatically. |
| Azure Blob Storage account name | string | The name of the Azure Blob Storage account. |
| Azure Blob Storage account key | string | The Azure Blob Storage account key. If this is set, the shared access signature option must not be set. Example: `abcdefghijklmnopqrstuvwxyz/0123456789+ABCDEFGHIJKLMNOPQRSTUVWXYZ/0123456789%++sampleKey==`. |
| Azure Blob Storage shared access signature | string | The Azure Blob Storage shared access signature (SAS). If this is set, the account key option must not be set. Example: `sv=2025-01-01&ss=b&srt=co&sp=abcdefghijk&se=2026-01-31T07:00:00Z&st=2025-01-31T20:30:29Z&spr=https&sig=YWJjZGVmZ2hpamthYmNkZWZnaGlqa2FiY2RlZmdoaWp%3D`. |
| Azure Blob Storage target blob size | integer | How large each blob should be, in megabytes. Once a blob exceeds this size, the connector starts writing to a new blob and increments the part number. Example: `500`. |
| Format | object | Format-specific configuration. See below for details. |
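
Putting these parameters together, a minimal sketch of a destination configuration could look like the following. All values are illustrative placeholders; the property names come from the config fields reference below, and the keys inside `format` (`format_type`, `flattening`) are assumptions about the format object's shape rather than confirmed spec fields.

```json
{
  "azure_blob_storage_endpoint_domain_name": "blob.core.windows.net",
  "azure_blob_storage_account_name": "myaccount",
  "azure_blob_storage_container_name": "airbytecontainer",
  "shared_access_signature": "sv=2025-01-01&ss=b&srt=co&sp=abcdefghijk&se=2026-01-31T07:00:00Z&sig=...",
  "azure_blob_storage_spill_size": 500,
  "format": {
    "format_type": "JSONL",
    "flattening": "No flattening"
  }
}
```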

Output Schema

CSV

Like most other Airbyte destination connectors, the output contains your data, along with some metadata fields. If you select the "root level flattening" option, your data will be promoted to additional columns; if you select "no flattening", your data will be left as a JSON blob inside the _airbyte_data column.

For example, given the following JSON object from a source:

```json
{
  "user_id": 123,
  "name": {
    "first": "John",
    "last": "Doe"
  }
}
```

With no flattening, the output CSV is:

| `_airbyte_raw_id` | `_airbyte_extracted_at` | `_airbyte_generation_id` | `_airbyte_meta` | `_airbyte_data` |
| :--- | :--- | :--- | :--- | :--- |
| 26d73cde-7eb1-4e1e-b7db-a4c03b4cf206 | 1622135805000 | 11 | `{"changes": [], "sync_id": 10111}` | `{"user_id": 123, "name": {"first": "John", "last": "Doe"}}` |

With root level flattening, the output CSV is:

| `_airbyte_raw_id` | `_airbyte_extracted_at` | `_airbyte_generation_id` | `_airbyte_meta` | `user_id` | `name.first` | `name.last` |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 26d73cde-7eb1-4e1e-b7db-a4c03b4cf206 | 1622135805000 | 11 | `{"changes": [], "sync_id": 10111}` | 123 | John | Doe |

JSON Lines (JSONL)

JSON Lines is a text format with one JSON object per line. As with the CSV format, this connector writes your data along with some metadata fields. You can enable "root level flattening" to promote your data to the root of the JSON object, or use "no flattening" to leave your data inside the `_airbyte_data` object.

For example, given the following two JSON objects from a source:

```json
{
  "user_id": 123,
  "name": {
    "first": "John",
    "last": "Doe"
  }
}
{
  "user_id": 456,
  "name": {
    "first": "Jane",
    "last": "Roe"
  }
}
```

With no flattening, the output JSONL is:

{ "_airbyte_raw_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_extracted_at": "1622135805000", "_airbyte_generation_id": "11", "_airbyte_meta": { "changes": [], "sync_id": 10111 }, "_airbyte_data": { "user_id": 123, "name": { "first": "John", "last": "Doe" } } }
{ "_airbyte_ab_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_extracted_at": "1631948170000", "_airbyte_generation_id": "12", "_airbyte_meta": { "changes": [], "sync_id": 10112 }, "_airbyte_data": { "user_id": 456, "name": { "first": "Jane", "last": "Roe" } } }

With root level flattening, the output JSONL is:

{ "_airbyte_raw_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_extracted_at": "1622135805000", "_airbyte_generation_id": "11", "_airbyte_meta": { "changes": [], "sync_id": 10111 }, "user_id": 123, "name": { "first": "John", "last": "Doe" } }
{ "_airbyte_ab_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_extracted_at": "1631948170000", "_airbyte_generation_id": "12", "_airbyte_meta": { "changes": [], "sync_id": 10112 }, "user_id": 456, "name": { "first": "Jane", "last": "Roe" } }

Getting started

Requirements

  1. Create an Azure Blob Storage account.
  2. Verify it works in the Azure Portal (https://portal.azure.com/) under "Storage explorer (preview)".

Setup guide

  • Fill in the Azure Blob Storage connection info:
    • Endpoint Domain Name
      • Leave the default value (or leave it empty if running the container from the command line) to use the Microsoft-native endpoint, or supply your own.
    • Azure Blob Storage container
      • If the container doesn't exist, it will be created automatically. If the name is left empty, a container named airbytecontainer+timestamp will be created automatically.
    • Azure Blob Storage account name
      • See the Azure documentation for how to create a storage account.
    • Authentication - you must use exactly one of these:
      • The Azure Blob Storage shared access signature (recommended)
        • See the Azure documentation for how to create a SAS.
      • The Azure Blob Storage account key
        • The account key corresponding to the account above.
    • Format
      • The data format (CSV or JSONL) used to write the synced data to blobs.
  • Make sure your user has access to Azure from the machine running Airbyte.
    • This depends on your networking setup.
    • The easiest way to verify that Airbyte can connect to your Azure Blob Storage container is via the check connection tool in the UI; for a manual check, see the SDK sketch below.
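
For a manual connectivity check outside the UI, here is a minimal sketch using the Azure Storage SDK for Java (`com.azure:azure-storage-blob`). The endpoint, container name, and SAS token are illustrative placeholders; listing blobs exercises the same authentication and network path the connector needs.

```java
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobContainerClientBuilder;
import com.azure.storage.blob.models.BlobItem;

public class CheckAzureBlobAccess {
    public static void main(String[] args) {
        // Placeholder values -- substitute your own account endpoint, container, and SAS token.
        BlobContainerClient container = new BlobContainerClientBuilder()
                .endpoint("https://myaccount.blob.core.windows.net")
                .containerName("airbytecontainer")
                .sasToken("sv=2025-01-01&ss=b&srt=co&sp=abcdefghijk&sig=...")
                .buildClient();

        // Listing blobs verifies both credentials and network reachability,
        // similar in spirit to the connector's "check" operation.
        for (BlobItem blob : container.listBlobs()) {
            System.out.println(blob.getName());
        }
    }
}
```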

Reference

Config fields reference

| Field | Type | Property name |
| :--- | :--- | :--- |
| Azure Blob Storage account name | string | `azure_blob_storage_account_name` |
| Azure Blob Storage container (Bucket) name | string | `azure_blob_storage_container_name` |
| Format | object | `format` |
| Endpoint Domain Name | string | `azure_blob_storage_endpoint_domain_name` |
| Shared access signature | string | `shared_access_signature` |
| Azure Blob Storage account key | string | `azure_blob_storage_account_key` |
| Azure Blob Storage target blob size | integer | `azure_blob_storage_spill_size` |

Changelog

| Version | Date | Pull Request | Subject |
| :--- | :--- | :--- | :--- |
| 1.0.1 | 2025-04-09 | #57541 | Fix metadata to actually certify. |
| 1.0.0 | 2025-04-03 | #56391 | Bring into compliance with modern connector standards; certify connector. |
| 0.2.5 | 2025-03-21 | #55906 | Upgrade to airbyte/java-connector-base:2.0.1 to be M4 compatible. |
| 0.2.4 | 2025-01-10 | #51507 | Use a non-root base image. |
| 0.2.3 | 2024-12-18 | #49910 | Use a base image: airbyte/java-connector-base:1.0.0. |
| 0.2.2 | 2024-06-12 | #38061 | File extensions added for the output files. |
| 0.2.1 | 2023-09-13 | #30412 | Switch noisy logging to debug. |
| 0.2.0 | 2023-01-18 | #21467 | Support spilling of objects exceeding the configured size threshold. |
| 0.1.6 | 2022-08-08 | #15318 | Support per-stream state. |
| 0.1.5 | 2022-06-16 | #13852 | Updated stacktrace format for any trace message errors. |
| 0.1.4 | 2022-05-17 | #12820 | Improved 'check' operation performance. |
| 0.1.3 | 2022-02-14 | #10256 | Add -XX:+ExitOnOutOfMemoryError JVM option. |
| 0.1.2 | 2022-01-20 | #9682 | Each sync for each stream is written to a new blob in the folder named after the stream. |
| 0.1.1 | 2021-12-29 | #9190 | Added BufferedOutputStream wrapper to the blob output stream to improve performance and fix issues with the 50,000 block limit. Also disabled autoflush on PrintWriter. |
| 0.1.0 | 2021-08-30 | #5332 | Initial release with JSONL and CSV output. |