Azure Data Lake Storage Gen2 (ADLS Gen2), the latest iteration of Azure Data Lake Storage, is designed for highly scalable big data analytics solutions. It combines the management and scalability features of Azure Blob Storage and Azure Data Lake Storage Gen1, including a hierarchical file system with granular security and lower-cost tiered storage. It also offers scalable storage and processing capabilities, high availability and disaster recovery.
In this blog, I’ll cover all the latest and greatest features that ADLS Gen2 has to offer.
Multi-Protocol Access Capability
Recently, ADLS Gen2 introduced a new multi-protocol access capability to support solutions for both object storage and analytics storage. (Note: it is currently in public preview in the West US 2 and West Central US regions.)
Multi-protocol access allows you to connect applications to your ADLS Gen2 storage account either through the object store Blob API using the WASB driver, or through the ADLS Gen2 API using the new ABFS driver. With the hierarchical namespace enabled, both APIs can access data in ADLS Gen2 the same way. When you use the Blob API, data access is routed through the hierarchical namespace, so it uses the same directory operations and access control lists (ACLs) as the ADLS Gen2 API. This is great for existing solutions built on the Blob API, as no code changes are required to take advantage of the new access control features on files and directories introduced by the hierarchical namespace. Even better? Multi-protocol access on ADLS Gen2 is interoperable with many Azure services like Azure Stream Analytics, IoT Hub, Power BI, Azure Data Factory and others.
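To make the two access paths concrete, here is a minimal sketch of how the same file is typically addressed through each driver. The URI schemes and endpoint suffixes follow the usual WASB and ABFS conventions; the account, container and path names are hypothetical placeholders, and the helper functions are my own illustration rather than part of any SDK.

```python
def wasb_uri(account: str, container: str, path: str, secure: bool = True) -> str:
    """Build a WASB driver URI, which targets the Blob endpoint."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def abfs_uri(account: str, filesystem: str, path: str, secure: bool = True) -> str:
    """Build an ABFS driver URI, which targets the ADLS Gen2 (dfs) endpoint."""
    scheme = "abfss" if secure else "abfs"
    return f"{scheme}://{filesystem}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical account and container names, used only for illustration.
blob_style = wasb_uri("mystorageacct", "raw", "/sales/2019/orders.csv")
adls_style = abfs_uri("mystorageacct", "raw", "/sales/2019/orders.csv")
print(blob_style)
print(adls_style)
```

Both strings point at the same underlying data. Only the scheme and the endpoint differ.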
Now, with a true hierarchical namespace on top of Blob storage, ADLS Gen2 allows atomic directory manipulation. Historically, object stores like Blob storage only emulated a directory hierarchy by adopting a naming convention in which blob names contain slashes (/). This was inefficient because applications had to iterate through potentially millions of individual blobs to complete directory-level tasks. For example, deleting a directory with several million objects in Blob storage requires as many delete operations as there are objects in that directory. In contrast, with ADLS Gen2, deleting a directory is a single operation regardless of the number of files in the directory.
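The difference is easier to see in a toy model. The sketch below is not how either service is implemented. It is an in-memory illustration, with class names I made up, that counts the delete operations a client issues under each namespace style.

```python
class FlatStore:
    """Toy object store: 'directories' exist only as slash-separated name prefixes."""

    def __init__(self):
        self.blobs = {}
        self.delete_ops = 0

    def put(self, name, data=b""):
        self.blobs[name] = data

    def delete_blob(self, name):
        del self.blobs[name]
        self.delete_ops += 1

    def delete_directory(self, prefix):
        # No real directory exists, so every matching blob is deleted one by one.
        prefix = prefix.rstrip("/") + "/"
        for name in [n for n in self.blobs if n.startswith(prefix)]:
            self.delete_blob(name)


class HierarchicalStore:
    """Toy hierarchical store: directories are real nodes in a tree."""

    def __init__(self):
        self.root = {}
        self.delete_ops = 0

    def put(self, path, data=b""):
        *dirs, leaf = path.strip("/").split("/")
        node = self.root
        for d in dirs:
            node = node.setdefault(d, {})
        node[leaf] = data

    def delete_directory(self, path):
        # Unlinking the directory node removes everything beneath it in one step.
        *parents, leaf = path.strip("/").split("/")
        node = self.root
        for d in parents:
            node = node[d]
        del node[leaf]
        self.delete_ops += 1


flat, tree = FlatStore(), HierarchicalStore()
for i in range(1000):
    flat.put(f"logs/2019/file{i}.json")
    tree.put(f"logs/2019/file{i}.json")

flat.delete_directory("logs/2019")
tree.delete_directory("logs/2019")
print(flat.delete_ops, tree.delete_ops)
```

In this model the flat store needs one delete per object, while the hierarchical store needs a single operation no matter how many files the directory holds.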
Furthermore, the hierarchical namespace in ADLS Gen2 does not limit its scalability potential as traditional object stores do. ADLS Gen2 scales linearly in both data capacity (exabytes) and performance (Gbps throughput).
The hierarchical namespace allows you to define POSIX-style permissions and ACLs on directories, subdirectories or individual files. You can also use role-based access control (RBAC) with Azure Active Directory (Azure AD) to govern resource management and data operations.
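For readers new to POSIX-style ACLs, here is a simplified sketch of how an access check against a short-form ACL string can work. It assumes the common `scope:qualifier:permissions` entry format and follows the general POSIX order of evaluation (owner, named user, groups, other). It ignores default ACLs, RBAC roles and superuser rules, and it uses readable names where a real ACL would reference Azure AD object IDs. The function names are hypothetical.

```python
def parse_acl(acl: str) -> dict:
    """Parse entries such as 'user:alice:rw-' into {(scope, qualifier): {'r', 'w'}}."""
    entries = {}
    for entry in acl.split(","):
        parts = entry.strip().split(":")
        if parts[0] == "default":
            continue  # default ACLs only affect newly created children; skipped here
        scope, qualifier, perms = parts
        entries[(scope, qualifier)] = set(perms.replace("-", ""))
    return entries


def is_allowed(acl, owner, owning_group, user, groups, needed):
    """Return True if `user` has the `needed` permission ('r', 'w' or 'x')."""
    entries = parse_acl(acl)
    # The mask caps named users and all groups; with no mask, nothing is capped.
    mask = entries.get(("mask", ""), set("rwx"))

    if user == owner:
        return needed in entries[("user", "")]
    if ("user", user) in entries:
        return needed in (entries[("user", user)] & mask)

    group_perms = []
    if owning_group in groups:
        group_perms.append(entries[("group", "")])
    group_perms += [entries[("group", g)] for g in groups if ("group", g) in entries]
    if group_perms:
        return any(needed in (p & mask) for p in group_perms)

    return needed in entries[("other", "")]


acl = "user::rwx,user:alice:rw-,group::r-x,mask::r-x,other::---"
print(is_allowed(acl, "owner1", "analysts", "alice", [], "r"))
print(is_allowed(acl, "owner1", "analysts", "alice", [], "w"))
```

Note how the mask matters: alice's entry grants `rw-`, but the `r-x` mask caps named users, so her effective permission is read-only.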
Additionally, ADLS Gen2 supports both encryption-in-transit and encryption-at-rest to protect your data. Encryption-at-rest is automatically enabled for all storage accounts via Storage Service Encryption (SSE), using either Microsoft-managed encryption keys or your own encryption keys. Encryption-in-transit is provided by transport-level encryption over HTTPS and can be enforced by enabling the Secure transfer required option for the storage account under Settings > Configuration. Client-side encryption is also supported with the Azure Storage Client Library for .NET.
In addition to access and encryption, ADLS Gen2 supports firewall and virtual network configurations. Network rules can be defined to restrict access to the storage account from a specific set of networks. For more information on firewalls and virtual networks for Azure Storage, check out Microsoft’s guide to Configure Azure Storage firewalls and virtual networks.
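Conceptually, a network rule set boils down to a default action plus a list of exceptions. The sketch below models only the IP-range part of that idea using Python's standard `ipaddress` module. The function name, the `default_action` parameter and the sample addresses are my own illustrative assumptions, and real Azure Storage network rules also cover virtual network subnets and trusted services, which this ignores.

```python
import ipaddress


def is_request_allowed(client_ip, allowed_ranges, default_action="Deny"):
    """Allow the request if the client IP falls inside any allowed CIDR range.

    Otherwise fall back to the default action ("Allow" or "Deny").
    """
    ip = ipaddress.ip_address(client_ip)
    for cidr in allowed_ranges:
        if ip in ipaddress.ip_network(cidr):
            return True
    return default_action == "Allow"


# Documentation-only example ranges (RFC 5737), not real networks.
office_ranges = ["203.0.113.0/24"]
print(is_request_allowed("203.0.113.25", office_ranges))
print(is_request_allowed("198.51.100.7", office_ranges))
```

With a default action of Deny, only clients inside the listed ranges get through, which is the restrictive behavior described above.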
Performance and Access Tiers
ADLS Gen2 is currently supported in Azure Storage accounts with the standard performance tier (magnetic disks). The premium performance tier is not yet supported for ADLS Gen2 accounts.
Both hot and cool access tiers are available for ADLS Gen2 storage accounts. The hot access tier is optimized for data that is accessed frequently, while the cool access tier is optimized for data that is accessed infrequently and stored for at least 30 days.
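As a rough rule of thumb, that guidance can be written as a small decision helper. The 30-day retention figure comes from the description above. The cutoff for what counts as "frequent" access is purely my own assumption for illustration, so tune it against your actual pricing and access patterns.

```python
def suggest_access_tier(reads_per_month, retention_days, frequent_reads_threshold=4):
    """Suggest "Hot" or "Cool" for a dataset.

    frequent_reads_threshold is a hypothetical cutoff, not an Azure-defined value.
    """
    # Cool only makes sense for data kept at least 30 days and rarely read.
    if retention_days >= 30 and reads_per_month < frequent_reads_threshold:
        return "Cool"
    return "Hot"


print(suggest_access_tier(reads_per_month=100, retention_days=365))
print(suggest_access_tier(reads_per_month=1, retention_days=90))
```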
High-Availability and Disaster Recovery
Data in ADLS Gen2 storage accounts is always replicated to ensure durability and high availability. The replication option is selected when the storage account is created and can later be upgraded for greater durability and availability. You can select one of the following redundancy options:
- Locally-redundant storage (LRS)
- Zone-redundant storage (ZRS)
- Geo-redundant storage (GRS)
- Read-access geo-redundant storage (RA-GRS)
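One way to compare the four options is by where the copies live and whether the secondary copy is readable. The sketch below encodes that as a small lookup table with a filter. The field names and the `matching_options` helper are hypothetical, and the table is a simplification: LRS keeps its copies in a single location, ZRS spreads them across three availability zones in one region, and GRS and RA-GRS add an asynchronous copy in a secondary region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redundancy:
    name: str
    zones_in_primary_region: int  # 1 = single location, 3 = spread across zones
    secondary_region: bool        # data is also copied to a paired region
    secondary_readable: bool      # the secondary copy can be read directly


OPTIONS = [
    Redundancy("LRS", 1, False, False),
    Redundancy("ZRS", 3, False, False),
    Redundancy("GRS", 1, True, False),
    Redundancy("RA-GRS", 1, True, True),
]


def matching_options(min_zones=1, need_secondary_region=False, need_secondary_read=False):
    """Return the names of redundancy options that satisfy all requirements."""
    return [
        o.name
        for o in OPTIONS
        if o.zones_in_primary_region >= min_zones
        and (o.secondary_region or not need_secondary_region)
        and (o.secondary_readable or not need_secondary_read)
    ]


print(matching_options(need_secondary_region=True))
print(matching_options(need_secondary_read=True))
```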
For more details on redundancy options for Azure Storage accounts, please read Microsoft’s Azure Storage redundancy guide.
ADLS Gen2 continues to evolve rapidly as new access and interoperability features are introduced. For upcoming product updates and announcements, check out Microsoft’s Azure announcements. In the meantime, watch our webinar, “Loading Data into Azure Data Lake Gen 2 with Azure Data Factory v2,” to learn how to save time building big data analytics solutions using Azure Data Lake Gen2 and Azure Data Factory v2.