Introduction to Microsoft SQL Server Parallel Data Warehouse (PDW)

Introduction to Microsoft SQL Server Parallel Data Warehouse (PDW)
Sql ServerParallel Data Warehouse (PDW)

Microsoft Parallel Data Warehouse (PDW) also known as Microsoft Analytics Platform System (APS) is a special version of SQL Server designed to handle petabytes worth of data. It has very different architecture compared to traditional Symmetric Multiprocessing (SMP) System. Check below presentation to learn more about PDW system. PDW System comes as Appliance with configured software along with it. So there is no way you can test PDW unless you have test hardware. Throughout this article I will refer Microsoft SQL Server Parallel Datawarehouse as PDW.  

What is Microsoft Parallel Data Warehouse

  • In simple term its special version of sql server designed to handle massive parallel processing (MPP).
  • Parallel Data Warehouse has few types of Nodes to handle various activities within your PDW Appliances.
    • PDW Control Node  (Active/Passive Cluster) – This is entry point of all your data read/write requests. User will always send query to this node. This is SQL Server Instance contains Metadata about PDW.
    • PDW Management Node  (Active/Passive Cluster) – This node handles all patching related actions. Sending new Images to all compute nodes and applying seprvice packs.
    • PDW Landing Zone – This is where High performance Bulk dataloading can happen. User can FTP/SFTP files to this node and then call DWLoader to Bulk load data into PDW. You may also use Traditional ETL Tool to load data such as use SSIS or Informatica.
    • PDW Backup Node – This node is responsible for backup related activities
    • PDW Compute Node – This is where actual query processing happens. Compute Node is dedicated SQL Server Intance behind the scene which user dont see but each query you send to PDW is distributed across all Compute nodes. 
    • PDW Storage Node – This is where data is stored. Each storage node is directly attached to Compute Node using Fiber Channel and all Nodes connected to each other using Highspeed Network called InfiniBand 
  • Parallel Data Warehouse comes with Minimum 2 Server Racks (As of now two vendors HP and Dell sell PDW Appliances). You can add more data racks to scale as you need 
  • SQL Server PDW can store several petabytes worth of data compared to few Terabytes in traditional SQL Server (e.g Standard or Enterprise Edition)
  • PDW Stores data in 3 different ways.
    • Replicated tables
    • Distributed tables
  • SQL Server PDW System comes as base preconfigured hardware and software which can be scaled as needed according to data growth
  • PDW requires less DBA work compared to traditional Database system because there is less maintenance and setup compared to normal SQL Server.
  • PDW system uses Shared Nothing approach means each node has its own node and own processer/memory and nothing is shared with other node compared to Clustered environment where disks are shared.
  • PDW scale several times more than regular sql server. PDW requires special hardware which must follow Microsoft PDW Hardware guideline.
  • PDW  is several times more expensive compared to regular SQL Server but it gives you massive query performance in both read and write because of distributed approach.

Architecture Diagram of Microsoft Parallel DataWarehouse (PDW)

Here is high level diagram of Microsoft Parallel Data warehouse.

Microsoft Parallel Datawarehouse Architecture Diagram
Microsoft Parallel Datawarehouse Architecture Diagram

 

PDW cost comparision with other Vendors

Here is simple PDW cosnt comparision chart based on various factors (Source) As you can see PDW appliances highly competitive in terms of Price per TB compared to other vendors selling similar offering.

PDW Cost Comparison with other Vendors
PDW Cost Comparison with other Vendors

 

PowerPoint Presentation of Parallel Data Warehouse (PDW)  

 

Introduction Video Tutorial – PDW – Microsoft Parallel Data Warehouse

 [embedyt]http://www.youtube.com/watch?v=AnxJ4OtmGsk[/embedyt]

PDW Hardware – Microsoft Parallel Data Warehouse 

PDW - Parallel Data Warehouse Hardware
Microsoft SQL Server Parallel Data Warehouse (PDW) Hardware

Binary World is a Software Development company located in Atlanta, USA (since 2007). Binary World specialized in Business Intelligence, mobile, cloud computing and .Net Application Development.