AI & ML
About: PowerAI is an open-source deep learning platform that accelerates the journey to cognitive computing by bringing together a collection of the most popular open-source deep learning frameworks, along with supporting software and libraries, in a single installable package.
The solution has been proposed as a deep learning container cloud with the ability to spin up on-premises containers running any of the open-source frameworks requested by the research teams. This also ensures optimal utilization of resources.
IBM PowerAI Overview
The IBM PowerAI deep learning suite of framework software will be embedded in this project across all relevant GPU-based compute nodes. These applications are highly relevant to deep learning today, as leading institutions are transforming their education, research and industry-collaboration activities for these new domains. Fields ranging from imaging analysis and audio analysis to drug discovery are being transformed by advanced data-analytics techniques. The IBM hardware and software platform for deep/machine learning brings unique advantages to the table.
PowerAI Framework internals:
What is within the PowerAI frameworks? IBM has built a pre-defined, tuned and optimized deep learning framework stack for Power systems that significantly reduces time-to-train for data scientists, because they do not have to navigate setup operations in addition to complex tuning and optimization challenges. Customer results have shown this process to be 3-5x faster to deployment than a typical do-it-yourself deep learning implementation.
[B] Framework innovations in PowerAI:
- Large Model Support (LMS) – Enables deep learning with large models and data sets (without running out of GPU memory) by leveraging main (CPU) memory, which has much higher capacity.
GPU memory is very high-performance but limited in capacity (32 GB in the latest V100 from NVIDIA). This results in out-of-memory (OOM) errors when working with large models, large datasets or both. The high-bandwidth connection between GPU and CPU on the IBM Power infrastructure (NVLink, described in the next section) helps to overcome this limitation. LMS (supported for Caffe and TensorFlow in PowerAI) allows the user to reserve a section of main memory (up to ~2 TB) as shared memory across GPUs. The reserved memory can hold model layers, tuning parameters and learning deltas that are not currently required by the training phase (either forward or back propagation). This permits training of very large models, high-resolution images/larger datasets and/or larger batch sizes.
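To make the capacity argument concrete, here is a rough back-of-the-envelope sketch (not the PowerAI LMS API; the model sizes and the memory-estimation formula are illustrative assumptions) showing how a training workload can exceed a single GPU's 32 GB while fitting comfortably in the host-memory pool LMS can reserve:

```python
# Illustrative sketch only: rough memory estimate showing why a large
# model can exceed 32 GB of GPU memory while still fitting in the host
# (CPU) memory that LMS can reserve. Formula and figures are assumptions.

GPU_MEM_GB = 32        # NVIDIA V100 capacity cited above
HOST_MEM_GB = 2000     # ~2 TB of main memory reservable via LMS

def training_memory_gb(params_millions, batch_size, act_mb_per_sample):
    """Very rough estimate: weights + gradients + optimizer state
    (3x parameters at 4 bytes each) plus activations per sample."""
    param_bytes = params_millions * 1e6 * 4 * 3
    act_bytes = batch_size * act_mb_per_sample * 1e6
    return (param_bytes + act_bytes) / 1e9

# A hypothetical high-resolution imaging model: 500M parameters,
# ~2 GB of activations per sample, batch size 32.
need = training_memory_gb(500, 32, 2000)
print(f"Needs ~{need:.0f} GB")   # ~70 GB
print(need > GPU_MEM_GB)         # True -> would hit OOM without LMS
print(need < HOST_MEM_GB)        # True -> fits in the LMS host pool
```

Under these assumed numbers the job needs roughly 70 GB, more than twice a single V100's capacity but well within the ~2 TB host-memory reservation.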
- Distributed Deep Learning on PowerAI (PowerAI DDL) – Provides near-linear scaling for multi-node deep learning training, reducing training time to 7 hours, compared with the 10 days reported by Microsoft for a similar task.
Fig 3: DDL leverages the hierarchical nature of communication speeds to optimize data distribution in addition to PCIe-Gen4 and InfiniBand technology
Most popular deep learning frameworks scale to multiple GPUs in a server, but not to multiple servers with GPUs. Specifically, our team wrote software and algorithms that automate and optimize the parallelization of this very large and complex computing task across hundreds of GPU accelerators attached to dozens of servers. This problem of orchestrating and optimizing a deep learning job across many servers becomes much more difficult as GPUs get faster. The resulting functional gap in deep learning systems drove us to create a new class of DDL software that makes it possible to run popular open-source frameworks such as TensorFlow, Caffe, Torch and Chainer over massive-scale neural networks and data sets with very high performance and very high accuracy.
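The core primitive behind this multi-node synchronization is an all-reduce: every node contributes its local gradients and receives the global sum. The sketch below is a hypothetical simulation of that end result (it is not the PowerAI DDL API, and it computes the sum directly rather than performing the actual ring exchange):

```python
# Minimal simulation of the all-reduce idea behind multi-node gradient
# synchronization (hypothetical; not the PowerAI DDL API). Each "node"
# holds a local gradient vector; after all-reduce, every node holds
# the element-wise global sum.

def allreduce_sum(grads):
    """grads: list of equal-length gradient lists, one per node.
    Returns one summed copy per node, simulating the end state of a
    ring all-reduce (reduce-scatter followed by all-gather)."""
    n = len(grads)
    length = len(grads[0])
    total = [sum(g[i] for g in grads) for i in range(length)]
    return [list(total) for _ in range(n)]

# Four nodes, each with a toy 3-element gradient.
result = allreduce_sum([[1, 2, 3], [1, 0, 1], [0, 1, 0], [2, 2, 2]])
print(result[0])   # [4, 5, 6] on every node
```

A real ring implementation moves chunks between neighbors so that bandwidth use stays constant per node as the cluster grows, which is what makes near-linear scaling possible.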
IBM POWER9-based accelerated computing servers
[A] NVLink 2.0 communication technology between CPU & GPU solves the PCIe system bottleneck:
IBM Research and Development has worked closely with NVIDIA to develop an on-processor NVLink GPU-to-CPU communications technology that revolutionizes the GPU processing and large-memory data-passing tasks that have been limited by current and future PCIe-based GPU implementations.
The GPU-to-CPU bandwidth (as illustrated above) reaches 75 GB per second in each direction across the IBM NVLink 2.0/CPU bus, i.e. 75 GB/s per direction x 2 directions = 150 GB/s of bidirectional data transfer between CPU and GPU. This is unprecedented, and it is IBM's "secret sauce" in this deep learning technology. This large-scale NVLink data path implemented on the Power processor far exceeds the bandwidth of the existing PCIe-based Intel GPU-to-CPU communications (32 GB/s bidirectional), yielding close to a 5x increase in bandwidth.
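The bandwidth arithmetic above can be checked directly (an illustrative calculation using only the figures already quoted in this section):

```python
# Bandwidth arithmetic from the figures quoted above (illustrative).
per_direction = 75                 # GB/s, NVLink 2.0 CPU<->GPU, one direction
nvlink_bidir = per_direction * 2   # GB/s bidirectional
pcie_gen3_bidir = 32               # GB/s, PCIe Gen3 x16 bidirectional

print(nvlink_bidir)                              # 150
print(round(nvlink_bidir / pcie_gen3_bidir, 1))  # 4.7 (close to 5x)
```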
Fig 4: NV-Link Technology overcomes the PCIe “system bottleneck”
The innovation in NVLink allows IBM to offer a feature called Large Model Support (LMS), which helps to overcome the memory limitation on GPUs (32 GB on the latest NVIDIA V100 GPUs), as described in the previous section.
[B] PCIe Gen-4 & InfiniBand connectivity provides improved I/O connectivity for scaling training across multiple servers:
The IBM AC922 platform is the first to provide PCIe Gen4 technology, which delivers twice the bandwidth of PCIe Gen3. Together with the InfiniBand adapter (shown as the blue box at the top of Fig 4 above), this provides extremely high I/O bandwidth for connections between multiple nodes. We leverage this connectivity to provide near-linear speedup for deep learning training across nodes, as described in the section on Distributed Deep Learning (DDL).
The hardware (NVLink 2.0, InfiniBand, PCIe Gen4) and software innovations (LMS, DDL) described in the previous sections lead to significant improvements in the time-to-train metric that is crucial for machine/deep learning application development. A few of these benchmarks are listed below.
Fig 8: Benchmark on Caffe (3.7x faster) and TensorFlow (2.3x faster) leveraging LMS
Fig 9: Benchmark on TensorFlow leveraging DDL showing 95% scaling efficiency (near linear)
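To interpret the 95% figure: scaling efficiency discounts the ideal linear speedup. A small illustrative calculation (the 64-node count is an assumption chosen only for the example):

```python
# What "95% scaling efficiency" means in practice (illustrative;
# the node count below is an assumed example, not a benchmark result).

def effective_speedup(nodes, efficiency):
    """Ideal speedup equals the node count; efficiency discounts it."""
    return nodes * efficiency

# On a hypothetical 64-node cluster at 95% efficiency:
print(round(effective_speedup(64, 0.95), 1))   # 60.8x vs the ideal 64x
```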
This section outlines a solution for creating a Docker container-based environment managed by IBM Cloud Private (based on open-source Kubernetes) to provide a dynamic, optimized and centrally managed deep learning cluster for R&D activities. The figure below outlines the high-level workflow for this environment.
Fig 10: Workflow diagram
[A] Salient features for this workflow:
- PowerAI images are available in the public Docker Hub repository.
- ICP provides Helm chart capability to create a CI/CD pipeline (leveraging technologies like Jenkins) that pulls the image into the private registry provided by ICP.
- Modifications to the base PowerAI image can be made, if required, in the ICP private registry.
- The ICP master allows creating multiple container deployments (with different GPU, CPU and memory configurations) as workloads demand.
- Containers can be created and deployed from the single pane of glass provided by the ICP master, and can be dynamically provisioned based on usage thresholds.
- The master's single GUI provides monitoring capability to view all the workers, the resources being utilized and other details.
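As a sketch of what such a deployment could look like, here is a hypothetical Kubernetes manifest of the kind the ICP master would apply (the deployment name, image reference and resource sizes are all assumptions, not part of the proposed solution):

```yaml
# Hypothetical example: a Kubernetes Deployment requesting one GPU
# plus CPU/memory limits for a PowerAI research container.
# Name, image tag and resource figures are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: powerai-research-env            # assumed name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: powerai-research-env
  template:
    metadata:
      labels:
        app: powerai-research-env
    spec:
      containers:
      - name: powerai
        image: my-icp-registry/powerai:latest   # assumed private-registry image
        resources:
          limits:
            nvidia.com/gpu: 1           # one GPU per container
            memory: 64Gi
            cpu: "8"
```

The `nvidia.com/gpu` resource limit is how Kubernetes schedules a container onto a GPU-equipped worker, which matches the per-workload GPU/CPU/memory configurations described above.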
This offering is built on top of three main IBM components and one optional component:
- IBM PowerAI software package
- IBM Cloud Private (ICP) Community Edition (CE)
- IBM AC922 accelerated servers
- IBM Spectrum Scale-based storage infrastructure (optional)
which were discussed in detail in the preceding sections.
Fig 11: High Level architecture (Hardware & Software)
The ICP management node consists of a single node (physical or VM) that runs a standard Linux OS (on the Power or Intel platform). The two compute nodes are GPU-accelerated servers that act as workers under the master; they run a standard Linux OS (Power platform only). External/shared storage is not shown in this figure but can be configured if required (which may require changes to the network links shown as well).
We propose the IBM iCAT solution for training.
The IBM Certification in Advanced Technologies (iCAT) program helps academic institutions/universities keep up with rapid technology change, so that students are better prepared for IT industry needs, by providing up-to-date IT curricula and training for faculty. The program supports curriculum specializations/concentrations for undergraduate degrees in specific technologies aligned to industry requirements and skills.
The ICE iCAT program provides a framework that helps simulate the real-life work environment of an IT organization on campus.
The IBM Certification in Advanced Technologies (iCAT) offering at Mangalore University will be the hub for upskilling students across the different colleges affiliated with this university.
Further, the ICE online platform provides students access to courseware content and labs; it also tracks student progress and connects students to industry-aligned projects and mentors.
Ultimately, the IBM ICE iCAT program aims to contribute to the transformation to a knowledge society by supporting new educational models that better align student skills with current and future regional market needs and IT industry trends.
The training program plans to cover the following subjects:
- Artificial Intelligence, Machine Learning & Deep Learning (100 hrs of teaching time)
| S. No. | Course Name |
| --- | --- |
| 2 | Introduction to AI & ML |
The following are deliverables from NCS:
– Specialization Description
– Specialization Course Details
– Course Material
– Course Presentations
– Student Projects