Training‎ > ‎

Data Warehousing with Kettle (Pentaho Data Integration)


Course Introduction

Data analysis as part of business intelligence solutions is now a growingly demanded need. 

Unfortunately in most transactional systems implemented rarely data is organized for effective and efficient aggregation and analysis.  In most cases, producing an analytical report require an exhaustive and time consuming efforts.

Thus, to successfully build a comprehensive BI solution we need to transform our transactional into analytical friendly data format. This process is usually called data warehousing.

In data warehouse, we usually involved a tool designed for this need and is called ETL (Extract, Transform and Load) application. Pentaho Data Integration (Kettle) is one of the ETL tool that is open sourced and a very popular tool. 

This course presents an overview of data warehousing and ETL concept. We also introduce Pentaho Data Integration to build data warehouse solution.



Who Should Attend ?

This course is designed for those new to Data Warehouse and ETL or needing to understand the basics of Pentaho Data Integration.

Objectives

At the completion of this course, attendee should be able to :
  • Understand the concepts and topics of Data Warehouse, Dimensional Modeling, OLAP and ETL
  • Use Pentaho Data Integration to build simple jobs / transformations
  • Consume data from several data sources 
  • Building and populating fact and dimension tables
  • Troubleshooting techniques
  • Schedule ETL jobs / transformations

Course Fee

Rp 6.500.000,- / pax including :
  • 2 snack times + 1 lunch time / day
  • Course Material
  • Certificate
  • 1 DVD including all softwares used in the training

Course Duration

5 days / 30 hours

Course Schedule

29 November 2010 - 3 December 2010 (9.00 am - 16.00 pm)

Course Prerequisites

  • Basic understanding one of several popular DBMS (Oracle, SQL Server, MySQL, etc) and of Structured Query Language (SQL) 
  • No understanding of other Pentaho tools is needed

Course Requirements

  • PC or Laptop with minium of 2GHz CPU, 1 GB of RAM, DVD Drive and 2 GB of available hard disk space.
  • Softwares :
    • Microsoft Windows XP Pro
    • Java Runtime Environment (JRE)
    • Apache Tomcat
    • MySQL 5.0 Database Server
    • Pentaho Data Integration
    • Pentaho Data Analysis (Mondrian)

Course Outline

  1. Introduction to Data Warehouse
    • Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP)
    • Data Warehouse
    • Data Warehouse and OLAP
    • Delivering Solution with ETL (Extract, Transform, Load) Tool
  2. Pentaho Data  Integration (Kettle)
    • Introduction to Kettle as ETL
    • Architecture
    • Components
      • Spoon : Graphical UI Designer for job / transformation steps
      • Pan : Command line batch script for transformation execution
      • Kitchen : Command line batch script for transformation execution
      • Carte : Cluster server
  3. Installation and Configuration
    • Java Runtime
    • Pentaho Data Integration
    • Apache Web Server and MySQL  using XAMPP package
    • Apache Tomcat
    • Pentaho Analysis Service (Mondrian)
  4. Getting Started with Spoon
    • File system and RDBMS based Repository 
    • Spoon Development Environment 
    • Database Connections 
    • Job and Transformation 
      • Creating first job 
      • Creating first transformation 
      • Calling transformation from job 
    • Job / Transformation Steps and Hops 
    • Metadata 
    • Log
  5. Short Introduction to MySQL
    • MySQL Storage Engines
    • Administering MySQL via PHPMyAdmin
  6. Dimensional Modelling
    • Normalized versus Dimensional Model 
    • Fact Tables 
    • Dimension Tables 
    • Task : 
      • Create a Kettle  transformation  to map PHI-Minimart  transactional database sample to dimensional modeling database 
      • Create logs for each steps
  7. Hands On : Viewing OLAP Data using Mondrian / JPivot
    • Mondrian Installation
    • Creating scheme based on our fact and dimension tables
    • View and navigate our Cube using Web Browser
  8. Data Staging
    • What is Data Staging?
    • Background : Physical I/O  versus In-Memory Processing
    • Task :
      • Create  a  transformation  to  join  from  3  data  sources  :  text  file,  Excel
        spreadsheet, and RDBMS
      • Create a same functional transformation using staging database 
  9. Advance Controls
    • Environment Variables
    • Shared Objects
    • Error Handling
    • Email job results
    • Task :
      • Refining existing transformations to use email alert
  10. Slowly Changing Dimension (SCD)
    • SCD Types
  11. Automation
    • Using Cron / Windows Task Sheduler to schedule our ETL job
You can download our course outline in PDF attachments at the end of this page

Related Course

Ċ
Feris Thia,
May 14, 2009, 2:53 AM
Comments