
Data Warehouse with Kettle (Pentaho Data Integration)

Update: This course is now deprecated and replaced by Kettle Training & Dimensional Modelling

Course Introduction

Data analysis, as part of business intelligence solutions, is an increasingly common need.

Unfortunately, in most transactional systems the data is rarely organized for effective and efficient aggregation and analysis. In most cases, producing an analytical report requires exhaustive and time-consuming effort.

Thus, to successfully build a comprehensive BI solution we need to transform our transactional data into an analysis-friendly format. This process is usually called data warehousing.

Data warehousing usually involves a tool designed for this purpose, called an ETL (Extract, Transform and Load) application. Pentaho Data Integration (Kettle) is a very popular open source ETL tool.
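As a tiny illustration of the extract-transform-load pattern itself (plain Python, not Kettle; the data and field names are invented for this sketch):

```python
# Extract: rows as they might come from a transactional source
raw_sales = [
    {"date": "2010-01-05", "product": "Tea", "qty": "3", "price": "1.50"},
    {"date": "2010-01-05", "product": "Coffee", "qty": "2", "price": "2.00"},
    {"date": "2010-01-06", "product": "Tea", "qty": "1", "price": "1.50"},
]

# Transform: cast text fields to numbers and derive a revenue measure
transformed = [
    {"date": r["date"], "product": r["product"],
     "revenue": int(r["qty"]) * float(r["price"])}
    for r in raw_sales
]

# Load: aggregate into an analysis-friendly structure (totals per product)
warehouse = {}
for row in transformed:
    warehouse[row["product"]] = warehouse.get(row["product"], 0.0) + row["revenue"]

print(warehouse)  # {'Tea': 6.0, 'Coffee': 4.0}
```

A real ETL tool such as Kettle does the same three phases, but against files and databases, with graphical steps instead of hand-written code.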

This course presents an overview of data warehousing and ETL concepts. We also introduce Pentaho Data Integration as the tool for building a data warehouse solution.

Who Should Attend?

This course is designed for those new to Data Warehouse and ETL or needing to understand the basics of Pentaho Data Integration.


At the completion of this course, attendees should be able to:
  • Understand the concepts of Data Warehouse, Dimensional Modeling, OLAP and ETL
  • Use Pentaho Data Integration to build simple jobs / transformations
  • Consume data from several data sources
  • Build and populate fact and dimension tables
  • Apply troubleshooting techniques
  • Schedule ETL jobs / transformations

Course Fee

USD 2,500 / pax, including:
  • 2 snacks + 1 lunch per day
  • Course material
  • Certificate
  • 1 DVD containing all software used in the training

Course Duration

5 days / 30 hours

Course Schedule


Course Prerequisites

  • Basic understanding of one popular DBMS (Oracle, SQL Server, MySQL, etc.) and of Structured Query Language (SQL)
  • No knowledge of other Pentaho tools is required

Course Requirements

  • PC or laptop with a minimum 2 GHz CPU, 1 GB of RAM, a DVD drive and 2 GB of available hard disk space.
  • Software:
    • Microsoft Windows XP Pro
    • Java Runtime Environment (JRE)
    • Apache Tomcat
    • MySQL 5.0 Database Server
    • Pentaho Data Integration
    • Pentaho Data Analysis (Mondrian)

Course Outline

  1. Introduction to Data Warehouse
    • Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP)
    • Data Warehouse
    • Data Warehouse and OLAP
    • Delivering Solution with ETL (Extract, Transform, Load) Tool
  2. Pentaho Data Integration (Kettle)
    • Introduction to Kettle as ETL
    • Architecture
    • Components
      • Spoon : Graphical UI designer for job / transformation steps
      • Pan : Command line tool for transformation execution
      • Kitchen : Command line tool for job execution
      • Carte : Cluster server
  3. Installation and Configuration
    • Java Runtime
    • Pentaho Data Integration
    • Apache Web Server and MySQL  using XAMPP package
    • Apache Tomcat
    • Pentaho Analysis Service (Mondrian)
  4. Getting Started with Spoon
    • File system and RDBMS based Repository 
    • Spoon Development Environment 
    • Database Connections 
    • Job and Transformation 
      • Creating first job 
      • Creating first transformation 
      • Calling transformation from job 
    • Job / Transformation Steps and Hops 
    • Metadata 
    • Log
  5. Short Introduction to MySQL
    • MySQL Storage Engines
    • Administering MySQL via PHPMyAdmin
  6. Dimensional Modelling
    • Normalized versus Dimensional Model 
    • Fact Tables 
    • Dimension Tables 
    • Task : 
      • Create a Kettle transformation to map the PHI-Minimart transactional sample database to a dimensional model
      • Create logs for each step
  7. Hands On : Viewing OLAP Data using Mondrian / JPivot 
    • Mondrian Installation
    • Creating a schema based on our fact and dimension tables
    • View and navigate our cube using a web browser
  8. Data Staging
    • What is Data Staging?
    • Background : Physical I/O  versus In-Memory Processing 
    • Task : 
      • Create a transformation to join 3 data sources : a text file, an Excel spreadsheet, and an RDBMS
      • Create a functionally equivalent transformation using a staging database
  9. Advanced Controls
    • Environment Variables
    • Shared Objects
    • Error Handling
    • Email job results
    • Task :
      • Refine existing transformations to use email alerts
  10. Slowly Changing Dimension (SCD)
    • SCD Types
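As a rough illustration of what a Type 2 slowly changing dimension load does (plain Python, not Kettle; the helper function and all column names are invented for this sketch):

```python
# Minimal SCD Type 2 sketch: keep dimension history by versioning rows.
# Column names (version, valid_from, is_current, ...) are illustrative.
from datetime import date

def scd2_apply(dim_rows, incoming, business_key, tracked_cols, today=None):
    """Expire the current version and append a new one when a tracked
    attribute changes; insert brand-new business keys as version 1."""
    today = today or date.today()
    current = {r[business_key]: r for r in dim_rows if r["is_current"]}
    src = current.get(incoming[business_key])
    if src is None:
        # New business key: first version of this dimension member
        dim_rows.append({**incoming, "version": 1, "valid_from": today,
                         "valid_to": None, "is_current": True})
    elif any(src[c] != incoming[c] for c in tracked_cols):
        # Tracked attribute changed: close old row, add a new version
        src["valid_to"] = today
        src["is_current"] = False
        dim_rows.append({**incoming, "version": src["version"] + 1,
                         "valid_from": today, "valid_to": None,
                         "is_current": True})
    return dim_rows

rows = []
scd2_apply(rows, {"customer_id": 1, "city": "Jakarta"}, "customer_id", ["city"])
scd2_apply(rows, {"customer_id": 1, "city": "Tangerang"}, "customer_id", ["city"])
# rows now holds two versions of customer 1; only the latest is current
```

Kettle provides this behavior out of the box in its dimension lookup / update step; the sketch only shows the underlying idea.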
  11. Automation
    • Using Cron / Windows Task Scheduler to schedule our ETL jobs
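On Linux, scheduling an ETL job with cron typically means a crontab entry that calls Kitchen, Kettle's command-line job runner; the installation path, job file and log location below are examples only, not part of the course material:

```
# Illustrative crontab entry: run the nightly ETL job at 02:00 every day
0 2 * * * /opt/pentaho/data-integration/kitchen.sh -file=/etl/nightly_load.kjb -level=Basic >> /var/log/etl.log 2>&1
```

On Windows, the equivalent is a Task Scheduler task that runs Kitchen.bat with the same arguments.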


Contact Us

PT. Putera Handal Indotama (PHI-Integration)
Ruko Paramount Gading Serpong
Blok CC - 12
Gading Serpong
Tangerang - Banten
Phone : 021-70387738