→Guidelines on Talend Data Integration v6 Certified Developer Exam preparation
→Big data - Hadoop with Talend DI
- What is Hadoop?
- Why Hadoop? Can't we use the existing technology stack to store and process large volumes of data?
- What is commodity hardware? How is it different from server-class hardware?
- Which Hadoop distributions does Talend support? How to handle distributions that Talend does not support?
- Market-leading Hadoop distributions; comparison of the most widely used distros.
- What is a block, and why is the HDFS block size so large?
- NameNode - DataNode / master - slave architecture.
- Rack awareness
- HDFS data write operation
- HDFS data read operation
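The block and replication topics above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a 128 MB block size and a replication factor of 3 (common defaults in Hadoop 2.x; both are configurable per cluster and per file). The function names are hypothetical and exist only for this illustration. A large block size keeps the number of blocks per file small, which limits the metadata the NameNode must hold in memory.

```python
MB = 1024 * 1024


def split_into_blocks(file_size_bytes, block_size_bytes=128 * MB):
    """Return the sizes of the HDFS blocks a file of this size would occupy.

    Assumption: 128 MB block size. The last block only takes up the
    bytes that remain, not a full block.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    full_blocks, remainder = divmod(file_size_bytes, block_size_bytes)
    sizes = [block_size_bytes] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def raw_storage_bytes(file_size_bytes, replication=3):
    """Total bytes stored across DataNodes, assuming every block is replicated."""
    return file_size_bytes * replication


blocks = split_into_blocks(300 * MB)
print(len(blocks), [size // MB for size in blocks])  # 3 [128, 128, 44]
print(raw_storage_bytes(300 * MB) // MB)             # 900
```

A 300 MB file therefore needs three block entries on the NameNode, while the same data stored as thousands of tiny files would need thousands.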
→Hadoop daemons and architecture
- Master daemons in MR1
- MapReduce - how the distributed data processing framework works on Hadoop 1.x
- Limitations of Hadoop 1.x
- Master daemons in MR2
- YARN architecture
- Step-by-step walkthrough of YARN application execution
→Hadoop client gateway connectivity
- What is an edge node / gateway node?
- Ambari views, HUE
- Talend - Hadoop cluster connection
- HDFS commands
- Lab Practical
→MapReduce job conventional approach
- Walkthrough of MapReduce classes and a sample job
- Sample MapReduce job execution
- Lab Practical and assignments
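Before working with the Java Mapper, Reducer, and driver classes, it can help to see the three phases of a MapReduce job in isolation. The sketch below simulates a word count in plain Python on a single machine. It is not Hadoop code, and the function names are hypothetical; it only mirrors the map, shuffle/sort, and reduce steps that the framework performs across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data with Hadoop", "Hadoop stores big data"])
print(counts)
# {'big': 2, 'data': 2, 'hadoop': 2, 'stores': 1, 'with': 1}
```

In a real job the mapper and reducer run as separate tasks on different nodes, and the shuffle moves data over the network between them.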
→MapReduce job with Talend Big Data edition
- How does Talend Studio execute a Hadoop job, and how do competing ETL tools execute Hadoop jobs?
- MapReduce job design in the Talend Big Data sandbox Studio
- MapReduce job demonstration in Talend Big Data Studio pointing to an external Hortonworks cluster
- Lab Assignment
→Sqoop
- What is Sqoop?
- Sqoop Import/Export architecture
- Sqoop connectors
- Sqoop sample scripts
- Direct-mode imports and the advantages of using direct mode
- Escape characters
RDBMS to HDFS
- Full table Import
- Import all tables; import only a subset of data
- Encoding null values
- Incremental import
- Why does a Sqoop import require either a primary key column or a split-by column?
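The primary key / split-by question above comes down to how a parallel import divides work: Sqoop looks up the minimum and maximum of the split column and hands each mapper one slice of that range. The following is a simplified sketch of that idea for an integer column. The column name `id` and the function name are hypothetical, and the exact boundary handling in Sqoop itself differs in detail.

```python
def split_ranges(min_value, max_value, num_mappers=4):
    """Divide [min_value, max_value] into contiguous inclusive ranges,
    one per mapper. Illustration only, not Sqoop's exact algorithm."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_value < min_value:
        raise ValueError("max_value must not be smaller than min_value")
    total = max_value - min_value + 1
    mappers = min(num_mappers, total)  # never create empty ranges
    base, extra = divmod(total, mappers)
    ranges = []
    start = min_value
    for index in range(mappers):
        size = base + (1 if index < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


for low, high in split_ranges(1, 1000, num_mappers=4):
    print(f"WHERE id >= {low} AND id <= {high}")
# WHERE id >= 1 AND id <= 250
# WHERE id >= 251 AND id <= 500
# WHERE id >= 501 AND id <= 750
# WHERE id >= 751 AND id <= 1000
```

Without a column to split on, there is no way to compute such slices, so the import has to fall back to a single mapper. A skewed split column also matters: evenly sized ranges do not guarantee evenly sized row counts.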
RDBMS to Hive
- Hive Import
- Hive import with partitions
RDBMS to HBase
- HBase table, HBase create table, HBase row key
- How to improve the performance of an HBase import job
HDFS to RDBMS
- Insert data
- Insert data in batches
- Update an existing dataset in a database table
- Update else Insert
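The difference between "update" and "update else insert" on export can be modelled with a dictionary standing in for the database table. This is a sketch of the semantics only, assuming the two Sqoop update modes `updateonly` and `allowinsert`; the function name, the table layout, and the counters are hypothetical and no database is involved.

```python
def export_rows(table, rows, update_key, update_mode="updateonly"):
    """Apply exported rows to `table` (a dict keyed by the update key).

    updateonly  : rows with no matching key change nothing.
    allowinsert : rows with no matching key are inserted.
    Returns a tuple (updated, inserted, skipped).
    """
    if update_mode not in ("updateonly", "allowinsert"):
        raise ValueError("unknown update mode: " + update_mode)
    updated = inserted = skipped = 0
    for row in rows:
        key = row[update_key]
        if key in table:
            table[key] = dict(row)
            updated += 1
        elif update_mode == "allowinsert":
            table[key] = dict(row)
            inserted += 1
        else:
            skipped += 1
    return updated, inserted, skipped


rows = [{"id": 1, "name": "new"}, {"id": 2, "name": "added"}]

table = {1: {"id": 1, "name": "old"}}
print(export_rows(table, rows, "id"))                 # (1, 0, 1)

table = {1: {"id": 1, "name": "old"}}
print(export_rows(table, rows, "id", "allowinsert"))  # (1, 1, 0)
```

In update-only mode the second row is silently dropped, which is easy to miss in a lab run unless row counts are compared afterwards.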
Necessity of --mapreduce-job-name
Lab practical on Sqoop import to HDFS, Hive, and HBase, and export to a database table
Sqoop Components in Talend studio
- tSqoopImport, tSqoopImportAllTables
- tSqoopExport, tSqoopMerge
- Sample Talend job execution with Sqoop components
- Lab practical with the above list of Sqoop components
→Pig
- What is Pig, and what is its role in the Hadoop framework?
- Demo of a sample Pig script
- Grunt shell: Local mode and cluster mode
Scalar types, Complex types
Operators: Input & Output, Relational
User defined functions
Pig script execution steps
Debugging a Pig relation/script: DESCRIBE, EXPLAIN, ILLUSTRATE
Pig Components in Talend studio
Sample Talend job execution with Pig components
Lab practical with above list of Pig components and more on tPigMap
→Hive
- What is Hive? Why do we need Hive?
- Hive services and Hive clients.
- Hive Architecture and Role of Metastore
- Schema on Write vs Schema on Read
- How is Hive different from a regular RDBMS?
- Types of HQL execution
- Data types: primitive, complex
- Types of tables in Hive
- Multi-table inserts
- Use of partitions and types of partitions in Hive
- What is bucketing? When to use bucketing and when to use partitioning?
- UDF types in Hive
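The partitioning versus bucketing topic above is easier to reason about when the two placements are computed side by side. In this sketch a partition becomes a directory named `column=value`, and a bucket is chosen by hashing a column into a fixed number of files. The table directory `/warehouse/sales`, the column names, and the use of the integer key itself as the hash are assumptions for illustration; the hash Hive actually applies depends on the column type.

```python
def partition_path(table_dir, partition_column, value):
    """Partitioning: one directory per distinct value of the partition column."""
    return f"{table_dir}/{partition_column}={value}"


def bucket_index(key, num_buckets):
    """Bucketing: a fixed number of buckets, chosen by hash modulo.

    Assumption: the integer key serves as its own hash value.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    return (key & 0x7FFFFFFF) % num_buckets


rows = [("US", 101), ("US", 102), ("IN", 103), ("IN", 107)]
for country, user_id in rows:
    path = partition_path("/warehouse/sales", "country", country)
    print(path, "bucket", bucket_index(user_id, 4))
# /warehouse/sales/country=US bucket 1
# /warehouse/sales/country=US bucket 2
# /warehouse/sales/country=IN bucket 3
# /warehouse/sales/country=IN bucket 3
```

Partitioning suits low-cardinality columns that appear in filters, since each value adds a directory. Bucketing suits high-cardinality columns such as IDs, since the number of buckets stays fixed no matter how many distinct values exist.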
Lab practical on: create table, view, index, load/insert, multi-table insert, dynamic partition, CTAS, alter table, select and joins, create a UDF
Hive Components in Talend studio
- tHiveConnection, tHiveCreateTable, tHiveLoad
- tHiveRow, tHiveInput, tHiveClose
Sample Talend job execution with Hive components
Lab practical with above list of hive components
→HCatalog components in Talend studio
- tHCatalogOperation, tHCatalogLoad or tHCatalogOutput, tHCatalogInput
Sample Talend job execution with Hcatalog components
Lab practical with above list of Hcatalog components
→HBase
- What is HBase?
- RDBMS features that HBase lacks.
- HBase internals and architecture.
- Architectural differences between RDBMS and HBase.
- HBase data storage Architecture (LSM-tree)
- Data representation in HBase
- Compare HBase with Hadoop file system
- Pros and cons of column-oriented databases
- HBase components and functionalities: ZooKeeper, HMaster, RegionServer, client, catalog tables
- What if an HBase master node goes down?
- When should one think of using HBase?
- When not to use HBase?
- What is Phoenix?
- HBase table DDL: create, disable, drop, alter
- Data types
- Reading, writing, and modifying data in an HBase table using commands.
- Reading, writing, and modifying data using the HBaseConfiguration and HTable classes.
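The LSM-tree storage topic above can be illustrated with a toy store: writes go to an in-memory buffer (the MemStore in HBase terms), full buffers are flushed as immutable sorted files (HFiles), and reads check the newest data first. The class below is a minimal sketch under those assumptions. Its name and the tiny flush threshold are hypothetical, and it leaves out the write-ahead log, deletes, timestamps/versions, and regions.

```python
class TinyLSMStore:
    """Toy LSM-style key-value store, for illustration only."""

    def __init__(self, memstore_limit=2):
        self.memstore_limit = memstore_limit
        self.memstore = {}       # mutable in-memory write buffer
        self.store_files = []    # immutable sorted files, oldest first

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        """Write the buffer out as one sorted, immutable file."""
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Newest data wins: check the buffer, then files newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            for stored_key, value in store_file:
                if stored_key == key:
                    return value
        return None

    def compact(self):
        """Merge all files into one, keeping only the latest value per key."""
        merged = {}
        for store_file in self.store_files:  # oldest first, newer overwrite
            merged.update(store_file)
        self.store_files = [sorted(merged.items())] if merged else []


store = TinyLSMStore(memstore_limit=2)
store.put("row1", "a")
store.put("row2", "b")      # buffer is full, so it is flushed to a file
store.put("row1", "c")      # newer value stays in memory for now
print(store.get("row1"), store.get("row2"))  # c b
store.flush()
store.compact()
print(len(store.store_files), store.get("row1"))  # 1 c
```

Writes never modify an existing file, which is why they are fast; the cost is paid later, when reads have to consult several files until compaction merges them.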
HBase Components in Talend studio:
- Sample Talend job execution with HBase components
- Lab practical with the above list of HBase components