Course

Analyzing Big Data with SQL

Cloudera

Analyzing Big Data with SQL is a comprehensive course offered by Cloudera that delves into the intricacies of SQL SELECT statement and its application in big data analysis. The course aims to equip learners with the essential skills to navigate, analyze, and derive insights from vast datasets using SQL. With a focus on big data SQL engines such as Apache Hive and Apache Impala, the course also addresses SQL application in traditional RDBMs like MySQL and PostgreSQL, highlighting any notable differences.

The course is structured into six modules, each covering specific aspects of SQL application in big data analysis. Starting with an orientation to SQL on Big Data, learners are introduced to the fundamentals of SQL SELECT statements, filtering data, grouping and aggregating data, sorting and limiting data, and combining data. Each module offers hands-on exercises and quizzes to reinforce learning and assess understanding, ensuring learners grasp the practical application of SQL in analyzing big data.

To enroll in this course, learners need access to a computer that meets specific hardware and software requirements. Upon completion, learners will have a comprehensive understanding of SQL SELECT statement and its ability to handle big data, enabling them to explore, filter, group, aggregate, sort, and combine data effectively for analytical purposes. This course is ideal for data professionals, analysts, and anyone seeking to enhance their data manipulation and analysis skills.

Certificate Available ✔

Get Started / More Info
Analyzing Big Data with SQL
Course Modules

The course consists of six modules covering SQL on Big Data orientation, SQL SELECT essentials, filtering data, grouping and aggregating data, sorting and limiting data, and combining data.

Orientation to SQL on Big Data

Welcome to the Course, Review and Preparation, Using the Hue Query Editors, Running SQL Utility Statements, Running SQL SELECT Statements, Understanding Different SQL Interfaces, Overview of Beeline and Impala Shell, Using Beeline, Using Impala Shell, Instructions for Downloading and Installing the Exercise Environment, Troubleshooting the VM, (Optional) What about Spark SQL?, Expectations for Learners, (Optional) Using Other SQL Engines, Week 1 Core Quiz, Week 1 Honors Quiz, Your Expectations

SQL SELECT Essentials

Introduction, SQL SELECT Building Blocks, Introduction to the SELECT List, Expressions and Operators, Data Types, Column Aliases, Built-In Functions, Data Type Conversion, The DISTINCT Keyword, Introduction to the FROM Clause, Identifiers, Formatting SELECT Statements, Using Beeline in Non-Interactive Mode, Using Impala Shell in Non-Interactive Mode, Formatting the Output of Beeline and Impala Shell, Saving Hive and Impala Query Results to a File, Order of Operations, Division and Modulo Operators, Common String Functions, Case (In)Sensitivity in SQL, Week 2 Core Quiz, Week 2 Honors Quiz, Data Analysis and You

Filtering Data

Introduction, About the Datasets, Introduction to the WHERE Clause, Using Expressions in the WHERE Clause, Comparison Operators, Data Types and Precision, Logical Operators, Other Relational Operators, Understanding Missing Values, Handling Missing Values, Conditional Functions, Using Variables with Beeline and Impala Shell, Calling Beeline and Impala Shell from Scripts, Querying Hive and Impala in Scripts and Applications, Data Reference, (Optional) Unicode Characters, Working with Literal Strings, Missing Values with Logical Operators, Missing Values in String Columns, (Optional Exercise) Change VM Desktop Color, Week 3 Core Quiz, Week 3 Honors Quiz, Filtering Examples

Grouping and Aggregating Data

Introduction, Introduction to Aggregation, Common Aggregate Functions, Using Aggregate Functions in the SELECT Statement, Introduction to the GROUP BY Clause, Choosing an Aggregate Function and Grouping Column, Grouping Expressions, Grouping and Aggregation, Together and Separately, NULL Values in Grouping and Aggregation, The COUNT Function, Tips for Applying Grouping and Aggregation, Filtering on Aggregates, The HAVING Clause, Understanding Hive and Impala Version Differences, Understanding Hue Version Differences, COUNT(*) and SUM(1), Interpreting Aggregates: Populations and Samples, The least and greatest Functions, Why Aggregate Expressions Ignore NULL Values, (Optional) Shortcuts for Grouping, How Grouping and Aggregation Can Mislead, Week 4 Core Quiz, Week 4 Honors Quiz, Grouping and Aggregation Examples, The Analytic Journey

Sorting and Limiting Data

Introduction, Introduction to the ORDER BY Clause, Controlling Sort Order, Ordering Expressions, Missing Values in Ordered Results, Using ORDER BY with Hive and Impala, Introduction to the LIMIT Clause, When to Use the LIMIT Clause, Using LIMIT with ORDER BY, Using LIMIT for Pagination, Review, How to Effectively Use the Hive and Impala Documentation, Tips for Using the Hive Documentation, Tips for Using the Impala Documentation, Ordering by String Columns, Week 5 Core Quiz, Week 5 Honors Quiz, Questions?

Combining Data

Introduction, Combining Query Results with the UNION Operator, Using ORDER BY and LIMIT with UNION, Introduction to Joins, Join Syntax, Inner Joins, Outer Joins, Conclusion, Handling NULL Values in Join Key Columns, Non-Equijoins, Cross Joins, Left Semi-Joins, Missing or Truncated Values from Type Conversion, Using UNION to Combine Three or More Results, Alternative Join Syntax, Joining Three or More Tables, Specifying Two or More Join Conditions, Week 6 Core Quiz, Week 6 Honors Quiz, Analyzing Big Data with SQL, Get Excited!

More Data Analysis Courses

Database systems

Universidad Nacional Autónoma de México

This specialized program in Database Systems equips learners with the skills to design and implement reliable, scalable, and maintainable data-intensive systems,...

Designing data-intensive applications

Universidad Nacional Autónoma de México

Designing data-intensive applications equips learners with the skills to propose, design, and develop high reliable information systems. This specialization focuses...

Measuring Total Data Quality

University of Michigan

Measuring Total Data Quality provides learners with the essential skills to evaluate data quality at each stage of the Total Data Quality framework.

Using Advanced Formulas and Functions in Excel

Coursera Project Network

Using Advanced Formulas and Functions in Excel provides hands-on training in applying advanced Excel functions to real-world examples.