Big Data

Courses

Our Big Data Crash Courses introduce students to a variety of data analysis techniques. Students are given the freedom to complete the modules at their own pace. Courses include introductory statistics, data manipulation using R and Python, data visualization using R and Python, and basic machine learning using Python.

The following is proprietary material of STEM Fellowship and its creators and is under the protection of the STEM Fellowship Licensing Agreement. It cannot be modified, reproduced or distributed without written consent from STEM Fellowship.

Please download and install Jupyter, which is available for Linux, Mac, and Windows. You'll be using it to run the workshops. After following the installation instructions, start by opening the R – Jupyter Essentials in 5 Minutes.ipynb notebook for R and the Python – Jupyter Essentials in 5 Minutes.ipynb notebook for Python.

INTRODUCTION TO STATISTICS

The purpose of this course is to introduce students to the basics of data, which they can then apply to data analysis. The following topics are covered in the course:

  • variables and cases
  • samples vs populations
  • describing data
  • mean, median and mode
  • variance and standard deviation
  • confidence intervals
  • hypothesis testing
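
As a rough illustration of several of the topics listed above (mean, median, mode, variance, standard deviation and confidence intervals), the following minimal sketch uses Python's standard `statistics` module. The `scores` sample is hypothetical, and the confidence interval uses the normal approximation as a simplifying assumption; none of this is taken from the course materials themselves.

```python
import math
import statistics

# Hypothetical sample: eight exam scores (illustrative values only).
scores = [72, 85, 85, 90, 64, 78, 85, 91]

mean = statistics.mean(scores)          # arithmetic average
median = statistics.median(scores)      # middle value of the sorted sample
mode = statistics.mode(scores)          # most frequent value
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

# Approximate 95% confidence interval for the mean, assuming the normal
# critical value 1.96. For a sample this small, a t critical value
# would usually be the more appropriate choice.
margin = 1.96 * std_dev / math.sqrt(len(scores))
ci = (mean - margin, mean + margin)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std_dev={std_dev:.2f}")
print(f"approx. 95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The sample versions of variance and standard deviation are used here because the data are treated as a sample rather than a whole population, which ties in with the "samples vs populations" topic.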

DATA MANIPULATION USING R

The purpose of this course is to teach you how to make your data more organised and easier to read. Topics covered in this course include:

  • introductory data science vocabulary
  • basic R
  • data structures
  • the data analysis process

DATA MANIPULATION USING PYTHON

The purpose of this course is to teach you how to make your data more organised and easier to read. Topics covered in this course include:

  • introductory data science vocabulary
  • basic Python
  • data structures
  • the data analysis process

DATA VISUALISATION USING R

This course focuses on the visualisation forms best suited to representing different data types. Topics covered in this course include:

  • data visualisation types (correlation heat maps, scatter plots, bar charts, pie charts)
  • the grammar of graphics
  • layering
  • local vs. global features
  • geographical mapping
  • data cleaning

DATA VISUALISATION USING PYTHON

This course will teach you about the visualisation forms available to represent different data types. Topics covered in this course include:

  • preparing data for visualisation
  • data visualisation types
  • scatter plots
  • correlation matrices
  • geographic mapping
  • packages available for data manipulation and analysis

BASIC MACHINE LEARNING USING PYTHON

This course will introduce you to k-means clustering.
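
For readers unfamiliar with k-means clustering, the idea can be sketched in plain Python as below. This is a minimal illustration of the standard Lloyd's algorithm, not the course's own code: the function name `k_means`, the choice of the first k points as starting centroids, and the sample `points` are all assumptions made for the example.

```python
import math

def k_means(points, k, iterations=100):
    """Cluster points into k groups with Lloyd's algorithm (illustrative sketch)."""
    # Assumption: start from the first k points; real implementations
    # usually pick starting centroids more carefully.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
                )
            else:
                # Keep the old centroid if no points were assigned to it.
                new_centroids.append(centroids[i])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two visually separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

The algorithm alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its assigned points, stopping when the centroids no longer change. In practice a library implementation would normally be used instead of hand-written code.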