50.038 Computational Data Science

No. of Credits: 12 Subject Credits

Pre-requisites:


Course Description

This course provides students with the necessary background and experience in data science technology and concepts.

Students will gain experience tackling a complete data science project, from data gathering and preprocessing to data analysis with machine learning tools.

Students will learn to apply fundamental concepts in machine learning, data storage and distributed processing as a foundation for their project.

Learning Objectives

  1. Be aware of the main goals of data science, its main application domains and current challenges.
  2. Apply tools to build basic models for solving typical data analytics problems.
  3. Visualise the structure of big data in order to uncover hidden patterns.
  4. Design and implement distributed database systems for managing heterogeneous data.
  5. Perform basic operations on a moderately complex distributed computation system, such as Spark.
  6. Explain the fundamentals of statistical machine learning and deep learning.
  7. Appreciate the technical skills necessary to be a capable data scientist.

Measurable Outcomes

  1. Identify important concepts and current challenges in data science.
  2. Design feature representations for image, text and time series data.
  3. Analyse data and build simple models in tools such as Weka, Python and Tableau.
  4. Implement a distributed computation model using Spark.
  5. Evaluate the performance of different models using empirical benchmarks.
  6. Mathematically explain common machine learning models such as SVMs, logistic regression and neural networks.
  7. Implement machine learning algorithms using software such as R, C++ and PyTorch.
  8. Manage big data using Hadoop and MapReduce.

Revision Checklist

This revision checklist was graciously shared by Tey Siew Wen and was last updated on 05 February 2020.

Follow her on GitHub and give her your messages of appreciation!
