Difference between revisions of "DataScience"

From SourceWiki
Line 24:
  * Programming Skills:
  ** "Clean code shows clarity of mind,"
+ ** Languages: R? Python? Others?
  ** Version control.
  ** Build systems.
  ** Testing.
  ** Scripting and automation.

Revision as of 13:49, 5 January 2015

What would a course on Data Science look like?

Introduction

Drew Conway's Venn diagram of data science

Topics would include:

  • What is relevant for the UoB?
  • y = f(x) relationships: classifiers & regression
    • Examples: linear & logistic regression, k-nearest neighbours, decision trees, neural networks, etc.
  • Data topics:
    • Training, test & validation data.
    • Sources of data, e.g. web scraping.
    • Exploratory Data Analysis (EDA).
    • Cleaning & munging data (90% of your effort?). Useful Linux tools.
    • Feature selection.
  • Model selection & training topics:
    • Algorithms that scale.
    • Supervised vs. unsupervised learning.
    • Overfitting.
    • The curse of dimensionality.
  • Programming Skills:
    • "Clean code shows clarity of mind."
    • Languages: R? Python? Others?
    • Version control.
    • Build systems.
    • Testing.
    • Scripting and automation.
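
Several of the topics above (a y = f(x) classifier, training vs. test data, overfitting) could be introduced with one small exercise. The following is a minimal sketch only, not a proposed course solution: it uses Python's standard library, synthetic data, and a hand-written k-nearest-neighbours classifier. All function names (`make_data`, `train_test_split`, `knn_predict`, `accuracy`), the blob centres, and the 70/30 split are illustrative assumptions.

```python
import math
import random
from collections import Counter


def make_data(n, rng):
    """Generate n labelled points from two overlapping Gaussian blobs in 2-D."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 0.0 if label == 0 else 1.5  # assumed blob centres
        x = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((x, label))
    return data


def train_test_split(data, test_fraction, rng):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, points, k):
    """Fraction of points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in points)
    return correct / len(points)


rng = random.Random(0)  # fixed seed so the run is repeatable
train, test = train_test_split(make_data(300, rng), 0.3, rng)

# Compare accuracy on the data the model has seen with accuracy on held-out data.
for k in (1, 15):
    print(k, accuracy(train, train, k), accuracy(train, test, k))
```

With k = 1 every training point is its own nearest neighbour, so the training accuracy is perfect by construction; the held-out accuracy is the more honest estimate, and the gap between the two is one simple way to make overfitting visible.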