Difference between revisions of "DataScience"


Revision as of 13:28, 5 January 2015

What would a course on Data Science look like?

Introduction

[Figure: Drew Conway's Venn diagram of data science]

Topics would include:

  • What is relevant for the UoB?
  • y = f(x) relationships: classifiers and regression
    • Examples: linear and logistic regression, K-Nearest Neighbours, decision trees, neural networks, etc.
  • Data topics:
    • Training, test and validation data.
    • Sources of data, e.g. web scraping.
    • Exploratory Data Analysis (EDA).
    • Cleaning & munging data (90% of your effort?). Useful Linux tools.
    • Feature selection.
  • Model selection & training topics:
    • Algorithms that scale.
    • Supervised vs. unsupervised training.
    • Overfitting.
    • The curse of dimensionality.
  • Programming Skills:
    • "Clean code shows clarity of mind."
    • Version control.
    • Build systems.
    • Testing.
    • Scripting and automation.
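
Several of the topics above (training, test and validation data; K-Nearest Neighbours; overfitting) can be illustrated together in a few lines. The following is only a minimal sketch, not course material: the data is synthetic, and every name in it (make_data, split_data, knn_predict, accuracy) is hypothetical and chosen for illustration. It uses only the Python standard library.

```python
import random


def make_data(n=200, noise=0.15, seed=1):
    """Synthetic 1-D data (an assumption for illustration):
    label is 1 when x > 0.5, with a fraction of labels flipped as noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 1)
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label  # flip the label to simulate noisy data
        rows.append((x, label))
    return rows


def split_data(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the rows and split them into training, validation and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > len(neighbours) else 0


def accuracy(train, data, k):
    """Fraction of points in data that the k-NN model built on train gets right."""
    correct = sum(knn_predict(train, x, k) == y for x, y in data)
    return correct / len(data)


rows = make_data()
train, val, test = split_data(rows)

# Model selection: pick k using the validation set, never the test set.
best_k = max([1, 5, 15], key=lambda k: accuracy(train, val, k))

print("chosen k:", best_k)
# With k=1 each training point is its own nearest neighbour, so the
# training score is perfect; this is the classic symptom of overfitting.
print("training accuracy (k=1):", accuracy(train, train, 1))
print("validation accuracy (k=1):", accuracy(train, val, 1))
print("test accuracy (chosen k):", accuracy(train, test, best_k))
```

The design point is the separation of roles: the training set fits the model, the validation set chooses k, and the test set is touched once at the end to estimate performance on unseen data.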