Difference between revisions of "DataScience"

From SourceWiki

Revision as of 13:29, 5 January 2015

What would a course on Data Science look like?

Introduction

Drew Conway's Venn diagram of data science

Topics would include

  • What is relevant for the UoB?
  • y = f(x) relationships: classifiers & regression
    • Examples: linear & logistic regression, k-nearest neighbours, decision trees, neural networks, etc.
  • Data topics:
    • Training, validation & test data.
    • Sources of data, e.g. web scraping.
    • Exploratory Data Analysis (EDA).
    • Cleaning & munging data (90% of your effort?), including useful Linux tools.
    • Feature selection.
  • Model selection & training topics:
    • Algorithms that scale.
    • Supervised vs. unsupervised training.
    • Overfitting.
    • The curse of dimensionality.
  • Programming skills:
    • "Clean code shows clarity of mind."
    • Version control.
    • Build systems.
    • Testing.
    • Scripting and automation.
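A course might make the "y = f(x)" and "training, validation & test data" topics concrete with a classifier simple enough to write from scratch. The sketch below is a minimal k-nearest-neighbours classifier in plain Python (3.8+ for `math.dist`), scored on a held-out test split. The helper names (`train_test_split`, `knn_predict`, `accuracy`), the 25% test fraction and the toy two-cluster data are illustrative assumptions, not taken from this page.

```python
import math
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle (x, y) pairs and hold out a fraction of them as unseen test data."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def knn_predict(train, x, k=3):
    """Predict a label for feature vector x by majority vote of its k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    # Hypothetical toy data: two clusters of 2-D points labelled "a" and "b".
    rng = random.Random(42)
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(50)]
    data += [((rng.gauss(5, 1), rng.gauss(5, 1)), "b") for _ in range(50)]
    train, test = train_test_split(data, test_fraction=0.25, seed=1)
    print(f"test accuracy: {accuracy(train, test, k=3):.2f}")
```

Scoring on points the model never saw during training is what exposes overfitting. With k = 1, accuracy on the training set itself is trivially perfect because every point is its own nearest neighbour. In practice, a validation split would be carved out of the training data to choose k, and the test split would be kept for the final estimate.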
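The regression half of the "y = f(x)" topic could start from the simplest case: fitting a straight line to a single input variable by ordinary least squares, where slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The following is a minimal plain-Python sketch assuming one feature. The function names and sample data are hypothetical, and a real course would more likely use an established numerical library.

```python
def fit_line(xs, ys):
    """Fit y ≈ intercept + slope * x by ordinary least squares (single feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(xs, ys, intercept, slope):
    """Average squared residual of the fitted line over the given points."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)


if __name__ == "__main__":
    # Hypothetical noisy data roughly following y = 1 + 2x.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.1, 2.9, 5.2, 6.8, 9.1]
    b0, b1 = fit_line(xs, ys)
    print(f"intercept={b0:.3f}, slope={b1:.3f}, mse={mean_squared_error(xs, ys, b0, b1):.4f}")
```

As with the classifier, the error that matters is the one measured on held-out data. A low mean squared error on the fitting points alone says little about how well the line generalises.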