Difference between revisions of "DataScience"
[Figure: Drew Conway's Venn diagram of data science]
Revision as of 13:28, 5 January 2015
What would a course on Data Science look like?
Introduction
Topics would include:
- What is relevant for the UoB?
- y = f(x) relationships: classifiers and regression.
  - Examples: linear and logistic regression, K-Nearest Neighbours, decision trees, neural networks, etc.
- Data topics:
  - Training, validation and test data.
  - Sources of data, e.g. web scraping.
  - Exploratory Data Analysis (EDA).
  - Cleaning and munging data (90% of your effort?). Useful Linux tools.
  - Feature selection.
- Model selection and training topics:
  - Algorithms that scale.
  - Supervised vs. unsupervised training.
  - Overfitting.
  - The curse of dimensionality.
- Programming skills:
  - "Clean code shows clarity of mind."
  - Version control.
  - Build systems.
  - Testing.
  - Scripting and automation.
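
To make the classifier and data-splitting topics above concrete, here is a minimal sketch of a K-Nearest Neighbours classifier evaluated with separate training, validation and test sets. Everything in it is an assumption chosen for illustration: the synthetic two-cluster data, the 60/20/20 split, the candidate values of k, and the helper names `split_data`, `knn_predict`, `accuracy` and `make_blobs`. It uses only the Python standard library.

```python
import math
import random
from collections import Counter


def split_data(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, data, k):
    """Fraction of (point, label) pairs in data that k-NN labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)


def make_blobs(n_per_class=100, seed=1):
    """Hypothetical toy data: two Gaussian clusters in 2-D, labelled 0 and 1."""
    rng = random.Random(seed)
    rows = []
    for label, centre in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            rows.append((point, label))
    return rows


rows = make_blobs()
train, val, test = split_data(rows)

# Choose k on the validation set only; the test set is used once, at the end.
best_k = max([1, 3, 5, 7, 9], key=lambda k: accuracy(train, val, k))
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

With k = 1 the classifier scores perfectly on its own training data, because every point is its own nearest neighbour. That is a small illustration of why overfitting has to be judged on held-out data rather than on the training set.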
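
The curse of dimensionality listed above can be demonstrated with a short experiment. The sketch below measures how much farther the farthest random point is than the nearest one, relative to the nearest, as the number of dimensions grows. The function name `distance_contrast`, the uniform unit-hypercube data and the sample size are assumptions made for this illustration only.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Return (max - min) / min over the distances from one random query point
    to n_points random points in the unit hypercube of the given dimension."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={distance_contrast(dim):.2f}")
```

As the dimension increases the contrast shrinks, meaning the nearest and farthest points become almost equally far away. Distance-based methods such as K-Nearest Neighbours lose discriminating power in that regime, which is one motivation for feature selection.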
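
The data-cleaning and testing topics above fit together naturally: a cleaning step is easier to trust when it is a small function with tests. The following is a sketch under stated assumptions. The "name,age" record format and the function `clean_record` are hypothetical, and the tests are plain assert functions of the kind a runner such as pytest would collect.

```python
def clean_record(raw):
    """Normalise one raw "name,age" record into a (name, age) tuple.

    Returns None when the record is unusable. The record format is a
    hypothetical example, not taken from any real data set.
    """
    parts = [field.strip() for field in raw.split(",")]
    if len(parts) != 2 or not parts[0]:
        return None
    name, age_text = parts
    try:
        age = int(age_text)
    except ValueError:
        return None
    if age < 0:
        return None
    return (name.title(), age)


def test_strips_whitespace_and_normalises_case():
    assert clean_record("  ada lovelace , 36 ") == ("Ada Lovelace", 36)


def test_rejects_malformed_rows():
    # Empty line, missing field, non-numeric age, negative age, extra field.
    for bad in ["", "no-age", "bob,abc", "eve,-1", "a,b,c"]:
        assert clean_record(bad) is None


# Run the checks directly so the file also works as a plain script.
test_strips_whitespace_and_normalises_case()
test_rejects_malformed_rows()
```

Keeping such functions and their tests under version control, and running them automatically from a script or build system, ties this example to the other programming skills in the list.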