<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://source.geography.bristol.ac.uk/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=GethinWilliams</id>
	<title>SourceWiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://source.geography.bristol.ac.uk/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=GethinWilliams"/>
	<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/wiki/Special:Contributions/GethinWilliams"/>
	<updated>2026-04-08T16:25:35Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.8</generator>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9551</id>
		<title>DataScience</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9551"/>
		<updated>2015-11-05T10:45:45Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''What would a course on Data Science look like?'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Media:intro-to-data-science-nov15.pdf|Intro to Data Science]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Data_Science_VD.png|400px|thumbnail|center|Drew Conway's Venn diagram of data science]]&lt;br /&gt;
&lt;br /&gt;
=Topics would include=&lt;br /&gt;
&lt;br /&gt;
* What is relevant for the UoB?&lt;br /&gt;
* y=f(x) relationships: classifiers &amp;amp; regression&lt;br /&gt;
** Examples: Linear &amp;amp; logistic regression, K-Nearest Neighbours, Decision Trees, Neural Networks etc.&lt;br /&gt;
* Data topics:&lt;br /&gt;
** Training, test &amp;amp; validation data.&lt;br /&gt;
** Sources of data, e.g. web scraping.&lt;br /&gt;
** Exploratory Data Analysis (EDA).&lt;br /&gt;
** Cleaning &amp;amp; munging data (90% of your effort?), including useful Linux tools.&lt;br /&gt;
** Feature selection.&lt;br /&gt;
* Model selection &amp;amp; training topics:&lt;br /&gt;
** Algorithms that scale.&lt;br /&gt;
** Supervised vs. unsupervised learning.&lt;br /&gt;
** Overfitting.&lt;br /&gt;
** The curse of dimensionality.&lt;br /&gt;
* Programming skills:&lt;br /&gt;
** &amp;quot;Clean code shows clarity of mind.&amp;quot;&lt;br /&gt;
** Languages: R? Python? Others?&lt;br /&gt;
** Version control.&lt;br /&gt;
** Build systems.&lt;br /&gt;
** Testing.&lt;br /&gt;
** Scripting and automation.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9550</id>
		<title>DataScience</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9550"/>
		<updated>2015-11-05T10:45:15Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''What would a course on Data Science look like?'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:intro-to-data-science-nov15.pdf|Intro to Data Science]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Data_Science_VD.png|400px|thumbnail|center|Drew Conway's Venn diagram of data science]]&lt;br /&gt;
&lt;br /&gt;
=Topics would include=&lt;br /&gt;
&lt;br /&gt;
* What is relevant for the UoB?&lt;br /&gt;
* y=f(x) relationships: classifiers &amp;amp; regression&lt;br /&gt;
** Examples: Linear &amp;amp; logistic regression, K-Nearest Neighbours, Decision Trees, Neural Networks etc.&lt;br /&gt;
* Data topics:&lt;br /&gt;
** Training, test &amp;amp; validation data.&lt;br /&gt;
** Sources of data, e.g. web scraping.&lt;br /&gt;
** Exploratory Data Analysis (EDA).&lt;br /&gt;
** Cleaning &amp;amp; munging data (90% of your effort?), including useful Linux tools.&lt;br /&gt;
** Feature selection.&lt;br /&gt;
* Model selection &amp;amp; training topics:&lt;br /&gt;
** Algorithms that scale.&lt;br /&gt;
** Supervised vs. unsupervised learning.&lt;br /&gt;
** Overfitting.&lt;br /&gt;
** The curse of dimensionality.&lt;br /&gt;
* Programming skills:&lt;br /&gt;
** &amp;quot;Clean code shows clarity of mind.&amp;quot;&lt;br /&gt;
** Languages: R? Python? Others?&lt;br /&gt;
** Version control.&lt;br /&gt;
** Build systems.&lt;br /&gt;
** Testing.&lt;br /&gt;
** Scripting and automation.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9549</id>
		<title>DataScience</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9549"/>
		<updated>2015-11-05T10:44:17Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''What would a course on Data Science look like?'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[:File:intro-to-data-science-nov15.pdf|Intro to Data Science]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Data_Science_VD.png|400px|thumbnail|center|Drew Conway's Venn diagram of data science]]&lt;br /&gt;
&lt;br /&gt;
=Topics would include=&lt;br /&gt;
&lt;br /&gt;
* What is relevant for the UoB?&lt;br /&gt;
* y=f(x) relationships: classifiers &amp;amp; regression&lt;br /&gt;
** Examples: Linear &amp;amp; logistic regression, K-Nearest Neighbours, Decision Trees, Neural Networks etc.&lt;br /&gt;
* Data topics:&lt;br /&gt;
** Training, test &amp;amp; validation data.&lt;br /&gt;
** Sources of data, e.g. web scraping.&lt;br /&gt;
** Exploratory Data Analysis (EDA).&lt;br /&gt;
** Cleaning &amp;amp; munging data (90% of your effort?), including useful Linux tools.&lt;br /&gt;
** Feature selection.&lt;br /&gt;
* Model selection &amp;amp; training topics:&lt;br /&gt;
** Algorithms that scale.&lt;br /&gt;
** Supervised vs. unsupervised learning.&lt;br /&gt;
** Overfitting.&lt;br /&gt;
** The curse of dimensionality.&lt;br /&gt;
* Programming skills:&lt;br /&gt;
** &amp;quot;Clean code shows clarity of mind.&amp;quot;&lt;br /&gt;
** Languages: R? Python? Others?&lt;br /&gt;
** Version control.&lt;br /&gt;
** Build systems.&lt;br /&gt;
** Testing.&lt;br /&gt;
** Scripting and automation.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9548</id>
		<title>DataScience</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9548"/>
		<updated>2015-11-05T10:43:34Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''What would a course on Data Science look like?'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[:File:intro-to-data-science-nov15.pdf]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Data_Science_VD.png|400px|thumbnail|center|Drew Conway's Venn diagram of data science]]&lt;br /&gt;
&lt;br /&gt;
=Topics would include=&lt;br /&gt;
&lt;br /&gt;
* What is relevant for the UoB?&lt;br /&gt;
* y=f(x) relationships: classifiers &amp;amp; regression&lt;br /&gt;
** Examples: Linear &amp;amp; logistic regression, K-Nearest Neighbours, Decision Trees, Neural Networks etc.&lt;br /&gt;
* Data topics:&lt;br /&gt;
** Training, test &amp;amp; validation data.&lt;br /&gt;
** Sources of data, e.g. web scraping.&lt;br /&gt;
** Exploratory Data Analysis (EDA).&lt;br /&gt;
** Cleaning &amp;amp; munging data (90% of your effort?), including useful Linux tools.&lt;br /&gt;
** Feature selection.&lt;br /&gt;
* Model selection &amp;amp; training topics:&lt;br /&gt;
** Algorithms that scale.&lt;br /&gt;
** Supervised vs. unsupervised learning.&lt;br /&gt;
** Overfitting.&lt;br /&gt;
** The curse of dimensionality.&lt;br /&gt;
* Programming skills:&lt;br /&gt;
** &amp;quot;Clean code shows clarity of mind.&amp;quot;&lt;br /&gt;
** Languages: R? Python? Others?&lt;br /&gt;
** Version control.&lt;br /&gt;
** Build systems.&lt;br /&gt;
** Testing.&lt;br /&gt;
** Scripting and automation.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=File:Intro-to-data-science-nov15.pdf&amp;diff=9547</id>
		<title>File:Intro-to-data-science-nov15.pdf</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=File:Intro-to-data-science-nov15.pdf&amp;diff=9547"/>
		<updated>2015-11-05T10:38:01Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9499</id>
		<title>DataScience</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9499"/>
		<updated>2015-01-05T13:49:43Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Topics would include */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''What would a course on Data Science look like?'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Data_Science_VD.png|400px|thumbnail|center|Drew Conway's Venn diagram of data science]]&lt;br /&gt;
&lt;br /&gt;
=Topics would include=&lt;br /&gt;
&lt;br /&gt;
* What is relevant for the UoB?&lt;br /&gt;
* y=f(x) relationships: classifiers &amp;amp; regression&lt;br /&gt;
** Examples: Linear &amp;amp; logistic regression, K-Nearest Neighbours, Decision Trees, Neural Networks etc.&lt;br /&gt;
* Data topics:&lt;br /&gt;
** Training, test &amp;amp; validation data.&lt;br /&gt;
** Sources of data, e.g. web scraping.&lt;br /&gt;
** Exploratory Data Analysis (EDA).&lt;br /&gt;
** Cleaning &amp;amp; munging data (90% of your effort?), including useful Linux tools.&lt;br /&gt;
** Feature selection.&lt;br /&gt;
* Model selection &amp;amp; training topics:&lt;br /&gt;
** Algorithms that scale.&lt;br /&gt;
** Supervised vs. unsupervised learning.&lt;br /&gt;
** Overfitting.&lt;br /&gt;
** The curse of dimensionality.&lt;br /&gt;
* Programming skills:&lt;br /&gt;
** &amp;quot;Clean code shows clarity of mind.&amp;quot;&lt;br /&gt;
** Languages: R? Python? Others?&lt;br /&gt;
** Version control.&lt;br /&gt;
** Build systems.&lt;br /&gt;
** Testing.&lt;br /&gt;
** Scripting and automation.&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9498</id>
		<title>DataScience</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9498"/>
		<updated>2015-01-05T13:29:32Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''What would a course on Data Science look like?'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Data_Science_VD.png|400px|thumbnail|center|Drew Conway's Venn diagram of data science]]&lt;br /&gt;
&lt;br /&gt;
=Topics would include=&lt;br /&gt;
&lt;br /&gt;
* What is relevant for the UoB?&lt;br /&gt;
* y=f(x) relationships: classifiers &amp;amp; regression&lt;br /&gt;
** Examples: Linear &amp;amp; logistic regression, K-Nearest Neighbours, Decision Trees, Neural Networks etc.&lt;br /&gt;
* Data topics:&lt;br /&gt;
** Training, test &amp;amp; validation data.&lt;br /&gt;
** Sources of data, e.g. web scraping.&lt;br /&gt;
** Exploratory Data Analysis (EDA).&lt;br /&gt;
** Cleaning &amp;amp; munging data (90% of your effort?), including useful Linux tools.&lt;br /&gt;
** Feature selection.&lt;br /&gt;
* Model selection &amp;amp; training topics:&lt;br /&gt;
** Algorithms that scale.&lt;br /&gt;
** Supervised vs. unsupervised learning.&lt;br /&gt;
** Overfitting.&lt;br /&gt;
** The curse of dimensionality.&lt;br /&gt;
* Programming skills:&lt;br /&gt;
** &amp;quot;Clean code shows clarity of mind.&amp;quot;&lt;br /&gt;
** Version control.&lt;br /&gt;
** Build systems.&lt;br /&gt;
** Testing.&lt;br /&gt;
** Scripting and automation.&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9497</id>
		<title>DataScience</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9497"/>
		<updated>2015-01-05T13:28:51Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''What would a course on Data Science look like?'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Data_Science_VD.png|400px|thumbnail|center|Drew Conway's Venn diagram of data science]]&lt;br /&gt;
&lt;br /&gt;
Topics would include:&lt;br /&gt;
* What is relevant for the UoB?&lt;br /&gt;
* y=f(x) relationships: classifiers &amp;amp; regression&lt;br /&gt;
** Examples: Linear &amp;amp; logistic regression, K-Nearest Neighbours, Decision Trees, Neural Networks etc.&lt;br /&gt;
* Data topics:&lt;br /&gt;
** Training, test &amp;amp; validation data.&lt;br /&gt;
** Sources of data, e.g. web scraping.&lt;br /&gt;
** Exploratory Data Analysis (EDA).&lt;br /&gt;
** Cleaning &amp;amp; munging data (90% of your effort?), including useful Linux tools.&lt;br /&gt;
** Feature selection.&lt;br /&gt;
* Model selection &amp;amp; training topics:&lt;br /&gt;
** Algorithms that scale.&lt;br /&gt;
** Supervised vs. unsupervised learning.&lt;br /&gt;
** Overfitting.&lt;br /&gt;
** The curse of dimensionality.&lt;br /&gt;
* Programming skills:&lt;br /&gt;
** &amp;quot;Clean code shows clarity of mind.&amp;quot;&lt;br /&gt;
** Version control.&lt;br /&gt;
** Build systems.&lt;br /&gt;
** Testing.&lt;br /&gt;
** Scripting and automation.&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9496</id>
		<title>DataScience</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9496"/>
		<updated>2015-01-05T13:17:02Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''What would a course on Data Science look like?'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Data_Science_VD.png|400px|thumbnail|center|Drew Conway's Venn diagram of data science]]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=File:Data_Science_VD.png&amp;diff=9495</id>
		<title>File:Data Science VD.png</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=File:Data_Science_VD.png&amp;diff=9495"/>
		<updated>2015-01-05T13:16:06Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9494</id>
		<title>DataScience</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9494"/>
		<updated>2015-01-05T13:15:37Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''What would a course on Data Science look like?'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9493</id>
		<title>DataScience</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9493"/>
		<updated>2015-01-05T13:14:39Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''What would a course on Data Science look like?'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9492</id>
		<title>DataScience</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=DataScience&amp;diff=9492"/>
		<updated>2015-01-05T13:13:51Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: Created page with 'category:Pragmatic Programming '''Open Source Statistics with R'''  =Introduction='&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R2&amp;diff=9490</id>
		<title>R2</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R2&amp;diff=9490"/>
		<updated>2014-12-12T12:23:38Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Submitting R jobs on BlueCrystal=&lt;br /&gt;
&lt;br /&gt;
If you have an R script called, for example, '''myscript.r''', you can run it on BlueCrystal using the following submission script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=1,walltime=01:00:00&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH myscript.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that this script runs an R script on a single processor, which is requested for 1 hour in the above example (walltime=01:00:00).  You can change this by modifying the 'walltime' resource request.  (See the later examples if you would like to use more than one processor.)  The output of the job will be placed in a file called '''myscript.r.Rout'''.&lt;br /&gt;
&lt;br /&gt;
If the submission script is saved as '''r-submit''', then you would submit the job by typing: '''qsub r-submit'''.&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll switch tack and focus on a question commonly asked by those beginning to use R in anger: '''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''  In this section we'll consider two related tasks: finding which parts of your R code are responsible for the majority of the run-time, and deciding what you can do about them.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential to first identify which portions of your R code are responsible for the majority of the run-time.  You could spend ages optimising a portion that you ''think'' may be running slowly, but computers have a knack for surprising us: if that portion of your program accounts for, say, 10% of the run-time, then even making it infinitely fast would save you at most 10%, and you will have sweated for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time a single function call with '''system.time''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Map a smoothing function over each vector in the list (in serial)&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
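The profile tells us that the function '''simpleLoess''' (called by '''loess.smooth''') accounts for 88% of the run-time in its own code, whereas '''rnorm''' takes only 4%.  Any optimisation effort is therefore best spent on the smoothing step, not on generating the data.&lt;br /&gt;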
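&lt;br /&gt;
A self-contained variant of the profiling recipe is sketched below.  It is an illustration under stated assumptions, not the run reported above: it writes to a temporary file (via '''tempfile''') instead of '''~/rprof.out''', uses fewer and smaller vectors so it finishes quickly, and reads the '''by.self''' table from the list returned by '''summaryRprof''' instead of printing the whole summary.  Note that '''Rprof()''' with no arguments would start a fresh profile in '''Rprof.out''' rather than stop profiling, which is why '''Rprof(NULL)''' is used here.&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: profile a block of code, then inspect the 'self time' table.
# Assumption: a temporary file stands in for ~/rprof.out.
prof_file = tempfile(fileext = '.out')

Rprof(filename = prof_file, interval = 0.01)  # start sampling every 10 ms
data = lapply(1:5, function(i) rnorm(50000))
smoothed = lapply(data, function(x) loess.smooth(x, x))
Rprof(NULL)                                   # stop profiling

prof = summaryRprof(prof_file)
# by.self: time spent in each function's own code.
# by.total: also includes time spent in the functions it calls.
print(head(prof$by.self))
print(prof$sampling.time)
```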
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
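&lt;br /&gt;
A more common way to pre-allocate in R than the '''c(NA)''' plus '''length()''' idiom used in '''f2''' is to create a vector of the right length and type up front, for example with '''numeric(n)'''.  In the sketch below, the function names '''grow''' and '''prealloc''' are illustrative, not from the original.  Both functions return their vector, so the two approaches can be checked to give identical results before they are timed.  The timings you see will depend on your machine and R version.&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: growing a vector element by element vs. pre-allocating it.
# The function names 'grow' and 'prealloc' are illustrative.
grow = function(n) {
  v = c()                    # starts as NULL and is extended on every iteration
  for (i in seq_len(n)) {
    v[i] = i^2
  }
  v
}

prealloc = function(n) {
  v = numeric(n)             # storage for all n values is allocated once
  for (i in seq_len(n)) {
    v[i] = i^2
  }
  v
}

n = 30000
stopifnot(identical(grow(n), prealloc(n)))  # both give the same vector
print(system.time(grow(n)))
print(system.time(prealloc(n)))
```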
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R accept whole vectors as input, rather than just single values, which often lets you avoid writing a loop at all.  The examples in the previous section used for loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in about half the time that the pre-allocated loop ('''f2''') took to process 30,000!&lt;br /&gt;
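&lt;br /&gt;
The same idea applies beyond arithmetic operators: many built-in functions already perform a whole loop internally in compiled code.  As an illustrative sketch (the example is not from the original), the running total below is computed once with an explicit loop and once with the built-in '''cumsum''', and the two results are checked to be identical before timing.&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: an explicit loop vs. a vectorised built-in (cumsum).
running_total_loop = function(x) {
  out = numeric(length(x))   # pre-allocated, as in the previous section
  total = 0
  for (i in seq_along(x)) {
    total = total + x[i]
    out[i] = total
  }
  out
}

running_total_vec = function(x) cumsum(x)

x = as.numeric(1:1000000)
stopifnot(identical(running_total_loop(x), running_total_vec(x)))
print(system.time(running_total_loop(x)))
print(system.time(running_total_vec(x)))
```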
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as BlueCrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
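&lt;br /&gt;
Here is a minimal sketch of that pattern.  It assumes R 2.14.0 or later, which ships the base '''parallel''' package (incorporating much of '''multicore''' and '''snow''').  The same function is mapped over a list first with '''lapply''' and then with '''parLapply''' on a small local cluster.  The worker count of 2 is an arbitrary choice for illustration; on BlueCrystal you would size it to match your PBS resource request.&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: apply one function to a list of inputs, serially and then in parallel.
# Assumes R 2.14.0 or later, where the 'parallel' package is part of base R.
library(parallel)

inputs = lapply(1:8, function(i) rnorm(5000))   # a list of independent tasks
smooth_one = function(x) loess.smooth(x, x)$y  # the work applied to each item

serial_result = lapply(inputs, smooth_one)

cl = makeCluster(2)          # two local worker processes (illustrative size)
parallel_result = parLapply(cl, inputs, smooth_one)
stopCluster(cl)

# Same inputs and same function, so the two results should agree.
stopifnot(isTRUE(all.equal(serial_result, parallel_result)))
```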
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the package's '''mclapply''' function, a multicore equivalent of R's built-in list-apply function, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I then used the following submission script to run it on BlueCrystal Phase 2:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
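&lt;br /&gt;
The '''multicore''' package has since been superseded.  Its functionality, including '''mclapply''' and '''detectCores''', is available in the '''parallel''' package that ships with R 2.14.0 and later.  Below is a minimal sketch of the same serial-versus-parallel comparison using '''parallel'''.  The choice of '''mc.cores''' (all but one core) is an assumption.  Because '''mclapply''' relies on forking, it only runs with '''mc.cores = 1''' on MS Windows.&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: the multicore example rewritten for the base 'parallel' package.
library(parallel)

n_cores = max(1, detectCores() - 1, na.rm = TRUE)  # leave one core free
if (.Platform$OS.type == 'windows') {
  n_cores = 1                                       # forking is unavailable on Windows
}

data = lapply(1:10, function(i) rnorm(10000))
smooth_one = function(x) loess.smooth(x, x)

t_serial = system.time({ serial_result = lapply(data, smooth_one) })
t_parallel = system.time({ parallel_result = mclapply(data, smooth_one, mc.cores = n_cores) })

stopifnot(isTRUE(all.equal(serial_result, parallel_result)))
print(t_serial)
print(t_parallel)
```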
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BlueCrystal Phase 2 (BCp2), you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Disable PSM on the QLogic HCAs&lt;br /&gt;
export OMPI_MCA_mtl=^psm&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
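&lt;br /&gt;
Notice that all four workers returned the same value from '''runif(1)''', which suggests that each one started from the same random-number seed.  If independent random streams are needed, one option is '''clusterSetRNGStream''', which gives each worker its own L'Ecuyer-CMRG stream.  The sketch below uses the base '''parallel''' package rather than Rmpi, and its seed value and cluster size are arbitrary choices for illustration.&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: independent random-number streams on each worker ('parallel', not Rmpi).
library(parallel)

cl = makeCluster(3)                    # three local workers (illustrative size)
clusterSetRNGStream(cl, iseed = 2015)  # one L'Ecuyer-CMRG stream per worker
draws = unlist(clusterEvalQ(cl, runif(1)))
stopCluster(cl)

print(draws)                           # one value per worker
stopifnot(length(unique(draws)) == length(draws))
```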
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a list of 10 vectors, each holding 10,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the list.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
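The '''parallel''' package is an amalgamation of functionality from the multicore and snow packages, and has been part of the standard R distribution since R 2.14.0.  Unlike the multicore package, it also works on MS Windows.  However, its fork-based functions, such as '''mclapply()''', cannot run in parallel there, so on Windows you need snow-style clusters instead. &lt;br /&gt;
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;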
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
detectCores()&lt;br /&gt;
# Create a list of 10 vectors, each holding 10,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the list.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (mclapply forks processes within a node; mc.cores defaults to 2)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
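&lt;br /&gt;
A snow-style socket cluster is one portable way to use the '''parallel''' package across several worker processes, and it works in the same way on Linux, Mac and MS Windows.  The sketch below has not been tested on BCp2.  It starts two worker processes on the local machine (an arbitrary number) and gives each worker its own reproducible random number stream with '''clusterSetRNGStream()''' (the seed is also arbitrary).  It then runs the '''loess.smooth()''' example with '''parLapply()'''.  Separate streams are worth having: in the Rmpi output above, all four slaves returned the same value from '''runif(1)'''.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# Start a socket (PSOCK) cluster of 2 worker R processes on this machine&lt;br /&gt;
cl &amp;lt;- makeCluster(2)&lt;br /&gt;
# Give each worker its own L'Ecuyer-CMRG random number stream&lt;br /&gt;
clusterSetRNGStream(cl, iseed = 2015)&lt;br /&gt;
# Each worker draws one number; with separate streams the draws differ&lt;br /&gt;
draws &amp;lt;- unlist(clusterEvalQ(cl, runif(1)))&lt;br /&gt;
draws&lt;br /&gt;
# The loess.smooth() example again, distributed over the workers with parLapply()&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
system.time(x &amp;lt;- parLapply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;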
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9489</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9489"/>
		<updated>2014-12-10T17:06:03Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Packages */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to R itself, not to provide an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine or a Mac, you will typically start R in some form of GUI.  Either way, you will end up with an R command prompt, and all of the examples below will work there.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' is evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R has no separate scalar type: a single number is simply a vector of length one.  The number in square brackets gives the index of the first value printed on that line.  We can use the handy '''seq()''' (sequence) function to create a vector containing more than one element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
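In the '''seq()''' call above, '''=''' names the function's arguments, while '''&amp;lt;-''' assigns the result to the variable '''odds'''.  (At the top level, '''=''' can also be used for assignment, but '''&amp;lt;-''' is the usual convention.)&lt;br /&gt;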
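&lt;br /&gt;
A short sketch contrasting the two.  The variable names are illustrative only:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
a &amp;lt;- 3                                # assignment with the arrow&lt;br /&gt;
b = 4                                 # '=' also assigns at the top level&lt;br /&gt;
s1 &amp;lt;- seq(from = 1, to = 9, by = 2)   # here '=' only names seq()'s arguments&lt;br /&gt;
s2 &amp;lt;- seq(by = 2, to = 9, from = 1)   # named arguments may come in any order&lt;br /&gt;
s3 &amp;lt;- seq(1, 9, 2)                    # unnamed arguments are matched by position&lt;br /&gt;
identical(s1, s2)&lt;br /&gt;
identical(s1, s3)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;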
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
coords.bris &amp;lt;- c(51.5, -2.6)  # latitude and longitude of Bristol (west is negative)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
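&lt;br /&gt;
You can select elements of a vector by position, by a logical condition or by name.  Arithmetic on a vector acts on every element at once.  The sketch below reuses the '''odds''' vector from above, together with a small named vector of 2012 gold medal counts:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
length(odds)                # the number of elements&lt;br /&gt;
odds[3]                     # by position (indices start at 1)&lt;br /&gt;
odds[c(1, length(odds))]    # the first and last elements&lt;br /&gt;
odds[odds &amp;gt; 60]            # by a logical condition&lt;br /&gt;
odds * 10                   # arithmetic is applied element-wise&lt;br /&gt;
sum(odds)                   # the first n odd numbers sum to n^2&lt;br /&gt;
gold &amp;lt;- c(USA=46, China=38, GB=29)   # a named vector&lt;br /&gt;
gold[&amp;quot;China&amp;quot;]              # by name&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;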
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c''' function combines its arguments into a vector, which '''matrix()''' then fills column by column.  We can access portions of the matrix using indices in square brackets, following the labels printed around it.  For example, we can access the first row using the '''[1,]''' notation, and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
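&lt;br /&gt;
Rather than summing slices one at a time, you can use '''rowSums()''', '''colSums()''' and '''diag()''' to check every line of the square at once.  In this sketch, reversing the column order with '''magic[, 3:1]''' is just one way of reaching the anti-diagonal:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8), nrow=3, ncol=3)&lt;br /&gt;
rowSums(magic)            # the three row sums&lt;br /&gt;
colSums(magic)            # the three column sums&lt;br /&gt;
sum(diag(magic))          # main diagonal: 2 + 5 + 8&lt;br /&gt;
sum(diag(magic[, 3:1]))   # anti-diagonal: 4 + 5 + 6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;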
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which have more than two dimensions, and '''lists''' to hold heterogeneous collections.&lt;br /&gt;
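&lt;br /&gt;
Arrays are indexed like matrices, with one subscript per dimension.  A minimal sketch of a three-dimensional array (the name '''cube''' and its contents are arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
cube &amp;lt;- array(1:24, dim = c(2, 3, 4))   # 2 rows x 3 columns x 4 layers&lt;br /&gt;
dim(cube)&lt;br /&gt;
cube[2, 3, 4]                           # a single element: the last one, 24&lt;br /&gt;
cube[, , 1]                             # the first layer, a 2 x 3 matrix&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;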
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access its items in several ways:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
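&lt;br /&gt;
The last two forms differ: single brackets return a smaller list, whereas double brackets (like '''$''') return the element itself.  In the short sketch below, the extra '''band''' element is a hypothetical addition for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
class(list.r4[1])        # &amp;quot;list&amp;quot;: a sub-list holding one element&lt;br /&gt;
class(list.r4[[1]])      # &amp;quot;character&amp;quot;: the element itself&lt;br /&gt;
list.r4[[&amp;quot;frequency&amp;quot;]]   # double brackets also accept a name&lt;br /&gt;
list.r4$band &amp;lt;- &amp;quot;FM&amp;quot;     # hypothetical extra element, added by assignment&lt;br /&gt;
names(list.r4)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;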
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
    country  gold  silver  bronze&lt;br /&gt;
1       USA    46      29      29&lt;br /&gt;
2     China    38      27      23&lt;br /&gt;
3        GB    29      17      19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access columns of a data frame using the '''$''' operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
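&lt;br /&gt;
The '''Levels:''' line in the output above appears because, in the R versions current when this page was written, '''data.frame()''' converted character vectors into factors by default.  Since R 4.0.0 the default has been '''stringsAsFactors = FALSE''', so '''country''' is printed as a plain character vector instead.  The sketch below sets the option explicitly, adds a derived '''total''' column (an illustrative addition), and selects rows with a logical condition:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
# Keep country as character strings, whichever R version is in use&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze, stringsAsFactors = FALSE)&lt;br /&gt;
# A derived column: the total number of medals&lt;br /&gt;
medals.2012$total &amp;lt;- medals.2012$gold + medals.2012$silver + medals.2012$bronze&lt;br /&gt;
# Rows satisfying a condition (the empty column index means all columns)&lt;br /&gt;
medals.2012[medals.2012$gold &amp;gt; 30, ]&lt;br /&gt;
# Countries ordered by total, largest first&lt;br /&gt;
medals.2012$country[order(medals.2012$total, decreasing = TRUE)]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;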
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One reason for R's popularity is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
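&lt;br /&gt;
A time series carries its start date and sampling frequency with it, which is what lets '''plot()''' draw a sensible time axis.  Here is a short sketch of querying and subsetting '''co2''' (the choice of 1990 is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
frequency(co2)     # observations per year (monthly data)&lt;br /&gt;
start(co2)         # year and month of the first observation&lt;br /&gt;
end(co2)           # year and month of the last observation&lt;br /&gt;
# window() extracts a sub-series between two dates: here, one year's values&lt;br /&gt;
co2.1990 &amp;lt;- window(co2, start = c(1990, 1), end = c(1990, 12))&lt;br /&gt;
plot(co2.1990)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;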
&lt;br /&gt;
Pie charts are easily constructed.  In this case, we use one to show the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
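&lt;br /&gt;
To see the numbers behind each box, you can use '''fivenum()''', which returns Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum).  Note that '''boxplot()''' stops its whiskers at the most extreme points lying within 1.5 times the box length and draws anything beyond as individual points.  The whiskers therefore need not reach the minimum and maximum.  A sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Split miles per gallon by number of cylinders, then summarise each group&lt;br /&gt;
mpg.by.cyl &amp;lt;- split(mtcars$mpg, mtcars$cyl)&lt;br /&gt;
sapply(mpg.by.cyl, fivenum)                  # one column per cylinder count&lt;br /&gt;
tapply(mtcars$mpg, mtcars$cyl, median)       # just the medians (the thick lines)&lt;br /&gt;
table(mtcars$cyl)                            # how many cars are in each box&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;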
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
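&lt;br /&gt;
The '''volcano''' data behind that help-page example is a built-in matrix of heights on a regular grid, so you can pass it straight to '''filled.contour()'''.  In the sketch below, the colour palette is just one possible choice.  Running '''example(filled.contour)''' instead executes all of the help-page examples.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
dim(volcano)     # a matrix of heights: rows x columns&lt;br /&gt;
# asp = 1 keeps the grid's aspect ratio, so the volcano is not stretched&lt;br /&gt;
filled.contour(volcano, color.palette = terrain.colors, asp = 1)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;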
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
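&lt;br /&gt;
To save a plot to a file rather than draw it on screen, open a file-based graphics device, draw the plot, then close the device with '''dev.off()'''.  The sketch below writes a PDF to R's temporary directory (the file name is arbitrary).  On installations built with PNG support, '''png()''' works the same way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
out &amp;lt;- file.path(tempdir(), &amp;quot;vapour-pressure.pdf&amp;quot;)&lt;br /&gt;
pdf(file = out, width = 6, height = 6)   # size in inches&lt;br /&gt;
plot(pressure)&lt;br /&gt;
invisible(dev.off())                     # closing the device finishes writing the file&lt;br /&gt;
file.exists(out)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;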
&lt;br /&gt;
Many more example plots, complete with the R code used to create them, can be found on the following web pages (on some pages the code is at the bottom, after the comments):&lt;br /&gt;
* http://www.sr.bham.ac.uk/~ajrs/R/r-gallery.html&lt;br /&gt;
* http://blog.revolutionanalytics.com/2009/01/r-graph-gallery.html&lt;br /&gt;
* https://www.facebook.com/pages/R-Graph-Gallery/169231589826661&lt;br /&gt;
* http://research.stowers-institute.org/efg/R/&lt;br /&gt;
* http://rspatial.r-forge.r-project.org/gallery/&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
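&lt;br /&gt;
Two points are worth knowing about loops in R.  First, many loops can be replaced by vectorised arithmetic, which is shorter and usually faster.  Second, when a loop might need to run zero times, '''seq_len(n)''' is safer than '''1:n''', because '''1:0''' counts down to give '''c(1, 0)'''.  A sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Sum of squares with an explicit loop...&lt;br /&gt;
total &amp;lt;- 0&lt;br /&gt;
for (ii in 1:10) total &amp;lt;- total + ii^2&lt;br /&gt;
total&lt;br /&gt;
# ...and the vectorised equivalent&lt;br /&gt;
sum((1:10)^2)&lt;br /&gt;
# An empty loop: seq_len(0) has no elements, so the body never runs&lt;br /&gt;
n &amp;lt;- 0&lt;br /&gt;
count &amp;lt;- 0&lt;br /&gt;
for (ii in seq_len(n)) count &amp;lt;- count + 1&lt;br /&gt;
count&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;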
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For a function whose body is a single expression, the braces ({}) are optional, but they add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As the last two calls show, unnamed arguments are matched by position, whereas named arguments can be given in any order. &lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  By default, a function returns the value of the last expression it evaluates.  We can also use '''return()''' to state explicitly which value is returned, and to exit the function early:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can inspect the contents of a function by typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or just check the arguments, using the '''args''' function.  This returns a function with the same arguments but an empty body, which is printed as NULL:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
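&lt;br /&gt;
Because '''hypot2()''' is built from vectorised arithmetic, it also works on whole vectors at once.  The sketch below shows this, together with '''safe.hypot''', a hypothetical function that uses '''return()''' to leave early on non-numeric input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
hypot2 &amp;lt;- function(x=3, y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
hypot2(c(3, 5, 8), c(4, 12, 15))   # element-wise: one hypotenuse per pair&lt;br /&gt;
safe.hypot &amp;lt;- function(x, y) {&lt;br /&gt;
  # Leave early, returning NA, if either argument is not numeric&lt;br /&gt;
  if (!is.numeric(x) || !is.numeric(y)) return(NA_real_)&lt;br /&gt;
  # Otherwise the value of the last expression is returned&lt;br /&gt;
  sqrt(x^2 + y^2)&lt;br /&gt;
}&lt;br /&gt;
safe.hypot(&amp;quot;3&amp;quot;, 4)&lt;br /&gt;
safe.hypot(6, 8)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;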
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Add-on packages are listed on CRAN, the Comprehensive R Archive Network: http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
Let's install the '''googleVis''' package, which provides an interface between R and Google Charts.  Google Charts offer interactive charts which can be embedded into web pages.  The best known of these charts is probably the Motion Chart, popularised by Hans Rosling in his TED talks.  (See http://cran.r-project.org/web/packages/googleVis/vignettes/googleVis.pdf for more information on this package.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;googleVis&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library (i.e. those available to be loaded) with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load an installed package into the current session, we type, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(googleVis)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
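&lt;br /&gt;
A common pattern in scripts is to install a package only if it is missing, then load it.  A minimal sketch; '''use.package''' is a hypothetical helper, not part of R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
use.package &amp;lt;- function(pkg) {&lt;br /&gt;
  # requireNamespace() returns FALSE, rather than stopping, if pkg is not installed&lt;br /&gt;
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)&lt;br /&gt;
  # character.only = TRUE tells library() that pkg holds the package name as a string&lt;br /&gt;
  library(pkg, character.only = TRUE)&lt;br /&gt;
}&lt;br /&gt;
use.package(&amp;quot;stats&amp;quot;)   # stats ships with R, so nothing is downloaded here&lt;br /&gt;
search()                 # the packages currently attached to the session&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;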
&lt;br /&gt;
Another package you might like to try is the '''xlsx''' package, for reading and writing MS Excel files (http://cran.r-project.org/web/packages/xlsx/vignettes/xlsx.pdf).&lt;br /&gt;
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  If your data is organised in a file as a table with column headings, perhaps the simplest function to use is '''read.table()'''.  Suppose we have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is then a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  For convenience, however, R also provides '''read.csv()''' and '''write.csv()''' functions, which use a comma separator by default.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the file '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
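&lt;br /&gt;
The first, unnamed column in that file holds the data frame's row names.  Passing '''row.names = FALSE''' leaves it out, so '''read.csv()''' can read the file straight back in.  A sketch using a temporary file and a cut-down table:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;Great Britain&amp;quot;),&lt;br /&gt;
                          gold = c(46, 38, 29), silver = c(29, 27, 17),&lt;br /&gt;
                          bronze = c(29, 23, 19), stringsAsFactors = FALSE)&lt;br /&gt;
csv.file &amp;lt;- file.path(tempdir(), &amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
write.csv(medals.2012, csv.file, row.names = FALSE)   # no row-name column&lt;br /&gt;
readLines(csv.file)                                  # the raw text of the file&lt;br /&gt;
medals.back &amp;lt;- read.csv(csv.file, stringsAsFactors = FALSE)&lt;br /&gt;
medals.back&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;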
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store one or more R objects in a binary file (gzip-compressed by default, as the Unix '''file''' command confirms):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
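&lt;br /&gt;
'''load()''' restores objects under the names they were saved with, and invisibly returns those names.  For a single object, '''saveRDS()''' and '''readRDS()''' are an alternative that let you choose the name when reading it back.  A sketch using temporary files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold = c(46, 38))&lt;br /&gt;
rdata.file &amp;lt;- file.path(tempdir(), &amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
save(medals, file = rdata.file)&lt;br /&gt;
rm(medals)                            # remove it from the workspace...&lt;br /&gt;
restored &amp;lt;- load(rdata.file)          # ...and bring it back&lt;br /&gt;
restored                              # the names of the restored objects&lt;br /&gt;
rds.file &amp;lt;- file.path(tempdir(), &amp;quot;medals.rds&amp;quot;)&lt;br /&gt;
saveRDS(medals, rds.file)             # a single object, stored without its name&lt;br /&gt;
gold.table &amp;lt;- readRDS(rds.file)       # the caller picks the new name&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;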
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files (it has since been superseded on CRAN by the '''ncdf4''' package).  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide these, but note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
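&lt;br /&gt;
'''sort()''' can also run in reverse.  The related '''order()''' function returns the positions that would sort a vector, which is how you sort a data frame by one of its columns.  A sketch (the small data frame is illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
sort(railway.engines, decreasing = TRUE)&lt;br /&gt;
order(railway.engines)     # indices of the elements, taken in sorted order&lt;br /&gt;
medals.df &amp;lt;- data.frame(country = c(&amp;quot;GB&amp;quot;, &amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold = c(29, 46, 38))&lt;br /&gt;
medals.df[order(medals.df$gold, decreasing = TRUE), ]   # rows sorted by gold count&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;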
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
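&lt;br /&gt;
Each call to '''sample()''' gives a different result unless the random number generator is seeded with '''set.seed()'''.  You can draw several items in one call, and '''prob''' weights the draws (the weights need not sum to 1).  In the sketch below, the seed and the weights are arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
set.seed(42)                          # makes the draws below reproducible&lt;br /&gt;
three &amp;lt;- sample(railway.engines, 3)   # 3 distinct engines (replace = FALSE is the default)&lt;br /&gt;
three&lt;br /&gt;
sample(railway.engines, 10, replace = TRUE)   # repeats allowed&lt;br /&gt;
# Weighted draws: thomas gets 5 times the weight of each other engine&lt;br /&gt;
sample(railway.engines, 10, replace = TRUE, prob = c(5, 1, 1, 1, 1))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;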
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
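&lt;br /&gt;
'''rbind()''' has a column-wise counterpart, '''cbind()''', which simply places columns side by side, so the rows must already line up.  When two tables share a key column but list their rows in different orders, '''merge()''' matches the rows up instead.  A sketch with two small, illustrative tables:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
gold.df &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;), gold = c(46, 38, 29))&lt;br /&gt;
silver.df &amp;lt;- data.frame(country = c(&amp;quot;GB&amp;quot;, &amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), silver = c(17, 29, 27))&lt;br /&gt;
# cbind(): rows are paired purely by position&lt;br /&gt;
cbind(gold.df, bronze = c(29, 23, 19))&lt;br /&gt;
# merge(): rows are paired by matching values in the shared country column&lt;br /&gt;
both &amp;lt;- merge(gold.df, silver.df, by = &amp;quot;country&amp;quot;)&lt;br /&gt;
both&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;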
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Plotting the data couldn't be simpler with '''plot(bins)'''!&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
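&lt;br /&gt;
'''table()''' counts how many values fall in each bin, and '''cut()''' also accepts explicit break points and labels.  In the sketch below, the breaks at 80, 85 and 90 and their labels are arbitrary choices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
table(cut(girls_2, breaks = 3))        # counts in the three equal-width bins&lt;br /&gt;
bins2 &amp;lt;- cut(girls_2, breaks = c(80, 85, 90), labels = c(&amp;quot;80-85&amp;quot;, &amp;quot;85-90&amp;quot;))&lt;br /&gt;
table(bins2)                           # the intervals are (80,85] and (85,90]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;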
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
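&lt;br /&gt;
The fitted model object holds much more than the line itself.  The sketch below shows a few common ways to interrogate it; the speed of 21 mph used for the prediction is arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
coef(res)        # intercept and slope of the fitted line&lt;br /&gt;
summary(res)     # standard errors, t-tests and R-squared&lt;br /&gt;
# Predicted stopping distance (ft) for a speed of 21 mph&lt;br /&gt;
predict(res, newdata = data.frame(speed = 21))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;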
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
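* You may wish to compare different methods of estimation.  From the MASS package, you can fit a line with the '''rlm''' and '''lqs''' functions.  You can fit all three lines and plot them against the data (drawn earlier with '''plot(cars)''') using:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; res.lm &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm &amp;lt;- rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs &amp;lt;- lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;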
&amp;lt;/source&amp;gt;&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If weights are given, the function fits the line by weighted least squares, minimising the sum of the squared residuals, each multiplied by its weight.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector, length 50, where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
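&lt;br /&gt;
As a check on what '''lm()''' does with weights, the sketch below fits the stopping-distance data with the '''w2''' weights from the hint above.  It then compares the result with a direct solution of the weighted normal equations, (X^T W X) b = X^T W y.  The names X, W and b are chosen for illustration only:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
w2 &amp;lt;- seq(from=0.02, to=1.0, by=0.02)     # 50 weights, one per row of cars&lt;br /&gt;
res.wls &amp;lt;- lm(dist ~ speed, data=cars, weights=w2)&lt;br /&gt;
coef(res.wls)&lt;br /&gt;
# The same coefficients from the weighted normal equations&lt;br /&gt;
X &amp;lt;- cbind(1, cars$speed)                  # design matrix: intercept and speed&lt;br /&gt;
W &amp;lt;- diag(w2)                              # weights on the diagonal&lt;br /&gt;
b &amp;lt;- solve(t(X) %*% W %*% X, t(X) %*% W %*% cars$dist)&lt;br /&gt;
b&lt;br /&gt;
# Compare the ordinary (solid) and weighted (dashed) fits&lt;br /&gt;
plot(cars)&lt;br /&gt;
abline(lm(dist ~ speed, data=cars), lty=1)&lt;br /&gt;
abline(res.wls, lty=2)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;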
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
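&lt;br /&gt;
Both tests return an object whose parts can be extracted by name, which is handy in scripts.  The F test's large p-value gives no evidence against equal variances, which is why '''var.equal=TRUE''' was used above.  Leaving that argument out gives Welch's t-test, which does not assume equal variances.  A sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2 &amp;lt;- c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
res &amp;lt;- t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
res$p.value       # the p-value on its own&lt;br /&gt;
res$conf.int      # 95% confidence interval for the difference in means&lt;br /&gt;
res$estimate      # the two sample means&lt;br /&gt;
welch &amp;lt;- t.test(boys_2, girls_2)   # Welch's test (var.equal = FALSE is the default)&lt;br /&gt;
welch$p.value&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;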
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
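&lt;br /&gt;
To see how well the tree classifies, we can predict a class for every child and compare it with the recorded outcome.  The sketch below uses the '''predict()''' method for '''rpart''' fits; '''rpart''' is a recommended package shipped with R.  Accuracy measured on the same data used to grow the tree tends to be optimistic:&lt;br /&gt;
&lt;br /&gt;
```r
library(rpart)

fit = rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Predicted class (absent/present) for each child, from the fitted tree
pred = predict(fit, newdata = kyphosis, type = "class")

# Cross-tabulate predictions against the recorded outcome
conf.mat = table(predicted = pred, actual = kyphosis$Kyphosis)
conf.mat

# Proportion classified correctly (resubstitution estimate, so optimistic)
accuracy = sum(diag(conf.mat)) / sum(conf.mat)
accuracy

# Complexity-parameter table, useful when deciding how far to prune
printcp(fit)
```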
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
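&lt;br /&gt;
Here '''array()''' fills the 3x3 matrix column by column, so the first row of '''A''' is (1, 3, -2).  The sketch below checks the answer by multiplying back with '''%*%'''.  It also obtains the inverse with the one-argument form of '''solve()''', although solving directly is generally preferred to inverting:&lt;br /&gt;
&lt;br /&gt;
```r
# Same system as above; array() fills column by column
A = array(c(1, 3, 2, 3, 5, 4, -2, 6, 3), dim = c(3, 3))
b = c(5, 7, 8)

x = solve(A, b)
x

# Multiplying back should reproduce b (up to rounding error)
as.vector(A %*% x)

# With a single argument, solve() returns the inverse matrix
A.inv = solve(A)
round(A %*% A.inv, 10)   # the 3x3 identity matrix
```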
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
If you would prefer to noodle about with some real-world data, you could take a look at:&lt;br /&gt;
* http://www.theguardian.com/news/datablog/2010/oct/18/historic-government-spending-area#data&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9488</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9488"/>
		<updated>2014-12-10T17:03:49Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Packages */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' is evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R treats even a single number as a vector (of length one), and the '''[1]''' shows that the line begins with the vector's first element.  We can use the handy '''seq()''' function to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
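&lt;br /&gt;
Since even a single number is a vector, '''length()''' and square-bracket indexing work on '''phi''' just as they do on '''odds'''.  A minimal sketch (assignments here use '''=''', which behaves like '''&amp;lt;-''' at the prompt):&lt;br /&gt;
&lt;br /&gt;
```r
# A single number is simply a vector of length one
phi = (1 + sqrt(5)) / 2
length(phi)

# The odd numbers from the example above
odds = seq(from = 1, to = 67, by = 2)
length(odds)          # 34 elements (the printout above wrapped onto a second line)

# The second line of the printout started with [26]: that is element 26
odds[26]

# Square brackets also accept a vector of positions
odds[c(1, 2, 34)]
```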
&lt;br /&gt;
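In the above example, '''&amp;lt;-''' assigns the result to '''odds''', while each '''=''' inside '''seq()''' names one of the function's arguments ('''from''', '''to''' and '''by''').  At the prompt, '''=''' can also be used for assignment, e.g. '''odds = seq(from=1, to=67, by=2)'''.&lt;br /&gt;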
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
coords.bris &amp;lt;- c(51.5, -2.6)  # latitude, longitude in degrees (west is negative)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c''' function combines its arguments into a single vector, which '''matrix()''' then arranges into three rows and three columns.  We can access portions of the matrix using the notation shown in square brackets in the printout.  For example, we can access the first row using '''[1,]''', and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
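&lt;br /&gt;
'''matrix()''' fills its data column by column by default (pass '''byrow=TRUE''' to fill row by row), which is why the first column of '''magic''' reads 2, 7, 6.  The built-in '''rowSums()''', '''colSums()''' and '''diag()''' functions make it easy to check every line of the magic square at once.  A minimal sketch:&lt;br /&gt;
&lt;br /&gt;
```r
magic = matrix(data = c(2, 7, 6, 9, 5, 1, 4, 3, 8), nrow = 3, ncol = 3)

rowSums(magic)            # each row should total 15
colSums(magic)            # each column should total 15
sum(diag(magic))          # main diagonal: 2 + 5 + 8
sum(diag(magic[, 3:1]))   # reversing the columns puts the anti-diagonal on the diagonal

# Filling by row instead gives the transpose of the matrix above
by.row = matrix(data = c(2, 7, 6, 9, 5, 1, 4, 3, 8), nrow = 3, ncol = 3, byrow = TRUE)
identical(by.row, t(magic))
```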
&lt;br /&gt;
R also provides '''arrays''', which have more than two dimensions, and '''lists''' to hold heterogeneous collections.&lt;br /&gt;
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
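We can access its items in several ways: by name with '''$''', with single brackets (which return a list) or with double brackets (which return the item itself):&lt;br /&gt;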
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
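&lt;br /&gt;
The difference between single and double brackets matters.  '''[1]''' returns a smaller list containing the first item, while '''[[1]]''' (or '''$name''') extracts the item itself.  Asking for the '''class()''' of each result makes this visible.  A short sketch:&lt;br /&gt;
&lt;br /&gt;
```r
list.r4 = list(name = "Radio4", frequency = "93.7")

class(list.r4[1])        # "list": a sub-list that keeps the item's name
class(list.r4[[1]])      # "character": the item itself
list.r4[["frequency"]]   # double brackets also accept a name

# The frequency was stored as text; convert it before doing arithmetic
list.r4$frequency.mhz = as.numeric(list.r4$frequency)
class(list.r4$frequency.mhz)   # "numeric"
length(list.r4)                # the list now holds 3 items
```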
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
    country  gold  silver  bronze&lt;br /&gt;
1       USA    46      29      29&lt;br /&gt;
2     China    38      27      23&lt;br /&gt;
3        GB    29      17      19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
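We can access columns of a data frame using the '''$''' operator.  (In versions of R before 4.0.0, '''data.frame()''' converts character vectors to factors by default, hence the '''Levels''' line below; newer versions print a plain character vector.)&lt;br /&gt;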
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
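&lt;br /&gt;
A few everyday data frame operations follow, sketched with the same medal table: checking its size, adding a derived column and picking out rows.  Passing '''stringsAsFactors=FALSE''' keeps '''country''' as plain text whatever the R version:&lt;br /&gt;
&lt;br /&gt;
```r
country = c("USA", "China", "GB")
gold = c(46, 38, 29)
silver = c(29, 27, 17)
bronze = c(29, 23, 19)

# stringsAsFactors = FALSE keeps country as a character column in every R version
medals.2012 = data.frame(country, gold, silver, bronze, stringsAsFactors = FALSE)
dim(medals.2012)    # number of rows, number of columns

# New columns can be created from existing ones with $
medals.2012$total = medals.2012$gold + medals.2012$silver + medals.2012$bronze

# Rows and columns can be indexed like a matrix: [row, column]
medals.2012[2, "total"]
medals.2012[which.max(medals.2012$gold), "country"]
```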
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of its graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
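&lt;br /&gt;
A time series carries its own time stamps, which functions such as '''start()''', '''end()''', '''frequency()''' and '''window()''' from R's '''stats''' package can query.  A short sketch using the built-in '''co2''' series:&lt;br /&gt;
&lt;br /&gt;
```r
class(co2)        # "ts"
start(co2)        # first observation: year and month
end(co2)          # last observation
frequency(co2)    # observations per year (monthly data)

# window() extracts a sub-period and keeps the time-series class
co2.1990s = window(co2, start = c(1990, 1), end = c(1997, 12))
plot(co2.1990s)

# Annual means: aggregate() on a ts collapses to one value per year by default
co2.annual = aggregate(co2, FUN = mean)
head(co2.annual)
```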
&lt;br /&gt;
Pie charts are easily constructed.  In this case, one shows the relative proportions (in percent) of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
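&lt;br /&gt;
Because the vector carries month names, ordinary summary functions report their results by name.  A sketch (the built-in '''month.abb''' vector is used here as a shortcut for the month labels):&lt;br /&gt;
&lt;br /&gt;
```r
bristol.precip = c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)
names(bristol.precip) = month.abb    # built-in vector "Jan", "Feb", ..., "Dec"

sum(bristol.precip)                            # annual total (mm)
names(which.max(bristol.precip))               # wettest month
sort(bristol.precip, decreasing = TRUE)[1:3]   # three wettest months

# Sorting before plotting makes the ranking easier to read
barplot(sort(bristol.precip, decreasing = TRUE),
        ylab = "Mean precipitation (mm)", col = "darkblue")
```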
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
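&lt;br /&gt;
The box in each group spans the middle half of the data, with the thick line at the median.  The same numbers can be obtained without drawing anything, using '''tapply()''' and the '''plot=FALSE''' argument of '''boxplot()'''.  A sketch:&lt;br /&gt;
&lt;br /&gt;
```r
# Median miles per gallon for each number of cylinders
med = tapply(mtcars$mpg, mtcars$cyl, median)
med

# boxplot() can return the statistics it would draw, without plotting
b = boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
b$stats   # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$n       # number of cars in each group
```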
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ (typing '''example(filled.contour)''' runs the examples from the help page):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots--complete with the R code required to create the plots (at the bottom of the page, after the comments)--on the following web pages:&lt;br /&gt;
* http://www.sr.bham.ac.uk/~ajrs/R/r-gallery.html&lt;br /&gt;
* http://blog.revolutionanalytics.com/2009/01/r-graph-gallery.html&lt;br /&gt;
* https://www.facebook.com/pages/R-Graph-Gallery/169231589826661&lt;br /&gt;
* http://research.stowers-institute.org/efg/R/&lt;br /&gt;
* http://rspatial.r-forge.r-project.org/gallery/&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
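&lt;br /&gt;
Loops are often used to build up a result step by step, but many such loops can be replaced by a single vectorised call.  The sketch below sums the odd numbers from 1 to 67 both ways.  The sum of the first n odd numbers is n squared, so both should give 34 squared, 1156.  It also shows how '''set.seed()''' makes random draws, such as those in the '''while''' loop above, repeatable:&lt;br /&gt;
&lt;br /&gt;
```r
# Accumulate a running total with a for loop
total = 0
for (ii in seq(from = 1, to = 67, by = 2)) {
  total = total + ii
}
total

# The vectorised equivalent: one call, no explicit loop
sum(seq(from = 1, to = 67, by = 2))

# Setting the seed makes random draws (and loops that use them) repeatable
set.seed(2014)
first = runif(3, 0, 1)
set.seed(2014)
second = runif(3, 0, 1)
identical(first, second)
```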
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body is a single expression, but add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Unnamed arguments are matched by position.  When the argument names are given, they can appear in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can inspect the definition of a function by typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or just check the arguments, using the '''args''' function.  It returns a copy of the function's argument list with an empty body, hence the '''NULL''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
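&lt;br /&gt;
Because R's arithmetic operators work element by element, a function like '''hypot2''' accepts whole vectors without any explicit loop.  The sketch below also uses '''formals()''' and '''body()''', which return a function's arguments and its code separately:&lt;br /&gt;
&lt;br /&gt;
```r
hypot2 = function(x = 3, y = 4) {sqrt(x^2 + y^2)}

# Vectorised arithmetic: three right-angled triangles in one call
hypot2(c(3, 5, 8), c(4, 12, 15))

# formals() gives the arguments and their defaults; body() gives the code
formals(hypot2)$y
body(hypot2)
```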
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Listed at http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
Let's install the '''googleVis''' package, which provides an interface between R and Google Charts.  Google Charts offer interactive charts which can be embedded into web pages.  The best known of these charts is probably the Motion Chart, popularised by Hans Rosling in his TED talks.  (See http://cran.r-project.org/web/packages/googleVis/vignettes/googleVis.pdf for more information on this package.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;googleVis&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library (i.e. those available to be loaded) with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load an installed package into the current session, we type e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(googleVis)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
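&lt;br /&gt;
This sketch checks what is installed and what is attached, using the base functions '''installed.packages()''', '''search()''' and '''requireNamespace()'''.  It also defines a hypothetical helper, '''ensure_package''' (not part of R), that installs a package only when it is missing:&lt;br /&gt;
&lt;br /&gt;
```r
# Names of all packages installed in the library (available to load)
installed = rownames(installed.packages())
"stats" %in% installed

# Packages currently attached to this session
search()

# Hypothetical helper: install a package only if it is missing, then attach it
ensure_package = function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

ensure_package("stats")   # ships with R, so nothing is downloaded
"package:stats" %in% search()
```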
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  Perhaps the simplest function for reading them is '''read.table()''', which suits data organised like a table with column headings.  If I have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It will be a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
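Gives us the file '''medals.csv''', with the contents below.  The first, unnamed column holds the data frame's row names; pass '''row.names=FALSE''' to '''write.csv()''' to leave it out:&lt;br /&gt;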
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
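&lt;br /&gt;
Here is a round trip through a CSV file, sketched with '''tempfile()''' so that nothing is written to your working directory.  Passing '''row.names=FALSE''' to '''write.csv()''' leaves out the column of row numbers:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country = c("USA", "China", "Great Britain"),
                    gold = c(46, 38, 29),
                    silver = c(29, 27, 17),
                    bronze = c(29, 23, 19),
                    stringsAsFactors = FALSE)

# A file name in R's per-session temporary directory
path = tempfile(fileext = ".csv")

# row.names = FALSE omits the unnamed first column of row numbers
write.csv(medals, path, row.names = FALSE)
readLines(path)[1]    # the header line

medals.back = read.csv(path, stringsAsFactors = FALSE)
str(medals.back)
```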
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store an R data structure in binary form (by default gzip-compressed, as the Linux '''file''' command shows):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
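&lt;br /&gt;
'''load()''' restores objects under the names they were saved with, and returns those names.  For a single object, '''saveRDS()''' and '''readRDS()''' are an alternative that lets you choose the name when reading it back.  A minimal sketch, again using a temporary file:&lt;br /&gt;
&lt;br /&gt;
```r
medals.2012 = data.frame(country = c("USA", "China"), gold = c(46, 38))
path = tempfile(fileext = ".RData")

save(medals.2012, file = path)
rm(medals.2012)                # remove it from the workspace

restored = load(path)          # load() returns the names of the restored objects
restored
exists("medals.2012")          # back again, under its original name

# saveRDS()/readRDS() store one object; the reader picks the new name
rds.path = tempfile(fileext = ".rds")
saveRDS(medals.2012, rds.path)
medals.copy = readRDS(rds.path)
identical(medals.copy, medals.2012)
```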
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide these.  Note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN.  (The '''ncdf''' package has since been archived on CRAN; its successor, '''ncdf4''', is installed in the same way.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
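&lt;br /&gt;
'''sort()''' returns the sorted values, whereas '''order()''' returns the positions that would sort them.  You need '''order()''' to sort the rows of a data frame by one of its columns.  A sketch, using a small subset of the medal table from earlier:&lt;br /&gt;
&lt;br /&gt;
```r
railway.engines = c("thomas", "henry", "gordon", "edward", "james")

sort(railway.engines, decreasing = TRUE)

# order() returns the positions that would sort the vector
order(railway.engines)
railway.engines[order(railway.engines)]   # same result as sort()

# order() is the usual way to sort the rows of a data frame by a column
medals = data.frame(country = c("USA", "China", "Great Britain", "Russian Federation"),
                    silver = c(29, 27, 17, 26),
                    stringsAsFactors = FALSE)
medals[order(medals$silver, decreasing = TRUE), ]
```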
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
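&lt;br /&gt;
Each call above makes a fresh random draw.  Sampling without replacement (the default) shuffles the vector, the '''prob''' argument weights the draw, and '''set.seed()''' makes the results repeatable.  A sketch; the exact draws for a given seed depend on the R version, since the default sampling algorithm changed in R 3.6.0:&lt;br /&gt;
&lt;br /&gt;
```r
railway.engines = c("thomas", "henry", "gordon", "edward", "james")

# Without replacement (the default), sample() returns a random permutation
shuffled = sample(railway.engines)
shuffled

# Weighted sampling: prob gives the relative chance of each element
picks = sample(railway.engines, 1000, replace = TRUE, prob = c(5, 1, 1, 1, 1))
table(picks)   # "thomas" should appear roughly five times as often as each other engine

# set.seed() makes the random draws repeatable
set.seed(1)
a = sample(railway.engines, 3)
set.seed(1)
b = sample(railway.engines, 3)
identical(a, b)
```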
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
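&lt;br /&gt;
'''rbind()''' on data frames matches columns by name, so both data frames need the same column names.  '''cbind()''', documented on the same page, adds columns instead.  A small sketch (the '''rank''' column is just an illustration):&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country = c("USA", "China"), gold = c(46, 38),
                    stringsAsFactors = FALSE)
extras = data.frame(country = "France", gold = 11, stringsAsFactors = FALSE)

# rbind() stacks rows, matching the columns by name
all.medals = rbind(medals, extras)

# cbind() adds a column; here a rank computed from the gold counts (1 = most golds)
all.medals = cbind(all.medals, rank = rank(-all.medals$gold))
all.medals
```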
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Plotting the data couldn't be simpler with '''plot(bins)'''!&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
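&lt;br /&gt;
'''plot(bins)''' draws a bar chart of how many values fall in each bin, and '''table()''' gives the same counts as numbers.  You can also supply the break points and labels yourself.  A sketch (the labels '''low''' and '''high''' are arbitrary):&lt;br /&gt;
&lt;br /&gt;
```r
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)

# Count how many values fall in each of three equal-width bins
bins = cut(girls_2, breaks = 3)
table(bins)

# Explicit break points and labels; intervals are closed on the right, e.g. (85,90]
grp = cut(girls_2, breaks = c(80, 85, 90), labels = c("low", "high"))
table(grp)
```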
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
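&lt;br /&gt;
The fitted coefficients can be read off with '''coef()''' and used for prediction with '''predict()'''.  For a single predictor, the least-squares line also has a closed form (slope = covariance / variance), which makes a handy check.  A sketch:&lt;br /&gt;
&lt;br /&gt;
```r
res = lm(dist ~ speed, data = cars)
coef(res)    # intercept and slope

# Closed-form least squares for a single predictor
slope = cov(cars$speed, cars$dist) / var(cars$speed)
intercept = mean(cars$dist) - slope * mean(cars$speed)
c(intercept, slope)

# Predicted stopping distances for new speeds
pred = predict(res, newdata = data.frame(speed = c(10, 20)))
pred

# Proportion of the variance in distance explained by the fit
summary(res)$r.squared
```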
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
* You may wish to compare different methods of estimation.  From the MASS package, you can fit a line with the '''rlm''' (robust) and '''lqs''' (resistant) regression functions.  Having plotted the data with '''plot(cars)''', you can add all the lines using:&lt;br /&gt;
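&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; res.lm &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm &amp;lt;- rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs &amp;lt;- lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;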
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If given, the function fits the line that minimises the weighted sum of squared residuals (weighted least squares).  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector of length 50 (one weight per row of '''cars'''), where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
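&lt;br /&gt;
This sketch shows how the weights enter the fit, using the '''cars''' data from above.  Weighted least squares minimises the sum of each weight times its squared residual, so equal weights reproduce the ordinary fit.  The weighted coefficients can be checked against the normal equations X^T W X b = X^T W y:&lt;br /&gt;
&lt;br /&gt;
```r
nrow(cars)   # 50 observations, so weight vectors need 50 elements

res.lm = lm(dist ~ speed, data = cars)

# Equal weights give the same line as ordinary least squares
res.equal = lm(dist ~ speed, data = cars, weights = rep(0.1, 50))

# Give the first observation 100 times the weight of the others
w1 = rep(0.1, 50)
w1[1] = 10
res.w1 = lm(dist ~ speed, data = cars, weights = w1)

# Solve the weighted normal equations directly as a check
X = cbind(1, cars$speed)   # design matrix: a column of 1s and the speeds
W = diag(w1)               # 50 x 50 diagonal matrix of weights
b = solve(t(X) %*% W %*% X, t(X) %*% W %*% cars$dist)

rbind(ols = coef(res.lm), equal = coef(res.equal), weighted = coef(res.w1))
```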
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
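&lt;br /&gt;
The test statistics above can be recomputed from first principles as a cross-check.  The sketch below (in awk, run from the shell) computes the variance ratio reported by '''var.test''' and the pooled two-sample t statistic used by '''t.test(..., var.equal=TRUE)''', which has n1 + n2 - 2 = 18 degrees of freedom.  It stops short of the p-values, which need the F and t distributions (R's '''pf''' and '''pt''').&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
boys="90.2 91.4 86.4 87.6 86.7 88.1 82.2 83.8 91 87.4"
girls="83.8 86.2 85.1 88.6 83 88.9 89.7 81.3 88.7 88.4"

# Input line 1 holds sample x (boys_2), line 2 holds sample y (girls_2).
# Output: F ratio, pooled t statistic, degrees of freedom.
stats=$(printf '%s\n%s\n' "$boys" "$girls" | awk '
  {
    n[NR] = NF
    s = 0
    for (i = 1; NF >= i; i++) s += $i
    m[NR] = s / NF                                       # sample mean
    ss = 0
    for (i = 1; NF >= i; i++) ss += ($i - m[NR])^2
    v[NR] = ss / (NF - 1)                                # sample variance
  }
  END {
    F  = v[1] / v[2]                                     # var.test ratio
    df = n[1] + n[2] - 2
    sp = ((n[1] - 1) * v[1] + (n[2] - 1) * v[2]) / df    # pooled variance
    t  = (m[1] - m[2]) / sqrt(sp * (1 / n[1] + 1 / n[2]))
    printf "%.4f %.4f %d\n", F, t, df
  }')
echo "F, t, df: $stats"
```
&lt;br /&gt;
Both values agree with the R output above (F = 1.0186 and t = 0.8429 on 18 degrees of freedom).&lt;br /&gt;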
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
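This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v). The code below uses '''iris3''', the same data stored as a 50 x 4 x 3 array indexed by [flower, measurement, species]. The first 25 flowers of each species form the training set and the remaining 25 form the test set.&lt;br /&gt;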
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
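&lt;br /&gt;
So 70 of the 75 test flowers (about 93%) are classified correctly.  Every error is a confusion between versicolor and virginica; setosa is perfectly separated.  (Because '''knn''' breaks ties at random, repeated runs may give slightly different tables.)&lt;br /&gt;
&lt;br /&gt;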
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of spinal deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)  # provides rpart() and the kyphosis data&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
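&lt;br /&gt;
In the example below, '''array(..., dim=c(3,3))''' fills the matrix column by column, so the rows of A are (1, 3, -2), (3, 5, 6) and (2, 4, 3).  '''solve(A,b)''' returns the vector x that satisfies Ax = b.&lt;br /&gt;
&lt;br /&gt;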
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
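&lt;br /&gt;
For a system this small, the result of '''solve''' can be cross-checked by hand.  Here is a minimal sketch (in awk, run from the shell) of Gaussian elimination with partial pivoting followed by back substitution, applied to the same A and b written out row by row.  It illustrates the method only; R's '''solve''' hands the work to LAPACK.&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
# Augmented matrix [A | b], one row per line.  R's array(..., dim=c(3,3)) fills
# column by column, so the first row of A is (1, 3, -2).
x=$(printf '1 3 -2 5\n3 5 6 7\n2 4 3 8\n' | awk '
  { for (j = 1; NF >= j; j++) a[NR, j] = $j }
  END {
    n = NR
    for (k = 1; n >= k; k++) {
      # Partial pivoting: move the row with the largest |a[i,k]| into row k.
      p = k
      for (i = k + 1; n >= i; i++)
        if (a[i, k] ^ 2 > a[p, k] ^ 2) p = i
      for (j = 1; n + 1 >= j; j++) { t = a[k, j]; a[k, j] = a[p, j]; a[p, j] = t }
      # Eliminate the entries below the pivot.
      for (i = k + 1; n >= i; i++) {
        f = a[i, k] / a[k, k]
        for (j = k; n + 1 >= j; j++) a[i, j] -= f * a[k, j]
      }
    }
    # Back substitution, from the last unknown upwards.
    for (i = n; i >= 1; i--) {
      s = a[i, n + 1]
      for (j = i + 1; n >= j; j++) s -= a[i, j] * x[j]
      x[i] = s / a[i, i]
    }
    for (i = 1; n >= i; i++) printf "%g%s", x[i], (i == n ? "\n" : " ")
  }')
echo "x = $x"
```
&lt;br /&gt;
The result, -15 8 2, matches the output of '''solve(A,b)''' above.&lt;br /&gt;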
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
If you would prefer to noodle about with some real-world data, you could take a look at:&lt;br /&gt;
* http://www.theguardian.com/news/datablog/2010/oct/18/historic-government-spending-area#data&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Subversion&amp;diff=9471</id>
		<title>Subversion</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Subversion&amp;diff=9471"/>
		<updated>2014-11-07T11:24:03Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Modifying your Working Copy */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Pragmatic Programming]]&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
In this workshop, we'll look at using a particular Version Control System (VCS) called Subversion (often abbreviated to SVN).  Before getting into the nitty-gritty of using SVN, we'll pause to consider the motivations for adopting version control and also the key concepts that are common to most available systems. &lt;br /&gt;
&lt;br /&gt;
==Why is Version Control useful?==&lt;br /&gt;
&lt;br /&gt;
OK, here's the sales pitch:&lt;br /&gt;
&lt;br /&gt;
* It '''removes confusion''' about versions.  For example, you will no longer have to keep inventing names for different versions of essentially the same document, e.g. blah.old, blah.sav, blah.older, blah.newest2 (look familiar?).&lt;br /&gt;
* It makes '''collaborative working''' easier.  Version control assists in coordination: it removes any confusion about versions, highlights conflicts, allows the use of independent working copies, records log messages and much more besides.&lt;br /&gt;
* It makes '''distributing your code''' easier.  A version control repository can be visible to the world (often as a URL).  Using highly customisable access controls, you can allow some people (perhaps anyone) to download your project while specifying that only a select few may be trusted to upload changes.&lt;br /&gt;
* It makes '''reproducing experiments''' easier.  The ability to reproduce an experiment is a ''key characteristic of science''.  However, all too often, in the digital age, people are unable to run the same version of a model that they ran six months ago.  With version control, you can always access any previously committed version of your model.&lt;br /&gt;
* It aids '''disaster recovery'''.  Your computer is fried?  No problem!  Just check out your code on another machine and you're working productively again in minutes.&lt;br /&gt;
&lt;br /&gt;
=Version Control Concepts=&lt;br /&gt;
&lt;br /&gt;
A picture can be worth a thousand words, so let's try illustrating some of the key version control concepts, before wading into acres of text:&lt;br /&gt;
&lt;br /&gt;
[[Image:Svn-cartoon.jpg|700px|thumbnail|center|Files stored in a repository; a checkout; modification and commit.  All versions recorded.]]&lt;br /&gt;
[[Image:File-tree.jpg|400px|thumbnail|center|A checkout adds a copy of the files held in the repository to your local computer.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
usernames are countries:&lt;br /&gt;
&lt;br /&gt;
'''greece, germany, switzerland, egypt, ireland, cuba, finland, portugal, england, spain, russia, norway, canada, france, italy, japan'''&lt;br /&gt;
&lt;br /&gt;
files are capital cities:&lt;br /&gt;
&lt;br /&gt;
'''athens, berlin, bern, cairo, dublin, havana, helsinki, lisbon, london, madrid, moscow, oslo, ottawa, paris, rome, tokyo'''&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Subversion is a centralised version control system. Centralised version control means that a copy of your project is held in a central location called the '''repository''', and the subversion server logs all operations happening on the repository.  Every time something is changed in the repository, the server records the '''time and date''', the '''changes''', the '''author''' and a '''log message'''. The server can also be configured to control access, allowing some people actions that are disallowed for others. For instance, the server used in this practical allows anonymous read-only access, but only a selected number of people can change things.&lt;br /&gt;
&lt;br /&gt;
All the operations described above (logging and authentication) happen on the server. However, the server is only accessible directly to system administrators. To interact with the server, a user makes use of a subversion '''client'''. Some of you might already know about graphical subversion clients such as TortoiseSVN (see the screen grab below). This practical will show how the command line client can be used. The subversion client can be used to (1) request information from and (2) send information to the server. The client can also be used to get information about your '''working copy''', which is the local copy of the project that resides on your filespace. You can use the client to ask questions such as:&lt;br /&gt;
* which files have I modified since I last synchronised with the server?&lt;br /&gt;
* when was that file last modified?&lt;br /&gt;
* who, inadvertently, created a bug at line 18 in file foo.c?&lt;br /&gt;
* what has changed in that file?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=500px heights=350px perrow=2&amp;gt;&lt;br /&gt;
File:Tsvn_switch1.png|TortoiseSVN for MS Windows&lt;br /&gt;
File:Svn-cli.png|svn command line client for Linux&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Acquiring a Repository=&lt;br /&gt;
&lt;br /&gt;
For the purposes of this practical, you can get yourself a repository from one of the hosting sites that can be found out in the cloud.  Or you could use another repository that you have access to--perhaps hosted here in the University, some other portion of UK academia or elsewhere.  I've prepared the examples using a repository accessed via the GitHub website (https://github.com).  '''Note, however, that at the time of writing, a free repository on GitHub had to be readable by anyone.'''&lt;br /&gt;
&lt;br /&gt;
'''NB With that in mind, you may want to have a think about the right home for any of your intellectual property.'''&lt;br /&gt;
&lt;br /&gt;
OK, let's assume that you are happy to work with a GitHub hosted repository--at least for your initial steps learning about version control and subversion.  (A natty feature of GitHub repositories, at the time of writing, was that they could be used with both the Subversion and Git VCS.  GitHub has since withdrawn its Subversion support, so to follow along you may need another Subversion host or a repository on your own computer.)&lt;br /&gt;
&lt;br /&gt;
Registering and creating a repository is easy; just follow the instructions on the webpage:&lt;br /&gt;
&lt;br /&gt;
[[Image:Github.png|500px|thumbnail|center|The Github web interface]]&lt;br /&gt;
&lt;br /&gt;
Be sure to check the box: '''Initialize this repository with a README'''&lt;br /&gt;
&lt;br /&gt;
In addition to the command line client that we will describe in the following sections, you can manage your repository, view and even edit your files through the GitHub website: &lt;br /&gt;
&lt;br /&gt;
[[Image:Github-ggdagw-test.png|500px|thumbnail|center|A test repository]]&lt;br /&gt;
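&lt;br /&gt;
If you would rather practise without a hosting site, you can create a throwaway repository on your own disk.  A minimal sketch: use '''svnadmin create''' (part of the standard Subversion tools) and address the repository with a file:// URL instead of an https:// one.  The temporary directory used below is an assumption made for illustration.  The layout created matches the trunk/branches convention used in the rest of this page.&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
work=$(mktemp -d)                  # scratch area for this demonstration
trap 'rm -rf "$work"' EXIT         # remove it again when the script ends
svnadmin create "$work/repo"       # an empty repository, at revision 0
url="file://$work/repo"            # local repositories are addressed with file:// URLs

# Create the conventional top-level folders directly in the repository (one commit).
svn mkdir "$url/trunk" "$url/branches" -m "Create trunk and branches"

layout=$(svn list "$url")
echo "$layout"                     # lists branches/ and trunk/
svnlook youngest "$work/repo"      # prints the latest revision number: 1
```
&lt;br /&gt;
You could then check out $url/trunk in the same way as the GitHub URL used in the following sections.&lt;br /&gt;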
&lt;br /&gt;
=svn: The Subversion Command Line Client=&lt;br /&gt;
The subversion command line client is called '''svn'''. To execute a subversion command, simply type:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn command arguments&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Many commands also accept options, which are given with two dashes (a number of them also have a single-dash short form, such as -m for --message):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn command arguments --option optionvalue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Subversion provides extensive help about the commands to use. To get help for a particular subversion command, simply use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn help command&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Checkout a Working Copy=&lt;br /&gt;
&lt;br /&gt;
Now that you have access to a repository, let's create a '''working copy''' of the files in the repository.  To do this we use the '''svn checkout''' command, or '''svn co''', for short:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ svn co https://github.com/ggdagw/test ./test&lt;br /&gt;
A    test/branches&lt;br /&gt;
A    test/trunk&lt;br /&gt;
A    test/trunk/README.md&lt;br /&gt;
Checked out revision 1.&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This command gets a copy of the content at the URL and places it in a new directory called test.  The letter &amp;quot;A&amp;quot; simply means that these files have been added to your working copy.  You'll also notice two subdirectories called '''trunk''' and '''branches'''.  This layout follows a common convention.  The main strand of development is kept in the ''trunk'', while the ''branches'' folder holds variants of the trunk version, should you need them (more on that later).  (As far as subversion itself is concerned, this is purely convention: &amp;quot;trunk&amp;quot; and &amp;quot;branches&amp;quot; are merely two folders under the URL.)&lt;br /&gt;
&lt;br /&gt;
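The content that you saw through your browser is now in your own file space. You may also notice a hidden directory called &amp;quot;.svn&amp;quot;, which holds subversion's own bookkeeping.  Older clients put one in every directory of the working copy; since Subversion 1.7 there is a single one at the top. '''It is very important that you do not touch .svn directories.'''&lt;br /&gt;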
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ cd test/&lt;br /&gt;
gethin@gethin-desktop:~/test$ ls -al&lt;br /&gt;
total 28&lt;br /&gt;
drwxr-xr-x   5 gethin gethin  4096 2013-07-26 12:29 .&lt;br /&gt;
drwxr-xr-x 117 gethin gethin 12288 2013-07-26 12:29 ..&lt;br /&gt;
drwxr-xr-x   3 gethin gethin  4096 2013-07-26 12:29 branches&lt;br /&gt;
drwxr-xr-x   6 gethin gethin  4096 2013-07-26 12:29 .svn&lt;br /&gt;
drwxr-xr-x   3 gethin gethin  4096 2013-07-26 12:29 trunk&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Modifying your Working Copy=&lt;br /&gt;
&lt;br /&gt;
Right oh.  The working copy is yours to work with so let's go ahead and modify the README.md file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test$ cd trunk&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ ls&lt;br /&gt;
README.md&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ emacs -nw README.md &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(See, e.g. https://www.acrc.bris.ac.uk/acrc/pdf/emacs.pdf, if you'd like to use the emacs text editor, but are new to it.)&lt;br /&gt;
&lt;br /&gt;
To see what files you have modified, you ask the client for the status of your working copy:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn status&lt;br /&gt;
M       README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status shows the letter &amp;quot;M&amp;quot; for README.md, indicating it has been modified.&lt;br /&gt;
&lt;br /&gt;
Note that this status only shows the things that have changed in '''your''' working copy.  It does not show any changes made by others, either in the repository or in their own working copies.&lt;br /&gt;
&lt;br /&gt;
You can also add a new file.  Let's add a file called '''foo.txt''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ touch foo.txt&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn status&lt;br /&gt;
?       foo.txt&lt;br /&gt;
M       README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The question mark shows that the subversion client knows nothing about the new file (i.e. it is not currently under version control).  svn will not version a new file, or include it in a commit, until you tell it to.  To indicate that a new file should be versioned, use the '''add''' command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn add foo.txt&lt;br /&gt;
A         foo.txt&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn status&lt;br /&gt;
A       foo.txt&lt;br /&gt;
M       README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The letter &amp;quot;A&amp;quot; is used to indicate an addition.&lt;br /&gt;
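&lt;br /&gt;
The status codes seen so far (M, ? and A) can be reproduced in a self-contained sketch that uses a throwaway local repository instead of GitHub.  The repository, the README.md contents and the file names are illustrative assumptions.&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
svnadmin create "$work/repo"
url="file://$work/repo"

# Seed the repository with a trunk holding a README.md, then check it out.
mkdir "$work/seed"
echo "Hello" > "$work/seed/README.md"
svn import -q "$work/seed" "$url/trunk" -m "Initial import"
svn checkout -q "$url/trunk" "$work/wc"
cd "$work/wc"

echo "More text" >> README.md      # change a versioned file    -> M
touch foo.txt                      # create an unversioned file -> ?
before=$(svn status foo.txt)
svn add -q foo.txt                 # schedule it for addition   -> A
after=$(svn status)
printf '%s\n--\n%s\n' "$before" "$after"
```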
&lt;br /&gt;
=Recording Changes in the Repository=&lt;br /&gt;
&lt;br /&gt;
Sending changes to the repository is called a '''commit'''.  Here's the command I used to send our two local modifications:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn commit --message &amp;quot;Added text to README.md and added the empty file foo.txt&amp;quot;&lt;br /&gt;
Sending        trunk/README.md&lt;br /&gt;
Adding         trunk/foo.txt&lt;br /&gt;
Transmitting file data ..&lt;br /&gt;
Committed revision 2.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the '''--message''', or '''-m''' for short, allows us to write a log message inline.&lt;br /&gt;
&lt;br /&gt;
Notice the revision number.  These numbers encode the state of the whole repository at a given juncture and are the passport to retrieving earlier versions of your project.  As you commit future changes to your repository, your revision numbers will steadily increase.    &lt;br /&gt;
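&lt;br /&gt;
To watch the revision numbers climb, here is a sketch that makes a few commits to a throwaway local repository.  It then asks '''svnlook youngest''', an administrative tool that reads the repository directly, for the latest revision.  The file name and log messages are illustrative.&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
svnadmin create "$work/repo"
url="file://$work/repo"

svn mkdir -q "$url/trunk" -m "Create trunk"            # revision 1
svn checkout -q "$url/trunk" "$work/wc"
cd "$work/wc"

echo "first line" > notes.txt
svn add -q notes.txt
svn commit -q -m "Add notes.txt"                        # revision 2
echo "second line" >> notes.txt
svn commit -q --message "Extend notes.txt"              # revision 3 (long form of -m)

rev=$(svnlook youngest "$work/repo")
echo "The repository is now at revision $rev"
```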
&lt;br /&gt;
Sometimes, you want a long message to go with a commit.  To do this, simply execute the commit without the --message option.  A text editor will then pop up for you to write the message in; once you save and exit, the commit goes ahead.  Note that svn uses the editor indicated by the '''EDITOR''' environment variable.  The editor often defaults to vi if this variable is undefined.  If you are an emacs fan, set the variable first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export EDITOR=emacs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Note that you can use ''':q!''' to get out of vi, if you started it by accident.  You could also set EDITOR=nano or gedit etc.  You can also use the SVN_EDITOR environment variable, which, if set, takes precedence over EDITOR.)&lt;br /&gt;
&lt;br /&gt;
=Revert: Your &amp;quot;Get-Out-of-Jail Card&amp;quot;=&lt;br /&gt;
&lt;br /&gt;
Just as we can add files, we can '''delete''', for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn delete foo.txt&lt;br /&gt;
D         foo.txt&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ ls&lt;br /&gt;
README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The letter &amp;quot;D&amp;quot; indicates deletions and we see from typing 'ls' that the file has gone.&lt;br /&gt;
&lt;br /&gt;
Subversion allows you to '''revert''' changes when you have made an error.  Let's assume that 'foo.txt' was deleted in error.  Fear not, you can get it back with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn revert foo.txt&lt;br /&gt;
Reverted 'foo.txt'&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ ls&lt;br /&gt;
foo.txt  README.md&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn status&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and foo.txt is back!  A silent return from svn status (svn stat for short) indicates that there are no pending modifications in your working copy.  Put another way, your working copy exactly matches the revision you last checked out or updated to.&lt;br /&gt;
&lt;br /&gt;
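Here is the same delete-then-revert sequence as a self-contained sketch against a throwaway local repository (the repository and file contents are illustrative assumptions).  Note that '''revert''' only undoes local changes that have not yet been committed.&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
svnadmin create "$work/repo"
url="file://$work/repo"
mkdir "$work/seed"
echo "keep me" > "$work/seed/foo.txt"
svn import -q "$work/seed" "$url/trunk" -m "Initial import"
svn checkout -q "$url/trunk" "$work/wc"
cd "$work/wc"

svn delete -q foo.txt        # schedules the deletion and removes the local file
deleted=$(svn status)        # D       foo.txt
svn revert foo.txt           # cancels the scheduled deletion and restores the file
restored=$(svn status)       # empty: nothing pending
cat foo.txt                  # keep me
```
&lt;br /&gt;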
=Updating your Working Copy=&lt;br /&gt;
&lt;br /&gt;
You can '''update''' your working copy to synchronise it with the latest version (known as the HEAD) held in the repository.  The general form of the update command is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn update&lt;br /&gt;
... &amp;lt;- list of files that have been added/modified&lt;br /&gt;
At revision X.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If I update my working copy now:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn update&lt;br /&gt;
At revision 2.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
we see an empty list of files--i.e. there is nothing to update and my working copy perfectly matches the HEAD of the repository.&lt;br /&gt;
&lt;br /&gt;
That needn't be the case, however.  Let's imagine that you and a collaborator in Japan have access to the repository.  You obviously work independently and, for good measure, in different time zones.  Your collaborator may have committed some changes to the repository since you were last in front of a computer.  That being the case, an update will bring all of her changes to your working copy.&lt;br /&gt;
&lt;br /&gt;
A similar situation can arise if you are simultaneously operating two checkouts, perhaps one at work and another on your home computer.  Suppose you did some work at home yesterday evening and committed the fruits of your labours.  An update will then bring your working copy at work in line.&lt;br /&gt;
&lt;br /&gt;
You can even update your working copy if you have some local modifications pending.  In that situation, SVN will attempt to merge your changes with your collaborator's.  If you have both edited the same line in a file, a '''conflict''' is flagged.  More on that possibility later.&lt;br /&gt;
&lt;br /&gt;
With all the foregoing in mind, '''status''', '''commit''' and '''update''' will probably be your most widely used commands:&lt;br /&gt;
# '''update''' regularly to bring in other people's work&lt;br /&gt;
# '''status''' to make sure all is well&lt;br /&gt;
# '''commit''' frequently so that you can always recover a version you care about&lt;br /&gt;
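&lt;br /&gt;
You can try the two-checkout scenario described above on a single machine.  The sketch below uses a throwaway local repository and two working copies, called office and home here purely for illustration.  A file committed from one copy appears in the other after an update.&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
svnadmin create "$work/repo"
url="file://$work/repo"
svn mkdir -q "$url/trunk" -m "Create trunk"

# Two independent working copies of the same trunk (names are illustrative).
svn checkout -q "$url/trunk" "$work/office"
svn checkout -q "$url/trunk" "$work/home"

# Yesterday evening, at home: add a file and commit it.
echo "evening work" > "$work/home/plan.txt"
svn add -q "$work/home/plan.txt"
svn commit -q -m "Add plan.txt" "$work/home"

# Next morning, at the office: update to pick up the new HEAD revision.
svn update "$work/office"
```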
&lt;br /&gt;
=Investigating Changes=&lt;br /&gt;
&lt;br /&gt;
This section highlights some commands that can be used to boost productivity.&lt;br /&gt;
&lt;br /&gt;
==log==&lt;br /&gt;
To get a log of what happened in the repository, use the log command. To see the files that have been modified as well as the log messages, use the --verbose option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn log --verbose&lt;br /&gt;
------------------------------------------------------------------------&lt;br /&gt;
r2 | ggdagw | 2013-07-26 14:58:08 +0100 (Fri, 26 Jul 2013) | 2 lines&lt;br /&gt;
Changed paths:&lt;br /&gt;
   M /trunk/README.md&lt;br /&gt;
   A /trunk/foo.txt&lt;br /&gt;
&lt;br /&gt;
Added text to README.md and added the empty file foo.txt&lt;br /&gt;
&lt;br /&gt;
------------------------------------------------------------------------&lt;br /&gt;
r1 | ggdagw | 2013-07-26 12:28:32 +0100 (Fri, 26 Jul 2013) | 2 lines&lt;br /&gt;
Changed paths:&lt;br /&gt;
   A /branches&lt;br /&gt;
   A /trunk&lt;br /&gt;
   A /trunk/README.md&lt;br /&gt;
&lt;br /&gt;
Initial commit&lt;br /&gt;
&lt;br /&gt;
------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also invoke the log command on a particular file/path and provide a range of revisions.&lt;br /&gt;
For instance to see which commits affected file1 between revisions 4 and 6, one could use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn log --verbose --revision 4:6 file1&lt;br /&gt;
... &amp;lt;- log output&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==diff==&lt;br /&gt;
After you have modified something, it can be handy to highlight what you've done.  You can do this using the '''diff''' command.&lt;br /&gt;
&lt;br /&gt;
For instance, add some text to 'README.md' and use '''diff''' to see what you have done.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn stat&lt;br /&gt;
M       README.md&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn diff README.md &lt;br /&gt;
Index: README.md&lt;br /&gt;
===================================================================&lt;br /&gt;
--- README.md	(revision 2)&lt;br /&gt;
+++ README.md	(working copy)&lt;br /&gt;
@@ -7,3 +7,16 @@&lt;br /&gt;
 ------------&lt;br /&gt;
 &lt;br /&gt;
 This file is formatted in [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) and will be automatically rendered on your GitHub webpage.&lt;br /&gt;
+&lt;br /&gt;
+Here is an itemised list:&lt;br /&gt;
+* bread&lt;br /&gt;
+* butter&lt;br /&gt;
+* marmalade&lt;br /&gt;
+&lt;br /&gt;
+A Table:&lt;br /&gt;
+&lt;br /&gt;
+| Name    | Colour        | Price         |&lt;br /&gt;
+| ------- |:-------------:|--------------:|&lt;br /&gt;
+| Thomas  | centered      | right-aligned |&lt;br /&gt;
+| Gordon  | blue          |         £3.56 |&lt;br /&gt;
+| Henry   |         green | £2.81         |&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also use diff to highlight differences between two versions of some file, as stored in the repository:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn diff -r73:74 foo.txt&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
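&lt;br /&gt;
Both '''log''' and '''diff''' accept a revision range.  The following sketch creates a small history in a throwaway local repository (the file and its contents are illustrative).  It then inspects that history with '''svn log -r''' and '''svn diff -r'''.&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
svnadmin create "$work/repo"
url="file://$work/repo"
svn mkdir -q "$url/trunk" -m "Create trunk"                  # r1
svn checkout -q "$url/trunk" "$work/wc"
cd "$work/wc"

printf 'bread\nbutter\n' > list.txt
svn add -q list.txt
svn commit -q -m "Start shopping list"                        # r2
printf 'bread\nbutter\nmarmalade\n' > list.txt
svn commit -q -m "Add marmalade"                              # r3
svn update -q                # avoid a mixed-revision working copy

history=$(svn log -q -r 2:3 list.txt)   # one entry per commit that touched list.txt
changes=$(svn diff -r 2:3 list.txt)     # what changed between r2 and r3
echo "$history"
echo "$changes"
```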
&lt;br /&gt;
==blame (praise)==&lt;br /&gt;
Sometimes, you want to know who wrote a particular bit of code. Subversion makes that easy with the '''blame''' command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn blame file2&lt;br /&gt;
     2   jprenaud Added some stuff&lt;br /&gt;
     3   jprenaud Another line&lt;br /&gt;
     4   jprenaud A third line.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You see the content of file2 and for each line the name of the author and the revision number. You could then fetch the log message for that particular revision to get more information.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn log file2 --revision 3&lt;br /&gt;
------------------------------------------------------------------------&lt;br /&gt;
r3 | jprenaud | 2008-05-14 16:03:45 +0100 (Wed, 14 May 2008) | 1 line&lt;br /&gt;
&lt;br /&gt;
More things.&lt;br /&gt;
------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, this feature is not available for GitHub-hosted repositories:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn blame README.md &lt;br /&gt;
svn: Server does not support custom revprops via log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
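&lt;br /&gt;
The error above is a limitation of GitHub's Subversion bridge rather than of Subversion itself.  As a sketch, blame works as described against a throwaway local repository.  The file2 contents below are illustrative, and the author column will typically show your own login name.&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
svnadmin create "$work/repo"
url="file://$work/repo"
svn mkdir -q "$url/trunk" -m "Create trunk"          # r1
svn checkout -q "$url/trunk" "$work/wc"
cd "$work/wc"

echo "Added some stuff" > file2
svn add -q file2
svn commit -q -m "First line"                         # r2
echo "Another line" >> file2
svn commit -q -m "More things"                        # r3

svn blame file2                              # revision and author for every line
revs=$(svn blame file2 | awk '{print $1}')   # just the revision column: 2, then 3
svn log -r 3 file2                           # the log message behind revision 3
```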
&lt;br /&gt;
=Conflicts=&lt;br /&gt;
&lt;br /&gt;
Sometimes, a commit or an update will fail because of conflicting changes. As a rule, you should always update before a commit, so the example here will show a conflict created by an update.&lt;br /&gt;
&lt;br /&gt;
==Creating the conflict==&lt;br /&gt;
&lt;br /&gt;
As mentioned previously, conflicts arise when SVN cannot merge together changes to the same file, typically because both sets of changes touch the same line(s).&lt;br /&gt;
&lt;br /&gt;
You can manufacture a conflict using two checkouts--let's call them A and B.  I could create two such working copies by typing the following:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn co https://github.com/ggdagw/test ./testA&lt;br /&gt;
svn co https://github.com/ggdagw/test ./testB&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is nothing to stop me having multiple checkouts on the same computer.  Now that we have the raw materials:&lt;br /&gt;
&lt;br /&gt;
# Ensure that both A and B are up-to-date.&lt;br /&gt;
# Edit line 1 of README.md in A and commit.&lt;br /&gt;
# Edit line 1 of README.md in B--do not commit.&lt;br /&gt;
# Now attempt to update B.&lt;br /&gt;
&lt;br /&gt;
Et voila, you will have a conflict.  Since SVN cannot resolve it, we must apply the old human grey matter to the task.  If you were working with a collaborator, this may well involve a phone or email conversation to decide on the best course of action. &lt;br /&gt;
&lt;br /&gt;
The update does not immediately fail.  Rather, you are presented with some options:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Conflict discovered in 'README.md'.&lt;br /&gt;
Select: (p) postpone, (df) diff-full, (e) edit,&lt;br /&gt;
        (mc) mine-conflict, (tc) theirs-conflict,&lt;br /&gt;
        (s) show all options: df&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you choose &amp;quot;df&amp;quot;, then you will be presented with a summary of the 3-way difference.  It shows the conflicting lines as they were before either change (the original), as they are in your working copy (mine) and as they currently are in the repository (theirs).  You will also be presented with the list of options again.  If you choose mine-conflict, &amp;quot;mc&amp;quot;, your local modifications will be preferred for the conflicting lines--at least in this working copy, since nothing has been committed back at this stage.  Theirs-conflict, &amp;quot;tc&amp;quot;, will prefer the repository version.  If you elect to postpone, &amp;quot;p&amp;quot;, then 'README.md' is flagged with the letter &amp;quot;C&amp;quot; indicating a conflict, and you will notice new files in your working copy:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$svn status&lt;br /&gt;
?      README.md.r5&lt;br /&gt;
?      README.md.r6&lt;br /&gt;
?      README.md.mine&lt;br /&gt;
C      README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* README.md.r5 is README.md as at revision 5 (i.e. the one at your last update)&lt;br /&gt;
* README.md.r6 is README.md at revision 6 (i.e. the one that is on the repository now)&lt;br /&gt;
* README.md.mine is README.md as it was in your working copy before the update&lt;br /&gt;
* README.md contains an attempt at merging the changes (this will be similar to what you see with &amp;quot;df&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
The remaining option, edit (e), will present to you the attempted merge in a text editor, for you to resolve as you see fit.  Note that if you type &amp;quot;svn status&amp;quot; after editing README.md in this situation, you will see that the file is still marked as conflicted.  You will not be able to commit your changes until you have '''resolved''' the conflict by typing, e.g., the command below.  In Subversion 1.5 and later, the preferred equivalent is '''svn resolve --accept working README.md''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn resolved README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
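&lt;br /&gt;
The four steps above can also be scripted, which is a handy way to rehearse conflict resolution.  This sketch uses a throwaway local repository; the directory names testA and testB and the file contents are illustrative.  '''--accept postpone''' tells the update to leave the conflict in place rather than prompting.  '''svn resolve --accept working''' marks the conflict as resolved once the file has been fixed by hand.&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
svnadmin create "$work/repo"
url="file://$work/repo"
mkdir "$work/seed"
echo "original line" > "$work/seed/README.md"
svn import -q "$work/seed" "$url/trunk" -m "Initial import"     # r1
svn checkout -q "$url/trunk" "$work/testA"                        # step 1
svn checkout -q "$url/trunk" "$work/testB"

echo "line from A" > "$work/testA/README.md"                     # step 2
svn commit -q -m "Edit line 1 in A" "$work/testA"                # r2
echo "line from B" > "$work/testB/README.md"                     # step 3

# Step 4: update B, postponing the conflict instead of answering a prompt.
# '|| true' keeps the script running even if svn signals the conflict
# through a non-zero exit status.
svn update -q --accept postpone "$work/testB" || true
conflicted=$(svn status "$work/testB")
echo "$conflicted"                                               # C ... README.md

# Settle it by hand (here: keep B's text), mark it resolved, then commit.
echo "line from B" > "$work/testB/README.md"
svn resolve --accept working "$work/testB/README.md"
svn commit -q -m "Resolve conflict in README.md" "$work/testB"   # r3
```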
&lt;br /&gt;
=Other useful commands=&lt;br /&gt;
&lt;br /&gt;
==info==&lt;br /&gt;
&lt;br /&gt;
From inside a working copy, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn info&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn info&lt;br /&gt;
Path: .&lt;br /&gt;
URL: https://github.com/ggdagw/test/trunk&lt;br /&gt;
Repository Root: https://github.com/ggdagw/test&lt;br /&gt;
Repository UUID: be566c2d-dc09-ebaf-f5e5-ce57b7db7bff&lt;br /&gt;
Revision: 3&lt;br /&gt;
Node Kind: directory&lt;br /&gt;
Schedule: normal&lt;br /&gt;
Last Changed Author: ggdagw&lt;br /&gt;
Last Changed Rev: 3&lt;br /&gt;
Last Changed Date: 2013-07-26 16:03:50 +0100 (Fri, 26 Jul 2013)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==list==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test2/trunk$ svn list https://github.com/ggdagw/test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
branches/&lt;br /&gt;
trunk/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test2/trunk$ svn list https://github.com/ggdagw/test/trunk&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
README.md&lt;br /&gt;
foo.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==move &amp;amp; copy==&lt;br /&gt;
If you rename a file or directory manually, you lose its history, because subversion needs to be told that a tracked file or directory has a new name. It is simpler to use the subversion '''move''' command. For instance, to rename &amp;quot;file2&amp;quot;, do:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn move file2 new_file2&lt;br /&gt;
A         new_file2&lt;br /&gt;
D         file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notice that the new file is scheduled for addition and the old one for deletion. You could have done this by hand, but the advantage of '''move''' is that the file's history from before the rename is still available under its new name.&lt;br /&gt;
&lt;br /&gt;
A close relative of '''move''' is '''copy'''.  This creates a new file that carries a copy of the revision history of its template:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn copy havana havana2&lt;br /&gt;
A         havana2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
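&lt;br /&gt;
To see that the history survives both operations, you can run the minimal sketch below. It assumes the svn and svnadmin tools are installed and works on a throwaway local repository reached through a file:// URL, so no real project is touched. The sandbox path and the file contents are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
# Throwaway repository and working copy (hypothetical paths, left in place for inspection)
SANDBOX=$(mktemp -d)
REPO="$SANDBOX/repo"
svnadmin create "$REPO"
svn checkout -q "file://$REPO" "$SANDBOX/wc"
cd "$SANDBOX/wc"

echo "Added some stuff" > file2
svn add -q file2
svn commit -q -m "Add file2"          # r1
svn update -q                         # keep the working copy at a single revision

svn move file2 new_file2              # schedules new_file2 for addition, file2 for deletion
svn commit -q -m "Rename file2"       # r2
svn update -q

svn copy new_file2 havana2            # havana2 starts life with new_file2's history
svn commit -q -m "Copy new_file2"     # r3

# Both logs reach back to r1, when the file was still called file2
svn log -q new_file2                  # r2, r1
svn log -q havana2                    # r3, r2, r1
```
&lt;br /&gt;
Had file2 been renamed with a plain mv followed by svn add and svn delete, the log of the new name would start at the rename.&lt;br /&gt;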
&lt;br /&gt;
==import==&lt;br /&gt;
A newly created repository is empty. To populate it, you can use the '''import''' command, which sends the contents of a local folder straight into the repository (hence the name: the repository imports something). The syntax is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn import PATH URL/trunk --message &amp;quot;Log message.&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* PATH is the path to the local folder (if omitted, it defaults to &amp;quot;./&amp;quot;, i.e. the current folder)&lt;br /&gt;
* URL is the full URL of the repository. In the example, &amp;quot;trunk/&amp;quot; has also been appended to the URL; the trunk directory is created automatically if it does not exist yet.&lt;br /&gt;
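&lt;br /&gt;
The minimal sketch below shows the whole round trip: it imports a small local folder into a brand-new repository and then lists what arrived. It assumes svn and svnadmin are installed and uses a throwaway file:// repository. The folder myproject and its files are hypothetical stand-ins for PATH:&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
SANDBOX=$(mktemp -d)                   # hypothetical scratch area
REPO="$SANDBOX/repo"
svnadmin create "$REPO"                # a new repository starts empty (revision 0)

# A local folder standing in for PATH
mkdir -p "$SANDBOX/myproject/src"
echo "# My project" > "$SANDBOX/myproject/README.md"
echo "hello" > "$SANDBOX/myproject/src/hello.txt"

# trunk does not exist yet; the import creates it in the same commit (r1)
svn import -q "$SANDBOX/myproject" "file://$REPO/trunk" --message "Initial import."

svn list --recursive "file://$REPO"    # trunk/, trunk/README.md, trunk/src/, ...
```
&lt;br /&gt;
Note that myproject itself does not become a working copy. Check out the trunk URL to start working on the imported files.&lt;br /&gt;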
&lt;br /&gt;
==mkdir==&lt;br /&gt;
People often ask how they can create the &amp;quot;branches/&amp;quot; directory in the repository to store specific versions of their code. You can do this by running '''svn mkdir''' directly on a repository URL, which creates the directory on the server and commits it immediately. The syntax is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn mkdir URL/branches --message &amp;quot;Log message.&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
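&lt;br /&gt;
Here is a minimal sketch, assuming a throwaway file:// repository stands in for URL (the sandbox path is hypothetical). You can give several URLs at once, and they are all created in a single commit:&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
SANDBOX=$(mktemp -d)
REPO="$SANDBOX/repo"
svnadmin create "$REPO"
URL="file://$REPO"

# Both directories are created on the server in one revision (r1); no working copy needed
svn mkdir "$URL/trunk" "$URL/branches" --message "Create trunk and branches."

svn list "$URL"                        # branches/ and trunk/
```
&lt;br /&gt;
If intermediate directories are missing too, add the --parents option and svn mkdir will create them as well.&lt;br /&gt;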
&lt;br /&gt;
==export==&lt;br /&gt;
Sometimes you want the files from the version control system without a working copy, for instance because you are going to send them to somebody who is not involved in the development. This is the command that was used for the [[Linux1]] and [[Linux2]] practicals.&lt;br /&gt;
&lt;br /&gt;
You could do a checkout and remove the hidden &amp;quot;.svn&amp;quot; directories manually, but the easiest way is to use the '''export''' command. It works exactly like a checkout, except that you end up with a plain local folder, not a working copy. The syntax is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn export URL PATH&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
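&lt;br /&gt;
A sketch like the one below makes the difference between a checkout and an export easy to check. It assumes a throwaway file:// repository; the directory names wc and handout and the file practical.txt are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
SANDBOX=$(mktemp -d)
REPO="$SANDBOX/repo"
svnadmin create "$REPO"
mkdir "$SANDBOX/files"
echo "Exercise 1" > "$SANDBOX/files/practical.txt"
svn import -q "$SANDBOX/files" "file://$REPO/trunk" -m "Practical files."

svn checkout -q "file://$REPO/trunk" "$SANDBOX/wc"       # a working copy
svn export -q "file://$REPO/trunk" "$SANDBOX/handout"    # a plain folder

ls -A "$SANDBOX/wc"          # .svn and practical.txt
ls -A "$SANDBOX/handout"     # practical.txt only
```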
&lt;br /&gt;
==branching and merge==&lt;br /&gt;
&lt;br /&gt;
Subversion supports multiple strands of development through branches.  The svn '''copy''' command makes a (space-efficient) copy of an entire file tree from, say, the trunk to a subdirectory of &amp;quot;branches&amp;quot;.  A popular reason for creating a branch is to let a particular developer (or team) work on something speculative or disruptive without affecting the trunk:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn copy https://svn.ggy.bris.ac.uk/subversion/ourproject/trunk \&lt;br /&gt;
https://svn.ggy.bris.ac.uk/subversion/ourproject/branches/sally_dev \&lt;br /&gt;
-m &amp;quot;The reason why I'm branching is...&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When your project has several branches, you may want to merge the changes from one branch into another. For instance, somebody may have fixed a bug on a branch that is still present in the trunk, and you want to apply that fix to the trunk as well. Subversion lets you do this with a '''merge''' operation. This page only touches on the topic; see the [http://svnbook.red-bean.com Subversion Red Book] for more information about merging.&lt;br /&gt;
&lt;br /&gt;
[[Image:Svn_merge_general.jpg|frame|centre|General situation for svn merge]]&lt;br /&gt;
&lt;br /&gt;
For example, the following command will merge the changes committed on the sally_dev branch of ourproject after revision revA, up to and including revision revB, into your working copy:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn merge -r revA:revB https://svn.ggy.bris.ac.uk/subversion/ourproject/branches/sally_dev&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can then commit these changes, should you so desire. They will end up on whichever line of development (trunk or branch) your working copy was checked out from.&lt;br /&gt;
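&lt;br /&gt;
The minimal end-to-end sketch below puts branching and merging together. It assumes a throwaway file:// repository in place of the ourproject URL above. The file name havana and the revision numbers in the comments apply to this sandbox only. Here &amp;quot;-r 3:4&amp;quot; applies the changes committed after revision 3, up to and including revision 4:&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
SANDBOX=$(mktemp -d)
REPO="$SANDBOX/repo"
URL="file://$REPO"
svnadmin create "$REPO"
svn mkdir -q "$URL/trunk" "$URL/branches" -m "Standard layout."         # r1

svn checkout -q "$URL/trunk" "$SANDBOX/trunk_wc"
cd "$SANDBOX/trunk_wc"
echo "line one" > havana
svn add -q havana
svn commit -q -m "Add havana."                                          # r2

# Branch: a cheap, server-side copy of the whole trunk
svn copy -q "$URL/trunk" "$URL/branches/sally_dev" -m "Branch for Sally."   # r3

# Sally fixes a bug on her branch
svn checkout -q "$URL/branches/sally_dev" "$SANDBOX/sally_wc"
echo "bug fix" >> "$SANDBOX/sally_wc/havana"
svn commit -q -m "Fix bug on the branch." "$SANDBOX/sally_wc"           # r4

# Back on the trunk: update first (recent svn clients refuse to merge
# into a mixed-revision working copy), then merge and commit
svn update -q
svn merge -r 3:4 "$URL/branches/sally_dev"
svn commit -q -m "Merge r4 from sally_dev."                             # r5
cat havana                                                              # line one, then bug fix
```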
&lt;br /&gt;
=To go further=&lt;br /&gt;
The [http://svnbook.red-bean.com/ Subversion Red Book] is the bible of subversion. Highly recommended.&lt;br /&gt;
&lt;br /&gt;
In the book, you can see how to create your own repository, should you wish.  For example, a few simple setup commands will give you a working repository accessed via the filesystem (i.e. the repository is on the same computer that you typically work on), or via SSH (i.e. you have SSH access to the machine that will host the repository).  See below.  '''However, note two things:'''&lt;br /&gt;
* '''It is dangerous to create a repository on a filesystem that is not backed up.  You could lose all your work that way.'''&lt;br /&gt;
* '''You can request a (backed-up) University of Bristol repository simply by contacting the service desk.'''&lt;br /&gt;
&lt;br /&gt;
==Creating our own Repository==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svnadmin create $HOME/my_test_repo&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next you might import some files (that you have stored in a directory called 'projectX').  If you're working on the same filesystem you would use: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd projectX&lt;br /&gt;
svn import . file://$HOME/my_test_repo/trunk -m &amp;quot;my import message&amp;quot;&lt;br /&gt;
svn list file://$HOME/my_test_repo/trunk&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Via SSH, this would translate to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd projectX&lt;br /&gt;
svn import . svn+ssh://user@host/absolute/path/to/my_test_repo/trunk -m &amp;quot;my import message&amp;quot;&lt;br /&gt;
svn list svn+ssh://user@host/absolute/path/to/my_test_repo/trunk&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
NB where the components of '''user@host/absolute/path/to''' are:&lt;br /&gt;
* user: your username on the remote machine hosting your repository&lt;br /&gt;
* host: the hostname of the remote machine&lt;br /&gt;
* /absolute/path/to: you must specify the absolute path to your repository ($HOME would be expanded by your shell on the machine you are SSH'ing ''from'', not on the remote host)&lt;br /&gt;
&lt;br /&gt;
And you're away!&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Subversion&amp;diff=9470</id>
		<title>Subversion</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Subversion&amp;diff=9470"/>
		<updated>2014-11-05T14:34:12Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* To go further */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Pragmatic Programming]]&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
In this workshop, we'll look at using a particular Version Control System (VCS) called Subversion (often abbreviated to SVN).  Before getting into the nitty-gritty of using SVN, we'll pause to consider the motivations for adopting version control and also the key concepts that are common to most available systems. &lt;br /&gt;
&lt;br /&gt;
==Why is Version Control useful?==&lt;br /&gt;
&lt;br /&gt;
OK, here's the sales pitch:&lt;br /&gt;
&lt;br /&gt;
* It '''removes confusion''' about versions.  For example, you will no longer have to keep inventing names for different versions of essentially the same document  e.g. blah.old, blah.sav, blah.older, blah.newest2 (look familiar?).&lt;br /&gt;
* It makes '''collaborative working''' easier.  Version control assists in coordination as it removes any confusion about versions, highlights conflicts, allows the use of independent working copies, records log messages and much more besides.&lt;br /&gt;
* It makes '''distributing your code''' easier.  A version control repository can be visible to the world (often as a URL).  However, using some highly customisable access controls, you can allow some people (perhaps anyone) to download your project while trusting only a select few to upload changes.&lt;br /&gt;
* It makes '''reproducing experiments''' easier.  The ability to reproduce an experiment is a ''key characteristic of science''.  However, all too often, in the digital age, people are unable to run the same version of a model that they ran six months ago.  With version control, you can always access any previous version of your model.&lt;br /&gt;
* It aids '''disaster recovery'''.  Your computer is fried?  No problem!  Just check out your code onto another one and you're working productively again in minutes.&lt;br /&gt;
&lt;br /&gt;
=Version Control Concepts=&lt;br /&gt;
&lt;br /&gt;
A picture can be worth a thousand words, so let's try illustrating some of the key version control concepts, before wading into acres of text:&lt;br /&gt;
&lt;br /&gt;
[[Image:Svn-cartoon.jpg|700px|thumbnail|center|Files stored in a repository; a checkout; modification and commit.  All versions recorded.]]&lt;br /&gt;
[[Image:File-tree.jpg|400px|thumbnail|center|A checkout adds a copy of the files held in the repository to your local computer.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
usernames are countries:&lt;br /&gt;
&lt;br /&gt;
'''greece, germany, switzerland, egypt, ireland, cuba, finland, portugal, england, spain, russia, norway, canada, france, italy, japan'''&lt;br /&gt;
&lt;br /&gt;
files are capital cities:&lt;br /&gt;
&lt;br /&gt;
'''athens, berlin, bern, cairo, dublin, havana, helsinki, lisbon, london, madrid, moscow, oslo, ottawa, paris, rome, tokyo'''&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Subversion is a centralised version control system. Centralised version control means that a copy of your project is held in a central location called the '''repository''', and the subversion server logs all operations on the repository: every time something is changed in the repository, the server records the '''time and date''', the '''changes''', the '''author''' and a '''log message'''. The server can also be configured with access controls, allowing some people actions which are disallowed for others. For instance, in the practical, the server allows anonymous read-only access, but only a selected number of people can change things.&lt;br /&gt;
&lt;br /&gt;
All the operations described above (logging and authentication) happen on the server. However, only system administrators can access the server directly. To interact with it, a user runs a subversion '''client'''. Some of you might already know graphical subversion clients such as TortoiseSVN (see the screen grab below); this practical shows how to use the command line client. The subversion client can be used to (1) request information from and (2) send information to the server. It can also report on your '''working copy''', which is the local copy of the project that resides in your filespace. You can use the client to ask questions such as:&lt;br /&gt;
* which files have I modified since I last synchronised with the server?&lt;br /&gt;
* when was that file last modified?&lt;br /&gt;
* who, inadvertently, created a bug at line 18 in file foo.c?&lt;br /&gt;
* what has changed in that file?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=500px heights=350px perrow=2&amp;gt;&lt;br /&gt;
File:Tsvn_switch1.png|TortoiseSVN for MS Windows&lt;br /&gt;
File:Svn-cli.png|svn command line client for Linux&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Acquiring a Repository=&lt;br /&gt;
&lt;br /&gt;
For the purposes of this practical, you can get yourself a repository from one of the hosting sites that can be found out in the cloud.  Or you could use another repository that you have access to--perhaps hosted here in the University, some other portion of UK academia or elsewhere.  I've prepared the examples using a repository accessed via the GitHub website (https://github.com).  '''Note, however, that to obtain a free repository on GitHub, you must agree to it being readable by anyone.'''  &lt;br /&gt;
&lt;br /&gt;
'''NB With that in mind, you may want to have a think about the right home for any of your intellectual property.'''&lt;br /&gt;
&lt;br /&gt;
OK, let's assume that you are happy to work with a GitHub hosted repository--at least for your initial steps learning about version control and subversion.  (A natty feature of GitHub repositories is that they can be used with both Subversion and Git VCS.)&lt;br /&gt;
&lt;br /&gt;
Registering and creating a repository is easy, just follow the instructions on the webpage:&lt;br /&gt;
&lt;br /&gt;
[[Image:Github.png|500px|thumbnail|center|The Github web interface]]&lt;br /&gt;
&lt;br /&gt;
Be sure to check the box: '''Initialize this repository with a README'''&lt;br /&gt;
&lt;br /&gt;
In addition to the command line client that we will describe in the following sections, you can manage your repository, view and even edit your files through the GitHub website: &lt;br /&gt;
&lt;br /&gt;
[[Image:Github-ggdagw-test.png|500px|thumbnail|center|A test repository]]&lt;br /&gt;
&lt;br /&gt;
=svn: The Subversion Command Line Client=&lt;br /&gt;
The subversion command line client is called '''svn'''. To execute a subversion command, simply type:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn command arguments&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some commands can also use options which are given with dashes:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn command arguments --option optionvalue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Subversion provides extensive help about the commands to use. To get help for a particular subversion command, simply use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn help command&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Checkout a Working Copy=&lt;br /&gt;
&lt;br /&gt;
Now that you have access to a repository, let's create a '''working copy''' of the files in the repository.  To do this we use the '''svn checkout''' command, or '''svn co''', for short:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ svn co https://github.com/ggdagw/test ./test&lt;br /&gt;
A    test/branches&lt;br /&gt;
A    test/trunk&lt;br /&gt;
A    test/trunk/README.md&lt;br /&gt;
Checked out revision 1.&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This command gets a copy of the content at the URL and places it in a new directory called test.  The letter &amp;quot;A&amp;quot; simply means that these files have been added to your working copy.  You'll also notice two subdirectories called '''trunk''' and '''branches'''.  This pattern follows a common convention.  Usually, subversion repositories are organised so that the main strand of development is in the ''trunk''.  Sometimes it is useful to store variants of the trunk version (more of that later) and the ''branches'' folder exists to accommodate those.  (This is purely convention as far as subversion is concerned, however, and &amp;quot;trunk&amp;quot; and &amp;quot;branches&amp;quot; are merely two folders under the URL.)&lt;br /&gt;
&lt;br /&gt;
The content that you saw through your browser is now in your own file space. You may also notice hidden directories called &amp;quot;.svn&amp;quot;. '''It is very important that you do not touch these directories.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ cd test/&lt;br /&gt;
gethin@gethin-desktop:~/test$ ls -al&lt;br /&gt;
total 28&lt;br /&gt;
drwxr-xr-x   5 gethin gethin  4096 2013-07-26 12:29 .&lt;br /&gt;
drwxr-xr-x 117 gethin gethin 12288 2013-07-26 12:29 ..&lt;br /&gt;
drwxr-xr-x   3 gethin gethin  4096 2013-07-26 12:29 branches&lt;br /&gt;
drwxr-xr-x   6 gethin gethin  4096 2013-07-26 12:29 .svn&lt;br /&gt;
drwxr-xr-x   3 gethin gethin  4096 2013-07-26 12:29 trunk&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Modifying your Working Copy=&lt;br /&gt;
&lt;br /&gt;
Right oh.  The working copy is yours to work with so let's go ahead and modify the README.md file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test$ cd trunk&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ ls&lt;br /&gt;
README.md&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ emacs -nw README.md &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(See, e.g. http://refcards.com/docs/gildeas/gnu-emacs/emacs-refcard-a4.pdf, if you'd like to use the emacs text editor, but are new to it.)&lt;br /&gt;
&lt;br /&gt;
To see what files you have modified, you ask the client for the status of your working copy:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn status&lt;br /&gt;
M       README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status shows the letter &amp;quot;M&amp;quot; for README.md, indicating it has been modified.&lt;br /&gt;
&lt;br /&gt;
Note that this status only shows the things that have changed in '''your''' working copy.  It does not show any changes made by others, either in the repository or in their own working copies.&lt;br /&gt;
&lt;br /&gt;
You can also add a new file.  Let's add a file called '''foo.txt''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ touch foo.txt&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn status&lt;br /&gt;
?       foo.txt&lt;br /&gt;
M       README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The question mark shows that the subversion client knows nothing about the new file (i.e. it is not currently under version control).  svn does not start versioning new files on its own.  To indicate that a new file should be versioned, use the '''add''' command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn add foo.txt&lt;br /&gt;
A         foo.txt&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn status&lt;br /&gt;
A       foo.txt&lt;br /&gt;
M       README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The letter &amp;quot;A&amp;quot; is used to indicate an addition.&lt;br /&gt;
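&lt;br /&gt;
If you would like to experiment without touching a real repository, the sketch below reproduces the status letters from this section in a throwaway local repository. It assumes svn and svnadmin are installed. The sandbox path is hypothetical, and the file names mirror the example above:&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
SANDBOX=$(mktemp -d)
REPO="$SANDBOX/repo"
svnadmin create "$REPO"
mkdir "$SANDBOX/seed"
echo "Hello" > "$SANDBOX/seed/README.md"
svn import -q "$SANDBOX/seed" "file://$REPO/trunk" -m "Initial commit"
svn checkout -q "file://$REPO/trunk" "$SANDBOX/wc"
cd "$SANDBOX/wc"

echo "Some more text" >> README.md     # README.md will show as M (modified)
touch foo.txt                          # foo.txt will show as ? (not versioned)
svn status

svn add foo.txt                        # foo.txt now shows as A (scheduled for addition)
svn status
```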
&lt;br /&gt;
=Recording Changes in the Repository=&lt;br /&gt;
&lt;br /&gt;
Sending changes to the repository is called a '''commit'''.  Here's the command I used to send our two local modifications:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn commit --message &amp;quot;Added text to README.md and added the empty file foo.txt&amp;quot;&lt;br /&gt;
Sending        trunk/README.md&lt;br /&gt;
Adding         trunk/foo.txt&lt;br /&gt;
Transmitting file data ..&lt;br /&gt;
Committed revision 2.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the '''--message''' option, or '''-m''' for short, allows us to write a log message inline.&lt;br /&gt;
&lt;br /&gt;
Notice the revision number.  These numbers encode the state of the whole repository at a given juncture and are the passport to retrieving earlier versions of your project.  As you commit future changes to your repository, your revision numbers will steadily increase.    &lt;br /&gt;
&lt;br /&gt;
Sometimes, you want a long message to go with a commit.  To do this, simply run the commit without the --message option.  A text editor will then pop up for you to write the message in; when you save and exit, the commit goes ahead.  Note that svn uses the editor indicated by the '''EDITOR''' environment variable.  The editor often defaults to vi if this variable is undefined.  If you are an emacs fan, set the variable first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export EDITOR=emacs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Note that you can use ''':q!''' to get out of vi, if you started it by accident.  You could also set EDITOR=nano or gedit etc.  You can also use the SVN_EDITOR environment variable, which takes precedence over EDITOR.)&lt;br /&gt;
&lt;br /&gt;
=Revert: Your &amp;quot;Get-Out-of-Jail Card&amp;quot;=&lt;br /&gt;
&lt;br /&gt;
Just as we can add files, we can '''delete''', for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn delete foo.txt&lt;br /&gt;
D         foo.txt&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ ls&lt;br /&gt;
README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The letter &amp;quot;D&amp;quot; indicates deletions and we see from typing 'ls' that the file has gone.&lt;br /&gt;
&lt;br /&gt;
Subversion allows you to '''revert''' changes when you have made an error.  Let's assume that 'foo.txt' was deleted in error.  Fear not, you can get it back with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn revert foo.txt&lt;br /&gt;
Reverted 'foo.txt'&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ ls&lt;br /&gt;
foo.txt  README.md&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn status&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and foo.txt is back!  A silent return from svn status (svn stat for short) indicates that there are no pending modifications in your working copy.  Put another way, it exactly matches the repository version of the project as at your last checkout or update.&lt;br /&gt;
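&lt;br /&gt;
You can replay the delete-then-revert cycle safely in a sandbox. This sketch assumes a throwaway file:// repository whose trunk holds README.md and foo.txt (hypothetical contents):&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
SANDBOX=$(mktemp -d)
REPO="$SANDBOX/repo"
svnadmin create "$REPO"
mkdir "$SANDBOX/seed"
echo "Hello" > "$SANDBOX/seed/README.md"
touch "$SANDBOX/seed/foo.txt"
svn import -q "$SANDBOX/seed" "file://$REPO/trunk" -m "Initial commit"
svn checkout -q "file://$REPO/trunk" "$SANDBOX/wc"
cd "$SANDBOX/wc"

svn delete foo.txt      # D: scheduled for deletion and removed from disk
svn status
svn revert foo.txt      # cancels the deletion and restores the file
svn status              # no output: nothing pending
```
&lt;br /&gt;
svn revert also accepts --recursive (for example, svn revert --recursive .) to undo every pending change below a directory. Use it with care, because reverted uncommitted edits cannot be recovered.&lt;br /&gt;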
&lt;br /&gt;
=Updating your Working Copy=&lt;br /&gt;
&lt;br /&gt;
You can '''update''' your working copy to synchronise it with the latest version (known as the HEAD) held in the repository.  The general form of the update command is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn update&lt;br /&gt;
... &amp;lt;- list of files that have been added/modified&lt;br /&gt;
At revision X.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If I update my working copy now:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn update&lt;br /&gt;
At revision 2.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
we see an empty list of files--i.e. there is nothing to update and my working copy perfectly matches the HEAD of the repository.&lt;br /&gt;
&lt;br /&gt;
That needn't be the case, however.  Let's imagine that you and a collaborator in Japan have access to the repository.  You obviously work independently and, for good measure, in different time zones.  Your collaborator may have committed some changes to the repository since you were last in front of a computer.  That being the case, an update will bring all of her changes to your working copy.&lt;br /&gt;
&lt;br /&gt;
A similar situation can arise if you are simultaneously operating two checkouts, perhaps one at work and another on your home computer.  If you did some work at home yesterday evening and committed the fruits of your labours, an update will bring the working copy at work in line.&lt;br /&gt;
&lt;br /&gt;
You can even update your working copy if you have some local modifications pending.  In that situation, SVN will attempt to merge your changes with those from your collaborator in Japan.  If you have both edited the same line in a file, a '''conflict''' is flagged.  More on that possibility later.&lt;br /&gt;
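&lt;br /&gt;
You can simulate the &amp;quot;work and home&amp;quot; scenario with two checkouts of the same throwaway file:// repository, as in the minimal sketch below. The directory names work and home are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
SANDBOX=$(mktemp -d)
REPO="$SANDBOX/repo"
svnadmin create "$REPO"
mkdir "$SANDBOX/seed"
echo "Hello" > "$SANDBOX/seed/README.md"
svn import -q "$SANDBOX/seed" "file://$REPO/trunk" -m "Initial commit"   # r1

svn checkout -q "file://$REPO/trunk" "$SANDBOX/work"
svn checkout -q "file://$REPO/trunk" "$SANDBOX/home"

# Yesterday evening, at home
echo "Written at home" >> "$SANDBOX/home/README.md"
svn commit -q -m "Evening work" "$SANDBOX/home"                          # r2

# This morning, at work: the working copy stays at r1 until you update
svnversion "$SANDBOX/work"      # 1
svn update "$SANDBOX/work"      # reports README.md as U (updated)
svnversion "$SANDBOX/work"      # 2
```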
&lt;br /&gt;
With all the foregoing in mind, '''status''', '''commit''' and '''update''' will probably be your most widely used commands:&lt;br /&gt;
# '''update''' regularly to bring in other people's work&lt;br /&gt;
# '''status''' to make sure all is well&lt;br /&gt;
# '''commit''' frequently so that you can always recover a version you care about&lt;br /&gt;
&lt;br /&gt;
=Investigating Changes=&lt;br /&gt;
&lt;br /&gt;
This section highlights some commands that can be used to boost productivity.&lt;br /&gt;
&lt;br /&gt;
==log==&lt;br /&gt;
To get a log of what happened in the repository, use the log command. To see the files that have been modified as well as the log messages, use the --verbose option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn log --verbose&lt;br /&gt;
------------------------------------------------------------------------&lt;br /&gt;
r2 | ggdagw | 2013-07-26 14:58:08 +0100 (Fri, 26 Jul 2013) | 2 lines&lt;br /&gt;
Changed paths:&lt;br /&gt;
   M /trunk/README.md&lt;br /&gt;
   A /trunk/foo.txt&lt;br /&gt;
&lt;br /&gt;
Added text to README.md and added the empty file foo.txt&lt;br /&gt;
&lt;br /&gt;
------------------------------------------------------------------------&lt;br /&gt;
r1 | ggdagw | 2013-07-26 12:28:32 +0100 (Fri, 26 Jul 2013) | 2 lines&lt;br /&gt;
Changed paths:&lt;br /&gt;
   A /branches&lt;br /&gt;
   A /trunk&lt;br /&gt;
   A /trunk/README.md&lt;br /&gt;
&lt;br /&gt;
Initial commit&lt;br /&gt;
&lt;br /&gt;
------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also invoke the log command on a particular file/path and provide a range of revisions.&lt;br /&gt;
For instance to see which commits affected file1 between revisions 4 and 6, one could use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn log --verbose --revision 4:6 file1&lt;br /&gt;
... &amp;lt;- log output&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
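&lt;br /&gt;
A revision range on a file lists only the commits that actually touched it. In the minimal sketch below (a throwaway file:// repository; file1, file2 and their contents are hypothetical), revision 3 changes only file2, so it does not appear in the log of file1:&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
SANDBOX=$(mktemp -d)
REPO="$SANDBOX/repo"
svnadmin create "$REPO"
svn checkout -q "file://$REPO" "$SANDBOX/wc"
cd "$SANDBOX/wc"

echo "a" > file1
echo "b" > file2
svn add -q file1 file2
svn commit -q -m "Add file1 and file2"    # r1
echo "a2" >> file1
svn commit -q -m "Edit file1"             # r2
echo "b2" >> file2
svn commit -q -m "Edit file2"             # r3
echo "a3" >> file1
svn commit -q -m "Edit file1 again"       # r4

svn log --verbose --revision 2:4 file1    # lists r2 and r4 only
```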
&lt;br /&gt;
==diff==&lt;br /&gt;
After you have modified something, it can be handy to highlight what you've done.  You can do this using the '''diff''' command.&lt;br /&gt;
&lt;br /&gt;
For instance add some text to 'README.md' and use '''diff''' to see what you have done.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn stat&lt;br /&gt;
M       README.md&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn diff README.md &lt;br /&gt;
Index: README.md&lt;br /&gt;
===================================================================&lt;br /&gt;
--- README.md	(revision 2)&lt;br /&gt;
+++ README.md	(working copy)&lt;br /&gt;
@@ -7,3 +7,16 @@&lt;br /&gt;
 ------------&lt;br /&gt;
 &lt;br /&gt;
 This file is formatted in [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) and will be automatically rendered on your GitHub webpage.&lt;br /&gt;
+&lt;br /&gt;
+Here is an itemised list:&lt;br /&gt;
+* bread&lt;br /&gt;
+* butter&lt;br /&gt;
+* marmalade&lt;br /&gt;
+&lt;br /&gt;
+A Table:&lt;br /&gt;
+&lt;br /&gt;
+| Name    | Colour        | Price         |&lt;br /&gt;
+| ------- |:-------------:|--------------:|&lt;br /&gt;
+| Thomas  | centered      | right-aligned |&lt;br /&gt;
+| Gordon  | blue          |         £3.56 |&lt;br /&gt;
+| Henry   |         green | £2.81         |&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also use diff to highlight differences between two versions of some file, as stored in the repository:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn diff -r73:74 foo.txt&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
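&lt;br /&gt;
This minimal sketch (a throwaway file:// repository with hypothetical file contents) commits two versions of foo.txt and compares them. The -c option, short for --change, is a shorthand: -c 2 means the same as -r 1:2.&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
SANDBOX=$(mktemp -d)
REPO="$SANDBOX/repo"
svnadmin create "$REPO"
svn checkout -q "file://$REPO" "$SANDBOX/wc"
cd "$SANDBOX/wc"

echo "bread" > foo.txt
svn add -q foo.txt
svn commit -q -m "Add foo.txt"            # r1
echo "butter" >> foo.txt
svn commit -q -m "Add butter"             # r2

svn diff -r 1:2 foo.txt                   # shows the added line +butter
svn diff -c 2 foo.txt                     # the same change
```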
&lt;br /&gt;
==blame (praise)==&lt;br /&gt;
Sometimes, you want to know who wrote a particular bit of code. Subversion makes that easy with the '''blame''' command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn blame file2&lt;br /&gt;
     2   jprenaud Added some stuff&lt;br /&gt;
     3   jprenaud Another line&lt;br /&gt;
     4   jprenaud A third line.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You see the content of file2 and for each line the name of the author and the revision number. You could then fetch the log message for that particular revision to get more information.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn log file2 --revision 3&lt;br /&gt;
------------------------------------------------------------------------&lt;br /&gt;
r3 | jprenaud | 2008-05-14 16:03:45 +0100 (Wed, 14 May 2008) | 1 line&lt;br /&gt;
&lt;br /&gt;
More things.&lt;br /&gt;
------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, this feature is not available for GitHub-hosted repositories:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn blame README.md &lt;br /&gt;
svn: Server does not support custom revprops via log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
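&lt;br /&gt;
Since blame does not work against GitHub, here is a minimal sketch that runs it against a throwaway local file:// repository instead (the sandbox path and file contents are hypothetical). Each output line starts with the revision that last changed that line, followed by the author's username:&lt;br /&gt;
&lt;br /&gt;
```shell
set -eu
SANDBOX=$(mktemp -d)
REPO="$SANDBOX/repo"
svnadmin create "$REPO"
svn checkout -q "file://$REPO" "$SANDBOX/wc"
cd "$SANDBOX/wc"

echo "Added some stuff" > file2
svn add -q file2
svn commit -q -m "First line"             # r1
echo "Another line" >> file2
svn commit -q -m "More things."           # r2

svn blame file2                           # line 1 from r1, line 2 from r2
svn log -r 2 file2                        # why did line 2 change?
```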
&lt;br /&gt;
=Conflicts=&lt;br /&gt;
&lt;br /&gt;
Sometimes, a commit or an update will fail because of conflicting changes. As a rule, you should always update before a commit so the example here will show a conflict created after an update. &lt;br /&gt;
&lt;br /&gt;
==Creating the conflict==&lt;br /&gt;
&lt;br /&gt;
As mentioned previously, conflicts arise when SVN cannot merge changes to the same file together, i.e. when both sets of changes touch the same line.&lt;br /&gt;
&lt;br /&gt;
You can manufacture a conflict using two checkouts--let's call them A and B.  I could create two such working copies by typing the following:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn co https://github.com/ggdagw/test ./testA&lt;br /&gt;
svn co https://github.com/ggdagw/test ./testB&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is nothing to stop me having multiple checkouts on the same computer.  Now that we have the raw materials:&lt;br /&gt;
&lt;br /&gt;
# Ensure that both A and B are up-to-date.&lt;br /&gt;
# edit line 1 of README.md in A and commit.&lt;br /&gt;
# edit line 1 of README.md in B--do not commit.&lt;br /&gt;
# Now attempt to update B.&lt;br /&gt;
&lt;br /&gt;
Et voila, you will have a conflict.  Since SVN cannot resolve it, we must apply the old human grey matter to the task.  If you were working with a collaborator, this may well involve a phone or email conversation to decide on the best course of action. &lt;br /&gt;
&lt;br /&gt;
The update does not immediately fail.  Rather, you are presented with some options:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Conflict discovered in 'README.md'.&lt;br /&gt;
Select: (p) postpone, (df) diff-full, (e) edit,&lt;br /&gt;
        (mc) mine-conflict, (tc) theirs-conflict,&lt;br /&gt;
        (s) show all options: df&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you choose &amp;quot;df&amp;quot;, then you will be presented with a summary of the 3-way difference: First how it was prior to your local change; second how it is in your working copy and lastly how it currently is in the repository.  You will also be presented with the list of options again.  If you choose mine-conflict, &amp;quot;mc&amp;quot;, your local modifications will be preferred--at least in this working copy, since nothing has been committed back at this stage.  Theirs-conflict, &amp;quot;tc&amp;quot;, will prefer the repository version.  If you elect to postpone, &amp;quot;p&amp;quot;, then 'README.md' is flagged with the letter &amp;quot;C&amp;quot; indicating a conflict and you will notice new files in your working copy:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn status&lt;br /&gt;
?      README.md.r5&lt;br /&gt;
?      README.md.r6&lt;br /&gt;
?      README.md.mine&lt;br /&gt;
C      README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* README.md.r5 is README.md as at revision 5 (i.e. the one at your last update)&lt;br /&gt;
* README.md.r6 is README.md at revision 6 (i.e. the one that is on the repository now)&lt;br /&gt;
* README.md.mine is README.md as it was in your working copy before the update&lt;br /&gt;
* README.md contains an attempt at merging the changes (this will be similar to what you see with &amp;quot;df&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Finally, the edit option, &amp;quot;e&amp;quot;, will present the attempted merge to you in a text editor, for you to resolve as you see fit.  Note that if you type &amp;quot;svn status&amp;quot; after editing README.md in this situation, you will see that the file is still marked as conflicted.  You will not be able to commit your changes until you have '''resolved''' the conflict.  To do so, type, e.g. (in Subversion 1.5 and later, &amp;quot;svn resolve --accept working README.md&amp;quot; is the preferred equivalent):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn resolved README.md&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Other useful commands=&lt;br /&gt;
&lt;br /&gt;
==info==&lt;br /&gt;
&lt;br /&gt;
From inside a working copy, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn info&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test/trunk$ svn info&lt;br /&gt;
Path: .&lt;br /&gt;
URL: https://github.com/ggdagw/test/trunk&lt;br /&gt;
Repository Root: https://github.com/ggdagw/test&lt;br /&gt;
Repository UUID: be566c2d-dc09-ebaf-f5e5-ce57b7db7bff&lt;br /&gt;
Revision: 3&lt;br /&gt;
Node Kind: directory&lt;br /&gt;
Schedule: normal&lt;br /&gt;
Last Changed Author: ggdagw&lt;br /&gt;
Last Changed Rev: 3&lt;br /&gt;
Last Changed Date: 2013-07-26 16:03:50 +0100 (Fri, 26 Jul 2013)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==list==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test2/trunk$ svn list https://github.com/ggdagw/test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
branches/&lt;br /&gt;
trunk/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~/test2/trunk$ svn list https://github.com/ggdagw/test/trunk&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
README.md&lt;br /&gt;
foo.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==move &amp;amp; copy==&lt;br /&gt;
If you rename a file or directory manually, you lose its history, because Subversion needs to be told that a tracked file or directory now has a new name.  It is simpler to use the Subversion '''move''' command.  For instance, to rename &amp;quot;file2&amp;quot;, do: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn move file2 new_file2&lt;br /&gt;
A         new_file2&lt;br /&gt;
D         file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notice that the new file is added and the old one deleted.  You could do this manually, but the advantage of '''move''' is that the file's history from before the rename remains available.&lt;br /&gt;
&lt;br /&gt;
A close relative of '''move''' is '''copy'''.  This creates a new file that carries a copy of the revision history of its source:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn copy havana havana2&lt;br /&gt;
A         havana2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==import==&lt;br /&gt;
When you ask for a new repository, it is empty by default.  To populate it, you can use the '''import''' command.  It is so called because it imports files from your local machine into the repository.  The syntax is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn import PATH URL/trunk --message &amp;quot;Log message.&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* PATH is the path to the local folder (if omitted, it defaults to &amp;quot;./&amp;quot;, i.e. the current folder)&lt;br /&gt;
* URL is the full URL of the repository.  In the example, &amp;quot;trunk/&amp;quot; has also been appended to the URL, and the trunk directory will be created automatically.&lt;br /&gt;
&lt;br /&gt;
==mkdir ==&lt;br /&gt;
People often ask how they can create the &amp;quot;branches/&amp;quot; directory in the repository to store specific versions of their code.  You can do this by invoking mkdir directly on the repository URL.  The syntax is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn mkdir URL/branches --message &amp;quot;Log message.&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==export==&lt;br /&gt;
Sometimes you want to get the files from the version control system without creating a working copy.  For instance, you may be going to send the files to somebody who is not involved in the development.  This is the command that was used for the [[Linux1]] and [[Linux2]] practicals.&lt;br /&gt;
&lt;br /&gt;
You could do a checkout and remove all the hidden &amp;quot;.svn&amp;quot; directories manually, but the easiest way is to use the &amp;quot;export&amp;quot; command.  It works exactly like a checkout, except that you end up with a normal local folder, not a working copy.  The syntax is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ svn export URL PATH&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==branching and merge==&lt;br /&gt;
&lt;br /&gt;
Subversion can support multiple development strands via the creation of branches.  The svn '''copy''' command is used to make a (space efficient) copy of an entire file tree from, say, the trunk to a subdir called &amp;quot;branches&amp;quot;.  A popular reason for creating a branch is for a particular developer (or team) to work on something speculative or disruptive:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn copy https://svn.ggy.bris.ac.uk/subversion/ourproject/trunk \&lt;br /&gt;
https://svn.ggy.bris.ac.uk/subversion/ourproject/branches/sally_dev \&lt;br /&gt;
-m &amp;quot;The reason why I'm branching is...&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When you have different branches in your project, you might want to merge the changes from one branch into another.  For instance, somebody may have fixed a bug on a branch, and that bug may still be present in the trunk.  You might then want to apply the changes made on the branch back onto the trunk.  Subversion allows you to do this with a merge operation.  We will cover this topic only briefly here, but you can refer to the [http://svnbook.red-bean.com Subversion Red Book] for more information about merging.&lt;br /&gt;
&lt;br /&gt;
[[Image:Svn_merge_general.jpg|frame|centre|General situation for svn merge]]&lt;br /&gt;
&lt;br /&gt;
For example, the following command will merge in a change set from revA to revB from the sally_dev branch of ourproject into your working copy:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn merge -r revA:revB https://svn.ggy.bris.ac.uk/subversion/ourproject/branches/sally_dev&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can then commit these changes--should you so desire--which will end up in the development line of whatever you chose to checkout in order to obtain your working copy.&lt;br /&gt;
&lt;br /&gt;
=To go further=&lt;br /&gt;
The [http://svnbook.red-bean.com/ Subversion Red Book] is the bible of subversion. Highly recommended.&lt;br /&gt;
&lt;br /&gt;
In the book, you can see how to create your own repository, should you desire.  For example, a few simple setup commands will give you a working repository accessed either via the filesystem (i.e. the repository is on the same computer that you typically work on) or via SSH (i.e. you have SSH access to the machine that will host the repository).  See below.  '''However, note two things:'''&lt;br /&gt;
* '''It is dangerous to create a repository on a filesystem that is not backed up.  You could lose all your work that way.'''&lt;br /&gt;
* '''You can request a (backed-up) University of Bristol repository simply by contacting the service desk.'''&lt;br /&gt;
&lt;br /&gt;
==Creating our own Repository==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svnadmin create $HOME/my_test_repo&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next you might import some files (that you have stored in a directory called 'projectX').  If you're working on the same filesystem you would use: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd projectX&lt;br /&gt;
svn import . file://$HOME/my_test_repo/trunk -m &amp;quot;my import message&amp;quot;&lt;br /&gt;
svn list file://$HOME/my_test_repo/trunk&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which--via SSH--would translate to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd projectX&lt;br /&gt;
svn import . svn+ssh://user@host/absolute/path/to/my_test_repo/trunk -m &amp;quot;my import message&amp;quot;&lt;br /&gt;
svn list svn+ssh://user@host/absolute/path/to/my_test_repo/trunk&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
NB where the components of '''user@host/absolute/path/to''' are:&lt;br /&gt;
* user: your username on the remote machine hosting your repository&lt;br /&gt;
* host: the hostname of the remote machine&lt;br /&gt;
* /absolute/path/to: you must specify the absolute path to your repository on the remote machine (if you use $HOME, it will be evaluated on the machine you are SSH'ing ''from'')&lt;br /&gt;
&lt;br /&gt;
And you're away!&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9469</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9469"/>
		<updated>2014-11-04T14:42:03Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Standard Graphics: A taster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' would otherwise be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R treats even a single number as a vector (of length one).  The '[1]' indicates that the line in question displays the vector's values starting at the first index.  We can use the handy sequence function, '''seq()''', to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
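&lt;br /&gt;
The '''seq()''' function has some handy relatives.  The following is a minimal sketch.  The variable names '''one.to.five''' and '''even.steps''' are illustrative only, and '''=''' is used for assignment throughout.  The colon operator builds sequences with a step of one.  The '''length.out''' argument asks for a fixed number of evenly spaced values.  '''length()''' reports how many elements a vector holds:&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch; variable names are not from the original tutorial

# Odd numbers from 1 to 67, as in the example above
odds = seq(from = 1, to = 67, by = 2)
length(odds)          # 34 elements

# The colon operator is shorthand for a sequence with a step of +1
one.to.five = 1:5
all(one.to.five == seq(1, 5))      # TRUE

# Ask for a fixed number of evenly spaced values instead of a step size
even.steps = seq(from = 0, to = 1, length.out = 5)
even.steps            # 0.00 0.25 0.50 0.75 1.00

# Elements are indexed from 1, matching the [1] and [26] labels above
odds[26]              # 51
```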
&lt;br /&gt;
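Note that the '''=''' signs inside the call to '''seq()''' name the function's arguments rather than assigning variables.  At the command prompt, both the '''&amp;lt;-''' and '''=''' operators can be used for assignment, although '''&amp;lt;-''' is the more conventional choice.&lt;br /&gt;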
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; coords.bris &amp;lt;- c(51.5, -2.6)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c''' function combines the arguments given in the parentheses into a vector, and '''matrix()''' fills the 3x3 matrix column by column.  We can access portions of the matrix using the syntax shown in the square brackets.  For example, we can access the first row using the '''[1,]''' notation, and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
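&lt;br /&gt;
Rather than checking one row and one column at a time, we can check every line of the square at once.  The sketch below uses the base functions '''rowSums()''', '''colSums()''' and '''diag()'''.  It picks out the anti-diagonal by indexing with a matrix of (row, column) pairs.  The name '''is.magic''' is illustrative only:&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch; is.magic is a hypothetical variable name

# The 3x3 magic square from above (matrix() fills column by column)
magic = matrix(data = c(2, 7, 6, 9, 5, 1, 4, 3, 8), nrow = 3, ncol = 3)

rowSums(magic)                    # 15 15 15
colSums(magic)                    # 15 15 15
sum(diag(magic))                  # main diagonal: 2 + 5 + 8 = 15

# A two-column matrix of (row, column) pairs selects the anti-diagonal
sum(magic[cbind(1:3, 3:1)])       # 4 + 5 + 6 = 15

# A single test that every row, column and diagonal sums to 15
is.magic = all(c(rowSums(magic), colSums(magic),
                 sum(diag(magic)), sum(magic[cbind(1:3, 3:1)])) == 15)
is.magic                          # TRUE
```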
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which have more than two dimensions, and '''lists''' to hold heterogeneous collections.&lt;br /&gt;
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
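We can access its items in several ways: by name with '''$''', as a sub-list with single square brackets, or as the bare element with double square brackets:&lt;br /&gt;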
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
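&lt;br /&gt;
The difference between single and double square brackets is easiest to see with '''class()'''.  The sketch below also shows selecting an element by name with '''[[ ]]''' and adding a new element by assignment.  The element name '''note''' is purely illustrative:&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch; the "note" element is hypothetical
list.r4 = list(name = "Radio4", frequency = "93.7")

class(list.r4[1])             # "list": single brackets return a sub-list
class(list.r4[[1]])           # "character": double brackets return the element
list.r4[["frequency"]]        # elements can also be selected by name
names(list.r4)                # "name" "frequency"

# A new element is created simply by assigning to a new name
list.r4$note = "added later"
length(list.r4)               # 3
```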
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
  country gold silver bronze&lt;br /&gt;
1     USA   46     29     29&lt;br /&gt;
2   China   38     27     23&lt;br /&gt;
3      GB   29     17     19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access columns of a data frame using the '''$''' operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
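&lt;br /&gt;
The '''Levels:''' line above shows that '''country''' was stored as a factor.  That was the default before R 4.0.0.  Since R 4.0.0, '''data.frame()''' defaults to '''stringsAsFactors = FALSE''', so a current R will print a plain character vector instead.  The sketch below adds a derived column, sorts and filters the rows, and converts the column to a factor explicitly.  The column name '''total''' is illustrative only:&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch; the "total" column is not in the original tutorial
country = c("USA", "China", "GB")
gold = c(46, 38, 29)
silver = c(29, 27, 17)
bronze = c(29, 23, 19)
medals.2012 = data.frame(country, gold, silver, bronze)

# Add a derived column holding each country's total medal count
medals.2012$total = medals.2012$gold + medals.2012$silver + medals.2012$bronze

# order() returns row indices; a leading minus sorts in decreasing order
medals.2012[order(-medals.2012$total), ]

# Keep only the rows (and columns) that we are interested in
medals.2012[medals.2012$gold > 30, c("country", "gold")]

# Convert to a factor explicitly, whatever the R version's default
levels(factor(medals.2012$country))    # "China" "GB" "USA"
```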
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets; we'll use these to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  In this case, to show the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
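&lt;br /&gt;
A box plot is a picture of a few summary statistics, and R will give you the numbers as well.  In the sketch below, '''tapply()''' computes the median mpg for each cylinder count.  Passing '''plot = FALSE''' to '''boxplot()''' returns the statistics it would have drawn, without producing a plot.  The variable names and row labels are illustrative only:&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch; variable names and row labels are hypothetical
# mtcars ships with R: mpg is fuel efficiency, cyl the number of cylinders
median.mpg = tapply(mtcars$mpg, mtcars$cyl, median)
median.mpg

# boxplot() can return its summary statistics instead of drawing them
box = boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
stats = box$stats                 # one column per group, five rows
dimnames(stats) = list(c("lower whisker", "lower hinge", "median",
                         "upper hinge", "upper whisker"), box$names)
stats
```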
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots--complete with the R code required to create the plots (at the bottom of the page, after the comments)--on the following web pages:&lt;br /&gt;
* http://www.sr.bham.ac.uk/~ajrs/R/r-gallery.html&lt;br /&gt;
* http://blog.revolutionanalytics.com/2009/01/r-graph-gallery.html&lt;br /&gt;
* https://www.facebook.com/pages/R-Graph-Gallery/169231589826661&lt;br /&gt;
* http://research.stowers-institute.org/efg/R/&lt;br /&gt;
* http://rspatial.r-forge.r-project.org/gallery/&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
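&lt;br /&gt;
Two related habits are worth knowing.  First, many loops in R can be replaced by a single vectorised call, which is usually shorter and faster.  Second, '''set.seed()''' makes random numbers, such as those drawn by '''runif()''' in the while loop above, reproducible from one run to the next.  The sketch below shows both; the variable names are illustrative only:&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch; variable names are not from the original tutorial

# Explicit loop: accumulate the sum of 1 to 10
total = 0
for (ii in 1:10) total = total + ii

# Vectorised equivalent: one call, no loop
vector.total = sum(1:10)
total == vector.total                 # TRUE: both are 55

# Fixing the random seed makes runif() return the same values each time
set.seed(1)
first.draw = runif(3)
set.seed(1)
second.draw = runif(3)
identical(first.draw, second.draw)    # TRUE
```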
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body is a single expression, as here, but they add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments are matched by position unless they are named, in which case they can be given in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check the contents of a function by typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or just check the arguments, using the '''args''' function.  This returns a function with the same arguments but an empty body, which is why '''NULL''' is printed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
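&lt;br /&gt;
Because arithmetic in R is vectorised, a function like '''hypot3''' works on whole vectors without any changes.  The two halves of a function can also be inspected separately, with '''formals()''' for the arguments and '''body()''' for the code.  A minimal sketch, reusing the function defined above:&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch reusing hypot3 from above, with = used for assignment
hypot3 = function(x = 3, y = 4) {
  x_sq = x^2
  y_sq = y^2
  return(sqrt(x_sq + y_sq))
}

# Vectorised arithmetic: three right-angled triangles at once
hypot3(c(3, 5, 8), c(4, 12, 15))      # 5 13 17

# formals() returns the argument list, including any default values
names(formals(hypot3))                # "x" "y"
formals(hypot3)$y                     # 4

# body() returns the code between the braces
body(hypot3)
```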
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Add-on packages are listed on CRAN, the Comprehensive R Archive Network: http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
Let's install the '''multicore''' package, which gives us access to functions that run on the multiple processors often found in today's computers:   &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library, i.e. those available to be loaded into our workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load one into the current session, we type e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, an example of using a function from the multicore package.  The '''lapply''' function, which is included in the standard R core, maps a given function over a list of inputs and returns a list of the function's outputs.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
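&lt;br /&gt;
The '''multicore''' package has since been archived on CRAN.  Its '''mclapply()''' function now lives in the '''parallel''' package, which has shipped with R since version 2.14.0, so on a current installation there is nothing to install.  The sketch below assumes a recent R.  '''mc.cores''' sets the number of worker processes.  Forking is not available on Windows, so the sketch falls back to a single worker there.  '''detectCores()''' reports how many cores R can see.  The variable names are illustrative only:&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch assuming a recent R; variable names are illustrative
library(parallel)   # ships with R and supersedes the multicore package

# mclapply() cannot fork processes on Windows, so use one worker there
n.workers = if (.Platform$OS.type == "windows") 1 else 2

squares = mclapply(1:3, function(x) x^2, mc.cores = n.workers)
unlist(squares)        # 1 4 9, the same result as lapply()

# sapply() simplifies the result to a vector where it can
sapply(1:3, function(x) x^2)

detectCores()          # number of cores R can see on this machine
```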
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  Perhaps the simplest function for reading one is '''read.table()''', which works well when the data in your file is laid out as a table with column headings.  For example, if I have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It will be a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
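&lt;br /&gt;
To experiment without creating a file first, '''read.table()''' also accepts the data directly as a character string through its '''text''' argument.  The sketch below uses a two-row excerpt of the table above.  Quoted names such as &amp;quot;Great Britain&amp;quot; may contain spaces.  The variable names are illustrative only:&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch; variable names are hypothetical
medals.text = '
country              gold silver bronze
"USA"                46   29     29
"Great Britain"      29   17     19
'

# Blank lines are skipped; header = TRUE takes column names from the first line
medals.small = read.table(text = medals.text, header = TRUE)
medals.small
dim(medals.small)     # 2 rows, 4 columns
```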
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Gives us the file, '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
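&lt;br /&gt;
The unnamed first column in the file above holds the data frame's row names.  Passing '''row.names = FALSE''' to '''write.csv()''' leaves it out.  The sketch below writes a small data frame to a temporary file, chosen by '''tempfile()''' so that the working directory is not touched, and reads it back with '''read.csv()'''.  The variable names are illustrative only:&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch; variable names are hypothetical
medals.2012 = data.frame(country = c("USA", "China", "Great Britain"),
                         gold = c(46, 38, 29),
                         silver = c(29, 27, 17),
                         bronze = c(29, 23, 19))

# Write to a temporary file rather than the working directory
csv.file = tempfile(fileext = ".csv")

# row.names = FALSE drops the unnamed first column of row numbers
write.csv(medals.2012, csv.file, row.names = FALSE)
readLines(csv.file, n = 2)

# read.csv() is read.table() with sep = "," and header = TRUE preset
medals.back = read.csv(csv.file)
str(medals.back)
```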
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store an R data structure in binary form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
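&lt;br /&gt;
'''load()''' restores objects under their original names, silently replacing any object of the same name in your workspace.  It also returns those names, invisibly.  The related pair '''saveRDS()''' and '''readRDS()''' stores a single object and lets you choose its name when reading it back.  A minimal sketch using temporary files (the variable names are illustrative only):&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch; file and variable names are hypothetical
medals.2012 = data.frame(country = c("USA", "China"), gold = c(46, 38))
rdata.file = tempfile(fileext = ".RData")
rds.file = tempfile(fileext = ".rds")

save(medals.2012, file = rdata.file)
rm(medals.2012)                       # remove it from the workspace
restored.names = load(rdata.file)     # load() returns the names it restored
restored.names                        # "medals.2012"

# saveRDS() writes one object; readRDS() lets you pick the new name
saveRDS(medals.2012, rds.file)
my.medals = readRDS(rds.file)
identical(my.medals, medals.2012)     # TRUE
```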
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide this.  Note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
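&lt;br /&gt;
'''sort()''' can also sort in decreasing order.  Its companion '''order()''' returns the positions that would sort a vector rather than the sorted values.  That is what you need to sort the rows of a data frame by one of its columns.  A short sketch:&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch using the railway.engines vector from above
railway.engines = c("thomas", "henry", "gordon", "edward", "james")

sort(railway.engines, decreasing = TRUE)   # "thomas" "james" "henry" "gordon" "edward"

# order() gives the indices of the elements in sorted order
order(railway.engines)                     # 4 3 2 5 1
railway.engines[order(railway.engines)]    # same result as sort()

# rev() simply reverses a vector without sorting it
rev(railway.engines)
```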
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
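&lt;br /&gt;
Because '''sample''' draws random numbers, the results above will differ from run to run.  Calling '''set.seed''' first makes the draws reproducible, and with the default '''replace = FALSE''' each element is drawn at most once.  A minimal sketch (the seed value 42 is an arbitrary choice):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
set.seed(42)   # fix the state of the random number generator&lt;br /&gt;
picks &amp;lt;- sample(railway.engines, 3)                    # 3 distinct engines&lt;br /&gt;
draws &amp;lt;- sample(railway.engines, 10, replace = TRUE)   # 10 draws, repeats allowed&lt;br /&gt;
# Re-seeding with the same value reproduces the same draws&lt;br /&gt;
set.seed(42)&lt;br /&gt;
identical(picks, sample(railway.engines, 3))   # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;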
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames, '''medals.2012''' and '''extras.2012''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
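&lt;br /&gt;
The companion function '''cbind''' combines columns instead of rows.  A minimal sketch that adds a column of total medals to a small data frame built from the first three rows of the table above (the '''total''' column name is our own choice):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;Great Britain&amp;quot;)&lt;br /&gt;
gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
medals &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
# cbind() appends the new column on the right-hand side&lt;br /&gt;
medals &amp;lt;- cbind(medals, total = medals$gold + medals$silver + medals$bronze)&lt;br /&gt;
medals$total   # 104 88 65&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;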
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because '''bins''' is a factor, '''plot(bins)''' draws a bar chart of the number of values in each bin: plotting the binned data couldn't be simpler!&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
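&lt;br /&gt;
Rather than asking for a number of equal-width bins, you can also give '''cut''' explicit break points and your own labels; '''table''' then counts the values in each bin.  A minimal sketch, with break points and labels chosen purely for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
# Intervals are (80,84], (84,88] and (88,92]: closed on the right by default&lt;br /&gt;
bins &amp;lt;- cut(girls_2, breaks = c(80, 84, 88, 92),&lt;br /&gt;
            labels = c(&amp;quot;short&amp;quot;, &amp;quot;medium&amp;quot;, &amp;quot;tall&amp;quot;))&lt;br /&gt;
table(bins)   # short: 3, medium: 2, tall: 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;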
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
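&lt;br /&gt;
The object returned by '''lm''' holds much more than the line drawn by '''abline'''.  A short sketch of some common ways to inspect the fit and use it for prediction (in '''cars''', speed is in mph and stopping distance in feet; the speed of 20 mph is an arbitrary example):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res &amp;lt;- lm(dist ~ speed, data = cars)&lt;br /&gt;
coef(res)      # intercept and slope: roughly -17.58 and 3.93&lt;br /&gt;
summary(res)   # standard errors, t values, p-values, R-squared, ...&lt;br /&gt;
# Predicted stopping distance for a car travelling at 20 mph&lt;br /&gt;
predict(res, newdata = data.frame(speed = 20))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;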
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
* You may wish to compare different methods of estimation.  From the MASS package, you can fit a line with the '''rlm''' (robust) and '''lqs''' (resistant) regression functions.  You can then plot all the lines against the data using:&lt;br /&gt;
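&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; res.lm &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm &amp;lt;- rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs &amp;lt;- lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;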
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If given, the function fits the line that minimises the weighted sum of squared residuals.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector of length 50 (one weight for each of the 50 rows of '''cars'''), where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
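&lt;br /&gt;
As a starting point for the weighted least squares exercise, here is a minimal sketch using the hint vectors above.  With uniform weights every point counts equally, so the fit should match ordinary least squares; with increasing weights, the later rows of '''cars''' (which is ordered by speed, so these are the faster cars) have more influence on the line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res.lm &amp;lt;- lm(dist ~ speed, data = cars)&lt;br /&gt;
w1 &amp;lt;- rep(0.1, 50)                            # uniform weights&lt;br /&gt;
res.w1 &amp;lt;- lm(dist ~ speed, data = cars, weights = w1)&lt;br /&gt;
w2 &amp;lt;- seq(from = 0.02, to = 1.0, by = 0.02)   # weights increasing down the rows&lt;br /&gt;
res.w2 &amp;lt;- lm(dist ~ speed, data = cars, weights = w2)&lt;br /&gt;
coef(res.lm)&lt;br /&gt;
coef(res.w1)   # the same as coef(res.lm), up to rounding error&lt;br /&gt;
coef(res.w2)&lt;br /&gt;
plot(cars)&lt;br /&gt;
abline(res.lm, lty = 1)&lt;br /&gt;
abline(res.w2, lty = 2)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;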
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
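&lt;br /&gt;
The objects returned by '''var.test''' and '''t.test''' are lists, so individual results can be extracted by name for later use.  A short sketch that does this, and also runs Welch's t-test (what '''t.test''' does by default, i.e. with '''var.equal = FALSE''') for comparison with the pooled-variance test above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2 &amp;lt;- c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
res.pooled &amp;lt;- t.test(boys_2, girls_2, var.equal = TRUE)&lt;br /&gt;
res.welch &amp;lt;- t.test(boys_2, girls_2)   # Welch's test, the default&lt;br /&gt;
res.pooled$p.value    # about 0.41, as printed above&lt;br /&gt;
res.welch$p.value&lt;br /&gt;
res.pooled$conf.int   # 95% confidence interval for the difference in means&lt;br /&gt;
var.test(boys_2, girls_2)$p.value&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;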
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
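&lt;br /&gt;
Reading the table, 70 of the 75 test flowers (about 93%) were classified correctly.  Because '''knn''' breaks ties at random, your table may differ slightly between runs.  A minimal sketch that computes the accuracy directly and tries a few values of '''k''' (the seed and the values of '''k''' are arbitrary choices):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
set.seed(1)   # make the random tie-breaking reproducible&lt;br /&gt;
ks &amp;lt;- c(1, 3, 5, 7)&lt;br /&gt;
# For each k, the proportion of test flowers whose predicted species is correct&lt;br /&gt;
accuracy &amp;lt;- sapply(ks, function(k) mean(knn(train, test, cl, k = k) == cl))&lt;br /&gt;
names(accuracy) &amp;lt;- paste0(&amp;quot;k=&amp;quot;, ks)&lt;br /&gt;
accuracy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;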
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
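&lt;br /&gt;
A fitted tree can also be used to classify cases with '''predict'''.  A minimal sketch that compares the tree's predictions for the training data with the observed outcomes (accuracy measured on the data used to grow the tree will tend to be optimistic), and then classifies a hypothetical new patient whose Age, Number and Start values are invented for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
# Predicted class for each of the 81 children used to grow the tree&lt;br /&gt;
pred &amp;lt;- predict(fit, type = &amp;quot;class&amp;quot;)&lt;br /&gt;
table(predicted = pred, actual = kyphosis$Kyphosis)&lt;br /&gt;
# A hypothetical new patient (values invented for illustration)&lt;br /&gt;
new.patient &amp;lt;- data.frame(Age = 100, Number = 4, Start = 5)&lt;br /&gt;
predict(fit, newdata = new.patient, type = &amp;quot;class&amp;quot;)&lt;br /&gt;
predict(fit, newdata = new.patient, type = &amp;quot;prob&amp;quot;)   # class probabilities&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;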
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
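&lt;br /&gt;
Note that '''array''' fills its values column by column, so the first row of '''A''' is (1, 3, -2).  We can check the solution by multiplying it back through with the matrix product operator '''%*%'''; called with a single argument, '''solve''' returns the inverse of the matrix instead.  A short sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim = c(3,3))&lt;br /&gt;
b &amp;lt;- c(5, 7, 8)&lt;br /&gt;
x &amp;lt;- solve(A, b)&lt;br /&gt;
x                    # -15 8 2&lt;br /&gt;
A %*% x              # reproduces b (as a 3x1 matrix), up to rounding error&lt;br /&gt;
A.inv &amp;lt;- solve(A)    # the inverse of A&lt;br /&gt;
A.inv %*% b          # the same solution, though solve(A, b) is generally more efficient and accurate&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;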
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
If you would prefer to noodle about with some real-world data, you could take a look at:&lt;br /&gt;
* http://www.theguardian.com/news/datablog/2010/oct/18/historic-government-spending-area#data&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9468</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9468"/>
		<updated>2014-11-03T16:28:27Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Standard Graphics: A taster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators (without them, '''/''' would be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R treats a single number entered at the console as a vector of length one, and the '''[1]''' indicates that the line in question displays the vector's values starting at the first index.  We can use the handy '''seq''' (sequence) function to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
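Both the '''&amp;lt;-''' and '''=''' operators can be used for assignment at the top level, although '''&amp;lt;-''' is the more common convention.  Note, however, that the '''=''' signs in the call to '''seq''' above do not assign anything: they match values to the function's named arguments ('''from''', '''to''' and '''by''').&lt;br /&gt;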
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
coords.bris &amp;lt;- c(51.5, 2.6)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''c''' function combines the arguments given in the parentheses into a vector.  We can access portions of the matrix using the syntax shown in the square brackets of the output: for example, the first row with the '''[1,]''' notation, and similarly the second column with '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
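&lt;br /&gt;
R also has functions that sum every row or column at once, and '''diag''' extracts a matrix's main diagonal, so we can check all eight lines of the magic square.  A short sketch (reversing the column order with '''magic[, 3:1]''' is just one way of reaching the other diagonal):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
magic &amp;lt;- matrix(data = c(2,7,6,9,5,1,4,3,8), nrow = 3, ncol = 3)&lt;br /&gt;
rowSums(magic)            # 15 15 15&lt;br /&gt;
colSums(magic)            # 15 15 15&lt;br /&gt;
sum(diag(magic))          # 2 + 5 + 8 = 15&lt;br /&gt;
sum(diag(magic[, 3:1]))   # 4 + 5 + 6 = 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;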
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which can have more than two dimensions, and '''lists''', which hold heterogeneous collections of items.&lt;br /&gt;
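&lt;br /&gt;
For example, a minimal sketch of a three-dimensional array, which can be thought of as a stack of matrices (the dimensions chosen here are arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
a &amp;lt;- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers&lt;br /&gt;
dim(a)        # 2 3 4&lt;br /&gt;
a[2, 3, 4]    # row 2, column 3 of layer 4: 24&lt;br /&gt;
a[, , 1]      # the whole first layer, a 2x3 matrix&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;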
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access its items in several ways:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
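&lt;br /&gt;
The difference between the last two forms is that single square brackets return a smaller list, while double square brackets (or '''$''') return the item itself.  New items can be added by assigning to a new name.  A short sketch (the '''band''' item is added here for illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
list.r4 &amp;lt;- list(name = &amp;quot;Radio4&amp;quot;, frequency = &amp;quot;93.7&amp;quot;)&lt;br /&gt;
class(list.r4[1])        # &amp;quot;list&amp;quot;&lt;br /&gt;
class(list.r4[[1]])      # &amp;quot;character&amp;quot;&lt;br /&gt;
list.r4[[&amp;quot;frequency&amp;quot;]]   # items can also be extracted by name&lt;br /&gt;
list.r4$band &amp;lt;- &amp;quot;FM&amp;quot;     # add a new item&lt;br /&gt;
length(list.r4)          # 3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;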
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
    country  gold  silver  bronze&lt;br /&gt;
1       USA    46      29      29&lt;br /&gt;
2     China    38      27      23&lt;br /&gt;
3        GB    29      17      19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
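We can access columns of a data frame using the '''$''' operator.  (The '''Levels''' line appears because, in R versions before 4.0.0, '''data.frame()''' converts character vectors into factors by default.)&lt;br /&gt;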
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
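&lt;br /&gt;
Rows and individual cells can be selected with the same square-bracket notation used for matrices, and a logical condition can be used in place of row numbers.  A short sketch using the data frame built above (the threshold of 30 golds is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
dim(medals.2012)                       # 3 rows, 4 columns&lt;br /&gt;
medals.2012[medals.2012$gold &amp;gt; 30, ]   # only the rows with more than 30 golds&lt;br /&gt;
medals.2012[2, &amp;quot;silver&amp;quot;]               # a single cell: 27&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;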
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  Here, we show the relative proportions (in percent) of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility.  For example, the help page for the '''filled.contour()''' plotting function includes an example of its use to plot the topography of a volcano in Auckland, NZ:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
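&lt;br /&gt;
The examples on a help page can also be run directly with '''example(filled.contour)'''.  A minimal version of the volcano plot, using the built-in '''volcano''' data set (a matrix of heights on a 10m by 10m grid; the plot title is our own):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
dim(volcano)   # 87 rows by 61 columns&lt;br /&gt;
# Colour the heights with a terrain palette, keeping the aspect ratio at 1&lt;br /&gt;
filled.contour(volcano, color.palette = terrain.colors, asp = 1,&lt;br /&gt;
               plot.title = title(main = &amp;quot;Maunga Whau Volcano&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;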
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots, complete with the R code required to create them (at the bottom of the page, after the comments), on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
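&lt;br /&gt;
Many loops in R can be replaced by vectorised operations, which act on a whole vector at once and are usually both shorter and faster.  A short sketch comparing the two styles (when a loop is needed, preallocating the result vector, as here, avoids growing it one element at a time):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
squares &amp;lt;- numeric(10)            # preallocate a vector of 10 zeros&lt;br /&gt;
for (ii in 1:10) squares[ii] &amp;lt;- ii^2&lt;br /&gt;
squares.vec &amp;lt;- (1:10)^2           # the same calculation, without a loop&lt;br /&gt;
identical(squares, squares.vec)   # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;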
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
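&lt;br /&gt;
Since the example above uses random numbers, your output will differ.  A deterministic sketch: counting how many times 1000 can be halved before it drops below 1 (the starting value is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
x &amp;lt;- 1000&lt;br /&gt;
n &amp;lt;- 0&lt;br /&gt;
while (x &amp;gt;= 1) {&lt;br /&gt;
  x &amp;lt;- x / 2&lt;br /&gt;
  n &amp;lt;- n + 1&lt;br /&gt;
}&lt;br /&gt;
n   # 10, since 2^10 = 1024 is the first power of two exceeding 1000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;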
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body is a single expression, as here, but add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
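&lt;br /&gt;
Because '''sqrt''' and '''^''' work element-wise on vectors, '''hypotenuse''' does too, so a minimal sketch can process several triangles in a single call:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
hypotenuse(c(3, 5, 8), c(4, 12, 15))   # 5 13 17&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;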
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments are matched by position unless they are named, in which case they can be given in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check the contents of a function by typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or just check its arguments, using the '''args''' function.  ('''args''' returns a function with the same arguments as the original but an empty body, which is printed as NULL):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Packages are listed at http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
Let's install the '''multicore''' package, which gives us access to functions that run on the multiple processors often found in computers these days (since R 2.14.0, the same functionality is also included in the '''parallel''' package that ships with R):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library, i.e. those available to be loaded into our workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load one into the current session, we type, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, an example of using a function from the multicore package.  The '''lapply''' function, which is part of base R, maps a given function over a list of inputs and returns a list of the function's outputs.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
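&lt;br /&gt;
Since R 2.14.0, '''mclapply''' is also available from the '''parallel''' package that ships with R, so no installation is needed.  A minimal sketch; the '''mc.cores''' argument sets the number of processes, and because '''mclapply''' relies on forking it cannot use more than one core on Windows, hence the check (the choice of 2 cores is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
detectCores()   # number of cores detected on this machine&lt;br /&gt;
# Use 2 processes, except on Windows where forking is unavailable&lt;br /&gt;
n.cores &amp;lt;- if (.Platform$OS.type == &amp;quot;windows&amp;quot;) 1 else 2&lt;br /&gt;
squares &amp;lt;- mclapply(1:3, function(x) {x^2}, mc.cores = n.cores)&lt;br /&gt;
unlist(squares)   # 1 4 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;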
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  Perhaps the simplest function to use is '''read.table()''', which suits data organised into a table with column headings.  Suppose we have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It will be a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Gives us the file, '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
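&lt;br /&gt;
Note the unnamed first column: by default, '''write.csv''' also writes the row names.  A minimal sketch of a round trip that omits them with '''row.names = FALSE''' and reads the file back with '''read.csv''' (a temporary file is used so that nothing is left behind):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;Great Britain&amp;quot;),&lt;br /&gt;
                          gold = c(46, 38, 29),&lt;br /&gt;
                          silver = c(29, 27, 17),&lt;br /&gt;
                          bronze = c(29, 23, 19))&lt;br /&gt;
csv.file &amp;lt;- tempfile(fileext = &amp;quot;.csv&amp;quot;)&lt;br /&gt;
write.csv(medals.2012, csv.file, row.names = FALSE)&lt;br /&gt;
medals.back &amp;lt;- read.csv(csv.file)&lt;br /&gt;
names(medals.back)   # &amp;quot;country&amp;quot; &amp;quot;gold&amp;quot; &amp;quot;silver&amp;quot; &amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;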
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store an R data structure in binary form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
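&lt;br /&gt;
'''load''' restores objects under the names they were saved with, overwriting any existing objects of the same name.  For a single object, the '''saveRDS''' and '''readRDS''' pair lets you choose the name when reading it back.  A minimal sketch, using a temporary file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold = c(46, 38))&lt;br /&gt;
rds.file &amp;lt;- tempfile(fileext = &amp;quot;.rds&amp;quot;)&lt;br /&gt;
saveRDS(medals, rds.file)           # the object is stored without its name&lt;br /&gt;
medals.copy &amp;lt;- readRDS(rds.file)    # so we choose a name when reading it back&lt;br /&gt;
identical(medals, medals.copy)      # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;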
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, these are conveniently available from the standard package managers; note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
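&lt;br /&gt;
Because the draws are random, repeated calls give different results.  The sketch below shows how '''set.seed''' makes the draws reproducible.  It also shows how sampling without replacement (the default) can shuffle a vector or split row indices into, e.g., training and test sets.  The seed of 42, the 7/3 split and the variable names are arbitrary choices for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines = c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
set.seed(42)                        # fix the state of the random number generator&lt;br /&gt;
draw1 = sample(railway.engines, 3)&lt;br /&gt;
set.seed(42)                        # same seed...&lt;br /&gt;
draw2 = sample(railway.engines, 3)&lt;br /&gt;
identical(draw1, draw2)             # ...same draws, so this is TRUE&lt;br /&gt;
&lt;br /&gt;
# Without replacement, sampling every element gives a random permutation&lt;br /&gt;
shuffled = sample(railway.engines)&lt;br /&gt;
&lt;br /&gt;
# Split the row indices 1 to 10 into (illustrative) training and test sets&lt;br /&gt;
idx = sample(10)                    # a random permutation of 1:10&lt;br /&gt;
train.idx = idx[1:7]&lt;br /&gt;
test.idx = idx[8:10]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;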
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames with the same columns:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
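&lt;br /&gt;
The same help page also documents '''cbind''', which combines columns rather than rows.  The following is a minimal sketch that adds a '''total''' column (a name chosen for illustration) to the '''extras.2012''' data frame, rebuilt here so that the example stands alone:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
country = c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
gold = c(11, 8, 8, 7)&lt;br /&gt;
silver = c(11, 9, 4, 16)&lt;br /&gt;
bronze = c(12, 11, 5, 12)&lt;br /&gt;
extras.2012 = data.frame(country, gold, silver, bronze)&lt;br /&gt;
&lt;br /&gt;
# cbind() of a data frame and a vector appends the vector as a new column&lt;br /&gt;
total = extras.2012$gold + extras.2012$silver + extras.2012$bronze&lt;br /&gt;
extras.2012 = cbind(extras.2012, total)&lt;br /&gt;
extras.2012&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;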
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Plotting the binned data couldn't be simpler.  Since '''cut''' returns a factor, '''plot(bins)''' draws a bar chart of the number of values falling in each bin.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
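&lt;br /&gt;
To see the counts behind that plot, pass the factor to '''table'''.  You can also choose the bin edges yourself by giving '''breaks''' a vector.  By default, each interval is open on the left and closed on the right, '''(a,b]'''.  The following is a minimal sketch; the edges 80, 85 and 90 are arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
bins = cut(girls_2, breaks=3)&lt;br /&gt;
table(bins)                             # number of values in each of the 3 bins&lt;br /&gt;
&lt;br /&gt;
# Explicit bin edges: two intervals, (80,85] and (85,90]&lt;br /&gt;
bins2 = cut(girls_2, breaks=c(80, 85, 90))&lt;br /&gt;
table(bins2)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;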
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
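&lt;br /&gt;
The fitted model object can be queried with the standard accessor functions '''coef''', '''summary''' and '''predict'''.  The following is a minimal sketch.  The new speeds of 10 and 20 mph, and the name '''new.speeds''', are arbitrary choices for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res = lm(dist ~ speed, data=cars)&lt;br /&gt;
coef(res)                    # intercept and slope of the fitted line&lt;br /&gt;
summary(res)                 # standard errors, t-tests and R-squared&lt;br /&gt;
&lt;br /&gt;
# Predicted stopping distances (ft) for new speeds (mph)&lt;br /&gt;
new.speeds = data.frame(speed = c(10, 20))&lt;br /&gt;
predict(res, newdata = new.speeds)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;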
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
* You may wish to compare different methods of estimation.  From the '''MASS''' package, you can fit a line with the '''rlm''' (robust regression) and '''lqs''' (resistant regression) functions.  You can plot all the lines against the data using:&lt;br /&gt;
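&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res.lm = lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm = rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs = lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;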
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If weights are given, the function fits the line by weighted least squares, i.e. by minimising the sum of the squared residuals, each multiplied by its weight.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector, length 50, where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
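&lt;br /&gt;
The following is a minimal sketch of a weighted fit, using the second weight vector from the hints above.  As a check, it recovers the same coefficients directly from the weighted normal equations, (X'WX) b = X'Wy.  Here X holds a column of ones and the speeds, and W is a diagonal matrix of the weights.  The names '''res.w''', '''X''', '''W''', '''b''' and '''res.eq''' are illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
w2 = seq(from=0.02, to=1.0, by=0.02)     # 50 weights, one per row of cars&lt;br /&gt;
res.w = lm(dist ~ speed, data=cars, weights=w2)&lt;br /&gt;
coef(res.w)&lt;br /&gt;
&lt;br /&gt;
# The same estimate from the weighted normal equations&lt;br /&gt;
X = cbind(1, cars$speed)                 # design matrix: intercept and speed&lt;br /&gt;
W = diag(w2)                             # 50 x 50 diagonal weight matrix&lt;br /&gt;
b = solve(t(X) %*% W %*% X, t(X) %*% W %*% cars$dist)&lt;br /&gt;
b&lt;br /&gt;
&lt;br /&gt;
# Equal weights reproduce the ordinary (unweighted) least-squares fit&lt;br /&gt;
res.eq = lm(dist ~ speed, data=cars, weights=rep(0.1, 50))&lt;br /&gt;
coef(res.eq)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;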
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
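&lt;br /&gt;
The objects returned by '''var.test''' and '''t.test''' are lists, so individual results can be extracted by name rather than read off the printout.  The following is a minimal sketch.  Note that without '''var.equal=TRUE''', '''t.test''' performs Welch's test, which does not assume equal variances.  The name '''res.welch''' is illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2 = c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
res = t.test(boys_2, girls_2, var.equal=TRUE)&lt;br /&gt;
res$statistic        # the t statistic&lt;br /&gt;
res$p.value&lt;br /&gt;
res$conf.int         # 95% confidence interval for the difference in means&lt;br /&gt;
res$estimate         # the two sample means&lt;br /&gt;
&lt;br /&gt;
# The default is Welch's two-sample t-test (unequal variances allowed)&lt;br /&gt;
res.welch = t.test(boys_2, girls_2)&lt;br /&gt;
res.welch$p.value&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;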
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).  The example below uses '''iris3''', which holds the same data as a 50 x 4 x 3 array (flower x measurement x species).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
The '''knn''' function from the '''class''' package performs k-nearest neighbour classification of a test set from a training set.  For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
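&lt;br /&gt;
Rather than reading the accuracy off the table by eye, we can compute it directly.  Because '''knn''' breaks ties at random, the exact counts may vary slightly from run to run; calling '''set.seed''' first makes them repeatable.  The test flowers are ordered in the same way as the training flowers, so '''cl''' also serves as the vector of true test labels.  The following is a minimal sketch that also compares a few arbitrarily chosen values of k.  The names '''pred''' and '''accuracy''' are illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train = rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test = rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl = factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
&lt;br /&gt;
set.seed(1)                          # ties are broken at random&lt;br /&gt;
pred = knn(train, test, cl, k = 3)&lt;br /&gt;
accuracy = mean(pred == cl)          # fraction of test flowers classified correctly&lt;br /&gt;
accuracy&lt;br /&gt;
&lt;br /&gt;
# Accuracy for several values of k&lt;br /&gt;
for (k in c(1, 3, 5, 7)) {&lt;br /&gt;
  acc = mean(knn(train, test, cl, k = k) == cl)&lt;br /&gt;
  cat(&amp;quot;k =&amp;quot;, k, &amp;quot; accuracy =&amp;quot;, round(acc, 3), &amp;quot;\n&amp;quot;)&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;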
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The '''kyphosis''' data frame, supplied with the '''rpart''' package, has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
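&lt;br /&gt;
To see how well a tree classifies the data it was grown on, we can compare its predictions with the observed classes.  The following is a minimal sketch using '''predict''' with '''type=&amp;quot;class&amp;quot;'''.  Note that accuracy measured on the training data is optimistic; a separate test set, as in the k-nearest neighbours example above, gives a fairer estimate.  The name '''pred''' is illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit = rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
pred = predict(fit, type = &amp;quot;class&amp;quot;)        # predicted class for each child&lt;br /&gt;
table(predicted = pred, actual = kyphosis$Kyphosis)&lt;br /&gt;
mean(pred == kyphosis$Kyphosis)             # training-set accuracy (optimistic)&lt;br /&gt;
printcp(fit)                                # complexity table, useful when choosing cp&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;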
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
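&lt;br /&gt;
A quick way to check a solution is to multiply it back through.  Note that R fills arrays column by column, so the first column of '''A''' here is (1, 3, 2).  With a single argument, '''solve''' returns the matrix inverse, although '''solve(A, b)''' is generally the better way to solve a system.  The following is a minimal sketch; the names '''x''' and '''A.inv''' are illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
A = array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
b = c(5,7,8)&lt;br /&gt;
x = solve(A, b)&lt;br /&gt;
A %*% x              # reproduces b (as a 3 x 1 matrix)&lt;br /&gt;
&lt;br /&gt;
# With one argument, solve() returns the inverse of A&lt;br /&gt;
A.inv = solve(A)&lt;br /&gt;
A.inv %*% b          # the same solution, via the inverse&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;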
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
If you would prefer to noodle about with some real-world data, you could take a look at:&lt;br /&gt;
* http://www.theguardian.com/news/datablog/2010/oct/18/historic-government-spending-area#data&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9467</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9467"/>
		<updated>2014-11-03T16:24:35Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine or a Mac, you will typically start R through some form of GUI.  Either way, you will have access to an R command prompt, and all of the examples below will work there.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' would otherwise be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R has no separate scalar type: a single number entered at the console is simply a vector of length one.  The '''[1]''' indicates that the line in question displays the vector's values starting at the first index.  We can use the handy '''seq''' function to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
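Note that the '''=''' signs inside the call to '''seq''' are not assignments; they match values to the function's named arguments ('''from''', '''to''' and '''by''').  At the top level, however, both '''&amp;lt;-''' and '''=''' can be used for assignment.&lt;br /&gt;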
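&lt;br /&gt;
Since R works with vectors throughout, arithmetic and many built-in functions operate element by element on whole vectors at once.  The following is a minimal sketch, recreating the '''odds''' vector so that the example stands alone:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
odds = seq(from=1, to=67, by=2)&lt;br /&gt;
length(odds)        # 34 elements&lt;br /&gt;
odds * 2            # each element doubled&lt;br /&gt;
odds[3]             # a single element: the third, 5&lt;br /&gt;
odds[1:4]           # the first four elements&lt;br /&gt;
sum(odds)           # the sum of the first n odd numbers is n^2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;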
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; coords.bris &amp;lt;- c(51.5, -2.6)  # latitude and longitude in degrees (west is negative)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c''' function combines its arguments into a vector, which '''matrix''' then fills column by column.  We can access portions of the matrix using the notation shown in the row and column labels.  For example, '''[1,]''' gives the first row and '''[,2]''' gives the second column.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
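&lt;br /&gt;
All eight lines of a magic square (its rows, its columns and both diagonals) should share the same sum.  The following is a minimal sketch that checks them all with '''rowSums''', '''colSums''' and '''diag'''.  The last line indexes the matrix with a two-column matrix of (row, column) pairs to pick out the anti-diagonal:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
magic = matrix(data=c(2,7,6,9,5,1,4,3,8), nrow=3, ncol=3)&lt;br /&gt;
rowSums(magic)                  # sum of each row&lt;br /&gt;
colSums(magic)                  # sum of each column&lt;br /&gt;
sum(diag(magic))                # main diagonal: 2 + 5 + 8&lt;br /&gt;
sum(magic[cbind(1:3, 3:1)])     # anti-diagonal: 4 + 5 + 6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;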
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which can have any number of dimensions (a matrix is simply a two-dimensional array), and '''lists''', which hold heterogeneous collections.&lt;br /&gt;
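&lt;br /&gt;
Arrays are indexed just like matrices, with one index per dimension, and are filled column by column in the same way.  The following is a minimal sketch of a three-dimensional array; the name '''cube''' is arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
cube = array(1:24, dim=c(2, 3, 4))   # 2 rows, 3 columns, 4 &amp;quot;layers&amp;quot;&lt;br /&gt;
dim(cube)&lt;br /&gt;
cube[2, 3, 4]                        # a single element (the last one, 24)&lt;br /&gt;
cube[, , 1]                          # the first layer, a 2 x 3 matrix&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;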
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
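We can access its items in several ways.  Note that single brackets return a sub-list (still a list), whereas double brackets, like '''$''', return the item itself:&lt;br /&gt;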
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
  country gold silver bronze&lt;br /&gt;
1     USA   46     29     29&lt;br /&gt;
2   China   38     27     23&lt;br /&gt;
3      GB   29     17     19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
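We can access columns of a data frame using the '''$''' operator.  (The output below is from an older version of R.  Since R 4.0.0, '''data.frame''' no longer converts strings to factors by default, so '''country''' is printed as a character vector, without the '''Levels''' line.)&lt;br /&gt;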
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
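&lt;br /&gt;
Rows can be selected with square brackets, just as for matrices, and a logical condition on a column picks out the matching rows.  New columns can be created by assigning to them with '''$'''.  The following is a minimal sketch; the '''total''' column is an illustrative addition:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
gold = c(46, 38, 29)&lt;br /&gt;
silver = c(29, 27, 17)&lt;br /&gt;
bronze = c(29, 23, 19)&lt;br /&gt;
medals.2012 = data.frame(country, gold, silver, bronze)&lt;br /&gt;
&lt;br /&gt;
nrow(medals.2012)                          # number of rows&lt;br /&gt;
medals.2012[2, ]                           # the second row&lt;br /&gt;
medals.2012[medals.2012$gold &amp;gt; 30, ]       # only the rows where the condition is TRUE&lt;br /&gt;
&lt;br /&gt;
# Add a new column computed from existing ones&lt;br /&gt;
medals.2012$total = medals.2012$gold + medals.2012$silver + medals.2012$bronze&lt;br /&gt;
medals.2012&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;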
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  In this case, we use one to show the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ.  The help page is displayed with '''?''', and '''example()''' runs the examples it contains:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;gt; example(filled.contour)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different numbers of cylinders&lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots, complete with the R code required to create them (at the bottom of each page, after the comments), on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
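&lt;br /&gt;
By default, these plots appear in a window on screen.  To save a plot to a file instead, open a file-based graphics device, draw the plot, and close the device with '''dev.off()'''.  The following is a minimal sketch using the '''pdf''' device; '''png''' works in the same way where it is available.  The temporary file name and the name '''out.file''' are illustrative; in practice you would give your own path:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
out.file = tempfile(fileext = &amp;quot;.pdf&amp;quot;)          # illustrative temporary file name&lt;br /&gt;
pdf(file = out.file, width = 6, height = 6)     # page size in inches&lt;br /&gt;
plot(pressure)&lt;br /&gt;
dev.off()                                       # close the device, writing the file&lt;br /&gt;
file.exists(out.file)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;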
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
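&lt;br /&gt;
Explicit loops are often unnecessary in R, because many functions already operate on whole vectors.  The following is a minimal sketch comparing a loop with its vectorised equivalent.  It uses '''seq_along(x)''', which gives the indices of '''x'''; unlike '''1:length(x)''', it gives an empty sequence when '''x''' is empty.  The names '''x''' and '''total''' are illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
x = c(4, 8, 15, 16, 23, 42)&lt;br /&gt;
total = 0&lt;br /&gt;
for (ii in seq_along(x)) {&lt;br /&gt;
  total = total + x[ii]     # running sum, one element at a time&lt;br /&gt;
}&lt;br /&gt;
total&lt;br /&gt;
sum(x)                      # the vectorised equivalent&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;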
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ('''{}''') are optional for a single-expression body like this one, but add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments are matched by position unless they are named, in which case they can be given in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  By default, a function returns the value of the last expression it evaluates; we can also use the '''return''' keyword to state explicitly which value is returned:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can inspect the definition of a function by typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or you can check just its arguments, using the '''args''' function.  This returns a function with the same arguments but an empty body, which is why '''NULL''' is printed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
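&lt;br /&gt;
Because '''hypotenuse''' is built from arithmetic that works element by element, it also accepts vectors, computing several results in one call.  The base functions '''formals''' and '''body''' return the two parts of a function separately.  The following is a minimal sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
hypotenuse = function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
hypotenuse(c(3, 5, 8), c(4, 12, 15))    # three right-angled triangles at once&lt;br /&gt;
formals(hypotenuse)                      # the argument list&lt;br /&gt;
body(hypotenuse)                         # the function body&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;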
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
The packages available for R are listed on CRAN, the Comprehensive R Archive Network: http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
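Let's install the '''multicore''' package, which gives us functions that can use the multiple processors (cores) found in most computers these days.  (The '''multicore''' package has since been withdrawn from CRAN.  Its '''mclapply''' function is now part of the '''parallel''' package that ships with R, so on current versions you can load it with '''library(parallel)''' instead.)&lt;br /&gt;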
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library, i.e. those available to load into our workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load one into the current session, we type, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now for an example of using a function from the multicore package.  The '''lapply''' function, which is part of base R, maps a given function over a list (or vector) of inputs, returning a list of the function's outputs.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
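&lt;br /&gt;
The following is a minimal sketch using the '''parallel''' package, which is part of the standard R distribution.  It checks that '''mclapply''' returns exactly the same list as '''lapply'''.  The number of worker processes is set with '''mc.cores'''.  Because '''mclapply''' relies on forking, which Windows does not support, the sketch falls back to a single core there.  The choice of 2 cores and the variable names are illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
n.cores = if (.Platform$OS.type == &amp;quot;windows&amp;quot;) 1 else 2   # forking is unavailable on Windows&lt;br /&gt;
squares.par = mclapply(1:3, function(x) {x^2}, mc.cores = n.cores)&lt;br /&gt;
squares.seq = lapply(1:3, function(x) {x^2})&lt;br /&gt;
identical(squares.par, squares.seq)     # the same results as lapply&lt;br /&gt;
detectCores()                           # how many cores this machine reports&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;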
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  If your data is organised in a file so that it looks like a table with column headings, perhaps the simplest function to use is '''read.table()'''.  Suppose we have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is then a simple matter to use the '''read.table()''' function to load the data into R.  Setting '''header=TRUE''' tells it that the first line holds the column names:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
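&lt;br /&gt;
For a quick experiment without creating a file, '''read.table()''' also accepts the data as a character string through its '''text''' argument.  The quotes let country names containing spaces be read as a single field.  The following is a minimal sketch; the names '''medals.text''' and '''medals''' are illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.text = '&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
'&lt;br /&gt;
medals = read.table(text = medals.text, header = TRUE)   # blank lines are skipped&lt;br /&gt;
str(medals)                      # the column types chosen by read.table&lt;br /&gt;
medals$country[2]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;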
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can easily be handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to '''read.table()'''.  However, for convenience, R also provides '''read.csv()''' and '''write.csv()''' functions.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the file '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
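&lt;br /&gt;
The first, unnamed column holds the row names.  If you don't want them in the file, pass '''row.names=FALSE'''.  The following is a minimal sketch that writes a small data frame to a temporary file and reads it back with '''read.csv()'''.  The file and variable names are illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals = data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold = c(46, 38))&lt;br /&gt;
csv.file = tempfile(fileext = &amp;quot;.csv&amp;quot;)     # illustrative temporary file name&lt;br /&gt;
write.csv(medals, csv.file, row.names = FALSE)&lt;br /&gt;
readLines(csv.file)                        # the raw lines: no row-name column&lt;br /&gt;
medals.back = read.csv(csv.file)&lt;br /&gt;
medals.back&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;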
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store an R data structure in binary form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
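&lt;br /&gt;
Checking the result with the Unix '''file''' command confirms that it is compressed binary data:&lt;br /&gt;
&lt;br /&gt;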
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
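&lt;br /&gt;
'''load()''' restores objects under the names they were saved with, and invisibly returns those names.  The following is a minimal sketch of a round trip through a temporary file.  The names '''rdata.file''', '''original''' and '''loaded''' are illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 = data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;), gold = c(46, 38, 29))&lt;br /&gt;
rdata.file = tempfile(fileext = &amp;quot;.RData&amp;quot;)   # illustrative temporary file name&lt;br /&gt;
save(medals.2012, file = rdata.file)&lt;br /&gt;
original = medals.2012&lt;br /&gt;
rm(medals.2012)                      # remove it from the workspace&lt;br /&gt;
loaded = load(rdata.file)            # recreates medals.2012&lt;br /&gt;
loaded                               # the names of the restored objects&lt;br /&gt;
identical(medals.2012, original)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;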
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide this.  Note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
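&lt;br /&gt;
'''sort''' returns the sorted values themselves. To reorder the rows of a data frame by one of its columns, you would instead use '''order''', which returns the indices that would sort that column. Here is a short sketch; the '''engines''' data frame and its numbers are illustrative only:&lt;br /&gt;
&lt;br /&gt;
```r
railway.engines = c("thomas", "henry", "gordon", "edward", "james")

# Reverse alphabetical order
print(sort(railway.engines, decreasing = TRUE))

# order() gives indices rather than values, so it can reorder a data frame
# (the engine numbers are illustrative only)
engines = data.frame(name = railway.engines, number = c(1, 3, 4, 2, 5))
by.number = engines[order(engines$number), ]
print(by.number)
```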
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
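&lt;br /&gt;
Because '''sample''' draws pseudo-random numbers, each call can give a different result. Calling '''set.seed''' first makes the draws reproducible. Sampling without replacement (the default) gives a random permutation, which is a convenient way to split data into disjoint training and test sets. This is a sketch; the seed values and the 3/2 split are arbitrary choices:&lt;br /&gt;
&lt;br /&gt;
```r
railway.engines = c("thomas", "henry", "gordon", "edward", "james")

# Fix the random-number seed so the same draws happen every time
set.seed(42)
first.draw = sample(railway.engines, 3)
set.seed(42)
second.draw = sample(railway.engines, 3)
print(identical(first.draw, second.draw))   # same seed, same sample

# Without replacement (the default), sample() returns a random permutation,
# which splits the data into disjoint training and test sets
set.seed(1)
shuffled = sample(seq_along(railway.engines))
train = railway.engines[shuffled[1:3]]
test = railway.engines[shuffled[4:5]]
print(train)
print(test)
```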
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
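&lt;br /&gt;
'''rbind''' stacks rows. Its companion '''cbind''' places columns side by side. '''merge''' joins two data frames on a shared key column, so their rows need not be in the same order. Here is a small sketch; the '''pop''' figures are made-up placeholders:&lt;br /&gt;
&lt;br /&gt;
```r
country = c("France", "Italy", "Hungary")
gold = c(11, 8, 8)
medals = data.frame(country, gold)

# cbind(): add a column (the rows must already line up)
medals = cbind(medals, silver = c(11, 9, 4))

# merge(): match rows on the 'country' key; the row order of 'pop' differs
# deliberately, and its figures are made-up placeholders
pop = data.frame(country = c("Hungary", "France", "Italy"), pop = c(10, 65, 60))
combined = merge(medals, pop, by = "country")
print(combined)
```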
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Plotting the binned data couldn't be simpler: since '''cut''' returns a factor, '''plot(bins)''' draws a bar chart of the number of values in each bin.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
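&lt;br /&gt;
Rather than asking for a number of equal-width bins, you can give '''cut''' the bin edges yourself. '''table''' will then count the observations in each bin. In this sketch the break points 80, 84, 88 and 92 are an arbitrary choice:&lt;br /&gt;
&lt;br /&gt;
```r
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)

# Explicit bin edges; each interval is open on the left, closed on the right
bins = cut(girls_2, breaks = c(80, 84, 88, 92))
counts = table(bins)
print(counts)
```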
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
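&lt;br /&gt;
The object returned by '''lm''' holds much more than the fitted line. This sketch shows a few standard ways to inspect it; the new speeds passed to '''predict''' are arbitrary:&lt;br /&gt;
&lt;br /&gt;
```r
res = lm(dist ~ speed, data = cars)

# Intercept and slope of the fitted line
print(coef(res))

# Standard errors, t-values, R-squared, etc.
print(summary(res))
r.squared = summary(res)$r.squared

# Predicted stopping distances for some new speeds
new.speeds = data.frame(speed = c(10, 20))
print(predict(res, newdata = new.speeds))
```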
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
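* You may wish to compare different methods of estimation.  From the MASS package, you can fit a line with the '''rlm''' and '''lqs''' functions.  Having stored the three fits as '''res.lm''', '''res.rlm''' and '''res.lqs''', and drawn the data with '''plot(cars)''', you can add all the lines to the plot using:&lt;br /&gt;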
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(..., weights=...)'''.  If given, the function will fit the line by weighted least squares, i.e. by minimising the sum of the squared residuals, each multiplied by its weight.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights (one weight per row of the 50-row cars data set):&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector of length 50, where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
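&lt;br /&gt;
Here is a possible starting point for the weighted least squares exercise: a sketch of passing weight vectors to '''lm'''. One property is worth confirming first. Multiplying every weight by the same constant leaves the fit unchanged, so the constant vector '''w1''' reproduces the ordinary least squares line. Since the cars data are ordered by speed, the increasing weights in '''w2''' give the faster cars more influence:&lt;br /&gt;
&lt;br /&gt;
```r
res.ols = lm(dist ~ speed, data = cars)

# Constant weights: equivalent to ordinary least squares
w1 = rep(0.1, 50)
res.w1 = lm(dist ~ speed, data = cars, weights = w1)

# Increasing weights: later (faster) cars count for more
w2 = seq(from = 0.02, to = 1.0, by = 0.02)
res.w2 = lm(dist ~ speed, data = cars, weights = w2)

print(rbind(ols = coef(res.ols), w1 = coef(res.w1), w2 = coef(res.w2)))

# Compare the ordinary and w2-weighted lines against the data
plot(cars)
abline(res.ols, lty = 1)
abline(res.w2, lty = 2)
```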
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
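&lt;br /&gt;
The objects returned by '''var.test''' and '''t.test''' are lists (of class '''htest'''), so individual results can be extracted by name. Note that '''t.test''' defaults to Welch's test ('''var.equal=FALSE'''). With equal group sizes, as here, the Welch t statistic equals the pooled one. Only the degrees of freedom, and hence the p-value, change slightly. This sketch compares the two; the 5% significance level is the conventional, but arbitrary, choice:&lt;br /&gt;
&lt;br /&gt;
```r
boys_2 = c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)

pooled = t.test(boys_2, girls_2, var.equal = TRUE)
welch = t.test(boys_2, girls_2)   # Welch's test is the default

# Pull individual components out of the 'htest' objects
print(c(pooled = unname(pooled$statistic), welch = unname(welch$statistic)))
print(c(pooled = pooled$p.value, welch = welch$p.value))
print(welch$parameter)            # Welch-Satterthwaite degrees of freedom

# Conventional (and arbitrary) 5% significance level
if (pooled$p.value > 0.05) {
  cat("No evidence of a difference in means at the 5% level\n")
}
```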
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
# train and test rows share the same species order, so cl labels both&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do? (Because '''knn''' breaks ties at random, your counts may differ slightly from run to run.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
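&lt;br /&gt;
The confusion table can be boiled down to a single accuracy figure: the fraction of test cases that lie on its diagonal. Since '''knn''' breaks ties at random, this sketch fixes the random seed first (the value 123 is arbitrary):&lt;br /&gt;
&lt;br /&gt;
```r
library(class)

train = rbind(iris3[1:25, , 1], iris3[1:25, , 2], iris3[1:25, , 3])
test = rbind(iris3[26:50, , 1], iris3[26:50, , 2], iris3[26:50, , 3])
# Training and test rows are in the same species order, so one label
# vector serves as both the training labels and the true test labels
cl = factor(c(rep("s", 25), rep("c", 25), rep("v", 25)))

set.seed(123)   # make the random tie-breaking reproducible
iris3.knn = knn(train, test, cl, k = 3)
confusion = table(predicted = iris3.knn, actual = cl)

# Correct predictions lie on the diagonal of the confusion table
accuracy = sum(diag(confusion)) / sum(confusion)
print(accuracy)
```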
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
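&lt;br /&gt;
Once a tree has been grown, '''printcp''' reports its complexity-parameter table, which is useful when deciding how far to prune. '''predict''' with '''type=&amp;quot;class&amp;quot;''' returns the predicted class of each row. This sketch checks the tree against the data it was grown from. Note that this gives an optimistic estimate of accuracy, since no separate test set is used:&lt;br /&gt;
&lt;br /&gt;
```r
library(rpart)

fit = rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Complexity-parameter table, with cross-validated error for each tree size
printcp(fit)

# Predicted class for every child in the training data
predicted = predict(fit, type = "class")
confusion = table(predicted = predicted, actual = kyphosis$Kyphosis)
print(confusion)
print(sum(diag(confusion)) / sum(confusion))   # training-set accuracy
```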
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
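&lt;br /&gt;
It is worth checking a solution by substituting it back into the equations. In this sketch, '''%*%''' is R's matrix product, and '''all.equal''' compares numbers while allowing for floating-point round-off. Calling '''solve''' with just the matrix returns its inverse. Solving the system directly, as above, is nevertheless generally preferable to forming the inverse:&lt;br /&gt;
&lt;br /&gt;
```r
A = array(c(1, 3, 2, 3, 5, 4, -2, 6, 3), dim = c(3, 3))
b = c(5, 7, 8)

x = solve(A, b)
print(x)

# Substitute back: A %*% x should reproduce b (up to round-off)
print(all.equal(as.vector(A %*% x), b))

# solve(A) alone gives the inverse; A times its inverse is the identity
A.inv = solve(A)
print(all.equal(A %*% A.inv, diag(3)))
```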
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
If you would prefer to noodle about with some real-world data, you could take a look at:&lt;br /&gt;
* http://www.theguardian.com/news/datablog/2010/oct/18/historic-government-spending-area#data&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R2&amp;diff=9466</id>
		<title>R2</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R2&amp;diff=9466"/>
		<updated>2014-11-03T16:23:50Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll switch tack and focus on a question commonly asked by those beginning to use R in anger: '''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''  In this section we'll consider the related tasks of finding which parts of your R code are responsible for the majority of the run-time, and what you can do about them.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have a gift(!) for constantly surprising us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it infinitely fast would save at most 10%, and we would have sweated for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time the application of a function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
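&lt;br /&gt;
'''system.time''' returns an object of class '''proc_time''', so timings can be stored and compared rather than just printed. '''elapsed''' is the wall-clock time, while '''user''' and '''system''' are CPU times. In this sketch, '''slow.sum''' is a hypothetical stand-in for your own code:&lt;br /&gt;
&lt;br /&gt;
```r
# Hypothetical function standing in for the code you want to time
slow.sum = function(n) {
  total = 0
  for (i in seq_len(n)) total = total + i
  total
}

timing = system.time(slow.sum(1e6))
print(timing)
print(timing[["elapsed"]])   # wall-clock seconds

# The same calculation using the built-in, vectorised sum()
fast = system.time(sum(as.numeric(seq_len(1e6))))
print(fast[["elapsed"]])
```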
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof()&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Map a smoothing function over each vector in the list&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof()&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which I ran by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown (the '''by.self''' component of the summary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
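The profile tells us that the function '''simpleLoess''' takes 88% of the run-time in its own code, whereas '''rnorm''' takes only 4%.  The '''self''' columns count only time spent in a function's own code, while the '''total''' columns also include time spent in the functions it calls.  This is why '''loess.smooth''', which calls '''simpleLoess''', has a small self.pct but a total.pct of 96%.&lt;br /&gt;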
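&lt;br /&gt;
Here is a self-contained variant of the same pattern. It is sketched under the assumption that you would rather not leave profile files in your home directory, so it writes to a temporary file and then looks at the list returned by '''summaryRprof'''. The sampling interval and the workload are arbitrary choices:&lt;br /&gt;
&lt;br /&gt;
```r
# Profile into a temporary file rather than a fixed path
prof.file = tempfile(fileext = ".out")

Rprof(prof.file, interval = 0.01)   # sample the call stack every 10 ms
x = rnorm(100000)
for (k in 1:5) smoothed = loess.smooth(x, x)
Rprof(NULL)                         # stop profiling

prof = summaryRprof(prof.file)
# Functions ranked by time spent in their own code
print(head(prof$by.self))
# Functions ranked by time including the functions they call
print(head(prof$by.total))
# Total time covered by the samples, in seconds
print(prof$sampling.time)
```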
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
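Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code and your version of R.  More recent versions of R grow vectors more efficiently, so the gap may be smaller, although pre-allocation is still preferable.&lt;br /&gt;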
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
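&lt;br /&gt;
A more idiomatic way to pre-allocate is to create a vector of the right type and length up front with '''numeric''' (or '''vector'''). Alternatively, you can let '''vapply''' manage the storage for you. This sketch shows both. It computes the same 30,000 squares as '''f1''' and '''f2''', but returns the vector so that the results can be compared:&lt;br /&gt;
&lt;br /&gt;
```r
n = 30000

# Pre-allocate a numeric vector of length n (initially all zeros)
f3 = function() {
  v = numeric(n)
  for (i in seq_len(n)) v[i] = i^2
  v
}

# vapply() allocates the result itself; FUN.VALUE declares each element's type
f4 = function() {
  vapply(seq_len(n), function(i) i^2, FUN.VALUE = numeric(1))
}

print(system.time(f3()))
print(system.time(f4()))

v3 = f3()
v4 = f4()
print(identical(v3, v4))
```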
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R accept whole vectors as input, rather than just single values, which often lets you avoid an explicit loop.  The examples in the previous section used for loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in about half the time that the pre-allocated loop took to process 30,000!&lt;br /&gt;
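&lt;br /&gt;
Vectorisation also extends to conditional logic. '''ifelse''' evaluates a condition element by element, so a loop containing an '''if''' statement can often be replaced too. This sketch checks that a loop and two vectorised equivalents give the same answer; replacing negative values by zero is an arbitrary example rule:&lt;br /&gt;
&lt;br /&gt;
```r
x = c(-2.5, 0.3, -0.1, 4, 1.2)

# Loop version: replace negative values with zero
clipped.loop = numeric(length(x))
for (i in seq_along(x)) {
  if (x[i] > 0) clipped.loop[i] = x[i] else clipped.loop[i] = 0
}

# Vectorised versions: ifelse() works element-wise; pmax() does so directly
clipped.ifelse = ifelse(x > 0, x, 0)
clipped.pmax = pmax(x, 0)

print(clipped.ifelse)
```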
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as BlueCrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And used the following submission script to run it on bluecrystal phase2:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Disable PSM on the QLogic HCAs&lt;br /&gt;
export OMPI_MCA_mtl=^psm&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
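The '''parallel''' package is an amalgamation of functionality from the multicore and snow packages.  Unlike multicore, it can be used on an MS Windows machine.  Note, however, that its fork-based functions, such as '''mclapply''', cannot run in parallel on Windows; there you would use a snow-style cluster (e.g. '''makeCluster''' with '''parLapply''') instead.&lt;br /&gt;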
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
parallel:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
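&lt;br /&gt;
Because '''mclapply''' forks worker processes, it can only run in parallel on Unix-alikes such as Linux and Mac OS X. This sketch is a more portable version: it falls back to a single core on MS Windows and checks that the parallel result matches the serial one. The choice of two cores is an arbitrary assumption:&lt;br /&gt;
&lt;br /&gt;
```r
library(parallel)

# mclapply() relies on forking, which MS Windows does not support,
# so fall back to a single core there
n.cores = if (.Platform$OS.type == "windows") 1 else 2

data = lapply(1:10, function(i) rnorm(10000))

serial = lapply(data, function(x) loess.smooth(x, x))
parallel.res = mclapply(data, function(x) loess.smooth(x, x), mc.cores = n.cores)

# Each list element should be the same however it was computed
print(identical(serial, parallel.res))
```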
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R2&amp;diff=9465</id>
		<title>R2</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R2&amp;diff=9465"/>
		<updated>2014-11-03T16:23:24Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: Created page with '=Writing Faster R Code=  In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.…'&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll switch tack and focus on a question commonly asked by those beginning to use R in anger: '''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''  In this section we'll consider the related tasks of finding which parts of your R code are responsible for the majority of the run-time, and what you can do about them.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have a gift(!) for constantly surprising us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it infinitely fast would save at most 10%, and we would have sweated for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time the application of a function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof()&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Map a smoothing function over each vector in the list&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof()&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which I ran by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown (the '''by.self''' component of the summary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
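The profile tells us that the function '''simpleLoess''' takes 88% of the run-time in its own code, whereas '''rnorm''' takes only 4%.  The '''self''' columns count only time spent in a function's own code, while the '''total''' columns also include time spent in the functions it calls.  This is why '''loess.smooth''', which calls '''simpleLoess''', has a small self.pct but a total.pct of 96%.&lt;br /&gt;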
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timing calls to each of them shows that pre-allocating memory gives a whopping '''~35x speed-up''' here (1.762 s versus 0.05 s elapsed).  Your mileage will vary depending upon the details of your code.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops wherever you can.  Many functions and operators in R accept whole vectors as input, rather than just single values, and this often lets you avoid writing a loop at all.  The examples in the previous section used '''for''' loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in about half the time that the pre-allocated loop in '''f2''' took to process 30,000!&lt;br /&gt;
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as BlueCrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I then used the following submission script to run it on BlueCrystal Phase 2:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Disable PSM on the QLogic HCAs&lt;br /&gt;
export OMPI_MCA_mtl=^psm&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
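The '''parallel''' package, distributed with R itself, is an amalgamation of functionality from the multicore and snow packages.  Unlike the multicore package, it can be loaded on an MS Windows machine, but its fork-based functions, such as '''mclapply''', cannot run in parallel there; on Windows, use its snow-style cluster functions (e.g. '''makeCluster''' with '''parLapply''') instead.&lt;br /&gt;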
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
parallel:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using mclapply from parallel, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9463</id>
		<title>Python1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9463"/>
		<updated>2014-10-10T15:21:17Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Interrogating a Module */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Python for Scientists'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Python.png|thumb|1100px|none|http://xkcd.com/353/]]&lt;br /&gt;
&lt;br /&gt;
With thanks to Simon Metson and Mike Wallace for much of the following material.&lt;br /&gt;
&lt;br /&gt;
=Getting Started on BlueCrystal Phase-2=&lt;br /&gt;
&lt;br /&gt;
After you have logged in, type the following at the command line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module add languages/python-2.7.2.0&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should start up an interactive python session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Python 2.7.2 (default, Aug 25 2011, 10:51:03) &lt;br /&gt;
[GCC 4.3.3] on linux2&lt;br /&gt;
Type &amp;quot;help&amp;quot;, &amp;quot;copyright&amp;quot;, &amp;quot;credits&amp;quot; or &amp;quot;license&amp;quot; for more information.&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where we can type commands at the '''&amp;gt;&amp;gt;&amp;gt;''' prompt.&lt;br /&gt;
&lt;br /&gt;
=Python as a Calculator=&lt;br /&gt;
&lt;br /&gt;
To get started, let's just try a few commands out.  If you type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print &amp;quot;Hello!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hello!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you try:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print 5 + 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far so simple!  Here is a copy of a session containing a few more commands where we've set the values of some variables and also defined and run our own function: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; five = 5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; neuf = 9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print five + neuf&lt;br /&gt;
14&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def say_hello():&lt;br /&gt;
...     print &amp;quot;Hello, world!&amp;quot;&lt;br /&gt;
... # hit return here &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; say_hello()&lt;br /&gt;
Hello, world!&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can exit an interactive session at any time by typing '''Ctrl-D'''.&lt;br /&gt;
&lt;br /&gt;
=Getting Help=&lt;br /&gt;
&lt;br /&gt;
One of the good things about Python is that it has lots of useful online documentation.  ([[A_Good_Read|There are good books on the language too]].)  For example, take a look at http://docs.python.org/.  You can also type '''help()''' at the interpreter prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; help()&lt;br /&gt;
&lt;br /&gt;
Welcome to Python 2.7!  This is the online help utility.&lt;br /&gt;
&lt;br /&gt;
If this is your first time using Python, you should definitely check out&lt;br /&gt;
the tutorial on the Internet at http://docs.python.org/tutorial/.&lt;br /&gt;
&lt;br /&gt;
Enter the name of any module, keyword, or topic to get help on writing&lt;br /&gt;
Python programs and using Python modules.  To quit this help utility and&lt;br /&gt;
return to the interpreter, just type &amp;quot;quit&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; keywords&lt;br /&gt;
&lt;br /&gt;
Here is a list of the Python keywords.  Enter any keyword to get more help.&lt;br /&gt;
&lt;br /&gt;
and                 elif                if                  print&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; if&lt;br /&gt;
The ``if`` statement&lt;br /&gt;
********************&lt;br /&gt;
&lt;br /&gt;
The ``if`` statement is used for conditional execution:&lt;br /&gt;
&lt;br /&gt;
   if_stmt ::= &amp;quot;if&amp;quot; expression &amp;quot;:&amp;quot; suite&lt;br /&gt;
               ( &amp;quot;elif&amp;quot; expression &amp;quot;:&amp;quot; suite )*&lt;br /&gt;
               [&amp;quot;else&amp;quot; &amp;quot;:&amp;quot; suite]&lt;br /&gt;
&lt;br /&gt;
It selects exactly one of the suites by evaluating the expressions one&lt;br /&gt;
by one until one is found to be true...&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; quit&lt;br /&gt;
&lt;br /&gt;
You are now leaving help and returning to the Python interpreter.&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Making a Script=&lt;br /&gt;
&lt;br /&gt;
An interactive session can be fun and useful for trying things out.  However, to save our fingers, we will typically want to execute a series of commands as a script, created using your favourite text editor.  Here are the contents of an example script, '''myscript.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;Hello, from a python script!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Ensure that your script is executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod u+x myscript.py&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and now you can run it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[ggdagw@bigblue4 ~]$ ./myscript.py &lt;br /&gt;
Hello, from a python script!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Python and Whitespace=&lt;br /&gt;
&lt;br /&gt;
Love it or hate it, Python incorporates whitespace in its syntax: the indentation of a line determines which block it belongs to.  (The alternative is to demarcate blocks with some other syntax, such as the curly braces used in C.  Pick your poison.)  Spacing is therefore key in creating a valid Python script.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
    print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will work, but:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
 print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will not, because the body of the '''else''' clause is not indented:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  File &amp;quot;./myscript.py&amp;quot;, line 7&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
        ^&lt;br /&gt;
IndentationError: expected an indented block&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is therefore a great advantage, when writing a Python script, to use a text editor which has a dedicated Python mode, such as '''emacs''', and will actively help you to keep your spacing correct.  See http://wiki.python.org/moin/PythonEditors for an extensive list.&lt;br /&gt;
&lt;br /&gt;
=Some Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
* Calculate the volume of a sphere.  You can experiment with the following (where r needs to be set to some value):&lt;br /&gt;
** &amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;4/3 * 3.14159265359 * r ** 3&amp;lt;/source&amp;gt;&lt;br /&gt;
** &amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;4.0/3.0 * 3.14159265359 * pow(r,3)&amp;lt;/source&amp;gt;&lt;br /&gt;
** &amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;float(4)/float(3) * 3.14159265359 * pow(r,3)&amp;lt;/source&amp;gt;&lt;br /&gt;
* Concatenate two strings&lt;br /&gt;
* Write a recursive function to compute Fibonacci numbers (Hint: F(n) = F(n-1) + F(n-2), F(0) = 0 and F(1) = 1)&lt;br /&gt;
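&lt;br /&gt;
Here is one possible sketch of solutions to these exercises, written so that it runs unchanged under both Python 2.7 (the version loaded above) and Python 3.  It uses '''math.pi''' from the standard library in place of the literal 3.14159265359; the names '''sphere_volume''' and '''fib''' are just illustrative choices:&lt;br /&gt;
&lt;br /&gt;
```python
import math


def sphere_volume(r):
    # Use float literals: in Python 2, 4/3 is integer division and gives 1
    return 4.0 / 3.0 * math.pi * r ** 3


def fib(n):
    # Recursive definition: F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2)
    if 0 > n:
        raise ValueError('n must be non-negative')
    if n in (0, 1):
        return n
    return fib(n - 1) + fib(n - 2)


# String concatenation uses the + operator
greeting = 'happy' + ' ' + 'days!'

volume = sphere_volume(2.0)   # about 33.51
tenth = fib(10)               # 55
```
&lt;br /&gt;
This recursive '''fib''' recomputes the same values many times over, so it becomes very slow once n grows beyond about 30; a simple loop avoids that cost.&lt;br /&gt;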
&lt;br /&gt;
=Nuts and Bolts=&lt;br /&gt;
&lt;br /&gt;
==Types==&lt;br /&gt;
&lt;br /&gt;
Python has intrinsic types including integers, floats, booleans and complex numbers.  It is dynamically typed (meaning that you don't have to declare your variables in a block at the top of your script; a name simply takes on the type of whatever value is assigned to it), but it is '''not weakly''' typed: Python will not silently convert between incompatible types.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex = 2 + 0.5j&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex&lt;br /&gt;
(2+0.5j)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.real&lt;br /&gt;
2.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.imag&lt;br /&gt;
0.5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name = 'fred'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; lucky = 7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name + lucky&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: cannot concatenate 'str' and 'int' objects&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
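&lt;br /&gt;
Because Python is strongly typed, the fix is to convert values explicitly.  The following sketch (it runs under Python 2.7 and 3; the variable names are just examples) shows explicit conversion with '''str()''' and '''int()''', and how '''type()''' reflects the value a name currently refers to:&lt;br /&gt;
&lt;br /&gt;
```python
name = 'fred'
lucky = 7

# Python will not guess a conversion for us, so ask for one explicitly
label = name + str(lucky)     # 'fred7'
total = int('35') + lucky     # 42

# Mixing the types directly still raises TypeError
try:
    name + lucky
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# Dynamic typing: the same name can later refer to a value of another type
value = 7
first_type = type(value)      # int
value = 'seven'
second_type = type(value)     # str
```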
&lt;br /&gt;
==Strings==&lt;br /&gt;
&lt;br /&gt;
The eagle-eyed will have spotted in a previous example that we could ask for the length of a character string straight off the bat.  No need to write a counting routine ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print len(message)&lt;br /&gt;
11&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a character string.  For example, the first five characters are:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print message[:5]&lt;br /&gt;
happy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since a string is an '''object''' (in the object oriented programming sense of the word, but more of that another time...) we can call a number of methods that operate on a string.  A selection includes:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.find(sub) || Finds the first occurrence of the given substring&lt;br /&gt;
|-&lt;br /&gt;
|| s.islower() || Checks whether all characters are lowercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.upper() || Returns '''s''' converted to uppercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.strip() || Removes leading and trailing whitespace&lt;br /&gt;
|-&lt;br /&gt;
|| s.replace(old,new) || Replaces substring '''old''' with '''new'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.split([sep]) || Splits '''s''' using the (optional) '''sep''' as a delimiter.  Returns a list&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
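&lt;br /&gt;
A short sketch exercising each method in the table (the example string is arbitrary; it runs under Python 2.7 and 3):&lt;br /&gt;
&lt;br /&gt;
```python
s = '  Happy days!  '

t = s.strip()                         # 'Happy days!'
pos = t.find('days')                  # 6, the index where 'days' starts
missing = t.find('xyz')               # -1 when the substring is absent
shout = t.upper()                     # 'HAPPY DAYS!'
all_lower = t.islower()               # False, because of the capital H
lazy = t.replace('Happy', 'Lazy')     # 'Lazy days!'
words = t.split()                     # ['Happy', 'days!'] (splits on whitespace)
fields = 'a,b,c'.split(',')           # ['a', 'b', 'c']
```
&lt;br /&gt;
None of these methods modifies '''s''' itself: strings are immutable, so methods such as '''strip''' and '''upper''' return a new string.&lt;br /&gt;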
&lt;br /&gt;
==Lists and Tuples==&lt;br /&gt;
&lt;br /&gt;
An example of a list is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping = ['bread', 'marmalade', 'milk', 'tea']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we can inquire about the length of that using the same function as before:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
len(shopping)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a list, as we did with a string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and even replace a portion of the list by assigning to a slice:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2] = ['bagels', 'jam']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since a list is also an object, we have more handy methods, including:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.append(x) || Appends a new element '''x''' to the end of '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.count(x) || Returns the number of occurrences of '''x''' in '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.reverse() || Reverses items of '''s''' in place&lt;br /&gt;
|-&lt;br /&gt;
|| s.sort([compfunc]) || Sorts items of '''s''' in place.  '''compfunc''' is an optional comparison function&lt;br /&gt;
|}&lt;br /&gt;
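&lt;br /&gt;
The following sketch exercises these methods on the shopping list from above (it runs under Python 2.7 and 3).  The '''compfunc''' argument to '''sort''' exists only in Python 2; the '''key''' argument used here works in both versions:&lt;br /&gt;
&lt;br /&gt;
```python
shopping = ['bread', 'marmalade', 'milk', 'tea']

shopping.append('milk')               # ['bread', 'marmalade', 'milk', 'tea', 'milk']
n_milk = shopping.count('milk')       # 2

shopping.reverse()                    # in place, and returns None
reversed_items = list(shopping)       # ['milk', 'tea', 'milk', 'marmalade', 'bread']

shopping.sort()                       # in place, alphabetical order
# sorted() returns a new list; key=len orders items by length (a stable sort)
by_length = sorted(shopping, key=len)
```
&lt;br /&gt;
Because '''reverse''' and '''sort''' work in place and return '''None''', writing '''shopping = shopping.sort()''' would throw the list away.&lt;br /&gt;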
&lt;br /&gt;
Tuples are very similar to lists and support many of the same operations (indexing, slicing, concatenation etc.) but differ in that they are '''not mutable''' after creation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple = ('fred', 'ginger', 7, 2.5)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist = ['fred', 'ginger', 7, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist[2] = 8&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mylist&lt;br /&gt;
['fred', 'ginger', 8, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mytuple[2]    &lt;br /&gt;
7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple[2] = 8&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: 'tuple' object does not support item assignment&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
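A '''list comprehension''' builds a new list from an existing one in a single expression.  For example, to double every number smaller than 20:&lt;br /&gt;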
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; numbers = [12, 3, 90, 40, 52, 11, 10]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled = [number * 2 for number in numbers if number &amp;lt; 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled&lt;br /&gt;
[24, 6, 22, 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
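&lt;br /&gt;
For comparison, here is the same result built with an explicit loop, which is what the comprehension abbreviates (a sketch; it runs under Python 2.7 and 3):&lt;br /&gt;
&lt;br /&gt;
```python
numbers = [12, 3, 90, 40, 52, 11, 10]

# Explicit loop: test each number, and double and append it if it is below 20
doubled_loop = []
for number in numbers:
    if 20 > number:
        doubled_loop.append(number * 2)

# The equivalent list comprehension: [expression for item in list if condition]
doubled_comp = [number * 2 for number in numbers if 20 > number]
```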
&lt;br /&gt;
==Dictionaries==&lt;br /&gt;
&lt;br /&gt;
A dictionary is an associative array or hash table, containing '''key-value''' pairs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
mydict = {'thomas':'blue', 'james':'red', 'henry':'green'}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mydict['james']&lt;br /&gt;
red&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can write much more user-friendly and intuitive code using dictionaries, rather than arbitrary indexes into a list.&lt;br /&gt;
&lt;br /&gt;
Some example dictionary methods are:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| m.keys() || Returns a list of the keys in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m.items() || Returns a list of the (key,value) pairs in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m[k] = x || Sets m[k] to x&lt;br /&gt;
|-&lt;br /&gt;
|| m.update(b) || Adds objects from dictionary '''b''' to '''m'''&lt;br /&gt;
|}&lt;br /&gt;
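&lt;br /&gt;
A sketch of these methods in action, reusing the dictionary from above (it runs under Python 2.7 and 3).  Dictionaries in Python 2.7 do not keep their keys in any particular order, so the results are sorted before use:&lt;br /&gt;
&lt;br /&gt;
```python
engines = {'thomas': 'blue', 'james': 'red', 'henry': 'green'}

engines['percy'] = 'green'            # m[k] = x adds a new key...
engines['james'] = 'crimson'          # ...or overwrites an existing one
engines.update({'gordon': 'blue'})    # merge in another dictionary

names = sorted(engines.keys())
green_engines = sorted(k for k, v in engines.items() if v == 'green')
has_emily = 'emily' in engines        # membership tests look at the keys
```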
&lt;br /&gt;
==Control Structures==&lt;br /&gt;
&lt;br /&gt;
Of course, we'll need conditionals and loops etc. to go beyond the simplest of scripts.  Here is an '''if-then-else''', python style:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
if sky == 'blue':&lt;br /&gt;
    birds_sing()&lt;br /&gt;
elif sky == 'black':&lt;br /&gt;
    birds_sleep()&lt;br /&gt;
else:&lt;br /&gt;
    pass #do nothing&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and a classic '''for loop''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
for ii in range(1,10):&lt;br /&gt;
    print ii&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1&lt;br /&gt;
...&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We'll also see a '''while loop''' shoehorned into the next example.&lt;br /&gt;
&lt;br /&gt;
For our control statements, we can use comparison operators, such as '''==''', '''!=''', '''&amp;gt;''', '''&amp;lt;''', '''&amp;lt;=''' and '''&amp;gt;=''', and logical operators, such as '''and''', '''or''' and '''not'''.&lt;br /&gt;
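&lt;br /&gt;
Putting these pieces together, here is a runnable sketch (Python 2.7 and 3) of an '''if-elif-else''', a '''for''' loop with a compound test, and a '''while''' loop.  The function names and data are made up for illustration:&lt;br /&gt;
&lt;br /&gt;
```python
def describe_sky(sky):
    # The first test that is true selects the branch
    if sky == 'blue':
        return 'birds sing'
    elif sky == 'black':
        return 'birds sleep'
    else:
        return 'nothing happens'


def count_in_range(values, low, high):
    # Combine comparison and logical operators in a single test
    count = 0
    for v in values:
        if v >= low and not v > high:   # between low and high, inclusive
            count += 1
    return count


# A while loop: halve n (rounding down) until it reaches 1
n = 100
steps = 0
while n > 1:
    n = n // 2
    steps += 1
```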
&lt;br /&gt;
==File Input and Output==&lt;br /&gt;
&lt;br /&gt;
Here's some code for printing the contents of a text file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;r&amp;quot;)&lt;br /&gt;
line = fp.readline()&lt;br /&gt;
while line:&lt;br /&gt;
    line = line.strip()&lt;br /&gt;
    print line&lt;br /&gt;
    line = fp.readline()&lt;br /&gt;
fp.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We could open a file for writing with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;w&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp.write(...)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
to write to that file.&lt;br /&gt;
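&lt;br /&gt;
In Python 2.6 and later, the '''with''' statement offers a tidier way to do the same thing: the file is closed automatically when the block ends, even if an error occurs.  A sketch (Python 2.7 and 3) that writes a file and reads it back; the temporary path and the helper names '''write_lines''' and '''read_lines''' are illustrative:&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile


def write_lines(path, lines):
    # 'w' creates the file, or truncates it if it already exists
    with open(path, 'w') as fp:
        for line in lines:
            fp.write(line + '\n')     # write() does not add a newline itself


def read_lines(path):
    # Iterating over a file object yields one line at a time
    with open(path, 'r') as fp:
        return [line.strip() for line in fp]


path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
write_lines(path, ['first line', 'second line'])
contents = read_lines(path)           # ['first line', 'second line']
```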
&lt;br /&gt;
=Object Oriented Programming in Python=&lt;br /&gt;
&lt;br /&gt;
Here is an example of using a class in python:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
class Radio:&lt;br /&gt;
    &amp;quot;A simple radio&amp;quot;&lt;br /&gt;
    def __init__(self,freq=0.0,name=&amp;quot;&amp;quot;):&lt;br /&gt;
        &amp;quot;Constructor method&amp;quot;&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
        self.name=name&lt;br /&gt;
    def tune(self,freq):&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
    def tuned_to(self):&lt;br /&gt;
        print self.name, &amp;quot;tuned to:&amp;quot;, self.__frequency&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # create two Radio instances&lt;br /&gt;
    car = Radio(name=&amp;quot;car&amp;quot;)&lt;br /&gt;
    kitchen = Radio(91.5,&amp;quot;kitchen&amp;quot;)&lt;br /&gt;
    # call some methods&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    kitchen.tuned_to()&lt;br /&gt;
    car.tune(89.3)&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    # Docstrings: the string literal at the top of the class&lt;br /&gt;
    print car.__doc__&lt;br /&gt;
    # NB members are not private by default:&lt;br /&gt;
    print car.name&lt;br /&gt;
    # BUT leading double underscores trigger name mangling (the member&lt;br /&gt;
    # is stored as _Radio__frequency), so this direct access fails&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running the script gives us:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
car tuned to: 0.0&lt;br /&gt;
kitchen tuned to: 91.5&lt;br /&gt;
car tuned to: 89.3&lt;br /&gt;
A simple radio&lt;br /&gt;
car&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;./foo.py&amp;quot;, line 29, in &amp;lt;module&amp;gt;&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
AttributeError: Radio instance has no attribute '__frequency'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
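&lt;br /&gt;
Name mangling hides the member rather than making it truly private: Python simply renames it by prefixing the class name.  The short sketch below (a trimmed-down copy of the '''Radio''' class, written so that it also runs under Python 3) shows that the mangled name is still reachable:&lt;br /&gt;
&lt;br /&gt;
```python
class Radio(object):
    """A simple radio (trimmed-down copy of the example above)."""
    def __init__(self, freq=0.0, name=""):
        # Stored on the instance under the mangled name _Radio__frequency
        self.__frequency = freq
        self.name = name


car = Radio(89.3, "car")

# The attribute is not found under the name used inside the class...
print(hasattr(car, "__frequency"))   # False
# ...but it is still there under its mangled name
print(car._Radio__frequency)         # 89.3
```
&lt;br /&gt;
Leading underscores are therefore best read as a signal to other programmers, not as access control.&lt;br /&gt;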
&lt;br /&gt;
=Using Packages=&lt;br /&gt;
&lt;br /&gt;
Python packages are great because they provide us with a whole lot of extra functionality--above and beyond the core language--that we didn't have to write and debug ourselves.&lt;br /&gt;
&lt;br /&gt;
Let's walk through a simple example using a package.  At an interactive prompt type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from random import randint&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will give us access to the '''randint(x,y)''' function, which returns a randomly chosen integer from the given range [x,y]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
4&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
1&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OK, so far so good.  One thing to note is that the above '''import''' statement has drawn the name ''randint'' into our current '''namespace'''.  What if we had already defined a function named ''randint''?  That could cause problems.  To protect ourselves from this kind of problem, Python offers several variants of the import statement.&lt;br /&gt;
&lt;br /&gt;
With a plain '''import''' statement, the module's functions stay inside a namespace with the same name as the module.  To call them, we then have to prefix them with their namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random.randint(0,10)&lt;br /&gt;
6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Should we so desire, we can take a little more control and choose the name of the namespace ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random as rnd&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; rnd.randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another--more 'devil-may-care'--approach is to do away with the separate namespace and pull everything from a given package into the current namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import *&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random()&lt;br /&gt;
0.3172268098313996&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(The '''random()''' function returns a randomly selected floating point number in the range [0, 1)--that is, between 0 and 1, including 0.0 but always smaller than 1.0.)&lt;br /&gt;
&lt;br /&gt;
==Interrogating a Module==&lt;br /&gt;
&lt;br /&gt;
To list all the names (functions, classes, constants and so on) defined in a module, first import it and then type '''dir(&amp;lt;modulename&amp;gt;)'''.&lt;br /&gt;
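&lt;br /&gt;
Since '''dir()''' returns every name, including classes and private helpers, it can be handy to filter the list.  A minimal sketch, using the standard '''inspect''' module to keep only the public functions of '''random''':&lt;br /&gt;
&lt;br /&gt;
```python
import inspect
import random

# dir() returns a sorted list of every name defined in the module
names = dir(random)

# Keep the public names (no leading underscore) that refer to functions or methods
funcs = [name for name in names
         if not name.startswith("_") and inspect.isroutine(getattr(random, name))]

print(funcs)
```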
&lt;br /&gt;
If you have '''pip''' installed, you can see which other packages are installed by typing '''pip list''' at the Linux command line.&lt;br /&gt;
&lt;br /&gt;
==A Namespace Collision==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def randint():&lt;br /&gt;
...     print &amp;quot;dummy function&amp;quot;&lt;br /&gt;
... &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
dummy function&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import randint&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: randint() takes exactly 3 arguments (1 given)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
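&lt;br /&gt;
In the session above, '''from random import randint''' silently replaced our own function.  Keeping the module in its own namespace avoids the clash.  A minimal sketch, reusing the dummy function from that session (rewritten to return its message so that it also runs under Python 3):&lt;br /&gt;
&lt;br /&gt;
```python
import random as rnd


def randint():
    """Our own dummy function, which happens to share its name with random.randint."""
    return "dummy function"


# The module's randint lives in the rnd namespace, so nothing is shadowed
mine = randint()
value = rnd.randint(0, 10)  # a random integer from the range [0, 10]

print(mine)
print(value in range(0, 11))  # True
```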
&lt;br /&gt;
=Python for Shell Scripting=&lt;br /&gt;
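&lt;br /&gt;
The '''subprocess''' module lets a Python script run other programs, much as a shell script would.  For example, to run '''ls -l''':&lt;br /&gt;
&lt;br /&gt;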
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from subprocess import call&lt;br /&gt;
call([&amp;quot;ls&amp;quot;, &amp;quot;-l&amp;quot;])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
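&lt;br /&gt;
'''call''' lets the command print straight to the terminal and returns its exit status.  When you want to capture a command's output in your script instead, '''subprocess.check_output''' (available from Python 2.7) is one option.  A minimal sketch:&lt;br /&gt;
&lt;br /&gt;
```python
import subprocess

# call() runs the command, lets it write to the terminal, and
# returns its exit status (0 conventionally means success)
status = subprocess.call(["ls", "-l"])

# check_output() captures what the command writes to standard output
# (as bytes under Python 3) and raises CalledProcessError on failure
greeting = subprocess.check_output(["echo", "hello"]).decode().strip()

print(status)
print(greeting)
```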
&lt;br /&gt;
=Python as a Glue Language=&lt;br /&gt;
&lt;br /&gt;
* Calling R from python is possible using: http://rpy.sourceforge.net/index.html.&lt;br /&gt;
* Calling Matlab from python: http://mlabwrap.sourceforge.net.&lt;br /&gt;
* With SWIG you can make many bindings, including Python to C and C++: http://www.swig.org/.&lt;br /&gt;
* Or if Fortran is more your cup-of-tea, you can use f2py (now distributed as part of numpy): http://cens.ioc.ee/projects/f2py2e/.&lt;br /&gt;
* There are many more examples.&lt;br /&gt;
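&lt;br /&gt;
One more option ships with Python itself: the standard '''ctypes''' module can call functions in an existing C shared library directly, with no wrapper code to compile.  A minimal sketch, assuming a Linux system where the C maths library is available as '''libm''':&lt;br /&gt;
&lt;br /&gt;
```python
import ctypes
import ctypes.util

# Locate the C maths library; fall back to the usual glibc file name
# (an assumption that holds on typical Linux systems)
libm_path = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libm_path)

# Declare the C signature: double cos(double x)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

result = libm.cos(0.0)
print(result)  # 1.0
```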
&lt;br /&gt;
=Command Line Parsing=&lt;br /&gt;
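&lt;br /&gt;
The arguments given to a script on the command line are available as a list of strings, '''sys.argv''':&lt;br /&gt;
&lt;br /&gt;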
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # We can test on the length of argv&lt;br /&gt;
    if len(sys.argv) &amp;lt; 2:&lt;br /&gt;
        print &amp;quot;usage: to use this script...&amp;quot;&lt;br /&gt;
    else:&lt;br /&gt;
        for ii, arg in enumerate(sys.argv):&lt;br /&gt;
            # (typically) argv[0] is bound to the script name&lt;br /&gt;
            print &amp;quot;arg&amp;quot;, ii, &amp;quot;is:&amp;quot;, arg&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py&lt;br /&gt;
usage: to use this script...&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py fred ginger&lt;br /&gt;
arg 0 is: ./cmdline.py&lt;br /&gt;
arg 1 is: fred&lt;br /&gt;
arg 2 is: ginger&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
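&lt;br /&gt;
For anything beyond a couple of arguments, the standard '''argparse''' module (Python 2.7 onwards) can do the parsing and generate the usage message for you.  A minimal sketch; the '''parse''' function and the '''--verbose''' flag are illustrative names, not part of the script above:&lt;br /&gt;
&lt;br /&gt;
```python
import argparse


def parse(argv=None):
    """Parse a list of arguments (sys.argv[1:] when argv is None)."""
    parser = argparse.ArgumentParser(description="Echo the names given on the command line.")
    parser.add_argument("names", nargs="+", help="one or more names")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra detail")
    return parser.parse_args(argv)


# An explicit list is passed here so the sketch runs without real command line arguments
args = parse(["fred", "ginger"])
for ii, name in enumerate(args.names, start=1):
    print("arg %d is: %s" % (ii, name))
```
&lt;br /&gt;
In a real script you would call '''parse()''' with no argument; argparse then reads '''sys.argv''' itself, and prints a usage message and exits if no names are given.&lt;br /&gt;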
&lt;br /&gt;
=Databases=&lt;br /&gt;
&lt;br /&gt;
==Simple Databases==&lt;br /&gt;
&lt;br /&gt;
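Python's standard library includes modules for working with some simple databases.  For example, the '''bsddb''' module allows you to access the popular '''Berkeley DB''' database from your python code.  (Note that '''bsddb''' was deprecated in Python 2.6 and removed in Python 3; the standard '''shelve''' module, available in both, offers a similar dictionary-style interface.)&lt;br /&gt;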
&lt;br /&gt;
The interface to the database provided by this module is very similar to the way in which we access a dictionary.  First, let's populate a database:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import bsddb&lt;br /&gt;
d = bsddb.btopen('engines.db')&lt;br /&gt;
d['thomas'] = 'blue'&lt;br /&gt;
d['james'] = 'red'&lt;br /&gt;
d['henry'] = 'green'&lt;br /&gt;
d.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now let's open the database again and query its contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d = bsddb.btopen('engines.db')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['henry', 'james', 'thomas']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.first()&lt;br /&gt;
('henry', 'green')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.last()&lt;br /&gt;
('thomas', 'blue')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour = d['james']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour&lt;br /&gt;
'red'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; del d['henry']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['james', 'thomas']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
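&lt;br /&gt;
As '''bsddb''' is not available in Python 3, here is a minimal sketch of the same session using the standard '''shelve''' module, which also behaves like a persistent dictionary.  It assumes a file in a temporary directory; unlike the B-tree above, a shelf does not keep its keys in sorted order, hence the '''sorted()''' call:&lt;br /&gt;
&lt;br /&gt;
```python
import os
import shelve
import tempfile

# A hypothetical file name in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "engines")

# Populate the shelf, just like a dictionary
db = shelve.open(path)
db["thomas"] = "blue"
db["james"] = "red"
db["henry"] = "green"
db.close()

# Re-open it and query the contents
db = shelve.open(path)
colour = db["james"]
del db["henry"]
remaining = sorted(db.keys())  # a shelf does not keep its keys in order
db.close()

print(colour)
print(remaining)
```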
&lt;br /&gt;
==Relational Databases==&lt;br /&gt;
&lt;br /&gt;
Relational databases give us more oomph.  '''SQLite''' is a useful relational database to consider as it is light, in that it requires hardly anything in terms of setup or management, yet still understands queries formulated in SQL.  As such it is useful for creating relatively simple examples of SQL access to a database in python and is a stepping stone toward more powerful database packages.&lt;br /&gt;
&lt;br /&gt;
Here is a script which will create a table called '''planets''' in the file '''pytest.db''' and populate it with some details of the planets in our solar system:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
 &lt;br /&gt;
# create a table&lt;br /&gt;
cursor.execute(&amp;quot;&amp;quot;&amp;quot;CREATE TABLE planets&lt;br /&gt;
                  (Id INT, Name TEXT, Diameter REAL, &lt;br /&gt;
                   Mass REAL, Orbital_Period REAL)&amp;quot;&amp;quot;&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# insert a single record&lt;br /&gt;
cursor.execute(&amp;quot;INSERT INTO planets VALUES(1,'Mercury',0.382,0.06,0.24)&amp;quot;)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
 &lt;br /&gt;
# insert multiple records&lt;br /&gt;
other_planets = [(2,'Venus',0.949,0.82,0.72),&lt;br /&gt;
                 (3,'Earth',1.0,1.0,1.0),&lt;br /&gt;
                 (4,'Mars',0.532,0.11,1.52),&lt;br /&gt;
                 (5,'Jupiter',11.209,317.8,5.20),&lt;br /&gt;
                 (6,'Saturn',9.449,95.2,9.54),&lt;br /&gt;
                 (7,'Uranus',4.007,14.6,19.22),&lt;br /&gt;
                 (8,'Neptune',3.883,17.2,30.06),&lt;br /&gt;
                 (9,'Pluto',0.18,0.002,248.09)]&lt;br /&gt;
cursor.executemany(&amp;quot;INSERT INTO planets VALUES (?,?,?,?,?)&amp;quot;, other_planets)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
&lt;br /&gt;
# delete a record&lt;br /&gt;
sql = &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
DELETE FROM planets&lt;br /&gt;
WHERE Name = 'Pluto'&lt;br /&gt;
&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
cursor.execute(sql)  # poor old pluto!&lt;br /&gt;
conn.commit()&lt;br /&gt;
conn.close() # finished with the database&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And here is a short example script showing a couple of ways to interrogate the database: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the records in the table, ordered by Name:\n&amp;quot;&lt;br /&gt;
for row in cursor.execute(&amp;quot;SELECT rowid, * FROM planets ORDER BY Name&amp;quot;):&lt;br /&gt;
    print row&lt;br /&gt;
&lt;br /&gt;
print&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the planets with a mass greater than or equal to that of Earth:\n&amp;quot;&lt;br /&gt;
sql = &amp;quot;SELECT * FROM planets WHERE Mass&amp;gt;=?&amp;quot;&lt;br /&gt;
cursor.execute(sql, (1.0,))  # parameters are passed as a tuple&lt;br /&gt;
for row in cursor.fetchall():  # or use fetchone()&lt;br /&gt;
    print row&lt;br /&gt;
&lt;br /&gt;
conn.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the results of running the script are:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
All the records in the table, ordered by Name:&lt;br /&gt;
&lt;br /&gt;
(3, 3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, 5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(4, 4, u'Mars', 0.53200000000000003, 0.11, 1.52)&lt;br /&gt;
(1, 1, u'Mercury', 0.38200000000000001, 0.059999999999999998, 0.23999999999999999)&lt;br /&gt;
(8, 8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
(6, 6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, 7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(2, 2, u'Venus', 0.94899999999999995, 0.81999999999999995, 0.71999999999999997)&lt;br /&gt;
&lt;br /&gt;
All the planets with a mass greater than or equal to that of Earth:&lt;br /&gt;
&lt;br /&gt;
(3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
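&lt;br /&gt;
Two details of the query script are worth dwelling on: the '''?''' placeholder lets sqlite3 insert the value safely (rather than pasting it into the SQL string yourself, which invites SQL injection), and '''fetchone()''' suits queries that return a single row, such as aggregates.  A minimal, self-contained sketch using an in-memory database holding a hypothetical subset of the planets table:&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3

# ':memory:' keeps the whole database in RAM, handy for quick experiments
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE planets (Id INT, Name TEXT, Mass REAL)")
cursor.executemany("INSERT INTO planets VALUES (?,?,?)",
                   [(3, "Earth", 1.0), (4, "Mars", 0.11), (5, "Jupiter", 317.8)])
conn.commit()

# sqlite3 substitutes the value for '?'; the parameters go in a tuple
threshold = 1.0
cursor.execute("SELECT Name FROM planets WHERE Mass >= ? ORDER BY Mass", (threshold,))
heavy = [row[0] for row in cursor.fetchall()]

# An aggregate query returns a single row, so fetchone() is enough
cursor.execute("SELECT COUNT(*), MAX(Mass) FROM planets")
count, max_mass = cursor.fetchone()
conn.close()

print(heavy)
print("%d planets, heaviest mass %.1f" % (count, max_mass))
```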
&lt;br /&gt;
For more information on using SQLite with Python, see, e.g.:&lt;br /&gt;
* http://zetcode.com/db/sqlitepythontutorial/&lt;br /&gt;
* http://www.blog.pythonlibrary.org/2012/07/18/python-a-simple-step-by-step-sqlite-tutorial/&lt;br /&gt;
&lt;br /&gt;
You can also connect to a MySQL database from python using, e.g. the [http://mysql-python.sourceforge.net/ python-mysqldb] package.  A snippet of python code for connecting to a database is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
import MySQLdb&lt;br /&gt;
&lt;br /&gt;
conn = MySQLdb.connect(host=&amp;quot;localhost&amp;quot;,  # your host, usually localhost&lt;br /&gt;
                       user=&amp;quot;gethin&amp;quot;,     # your username&lt;br /&gt;
                       passwd=&amp;quot;changeme&amp;quot;, # your password&lt;br /&gt;
                       db=&amp;quot;menagerie&amp;quot;)   # name of the database&lt;br /&gt;
&lt;br /&gt;
# Create a cursor object, as before with SQLite&lt;br /&gt;
cur = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
# and then you can submit your SQL command:&lt;br /&gt;
cur.execute(&amp;quot;SELECT * FROM YOUR_TABLE_NAME&amp;quot;)&lt;br /&gt;
for row in cur.fetchall():  # fetch the results, as with SQLite&lt;br /&gt;
    print row&lt;br /&gt;
conn.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Numpy=&lt;br /&gt;
&lt;br /&gt;
OK, let's move on to python's numerical processing capabilities.  We will start with the '''numpy''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from numpy import *&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now that we have access to the functions from '''numpy''', let's create an array.  '''Note that a numpy array is an object of a different type from a Python list'''.  A simple approach is to use the '''array''' function, passing it a (nested) list of values.  For example we might enter:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = array([[1.0,0.0,0.0],[0.0,1.0,0.0],[0.0,0.0,1.0]])&lt;br /&gt;
b = array([[1,2,3],[4,5,6],[7,8,9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 1.,  0.,  0.],&lt;br /&gt;
       [ 0.,  1.,  0.],&lt;br /&gt;
       [ 0.,  0.,  1.]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b        &lt;br /&gt;
array([[1, 2, 3],&lt;br /&gt;
       [4, 5, 6],&lt;br /&gt;
       [7, 8, 9]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; transpose(b)&lt;br /&gt;
array([[1, 4, 7],&lt;br /&gt;
       [2, 5, 8],&lt;br /&gt;
       [3, 6, 9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Given an array, we may inquire about its shape:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
print a.shape&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we are told that it is a 2-dimensional array (i.e. an array of rank 2) and that the length of both dimensions is 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
(3, 3)&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
We can also apply operators to array objects.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = a * 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
array([[ 9.,  0.,  0.],&lt;br /&gt;
       [ 0.,  9.,  0.],&lt;br /&gt;
       [ 0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
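'''Note, however, that most operations on numpy arrays are done element-wise''', which is '''different from the linear algebra operation that you may have been expecting.'''  For example, '''a * b''' multiplies corresponding elements, whereas the matrix product of two arrays is given by the '''dot''' function; further linear algebra routines are provided by '''numpy.linalg''' and by the '''scipy''' package.&lt;br /&gt;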
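&lt;br /&gt;
A small sketch contrasting the element-wise and matrix products, using a separate pair of 2x2 arrays (named '''m1''' and '''m2''' here so as not to disturb '''a''' and '''b''') chosen so the results are easy to check by hand, plus '''numpy.linalg.solve''' for a linear system:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

m1 = np.array([[1.0, 2.0], [3.0, 4.0]])
m2 = np.array([[0.0, 1.0], [1.0, 0.0]])

elementwise = m1 * m2            # [[0., 2.], [3., 0.]]: matching elements multiplied
matrix_product = np.dot(m1, m2)  # [[2., 1.], [4., 3.]]: rows of m1 times columns of m2

# Solve the linear system  1x + 2y = 5,  3x + 4y = 11  (solution x = 1, y = 2)
solution = np.linalg.solve(m1, np.array([5.0, 11.0]))

print(elementwise)
print(matrix_product)
print(solution)
```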
&lt;br /&gt;
Should we so desire, we could re-shape the array.  One way to do this is to set its shape attribute directly (the total number of elements must stay the same):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (1,9)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 9.,  0.,  0.,  0.,  9.,  0.,  0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As with the list example, it can be useful to read or change the value of an element (or sub-array) individually.  Let's turn the array back to its rank-2 form and try it out:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (3,3)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1,1] = 777.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.    0.]&lt;br /&gt;
 [   0.    0.    9.]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1:,1:] = [[777.0, 777.0],[777.0, 777.0]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.  777.]&lt;br /&gt;
 [   0.  777.  777.]]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is all pretty handy so far, but specifying the value of each element explicitly could become a chore.  Happily, some helper functions exist to give you a head start with some building blocks.  For example, you can use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = ones((3,2))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = identity(2)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; big = resize(b, (6,6))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print big&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
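The use of '''resize''' in the last example illustrates a useful '''replicating feature''': when the new shape is larger than the original, '''resize''' fills it by repeating the original elements in order.&lt;br /&gt;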
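&lt;br /&gt;
A small sketch of that replicating behaviour.  The elements are repeated in flattened (row-by-row) order, so resizing a 2x2 identity does not produce a larger identity matrix; '''identity''' with the larger size does that.  (The '''resize''' method of an array object behaves differently, padding with zeros instead.)&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

b = np.identity(2)            # [[1., 0.], [0., 1.]]

# numpy.resize cycles through b's elements (1, 0, 0, 1) to fill the new shape,
# so every row of big is [1., 0., 0., 1.]
big = np.resize(b, (4, 4))

# A smaller shape simply takes the first elements
small = np.resize(b, (1, 3))  # [[1., 0., 0.]]

print(big)
print(small)
```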
&lt;br /&gt;
A list of all the functions and operations contained within numpy is: http://scipy.org/Numpy_Example_List.&lt;br /&gt;
&lt;br /&gt;
=Pylab and Matplotlib=&lt;br /&gt;
&lt;br /&gt;
The above examples are quite natty, but we have deliberately kept the array sizes small so that we can print the element values easily.  In practice, you may find that your array sizes are much larger and printing the values to the screen is impractical.  Fear not!  Python has many packages which help you plot your data, so that you can explore it.&lt;br /&gt;
&lt;br /&gt;
Using the pylab plotting interface, we can plot a few curves and save the figure to a file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import pylab&lt;br /&gt;
from numpy import arange, pi, cos, sin&lt;br /&gt;
t = arange(0.0, 3.0, 0.01)&lt;br /&gt;
c = cos(2 * pi * t)&lt;br /&gt;
s = sin(2 * pi * t)&lt;br /&gt;
pylab.ylabel('some numbers')&lt;br /&gt;
pylab.xlabel('some more numbers')&lt;br /&gt;
pylab.plot(t, c, 'r', lw=2)&lt;br /&gt;
pylab.plot(t, s, 'b', lw=2)&lt;br /&gt;
pylab.plot(t, c-s, 'gs', lw=2)&lt;br /&gt;
pylab.ylim(-1.5, 1.5)&lt;br /&gt;
pylab.title('sin and cos functions')&lt;br /&gt;
pylab.savefig('curves', dpi=300)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where '''curves.png''' looks like:&lt;br /&gt;
&lt;br /&gt;
[[Image:Curves.png|thumb|600px|none|Some nice curves]]&lt;br /&gt;
&lt;br /&gt;
You can view .png images from the Linux command line (including on BlueCrystal) using, e.g., the ImageMagick command '''display -resize 1000 curves.png'''&lt;br /&gt;
&lt;br /&gt;
We can also use Matplotlib directly for more control:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import matplotlib.pyplot as plt&lt;br /&gt;
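from numpy import arange, meshgrid, pi, sinc, sqrt&lt;br /&gt;
x = arange(-5,10)&lt;br /&gt;
y = arange(-4,11)&lt;br /&gt;
X, Y = meshgrid(x,y)  # X[j,i] = x[i], Y[j,i] = y[j]&lt;br /&gt;
R = sqrt(X**2 + Y**2)&lt;br /&gt;
# numpy's sinc(u) is sin(pi*u)/(pi*u), so this is sin(R)/R,&lt;br /&gt;
# with the 0/0 at the origin replaced by its limit, 1&lt;br /&gt;
Z = sinc(R/pi)&lt;br /&gt;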
plt.figure()&lt;br /&gt;
plt.contour(X,Y,Z)&lt;br /&gt;
plt.show()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and you should get a window similar to:&lt;br /&gt;
&lt;br /&gt;
[[Image:Sinc-matplotlib-contour.png|thumb|600px|none|A contour map of the sinc function]]&lt;br /&gt;
&lt;br /&gt;
Perhaps the best next step with matplotlib is to look at the gallery: http://matplotlib.org/gallery.html.&lt;br /&gt;
Just click on a figure and you will get the code used to generate it--a really great resource!&lt;br /&gt;
&lt;br /&gt;
==Input and Output==&lt;br /&gt;
&lt;br /&gt;
The foregoing is all very interesting, but life would be rather dull if you had to re-enter all your data by hand whenever you set to work with Python and numpy.  Therefore we need a means to save data to a file and load it again.  Happily, we can do this rather easily using a couple of routines from the '''numpy''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import zeros, savetxt, loadtxt&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; data = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; savetxt('myfile.txt', data)       # write the array as plain text&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; read_data = loadtxt('myfile.txt')  # and read it back in&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Beware of name shadowing''' when mixing wildcard imports.  For instance, older versions of '''pylab''' provided their own text-file '''load()''' and '''save()''' functions, so running '''from pylab import load''' after '''from numpy import *''' would shadow numpy's own '''load()''' (which reads numpy's binary '''.npy''' format instead).  One way to protect yourself against this is to make use of '''namespaces''': '''import numpy''' and then call, for example, '''numpy.loadtxt(..)'''.&lt;br /&gt;
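&lt;br /&gt;
numpy's own '''savetxt''' and '''loadtxt''' functions write and read plain text files, and accept a '''delimiter''' and a '''fmt''' (format) argument, which is handy for exchanging CSV files with other programs.  A minimal sketch, writing to a hypothetical file in a temporary directory:&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile

import numpy as np

data = np.arange(6.0).reshape(2, 3)   # [[0., 1., 2.], [3., 4., 5.]]

path = os.path.join(tempfile.mkdtemp(), "myfile.csv")  # hypothetical file name
# Write comma-separated values with three decimal places
np.savetxt(path, data, delimiter=",", fmt="%.3f")

# Read them back, telling loadtxt about the same delimiter
read_data = np.loadtxt(path, delimiter=",")

print(np.array_equal(data, read_data))  # True
```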
&lt;br /&gt;
=Scipy=&lt;br /&gt;
&lt;br /&gt;
* http://www.scipy.org/&lt;br /&gt;
* Good examples: http://scipy-lectures.github.com/intro/scipy.html&lt;br /&gt;
* Many useful features:&lt;br /&gt;
** Integration &amp;amp; Differentiation&lt;br /&gt;
** Optimisation (curve fitting, etc.)&lt;br /&gt;
** Fourier transforms&lt;br /&gt;
** Signal processing&lt;br /&gt;
** Statistical algorithms&lt;br /&gt;
** Much, much more...&lt;br /&gt;
* If you know Python, you can use SciPy&lt;br /&gt;
&lt;br /&gt;
==An example: Differentiation==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # derivative of x^2 at x=3&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from scipy.misc import derivative&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2, 3)&lt;br /&gt;
6.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # also works with arrays&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import array&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_array = array([1,2,3])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2,my_array)&lt;br /&gt;
array([ 2., 4., 6.])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
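&lt;br /&gt;
'''derivative''' estimates the slope with a central difference formula, and it has been deprecated in recent SciPy releases.  The formula is simple enough to write yourself; a minimal numpy sketch (the name '''central_difference''' and the default step are illustrative choices):&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np


def central_difference(f, x, dx=1e-3):
    """Estimate f'(x) as (f(x + dx) - f(x - dx)) / (2 dx)."""
    return (f(x + dx) - f(x - dx)) / (2.0 * dx)


# derivative of x^2 at x = 3 (exact answer: 6)
slope = central_difference(lambda x: x**2, 3.0)

# works element-wise on numpy arrays too (exact answers: 2, 4, 6)
slopes = central_difference(lambda x: x**2, np.array([1.0, 2.0, 3.0]))

print(slope)
print(slopes)
```
&lt;br /&gt;
For a quadratic the central difference is exact apart from rounding error; for other smooth functions the error shrinks roughly in proportion to '''dx''' squared, until rounding error takes over for very small steps.&lt;br /&gt;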
&lt;br /&gt;
Google for many more examples pertaining to your favourite numerical procedure!&lt;br /&gt;
&lt;br /&gt;
=A Repository of Packages You Could Use=&lt;br /&gt;
&lt;br /&gt;
Now, we've touched on a couple, but there are thousands of python packages available.  Before you start writing your own function for X, check that someone hasn't contributed code for that already at http://pypi.python.org/pypi.&lt;br /&gt;
&lt;br /&gt;
'''pip''', the Python package manager, looks in PyPI by default when installing a package.  You can use the '''--user''' option to install Python packages in your own user space.  See:&lt;br /&gt;
* https://pip.readthedocs.org/en/latest/&lt;br /&gt;
for more information on pip.&lt;br /&gt;
&lt;br /&gt;
=Writing Faster Python=&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB and R, one of the simplest ways in which you can write faster python code is to eliminate loops by vectorising your code.&lt;br /&gt;
&lt;br /&gt;
Consider the following two scripts.  First '''for-loop.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def zero_small_values(arr):  # set every element below 0.5 to zero&lt;br /&gt;
    for i, val in enumerate(arr):&lt;br /&gt;
        if val &amp;lt; 0.5:&lt;br /&gt;
            arr[i] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    zero_small_values(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and secondly, '''vectorised.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def zero_small_values(arr):  # set every element below 0.5 to zero&lt;br /&gt;
    arr[arr &amp;lt; 0.5] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    zero_small_values(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
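If we now run these two scripts through the Linux command line '''time''' utility, we see that the vectorised code runs roughly eight times faster than the for loop.  These timings include starting Python and importing numpy, so the speed-up of the function itself is larger still:&lt;br /&gt;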
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ time ./for-loop.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.963s&lt;br /&gt;
user	0m0.952s&lt;br /&gt;
sys	0m0.012s&lt;br /&gt;
gethin@gethin-desktop:~$ time ./vectorised.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.116s&lt;br /&gt;
user	0m0.096s&lt;br /&gt;
sys	0m0.020s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
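&lt;br /&gt;
To time just the function, without Python's start-up and the numpy import, the standard '''timeit''' module can be used from within a script.  A minimal sketch; the function names and the array size of 100,000 elements are illustrative, and your timings will differ:&lt;br /&gt;
&lt;br /&gt;
```python
import timeit

import numpy as np


def zero_small_loop(arr, threshold=0.5):
    """Set every element below threshold to zero, one element at a time."""
    for i, val in enumerate(arr):
        if threshold > val:
            arr[i] = 0
    return arr


def zero_small_vectorised(arr, threshold=0.5):
    """Same operation, using a boolean mask to select all small elements at once."""
    arr[threshold > arr] = 0
    return arr


data = np.random.rand(100000)

# Each call works on a fresh copy, so both functions do the same amount of work
loop_time = timeit.timeit(lambda: zero_small_loop(data.copy()), number=3)
vec_time = timeit.timeit(lambda: zero_small_vectorised(data.copy()), number=3)

# Check that the two approaches agree before comparing their speed
same = np.array_equal(zero_small_loop(data.copy()), zero_small_vectorised(data.copy()))

print("loop: %.4f s, vectorised: %.4f s, same result: %s" % (loop_time, vec_time, same))
```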
&lt;br /&gt;
For some more tips on writing faster python code, and examples of how to use one of the python profiler modules, take a look at:&lt;br /&gt;
* https://wiki.python.org/moin/PythonSpeed/PerformanceTips&lt;br /&gt;
* http://technicaldiscovery.blogspot.co.uk/2011/06/speeding-up-python-numpy-cython-and.html&lt;br /&gt;
* http://www.huyng.com/posts/python-performance-analysis/&lt;br /&gt;
* http://www.appneta.com/2012/05/21/profiling-python-performance-lineprof-statprof-cprofile/&lt;br /&gt;
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* http://docs.python.org/tutorial/&lt;br /&gt;
* http://wiki.python.org/moin/PythonBooks&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9462</id>
		<title>Python1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9462"/>
		<updated>2014-10-10T13:48:57Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Using Packages */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Python for Scientists'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Python.png|thumb|1100px|none|http://xkcd.com/353/]]&lt;br /&gt;
&lt;br /&gt;
With thanks to Simon Metson and Mike Wallace for much of the following material.&lt;br /&gt;
&lt;br /&gt;
=Getting Started on BlueCrystal Phase-2=&lt;br /&gt;
&lt;br /&gt;
After you have logged in, type the following at the command line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module add languages/python-2.7.2.0&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should start up an interactive python session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Python 2.7.2 (default, Aug 25 2011, 10:51:03) &lt;br /&gt;
[GCC 4.3.3] on linux2&lt;br /&gt;
Type &amp;quot;help&amp;quot;, &amp;quot;copyright&amp;quot;, &amp;quot;credits&amp;quot; or &amp;quot;license&amp;quot; for more information.&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where we can type commands at the '''&amp;gt;&amp;gt;&amp;gt;''' prompt.&lt;br /&gt;
&lt;br /&gt;
=Python as a Calculator=&lt;br /&gt;
&lt;br /&gt;
To get started, let's just try a few commands out.  If you type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print &amp;quot;Hello!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hello!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you try:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print 5 + 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far so simple!  Here is a copy of a session containing a few more commands where we've set the values of some variables and also defined and run our own function: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; five = 5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; neuf = 9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print five + neuf&lt;br /&gt;
14&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def say_hello():&lt;br /&gt;
...     print &amp;quot;Hello, world!&amp;quot;&lt;br /&gt;
... # hit return here &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; say_hello()&lt;br /&gt;
Hello, world!&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can exit an interactive session at any time by typing '''Ctrl-D'''.&lt;br /&gt;
&lt;br /&gt;
=Getting Help=&lt;br /&gt;
&lt;br /&gt;
One of the good things about Python is that it has lots of useful online documentation.  ([[A_Good_Read|There are good books on the language too]].)  For example, take a look at: http://docs.python.org/.  You can also type '''help()''' at the interpreter prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; help()&lt;br /&gt;
&lt;br /&gt;
Welcome to Python 2.7!  This is the online help utility.&lt;br /&gt;
&lt;br /&gt;
If this is your first time using Python, you should definitely check out&lt;br /&gt;
the tutorial on the Internet at http://docs.python.org/tutorial/.&lt;br /&gt;
&lt;br /&gt;
Enter the name of any module, keyword, or topic to get help on writing&lt;br /&gt;
Python programs and using Python modules.  To quit this help utility and&lt;br /&gt;
return to the interpreter, just type &amp;quot;quit&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; keywords&lt;br /&gt;
&lt;br /&gt;
Here is a list of the Python keywords.  Enter any keyword to get more help.&lt;br /&gt;
&lt;br /&gt;
and                 elif                if                  print&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; if&lt;br /&gt;
The ``if`` statement&lt;br /&gt;
********************&lt;br /&gt;
&lt;br /&gt;
The ``if`` statement is used for conditional execution:&lt;br /&gt;
&lt;br /&gt;
   if_stmt ::= &amp;quot;if&amp;quot; expression &amp;quot;:&amp;quot; suite&lt;br /&gt;
               ( &amp;quot;elif&amp;quot; expression &amp;quot;:&amp;quot; suite )*&lt;br /&gt;
               [&amp;quot;else&amp;quot; &amp;quot;:&amp;quot; suite]&lt;br /&gt;
&lt;br /&gt;
It selects exactly one of the suites by evaluating the expressions one&lt;br /&gt;
by one until one is found to be true...&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; quit&lt;br /&gt;
&lt;br /&gt;
You are now leaving help and returning to the Python interpreter.&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Making a Script=&lt;br /&gt;
&lt;br /&gt;
An interactive session can be fun and useful for trying things out.  However--to save our fingers--we will typically want to execute a series of commands as a script, created using your favourite text editor.  Here are the contents of an example script, saved as '''myscript.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;Hello, from a python script!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Ensure that your script is executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod u+x myscript.py&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and now you can run it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[ggdagw@bigblue4 ~]$ ./myscript.py &lt;br /&gt;
Hello, from a python script!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Python and Whitespace=&lt;br /&gt;
&lt;br /&gt;
Love it or hate it, Python makes whitespace part of its syntax: the indentation of a line determines which block it belongs to.  (The alternative is to demarcate blocks with some other syntax, such as the curly braces used in C.  Pick your poison.)  Consistent indentation is therefore essential for a valid python script.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
    print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will work, but:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
 print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
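will not, because the statement following '''else:''' is not indented at all (the single-space indent after the '''if''' is legal, though four spaces is the usual convention):&lt;br /&gt;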
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  File &amp;quot;./myscript.py&amp;quot;, line 7&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
        ^&lt;br /&gt;
IndentationError: expected an indented block&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is therefore a great advantage, when writing a python script, to use a text editor which has a dedicated python mode--such as '''emacs'''--and will actively help you to keep your indentation correct.  See http://wiki.python.org/moin/PythonEditors for an extensive list.&lt;br /&gt;
&lt;br /&gt;
=Some Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
* Calculate the volume of a sphere.  You can experiment with the following (where r needs to be set to some value):&lt;br /&gt;
** &amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;4/3 * 3.14159265359 * r ** 3&amp;lt;/source&amp;gt;&lt;br /&gt;
** &amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;4.0/3.0 * 3.14159265359 * pow(r,3)&amp;lt;/source&amp;gt;&lt;br /&gt;
** &amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;float(4)/float(3) * 3.14159265359 * pow(r,3)&amp;lt;/source&amp;gt;&lt;br /&gt;
* Concatenate two strings&lt;br /&gt;
* Write a recursive function to compute Fibonacci numbers (Hint: F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1)&lt;br /&gt;
&lt;br /&gt;
=Nuts and Bolts=&lt;br /&gt;
&lt;br /&gt;
==Types==&lt;br /&gt;
&lt;br /&gt;
Python has intrinsic types including integers, floats, booleans and complex numbers.  It is dynamically typed (a variable is not declared with a type and a name can refer to an object of any type, so you don't need a block of variable declarations at the top of your script), but it is '''not weakly''' typed: it will not silently convert between incompatible types.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex = 2 + 0.5j&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex&lt;br /&gt;
(2+0.5j)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.real&lt;br /&gt;
2.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.imag&lt;br /&gt;
0.5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name = 'fred'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; lucky = 7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name + lucky&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: cannot concatenate 'str' and 'int' objects&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
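&lt;br /&gt;
Because Python will not guess how to combine a string and an integer, any conversion has to be asked for explicitly.  A minimal sketch, using the built-in '''str()''', '''int()''' and '''type()''' functions (the variable names are just for illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
name = 'fred'&lt;br /&gt;
lucky = 7&lt;br /&gt;
&lt;br /&gt;
# convert the integer to a string explicitly before concatenating&lt;br /&gt;
label = name + str(lucky)&lt;br /&gt;
print(label)               # fred7&lt;br /&gt;
&lt;br /&gt;
# going the other way: turn a string of digits into an integer&lt;br /&gt;
total = int('35') + lucky&lt;br /&gt;
print(total)               # 42&lt;br /&gt;
&lt;br /&gt;
# type() reports the type of any object&lt;br /&gt;
print(type(lucky) is int)  # True&lt;br /&gt;
print(type(name) is str)   # True&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''print(...)''' form with a single value, used here, works in both Python 2 and Python 3.&lt;br /&gt;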
&lt;br /&gt;
==Strings==&lt;br /&gt;
&lt;br /&gt;
The eagle-eyed will have spotted in a previous example that we could ask for the length of a character string--straight off the bat.  No need to write a counting routine ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print len(message)&lt;br /&gt;
11&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a character string.  For example, to get the first five characters:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print message[:5]&lt;br /&gt;
happy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since a string is an '''object''' (in the object oriented programming sense of the word, but more of that another time...) we can call a number of methods that operate on it.  A selected sample includes:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
! Method !! Description&lt;br /&gt;
|-&lt;br /&gt;
|| s.find(sub) || Returns the index of the first occurrence of substring '''sub''' in '''s''', or -1 if it is not found&lt;br /&gt;
|-&lt;br /&gt;
|| s.islower() || Checks whether '''s''' contains at least one letter and all of its letters are lowercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.upper() || Returns a copy of '''s''' converted to uppercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.strip() || Returns a copy of '''s''' with leading and trailing whitespace removed&lt;br /&gt;
|-&lt;br /&gt;
|| s.replace(old,new) || Returns a copy of '''s''' with every occurrence of substring '''old''' replaced by '''new'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.split([sep]) || Splits '''s''' using the (optional) delimiter '''sep''', or runs of whitespace if '''sep''' is omitted, and returns a list of the pieces&lt;br /&gt;
|}&lt;br /&gt;
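&lt;br /&gt;
The methods in the table can be tried out directly.  Strings cannot be changed in place, so methods such as '''upper()''' and '''replace()''' hand back a new string and leave the original untouched.  A short sketch (the example strings are arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
s = '  happy days!  '&lt;br /&gt;
&lt;br /&gt;
trimmed = s.strip()                          # 'happy days!'&lt;br /&gt;
position = trimmed.find('days')              # 6, where 'days' starts&lt;br /&gt;
missing = trimmed.find('nights')             # -1: not found&lt;br /&gt;
shouted = trimmed.upper()                    # 'HAPPY DAYS!'&lt;br /&gt;
is_lower = trimmed.islower()                 # True&lt;br /&gt;
swapped = trimmed.replace('happy', 'sunny')  # 'sunny days!'&lt;br /&gt;
words = trimmed.split()                      # ['happy', 'days!']&lt;br /&gt;
fields = 'a,b,,c'.split(',')                 # ['a', 'b', '', 'c']&lt;br /&gt;
&lt;br /&gt;
print(s == '  happy days!  ')   # True: the original is unchanged&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;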
&lt;br /&gt;
==Lists and Tuples==&lt;br /&gt;
&lt;br /&gt;
An example of a list is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping = ['bread', 'marmalade', 'milk', 'tea']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we can ask for its length using the same function as before:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
len(shopping)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a list, as we did with a string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and even replace a portion of the list by assigning to a slice:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2] = ['bagels', 'jam']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
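&lt;br /&gt;
A slice runs from the first index up to, but not including, the second, and either index may be left out.  A quick sketch of what the examples above give (negative indexes, which count back from the end of the list, are an extra not covered in the text):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping = ['bread', 'marmalade', 'milk', 'tea']&lt;br /&gt;
&lt;br /&gt;
print(len(shopping))   # 4&lt;br /&gt;
print(shopping[0:2])   # ['bread', 'marmalade']: items 0 and 1 only&lt;br /&gt;
print(shopping[2:])    # ['milk', 'tea']: from index 2 to the end&lt;br /&gt;
print(shopping[-1])    # tea: the last item&lt;br /&gt;
&lt;br /&gt;
# assigning to a slice replaces that portion of the list&lt;br /&gt;
shopping[0:2] = ['bagels', 'jam']&lt;br /&gt;
print(shopping)        # ['bagels', 'jam', 'milk', 'tea']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;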
&lt;br /&gt;
Since a list is also an object, we have more handy methods, including:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
! Method !! Description&lt;br /&gt;
|-&lt;br /&gt;
|| s.append(x) || Appends a new element '''x''' to the end of '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.count(x) || Returns the number of occurrences of '''x''' in '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.reverse() || Reverses the items of '''s''' in place&lt;br /&gt;
|-&lt;br /&gt;
|| s.sort([compfunc]) || Sorts the items of '''s''' in place.  '''compfunc''' is an optional comparison function (Python 2 only; the '''key''' argument, e.g. '''s.sort(key=len)''', works in both Python 2 and 3)&lt;br /&gt;
|}&lt;br /&gt;
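&lt;br /&gt;
A short sketch exercising these methods.  Note that they change the list '''in place''' and return '''None''', which is a common source of surprise:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
numbers = [3, 1, 2]&lt;br /&gt;
&lt;br /&gt;
numbers.append(1)         # [3, 1, 2, 1]&lt;br /&gt;
ones = numbers.count(1)   # 2&lt;br /&gt;
numbers.reverse()         # [1, 2, 1, 3]&lt;br /&gt;
numbers.sort()            # [1, 1, 2, 3]&lt;br /&gt;
&lt;br /&gt;
# sort() returns None, so don't write: numbers = numbers.sort()&lt;br /&gt;
result = numbers.sort()&lt;br /&gt;
print(result)             # None&lt;br /&gt;
&lt;br /&gt;
# a key function: sort words by their length&lt;br /&gt;
words = ['marmalade', 'tea', 'bread']&lt;br /&gt;
words.sort(key=len)       # ['tea', 'bread', 'marmalade']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;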
&lt;br /&gt;
Tuples are very similar to lists and support many of the same operations (indexing, slicing, concatenation etc.) but differ in that they are '''not mutable''' after creation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple = ('fred', 'ginger', 7, 2.5)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist = ['fred', 'ginger', 7, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist[2] = 8&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mylist&lt;br /&gt;
['fred', 'ginger', 8, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mytuple[2]    &lt;br /&gt;
7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple[2] = 8&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: 'tuple' object does not support item assignment&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
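A '''list comprehension''' builds a new list by applying an expression to each item of a sequence, optionally keeping only the items that pass a test.  For example, to double all the numbers smaller than 20:&lt;br /&gt;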
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; numbers = [12, 3, 90, 40, 52, 11, 10]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled = [number * 2 for number in numbers if number &amp;lt; 20]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled&lt;br /&gt;
[24, 6, 22, 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
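&lt;br /&gt;
The comprehension is shorthand for a loop which builds up a new list.  Spelled out longhand, the same result could be produced like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
numbers = [12, 3, 90, 40, 52, 11, 10]&lt;br /&gt;
&lt;br /&gt;
# equivalent to: [number * 2 for number in numbers if number &amp;lt; 20]&lt;br /&gt;
small_numbers_doubled = []&lt;br /&gt;
for number in numbers:&lt;br /&gt;
    if number &amp;lt; 20:                              # the filter&lt;br /&gt;
        small_numbers_doubled.append(number * 2)  # the expression&lt;br /&gt;
&lt;br /&gt;
print(small_numbers_doubled)  # [24, 6, 22, 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;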
&lt;br /&gt;
==Dictionaries==&lt;br /&gt;
&lt;br /&gt;
A dictionary is an associative array or hash table, containing '''key-value''' pairs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
mydict = {'thomas':'blue', 'james':'red', 'henry':'green'}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mydict['james']&lt;br /&gt;
red&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking values up by a meaningful key, rather than by an arbitrary index into a list, can make code much more readable and intuitive.&lt;br /&gt;
&lt;br /&gt;
Some example dictionary methods are:&lt;br /&gt;
&lt;br /&gt;
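{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
! Operation !! Description&lt;br /&gt;
|-&lt;br /&gt;
|| m.keys() || Returns a list of the keys in '''m''' (in Python 3, a list-like view)&lt;br /&gt;
|-&lt;br /&gt;
|| m.items() || Returns a list of the (key,value) pairs in '''m''' (in Python 3, a list-like view)&lt;br /&gt;
|-&lt;br /&gt;
|| m[k] = x || Sets the value for key '''k''' to '''x''', adding the key if it is not already present&lt;br /&gt;
|-&lt;br /&gt;
|| m.update(b) || Adds the key-value pairs from dictionary '''b''' to '''m''', overwriting the values of any keys already in '''m'''&lt;br /&gt;
|}&lt;br /&gt;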
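&lt;br /&gt;
A sketch of these operations in use.  Dictionaries in Python 2 make no promise about the order of their keys, so the example sorts them before printing.  The '''in''' test and the '''get()''' method, which are not in the table above, are standard ways of avoiding a '''KeyError''' when a key may be missing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
mydict = {'thomas': 'blue', 'james': 'red', 'henry': 'green'}&lt;br /&gt;
&lt;br /&gt;
mydict['gordon'] = 'blue'                              # add a new pair&lt;br /&gt;
mydict.update({'james': 'crimson', 'percy': 'green'})  # overwrite one, add one&lt;br /&gt;
&lt;br /&gt;
# sort the keys, since their order is not guaranteed&lt;br /&gt;
print(sorted(mydict.keys()))&lt;br /&gt;
# ['gordon', 'henry', 'james', 'percy', 'thomas']&lt;br /&gt;
&lt;br /&gt;
# test for a key with 'in' before looking it up...&lt;br /&gt;
if 'edward' in mydict:&lt;br /&gt;
    print(mydict['edward'])&lt;br /&gt;
&lt;br /&gt;
# ...or use get(), which returns a default rather than raising KeyError&lt;br /&gt;
colour = mydict.get('edward', 'unknown')   # 'unknown'&lt;br /&gt;
&lt;br /&gt;
for name, engine_colour in sorted(mydict.items()):&lt;br /&gt;
    print(name + ': ' + engine_colour)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;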
&lt;br /&gt;
==Control Structures==&lt;br /&gt;
&lt;br /&gt;
Of course, we'll need conditionals and loops etc. to go beyond the simplest of scripts.  Here is an '''if-then-else''', python style:&lt;br /&gt;
&lt;br /&gt;
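&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
def birds_sing():   # placeholder, just for illustration&lt;br /&gt;
    print &amp;quot;tweet tweet&amp;quot;&lt;br /&gt;
&lt;br /&gt;
def birds_sleep():  # placeholder, just for illustration&lt;br /&gt;
    print &amp;quot;zzz&amp;quot;&lt;br /&gt;
&lt;br /&gt;
sky = 'blue'&lt;br /&gt;
&lt;br /&gt;
if sky == 'blue':&lt;br /&gt;
    birds_sing()&lt;br /&gt;
elif sky == 'black':&lt;br /&gt;
    birds_sleep()&lt;br /&gt;
else:&lt;br /&gt;
    pass # do nothing&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;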
&lt;br /&gt;
and a classic '''for loop'''.  Note that '''range(1,10)''' counts from 1 up to, but not including, 10:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
for ii in range(1,10):&lt;br /&gt;
    print ii&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1&lt;br /&gt;
...&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We'll also see a '''while loop''' shoehorned into the next example.&lt;br /&gt;
&lt;br /&gt;
For our control statements, we can use comparison operators such as '''==''', '''!=''', '''&amp;gt;''', '''&amp;lt;''', '''&amp;lt;=''' and '''&amp;gt;=''', and the logical operators '''and''', '''or''' and '''not'''.&lt;br /&gt;
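&lt;br /&gt;
A small sketch putting these operators to work inside some functions (the function names are made up for illustration).  The last function also shows that Python lets comparisons be chained, as in mathematical notation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
def is_teenager(age):&lt;br /&gt;
    # 'and' is True only when both comparisons are True&lt;br /&gt;
    return age &amp;gt;= 13 and age &amp;lt;= 19&lt;br /&gt;
&lt;br /&gt;
def is_weekend(day):&lt;br /&gt;
    # 'or' is True when either comparison is True&lt;br /&gt;
    return day == 'Saturday' or day == 'Sunday'&lt;br /&gt;
&lt;br /&gt;
def is_weekday(day):&lt;br /&gt;
    # 'not' inverts a boolean&lt;br /&gt;
    return not is_weekend(day)&lt;br /&gt;
&lt;br /&gt;
def is_teenager_chained(age):&lt;br /&gt;
    # a chained comparison, equivalent to is_teenager()&lt;br /&gt;
    return 13 &amp;lt;= age &amp;lt;= 19&lt;br /&gt;
&lt;br /&gt;
print(is_teenager(15))        # True&lt;br /&gt;
print(is_weekday('Sunday'))   # False&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;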
&lt;br /&gt;
==File Input and Output==&lt;br /&gt;
&lt;br /&gt;
Here's some code for printing the contents of a text file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;r&amp;quot;)&lt;br /&gt;
line = fp.readline()&lt;br /&gt;
while line:&lt;br /&gt;
    line = line.strip()&lt;br /&gt;
    print line&lt;br /&gt;
    line = fp.readline()&lt;br /&gt;
fp.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
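&lt;br /&gt;
A more compact, and nowadays more common, way to read a file is to loop over the file object directly inside a '''with''' block, which closes the file automatically, even if an error occurs part way through.  A sketch (it first creates a small '''foo.txt''' so that it can run on its own):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# create a small example file, so that the sketch can run on its own&lt;br /&gt;
fp = open('foo.txt', 'w')&lt;br /&gt;
fp.write('first line\nsecond line\n')&lt;br /&gt;
fp.close()&lt;br /&gt;
&lt;br /&gt;
lines = []&lt;br /&gt;
with open('foo.txt', 'r') as fp:&lt;br /&gt;
    for line in fp:                  # one line at a time&lt;br /&gt;
        lines.append(line.strip())   # strip() removes the trailing newline&lt;br /&gt;
&lt;br /&gt;
# the file has been closed automatically at the end of the 'with' block&lt;br /&gt;
print(lines)   # ['first line', 'second line']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;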
&lt;br /&gt;
We could open a file for writing with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;w&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp.write(...)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
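to write a string to that file.  Opening a file in mode '''w''' empties any existing file of the same name, and, as before, you should call '''fp.close()''' when you have finished.&lt;br /&gt;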
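&lt;br /&gt;
Note that '''write()''' does not add a newline for you.  A sketch which writes a few lines, appends one more (mode '''a''') and reads the result back; the file name '''colours.txt''' is just for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
colours = ['blue', 'red', 'green']&lt;br /&gt;
&lt;br /&gt;
with open('colours.txt', 'w') as fp:&lt;br /&gt;
    for colour in colours:&lt;br /&gt;
        fp.write(colour + '\n')   # add the newline ourselves&lt;br /&gt;
&lt;br /&gt;
# mode 'a' appends to the end of an existing file&lt;br /&gt;
with open('colours.txt', 'a') as fp:&lt;br /&gt;
    fp.write('yellow\n')&lt;br /&gt;
&lt;br /&gt;
with open('colours.txt', 'r') as fp:&lt;br /&gt;
    contents = fp.read()&lt;br /&gt;
&lt;br /&gt;
print(contents)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;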
&lt;br /&gt;
=Object Oriented Programming in Python=&lt;br /&gt;
&lt;br /&gt;
Here is an example of using a class in python:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
class Radio:&lt;br /&gt;
    &amp;quot;A simple radio&amp;quot;&lt;br /&gt;
    def __init__(self,freq=0.0,name=&amp;quot;&amp;quot;):&lt;br /&gt;
        &amp;quot;Constructor method&amp;quot;&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
        self.name=name&lt;br /&gt;
    def tune(self,freq):&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
    def tuned_to(self):&lt;br /&gt;
        print self.name, &amp;quot;tuned to:&amp;quot;, self.__frequency&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # declare two radio instances&lt;br /&gt;
    car = Radio(name=&amp;quot;car&amp;quot;)&lt;br /&gt;
    kitchen = Radio(91.5,&amp;quot;kitchen&amp;quot;)&lt;br /&gt;
    # call some methods&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    kitchen.tuned_to()&lt;br /&gt;
    car.tune(89.3)&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    # Docstrings--double quotes at the top of the class:                        &lt;br /&gt;
    print car.__doc__&lt;br /&gt;
    # NB members not private by default:&lt;br /&gt;
    print car.name&lt;br /&gt;
    # BUT leading double underscores will trigger&lt;br /&gt;
    # name mangling and hence the member will be hidden &lt;br /&gt;
    print car.__frequency&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running the script gives us:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
car tuned to: 0.0&lt;br /&gt;
kitchen tuned to: 91.5&lt;br /&gt;
car tuned to: 89.3&lt;br /&gt;
A simple radio&lt;br /&gt;
car&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;./foo.py&amp;quot;, line 27, in &amp;lt;module&amp;gt;&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
AttributeError: Radio instance has no attribute '__frequency'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
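&lt;br /&gt;
The double-underscore attribute is not truly private: Python simply renames it, by prefixing the class name, so that it is harder to reach by accident.  A minimal sketch demonstrating this, using a cut-down version of the '''Radio''' class (written as a new-style class, '''class Radio(object)''', so that it behaves the same in Python 2 and 3):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
class Radio(object):&lt;br /&gt;
    'A cut-down version of the simple radio'&lt;br /&gt;
    def __init__(self, freq=0.0, name=''):&lt;br /&gt;
        self.__frequency = freq   # actually stored as _Radio__frequency&lt;br /&gt;
        self.name = name&lt;br /&gt;
&lt;br /&gt;
    def tuned_to(self):&lt;br /&gt;
        # inside the class, the short name still works&lt;br /&gt;
        return self.__frequency&lt;br /&gt;
&lt;br /&gt;
car = Radio(89.3, 'car')&lt;br /&gt;
print(car.tuned_to())               # 89.3&lt;br /&gt;
print(hasattr(car, '__frequency'))  # False: no attribute with that name&lt;br /&gt;
print(car._Radio__frequency)        # 89.3: the mangled name still works&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;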
&lt;br /&gt;
=Using Packages=&lt;br /&gt;
&lt;br /&gt;
Python packages are great because they provide us with a whole lot of extra functionality--above and beyond the core language--that we didn't have to write and debug ourselves.&lt;br /&gt;
&lt;br /&gt;
Let's walk through a simple example using a package.  At an interactive prompt type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from random import randint&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will give us access to the '''randint(x,y)''' function, which returns a randomly chosen integer from the given range [x,y]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
4&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
1&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OK, so far so good.  One thing to note is that the above '''import''' statement has drawn the name ''randint'' into our current '''namespace'''.  What if we had already defined a function named ''randint''?  The import would silently replace it, which could cause problems.  To protect ourselves from this kind of problem, there are several import variants.&lt;br /&gt;
&lt;br /&gt;
With a plain '''import random''', the module's functions are kept in a namespace with the same name as the module, so we have to prefix them with that name when we call them:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random.randint(0,10)&lt;br /&gt;
6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Should we desire, we can apply a little more control and specify the namespace for the import ourselves: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random as rnd&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; rnd.randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another--more 'devil-may-care'--approach is to do away with the separate namespace and pull everything from a given package into the current namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import *&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random()&lt;br /&gt;
0.3172268098313996&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(The '''random()''' function returns a randomly selected floating point number in the range [0, 1)--that is, between 0 and 1, including 0.0 but always smaller than 1.0.)&lt;br /&gt;
&lt;br /&gt;
==Interrogating a Module==&lt;br /&gt;
&lt;br /&gt;
To list the names (functions, classes, constants and so on) defined in a module you have imported, type '''dir(&amp;lt;modulename&amp;gt;)'''.&lt;br /&gt;
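&lt;br /&gt;
'''dir()''' returns an ordinary list of strings, so it can be filtered like any other list.  A sketch which keeps only the public names in the '''random''' module that can be called (by convention, names starting with an underscore are internal; note that '''callable()''' is also true of classes, such as '''random.Random'''):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import random&lt;br /&gt;
&lt;br /&gt;
names = dir(random)   # a list of strings&lt;br /&gt;
&lt;br /&gt;
# keep the public names that refer to something callable&lt;br /&gt;
functions = [name for name in names&lt;br /&gt;
             if not name.startswith('_') and callable(getattr(random, name))]&lt;br /&gt;
&lt;br /&gt;
print('randint' in functions)   # True&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;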
&lt;br /&gt;
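==A Namespace Collision==&lt;br /&gt;
&lt;br /&gt;
Here is the problem described above in action: importing '''randint''' from '''random''' silently replaces our own function of the same name:&lt;br /&gt;
&lt;br /&gt;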
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def randint():&lt;br /&gt;
...     print &amp;quot;dummy function&amp;quot;&lt;br /&gt;
... &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
dummy function&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import randint&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: randint() takes exactly 3 arguments (1 given)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
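=Python for Shell Scripting=&lt;br /&gt;
&lt;br /&gt;
The '''subprocess''' module lets a python script run other programs, much as a shell script would.  For example, to run '''ls -l''':&lt;br /&gt;
&lt;br /&gt;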
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from subprocess import call&lt;br /&gt;
call([&amp;quot;ls&amp;quot;, &amp;quot;-l&amp;quot;])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
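&lt;br /&gt;
'''call()''' simply runs the command and returns its exit status; the command's own output goes straight to the terminal.  If you want to capture that output in your script instead, '''subprocess.check_output()''' (available from Python 2.7) returns it, and raises an exception if the command fails.  A sketch, which runs the current python interpreter ('''sys.executable''') so that it does not depend on any particular shell command being installed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import sys&lt;br /&gt;
from subprocess import call, check_output&lt;br /&gt;
&lt;br /&gt;
# call() returns the command's exit status (0 normally means success)&lt;br /&gt;
status = call([sys.executable, '-c', 'pass'])&lt;br /&gt;
&lt;br /&gt;
# check_output() returns whatever the command printed&lt;br /&gt;
output = check_output([sys.executable, '-c', 'print(6 * 7)'])&lt;br /&gt;
&lt;br /&gt;
# in Python 3 the output arrives as bytes, so decode it to get a string&lt;br /&gt;
answer = output.decode().strip()&lt;br /&gt;
print(answer)   # 42&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;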
&lt;br /&gt;
=Python as a Glue Language=&lt;br /&gt;
&lt;br /&gt;
* Calling R from python is possible using: http://rpy.sourceforge.net/index.html.&lt;br /&gt;
* Calling Matlab from python: http://mlabwrap.sourceforge.net.&lt;br /&gt;
* With SWIG you can make many bindings, including Python to C and C++: http://www.swig.org/.&lt;br /&gt;
* Or if Fortran is more your cup-of-tea, you can use f2py: http://cens.ioc.ee/projects/f2py2e/.  (f2py is now distributed as part of numpy.)&lt;br /&gt;
* There are many more examples.&lt;br /&gt;
&lt;br /&gt;
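=Command Line Parsing=&lt;br /&gt;
&lt;br /&gt;
The arguments given to a script on the command line are available in the list '''sys.argv''':&lt;br /&gt;
&lt;br /&gt;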
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # We can test on the length of argv&lt;br /&gt;
    if len(sys.argv) &amp;lt; 2:&lt;br /&gt;
        print &amp;quot;usage: to use this script...&amp;quot;&lt;br /&gt;
    else:&lt;br /&gt;
        ii = 0&lt;br /&gt;
        for arg in sys.argv:&lt;br /&gt;
            # (typically) argv[0] is bound to the script name&lt;br /&gt;
            print &amp;quot;arg&amp;quot;, ii, &amp;quot;is:&amp;quot;, arg&lt;br /&gt;
            ii = ii+1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py&lt;br /&gt;
usage: to use this script...&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py fred ginger&lt;br /&gt;
arg 0 is: ./cmdline.py&lt;br /&gt;
arg 1 is: fred&lt;br /&gt;
arg 2 is: ginger&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
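&lt;br /&gt;
For anything more than a couple of positional arguments, the standard '''argparse''' module (Python 2.7 and later) can do the parsing, the error checking and the usage message for you.  This is a different approach from the hand-rolled loop above, not a description of it.  A minimal sketch; the argument names '''names''' and '''--verbose''' are made up for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import argparse&lt;br /&gt;
&lt;br /&gt;
def build_parser():&lt;br /&gt;
    parser = argparse.ArgumentParser(description='Greet some people.')&lt;br /&gt;
    # one or more positional arguments, collected into a list&lt;br /&gt;
    parser.add_argument('names', nargs='+', help='people to greet')&lt;br /&gt;
    # an optional on/off flag&lt;br /&gt;
    parser.add_argument('--verbose', action='store_true',&lt;br /&gt;
                        help='print extra detail')&lt;br /&gt;
    return parser&lt;br /&gt;
&lt;br /&gt;
def main(argv=None):&lt;br /&gt;
    # argv=None means: parse sys.argv[1:]&lt;br /&gt;
    args = build_parser().parse_args(argv)&lt;br /&gt;
    for name in args.names:&lt;br /&gt;
        if args.verbose:&lt;br /&gt;
            print('about to greet ' + name)&lt;br /&gt;
        print('hello ' + name)&lt;br /&gt;
    return args&lt;br /&gt;
&lt;br /&gt;
# for demonstration, parse a hard-coded list rather than the real command line&lt;br /&gt;
args = main(['fred', 'ginger', '--verbose'])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In a real script you would call '''main()''' with no argument, so that '''sys.argv[1:]''' is parsed; '''argparse''' then prints a usage message and exits with an error if no names are given, and provides a '''-h''' option automatically.&lt;br /&gt;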
&lt;br /&gt;
=Databases=&lt;br /&gt;
&lt;br /&gt;
==Simple Databases==&lt;br /&gt;
&lt;br /&gt;
Python's standard library provides access to some simple databases.  For example, the '''bsddb''' module (Python 2 only; it was removed in Python 3) allows you to access the popular '''Berkeley DB''' database from your python code.&lt;br /&gt;
&lt;br /&gt;
The interface to the database provided by this module is very similar to the way in which we access a dictionary.  First, let's populate a database:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import bsddb&lt;br /&gt;
d = bsddb.btopen('engines.db')&lt;br /&gt;
d['thomas'] = 'blue'&lt;br /&gt;
d['james'] = 'red'&lt;br /&gt;
d['henry'] = 'green'&lt;br /&gt;
d.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now let's open the database again and query its contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d = bsddb.btopen('engines.db')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['henry', 'james', 'thomas']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.first()&lt;br /&gt;
('henry', 'green')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.last()&lt;br /&gt;
('thomas', 'blue')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour = d['james']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour&lt;br /&gt;
'red'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; del d['henry']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['james', 'thomas']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
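&lt;br /&gt;
If you need a dictionary-style database that works in both Python 2 and 3, the standard '''shelve''' module is one alternative (a different module from '''bsddb''', not a drop-in replacement for it): it stores values of most python types, not just strings, under string keys.  A sketch, with the file name '''engines_shelf''' chosen for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import shelve&lt;br /&gt;
&lt;br /&gt;
# open (creating if necessary) a persistent dictionary; keys must be strings&lt;br /&gt;
d = shelve.open('engines_shelf')&lt;br /&gt;
try:&lt;br /&gt;
    d['thomas'] = 'blue'&lt;br /&gt;
    d['james'] = 'red'&lt;br /&gt;
    d['henry'] = ['green', 3]   # values can be any picklable object&lt;br /&gt;
finally:&lt;br /&gt;
    d.close()&lt;br /&gt;
&lt;br /&gt;
# re-open the shelf and query it, just like a dictionary&lt;br /&gt;
d = shelve.open('engines_shelf')&lt;br /&gt;
try:&lt;br /&gt;
    keys = sorted(d.keys())&lt;br /&gt;
    colour = d['james']&lt;br /&gt;
    del d['henry']&lt;br /&gt;
    remaining = sorted(d.keys())&lt;br /&gt;
finally:&lt;br /&gt;
    d.close()&lt;br /&gt;
&lt;br /&gt;
print(remaining)   # ['james', 'thomas']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;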
&lt;br /&gt;
==Relational Databases==&lt;br /&gt;
&lt;br /&gt;
Relational databases give us more oomph.  '''SQLite''' is a useful relational database to consider as it is light, in that it requires hardly anything in terms of setup or management, yet still understands queries formulated in SQL.  As such it is useful for creating relatively simple examples of SQL access to a database in python and is a stepping stone toward more powerful database packages.&lt;br /&gt;
&lt;br /&gt;
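Here is a script which will create a table called '''planets''' in the file '''pytest.db''' and populate it with details of the planets in our solar system (diameter, mass and orbital period are all given relative to those of the Earth).  Note that running it a second time will fail, since the table will already exist:&lt;br /&gt;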
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
 &lt;br /&gt;
# create a table&lt;br /&gt;
cursor.execute(&amp;quot;&amp;quot;&amp;quot;CREATE TABLE planets&lt;br /&gt;
                  (Id INT, Name TEXT, Diameter REAL, &lt;br /&gt;
                   Mass REAL, Orbital_Period REAL)&amp;quot;&amp;quot;&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# insert a single record&lt;br /&gt;
cursor.execute(&amp;quot;INSERT INTO planets VALUES(1,'Mercury',0.382,0.06,0.24)&amp;quot;)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
 &lt;br /&gt;
# insert multiple records&lt;br /&gt;
other_planets = [(2,'Venus',0.949,0.82,0.72),&lt;br /&gt;
                 (3,'Earth',1.0,1.0,1.0),&lt;br /&gt;
                 (4,'Mars',0.532,0.11,1.52),&lt;br /&gt;
                 (5,'Jupiter',11.209,317.8,5.20),&lt;br /&gt;
                 (6,'Saturn',9.449,95.2,9.54),&lt;br /&gt;
                 (7,'Uranus',4.007,14.6,19.22),&lt;br /&gt;
                 (8,'Neptune',3.883,17.2,30.06),&lt;br /&gt;
                 (9,'Pluto',0.18,0.002,248.09)]&lt;br /&gt;
cursor.executemany(&amp;quot;INSERT INTO planets VALUES (?,?,?,?,?)&amp;quot;, other_planets)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
&lt;br /&gt;
# delete a record&lt;br /&gt;
sql = &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
DELETE FROM planets&lt;br /&gt;
WHERE Name = 'Pluto'&lt;br /&gt;
&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
cursor.execute(sql)  # poor old pluto! &lt;br /&gt;
conn.commit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And here is a short example script showing a couple of ways to interrogate the database: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the records in the table, ordered by Name:\n&amp;quot;&lt;br /&gt;
for row in cursor.execute(&amp;quot;SELECT rowid, * FROM planets ORDER BY Name&amp;quot;):&lt;br /&gt;
    print row&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;\n&amp;quot;&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the planets with a mass greater than or equal to that of Earth:\n&amp;quot;&lt;br /&gt;
sql = &amp;quot;SELECT * FROM planets WHERE Mass&amp;gt;=?&amp;quot;&lt;br /&gt;
cursor.execute(sql, (1.0,))  # the parameters are passed as a tuple&lt;br /&gt;
for row in cursor.fetchall():  # or use fetchone()&lt;br /&gt;
    print row&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the results of running the script are:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
All the records in the table, ordered by Name:&lt;br /&gt;
&lt;br /&gt;
(3, 3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, 5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(4, 4, u'Mars', 0.53200000000000003, 0.11, 1.52)&lt;br /&gt;
(1, 1, u'Mercury', 0.38200000000000001, 0.059999999999999998, 0.23999999999999999)&lt;br /&gt;
(8, 8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
(6, 6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, 7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(2, 2, u'Venus', 0.94899999999999995, 0.81999999999999995, 0.71999999999999997)&lt;br /&gt;
&lt;br /&gt;
All the planets with a mass greater than or equal to that of Earth:&lt;br /&gt;
&lt;br /&gt;
(3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
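&lt;br /&gt;
Because SQLite can also hold a database entirely in memory (using the special name :memory: in place of a file name), it is easy to experiment with queries without touching '''pytest.db'''.  A self-contained sketch, re-using a few rows of the planets table, which shows '''?''' parameter substitution (safer than pasting values into the SQL string yourself) and an aggregate query:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import sqlite3&lt;br /&gt;
&lt;br /&gt;
conn = sqlite3.connect(':memory:')   # a temporary database held in RAM&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
cursor.execute('CREATE TABLE planets (Id INT, Name TEXT, Diameter REAL, '&lt;br /&gt;
               'Mass REAL, Orbital_Period REAL)')&lt;br /&gt;
cursor.executemany('INSERT INTO planets VALUES (?,?,?,?,?)',&lt;br /&gt;
                   [(3, 'Earth', 1.0, 1.0, 1.0),&lt;br /&gt;
                    (4, 'Mars', 0.532, 0.11, 1.52),&lt;br /&gt;
                    (5, 'Jupiter', 11.209, 317.8, 5.20)])&lt;br /&gt;
conn.commit()&lt;br /&gt;
&lt;br /&gt;
# the ? placeholder is filled in by sqlite3 from the tuple of parameters&lt;br /&gt;
cursor.execute('SELECT Name FROM planets WHERE Mass &amp;lt; ? ORDER BY Name', (1.0,))&lt;br /&gt;
lighter = [row[0] for row in cursor.fetchall()]   # lighter than Earth&lt;br /&gt;
&lt;br /&gt;
# an aggregate query returns a single row&lt;br /&gt;
cursor.execute('SELECT COUNT(*), MAX(Diameter) FROM planets')&lt;br /&gt;
count, widest = cursor.fetchone()&lt;br /&gt;
conn.close()&lt;br /&gt;
&lt;br /&gt;
print(count)    # 3&lt;br /&gt;
print(widest)   # 11.209&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;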
&lt;br /&gt;
For more information on using SQLite with Python, see, e.g.:&lt;br /&gt;
* http://zetcode.com/db/sqlitepythontutorial/&lt;br /&gt;
* http://www.blog.pythonlibrary.org/2012/07/18/python-a-simple-step-by-step-sqlite-tutorial/&lt;br /&gt;
&lt;br /&gt;
You can also connect to a MySQL database from python using, e.g. the [http://mysql-python.sourceforge.net/ python-mysqldb] package.  A snippet of python code for connecting to a database is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
import MySQLdb&lt;br /&gt;
&lt;br /&gt;
conn = MySQLdb.connect(host=&amp;quot;localhost&amp;quot;,   # your host, usually localhost&lt;br /&gt;
                       user=&amp;quot;gethin&amp;quot;,      # your username&lt;br /&gt;
                       passwd=&amp;quot;changeme&amp;quot;, # your password&lt;br /&gt;
                       db=&amp;quot;menagerie&amp;quot;)    # name of the database&lt;br /&gt;
&lt;br /&gt;
# Create a cursor object, as before with SQLite&lt;br /&gt;
cur = conn.cursor() &lt;br /&gt;
&lt;br /&gt;
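# and then you can submit your SQL command and fetch the results:&lt;br /&gt;
cur.execute(&amp;quot;SELECT * FROM YOUR_TABLE_NAME&amp;quot;)&lt;br /&gt;
for row in cur.fetchall():&lt;br /&gt;
    print row&lt;br /&gt;
conn.close()&lt;br /&gt;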
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Numpy=&lt;br /&gt;
&lt;br /&gt;
OK, let's move on to python's numerical processing capabilities.  We will start by looking at the '''numpy''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from numpy import *&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now that we have access to the functions from '''numpy''', let's create an array.  '''Note that a numpy array is a different type of object from a Python list''' (for one thing, all the elements of a numpy array have the same type).  A simple approach is to use the '''array''' function.  For example we might enter:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = array([[1.0,0.0,0.0],[0.0,1.0,0.0],[0.0,0.0,1.0]])&lt;br /&gt;
b = array([[1,2,3],[4,5,6],[7,8,9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 1.,  0.,  0.],&lt;br /&gt;
       [ 0.,  1.,  0.],&lt;br /&gt;
       [ 0.,  0.,  1.]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b        &lt;br /&gt;
array([[1, 2, 3],&lt;br /&gt;
       [4, 5, 6],&lt;br /&gt;
       [7, 8, 9]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; transpose(b)&lt;br /&gt;
array([[1, 4, 7],&lt;br /&gt;
       [2, 5, 8],&lt;br /&gt;
       [3, 6, 9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Given an array, we may inquire about its shape:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
print a.shape&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we are told that it is a 2-dimensional array (i.e. an array of rank 2) and that the length of both dimensions is 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
(3, 3)&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
We can also apply operators to array objects.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = a * 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
array([[ 9.,  0.,  0.],&lt;br /&gt;
       [ 0.,  9.,  0.],&lt;br /&gt;
       [ 0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note, however, that most operations on numpy arrays are done element-wise''', which is '''different to a linear algebra operation that you may have been expecting.'''  We will return to linear algebra operations when we look at the '''scipy''' package.&lt;br /&gt;
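&lt;br /&gt;
To make the distinction concrete: with two 2x2 arrays, '''*''' multiplies matching elements, whereas numpy's '''dot()''' function computes the matrix product.  A sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from numpy import array, dot&lt;br /&gt;
&lt;br /&gt;
a = array([[1, 2], [3, 4]])&lt;br /&gt;
b = array([[5, 6], [7, 8]])&lt;br /&gt;
&lt;br /&gt;
elementwise = a * b   # [[ 5, 12], [21, 32]]: each a[i,j] * b[i,j]&lt;br /&gt;
matrix = dot(a, b)    # [[19, 22], [43, 50]]: rows of a times columns of b&lt;br /&gt;
&lt;br /&gt;
print(elementwise)&lt;br /&gt;
print(matrix)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;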
&lt;br /&gt;
Should we so desire, we could re-shape the array.  One way to do this is to set its '''shape''' attribute directly:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (1,9)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 9.,  0.,  0.,  0.,  9.,  0.,  0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As with the list example, it can be useful to read or change the value of an element (or sub-array) individually.  Let's turn the array back to its rank-2 form and try it out:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (3,3)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1,1] = 777.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.    0.]&lt;br /&gt;
 [   0.    0.    9.]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1:,1:] = [[777.0, 777.0],[777.0, 777.0]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.  777.]&lt;br /&gt;
 [   0.  777.  777.]]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is all pretty handy so far, but specifying the value of each element explicitly could become a chore.  Happily, some helper functions exist to give you a head start with some building blocks.  For example, you can use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = ones((3,2))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = identity(2)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; big = resize(b, (6,6))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print big&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
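The use of '''resize''' in the last example illustrates a useful '''replicating feature'''.  When the requested shape is larger than the original, the new array is filled with repeated copies of the original elements, taken in flattened (row-by-row) order.&lt;br /&gt;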
&lt;br /&gt;
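The replication is easiest to see on a small case.  Here is a minimal sketch, where the shapes are arbitrary choices.  '''numpy.resize''' repeats the flattened elements 1, 0, 0, 1 of a 2x2 identity matrix to fill a 3x3 array, so the result is not made of copies of the identity block.  If block-wise copies are what you want, '''numpy.tile''' does that instead.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

b = np.identity(2)               # [[1, 0], [0, 1]]

# resize: refill a larger shape with the flattened values 1, 0, 0, 1, 1, 0, ...
big = np.resize(b, (3, 3))
print(big.tolist())              # [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

# tile: repeat the whole 2x2 block, here twice down and twice across
tiled = np.tile(b, (2, 2))
print(tiled.tolist())
```
&lt;br /&gt;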
A list of all the functions and operations contained within numpy is: http://scipy.org/Numpy_Example_List.&lt;br /&gt;
&lt;br /&gt;
=Pylab and Matplotlib=&lt;br /&gt;
&lt;br /&gt;
The above examples are quite natty, but we have deliberately kept the array sizes small so that we can print the element values easily.  In practice, you may find that your array sizes are much larger and printing the values to the screen is impractical.  Fear not!  Python has many packages which help you plot your data, so that you can explore it.&lt;br /&gt;
&lt;br /&gt;
Using the pylab plotting interface we can create:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import pylab&lt;br /&gt;
from numpy import arange, pi, cos, sin, add, sqrt&lt;br /&gt;
t = arange(0.0, 3.0, 0.01)&lt;br /&gt;
c = cos(2 * pi * t)&lt;br /&gt;
s = sin(2 * pi * t)&lt;br /&gt;
pylab.ylabel('some numbers')&lt;br /&gt;
pylab.xlabel('some more numbers')&lt;br /&gt;
pylab.plot(t, c, 'r', lw=2)&lt;br /&gt;
pylab.plot(t, s, 'b', lw=2)&lt;br /&gt;
pylab.plot(t, c-s, 'gs', lw=2)&lt;br /&gt;
pylab.ylim(-1.5, 1.5)&lt;br /&gt;
pylab.title('sin and cos functions')&lt;br /&gt;
pylab.savefig('curves', dpi=300)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where '''curves.png''' looks like:&lt;br /&gt;
&lt;br /&gt;
[[Image:Curves.png|thumb|600px|none|Some nice curves]]&lt;br /&gt;
&lt;br /&gt;
You can view .png images from the Linux command line (including on BlueCrystal) using, e.g., ImageMagick's '''display''' command: '''display -resize 1000 curves.png'''&lt;br /&gt;
&lt;br /&gt;
We can also use Matplotlib directly for more control:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import matplotlib.pyplot as plt&lt;br /&gt;
from numpy import arange, meshgrid, sin, sqrt&lt;br /&gt;
x = arange(-5,10)&lt;br /&gt;
y = arange(-4,11)&lt;br /&gt;
# build the 2-D grids first, so that Z[j,i] is the value at (x[i], y[j])&lt;br /&gt;
X, Y = meshgrid(x,y)&lt;br /&gt;
R = sqrt(X**2 + Y**2)&lt;br /&gt;
# NB: R is zero at the origin, so numpy will warn about 0/0 at that one point&lt;br /&gt;
Z = sin(R)/R&lt;br /&gt;
plt.figure()&lt;br /&gt;
plt.contour(X,Y,Z)&lt;br /&gt;
plt.show()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and you should get a window similar to:&lt;br /&gt;
&lt;br /&gt;
[[Image:Sinc-matplotlib-contour.png|thumb|600px|none|A contour map of the sinc function]]&lt;br /&gt;
&lt;br /&gt;
Perhaps the best next step with matplotlib is to look at the gallery: http://matplotlib.org/gallery.html.&lt;br /&gt;
Just click on a figure and you will get the code used to generate it--a really great resource!&lt;br /&gt;
&lt;br /&gt;
==Input and Output==&lt;br /&gt;
&lt;br /&gt;
The foregoing is all very interesting, but life would be rather dull if you had to re-enter all your data by hand whenever you set to work with Python and numpy.  Therefore we need a means to save data to a file and load it again.  Happily, we can do this rather easily using a couple of routines from the '''numpy''' package, '''savetxt''' and '''loadtxt''', which write and read plain text files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import zeros, savetxt, loadtxt&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; data = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; savetxt('myfile.txt', data)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; read_data = loadtxt('myfile.txt')&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Beware of name shadowing''': when you pull names directly into your current namespace, e.g. with '''from numpy import *''' followed by '''from pylab import ...''', a later import silently replaces any earlier object of the same name.  One way to protect yourself against this is to make use of '''namespaces''':  use '''import numpy''' and then call '''numpy.loadtxt(..)'''.&lt;br /&gt;
&lt;br /&gt;
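numpy's '''savetxt''' and '''loadtxt''' take extra options that control the layout of the text file.  Below is a minimal sketch that assumes you want a comma-separated file written with three decimal places.  The file name '''example.csv''' and the use of a temporary directory are arbitrary choices for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile

import numpy as np

data = np.arange(6.0).reshape(2, 3)     # example data: [[0, 1, 2], [3, 4, 5]]

# write a comma-separated text file in a scratch directory ...
path = os.path.join(tempfile.mkdtemp(), 'example.csv')
np.savetxt(path, data, delimiter=',', fmt='%.3f')

# ... and read it back, telling loadtxt about the same delimiter
read_back = np.loadtxt(path, delimiter=',')
print(read_back.tolist())               # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
```
&lt;br /&gt;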
=Scipy=&lt;br /&gt;
&lt;br /&gt;
* http://www.scipy.org/&lt;br /&gt;
* ...and good examples at http://scipy-lectures.github.com/intro/scipy.html&lt;br /&gt;
* Many useful features, including:&lt;br /&gt;
** Integration &amp;amp; Differentiation&lt;br /&gt;
** Optimisation (curve fitting, etc)&lt;br /&gt;
** Fourier transforms&lt;br /&gt;
** Signal processing&lt;br /&gt;
** Statistical algorithms&lt;br /&gt;
** Much, much more...&lt;br /&gt;
* If you know Python you can use SciPy&lt;br /&gt;
&lt;br /&gt;
==An example: Differentiation==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # derivative of x^2 at x=3&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from scipy.misc import derivative&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2, 3)&lt;br /&gt;
6.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # also works with arrays&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import array&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_array = array([1,2,3])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2,my_array)&lt;br /&gt;
array([ 2., 4., 6.])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
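&lt;br /&gt;
Recent SciPy releases no longer include '''scipy.misc.derivative'''.  It was deprecated in 1.10 and removed in 1.12, so it helps to know what it computes.  With its default settings it uses a three-point central difference with a step size of 1.0, which happens to be exact for a quadratic such as x**2.  Here is a minimal sketch of the same idea in plain numpy.  The function name '''central_difference''' is our own.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

def central_difference(f, x, dx=1.0):
    """Approximate f'(x) as (f(x + dx) - f(x - dx)) / (2 * dx)."""
    return (f(x + dx) - f(x - dx)) / (2.0 * dx)

print(central_difference(lambda x: x**2, 3))                             # 6.0
print(central_difference(lambda x: x**2, np.array([1, 2, 3])).tolist())  # [2.0, 4.0, 6.0]

# for other functions a smaller step gives a better approximation,
# e.g. the derivative of sin at 0 is exactly 1
print(central_difference(np.sin, 0.0, dx=1e-5))
```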
&lt;br /&gt;
Google for many more examples pertaining to your favourite numerical procedure!&lt;br /&gt;
&lt;br /&gt;
=A Repository of Packages You Could Use=&lt;br /&gt;
&lt;br /&gt;
Now, we've touched on a couple, but there are thousands of python packages available.  Before you start writing your own function for X, check that someone hasn't contributed code for that already at http://pypi.python.org/pypi.&lt;br /&gt;
&lt;br /&gt;
'''pip''', the Python package manager, will look in PyPI by default to install a package.  You can use the '''--user''' option to install Python packages in your own user space.  See:&lt;br /&gt;
* https://pip.readthedocs.org/en/latest/&lt;br /&gt;
for more information on pip.&lt;br /&gt;
&lt;br /&gt;
=Writing Faster Python=&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB and R, one of the simplest ways in which you can write faster python code is to eliminate loops by vectorising your code.&lt;br /&gt;
&lt;br /&gt;
Consider the following two scripts.  First '''for-loop.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def filter(arr):&lt;br /&gt;
    for i, val in enumerate(arr):&lt;br /&gt;
        if val &amp;lt; 0.5:&lt;br /&gt;
            arr[i] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    filter(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and secondly, '''vectorised.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def filter(arr):&lt;br /&gt;
    arr[arr &amp;lt; 0.5] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    filter(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we now run these two scripts through the Linux command line '''time''' utility, we see that the vectorised code runs roughly eight times faster than the for loop (0.116s versus 0.963s of real time):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ time ./for-loop.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.963s&lt;br /&gt;
user	0m0.952s&lt;br /&gt;
sys	0m0.012s&lt;br /&gt;
gethin@gethin-desktop:~$ time ./vectorised.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.116s&lt;br /&gt;
user	0m0.096s&lt;br /&gt;
sys	0m0.020s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
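&lt;br /&gt;
The '''time''' figures above also include starting Python, importing numpy and creating the random array.  To time just the two filtering approaches, you could use the standard '''timeit''' module inside a single script.  Below is a minimal sketch.  The function names '''filter_loop''' and '''filter_vectorised''' are our own, chosen so as not to shadow Python's built-in '''filter'''.  The array size and number of repeats are arbitrary.&lt;br /&gt;
&lt;br /&gt;
```python
import timeit

import numpy as np

def filter_loop(arr):
    """Zero every element below 0.5 using an explicit Python loop."""
    for i, val in enumerate(arr):
        if 0.5 > val:            # i.e. val is below 0.5
            arr[i] = 0
    return arr

def filter_vectorised(arr):
    """Zero every element below 0.5 with a single boolean-mask assignment."""
    arr[0.5 > arr] = 0
    return arr

original = np.random.rand(100000)

# both functions modify their argument in place, so give each its own copy
loop_result = filter_loop(original.copy())
vec_result = filter_vectorised(original.copy())

# time 5 runs of each on fresh copies (timings will vary between machines)
t_loop = timeit.timeit(lambda: filter_loop(original.copy()), number=5)
t_vec = timeit.timeit(lambda: filter_vectorised(original.copy()), number=5)
print('loop: %.4f s, vectorised: %.4f s' % (t_loop, t_vec))
```

Timing both functions in the same process also makes it easy to check that they produce identical results, as the comparison of '''loop_result''' and '''vec_result''' allows.&lt;br /&gt;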
&lt;br /&gt;
For some more tips on writing faster python code, and examples of how to use one of the python profiler modules, take a look at:&lt;br /&gt;
* https://wiki.python.org/moin/PythonSpeed/PerformanceTips&lt;br /&gt;
* http://technicaldiscovery.blogspot.co.uk/2011/06/speeding-up-python-numpy-cython-and.html&lt;br /&gt;
* http://www.huyng.com/posts/python-performance-analysis/&lt;br /&gt;
* http://www.appneta.com/2012/05/21/profiling-python-performance-lineprof-statprof-cprofile/&lt;br /&gt;
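&lt;br /&gt;
As a starting point, the standard library's '''cProfile''' and '''pstats''' modules can report where a program spends its time.  Here is a minimal sketch, where '''slow_sum''' is a hypothetical function standing in for your own code.&lt;br /&gt;
&lt;br /&gt;
```python
import cProfile
import pstats

def slow_sum(n):
    """Deliberately slow: add up 0 .. n-1 with an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()                 # start collecting timing data
result = slow_sum(100000)
profiler.disable()                # stop collecting

# show the five most expensive entries, sorted by cumulative time
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative').print_stats(5)
```

You can also profile a whole script without modifying it, using '''python -m cProfile -s cumulative myscript.py'''.&lt;br /&gt;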
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* http://docs.python.org/tutorial/&lt;br /&gt;
* http://wiki.python.org/moin/PythonBooks&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9461</id>
		<title>Python1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9461"/>
		<updated>2014-10-10T13:32:47Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Some Suggested Exercises */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Python for Scientists'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Python.png|thumb|1100px|none|http://xkcd.com/353/]]&lt;br /&gt;
&lt;br /&gt;
With thanks to Simon Metson and Mike Wallace for much of the following material.&lt;br /&gt;
&lt;br /&gt;
=Getting Started on BlueCrystal Phase-2=&lt;br /&gt;
&lt;br /&gt;
After you have logged in, type the following at the command line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module add languages/python-2.7.2.0&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should start up an interactive python session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Python 2.7.2 (default, Aug 25 2011, 10:51:03) &lt;br /&gt;
[GCC 4.3.3] on linux2&lt;br /&gt;
Type &amp;quot;help&amp;quot;, &amp;quot;copyright&amp;quot;, &amp;quot;credits&amp;quot; or &amp;quot;license&amp;quot; for more information.&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where we can type commands at the '''&amp;gt;&amp;gt;&amp;gt;''' prompt.&lt;br /&gt;
&lt;br /&gt;
=Python as a Calculator=&lt;br /&gt;
&lt;br /&gt;
To get started, let's just try a few commands out.  If you type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print &amp;quot;Hello!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hello!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you try:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print 5 + 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far so simple!  Here is a copy of a session containing a few more commands where we've set the values of some variables and also defined and run our own function: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; five = 5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; neuf = 9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print five + neuf&lt;br /&gt;
14&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def say_hello():&lt;br /&gt;
...     print &amp;quot;Hello, world!&amp;quot;&lt;br /&gt;
... # hit return here &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; say_hello()&lt;br /&gt;
Hello, world!&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can exit an interactive session at any time by typing '''Ctrl-D'''.&lt;br /&gt;
&lt;br /&gt;
=Getting Help=&lt;br /&gt;
&lt;br /&gt;
One of the good things about Python is that it has lots of useful online documentation.  ([[A_Good_Read|There are good books on the language too]].)  For example, take a look at: http://docs.python.org/.  You can also type '''help()''' at the interpreter prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; help()&lt;br /&gt;
&lt;br /&gt;
Welcome to Python 2.7!  This is the online help utility.&lt;br /&gt;
&lt;br /&gt;
If this is your first time using Python, you should definitely check out&lt;br /&gt;
the tutorial on the Internet at http://docs.python.org/tutorial/.&lt;br /&gt;
&lt;br /&gt;
Enter the name of any module, keyword, or topic to get help on writing&lt;br /&gt;
Python programs and using Python modules.  To quit this help utility and&lt;br /&gt;
return to the interpreter, just type &amp;quot;quit&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; keywords&lt;br /&gt;
&lt;br /&gt;
Here is a list of the Python keywords.  Enter any keyword to get more help.&lt;br /&gt;
&lt;br /&gt;
and                 elif                if                  print&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; if&lt;br /&gt;
The ``if`` statement&lt;br /&gt;
********************&lt;br /&gt;
&lt;br /&gt;
The ``if`` statement is used for conditional execution:&lt;br /&gt;
&lt;br /&gt;
   if_stmt ::= &amp;quot;if&amp;quot; expression &amp;quot;:&amp;quot; suite&lt;br /&gt;
               ( &amp;quot;elif&amp;quot; expression &amp;quot;:&amp;quot; suite )*&lt;br /&gt;
               [&amp;quot;else&amp;quot; &amp;quot;:&amp;quot; suite]&lt;br /&gt;
&lt;br /&gt;
It selects exactly one of the suites by evaluating the expressions one&lt;br /&gt;
by one until one is found to be true...&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; quit&lt;br /&gt;
&lt;br /&gt;
You are now leaving help and returning to the Python interpreter.&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Making a Script=&lt;br /&gt;
&lt;br /&gt;
An interactive session can be fun and useful for trying things out.  However--to save our fingers--we will typically want to execute a series of commands as a script, created using your favourite text editor.  Here are the contents of an example script, '''myscript.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;Hello, from a python script!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Ensure that your script is executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod u+x myscript.py&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and now you can run it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[ggdagw@bigblue4 ~]$ ./myscript.py &lt;br /&gt;
Hello, from a python script!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Python and Whitespace=&lt;br /&gt;
&lt;br /&gt;
Love it or hate it, Python incorporates whitespace into its syntax.  (The alternative is to demarcate blocks with some other syntax, such as the curly braces used in C.  Pick your poison.)  Indentation is therefore key to creating a valid Python script.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
    print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will work, but:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
 print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
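will not.  The line after '''else:''' is not indented.  The single-space indent under '''if''' is fine, since Python only requires the lines of a block to be indented consistently:&lt;br /&gt;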
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  File &amp;quot;./myscript.py&amp;quot;, line 7&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
        ^&lt;br /&gt;
IndentationError: expected an indented block&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is therefore a great advantage, when writing a Python script, to use a text editor which has a dedicated Python mode, such as '''emacs''', and which will actively help you to keep your spacing correct.  See http://wiki.python.org/moin/PythonEditors for an extensive list.&lt;br /&gt;
&lt;br /&gt;
=Some Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
* Calculate the volume of a sphere.  You can experiment with the following (where r needs to be set to some value):&lt;br /&gt;
** &amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;4/3 * 3.14159265359 * r ** 3&amp;lt;/source&amp;gt;&lt;br /&gt;
** &amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;4.0/3.0 * 3.14159265359 * pow(r,3)&amp;lt;/source&amp;gt;&lt;br /&gt;
** &amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;float(4)/float(3) * 3.14159265359 * pow(r,3)&amp;lt;/source&amp;gt;&lt;br /&gt;
* Concatenate two strings&lt;br /&gt;
* Write a recursive function to compute Fibonacci numbers (Hint: F(n) = F(n-1) + F(n-2), F(0) = 0 and F(1) = 1)&lt;br /&gt;
&lt;br /&gt;
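If the first of the sphere formulas above gives a surprising answer, integer division is the likely culprit.  In Python 2, '''/''' between two integers performs floor division, so '''4/3''' is 1.  (Python 3 changed '''/''' to always give a float.)  Here is a minimal sketch that sidesteps the issue by using floats throughout.  It uses '''math.pi''' from the standard library in place of a typed-in value of pi.  The function name '''sphere_volume''' is our own.&lt;br /&gt;
&lt;br /&gt;
```python
import math

print(4 // 3)            # 1: '//' is floor division in both Python 2 and 3
print(4.0 / 3.0)         # 1.333...: dividing floats keeps the fraction

def sphere_volume(r):
    """Volume of a sphere of radius r, computed with floats throughout."""
    return 4.0 / 3.0 * math.pi * r ** 3

print(sphere_volume(1.0))   # roughly 4.18879
```
&lt;br /&gt;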
=Nuts and Bolts=&lt;br /&gt;
&lt;br /&gt;
==Types==&lt;br /&gt;
&lt;br /&gt;
Python has intrinsic types including integers, floats, booleans and complex numbers.  It is dynamically typed, meaning that types belong to values rather than to variables, so you don't need a block of variable declarations at the top of your script.  However, it is '''not weakly''' typed: it will not silently convert between unrelated types such as strings and integers.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex = 2 + 0.5j&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex&lt;br /&gt;
(2+0.5j)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.real&lt;br /&gt;
2.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.imag&lt;br /&gt;
0.5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name = 'fred'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; lucky = 7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name + lucky&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: cannot concatenate 'str' and 'int' objects&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
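&lt;br /&gt;
Because Python will not guess how to combine a string and an integer, you need to convert one of them explicitly.  Here is a minimal sketch that reuses the names from the session above.&lt;br /&gt;
&lt;br /&gt;
```python
name = 'fred'
lucky = 7

try:
    name + lucky                     # str + int is refused ...
except TypeError as err:
    print('TypeError: %s' % err)

print(name + str(lucky))             # ... so convert the int: fred7
print(int('7') + lucky)              # or convert a str to an int: 14
```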
&lt;br /&gt;
==Strings==&lt;br /&gt;
&lt;br /&gt;
The eagle-eyed will have spotted in a previous example that we could ask for the length of a character string, straight off the bat.  No need to write a counting routine ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print len(message)&lt;br /&gt;
11&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a character string.  For example, to get the first five characters:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print message[:5]&lt;br /&gt;
happy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since a string is an '''object''' (in the object oriented programming sense of the word, but more of that another time...) we can call a number of methods that operate on a string.  A selection includes:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.find(sub) || Returns the index of the first occurrence of '''sub''', or -1 if it is not found&lt;br /&gt;
|-&lt;br /&gt;
|| s.islower() || Checks whether all characters are lowercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.upper() || Returns '''s''' converted to uppercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.strip() || Removes leading and trailing whitespace&lt;br /&gt;
|-&lt;br /&gt;
|| s.replace(old,new) || Replaces substring '''old''' with '''new'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.split([sep]) || Splits '''s''' using the (optional) delimiter '''sep''' (by default, runs of whitespace) and returns a list&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
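&lt;br /&gt;
Here is a short sketch that tries out the methods in the table on an example string of our own.  The comments show what each line prints.&lt;br /&gt;
&lt;br /&gt;
```python
s = '  Happy days!  '

stripped = s.strip()                       # leading and trailing spaces removed
print(stripped)                            # Happy days!
print(stripped.find('days'))               # 6
print(stripped.find('nights'))             # -1
print(stripped.upper())                    # HAPPY DAYS!
print(stripped.islower())                  # False
print(stripped.replace('Happy', 'Lazy'))   # Lazy days!
print(stripped.split())                    # ['Happy', 'days!']
print('a,b,c'.split(','))                  # ['a', 'b', 'c']
```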
&lt;br /&gt;
==Lists and Tuples==&lt;br /&gt;
&lt;br /&gt;
An example of a list is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping = ['bread', 'marmalade', 'milk', 'tea']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we can inquire about the length of that using the same function as before:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
len(shopping)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a list, as we did with a string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and even reset a portion of the list that way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2] = ['bagels', 'jam']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since a list is also an object, we have more handy methods, including:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.append(x) || Appends a new element '''x''' to the end of '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.count(x) || Returns the number of occurrences of '''x''' in '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.reverse() || Reverses items of '''s''' in place&lt;br /&gt;
|-&lt;br /&gt;
|| s.sort([compfunc]) || Sorts items of '''s''' in place.  '''compfunc''' is an optional comparison function&lt;br /&gt;
|}&lt;br /&gt;
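&lt;br /&gt;
Here is a short sketch of these methods on an example list of our own.  Note that '''reverse()''' and '''sort()''' change the list in place and return '''None''', so there is no need to assign their result.&lt;br /&gt;
&lt;br /&gt;
```python
s = ['tea', 'milk', 'bread', 'milk']

s.append('jam')          # s is now ['tea', 'milk', 'bread', 'milk', 'jam']
print(s.count('milk'))   # 2
s.reverse()              # s is now ['jam', 'milk', 'bread', 'milk', 'tea']
s.sort()                 # alphabetical order, in place
print(s)                 # ['bread', 'jam', 'milk', 'milk', 'tea']
```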
&lt;br /&gt;
Tuples are very similar to lists and support many of the same operations (indexing, slicing, concatenation etc.) but differ in that they are '''not mutable''' after creation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple = ('fred', 'ginger', 7, 2.5)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist = ['fred', 'ginger', 7, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist[2] = 8&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mylist&lt;br /&gt;
['fred', 'ginger', 8, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mytuple[2]    &lt;br /&gt;
7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple[2] = 8&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: 'tuple' object does not support item assignment&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
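A '''list comprehension''' builds a new list from an existing one in a single expression, optionally filtering elements along the way:&lt;br /&gt;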
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; numbers = [12, 3, 90, 40, 52, 11, 10]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled = [number * 2 for number in numbers if number &amp;lt; 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled&lt;br /&gt;
[24, 6, 22, 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
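&lt;br /&gt;
The comprehension above is shorthand for an explicit loop.  Here is a sketch of the equivalent loop, which builds the same list step by step.&lt;br /&gt;
&lt;br /&gt;
```python
numbers = [12, 3, 90, 40, 52, 11, 10]

small_numbers_doubled = []
for number in numbers:
    if 20 > number:                      # keep only numbers below 20
        small_numbers_doubled.append(number * 2)

print(small_numbers_doubled)             # [24, 6, 22, 20]
```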
&lt;br /&gt;
==Dictionaries==&lt;br /&gt;
&lt;br /&gt;
A dictionary is an associative array or hash table, containing '''key-value''' pairs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
mydict = {'thomas':'blue', 'james':'red', 'henry':'green'}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mydict['james']&lt;br /&gt;
red&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Dictionaries let us write more readable code, looking values up by meaningful keys rather than by arbitrary indexes into a list.&lt;br /&gt;
&lt;br /&gt;
Some example dictionary methods are:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| m.keys() || Returns a list of the keys in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m.items() || Returns a list of the (key,value) pairs in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m[k] = x || Sets m[k] to x&lt;br /&gt;
|-&lt;br /&gt;
|| m.update(b) || Adds the key-value pairs from dictionary '''b''' to '''m''', overwriting the values of keys already present&lt;br /&gt;
|}&lt;br /&gt;
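&lt;br /&gt;
Here is a short sketch using the dictionary from above, plus some extra names and colours of our own.  Older versions of Python do not keep dictionary keys in any particular order, so we sort the items before printing them.&lt;br /&gt;
&lt;br /&gt;
```python
mydict = {'thomas': 'blue', 'james': 'red', 'henry': 'green'}

mydict['gordon'] = 'blue'                                # add a new key-value pair
mydict.update({'james': 'crimson', 'emily': 'green'})    # overwrite one value, add another key

for name, colour in sorted(mydict.items()):
    print('%s: %s' % (name, colour))
# emily: green
# gordon: blue
# henry: green
# james: crimson
# thomas: blue
```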
&lt;br /&gt;
==Control Structures==&lt;br /&gt;
&lt;br /&gt;
Of course, we'll need conditionals and loops etc. to go beyond the simplest of scripts.  Here is an '''if-then-else''', python style:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
sky = 'blue'&lt;br /&gt;
if sky == 'blue':&lt;br /&gt;
    print &amp;quot;birds sing&amp;quot;&lt;br /&gt;
elif sky == 'black':&lt;br /&gt;
    print &amp;quot;birds sleep&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
    pass # do nothing&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and a classic '''for loop''' (note that '''range(1,10)''' stops just ''before'' 10):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
for ii in range(1,10):&lt;br /&gt;
    print ii&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1&lt;br /&gt;
...&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
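&lt;br /&gt;
'''range''' also accepts an optional third argument, the step size.  A '''for''' loop can also walk directly over the items of any list (or string, tuple, ...) without needing an index.  Here is a short sketch.&lt;br /&gt;
&lt;br /&gt;
```python
steps_of_one = list(range(1, 10))
steps_of_three = list(range(0, 10, 3))
print(steps_of_one)       # [1, 2, 3, 4, 5, 6, 7, 8, 9]: stops before 10
print(steps_of_three)     # [0, 3, 6, 9]: counts up in steps of 3

# loop directly over the items of a list
for colour in ['red', 'green', 'blue']:
    print(colour)
```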
&lt;br /&gt;
We'll also see a '''while loop''' shoehorned into the next example.&lt;br /&gt;
&lt;br /&gt;
For our control statements, we can use comparison operators such as '''==''', '''!=''', '''&amp;gt;''', '''&amp;lt;''', '''&amp;lt;=''', '''&amp;gt;=''', and logical operators such as '''and''', '''or''' and '''not'''.&lt;br /&gt;
&lt;br /&gt;
==File Input and Output==&lt;br /&gt;
&lt;br /&gt;
Here's some code for printing the contents of a text file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;r&amp;quot;)&lt;br /&gt;
line = fp.readline()&lt;br /&gt;
while line:&lt;br /&gt;
    line = line.strip()&lt;br /&gt;
    print line&lt;br /&gt;
    line = fp.readline()&lt;br /&gt;
fp.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We could open a file for writing (which empties any existing file of that name) with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;w&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp.write(...)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
to write a string to that file, calling '''fp.close()''' when you have finished.&lt;br /&gt;
&lt;br /&gt;
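A more idiomatic way to work with files is the '''with''' statement.  It closes the file for you when the block ends, even if an error occurs, and it lets you loop over the file's lines directly instead of calling '''readline()'''.  The minimal sketch below writes to a scratch file in a temporary directory rather than to '''foo.txt'''.&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'foo.txt')   # a scratch file for this sketch

# 'w' creates the file (or empties an existing one); the with-block closes it
with open(path, 'w') as fp:
    fp.write('first line\n')
    fp.write('second line\n')

# iterating over a file object yields one line at a time
with open(path, 'r') as fp:
    lines = [line.strip() for line in fp]

print(lines)    # ['first line', 'second line']
```
&lt;br /&gt;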
=Object Oriented Programming in Python=&lt;br /&gt;
&lt;br /&gt;
Here is an example of using a class in python:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
class Radio:&lt;br /&gt;
    &amp;quot;A simple radio&amp;quot;&lt;br /&gt;
    def __init__(self,freq=0.0,name=&amp;quot;&amp;quot;):&lt;br /&gt;
        &amp;quot;Constructor method&amp;quot;&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
        self.name=name&lt;br /&gt;
    def tune(self,freq):&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
    def tuned_to(self):&lt;br /&gt;
        print self.name, &amp;quot;tuned to:&amp;quot;, self.__frequency&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # declare two radio instances&lt;br /&gt;
    car = Radio(name=&amp;quot;car&amp;quot;)&lt;br /&gt;
    kitchen = Radio(91.5,&amp;quot;kitchen&amp;quot;)&lt;br /&gt;
    # call some methods&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    kitchen.tuned_to()&lt;br /&gt;
    car.tune(89.3)&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    # Docstrings--double quotes at the top of the class:                        &lt;br /&gt;
    print car.__doc__&lt;br /&gt;
    # NB members not private by default:&lt;br /&gt;
    print car.name&lt;br /&gt;
    # BUT leading double underscores trigger name mangling&lt;br /&gt;
    # (to _Radio__frequency), so the original name is not found&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running the script gives us:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
car tuned to: 0.0&lt;br /&gt;
kitchen tuned to: 91.5&lt;br /&gt;
car tuned to: 89.3&lt;br /&gt;
A simple radio&lt;br /&gt;
car&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;./foo.py&amp;quot;, line 27, in &amp;lt;module&amp;gt;&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
AttributeError: Radio instance has no attribute '__frequency'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
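&lt;br /&gt;
Name mangling does not make a member truly private.  Python simply renames '''__frequency''' to '''_Radio__frequency''' inside the class.  Here is a minimal sketch using a cut-down version of the '''Radio''' class, written as '''class Radio(object)'''.&lt;br /&gt;
&lt;br /&gt;
```python
class Radio(object):
    """A cut-down version of the Radio class above."""
    def __init__(self, freq=0.0, name=''):
        self.__frequency = freq      # actually stored as self._Radio__frequency
        self.name = name

car = Radio(89.3, 'car')

print(hasattr(car, '__frequency'))   # False: not visible under its original name
print(car._Radio__frequency)         # 89.3: still reachable via the mangled name
```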
&lt;br /&gt;
=Using Packages=&lt;br /&gt;
&lt;br /&gt;
Python packages are great because they provide us with a whole lot of extra functionality--above and beyond the core language--that we didn't have to write and debug ourselves.&lt;br /&gt;
&lt;br /&gt;
Let's walk through a simple example using a package.  At an interactive prompt type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from random import randint&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will give us access to the '''randint(x,y)''' function, which returns a randomly chosen integer from the given range [x,y]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
4&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
1&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OK, so far so good.  One thing to note is that the above '''import''' statement has drawn the name ''randint'' into our current '''namespace'''.  What if we had already defined a function named ''randint''?  That could cause problems.  In order to protect ourselves from this kind of problem, there are several import variants.&lt;br /&gt;
&lt;br /&gt;
With a plain '''import''' statement, the functions are instead kept in a namespace with the same name as the package.  In this case, to call a function we have to prefix it with its namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random.randint(0,10)&lt;br /&gt;
6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Should we desire, we can apply a little more control and specify the namespace for the import ourselves: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random as rnd&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; rnd.randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another--more 'devil-may-care'--approach is to do away with the separate namespace and pull everything from a given package into the current namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import *&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random()&lt;br /&gt;
0.3172268098313996&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(The '''random()''' function returns a randomly selected floating point number in the range [0, 1)--that is, between 0 and 1, including 0.0 but always smaller than 1.0.)&lt;br /&gt;
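&lt;br /&gt;
Whichever form of import you use, the names all refer to the same function and therefore share the same underlying random-number generator.  Here is a short sketch illustrating this.  Seeding the generator with an arbitrary value (42 here) makes the sequence of random numbers repeatable.&lt;br /&gt;
&lt;br /&gt;
```python
import random
import random as rnd
from random import randint

# all three import styles give access to the very same function object
print(random.randint is rnd.randint)   # True
print(rnd.randint is randint)          # True

# ...so they also share the generator's state: same seed, same numbers
random.seed(42)
first = randint(0, 10)
rnd.seed(42)
second = random.randint(0, 10)
print(first == second)                 # True
```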
&lt;br /&gt;
==A Namespace Collision==&lt;br /&gt;
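&lt;br /&gt;
Here we define our own (dummy) ''randint'' function and then import ''randint'' from the '''random''' package.  The import silently rebinds the name, so our function is lost and the call without arguments now fails:&lt;br /&gt;
&lt;br /&gt;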
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def randint():&lt;br /&gt;
...     print &amp;quot;dummy function&amp;quot;&lt;br /&gt;
... &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
dummy function&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import randint&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: randint() takes exactly 3 arguments (1 given)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
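&lt;br /&gt;
One way to sidestep this kind of collision is to import the module rather than the function.  The library's ''randint'' then stays inside the '''random''' namespace and cannot overwrite ours.  Here is a minimal sketch, written for Python 3 (our ''randint'' is a hypothetical dummy function, as above):&lt;br /&gt;
&lt;br /&gt;
```python
import random

def randint():
    # Our own (hypothetical) function that happens to share its name with random.randint
    return 'dummy function'

# 'import random' binds only the name 'random', so our randint survives;
# the library version is reached through the module's namespace.
print(randint())               # dummy function
print(random.randint(0, 10))   # a random integer from 0 to 10 inclusive
```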
&lt;br /&gt;
=Python for Shell Scripting=&lt;br /&gt;
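&lt;br /&gt;
The '''subprocess''' module lets a Python script run other programs, much as a shell script would.  For example, to list the contents of the current directory:&lt;br /&gt;
&lt;br /&gt;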
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from subprocess import call&lt;br /&gt;
call([&amp;quot;ls&amp;quot;, &amp;quot;-l&amp;quot;])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
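&lt;br /&gt;
'''call''' runs the command, lets its output go straight to the terminal and returns the command's exit status.  If you would rather capture the output in a Python variable, '''subprocess.check_output''' does that.  Here is a minimal sketch for Python 3.  It uses the running Python interpreter ('''sys.executable''') as the child command, so it does not depend on any particular shell utilities being installed:&lt;br /&gt;
&lt;br /&gt;
```python
import subprocess
import sys

# Run a child process and capture what it prints;
# universal_newlines=True gives back text (str) rather than bytes
output = subprocess.check_output([sys.executable, '-c', 'print(6 * 7)'],
                                 universal_newlines=True)
print(output.strip())   # 42

# call() just runs the command and returns its exit status
status = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])
print(status)           # 3
```
&lt;br /&gt;
Note that '''check_output''' raises '''subprocess.CalledProcessError''' if the command exits with a non-zero status, whereas '''call''' simply hands the status back to you.&lt;br /&gt;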
&lt;br /&gt;
=Python as a Glue Language=&lt;br /&gt;
&lt;br /&gt;
* Calling R from python is possible using: http://rpy.sourceforge.net/index.html.&lt;br /&gt;
* Calling Matlab from python: http://mlabwrap.sourceforge.net.&lt;br /&gt;
* With SWIG you can make many bindings, including Python to C and C++: http://www.swig.org/.&lt;br /&gt;
* Or if Fortran is more your cup of tea, you can use f2py: http://cens.ioc.ee/projects/f2py2e/.&lt;br /&gt;
* There are many more examples.&lt;br /&gt;
&lt;br /&gt;
=Command Line Parsing=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # We can test on the length of argv&lt;br /&gt;
    if len(sys.argv) &amp;lt; 2:&lt;br /&gt;
        print &amp;quot;usage: to use this script...&amp;quot;&lt;br /&gt;
    else:&lt;br /&gt;
        for ii, arg in enumerate(sys.argv):&lt;br /&gt;
            # (typically) argv[0] is bound to the script name&lt;br /&gt;
            print &amp;quot;arg&amp;quot;, ii, &amp;quot;is:&amp;quot;, arg&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py&lt;br /&gt;
usage: to use this script...&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py fred ginger&lt;br /&gt;
arg 0 is: ./cmdline.py&lt;br /&gt;
arg 1 is: fred&lt;br /&gt;
arg 2 is: ginger&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
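&lt;br /&gt;
For anything beyond a couple of positional arguments, the '''argparse''' module (in the standard library since Python 2.7) saves you from picking through '''sys.argv''' by hand.  It also writes the usage message for you.  Here is a minimal sketch for Python 3; the ''names'' argument and ''--shout'' option are purely illustrative:&lt;br /&gt;
&lt;br /&gt;
```python
import argparse

def parse(argv=None):
    # Describe the expected arguments; argparse builds the usage and help text from this
    parser = argparse.ArgumentParser(description='Greet some people.')
    parser.add_argument('names', nargs='+', help='one or more names')
    parser.add_argument('--shout', action='store_true',
                        help='print the greetings in upper case')
    # With argv=None, argparse reads sys.argv[1:]; passing a list is handy for testing
    return parser.parse_args(argv)

# Example: parse an explicit list; in a real script, call parse() to use sys.argv
args = parse(['fred', 'ginger', '--shout'])
for name in args.names:
    greeting = 'hello, ' + name
    print(greeting.upper() if args.shout else greeting)
# HELLO, FRED
# HELLO, GINGER
```
&lt;br /&gt;
If such a script is run with no arguments, argparse prints a usage line and an error message and exits, much like the hand-written check above.  Running it with '''-h''' or '''--help''' prints a full description of the arguments.&lt;br /&gt;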
&lt;br /&gt;
=Databases=&lt;br /&gt;
&lt;br /&gt;
==Simple Databases==&lt;br /&gt;
&lt;br /&gt;
Python provides access to some simple databases through its standard library.  The '''bsddb''' module allows you to access the highly popular '''Berkeley DB''' database from your Python code.  (Note that '''bsddb''' is Python 2 only: it was deprecated in Python 2.6 and removed in Python 3.)&lt;br /&gt;
&lt;br /&gt;
The interface to the database provided by this module is very similar to the way in which we access a dictionary.  First, let's populate a database:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import bsddb&lt;br /&gt;
d = bsddb.btopen('engines.db')&lt;br /&gt;
d['thomas'] = 'blue'&lt;br /&gt;
d['james'] = 'red'&lt;br /&gt;
d['henry'] = 'green'&lt;br /&gt;
d.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now let's open the database again and query its contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d = bsddb.btopen('engines.db')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['henry', 'james', 'thomas']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.first()&lt;br /&gt;
('henry', 'green')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.last()&lt;br /&gt;
('thomas', 'blue')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour = d['james']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour&lt;br /&gt;
'red'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; del d['henry']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['james', 'thomas']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
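&lt;br /&gt;
If you need a similar dictionary-style store that also works in Python 3, the standard '''shelve''' module is one option.  It stores any picklable value, not just strings.  Here is a minimal sketch for Python 3; the file name ''engines'' is arbitrary, and the example writes into a fresh temporary directory:&lt;br /&gt;
&lt;br /&gt;
```python
import os
import shelve
import tempfile

# Keep the database files in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'engines')

# Populate the store, much as we would a dictionary
d = shelve.open(path)
try:
    d['thomas'] = 'blue'
    d['james'] = 'red'
    d['henry'] = {'colour': 'green', 'number': 3}   # any picklable value will do
finally:
    d.close()

# Re-open it and query the contents
d = shelve.open(path)
try:
    print(sorted(d.keys()))      # ['henry', 'james', 'thomas']
    print(d['henry']['number'])  # 3
    del d['henry']
    print(sorted(d.keys()))      # ['james', 'thomas']
finally:
    d.close()
```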
&lt;br /&gt;
==Relational Databases==&lt;br /&gt;
&lt;br /&gt;
Relational databases give us more oomph.  '''SQLite''' is a useful relational database to consider as it is light, in that it requires hardly anything in terms of setup or management, yet still understands queries formulated in SQL.  As such it is useful for creating relatively simple examples of SQL access to a database in python and is a stepping stone toward more powerful database packages.&lt;br /&gt;
&lt;br /&gt;
Here is a script which will create a table called '''planets''' in the file '''pytest.db''' and populate it with details of the planets in our solar system:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
 &lt;br /&gt;
# create a table&lt;br /&gt;
cursor.execute(&amp;quot;&amp;quot;&amp;quot;CREATE TABLE planets&lt;br /&gt;
                  (Id INT, Name TEXT, Diameter REAL, &lt;br /&gt;
                   Mass REAL, Orbital_Period REAL)&amp;quot;&amp;quot;&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# insert a single record&lt;br /&gt;
cursor.execute(&amp;quot;INSERT INTO planets VALUES(1,'Mercury',0.382,0.06,0.24)&amp;quot;)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
 &lt;br /&gt;
# insert multiple records&lt;br /&gt;
other_planets = [(2,'Venus',0.949,0.82,0.72),&lt;br /&gt;
                 (3,'Earth',1.0,1.0,1.0),&lt;br /&gt;
                 (4,'Mars',0.532,0.11,1.52),&lt;br /&gt;
                 (5,'Jupiter',11.209,317.8,5.20),&lt;br /&gt;
                 (6,'Saturn',9.449,95.2,9.54),&lt;br /&gt;
                 (7,'Uranus',4.007,14.6,19.22),&lt;br /&gt;
                 (8,'Neptune',3.883,17.2,30.06),&lt;br /&gt;
                 (9,'Pluto',0.18,0.002,248.09)]&lt;br /&gt;
cursor.executemany(&amp;quot;INSERT INTO planets VALUES (?,?,?,?,?)&amp;quot;, other_planets)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
&lt;br /&gt;
# delete a record&lt;br /&gt;
sql = &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
DELETE FROM planets&lt;br /&gt;
WHERE Name = 'Pluto'&lt;br /&gt;
&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
cursor.execute(sql)  # poor old Pluto!&lt;br /&gt;
conn.commit()&lt;br /&gt;
conn.close()  # release the database file&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And here is a short example script showing a couple of ways to interrogate the database: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the records in the table, ordered by Name:\n&amp;quot;&lt;br /&gt;
for row in cursor.execute(&amp;quot;SELECT rowid, * FROM planets ORDER BY Name&amp;quot;):&lt;br /&gt;
    print row&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;\n&amp;quot;&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the planets with a mass greater than or equal to that of Earth:\n&amp;quot;&lt;br /&gt;
sql = &amp;quot;SELECT * FROM planets WHERE Mass&amp;gt;=?&amp;quot;&lt;br /&gt;
cursor.execute(sql, (1.0,))  # parameters are passed as a tuple&lt;br /&gt;
for row in cursor.fetchall():  # or use fetchone()&lt;br /&gt;
    print row&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running the script gives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
All the records in the table, ordered by Name:&lt;br /&gt;
&lt;br /&gt;
(3, 3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, 5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(4, 4, u'Mars', 0.53200000000000003, 0.11, 1.52)&lt;br /&gt;
(1, 1, u'Mercury', 0.38200000000000001, 0.059999999999999998, 0.23999999999999999)&lt;br /&gt;
(8, 8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
(6, 6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, 7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(2, 2, u'Venus', 0.94899999999999995, 0.81999999999999995, 0.71999999999999997)&lt;br /&gt;
&lt;br /&gt;
All the planets with a mass greater than or equal to that of Earth:&lt;br /&gt;
&lt;br /&gt;
(3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
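&lt;br /&gt;
Notice that the scripts above pass values to SQLite through '''?''' placeholders rather than pasting them into the SQL string.  The sqlite3 module then takes care of quoting, which avoids syntax errors and SQL injection.  Here is a self-contained sketch for Python 3.  It uses a throwaway in-memory database and a cut-down version of the planets table:&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3

conn = sqlite3.connect(':memory:')   # the database lives in RAM and vanishes on close
cursor = conn.cursor()
cursor.execute('CREATE TABLE planets (Name TEXT, Mass REAL)')
cursor.executemany('INSERT INTO planets VALUES (?, ?)',
                   [('Mercury', 0.06), ('Earth', 1.0),
                    ('Jupiter', 317.8), ('Saturn', 95.2)])
conn.commit()

# Parameters go in a separate tuple; note the trailing comma for a single value
min_mass = 1.0
cursor.execute('SELECT Name FROM planets WHERE Mass >= ? ORDER BY Mass DESC',
               (min_mass,))
heavy = [row[0] for row in cursor.fetchall()]
print(heavy)             # ['Jupiter', 'Saturn', 'Earth']

# Aggregate functions work as in other SQL databases
cursor.execute('SELECT COUNT(*), MAX(Mass) FROM planets')
count, biggest = cursor.fetchone()
print(count, biggest)    # 4 317.8
conn.close()
```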
&lt;br /&gt;
For more information on using SQLite with Python, see, e.g.:&lt;br /&gt;
* http://zetcode.com/db/sqlitepythontutorial/&lt;br /&gt;
* http://www.blog.pythonlibrary.org/2012/07/18/python-a-simple-step-by-step-sqlite-tutorial/&lt;br /&gt;
&lt;br /&gt;
You can also connect to a MySQL database from python using, e.g. the [http://mysql-python.sourceforge.net/ python-mysqldb] package.  A snippet of python code for connecting to a database is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
import MySQLdb&lt;br /&gt;
&lt;br /&gt;
conn = MySQLdb.connect(host=&amp;quot;localhost&amp;quot;,  # your host, usually localhost&lt;br /&gt;
                       user=&amp;quot;gethin&amp;quot;,     # your username&lt;br /&gt;
                       passwd=&amp;quot;changeme&amp;quot;, # your password&lt;br /&gt;
                       db=&amp;quot;menagerie&amp;quot;)    # name of the database&lt;br /&gt;
&lt;br /&gt;
# Create a cursor object, as before with SQLite&lt;br /&gt;
cur = conn.cursor() &lt;br /&gt;
&lt;br /&gt;
# and then you can submit your SQL command:&lt;br /&gt;
cur.execute(&amp;quot;SELECT * FROM YOUR_TABLE_NAME&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Numpy=&lt;br /&gt;
&lt;br /&gt;
OK, let's move on to looking at Python's numerical processing capabilities.  We will start by looking at the '''numpy''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from numpy import *&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now that we have access to the functions from '''numpy''', let's create an array.  '''Note that a numpy array is an object of a different type from a built-in Python list'''.  A simple way to create one is to pass a (nested) list to the '''array''' function.  For example we might enter:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = array([[1.0,0.0,0.0],[0.0,1.0,0.0],[0.0,0.0,1.0]])&lt;br /&gt;
b = array([[1,2,3],[4,5,6],[7,8,9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 1.,  0.,  0.],&lt;br /&gt;
       [ 0.,  1.,  0.],&lt;br /&gt;
       [ 0.,  0.,  1.]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b        &lt;br /&gt;
array([[1, 2, 3],&lt;br /&gt;
       [4, 5, 6],&lt;br /&gt;
       [7, 8, 9]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; transpose(b)&lt;br /&gt;
array([[1, 4, 7],&lt;br /&gt;
       [2, 5, 8],&lt;br /&gt;
       [3, 6, 9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Given an array, we may inquire about its shape:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
print a.shape&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we are told that it is a 2-dimensional array (i.e. an array of rank 2) and that the length of both dimensions is 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
(3, 3)&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
We can also apply operators to array objects.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a = a * 9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 9.,  0.,  0.],&lt;br /&gt;
       [ 0.,  9.,  0.],&lt;br /&gt;
       [ 0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note, however, that most operations on numpy arrays are done element-wise''', which is '''different from the linear algebra operations that you may have been expecting.'''  We will return to linear algebra operations when we look at the '''scipy''' package.&lt;br /&gt;
&lt;br /&gt;
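To make the distinction concrete: '''*''' multiplies corresponding elements.  The matrix product of linear algebra is available via numpy's '''dot''' function, or, from Python 3.5 with a recent numpy, the '''@''' operator.  Here is a short sketch for Python 3:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b       # each element multiplied by its counterpart
matrix = np.dot(a, b)     # rows of a combined with columns of b

print(elementwise.tolist())   # [[5, 12], [21, 32]]
print(matrix.tolist())        # [[19, 22], [43, 50]]
```
&lt;br /&gt;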
Should we so desire, we could re-shape the array.  One way to do this is to set its shape attribute directly:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (1,9)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 9.,  0.,  0.,  0.,  9.,  0.,  0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As with the list example, it can be useful to read or change the value of an element (or sub-array) individually.  Let's turn the array back to its rank-2 form and try it out:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (3,3)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1,1] = 777.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.    0.]&lt;br /&gt;
 [   0.    0.    9.]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1:,1:] = [[777.0, 777.0],[777.0, 777.0]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.  777.]&lt;br /&gt;
 [   0.  777.  777.]]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
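&lt;br /&gt;
One behaviour is worth knowing about when assigning to sub-arrays.  A basic slice such as '''a[1:,1:]''' is a ''view'' onto the original data rather than a copy, so changes made through it show up in the original array.  Use '''.copy()''' when you want independent data.  Here is a short sketch for Python 3:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

a = np.zeros((3, 3))
corner = a[1:, 1:]           # a view: shares its data with a
corner[0, 0] = 777.0         # ...so this also sets a[1, 1]
print(a[1, 1])               # 777.0

independent = a[1:, 1:].copy()
independent[0, 0] = -1.0     # modifies the copy only
print(a[1, 1])               # still 777.0
```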
&lt;br /&gt;
This is all pretty handy so far, but specifying the value of each element explicitly could become a chore.  Happily some helper functions exist to give you a head start with some building blocks.  For example, you can use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = ones((3,2))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = identity(2)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; big = resize(b, (6,6))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print big&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The use of '''resize''' in the last example illustrates a useful '''replicating feature'''.&lt;br /&gt;
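&lt;br /&gt;
Here is what that replication does.  '''resize''' treats the input as a flat sequence and repeats its elements in order until the new shape is filled, so the result is not, in general, a tiling of the original block.  If a block-wise repeat is what you are after, numpy's '''tile''' function is probably the better fit.  Here is a small sketch for Python 3:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

row = np.array([1, 2, 3])
print(np.resize(row, (2, 4)).tolist())   # [[1, 2, 3, 1], [2, 3, 1, 2]]

block = np.identity(2, dtype=int)
tiled = np.tile(block, (2, 2))           # repeat the 2x2 block twice down and twice across
print(tiled.tolist())                    # [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
```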
&lt;br /&gt;
A list of all the functions and operations contained within numpy is: http://scipy.org/Numpy_Example_List.&lt;br /&gt;
&lt;br /&gt;
=Pylab and Matplotlib=&lt;br /&gt;
&lt;br /&gt;
The above examples are quite natty, but we have deliberately kept the array sizes small so that we can print the element values easily.  In practice, you may find that your array sizes are much larger and printing the values to the screen is impractical.  Fear not!  Python has many packages which help you plot your data, so that you can explore it.&lt;br /&gt;
&lt;br /&gt;
Using the pylab plotting interface we can create a simple line plot:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import pylab&lt;br /&gt;
from numpy import arange, pi, cos, sin&lt;br /&gt;
t = arange(0.0, 3.0, 0.01)&lt;br /&gt;
c = cos(2 * pi * t)&lt;br /&gt;
s = sin(2 * pi * t)&lt;br /&gt;
pylab.ylabel('some numbers')&lt;br /&gt;
pylab.xlabel('some more numbers')&lt;br /&gt;
pylab.plot(t, c, 'r', lw=2)&lt;br /&gt;
pylab.plot(t, s, 'b', lw=2)&lt;br /&gt;
pylab.plot(t, c-s, 'gs', lw=2)&lt;br /&gt;
pylab.ylim(-1.5, 1.5)&lt;br /&gt;
pylab.title('sin and cos functions')&lt;br /&gt;
pylab.savefig('curves', dpi=300)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The resulting '''curves.png''' looks like this:&lt;br /&gt;
&lt;br /&gt;
[[Image:Curves.png|thumb|600px|none|Some nice curves]]&lt;br /&gt;
&lt;br /&gt;
You can open .png images from the linux command line (inc. bluecrystal) using, e.g.: '''display -resize 1000 curves.png''' &lt;br /&gt;
&lt;br /&gt;
We can also use Matplotlib directly for more control:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import matplotlib.pyplot as plt&lt;br /&gt;
from numpy import arange, meshgrid, sin, sqrt&lt;br /&gt;
x = arange(-5,10)&lt;br /&gt;
y = arange(-4,11)&lt;br /&gt;
X, Y = meshgrid(x,y)  # 2-D coordinate grids, each of shape (len(y), len(x))&lt;br /&gt;
R = sqrt(X**2 + Y**2)&lt;br /&gt;
Z = sin(R)/R  # nan (with a warning) at the origin, where R is 0&lt;br /&gt;
plt.figure()&lt;br /&gt;
plt.contour(X,Y,Z)&lt;br /&gt;
plt.show()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and you should get a window similar to:&lt;br /&gt;
&lt;br /&gt;
[[Image:Sinc-matplotlib-contour.png|thumb|600px|none|A contour map of the sinc function]]&lt;br /&gt;
&lt;br /&gt;
Perhaps the best next step with matplotlib is to look at the gallery: http://matplotlib.org/gallery.html.&lt;br /&gt;
Just click on a figure and you will get the code used to generate it--a really great resource!&lt;br /&gt;
&lt;br /&gt;
==Input and Output==&lt;br /&gt;
&lt;br /&gt;
The foregoing is all very interesting, but life would be rather dull if you had to re-enter all your data by hand whenever you set to work with Python and numpy.  Therefore we need a means to save data to a file and load it again.  Happily, we can do this rather easily using a couple of routines from '''numpy'''.  (Older versions of '''pylab''' offered '''save()''' and '''load()''' routines for text files, but these are no longer available in current matplotlib.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import numpy as np&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; data = np.zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; np.savetxt('myfile.txt', data)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; read_data = np.loadtxt('myfile.txt')&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note that we have kept numpy in its own namespace here.'''  Star imports such as '''from numpy import *''' and '''from pylab import *''' pull in many names, and a later import can silently shadow an earlier function of the same name; numpy has its own '''load()''', for example.  Prefixing calls, as in '''np.loadtxt(..)''', avoids this.&lt;br /&gt;
&lt;br /&gt;
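NumPy's '''savetxt''' and '''loadtxt''' functions read and write plain text and accept a few handy keyword arguments.  For example, you can set a delimiter for comma-separated files and add a header line.  Because '''savetxt''' prefixes the header with '''#''', '''loadtxt''' skips it as a comment by default.  Here is a sketch for Python 3 (the file name is arbitrary):&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile
import numpy as np

data = np.arange(6.0).reshape(2, 3)     # [[0., 1., 2.], [3., 4., 5.]]
path = os.path.join(tempfile.mkdtemp(), 'myfile.csv')

# Write comma-separated values with three decimal places and a header line
np.savetxt(path, data, delimiter=',', fmt='%.3f', header='x,y,z')

# Read it back; the '# x,y,z' header line is treated as a comment
read_data = np.loadtxt(path, delimiter=',')
print(np.array_equal(data, read_data))  # True
```
&lt;br /&gt;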
=Scipy=&lt;br /&gt;
&lt;br /&gt;
* http://www.scipy.org/&lt;br /&gt;
* ..and good examples on http://scipy-lectures.github.com/intro/scipy.html&lt;br /&gt;
* Many useful features:&lt;br /&gt;
** Integration &amp;amp; Differentiation&lt;br /&gt;
** Optimisation (curve fitting, etc)&lt;br /&gt;
** Fourier transforms&lt;br /&gt;
** Signal processing&lt;br /&gt;
** Statistical algorithms&lt;br /&gt;
** Much, much more...&lt;br /&gt;
* If you know Python you can use SciPy&lt;br /&gt;
&lt;br /&gt;
==An example: Differentiation==&lt;br /&gt;
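&lt;br /&gt;
The '''derivative''' function used here approximates the derivative numerically with a central finite difference (step size '''dx=1.0''' by default).  It lives in the '''scipy.misc''' module of older SciPy releases and has been removed from recent ones, so this example needs an older installation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # derivative of x^2 at x=3&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from scipy.misc import derivative&lt;br /&gt;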
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2, 3)&lt;br /&gt;
6.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # also works with arrays&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import array&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_array = array([1,2,3])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2,my_array)&lt;br /&gt;
array([ 2., 4., 6.])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
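&lt;br /&gt;
The idea behind such numerical derivatives is simple enough to sketch by hand.  The central difference (f(x+h) - f(x-h)) / 2h approximates f'(x).  For a quadratic it is mathematically exact whatever the step h, which is why the example above returns exactly 6.0.  Here is a pure-Python sketch for Python 3 that needs no SciPy; ''central_difference'' is our own name, not a library function:&lt;br /&gt;
&lt;br /&gt;
```python
import math

def central_difference(f, x, h=1e-5):
    # Slope of the chord between x-h and x+h approximates the derivative at x
    return (f(x + h) - f(x - h)) / (2.0 * h)

square = lambda x: x ** 2
print(central_difference(square, 3, h=1.0))           # 6.0, exact for a quadratic
print(round(central_difference(math.sin, 0.0), 6))   # 1.0, since cos(0) = 1
```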
&lt;br /&gt;
Google for many more examples pertaining to your favourite numerical procedure!&lt;br /&gt;
&lt;br /&gt;
=A Repository of Packages You Could Use=&lt;br /&gt;
&lt;br /&gt;
Now, we've touched on a couple, but there are thousands of python packages available.  Before you start writing your own function for X, check that someone hasn't contributed code for that already at http://pypi.python.org/pypi.&lt;br /&gt;
&lt;br /&gt;
'''pip''', the Python package manager, will look in PyPI by default when installing a package.  You can use the '''--user''' option to install Python packages in your own user space.  See:&lt;br /&gt;
* https://pip.readthedocs.org/en/latest/&lt;br /&gt;
for more information on pip.&lt;br /&gt;
&lt;br /&gt;
=Writing Faster Python=&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB and R, one of the simplest ways in which you can write faster python code is to eliminate loops by vectorising your code.&lt;br /&gt;
&lt;br /&gt;
Consider the following two scripts.  First '''for-loop.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def threshold(arr):&lt;br /&gt;
    # set every element below 0.5 to zero, one element at a time&lt;br /&gt;
    for i, val in enumerate(arr):&lt;br /&gt;
        if val &amp;lt; 0.5:&lt;br /&gt;
            arr[i] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    threshold(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and secondly, '''vectorised.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def threshold(arr):&lt;br /&gt;
    # boolean-mask indexing sets all elements below 0.5 to zero in one step&lt;br /&gt;
    arr[arr &amp;lt; 0.5] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    threshold(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we now run these two scripts through the Linux command line '''time''' utility, we see that the vectorised code runs roughly eight times faster than the for loop on this machine:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ time ./for-loop.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.963s&lt;br /&gt;
user	0m0.952s&lt;br /&gt;
sys	0m0.012s&lt;br /&gt;
gethin@gethin-desktop:~$ time ./vectorised.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.116s&lt;br /&gt;
user	0m0.096s&lt;br /&gt;
sys	0m0.020s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
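&lt;br /&gt;
Bear in mind that timing whole scripts this way also counts Python start-up, importing numpy and generating the random array.  To time just the work being compared, the standard '''timeit''' module is handy.  Here is a sketch for Python 3.  ''threshold_loop'' and ''threshold_vectorised'' are our own names, and unlike the scripts above, both functions return a new array rather than modifying their input, so repeated runs see the same data:&lt;br /&gt;
&lt;br /&gt;
```python
import timeit
import numpy as np

def threshold_loop(arr):
    # Element-by-element loop: keep values of at least 0.5, zero the rest
    out = np.zeros_like(arr)
    for i, val in enumerate(arr):
        if val >= 0.5:
            out[i] = val
    return out

def threshold_vectorised(arr):
    # The same operation applied to the whole array in one call
    return np.where(arr >= 0.5, arr, 0.0)

arr = np.random.rand(100000)
print(np.array_equal(threshold_loop(arr), threshold_vectorised(arr)))  # True

# Time three calls of each; timeit accepts a zero-argument callable
loop_time = timeit.timeit(lambda: threshold_loop(arr), number=3)
vec_time = timeit.timeit(lambda: threshold_vectorised(arr), number=3)
print('loop: %.4f s, vectorised: %.4f s' % (loop_time, vec_time))
```
&lt;br /&gt;
The exact figures will depend on your machine, but the vectorised version should come out well ahead.&lt;br /&gt;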
&lt;br /&gt;
For some more tips on writing faster python code, and examples of how to use one of the python profiler modules, take a look at:&lt;br /&gt;
* https://wiki.python.org/moin/PythonSpeed/PerformanceTips&lt;br /&gt;
* http://technicaldiscovery.blogspot.co.uk/2011/06/speeding-up-python-numpy-cython-and.html&lt;br /&gt;
* http://www.huyng.com/posts/python-performance-analysis/&lt;br /&gt;
* http://www.appneta.com/2012/05/21/profiling-python-performance-lineprof-statprof-cprofile/&lt;br /&gt;
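&lt;br /&gt;
As a starting point, the standard library's '''cProfile''' and '''pstats''' modules report where a piece of code spends its time.  Here is a minimal sketch for Python 3; ''slow_sum'' is merely a stand-in for your own code:&lt;br /&gt;
&lt;br /&gt;
```python
import cProfile
import io
import pstats

def slow_sum(n):
    # A deliberately loop-heavy stand-in for real work
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(200000)
profiler.disable()

# Summarise the five most expensive entries, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats('cumulative')
stats.print_stats(5)
report = stream.getvalue()
print(report)
```
&lt;br /&gt;
To profile a whole script from the command line instead, '''python -m cProfile -s cumulative myscript.py''' prints a similar table.&lt;br /&gt;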
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* http://docs.python.org/tutorial/&lt;br /&gt;
* http://wiki.python.org/moin/PythonBooks&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9460</id>
		<title>Python1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9460"/>
		<updated>2014-10-08T10:55:38Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Python for Scientists'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Python.png|thumb|1100px|none|http://xkcd.com/353/]]&lt;br /&gt;
&lt;br /&gt;
With thanks to Simon Metson and Mike Wallace for much of the following material.&lt;br /&gt;
&lt;br /&gt;
=Getting Started on BlueCrystal Phase-2=&lt;br /&gt;
&lt;br /&gt;
After you have logged in, type the following at the command line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module add languages/python-2.7.2.0&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should start up an interactive python session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Python 2.7.2 (default, Aug 25 2011, 10:51:03) &lt;br /&gt;
[GCC 4.3.3] on linux2&lt;br /&gt;
Type &amp;quot;help&amp;quot;, &amp;quot;copyright&amp;quot;, &amp;quot;credits&amp;quot; or &amp;quot;license&amp;quot; for more information.&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where we can type commands at the '''&amp;gt;&amp;gt;&amp;gt;''' prompt.&lt;br /&gt;
&lt;br /&gt;
=Python as a Calculator=&lt;br /&gt;
&lt;br /&gt;
To get started, let's just try a few commands out.  If you type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print &amp;quot;Hello!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hello!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you try:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print 5 + 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far so simple!  Here is a copy of a session containing a few more commands where we've set the values of some variables and also defined and run our own function: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; five = 5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; neuf = 9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print five + neuf&lt;br /&gt;
14&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def say_hello():&lt;br /&gt;
...     print &amp;quot;Hello, world!&amp;quot;&lt;br /&gt;
... # hit return here &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; say_hello()&lt;br /&gt;
Hello, world!&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can exit an interactive session at any time by typing '''Ctrl-D'''.&lt;br /&gt;
&lt;br /&gt;
=Getting Help=&lt;br /&gt;
&lt;br /&gt;
One of the good things about Python is that it has lots of useful online documentation.  ([[A_Good_Read|There are good books on the language too]].)  For example, take a look at: http://docs.python.org/.  You can also type '''help()''' at the interpreter prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; help()&lt;br /&gt;
&lt;br /&gt;
Welcome to Python 2.7!  This is the online help utility.&lt;br /&gt;
&lt;br /&gt;
If this is your first time using Python, you should definitely check out&lt;br /&gt;
the tutorial on the Internet at http://docs.python.org/tutorial/.&lt;br /&gt;
&lt;br /&gt;
Enter the name of any module, keyword, or topic to get help on writing&lt;br /&gt;
Python programs and using Python modules.  To quit this help utility and&lt;br /&gt;
return to the interpreter, just type &amp;quot;quit&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; keywords&lt;br /&gt;
&lt;br /&gt;
Here is a list of the Python keywords.  Enter any keyword to get more help.&lt;br /&gt;
&lt;br /&gt;
and                 elif                if                  print&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; if&lt;br /&gt;
The ``if`` statement&lt;br /&gt;
********************&lt;br /&gt;
&lt;br /&gt;
The ``if`` statement is used for conditional execution:&lt;br /&gt;
&lt;br /&gt;
   if_stmt ::= &amp;quot;if&amp;quot; expression &amp;quot;:&amp;quot; suite&lt;br /&gt;
               ( &amp;quot;elif&amp;quot; expression &amp;quot;:&amp;quot; suite )*&lt;br /&gt;
               [&amp;quot;else&amp;quot; &amp;quot;:&amp;quot; suite]&lt;br /&gt;
&lt;br /&gt;
It selects exactly one of the suites by evaluating the expressions one&lt;br /&gt;
by one until one is found to be true...&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; quit&lt;br /&gt;
&lt;br /&gt;
You are now leaving help and returning to the Python interpreter.&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Making a Script=&lt;br /&gt;
&lt;br /&gt;
An interactive session can be fun and useful for trying things out.  However--to save our fingers--we will typically want to execute a series of commands as a script, created using your favourite text editor.  Here are the contents of an example script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;Hello, from a python script!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Save it as, e.g., '''myscript.py''' and ensure that it is executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod u+x myscript.py&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and now you can run it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[ggdagw@bigblue4 ~]$ ./myscript.py &lt;br /&gt;
Hello, from a python script!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Python and Whitespace=&lt;br /&gt;
&lt;br /&gt;
Love it or hate it, Python incorporates whitespace into its syntax: the indentation of a line determines which block it belongs to.  (It's either that or demarcate blocks with some other syntax, such as the curly braces used in C.  Pick your poison.)  Spacing is therefore key to creating a valid Python script.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
    print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will work, but:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
 print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will not:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  File &amp;quot;./myscript.py&amp;quot;, line 7&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
        ^&lt;br /&gt;
IndentationError: expected an indented block&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is therefore a great advantage, when writing a Python script, to use a text editor which has a dedicated Python mode, such as '''emacs''', and will actively help you keep your spacing correct.  See http://wiki.python.org/moin/PythonEditors for an extensive list.&lt;br /&gt;
&lt;br /&gt;
=Some Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
* Calculate the volume of a sphere (Hint: V = 4/3*pi*r**3.  Beware that in Python 2, 4/3 is integer division and evaluates to 1; use 4.0/3 instead.)&lt;br /&gt;
* Concatenate two strings&lt;br /&gt;
* Write a recursive function to compute Fibonacci numbers (Hint: F(n) = F(n-1) + F(n-2), F(0)=0 and F(1)=1)&lt;br /&gt;
&lt;br /&gt;
=Nuts and Bolts=&lt;br /&gt;
&lt;br /&gt;
==Types==&lt;br /&gt;
&lt;br /&gt;
Python has intrinsic types including integers, floats, booleans and complex numbers.  It is dynamically typed, meaning that variables do not need to be declared with a type before use (e.g. in a block at the top of your script).  However, it is '''not weakly''' typed: it will not silently convert between types.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex = 2 + 0.5j&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex&lt;br /&gt;
(2+0.5j)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.real&lt;br /&gt;
2.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.imag&lt;br /&gt;
0.5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name = 'fred'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; lucky = 7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name + lucky&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: cannot concatenate 'str' and 'int' objects&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
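&lt;br /&gt;
Because python will not combine a string and an integer for you, any conversion has to be requested explicitly, for example with the built-in '''str''' and '''int''' functions.  A minimal sketch (the variable names and values are just for illustration):&lt;br /&gt;
&lt;br /&gt;
```python
name = 'fred'
lucky = 7

# Convert the integer to a string before concatenating...
label = name + str(lucky)        # 'fred7'

# ...or convert a numeric string to an integer before adding.
total = int('35') + lucky        # 42

# type() reports the type that a value currently has.
kinds = (type(name), type(lucky), type(2 + 0.5j))
```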
&lt;br /&gt;
==Strings==&lt;br /&gt;
&lt;br /&gt;
The eagle-eyed will have spotted in a previous example that we could ask for the length of a character string straight off the bat, with no need to write a counting routine ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print len(message)&lt;br /&gt;
11&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a character string.  For example, the first five characters are:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print message[:5]&lt;br /&gt;
happy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since a string is an '''object''' (in the object oriented programming sense of the word, but more of that another time...) we can call a number of methods that operate on it.  A selected sample includes:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.find(sub) || Returns the index of the first occurrence of substring '''sub''', or -1 if it is not found&lt;br /&gt;
|-&lt;br /&gt;
|| s.islower() || Returns True if all cased characters in '''s''' are lowercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.upper() || Returns '''s''' converted to uppercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.strip() || Removes leading and trailing whitespace&lt;br /&gt;
|-&lt;br /&gt;
|| s.replace(old,new) || Replaces every occurrence of substring '''old''' with '''new'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.split([sep]) || Splits '''s''' using the (optional) delimiter '''sep''', which defaults to whitespace, and returns a list&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
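&lt;br /&gt;
To see these methods in action, here is a short sketch applying each one to a made-up string; the comments give the value each call returns:&lt;br /&gt;
&lt;br /&gt;
```python
s = '  Happy Days, happy nights  '

clean = s.strip()                        # 'Happy Days, happy nights'
first = clean.find('happy')              # 12 (find is case-sensitive)
missing = clean.find('sad')              # -1 means 'not found'
shout = clean.upper()                    # 'HAPPY DAYS, HAPPY NIGHTS'
merry = clean.replace('happy', 'merry')  # 'Happy Days, merry nights'
parts = clean.split(', ')                # ['Happy Days', 'happy nights']
lower = clean.islower()                  # False: 'H' and 'D' are upper case
```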
&lt;br /&gt;
==Lists and Tuples==&lt;br /&gt;
&lt;br /&gt;
An example of a list is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping = ['bread', 'marmalade', 'milk', 'tea']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we can ask for its length using the same '''len''' function as before:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
len(shopping)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a list, as we did with a string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and even replace a portion of the list that way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2] = ['bagels', 'jam']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
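&lt;br /&gt;
Gathering those snippets into one place, with the value of each expression given as a comment (the negative index is an extra illustration, not part of the original example):&lt;br /&gt;
&lt;br /&gt;
```python
shopping = ['bread', 'marmalade', 'milk', 'tea']

n_items = len(shopping)              # 4
first_two = shopping[0:2]            # ['bread', 'marmalade'], stops before index 2
last = shopping[-1]                  # negative indices count from the end: 'tea'

# Slice assignment replaces the selected items in place.
shopping[0:2] = ['bagels', 'jam']    # ['bagels', 'jam', 'milk', 'tea']
```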
&lt;br /&gt;
Since a list is also an object, we have more handy methods, including:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.append(x) || Appends a new element '''x''' to the end of '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.count(x) || Returns the number of occurrences of '''x''' in '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.reverse() || Reverses the items of '''s''' in place&lt;br /&gt;
|-&lt;br /&gt;
|| s.sort([compfunc]) || Sorts items of '''s''' in place.  '''compfunc''' is an optional comparison function&lt;br /&gt;
|}&lt;br /&gt;
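&lt;br /&gt;
A quick sketch of these methods on the shopping list.  Note that they change the list in place and return '''None''', so writing ''shopping = shopping.sort()'' would throw the list away:&lt;br /&gt;
&lt;br /&gt;
```python
shopping = ['bread', 'marmalade', 'milk', 'tea']

shopping.append('milk')           # ['bread', 'marmalade', 'milk', 'tea', 'milk']
n_milk = shopping.count('milk')   # 2

shopping.reverse()                # ['milk', 'tea', 'milk', 'marmalade', 'bread']
shopping.sort()                   # ['bread', 'marmalade', 'milk', 'milk', 'tea']

# In-place methods return None rather than the modified list.
result = shopping.sort()
```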
&lt;br /&gt;
Tuples are very similar to lists and support many of the same operations (indexing, slicing, concatenation etc.) but differ in that they are '''not mutable''' after creation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple = ('fred', 'ginger', 7, 2.5)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist = ['fred', 'ginger', 7, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist[2] = 8&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mylist&lt;br /&gt;
['fred', 'ginger', 8, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mytuple[2]    &lt;br /&gt;
7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple[2] = 8&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: 'tuple' object does not support item assignment&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
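Lists can also be built concisely with a '''list comprehension''', which applies an expression to each element of a sequence, optionally keeping only the elements that pass an '''if''' test:&lt;br /&gt;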
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; numbers = [12, 3, 90, 40, 52, 11, 10]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled = [number * 2 for number in numbers if number &amp;lt; 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled&lt;br /&gt;
[24, 6, 22, 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
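&lt;br /&gt;
A list comprehension is shorthand for a loop that builds the list one element at a time.  The sketch below writes out the equivalent loop, keeping only the numbers below 20, and checks that both approaches agree:&lt;br /&gt;
&lt;br /&gt;
```python
numbers = [12, 3, 90, 40, 52, 11, 10]

# List comprehension: double each number that is below 20.
doubled_comp = [number * 2 for number in numbers if 20 > number]

# The same thing written as an explicit loop.
doubled_loop = []
for number in numbers:
    if 20 > number:
        doubled_loop.append(number * 2)
```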
&lt;br /&gt;
==Dictionaries==&lt;br /&gt;
&lt;br /&gt;
A dictionary is an associative array or hash table, containing '''key-value''' pairs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
mydict = {'thomas':'blue', 'james':'red', 'henry':'green'}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mydict['james']&lt;br /&gt;
red&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking values up by a meaningful key, rather than by an arbitrary index into a list, makes code much more readable and intuitive.&lt;br /&gt;
&lt;br /&gt;
Some example dictionary methods are:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| m.keys() || Returns a list of the keys in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m.items() || Returns a list of the (key,value) pairs in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m[k] = x || Sets m[k] to x&lt;br /&gt;
|-&lt;br /&gt;
|| m.update(b) || Adds the key-value pairs from dictionary '''b''' to '''m''', overwriting the values of any keys already present&lt;br /&gt;
|}&lt;br /&gt;
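&lt;br /&gt;
A short sketch of these dictionary methods, plus the standard '''get''' method, which returns a default value instead of raising an error when a key is missing.  A dictionary's keys are not kept in sorted order, so the sketch sorts them before using them:&lt;br /&gt;
&lt;br /&gt;
```python
engines = {'thomas': 'blue', 'james': 'red', 'henry': 'green'}

engines['gordon'] = 'blue'                              # add a new key
engines.update({'james': 'maroon', 'percy': 'green'})   # 'james' is overwritten

names = sorted(engines.keys())               # sort for a predictable order
colour = engines.get('edward', 'unknown')    # default for a missing key
pairs = sorted(engines.items())              # (key, value) tuples, sorted by key
```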
&lt;br /&gt;
==Control Structures==&lt;br /&gt;
&lt;br /&gt;
Of course, we'll need conditionals and loops etc. to go beyond the simplest of scripts.  Here is an '''if-then-else''', python style:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# birds_sing() and birds_sleep() stand for functions defined elsewhere&lt;br /&gt;
if sky == 'blue':&lt;br /&gt;
    birds_sing()&lt;br /&gt;
elif sky == 'black':&lt;br /&gt;
    birds_sleep()&lt;br /&gt;
else:&lt;br /&gt;
    pass  # do nothing&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and a classic '''for loop''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
for ii in range(1,10):&lt;br /&gt;
    print ii&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1&lt;br /&gt;
...&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We'll also see a '''while loop''' shoehorned into the next example.&lt;br /&gt;
&lt;br /&gt;
For our control statements, we can use comparison operators such as '''==''', '''!=''', '''&amp;gt;''', '''&amp;lt;''', '''&amp;lt;=''' and '''&amp;gt;=''', and logical operators such as '''and''', '''or''' and '''not'''.&lt;br /&gt;
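&lt;br /&gt;
As a small illustration (the particular sum is an arbitrary choice), here is a '''while loop''' that combines a few of these operators, together with a reminder that '''range(1,10)''' stops before 10:&lt;br /&gt;
&lt;br /&gt;
```python
# Add up the odd numbers from 1 to 9 that are not multiples of 3,
# using a while loop with comparison and logical operators.
total = 0
ii = 1
while ii != 10:
    if ii % 2 == 1 and not ii % 3 == 0:
        total = total + ii
    ii = ii + 1
# total is 1 + 5 + 7 = 13

# range(1, 10), as used in the for loop above, stops before 10.
last = list(range(1, 10))[-1]    # 9
```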
&lt;br /&gt;
==File Input and Output==&lt;br /&gt;
&lt;br /&gt;
Here's some code for printing the contents of a text file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;r&amp;quot;)&lt;br /&gt;
line = fp.readline()&lt;br /&gt;
while line:&lt;br /&gt;
    line = line.strip()&lt;br /&gt;
    print line&lt;br /&gt;
    line = fp.readline()&lt;br /&gt;
fp.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We could open a file for writing with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;w&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp.write(...)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
to write to that file.&lt;br /&gt;
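&lt;br /&gt;
From python 2.6 onwards, the '''with''' statement offers a tidier way to do both jobs: the file is closed automatically at the end of the block, and a file object can be looped over directly, one line at a time.  A sketch, assuming it is fine to write into a temporary directory created with the standard '''tempfile''' module:&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile

# A throw-away directory, so the example does not touch your own files.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')

# 'w' creates the file (or empties an existing one); write() adds no newlines.
with open(path, 'w') as fp:
    fp.write('happy days!\n')
    fp.write('happy nights!\n')

# Iterating over a file object yields one line at a time.
lines = []
with open(path, 'r') as fp:
    for line in fp:
        lines.append(line.strip())
# The file is closed automatically at the end of each with block.
```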
&lt;br /&gt;
=Object Oriented Programming in Python=&lt;br /&gt;
&lt;br /&gt;
Here is an example of using a class in python:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
class Radio:&lt;br /&gt;
    &amp;quot;A simple radio&amp;quot;&lt;br /&gt;
    def __init__(self,freq=0.0,name=&amp;quot;&amp;quot;):&lt;br /&gt;
        &amp;quot;Constructor method&amp;quot;&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
        self.name=name&lt;br /&gt;
    def tune(self,freq):&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
    def tuned_to(self):&lt;br /&gt;
        print self.name, &amp;quot;tuned to:&amp;quot;, self.__frequency&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # declare two radio instances&lt;br /&gt;
    car = Radio(name=&amp;quot;car&amp;quot;)&lt;br /&gt;
    kitchen = Radio(91.5,&amp;quot;kitchen&amp;quot;)&lt;br /&gt;
    # call some methods&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    kitchen.tuned_to()&lt;br /&gt;
    car.tune(89.3)&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    # Docstrings--double quotes at the top of the class:                        &lt;br /&gt;
    print car.__doc__&lt;br /&gt;
    # NB members not private by default:&lt;br /&gt;
    print car.name&lt;br /&gt;
    # BUT leading double underscores trigger name mangling (the member&lt;br /&gt;
    # is stored as _Radio__frequency), hiding it from casual access:&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running the script gives us:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
car tuned to: 0.0&lt;br /&gt;
kitchen tuned to: 91.5&lt;br /&gt;
car tuned to: 89.3&lt;br /&gt;
A simple radio&lt;br /&gt;
car&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;./foo.py&amp;quot;, line 27, in &amp;lt;module&amp;gt;&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
AttributeError: Radio instance has no attribute '__frequency'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
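&lt;br /&gt;
The double underscore does not make a member truly private: python simply renames it to ''_Radio__frequency'', which is why ''car.__frequency'' is not found.  The sketch below shows this using a new-style class (one that inherits from '''object'''), and adds a hypothetical ''frequency()'' accessor method that is not part of the original example:&lt;br /&gt;
&lt;br /&gt;
```python
class Radio(object):
    'A simple radio'
    def __init__(self, freq=0.0, name=''):
        self.__frequency = freq      # stored as _Radio__frequency
        self.name = name

    def tune(self, freq):
        self.__frequency = freq

    def frequency(self):             # hypothetical accessor, not in the original
        return self.__frequency

car = Radio(name='car')
car.tune(89.3)

visible = hasattr(car, '__frequency')   # False: the plain name is not found
via_method = car.frequency()            # 89.3
via_mangled = car._Radio__frequency     # 89.3: mangling is not true privacy
```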
&lt;br /&gt;
=Using Packages=&lt;br /&gt;
&lt;br /&gt;
Python packages are great because they provide us with a whole lot of extra functionality--above and beyond the core language--that we didn't have to write and debug ourselves.&lt;br /&gt;
&lt;br /&gt;
Let's walk through a simple example using a package.  At an interactive prompt type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from random import randint&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will give us access to the '''randint(x,y)''' function, which returns a randomly chosen integer from the given range [x,y]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
4&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
1&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OK, so far so good.  One thing to note is that the above '''import''' statement has drawn the name ''randint'' into our current '''namespace'''.  What if we had already defined a function named ''randint''?  It would be silently replaced, which could cause problems.  To protect ourselves from this kind of problem, there are several variants of the import statement.&lt;br /&gt;
&lt;br /&gt;
With a plain '''import''' of the package, its functions are placed in a namespace with the same name as the package, so, in this case, we have to prefix them with that namespace when we call them:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random.randint(0,10)&lt;br /&gt;
6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Should we so desire, we can apply a little more control and choose the name of the namespace ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random as rnd&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; rnd.randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another--more 'devil-may-care'--approach is to do away with the separate namespace and pull everything from a given package into the current namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import *&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random()&lt;br /&gt;
0.3172268098313996&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(The '''random()''' function returns a randomly selected floating point number in the range [0, 1)--that is, between 0 and 1, including 0.0 but always smaller than 1.0.)&lt;br /&gt;
&lt;br /&gt;
==A Namespace Collision==&lt;br /&gt;
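&lt;br /&gt;
Here, importing ''randint'' from '''random''' silently replaces the function of the same name that we had defined ourselves, so our dummy version is no longer reachable.  (The error message mentions three arguments because, in python 2, ''randint'' is actually a method of a hidden '''Random''' object, and the automatic '''self''' argument is included in the count.)&lt;br /&gt;
&lt;br /&gt;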
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def randint():&lt;br /&gt;
...     print &amp;quot;dummy function&amp;quot;&lt;br /&gt;
... &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
dummy function&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import randint&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: randint() takes exactly 3 arguments (1 given)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
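&lt;br /&gt;
Keeping the module in its own namespace avoids the collision altogether: our function and the module's function can then live side by side.  A sketch (the dummy function is our own):&lt;br /&gt;
&lt;br /&gt;
```python
import random

def randint():
    # our own function, which happens to share a name with random.randint
    return 'dummy function'

mine = randint()                 # still our version: 'dummy function'
theirs = random.randint(0, 10)   # the module's version, reached via its namespace
```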
&lt;br /&gt;
=Python for Shell Scripting=&lt;br /&gt;
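&lt;br /&gt;
The '''subprocess''' module lets a python script run other programs, much as a shell script would.  For example, to run '''ls -l''' in the current directory:&lt;br /&gt;
&lt;br /&gt;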
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from subprocess import call&lt;br /&gt;
call([&amp;quot;ls&amp;quot;, &amp;quot;-l&amp;quot;])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
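&lt;br /&gt;
If you need the output of a command, rather than just letting it print to the screen, '''subprocess.check_output''' (python 2.7 or later) captures it, while '''call''' returns the command's exit status.  The sketch below runs the python interpreter itself (via '''sys.executable''') so that it does not depend on which shell commands are installed:&lt;br /&gt;
&lt;br /&gt;
```python
import subprocess
import sys

# call() waits for the command and returns its exit status (0 means success).
status = subprocess.call([sys.executable, '-c', 'pass'])

# check_output() returns what the command printed, and raises
# CalledProcessError if the command exits with a non-zero status.
out = subprocess.check_output([sys.executable, '-c', 'print(6 * 7)'])
answer = int(out.strip())        # 42
```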
&lt;br /&gt;
=Python as a Glue Language=&lt;br /&gt;
&lt;br /&gt;
* Calling R from python is possible using: http://rpy.sourceforge.net/index.html.&lt;br /&gt;
* Calling Matlab from python: http://mlabwrap.sourceforge.net.&lt;br /&gt;
* With SWIG you can make many bindings, including Python to C and C++: http://www.swig.org/.&lt;br /&gt;
* Or if Fortran is more your cup of tea, you can use f2py: http://cens.ioc.ee/projects/f2py2e/ (f2py is now distributed as part of numpy).&lt;br /&gt;
* There are many more examples.&lt;br /&gt;
&lt;br /&gt;
=Command Line Parsing=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # We can test on the length of argv&lt;br /&gt;
    if len(sys.argv) &amp;lt; 2:&lt;br /&gt;
        print &amp;quot;usage: to use this script...&amp;quot;&lt;br /&gt;
    else:&lt;br /&gt;
        ii = 0&lt;br /&gt;
        for arg in sys.argv:&lt;br /&gt;
            # (typically) argv[0] is bound to the script name&lt;br /&gt;
            print &amp;quot;arg&amp;quot;, ii, &amp;quot;is:&amp;quot;, arg&lt;br /&gt;
            ii = ii+1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py&lt;br /&gt;
usage: to use this script...&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py fred ginger&lt;br /&gt;
arg 0 is: ./cmdline.py&lt;br /&gt;
arg 1 is: fred&lt;br /&gt;
arg 2 is: ginger&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
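&lt;br /&gt;
For anything beyond a couple of positional arguments, the standard '''argparse''' module (python 2.7 or later) can do the parsing, and generates usage and help messages for you.  A sketch, in which the argument names ''names'' and ''--shout'' and the function ''greetings'' are invented for illustration:&lt;br /&gt;
&lt;br /&gt;
```python
import argparse

# Hypothetical arguments: one or more names, plus an optional flag.
parser = argparse.ArgumentParser(description='Greet some people.')
parser.add_argument('names', nargs='+', help='one or more names to greet')
parser.add_argument('--shout', action='store_true', help='greet in upper case')


def greetings(argv):
    'Parse a list of command line arguments and return the greetings.'
    args = parser.parse_args(argv)
    result = []
    for name in args.names:
        text = 'hello ' + name
        if args.shout:
            text = text.upper()
        result.append(text)
    return result
```

In a script you would call ''greetings(sys.argv[1:])''; if no names are given, argparse prints a usage message and exits, and ''--help'' works without any extra code.&lt;br /&gt;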
&lt;br /&gt;
=Databases=&lt;br /&gt;
&lt;br /&gt;
==Simple Databases==&lt;br /&gt;
&lt;br /&gt;
Python's standard library includes modules for accessing some databases.  For example, the '''bsddb''' module allows you to access the popular '''Berkeley DB''' database from your python code.  (Note that '''bsddb''' has been deprecated since python 2.6 and is not included in python 3.)&lt;br /&gt;
&lt;br /&gt;
The interface to the database provided by this module is very similar to the way in which we access a dictionary.  First, let's populate a database:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import bsddb&lt;br /&gt;
d = bsddb.btopen('engines.db')&lt;br /&gt;
d['thomas'] = 'blue'&lt;br /&gt;
d['james'] = 'red'&lt;br /&gt;
d['henry'] = 'green'&lt;br /&gt;
d.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now let's open the database again and query its contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d = bsddb.btopen('engines.db')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['henry', 'james', 'thomas']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.first()&lt;br /&gt;
('henry', 'green')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.last()&lt;br /&gt;
('thomas', 'blue')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour = d['james']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour&lt;br /&gt;
'red'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; del d['henry']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['james', 'thomas']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
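&lt;br /&gt;
Since '''bsddb''' is not available in python 3, a similar dictionary-like store can be sketched with the standard '''shelve''' module, which exists in both python 2 and 3.  Unlike the B-tree used above, a shelf does not promise any particular key order, so the sketch sorts the keys itself.  The temporary file location is an assumption made to keep the example self-contained:&lt;br /&gt;
&lt;br /&gt;
```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'engines')

# Populate the store; it behaves much like a dictionary.
d = shelve.open(path)
d['thomas'] = 'blue'
d['james'] = 'red'
d['henry'] = 'green'
d.close()

# Re-open it and query the contents.
d = shelve.open(path)
colour = d['james']           # 'red'
del d['henry']
names = sorted(d.keys())      # ['james', 'thomas']
d.close()
```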
&lt;br /&gt;
==Relational Databases==&lt;br /&gt;
&lt;br /&gt;
Relational databases give us more oomph.  '''SQLite''' is a useful relational database to consider because it is lightweight: it needs no separate server and hardly any setup or management, yet it still understands queries written in SQL.  This makes it well suited to simple examples of SQL database access from python, and a stepping stone toward more powerful database packages.&lt;br /&gt;
&lt;br /&gt;
Here is a script which will create a table called '''planets''' in the file '''pytest.db''' and populate it with details of the planets in our solar system:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
 &lt;br /&gt;
# create a table&lt;br /&gt;
cursor.execute(&amp;quot;&amp;quot;&amp;quot;CREATE TABLE planets&lt;br /&gt;
                  (Id INT, Name TEXT, Diameter REAL, &lt;br /&gt;
                   Mass REAL, Orbital_Period REAL)&amp;quot;&amp;quot;&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# insert a single record&lt;br /&gt;
cursor.execute(&amp;quot;INSERT INTO planets VALUES(1,'Mercury',0.382,0.06,0.24)&amp;quot;)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
 &lt;br /&gt;
# insert multiple records&lt;br /&gt;
other_planets = [(2,'Venus',0.949,0.82,0.72),&lt;br /&gt;
                 (3,'Earth',1.0,1.0,1.0),&lt;br /&gt;
                 (4,'Mars',0.532,0.11,1.52),&lt;br /&gt;
                 (5,'Jupiter',11.209,317.8,5.20),&lt;br /&gt;
                 (6,'Saturn',9.449,95.2,9.54),&lt;br /&gt;
                 (7,'Uranus',4.007,14.6,19.22),&lt;br /&gt;
                 (8,'Neptune',3.883,17.2,30.06),&lt;br /&gt;
                 (9,'Pluto',0.18,0.002,248.09)]&lt;br /&gt;
cursor.executemany(&amp;quot;INSERT INTO planets VALUES (?,?,?,?,?)&amp;quot;, other_planets)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
&lt;br /&gt;
# delete a record&lt;br /&gt;
sql = &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
DELETE FROM planets&lt;br /&gt;
WHERE Name = 'Pluto'&lt;br /&gt;
&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
cursor.execute(sql)  # poor old pluto! &lt;br /&gt;
conn.commit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And here is a short example script showing a couple of ways to interrogate the database: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the records in the table, ordered by Name:\n&amp;quot;&lt;br /&gt;
for row in cursor.execute(&amp;quot;SELECT rowid, * FROM planets ORDER BY Name&amp;quot;):&lt;br /&gt;
    print row&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;\n&amp;quot;&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the planets with a mass greater than or equal to that of Earth:\n&amp;quot;&lt;br /&gt;
sql = &amp;quot;SELECT * FROM planets WHERE Mass&amp;gt;=?&amp;quot;&lt;br /&gt;
cursor.execute(sql, (1.0,))  # parameters are passed as a sequence, here a one-element tuple&lt;br /&gt;
for row in cursor.fetchall():  # or use fetchone()&lt;br /&gt;
    print row&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the results of running the script are:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
All the records in the table, ordered by Name:&lt;br /&gt;
&lt;br /&gt;
(3, 3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, 5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(4, 4, u'Mars', 0.53200000000000003, 0.11, 1.52)&lt;br /&gt;
(1, 1, u'Mercury', 0.38200000000000001, 0.059999999999999998, 0.23999999999999999)&lt;br /&gt;
(8, 8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
(6, 6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, 7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(2, 2, u'Venus', 0.94899999999999995, 0.81999999999999995, 0.71999999999999997)&lt;br /&gt;
&lt;br /&gt;
All the planets with a mass greater than or equal to that of Earth:&lt;br /&gt;
&lt;br /&gt;
(3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
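&lt;br /&gt;
The '''?''' placeholders used above are the safe way to put values into a query: '''sqlite3''' substitutes them itself, rather than us pasting values into the SQL string.  A self-contained sketch using an in-memory database (the cut-down table and the aggregate query are our own illustration):&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3

# ':memory:' keeps the whole database in RAM, so nothing is written to disk.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE planets (Id INT, Name TEXT, Mass REAL)')
cursor.executemany('INSERT INTO planets VALUES (?,?,?)',
                   [(3, 'Earth', 1.0), (4, 'Mars', 0.11), (5, 'Jupiter', 317.8)])
conn.commit()

# The value of min_mass is substituted for the ? by sqlite3.
min_mass = 1.0
cursor.execute('SELECT Name FROM planets WHERE Mass >= ? ORDER BY Name', (min_mass,))
heavy = [row[0] for row in cursor.fetchall()]   # Earth and Jupiter

# Aggregate functions return a single row.
cursor.execute('SELECT COUNT(*), MAX(Mass) FROM planets')
count, biggest = cursor.fetchone()
conn.close()
```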
&lt;br /&gt;
For more information on using SQLite with Python, see, e.g.:&lt;br /&gt;
* http://zetcode.com/db/sqlitepythontutorial/&lt;br /&gt;
* http://www.blog.pythonlibrary.org/2012/07/18/python-a-simple-step-by-step-sqlite-tutorial/&lt;br /&gt;
&lt;br /&gt;
You can also connect to a MySQL database from python using, e.g., the [http://mysql-python.sourceforge.net/ python-mysqldb] package.  A snippet of python code for connecting to a database and running a query is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
import MySQLdb&lt;br /&gt;
&lt;br /&gt;
conn = MySQLdb.connect(host=&amp;quot;localhost&amp;quot;,   # your host, usually localhost&lt;br /&gt;
                       user=&amp;quot;gethin&amp;quot;,      # your username&lt;br /&gt;
                       passwd=&amp;quot;changeme&amp;quot;,  # your password&lt;br /&gt;
                       db=&amp;quot;menagerie&amp;quot;)     # name of the database&lt;br /&gt;
&lt;br /&gt;
# Create a cursor object, as before with SQLite&lt;br /&gt;
cur = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
# and then you can submit your SQL command:&lt;br /&gt;
cur.execute(&amp;quot;SELECT * FROM YOUR_TABLE_NAME&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# fetch the results and tidy up&lt;br /&gt;
for row in cur.fetchall():&lt;br /&gt;
    print row&lt;br /&gt;
conn.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Numpy=&lt;br /&gt;
&lt;br /&gt;
OK, let's move on to python's numerical processing capabilities.  We will start with the '''numpy''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from numpy import *&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now that we have access to the functions from '''numpy''', let's create an array.  '''Note that a numpy array is an object of a different type from a python list'''.  A simple approach is to use the '''array''' function.  For example, we might enter:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = array([[1.0,0.0,0.0],[0.0,1.0,0.0],[0.0,0.0,1.0]])&lt;br /&gt;
b = array([[1,2,3],[4,5,6],[7,8,9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 1.,  0.,  0.],&lt;br /&gt;
       [ 0.,  1.,  0.],&lt;br /&gt;
       [ 0.,  0.,  1.]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b        &lt;br /&gt;
array([[1, 2, 3],&lt;br /&gt;
       [4, 5, 6],&lt;br /&gt;
       [7, 8, 9]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; transpose(b)&lt;br /&gt;
array([[1, 4, 7],&lt;br /&gt;
       [2, 5, 8],&lt;br /&gt;
       [3, 6, 9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Given an array, we may inquire about its shape:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
print a.shape&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we are told that it is a 2-dimensional array (i.e. an array of rank 2) and that the length of both dimensions is 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
(3, 3)&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
We can also apply operators to array objects.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = a * 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
array([[ 9.,  0.,  0.],&lt;br /&gt;
       [ 0.,  9.,  0.],&lt;br /&gt;
       [ 0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note, however, that most operations on numpy arrays are done element-wise''', which is '''different from the linear algebra operation (e.g. a matrix product) that you may have been expecting.'''  We will return to linear algebra operations when we look at the '''scipy''' package.&lt;br /&gt;
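&lt;br /&gt;
numpy already provides a matrix product in its '''dot''' function, so the difference can be seen directly.  A sketch with two small made-up matrices:&lt;br /&gt;
&lt;br /&gt;
```python
from numpy import array, dot

a = array([[1, 2], [3, 4]])
b = array([[5, 6], [7, 8]])

elementwise = a * b          # multiplies matching elements: [[5, 12], [21, 32]]
matrix_product = dot(a, b)   # rows times columns: [[19, 22], [43, 50]]
```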
&lt;br /&gt;
Should we so desire, we could re-shape the array.  One way to do this is to set its shape attribute directly:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (1,9)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 9.,  0.,  0.,  0.,  9.,  0.,  0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As with the list example, it can be useful to read or change the value of an individual element (or sub-array).  Let's turn the array back to its rank-2 form and try it out:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (3,3)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1,1] = 777.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.    0.]&lt;br /&gt;
 [   0.    0.    9.]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1:,1:] = [[777.0, 777.0],[777.0, 777.0]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.  777.]&lt;br /&gt;
 [   0.  777.  777.]]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is all pretty handy so far, but specifying the value of each element explicitly could become a chore.  Happily, some helper functions exist to give you a head start with some building blocks.  For example, you can use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = ones((3,2))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = identity(2)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; big = resize(b, (6,6))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print big&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
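The use of '''resize''' in the last example illustrates a useful '''replicating feature''': when the new shape is larger than the original, the new array is filled with repeated copies of the original elements.&lt;br /&gt;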
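&lt;br /&gt;
A sketch of the replicating behaviour of '''resize''', contrasted with '''reshape''', which only rearranges elements and so requires the total number of elements to stay the same:&lt;br /&gt;
&lt;br /&gt;
```python
from numpy import arange, identity, resize

b = identity(2)               # [[1., 0.], [0., 1.]]
small = resize(b, (3, 4))     # the 4 elements of b repeated: each row is [1., 0., 0., 1.]
big = resize(b, (6, 6))       # 36 elements, i.e. nine copies of b's 4 elements

# reshape keeps the same 9 elements, just laid out as 3 rows of 3.
grid = arange(9).reshape(3, 3)
```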
&lt;br /&gt;
A list of all the functions and operations contained within numpy is: http://scipy.org/Numpy_Example_List.&lt;br /&gt;
&lt;br /&gt;
=Pylab and Matplotlib=&lt;br /&gt;
&lt;br /&gt;
The above examples are quite natty, but we have deliberately kept the array sizes small so that we can print the element values easily.  In practice, you may find that your array sizes are much larger and printing the values to the screen is impractical.  Fear not!  Python has many packages which help you plot your data, so that you can explore it.&lt;br /&gt;
&lt;br /&gt;
Using the pylab plotting interface we can create a plot of some curves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import pylab&lt;br /&gt;
from numpy import arange, pi, cos, sin&lt;br /&gt;
t = arange(0.0, 3.0, 0.01)&lt;br /&gt;
c = cos(2 * pi * t)&lt;br /&gt;
s = sin(2 * pi * t)&lt;br /&gt;
pylab.ylabel('some numbers')&lt;br /&gt;
pylab.xlabel('some more numbers')&lt;br /&gt;
pylab.plot(t, c, 'r', lw=2)&lt;br /&gt;
pylab.plot(t, s, 'b', lw=2)&lt;br /&gt;
pylab.plot(t, c-s, 'gs', lw=2)&lt;br /&gt;
pylab.ylim(-1.5, 1.5)&lt;br /&gt;
pylab.title('sin and cos functions')&lt;br /&gt;
pylab.savefig('curves', dpi=300)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where '''curves.png''' looks like:&lt;br /&gt;
&lt;br /&gt;
[[Image:Curves.png|thumb|600px|none|Some nice curves]]&lt;br /&gt;
&lt;br /&gt;
You can open .png images from the linux command line (including on BlueCrystal) using, e.g., ImageMagick's '''display''' command: '''display -resize 1000 curves.png'''&lt;br /&gt;
&lt;br /&gt;
We can also use Matplotlib directly for more control:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
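import matplotlib.pyplot as plt&lt;br /&gt;
from numpy import arange, meshgrid, sin, sqrt&lt;br /&gt;
x = arange(-5.0, 10.0)&lt;br /&gt;
y = arange(-4.0, 11.0)&lt;br /&gt;
# meshgrid gives 2D grids with X[j,i] = x[i] and Y[j,i] = y[j],&lt;br /&gt;
# which is the layout that contour expects for Z&lt;br /&gt;
X, Y = meshgrid(x, y)&lt;br /&gt;
R = sqrt(X**2 + Y**2)&lt;br /&gt;
R[R == 0] = 1e-12  # avoid 0/0 at the origin, where sin(R)/R tends to 1&lt;br /&gt;
Z = sin(R)/R&lt;br /&gt;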
plt.figure()&lt;br /&gt;
plt.contour(X,Y,Z)&lt;br /&gt;
plt.show()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and you should get a window similar to:&lt;br /&gt;
&lt;br /&gt;
[[Image:Sinc-matplotlib-contour.png|thumb|600px|none|A contour map of the sinc function]]&lt;br /&gt;
&lt;br /&gt;
Perhaps the best next step for matplotlib is to look at the gallery: http://matplotlib.org/gallery.html.&lt;br /&gt;
Just click on a figure and you will get the code used to generate it--a really great resource!&lt;br /&gt;
&lt;br /&gt;
==Input and Output==&lt;br /&gt;
&lt;br /&gt;
The foregoing is all very interesting, but life would be rather dull if you had to re-enter all your data by hand whenever you set to work with Python and numpy.  Therefore we need a means to save data to a file and load it again.  Happily, numpy makes this easy with its '''savetxt()''' and '''loadtxt()''' functions, which write and read arrays as plain text:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import numpy as np&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; data = np.zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; np.savetxt('myfile.txt', data)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; read_data = np.loadtxt('myfile.txt')&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Beware of shadowing:''' if you pull everything from two packages into your current namespace (e.g. '''from numpy import *''' followed by '''from pylab import *'''), any function that exists in both is silently replaced by the version imported last.  One way to protect yourself against this is to make use of '''namespaces''': import the package under its own name, as with '''import numpy as np''' above, and then call '''np.loadtxt(..)'''.&lt;br /&gt;
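&lt;br /&gt;
numpy's '''savetxt()''' and '''loadtxt()''' functions accept extra keyword arguments for more structured text files.  The following is a minimal sketch, and the file name '''example.csv''' is only an illustration.  It writes a comma-separated file with a header line.  '''savetxt()''' prefixes the header with '''#''', and '''loadtxt()''' skips such comment lines by default:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import os&lt;br /&gt;
import tempfile&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
# A small 2-D array of floats to save&lt;br /&gt;
data = np.arange(6, dtype=float).reshape(2, 3)&lt;br /&gt;
&lt;br /&gt;
# Write comma-separated text; the header line is written with a leading '# '&lt;br /&gt;
path = os.path.join(tempfile.gettempdir(), 'example.csv')&lt;br /&gt;
np.savetxt(path, data, delimiter=',', header='a,b,c')&lt;br /&gt;
&lt;br /&gt;
# loadtxt ignores lines starting with '#' by default, so the header is skipped&lt;br /&gt;
read_back = np.loadtxt(path, delimiter=',')&lt;br /&gt;
print(read_back)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;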
&lt;br /&gt;
=Scipy=&lt;br /&gt;
&lt;br /&gt;
* http://www.scipy.org/&lt;br /&gt;
* ..and good examples on http://scipy-lectures.github.com/intro/scipy.html&lt;br /&gt;
* Many useful features:&lt;br /&gt;
** Integration &amp;amp; Differentiation&lt;br /&gt;
** Optimisation (curve fitting, etc.)&lt;br /&gt;
** Fourier transforms&lt;br /&gt;
** Signal processing&lt;br /&gt;
** Statistical algorithms&lt;br /&gt;
** Much, much more...&lt;br /&gt;
* If you know Python you can use SciPy&lt;br /&gt;
&lt;br /&gt;
==An example: Differentiation==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # derivative of x**2 at x=3&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from scipy.misc import derivative&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2, 3)&lt;br /&gt;
6.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # also works with arrays&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import array&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_array = array([1,2,3])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2, my_array)&lt;br /&gt;
array([ 2., 4., 6.])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
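&lt;br /&gt;
The '''derivative()''' function estimates a derivative numerically.  With its default settings (step '''dx=1.0''', three-point stencil), it uses the central difference (f(x+dx) - f(x-dx))/(2*dx), which happens to be exact for '''x**2'''.  That function is deprecated in recent SciPy releases, so here is a minimal plain-numpy sketch of the same formula.  The name '''central_difference''' is our own hypothetical choice:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def central_difference(f, x, dx=1.0):&lt;br /&gt;
    # Approximate f'(x) by (f(x+dx) - f(x-dx)) / (2*dx); works for scalars and arrays&lt;br /&gt;
    x = np.asarray(x, dtype=float)&lt;br /&gt;
    return (f(x + dx) - f(x - dx)) / (2.0 * dx)&lt;br /&gt;
&lt;br /&gt;
print(central_difference(lambda x: x**2, 3))                   # 6.0&lt;br /&gt;
print(central_difference(lambda x: x**2, np.array([1, 2, 3])))  # the values 2, 4 and 6&lt;br /&gt;
# For general functions a smaller step is more accurate: d/dx sin(x) at x=0 is 1&lt;br /&gt;
print(central_difference(np.sin, 0.0, dx=1e-5))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;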
&lt;br /&gt;
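Note that the '''derivative()''' function has been deprecated in recent SciPy releases, so check the documentation for the version you have installed.  Google for many more examples pertaining to your favourite numerical procedure!&lt;br /&gt;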
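&lt;br /&gt;
Two more SciPy features, integration and curve fitting, can be sketched in a few lines with '''scipy.integrate.quad''' and '''scipy.optimize.curve_fit'''.  The exponential model and its 'true' parameters below are invented purely for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
from scipy.integrate import quad&lt;br /&gt;
from scipy.optimize import curve_fit&lt;br /&gt;
&lt;br /&gt;
# Integration: the integral of sin(x) from 0 to pi is exactly 2&lt;br /&gt;
value, abs_error = quad(np.sin, 0.0, np.pi)&lt;br /&gt;
print(value)&lt;br /&gt;
&lt;br /&gt;
# Curve fitting: recover a and b in y = a*exp(-b*x) from noisy samples&lt;br /&gt;
def model(x, a, b):&lt;br /&gt;
    return a * np.exp(-b * x)&lt;br /&gt;
&lt;br /&gt;
rng = np.random.RandomState(0)                    # fixed seed, so runs are repeatable&lt;br /&gt;
x = np.linspace(0.0, 4.0, 50)&lt;br /&gt;
y = model(x, 2.5, 1.3) + 0.02 * rng.normal(size=x.size)&lt;br /&gt;
params, covariance = curve_fit(model, x, y, p0=(1.0, 1.0))&lt;br /&gt;
print(params)                                     # close to [2.5, 1.3]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;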
&lt;br /&gt;
=A Repository of Packages You Could Use=&lt;br /&gt;
&lt;br /&gt;
Now, we've touched on a couple, but there are thousands of python packages available.  Before you start writing your own function for X, check that someone hasn't contributed code for that already at http://pypi.python.org/pypi.&lt;br /&gt;
&lt;br /&gt;
'''pip''', the Python package manager, looks in PyPI by default when installing a package.  You can use the '''--user''' option to install Python packages in your own user space, e.g. '''pip install --user scipy'''.  See https://pip.readthedocs.org/en/latest/ for more information on pip.&lt;br /&gt;
&lt;br /&gt;
=Writing Faster Python=&lt;br /&gt;
&lt;br /&gt;
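As with other scripting languages, such as MATLAB and R, one of the simplest ways to write faster Python code is to eliminate explicit loops by '''vectorising''' your code.  This means replacing loops over individual array elements with whole-array numpy operations, which run in compiled code.&lt;br /&gt;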
&lt;br /&gt;
Consider the following two scripts.  First '''for-loop.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def threshold(arr):&lt;br /&gt;
    # set every element below 0.5 to zero, one element at a time&lt;br /&gt;
    for i, val in enumerate(arr):&lt;br /&gt;
        if val &amp;lt; 0.5:&lt;br /&gt;
            arr[i] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    threshold(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and secondly, '''vectorised.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def threshold(arr):&lt;br /&gt;
    # set every element below 0.5 to zero in a single whole-array operation&lt;br /&gt;
    arr[arr &amp;lt; 0.5] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    threshold(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
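If we now run these two scripts through the Linux command line '''time''' utility, we see that the vectorised version runs roughly eight times faster overall.  These timings also include starting Python, importing numpy and generating the random array, so the speed-up in the filtering step itself is even larger:&lt;br /&gt;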
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ time ./for-loop.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.963s&lt;br /&gt;
user	0m0.952s&lt;br /&gt;
sys	0m0.012s&lt;br /&gt;
gethin@gethin-desktop:~$ time ./vectorised.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.116s&lt;br /&gt;
user	0m0.096s&lt;br /&gt;
sys	0m0.020s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
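&lt;br /&gt;
To time just the filtering step, without the cost of starting Python and importing numpy, you could use the standard-library '''timeit''' module.  The following is a minimal sketch, and the array size and number of repeats are arbitrary choices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import timeit&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def threshold_loop(arr):&lt;br /&gt;
    # element-by-element Python loop&lt;br /&gt;
    for i, val in enumerate(arr):&lt;br /&gt;
        if val &amp;lt; 0.5:&lt;br /&gt;
            arr[i] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
def threshold_vectorised(arr):&lt;br /&gt;
    # a single boolean-mask assignment, carried out inside numpy&lt;br /&gt;
    arr[arr &amp;lt; 0.5] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
data = np.random.rand(100000)&lt;br /&gt;
&lt;br /&gt;
# Give each call a fresh copy, so every timed run does the same work&lt;br /&gt;
t_loop = timeit.timeit(lambda: threshold_loop(data.copy()), number=10)&lt;br /&gt;
t_vec = timeit.timeit(lambda: threshold_vectorised(data.copy()), number=10)&lt;br /&gt;
print('loop:       %.4f s for 10 calls' % t_loop)&lt;br /&gt;
print('vectorised: %.4f s for 10 calls' % t_vec)&lt;br /&gt;
&lt;br /&gt;
# Both versions should produce identical results&lt;br /&gt;
print(np.array_equal(threshold_loop(data.copy()), threshold_vectorised(data.copy())))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;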
&lt;br /&gt;
For some more tips on writing faster python code, and examples of how to use one of the python profiler modules, take a look at:&lt;br /&gt;
* https://wiki.python.org/moin/PythonSpeed/PerformanceTips&lt;br /&gt;
* http://technicaldiscovery.blogspot.co.uk/2011/06/speeding-up-python-numpy-cython-and.html&lt;br /&gt;
* http://www.huyng.com/posts/python-performance-analysis/&lt;br /&gt;
* http://www.appneta.com/2012/05/21/profiling-python-performance-lineprof-statprof-cprofile/&lt;br /&gt;
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* http://docs.python.org/tutorial/&lt;br /&gt;
* http://wiki.python.org/moin/PythonBooks&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9459</id>
		<title>Python1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9459"/>
		<updated>2014-10-08T10:53:55Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Python for Scientists'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Python.png|thumb|1100px|none|http://xkcd.com/353/]]&lt;br /&gt;
&lt;br /&gt;
With thanks to Simon Metson and Mike Wallace for much of the following material.&lt;br /&gt;
&lt;br /&gt;
=Getting Started on BlueCrystal Phase-2=&lt;br /&gt;
&lt;br /&gt;
After you have logged in, type the following at the command line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module add languages/python-2.7.2.0&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should start up an interactive python session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Python 2.7.2 (default, Aug 25 2011, 10:51:03) &lt;br /&gt;
[GCC 4.3.3] on linux2&lt;br /&gt;
Type &amp;quot;help&amp;quot;, &amp;quot;copyright&amp;quot;, &amp;quot;credits&amp;quot; or &amp;quot;license&amp;quot; for more information.&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where we can type commands at the '''&amp;gt;&amp;gt;&amp;gt;''' prompt.&lt;br /&gt;
&lt;br /&gt;
=Python as a Calculator=&lt;br /&gt;
&lt;br /&gt;
To get started, let's just try a few commands out.  If you type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print &amp;quot;Hello!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hello!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you try:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print 5 + 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far so simple!  Here is a copy of a session containing a few more commands where we've set the values of some variables and also defined and run our own function: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; five = 5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; neuf = 9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print five + neuf&lt;br /&gt;
14&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def say_hello():&lt;br /&gt;
...     print &amp;quot;Hello, world!&amp;quot;&lt;br /&gt;
... # hit return here &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; say_hello()&lt;br /&gt;
Hello, world!&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can exit an interactive session at any time by typing '''Ctrl-D'''.&lt;br /&gt;
&lt;br /&gt;
=Getting Help=&lt;br /&gt;
&lt;br /&gt;
One of the good things about Python is that it has lots of useful online documentation.  ([[A_Good_Read|There are good books on the language too]].)  For example, take a look at: http://docs.python.org/.  You can also type '''help()''' at the interpreter prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; help()&lt;br /&gt;
&lt;br /&gt;
Welcome to Python 2.7!  This is the online help utility.&lt;br /&gt;
&lt;br /&gt;
If this is your first time using Python, you should definitely check out&lt;br /&gt;
the tutorial on the Internet at http://docs.python.org/tutorial/.&lt;br /&gt;
&lt;br /&gt;
Enter the name of any module, keyword, or topic to get help on writing&lt;br /&gt;
Python programs and using Python modules.  To quit this help utility and&lt;br /&gt;
return to the interpreter, just type &amp;quot;quit&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; keywords&lt;br /&gt;
&lt;br /&gt;
Here is a list of the Python keywords.  Enter any keyword to get more help.&lt;br /&gt;
&lt;br /&gt;
and                 elif                if                  print&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; if&lt;br /&gt;
The ``if`` statement&lt;br /&gt;
********************&lt;br /&gt;
&lt;br /&gt;
The ``if`` statement is used for conditional execution:&lt;br /&gt;
&lt;br /&gt;
   if_stmt ::= &amp;quot;if&amp;quot; expression &amp;quot;:&amp;quot; suite&lt;br /&gt;
               ( &amp;quot;elif&amp;quot; expression &amp;quot;:&amp;quot; suite )*&lt;br /&gt;
               [&amp;quot;else&amp;quot; &amp;quot;:&amp;quot; suite]&lt;br /&gt;
&lt;br /&gt;
It selects exactly one of the suites by evaluating the expressions one&lt;br /&gt;
by one until one is found to be true...&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; quit&lt;br /&gt;
&lt;br /&gt;
You are now leaving help and returning to the Python interpreter.&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Making a Script=&lt;br /&gt;
&lt;br /&gt;
An interactive session can be fun and useful for trying things out.  However--to save our fingers--we will typically want to execute a series of commands as a script, created using your favourite text editor.  Here are the contents of an example script, '''myscript.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;Hello, from a python script!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Ensure that your script is executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod u+x myscript.py&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and now you can run it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[ggdagw@bigblue4 ~]$ ./myscript.py &lt;br /&gt;
Hello, from a python script!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Python and Whitespace=&lt;br /&gt;
&lt;br /&gt;
Love it or hate it, Python incorporates whitespace in its syntax: the indentation of a line determines which block it belongs to.  (It's either that or demarcate blocks with some other syntax, such as the curly braces used in C.  Pick your poison.)  Spacing is therefore key in creating a valid Python script.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
    print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will work, but:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
 print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
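will not, because the line following '''else:''' is not indented (the single-space indentation under the '''if''' is legal, although inconsistent indentation is best avoided):&lt;br /&gt;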
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  File &amp;quot;./myscript.py&amp;quot;, line 7&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
        ^&lt;br /&gt;
IndentationError: expected an indented block&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is therefore a great advantage, when writing a Python script, to use a text editor which has a dedicated Python mode--such as '''emacs'''--and will actively help you to keep your spacing correct.  See http://wiki.python.org/moin/PythonEditors for an extensive list.&lt;br /&gt;
&lt;br /&gt;
=Some Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
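* Calculate the volume of a sphere (Hint: V = 4/3*pi*r^3.  Note that in Python 2, 4/3 is integer division and gives 1, so write 4.0/3.  Powers are written r**3, since ^ is bitwise XOR in Python, and pi is available via '''from math import pi''')&lt;br /&gt;
* Concatenate two strings&lt;br /&gt;
* Write a recursive function to compute Fibonacci numbers (Hint: F(n) = F(n-1) + F(n-2), F(0) = 0 and F(1) = 1)&lt;br /&gt;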
&lt;br /&gt;
=Nuts and Bolts=&lt;br /&gt;
&lt;br /&gt;
==Types==&lt;br /&gt;
&lt;br /&gt;
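Python has intrinsic types including integers, floats, booleans and complex numbers.  It is '''dynamically''' typed, meaning that types belong to values rather than to variable names, so you don't need a block of variable declarations at the top of your script.  It is, however, '''strongly''' (that is, '''not weakly''') typed: Python will not silently convert between incompatible types.  For example:&lt;br /&gt;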
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex = 2 + 0.5j&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex&lt;br /&gt;
(2+0.5j)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.real&lt;br /&gt;
2.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.imag&lt;br /&gt;
0.5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name = 'fred'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; lucky = 7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name + lucky&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: cannot concatenate 'str' and 'int' objects&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
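&lt;br /&gt;
To combine a string and a number you must convert one of them explicitly.  The short sketch below also shows that a name can be re-bound to a value of a different type, which is what dynamic typing allows.  It uses '''print(...)''' with a single argument, which works in both Python 2 and Python 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
name = 'fred'&lt;br /&gt;
lucky = 7&lt;br /&gt;
&lt;br /&gt;
# Strong typing: convert explicitly instead of relying on implicit coercion&lt;br /&gt;
label = name + str(lucky)&lt;br /&gt;
print(label)                  # fred7&lt;br /&gt;
&lt;br /&gt;
# Dynamic typing: the type belongs to the value, not to the name&lt;br /&gt;
print(type(lucky).__name__)   # int&lt;br /&gt;
lucky = 7.5                   # re-binding the same name to a float is fine&lt;br /&gt;
print(type(lucky).__name__)   # float&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;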
&lt;br /&gt;
==Strings==&lt;br /&gt;
&lt;br /&gt;
The eagle-eyed will have spotted in a previous example that we could ask for the length of a character string--straight off the bat.  No need to write a counting routine ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print len(message)&lt;br /&gt;
11&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a character string.  In our case:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print message[:5]&lt;br /&gt;
happy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since a string is an '''object''' (in the object oriented programming sense of the word, but more of that another time...) we can call a number of methods that operate on a string.  A selection includes:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.find(sub) || Returns the index of the first occurrence of the substring '''sub''', or -1 if it is not found&lt;br /&gt;
|-&lt;br /&gt;
|| s.islower() || Checks whether all cased characters are lowercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.upper() || Returns a copy of '''s''' converted to uppercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.strip() || Returns a copy of '''s''' with leading and trailing whitespace removed&lt;br /&gt;
|-&lt;br /&gt;
|| s.replace(old,new) || Returns a copy of '''s''' with every occurrence of '''old''' replaced by '''new'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.split([sep]) || Splits '''s''' using the (optional) delimiter '''sep''', or whitespace if '''sep''' is omitted, and returns a list&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
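&lt;br /&gt;
The string methods above can be tried out on a small example string, chosen here for illustration.  Because strings are immutable, each method returns a new value and leaves '''s''' itself unchanged:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
s = '  Happy days!  '&lt;br /&gt;
&lt;br /&gt;
print(s.strip())                          # Happy days!&lt;br /&gt;
print(s.strip().upper())                  # HAPPY DAYS!&lt;br /&gt;
print(s.find('days'))                     # 8: the index where 'days' starts&lt;br /&gt;
print(s.find('nights'))                   # -1: substring not found&lt;br /&gt;
print(s.strip().islower())                # False, because of the capital H&lt;br /&gt;
print(s.replace('Happy', 'Sad').strip())  # Sad days!&lt;br /&gt;
print('bread,milk,tea'.split(','))        # ['bread', 'milk', 'tea']&lt;br /&gt;
print(s == '  Happy days!  ')             # True: s itself is unchanged&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;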
&lt;br /&gt;
==Lists and Tuples==&lt;br /&gt;
&lt;br /&gt;
An example of a list is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping = ['bread', 'marmalade', 'milk', 'tea']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we can inquire about the length of that using the same function as before:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
len(shopping)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a list, as we did with a string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and even reset a portion of the list that way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2] = ['bagels', 'jam']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since a list is also an object, we have more handy methods, including:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.append(x) || Appends a new element '''x''' to the end of '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.count(x) || Returns the number of occurrences of '''x''' in '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.reverse() || Reverses the items of '''s''' in place&lt;br /&gt;
|-&lt;br /&gt;
|| s.sort() || Sorts the items of '''s''' in place.  Optional keyword arguments include '''key''' (a function returning the value to sort on) and '''reverse=True''' (for descending order)&lt;br /&gt;
|}&lt;br /&gt;
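&lt;br /&gt;
Here is a short sketch of these list methods in action on a shopping list like the one above.  Note that '''reverse()''' and '''sort()''' modify the list in place and return '''None''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping = ['bread', 'marmalade', 'milk', 'tea']&lt;br /&gt;
&lt;br /&gt;
shopping.append('milk')                 # add to the end of the list&lt;br /&gt;
print(shopping.count('milk'))           # 2&lt;br /&gt;
&lt;br /&gt;
shopping.reverse()                      # in place: the last item is now first&lt;br /&gt;
print(shopping[0])                      # milk&lt;br /&gt;
&lt;br /&gt;
shopping.sort()                         # alphabetical order, in place&lt;br /&gt;
print(shopping)                         # ['bread', 'marmalade', 'milk', 'milk', 'tea']&lt;br /&gt;
&lt;br /&gt;
shopping.sort(key=len, reverse=True)    # longest item first&lt;br /&gt;
print(shopping[0])                      # marmalade&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;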
&lt;br /&gt;
Tuples are very similar to lists and support many of the same operations (indexing, slicing, concatenation etc.) but differ in that they are '''not mutable''' after creation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple = ('fred', 'ginger', 7, 2.5)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist = ['fred', 'ginger', 7, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist[2] = 8&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mylist&lt;br /&gt;
['fred', 'ginger', 8, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mytuple[2]    &lt;br /&gt;
7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple[2] = 8&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: 'tuple' object does not support item assignment&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
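A '''list comprehension''' builds a new list from an existing one in a single expression, optionally filtering items with an '''if''' clause.  Here we double every number smaller than 20:&lt;br /&gt;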
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; numbers = [12, 3, 90, 40, 52, 11, 10]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled = [number * 2 for number in numbers if number &amp;lt; 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled&lt;br /&gt;
[24, 6, 22, 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
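&lt;br /&gt;
A list comprehension is shorthand for an explicit loop.  This sketch builds the same list both ways so that you can compare them:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
numbers = [12, 3, 90, 40, 52, 11, 10]&lt;br /&gt;
&lt;br /&gt;
# The list comprehension ...&lt;br /&gt;
small_numbers_doubled = [number * 2 for number in numbers if number &amp;lt; 20]&lt;br /&gt;
&lt;br /&gt;
# ... does the same job as this loop&lt;br /&gt;
result = []&lt;br /&gt;
for number in numbers:&lt;br /&gt;
    if number &amp;lt; 20:&lt;br /&gt;
        result.append(number * 2)&lt;br /&gt;
&lt;br /&gt;
print(result == small_numbers_doubled)   # True&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;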
&lt;br /&gt;
==Dictionaries==&lt;br /&gt;
&lt;br /&gt;
A dictionary is an associative array or hash table, containing '''key-value''' pairs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
mydict = {'thomas':'blue', 'james':'red', 'henry':'green'}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mydict['james']&lt;br /&gt;
red&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can write much more user-friendly and intuitive code using dictionaries, rather than arbitrary indexes into a list.&lt;br /&gt;
&lt;br /&gt;
Some example dictionary methods are:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| m.keys() || Returns a list of the keys in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m.items() || Returns a list of the (key,value) pairs in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m[k] = x || Sets m[k] to x&lt;br /&gt;
|-&lt;br /&gt;
|| m.update(b) || Adds the key-value pairs from dictionary '''b''' to '''m''', overwriting any existing keys&lt;br /&gt;
|}&lt;br /&gt;
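&lt;br /&gt;
This sketch shows a few dictionary operations in action on the engines dictionary from above.  It also uses two further handy tools: '''get()''', which returns a default instead of raising a '''KeyError''' when a key is missing, and the '''in''' test:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
mydict = {'thomas': 'blue', 'james': 'red', 'henry': 'green'}&lt;br /&gt;
&lt;br /&gt;
mydict['gordon'] = 'blue'                              # add a new key-value pair&lt;br /&gt;
mydict.update({'james': 'crimson', 'percy': 'green'})  # 'james' is overwritten&lt;br /&gt;
&lt;br /&gt;
print('percy' in mydict)                               # True: 'in' tests the keys&lt;br /&gt;
print(mydict.get('edward', 'unknown'))                 # unknown&lt;br /&gt;
&lt;br /&gt;
# Python 2 dictionaries have no guaranteed order, so sort the pairs for a stable listing&lt;br /&gt;
for name, colour in sorted(mydict.items()):&lt;br /&gt;
    print(name + ' is ' + colour)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;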
&lt;br /&gt;
==Control Structures==&lt;br /&gt;
&lt;br /&gt;
Of course, we'll need conditionals and loops etc. to go beyond the simplest of scripts.  Here is an '''if-then-else''', python style:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
if sky == 'blue':&lt;br /&gt;
    birds_sing()&lt;br /&gt;
elif sky == 'black':&lt;br /&gt;
    birds_sleep()&lt;br /&gt;
else:&lt;br /&gt;
    pass #do nothing&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and a classic '''for loop''' (note that '''range(1,10)''' stops just before 10):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
for ii in range(1,10):&lt;br /&gt;
    print ii&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1&lt;br /&gt;
...&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We'll also see a '''while loop''' shoehorned into the next example.&lt;br /&gt;
&lt;br /&gt;
For our control statements, we can use comparison operators such as '''==''', '''!=''', '''&amp;gt;''', '''&amp;lt;''', '''&amp;lt;=''' and '''&amp;gt;=''', and logical operators such as '''and''', '''or''' and '''not'''.&lt;br /&gt;
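&lt;br /&gt;
As a small illustration (the variable names are made up), comparisons produce '''True''' or '''False''', and the logical operators combine them.  Python also lets you chain comparisons:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
temperature = 18&lt;br /&gt;
raining = False&lt;br /&gt;
&lt;br /&gt;
# Comparisons give booleans; 'and', 'or' and 'not' combine them&lt;br /&gt;
go_outside = temperature &amp;gt;= 15 and not raining&lt;br /&gt;
print(go_outside)                      # True&lt;br /&gt;
&lt;br /&gt;
# A chained comparison, equivalent to: 10 &amp;lt;= temperature and temperature &amp;lt; 20&lt;br /&gt;
print(10 &amp;lt;= temperature &amp;lt; 20)          # True&lt;br /&gt;
print(temperature != 18 or raining)    # False&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;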
&lt;br /&gt;
==File Input and Output==&lt;br /&gt;
&lt;br /&gt;
Here's some code for printing the contents of a text file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;r&amp;quot;)&lt;br /&gt;
line = fp.readline()&lt;br /&gt;
while line:&lt;br /&gt;
    line = line.strip()&lt;br /&gt;
    print line&lt;br /&gt;
    line = fp.readline()&lt;br /&gt;
fp.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We could open a file for writing with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;w&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp.write(...)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
to write to that file.&lt;br /&gt;
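&lt;br /&gt;
Putting these together, the sketch below writes a few lines to '''foo.txt''' and reads them back.  It uses the '''with''' statement, which closes the file automatically (even if an error occurs), so no explicit '''fp.close()''' is needed.  It also iterates over the file directly rather than calling '''readline()''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
lines = ['first line', 'second line', 'third line']&lt;br /&gt;
&lt;br /&gt;
# Open for writing ('w' overwrites any existing file)&lt;br /&gt;
with open('foo.txt', 'w') as fp:&lt;br /&gt;
    for text in lines:&lt;br /&gt;
        fp.write(text + '\n')       # write() does not add a newline for you&lt;br /&gt;
&lt;br /&gt;
# Open for reading; iterating over a file yields one line at a time&lt;br /&gt;
read_back = []&lt;br /&gt;
with open('foo.txt', 'r') as fp:&lt;br /&gt;
    for line in fp:&lt;br /&gt;
        read_back.append(line.strip())&lt;br /&gt;
&lt;br /&gt;
print(read_back == lines)           # True&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;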
&lt;br /&gt;
==Command Line Parsing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # We can test on the length of argv&lt;br /&gt;
    if len(sys.argv) &amp;lt; 2:&lt;br /&gt;
        print &amp;quot;usage: to use this script...&amp;quot;&lt;br /&gt;
    else:&lt;br /&gt;
        for ii, arg in enumerate(sys.argv):&lt;br /&gt;
            # (typically) argv[0] is bound to the script name&lt;br /&gt;
            print &amp;quot;arg&amp;quot;, ii, &amp;quot;is:&amp;quot;, arg&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py&lt;br /&gt;
usage: to use this script...&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py fred ginger&lt;br /&gt;
arg 0 is: ./cmdline.py&lt;br /&gt;
arg 1 is: fred&lt;br /&gt;
arg 2 is: ginger&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
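&lt;br /&gt;
For anything beyond a couple of positional arguments, the standard-library '''argparse''' module (Python 2.7 and later) can do the parsing for you, and it generates '''--help''' and usage messages automatically.  The following is a minimal sketch, and the '''names''' and '''--shout''' arguments are purely illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import argparse&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
def parse(argv):&lt;br /&gt;
    # Describe the expected arguments; argparse builds --help and usage messages from this&lt;br /&gt;
    parser = argparse.ArgumentParser(description='Print a greeting for each name given.')&lt;br /&gt;
    parser.add_argument('names', nargs='*', help='zero or more names')&lt;br /&gt;
    parser.add_argument('--shout', action='store_true', help='print in upper case')&lt;br /&gt;
    return parser.parse_args(argv)&lt;br /&gt;
&lt;br /&gt;
def main(argv):&lt;br /&gt;
    args = parse(argv)&lt;br /&gt;
    for name in args.names:&lt;br /&gt;
        greeting = 'hello ' + name&lt;br /&gt;
        print(greeting.upper() if args.shout else greeting)&lt;br /&gt;
&lt;br /&gt;
if __name__ == '__main__':&lt;br /&gt;
    main(sys.argv[1:])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If this were saved as, say, '''greet.py''' (a hypothetical name), running '''./greet.py fred ginger --shout''' would print HELLO FRED and HELLO GINGER on separate lines.&lt;br /&gt;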
&lt;br /&gt;
=Object Oriented Programming in Python=&lt;br /&gt;
&lt;br /&gt;
Here is an example of using a class in python:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
class Radio:&lt;br /&gt;
    &amp;quot;A simple radio&amp;quot;&lt;br /&gt;
    def __init__(self,freq=0.0,name=&amp;quot;&amp;quot;):&lt;br /&gt;
        &amp;quot;Constructor method&amp;quot;&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
        self.name=name&lt;br /&gt;
    def tune(self,freq):&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
    def tuned_to(self):&lt;br /&gt;
        print self.name, &amp;quot;tuned to:&amp;quot;, self.__frequency&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # declare two radio instances&lt;br /&gt;
    car = Radio(name=&amp;quot;car&amp;quot;)&lt;br /&gt;
    kitchen = Radio(91.5,&amp;quot;kitchen&amp;quot;)&lt;br /&gt;
    # call some methods&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    kitchen.tuned_to()&lt;br /&gt;
    car.tune(89.3)&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    # Docstring: the string literal at the top of the class is stored in __doc__&lt;br /&gt;
    print car.__doc__&lt;br /&gt;
    # NB members are not private by default:&lt;br /&gt;
    print car.name&lt;br /&gt;
    # BUT leading double underscores trigger name mangling (the member is&lt;br /&gt;
    # stored as _Radio__frequency), so this lookup fails:&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running the script gives us:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
car tuned to: 0.0&lt;br /&gt;
kitchen tuned to: 91.5&lt;br /&gt;
car tuned to: 89.3&lt;br /&gt;
A simple radio&lt;br /&gt;
car&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;./foo.py&amp;quot;, line 27, in &amp;lt;module&amp;gt;&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
AttributeError: Radio instance has no attribute '__frequency'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
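&lt;br /&gt;
Name mangling only renames the attribute; it does not make it truly private.  This cut-down, hypothetical version of the '''Radio''' class shows where the value actually lives.  It is written as a new-style class, '''class Radio(object)''', which behaves the same way in Python 2 and 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
class Radio(object):&lt;br /&gt;
    'A cut-down radio'&lt;br /&gt;
    def __init__(self, freq=0.0, name=''):&lt;br /&gt;
        self.__frequency = freq    # actually stored as self._Radio__frequency&lt;br /&gt;
        self.name = name&lt;br /&gt;
&lt;br /&gt;
    def tune(self, freq):&lt;br /&gt;
        self.__frequency = freq&lt;br /&gt;
&lt;br /&gt;
car = Radio(name='car')&lt;br /&gt;
car.tune(89.3)&lt;br /&gt;
&lt;br /&gt;
print(sorted(vars(car)))        # ['_Radio__frequency', 'name']&lt;br /&gt;
print(car._Radio__frequency)    # 89.3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;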
&lt;br /&gt;
=Using Packages=&lt;br /&gt;
&lt;br /&gt;
Python packages are great because they provide us with a whole lot of extra functionality--above and beyond the core language--that we didn't have to write and debug ourselves.&lt;br /&gt;
&lt;br /&gt;
Let's walk through a simple example using a package.  At an interactive prompt type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from random import randint&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will give us access to the '''randint(x,y)''' function, which returns a randomly chosen integer from the given range [x,y]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
4&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
1&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OK, so far so good.  One thing to note is that the above '''import''' statement has drawn the name ''randint'' into our current '''namespace'''.  What if we had already defined a function named ''randint''?  That could cause problems.  To protect ourselves from this kind of problem, Python offers several variants of the import statement.&lt;br /&gt;
&lt;br /&gt;
With a plain '''import''' statement, the package's functions are placed in a namespace with the same name as the package, so in order to call them we have to prefix them with that namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random.randint(0,10)&lt;br /&gt;
6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Should we desire, we can apply a little more control and specify the namespace for the import ourselves: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random as rnd&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; rnd.randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another--more 'devil-may-care'--approach is to do away with the separate namespace and pull everything from a given package into the current namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import *&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random()&lt;br /&gt;
0.3172268098313996&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(The '''random()''' function returns a randomly selected floating point number in the range [0, 1)--that is, between 0 and 1, including 0.0 but always smaller than 1.0.)&lt;br /&gt;
&lt;br /&gt;
==A Namespace Collision==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def randint():&lt;br /&gt;
...     print &amp;quot;dummy function&amp;quot;&lt;br /&gt;
... &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
dummy function&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import randint&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: randint() takes exactly 3 arguments (1 given)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
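&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''from random import randint''' statement has silently replaced our own ''randint()'' with the one from the '''random''' package, which expects two arguments.  (The error message also counts the hidden '''self''' argument, hence '3'.)  Using '''import random''' or '''import random as rnd''' instead would have kept the two functions apart.&lt;br /&gt;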
&lt;br /&gt;
=Python for Shell Scripting=&lt;br /&gt;
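&lt;br /&gt;
The '''subprocess''' package lets a Python script run other programs, much as a shell script would.  For example, to list the current directory in long format:&lt;br /&gt;
&lt;br /&gt;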
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from subprocess import call&lt;br /&gt;
call([&amp;quot;ls&amp;quot;, &amp;quot;-l&amp;quot;])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
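&lt;br /&gt;
'''call()''' returns the command's exit status, and '''check_output()''' (Python 2.7 and later) captures what the command prints.  The following is a minimal sketch.  To keep it portable, it runs the current Python interpreter ('''sys.executable''') rather than a shell command such as '''ls''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import subprocess&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
# check_output() runs a command and returns what it printed (as bytes in Python 3)&lt;br /&gt;
output = subprocess.check_output([sys.executable, '-c', 'print(6 * 7)'])&lt;br /&gt;
print(output.decode().strip())     # 42&lt;br /&gt;
&lt;br /&gt;
# call() just runs the command and returns its exit status; 0 conventionally means success&lt;br /&gt;
status = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])&lt;br /&gt;
print(status)                      # 3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;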
&lt;br /&gt;
=Python as a Glue Language=&lt;br /&gt;
&lt;br /&gt;
* Calling R from python is possible using: http://rpy.sourceforge.net/index.html.&lt;br /&gt;
* Calling Matlab from python: http://mlabwrap.sourceforge.net.&lt;br /&gt;
* With SWIG you can make many bindings, including Python to C and C++: http://www.swig.org/.&lt;br /&gt;
* Or if Fortran is more your cup-of-tea, you can use f2py: http://cens.ioc.ee/projects/f2py2e/.&lt;br /&gt;
* There are many more examples.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Databases=&lt;br /&gt;
&lt;br /&gt;
==Simple Databases==&lt;br /&gt;
&lt;br /&gt;
Python's standard library provides access to some simple databases.  In Python 2, the '''bsddb''' module allows you to access the popular '''Berkeley DB''' database from your Python code (note that '''bsddb''' was deprecated in Python 2.6 and is not available in Python 3).&lt;br /&gt;
&lt;br /&gt;
The interface to the database provided by this module is very similar to the way in which we access a dictionary.  First, let's populate a database:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import bsddb&lt;br /&gt;
d = bsddb.btopen('engines.db')&lt;br /&gt;
d['thomas'] = 'blue'&lt;br /&gt;
d['james'] = 'red'&lt;br /&gt;
d['henry'] = 'green'&lt;br /&gt;
d.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now let's open the database again and query its contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d = bsddb.btopen('engines.db')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['henry', 'james', 'thomas']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.first()&lt;br /&gt;
('henry', 'green')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.last()&lt;br /&gt;
('thomas', 'blue')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour = d['james']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour&lt;br /&gt;
'red'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; del d['henry']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['james', 'thomas']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
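&lt;br /&gt;
Since '''bsddb''' is not available in Python 3, the standard-library '''shelve''' module (present in both Python 2 and Python 3) is one dictionary-like alternative.  Here is a minimal sketch of the same example.  The file name '''engines_shelf''' is illustrative, and the exact file(s) created on disk depend on your platform:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import shelve&lt;br /&gt;
&lt;br /&gt;
# flag='n' always creates a new, empty database&lt;br /&gt;
db = shelve.open('engines_shelf', flag='n')&lt;br /&gt;
db['thomas'] = 'blue'&lt;br /&gt;
db['james'] = 'red'&lt;br /&gt;
db['henry'] = 'green'&lt;br /&gt;
db.close()&lt;br /&gt;
&lt;br /&gt;
# Re-open the database later and use it much like a dictionary&lt;br /&gt;
db = shelve.open('engines_shelf')&lt;br /&gt;
colour = db['james']&lt;br /&gt;
del db['henry']&lt;br /&gt;
remaining = sorted(db.keys())&lt;br /&gt;
db.close()&lt;br /&gt;
&lt;br /&gt;
print(colour)       # red&lt;br /&gt;
print(remaining)    # ['james', 'thomas']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Unlike '''bsddb''', whose keys and values must both be strings, a shelf can store any picklable Python object as a value (the keys must still be strings).&lt;br /&gt;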
&lt;br /&gt;
==Relational Databases==&lt;br /&gt;
&lt;br /&gt;
Relational databases give us more oomph.  '''SQLite''' is a useful one to start with: it is lightweight, requiring almost no setup or management, yet it still understands queries written in SQL.  This makes it well suited to simple examples of SQL database access from Python, and a stepping stone toward more powerful database packages.&lt;br /&gt;
&lt;br /&gt;
Here is a script which creates a table called '''planets''' in the file '''pytest.db''' and populates it with details of the planets in our solar system (diameter, mass and orbital period, each relative to Earth's).  Run it only once: a second run will fail, because the table will already exist.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use ':memory:' to keep the database in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
 &lt;br /&gt;
# create a table&lt;br /&gt;
cursor.execute(&amp;quot;&amp;quot;&amp;quot;CREATE TABLE planets&lt;br /&gt;
                  (Id INT, Name TEXT, Diameter REAL, &lt;br /&gt;
                   Mass REAL, Orbital_Period REAL)&amp;quot;&amp;quot;&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# insert a single record&lt;br /&gt;
cursor.execute(&amp;quot;INSERT INTO planets VALUES(1,'Mercury',0.382,0.06,0.24)&amp;quot;)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
 &lt;br /&gt;
# insert multiple records&lt;br /&gt;
other_planets = [(2,'Venus',0.949,0.82,0.72),&lt;br /&gt;
                 (3,'Earth',1.0,1.0,1.0),&lt;br /&gt;
                 (4,'Mars',0.532,0.11,1.52),&lt;br /&gt;
                 (5,'Jupiter',11.209,317.8,5.20),&lt;br /&gt;
                 (6,'Saturn',9.449,95.2,9.54),&lt;br /&gt;
                 (7,'Uranus',4.007,14.6,19.22),&lt;br /&gt;
                 (8,'Neptune',3.883,17.2,30.06),&lt;br /&gt;
                 (9,'Pluto',0.18,0.002,248.09)]&lt;br /&gt;
cursor.executemany(&amp;quot;INSERT INTO planets VALUES (?,?,?,?,?)&amp;quot;, other_planets)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
&lt;br /&gt;
# delete a record&lt;br /&gt;
sql = &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
DELETE FROM planets&lt;br /&gt;
WHERE Name = 'Pluto'&lt;br /&gt;
&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
cursor.execute(sql)  # poor old pluto! &lt;br /&gt;
conn.commit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And here is a short example script showing a couple of ways to interrogate the database: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # the file created by the previous script&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the records in the table, ordered by Name:\n&amp;quot;&lt;br /&gt;
for row in cursor.execute(&amp;quot;SELECT rowid, * FROM planets ORDER BY Name&amp;quot;):&lt;br /&gt;
    print row&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;\n&amp;quot;&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the planets with a mass greater than or equal to that of Earth:\n&amp;quot;&lt;br /&gt;
sql = &amp;quot;SELECT * FROM planets WHERE Mass&amp;gt;=?&amp;quot;&lt;br /&gt;
cursor.execute(sql, (1.0,))  # the number 1.0 is substituted for the ? placeholder&lt;br /&gt;
for row in cursor.fetchall():  # or use fetchone()&lt;br /&gt;
    print row&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running this script gives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
All the records in the table, ordered by Name:&lt;br /&gt;
&lt;br /&gt;
(3, 3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, 5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(4, 4, u'Mars', 0.53200000000000003, 0.11, 1.52)&lt;br /&gt;
(1, 1, u'Mercury', 0.38200000000000001, 0.059999999999999998, 0.23999999999999999)&lt;br /&gt;
(8, 8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
(6, 6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, 7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(2, 2, u'Venus', 0.94899999999999995, 0.81999999999999995, 0.71999999999999997)&lt;br /&gt;
&lt;br /&gt;
All the planets with a mass greater than or equal to that of Earth:&lt;br /&gt;
&lt;br /&gt;
(3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
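&lt;br /&gt;
Because ':memory:' keeps the whole database in RAM, it is handy for quick experiments.  Here is a self-contained sketch along those lines, using a cut-down '''planets''' table with three rows chosen for illustration.  It also passes the query value through a '''?''' placeholder, so that sqlite3 handles the quoting rather than you pasting values into the SQL string yourself (which also protects against SQL injection):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import sqlite3&lt;br /&gt;
&lt;br /&gt;
# ':memory:' creates a temporary database in RAM; nothing is written to disk&lt;br /&gt;
conn = sqlite3.connect(':memory:')&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
cursor.execute('CREATE TABLE planets (Id INT, Name TEXT, Mass REAL)')&lt;br /&gt;
cursor.executemany('INSERT INTO planets VALUES (?,?,?)',&lt;br /&gt;
                   [(1, 'Mercury', 0.06), (3, 'Earth', 1.0), (5, 'Jupiter', 317.8)])&lt;br /&gt;
conn.commit()&lt;br /&gt;
&lt;br /&gt;
# The tuple (1.0,) supplies the value for the single ? placeholder&lt;br /&gt;
cursor.execute('SELECT Name FROM planets WHERE Mass &amp;gt;= ? ORDER BY Mass', (1.0,))&lt;br /&gt;
heavy = [row[0] for row in cursor.fetchall()]&lt;br /&gt;
conn.close()&lt;br /&gt;
print(heavy)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;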
&lt;br /&gt;
For more information on using SQLite with Python, see, e.g.:&lt;br /&gt;
* http://zetcode.com/db/sqlitepythontutorial/&lt;br /&gt;
* http://www.blog.pythonlibrary.org/2012/07/18/python-a-simple-step-by-step-sqlite-tutorial/&lt;br /&gt;
&lt;br /&gt;
You can also connect to a MySQL database from python using, e.g. the [http://mysql-python.sourceforge.net/ python-mysqldb] package.  A snippet of python code for connecting to a database is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
import MySQLdb&lt;br /&gt;
&lt;br /&gt;
conn = MySQLdb.connect(host=&amp;quot;localhost&amp;quot;,  # your host, usually localhost&lt;br /&gt;
                       user=&amp;quot;gethin&amp;quot;,     # your username&lt;br /&gt;
                       passwd=&amp;quot;changeme&amp;quot;, # your password&lt;br /&gt;
                       db=&amp;quot;menagerie&amp;quot;)    # name of the database&lt;br /&gt;
&lt;br /&gt;
# Create a cursor object, as before with SQLite&lt;br /&gt;
cur = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
# and then you can submit your SQL command and fetch the results:&lt;br /&gt;
cur.execute(&amp;quot;SELECT * FROM YOUR_TABLE_NAME&amp;quot;)&lt;br /&gt;
for row in cur.fetchall():&lt;br /&gt;
    print row&lt;br /&gt;
conn.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Numpy=&lt;br /&gt;
&lt;br /&gt;
OK, let's move on to Python's numerical processing capabilities.  We will start by looking at the '''numpy''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from numpy import *&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now that we have access to the functions from '''numpy''', let's create an array.  '''Note that a numpy array is a different type of object from a built-in Python list'''.  A simple way to create one is with the '''array''' function, passing it a (nested) list of values.  For example, we might enter:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = array([[1.0,0.0,0.0],[0.0,1.0,0.0],[0.0,0.0,1.0]])&lt;br /&gt;
b = array([[1,2,3],[4,5,6],[7,8,9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 1.,  0.,  0.],&lt;br /&gt;
       [ 0.,  1.,  0.],&lt;br /&gt;
       [ 0.,  0.,  1.]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b        &lt;br /&gt;
array([[1, 2, 3],&lt;br /&gt;
       [4, 5, 6],&lt;br /&gt;
       [7, 8, 9]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; transpose(b)&lt;br /&gt;
array([[1, 4, 7],&lt;br /&gt;
       [2, 5, 8],&lt;br /&gt;
       [3, 6, 9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Given an array, we may inquire about its shape:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
print a.shape&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we are told that it is a 2-dimensional array (i.e. an array of rank 2) and that the length of both dimensions is 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
(3, 3)&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
We can also apply operators to array objects.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a = a * 9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 9.,  0.,  0.],&lt;br /&gt;
       [ 0.,  9.,  0.],&lt;br /&gt;
       [ 0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note, however, that most operations on numpy arrays are done element-wise''', which '''may differ from the linear algebra operation you were expecting.'''  For true linear algebra operations, such as matrix multiplication, use dedicated functions like numpy's '''dot''', or the routines in the '''scipy''' package.&lt;br /&gt;
&lt;br /&gt;
Should we so desire, we could re-shape the array.  One way to do this is to set its shape attribute directly:&lt;br /&gt;
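&lt;br /&gt;
To see the difference, here is a small sketch contrasting element-wise multiplication with a matrix product computed by numpy's '''dot''' function.  The arrays are arbitrary, and new names are used so that '''a''' and '''b''' above are left untouched:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
m1 = np.array([[1, 2], [3, 4]])&lt;br /&gt;
m2 = np.array([[5, 6], [7, 8]])&lt;br /&gt;
&lt;br /&gt;
# '*' multiplies matching elements: [[1*5, 2*6], [3*7, 4*8]]&lt;br /&gt;
elementwise = m1 * m2&lt;br /&gt;
# dot() performs matrix multiplication: rows of m1 times columns of m2&lt;br /&gt;
matrix_product = np.dot(m1, m2)&lt;br /&gt;
&lt;br /&gt;
print(elementwise)     # [[ 5 12] [21 32]]&lt;br /&gt;
print(matrix_product)  # [[19 22] [43 50]]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;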
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (1,9)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 9.,  0.,  0.,  0.,  9.,  0.,  0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
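&lt;br /&gt;
Setting '''shape''' changes the array in place.  If you would rather keep the original and obtain a re-shaped version, arrays also have a '''reshape''' method.  A minimal sketch (the name '''grid''' is just for illustration, so that '''a''' is left alone):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
grid = 9 * np.identity(3)   # a 3x3 array with 9s on the diagonal, like a above&lt;br /&gt;
row = grid.reshape(1, 9)    # a 1x9 version of the same elements&lt;br /&gt;
print(grid.shape)           # grid keeps its (3, 3) shape&lt;br /&gt;
print(row.shape)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Bear in mind that '''reshape''' usually returns a view of the same data rather than a copy, so changing an element of '''row''' also changes '''grid'''.&lt;br /&gt;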
&lt;br /&gt;
As with the list example, it can be useful to read or change the value of an element (or sub-array) individually.  Let's turn the array back to its rank-2 form and try it out:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (3,3)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1,1] = 777.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.    0.]&lt;br /&gt;
 [   0.    0.    9.]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1:,1:] = [[777.0, 777.0],[777.0, 777.0]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.  777.]&lt;br /&gt;
 [   0.  777.  777.]]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is all pretty handy so far, but specifying the value of each element explicitly could become a chore.  Happily, some helper functions exist to give you a head start with some building blocks.  For example, you can use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = ones((3,2))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = identity(2)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; big = resize(b, (6,6))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print big&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
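The use of '''resize''' in the last example illustrates a useful '''replicating feature''': when the new shape is larger than the original, the array is filled with repeated copies of the original's elements.&lt;br /&gt;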
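&lt;br /&gt;
Because those copies are laid down element by element, in row order, the result is not necessarily a tiled copy of the 2-D block.  A small sketch contrasting '''resize''' with numpy's '''tile''' function, which does repeat the whole block (the 2x2 array is an arbitrary choice):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
block = np.array([[1, 0], [0, 1]])  # same values as identity(2)&lt;br /&gt;
&lt;br /&gt;
# resize cycles through the elements 1, 0, 0, 1 to fill a 4x4 array&lt;br /&gt;
resized = np.resize(block, (4, 4))&lt;br /&gt;
# tile repeats the whole 2x2 block, twice down and twice across&lt;br /&gt;
tiled = np.tile(block, (2, 2))&lt;br /&gt;
&lt;br /&gt;
print(resized)  # every row is [1 0 0 1]&lt;br /&gt;
print(tiled)    # the 2x2 block repeated in a 2x2 pattern&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;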
&lt;br /&gt;
A list of all the functions and operations contained within numpy is: http://scipy.org/Numpy_Example_List.&lt;br /&gt;
&lt;br /&gt;
=Pylab and Matplotlib=&lt;br /&gt;
&lt;br /&gt;
The above examples are quite natty, but we have deliberately kept the array sizes small so that we can print the element values easily.  In practice, you may find that your array sizes are much larger and printing the values to the screen is impractical.  Fear not!  Python has many packages which help you plot your data, so that you can explore it.&lt;br /&gt;
&lt;br /&gt;
Using the pylab plotting interface, we can plot a few curves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import pylab&lt;br /&gt;
from numpy import arange, pi, cos, sin&lt;br /&gt;
t = arange(0.0, 3.0, 0.01)&lt;br /&gt;
c = cos(2 * pi * t)&lt;br /&gt;
s = sin(2 * pi * t)&lt;br /&gt;
pylab.ylabel('some numbers')&lt;br /&gt;
pylab.xlabel('some more numbers')&lt;br /&gt;
pylab.plot(t, c, 'r', lw=2)&lt;br /&gt;
pylab.plot(t, s, 'b', lw=2)&lt;br /&gt;
pylab.plot(t, c-s, 'gs', lw=2)&lt;br /&gt;
pylab.ylim(-1.5, 1.5)&lt;br /&gt;
pylab.title('sin and cos functions')&lt;br /&gt;
pylab.savefig('curves', dpi=300)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The resulting image, '''curves.png''' ('''savefig''' adds the '''.png''' extension by default), looks like:&lt;br /&gt;
&lt;br /&gt;
[[Image:Curves.png|thumb|600px|none|Some nice curves]]&lt;br /&gt;
&lt;br /&gt;
You can view .png images from the Linux command line (including on BlueCrystal) using, e.g., ImageMagick's '''display''' command: '''display -resize 1000 curves.png''' &lt;br /&gt;
&lt;br /&gt;
We can also use Matplotlib directly for more control:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import matplotlib.pyplot as plt&lt;br /&gt;
from numpy import arange, meshgrid, sin, sqrt&lt;br /&gt;
x = arange(-5,10)&lt;br /&gt;
y = arange(-4,11)&lt;br /&gt;
X, Y = meshgrid(x,y)   # 2-D grids of the x and y coordinates&lt;br /&gt;
R = sqrt(X**2 + Y**2)  # distance from the origin at each grid point&lt;br /&gt;
Z = sin(R)/R           # the sinc function (0/0 gives nan at the origin)&lt;br /&gt;
plt.figure()&lt;br /&gt;
plt.contour(X,Y,Z)&lt;br /&gt;
plt.show()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and you should get a window similar to:&lt;br /&gt;
&lt;br /&gt;
[[Image:Sinc-matplotlib-contour.png|thumb|600px|none|A contour map of the sinc function]]&lt;br /&gt;
&lt;br /&gt;
Perhaps the best next step with matplotlib is to look at the gallery: http://matplotlib.org/gallery.html.&lt;br /&gt;
Click on a figure and you will get the code used to generate it: a really great resource!&lt;br /&gt;
&lt;br /&gt;
==Input and Output==&lt;br /&gt;
&lt;br /&gt;
The foregoing is all very interesting, but life would be rather dull if you had to re-enter all your data by hand whenever you set to work with Python and numpy.  Therefore we need a means to save data to a file and load it again.  Happily, we can do this rather easily using a couple of routines from the '''pylab''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import *&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from pylab import load&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from pylab import save&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; data = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; save('myfile.txt', data)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; read_data = load(&amp;quot;myfile.txt&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: in the above example, numpy's load() function is shadowed''' by pylab's, because both have been imported into the same namespace and the later import wins.  One way to protect yourself against this is to make use of '''namespaces''': change your import command to '''import pylab''' and then call '''pylab.load(...)'''.&lt;br /&gt;
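&lt;br /&gt;
Recent versions of matplotlib may not provide these text-file '''save''' and '''load''' routines in '''pylab''', so the example above may not work as shown.  Here is a sketch of the same round trip using numpy's '''savetxt''' and '''loadtxt''' functions, called through the '''np''' namespace so that nothing is shadowed (the temporary file location is an assumption for illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import os&lt;br /&gt;
import tempfile&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
# Hypothetical location for the text file&lt;br /&gt;
filename = os.path.join(tempfile.mkdtemp(), 'myfile.txt')&lt;br /&gt;
&lt;br /&gt;
data = np.zeros((3, 3))&lt;br /&gt;
data[1, 1] = 777.0&lt;br /&gt;
np.savetxt(filename, data)        # write the array as whitespace-delimited text&lt;br /&gt;
read_data = np.loadtxt(filename)  # read it back as an array of floats&lt;br /&gt;
print(read_data)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;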
&lt;br /&gt;
=Scipy=&lt;br /&gt;
&lt;br /&gt;
* http://www.scipy.org/&lt;br /&gt;
* Good examples at http://scipy-lectures.github.com/intro/scipy.html&lt;br /&gt;
* Many useful features, including:&lt;br /&gt;
** Integration &amp;amp; Differentiation&lt;br /&gt;
** Optimisation (curve fitting, etc)&lt;br /&gt;
** Fourier transforms&lt;br /&gt;
** Signal processing&lt;br /&gt;
** Statistical algorithms&lt;br /&gt;
** Much, much more...&lt;br /&gt;
* If you know Python, you can use SciPy&lt;br /&gt;
&lt;br /&gt;
==An example: Differentiation==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # derivative of x^2 at x=3&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from scipy import derivative&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2, 3)&lt;br /&gt;
6.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # also works with arrays&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import array&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_array = array([1,2,3])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2,my_array)&lt;br /&gt;
array([ 2., 4., 6.])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
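&lt;br /&gt;
By default, scipy's '''derivative''' estimates the slope using a central difference, (f(x + dx) - f(x - dx)) / (2*dx), with a step of dx = 1.0.  This happens to be exact for x**2, which is why the answers above come out exactly.  The function has been deprecated in recent SciPy releases, so here is a plain-Python sketch of the same formula (the name '''central_difference''' is our own, not part of SciPy):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
def central_difference(f, x0, dx=1.0):&lt;br /&gt;
    # Estimate the derivative of f at x0 from its values either side of x0&lt;br /&gt;
    return (f(x0 + dx) - f(x0 - dx)) / (2.0 * dx)&lt;br /&gt;
&lt;br /&gt;
print(central_difference(lambda x: x**2, 3))           # 6.0, matching derivative above&lt;br /&gt;
print(central_difference(lambda x: x**3, 2, dx=0.001))  # close to the exact value, 12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For functions other than quadratics, a smaller step usually gives a better estimate, up to the point where floating-point round-off starts to dominate.&lt;br /&gt;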
&lt;br /&gt;
Google for many more examples pertaining to your favourite numerical procedure!&lt;br /&gt;
&lt;br /&gt;
=A Repository of Packages You Could Use=&lt;br /&gt;
&lt;br /&gt;
Now, we've touched on a couple, but there are thousands of python packages available.  Before you start writing your own function for X, check that someone hasn't contributed code for that already at http://pypi.python.org/pypi.&lt;br /&gt;
&lt;br /&gt;
'''pip''', the Python package manager, looks in PyPI by default when installing a package.  You can use its '''--user''' option to install Python packages in your own user space.  See:&lt;br /&gt;
* https://pip.readthedocs.org/en/latest/&lt;br /&gt;
for more information on pip.&lt;br /&gt;
&lt;br /&gt;
=Writing Faster Python=&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB and R, one of the simplest ways in which you can write faster python code is to eliminate loops by vectorising your code.&lt;br /&gt;
&lt;br /&gt;
Consider the following two scripts.  First '''for-loop.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def filter(arr):&lt;br /&gt;
    for i, val in enumerate(arr):&lt;br /&gt;
        if val &amp;lt; 0.5:&lt;br /&gt;
            arr[i] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    filter(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and secondly, '''vectorised.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def filter(arr):&lt;br /&gt;
    arr[arr &amp;lt; 0.5] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    filter(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we now run these two scripts through the Linux command line '''time''' utility, we see that the vectorised code runs a lot faster than the for loop (roughly eight times faster in this run):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ time ./for-loop.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.963s&lt;br /&gt;
user	0m0.952s&lt;br /&gt;
sys	0m0.012s&lt;br /&gt;
gethin@gethin-desktop:~$ time ./vectorised.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.116s&lt;br /&gt;
user	0m0.096s&lt;br /&gt;
sys	0m0.020s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
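&lt;br /&gt;
Timing whole scripts like this also counts Python's start-up, the numpy import and the generation of the random numbers.  Here is a sketch that times just the two filtering approaches, using the standard-library '''timeit''' module (the function names, array size and number of repeats are arbitrary choices):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import timeit&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def filter_loop(arr):&lt;br /&gt;
    # Element-by-element Python loop&lt;br /&gt;
    for i, val in enumerate(arr):&lt;br /&gt;
        if val &amp;lt; 0.5:&lt;br /&gt;
            arr[i] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
def filter_vectorised(arr):&lt;br /&gt;
    # A boolean mask applied to the whole array at once&lt;br /&gt;
    arr[arr &amp;lt; 0.5] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
data = np.random.rand(100000)&lt;br /&gt;
&lt;br /&gt;
# Both versions should give the same answer&lt;br /&gt;
same = np.array_equal(filter_loop(data.copy()), filter_vectorised(data.copy()))&lt;br /&gt;
&lt;br /&gt;
# Run each function 5 times, each time on a fresh copy of the data&lt;br /&gt;
loop_time = timeit.timeit(lambda: filter_loop(data.copy()), number=5)&lt;br /&gt;
vec_time = timeit.timeit(lambda: filter_vectorised(data.copy()), number=5)&lt;br /&gt;
print('loop: %.3f s, vectorised: %.3f s' % (loop_time, vec_time))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;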
&lt;br /&gt;
For some more tips on writing faster python code, and examples of how to use one of the python profiler modules, take a look at:&lt;br /&gt;
* https://wiki.python.org/moin/PythonSpeed/PerformanceTips&lt;br /&gt;
* http://technicaldiscovery.blogspot.co.uk/2011/06/speeding-up-python-numpy-cython-and.html&lt;br /&gt;
* http://www.huyng.com/posts/python-performance-analysis/&lt;br /&gt;
* http://www.appneta.com/2012/05/21/profiling-python-performance-lineprof-statprof-cprofile/&lt;br /&gt;
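&lt;br /&gt;
As a starting point, here is a minimal profiling sketch using the standard-library '''cProfile''' and '''pstats''' modules.  The function '''slow_sum''' is a made-up example that simply gives the profiler something to measure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import cProfile&lt;br /&gt;
import pstats&lt;br /&gt;
&lt;br /&gt;
def slow_sum(n):&lt;br /&gt;
    # Deliberately loop-heavy, so that it shows up in the profile&lt;br /&gt;
    total = 0&lt;br /&gt;
    for i in range(n):&lt;br /&gt;
        total += i * i&lt;br /&gt;
    return total&lt;br /&gt;
&lt;br /&gt;
profiler = cProfile.Profile()&lt;br /&gt;
profiler.enable()&lt;br /&gt;
result = slow_sum(100000)&lt;br /&gt;
profiler.disable()&lt;br /&gt;
&lt;br /&gt;
# Report the 5 entries with the largest cumulative time&lt;br /&gt;
stats = pstats.Stats(profiler).sort_stats('cumulative')&lt;br /&gt;
stats.print_stats(5)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;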
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* http://docs.python.org/tutorial/&lt;br /&gt;
* http://wiki.python.org/moin/PythonBooks&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9458</id>
		<title>Python1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9458"/>
		<updated>2014-10-08T10:49:37Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* A Repository of Packages You Could Use */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Python for Scientists'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Python.png|thumb|1100px|none|http://xkcd.com/353/]]&lt;br /&gt;
&lt;br /&gt;
With thanks to Simon Metson and Mike Wallace for much of the following material.&lt;br /&gt;
&lt;br /&gt;
=Getting Started on BlueCrystal Phase-2=&lt;br /&gt;
&lt;br /&gt;
After you have logged in, type the following at the command line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module add languages/python-2.7.2.0&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should start up an interactive python session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Python 2.7.2 (default, Aug 25 2011, 10:51:03) &lt;br /&gt;
[GCC 4.3.3] on linux2&lt;br /&gt;
Type &amp;quot;help&amp;quot;, &amp;quot;copyright&amp;quot;, &amp;quot;credits&amp;quot; or &amp;quot;license&amp;quot; for more information.&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where we can type commands at the '''&amp;gt;&amp;gt;&amp;gt;''' prompt.&lt;br /&gt;
&lt;br /&gt;
=Python as a Calculator=&lt;br /&gt;
&lt;br /&gt;
To get started, let's just try a few commands out.  If you type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print &amp;quot;Hello!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hello!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you try:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print 5 + 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far so simple!  Here is a copy of a session containing a few more commands where we've set the values of some variables and also defined and run our own function: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; five = 5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; neuf = 9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print five + neuf&lt;br /&gt;
14&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def say_hello():&lt;br /&gt;
...     print &amp;quot;Hello, world!&amp;quot;&lt;br /&gt;
... # hit return here &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; say_hello()&lt;br /&gt;
Hello, world!&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can exit an interactive session at any time by typing '''Ctrl-D'''.&lt;br /&gt;
&lt;br /&gt;
=Getting Help=&lt;br /&gt;
&lt;br /&gt;
One of the good things about Python is that it has lots of useful online documentation.  ([[A_Good_Read|There are good books on the language too]].)  For example, take a look at: http://docs.python.org/.  You can also type '''help()''' at the interpreter prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; help()&lt;br /&gt;
&lt;br /&gt;
Welcome to Python 2.7!  This is the online help utility.&lt;br /&gt;
&lt;br /&gt;
If this is your first time using Python, you should definitely check out&lt;br /&gt;
the tutorial on the Internet at http://docs.python.org/tutorial/.&lt;br /&gt;
&lt;br /&gt;
Enter the name of any module, keyword, or topic to get help on writing&lt;br /&gt;
Python programs and using Python modules.  To quit this help utility and&lt;br /&gt;
return to the interpreter, just type &amp;quot;quit&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; keywords&lt;br /&gt;
&lt;br /&gt;
Here is a list of the Python keywords.  Enter any keyword to get more help.&lt;br /&gt;
&lt;br /&gt;
and                 elif                if                  print&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; if&lt;br /&gt;
The ``if`` statement&lt;br /&gt;
********************&lt;br /&gt;
&lt;br /&gt;
The ``if`` statement is used for conditional execution:&lt;br /&gt;
&lt;br /&gt;
   if_stmt ::= &amp;quot;if&amp;quot; expression &amp;quot;:&amp;quot; suite&lt;br /&gt;
               ( &amp;quot;elif&amp;quot; expression &amp;quot;:&amp;quot; suite )*&lt;br /&gt;
               [&amp;quot;else&amp;quot; &amp;quot;:&amp;quot; suite]&lt;br /&gt;
&lt;br /&gt;
It selects exactly one of the suites by evaluating the expressions one&lt;br /&gt;
by one until one is found to be true...&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; quit&lt;br /&gt;
&lt;br /&gt;
You are now leaving help and returning to the Python interpreter.&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Making a Script=&lt;br /&gt;
&lt;br /&gt;
An interactive session can be fun and useful for trying things out.  However, to save our fingers, we will typically want to execute a series of commands as a script, created using your favourite text editor.  Here are the contents of an example script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;Hello, from a python script!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Save this as '''myscript.py''' and ensure that it is executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod u+x myscript.py&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and now you can run it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[ggdagw@bigblue4 ~]$ ./myscript.py &lt;br /&gt;
Hello, from a python script!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Python and Whitespace=&lt;br /&gt;
&lt;br /&gt;
Love it or hate it, Python incorporates whitespace in its syntax: the indentation of a line is what marks out a block of code.  (The alternative is to demarcate blocks with some other syntax, such as the curly braces used in C.  Pick your poison.)  Spacing is therefore key in creating a valid Python script.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
    print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will work, but:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
 print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will not:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  File &amp;quot;./myscript.py&amp;quot;, line 7&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
        ^&lt;br /&gt;
IndentationError: expected an indented block&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is therefore a great advantage, when writing a Python script, to use a text editor which has a dedicated Python mode, such as '''emacs''', and which will actively help you to keep your spacing correct.  See http://wiki.python.org/moin/PythonEditors for an extensive list.&lt;br /&gt;
&lt;br /&gt;
=Some Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
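* Calculate the volume of a sphere (Hint: V = 4/3*pi*r^3.  In Python, write r cubed as '''r**3''', and note that in Python 2 '''4/3''' is integer division and gives 1, so use '''4.0/3''' instead)&lt;br /&gt;
* Concatenate two strings&lt;br /&gt;
* Write a recursive function to compute Fibonacci numbers (Hint: F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1)&lt;br /&gt;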
&lt;br /&gt;
=Nuts and Bolts=&lt;br /&gt;
&lt;br /&gt;
==Types==&lt;br /&gt;
&lt;br /&gt;
Python has intrinsic types including integers, floats, booleans and complex numbers.  It is dynamically typed (a variable's type is set by whatever value is assigned to it at run time, so you don't need a block of variable declarations at the top of your script), but it is '''not weakly''' typed: it will not silently convert between incompatible types.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex = 2 + 0.5j&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex&lt;br /&gt;
(2+0.5j)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.real&lt;br /&gt;
2.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.imag&lt;br /&gt;
0.5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name = 'fred'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; lucky = 7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name + lucky&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: cannot concatenate 'str' and 'int' objects&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
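&lt;br /&gt;
Since Python won't convert between types like these behind your back, you do it explicitly when you need to.  A minimal sketch, reusing the names from the session above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
name = 'fred'&lt;br /&gt;
lucky = 7&lt;br /&gt;
&lt;br /&gt;
# Convert the int to a str explicitly before concatenating&lt;br /&gt;
label = name + str(lucky)&lt;br /&gt;
print(label)  # fred7&lt;br /&gt;
&lt;br /&gt;
# Without the conversion, Python raises a TypeError rather than guessing&lt;br /&gt;
try:&lt;br /&gt;
    name + lucky&lt;br /&gt;
    raised = False&lt;br /&gt;
except TypeError:&lt;br /&gt;
    raised = True&lt;br /&gt;
print(raised)  # True&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;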
&lt;br /&gt;
==Strings==&lt;br /&gt;
&lt;br /&gt;
The eagle-eyed will have spotted in a previous example that we could ask for the length of a character string straight off the bat.  No need to write a counting routine ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print len(message)&lt;br /&gt;
11&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a character string.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print message[:5]&lt;br /&gt;
happy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
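&lt;br /&gt;
Slices take an optional start, stop and step, and negative indices count back from the end of the string.  A few more sketches using the same '''message''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = 'happy days!'&lt;br /&gt;
&lt;br /&gt;
first_word = message[:5]    # from the start up to (not including) index 5: 'happy'&lt;br /&gt;
last_word = message[-5:]    # the last five characters: 'days!'&lt;br /&gt;
every_other = message[::2]  # a step of 2 takes every second character: 'hpydy!'&lt;br /&gt;
backwards = message[::-1]   # a negative step reverses the string: '!syad yppah'&lt;br /&gt;
print(first_word)&lt;br /&gt;
print(last_word)&lt;br /&gt;
print(every_other)&lt;br /&gt;
print(backwards)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;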
&lt;br /&gt;
Since a string is an '''object''' (in the object-oriented programming sense of the word, but more of that another time...) we can call a number of methods that operate on it.  A selection includes:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.find(sub) || Returns the index of the first occurrence of the substring '''sub''', or -1 if it is not found&lt;br /&gt;
|-&lt;br /&gt;
|| s.islower() || Checks whether all characters are lowercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.upper() || Returns '''s''' converted to uppercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.strip() || Removes leading and trailing whitespace&lt;br /&gt;
|-&lt;br /&gt;
|| s.replace(old,new) || Replaces substring '''old''' with '''new'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.split([sep]) || Splits '''s''' into a list of substrings, using the (optional) '''sep''' as the delimiter; by default it splits on whitespace&lt;br /&gt;
|}&lt;br /&gt;
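&lt;br /&gt;
A short sketch trying out the methods in the table on an example string (the string itself is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
s = '  Happy Days!  '&lt;br /&gt;
&lt;br /&gt;
trimmed = s.strip()                        # 'Happy Days!'&lt;br /&gt;
where = trimmed.find('Days')               # index of the first match: 6&lt;br /&gt;
shout = trimmed.upper()                    # 'HAPPY DAYS!'&lt;br /&gt;
sunny = trimmed.replace('Happy', 'Sunny')  # 'Sunny Days!'&lt;br /&gt;
words = trimmed.split()                    # no sep, so split on whitespace: ['Happy', 'Days!']&lt;br /&gt;
fields = 'a,b,c'.split(',')                # ['a', 'b', 'c']&lt;br /&gt;
print(trimmed.islower())                   # False: it contains capitals&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;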
&lt;br /&gt;
==Lists and Tuples==&lt;br /&gt;
&lt;br /&gt;
An example of a list is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping = ['bread', 'marmalade', 'milk', 'tea']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we can inquire about the length of that using the same function as before:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
len(shopping)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a list, as we did with a string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and even reset a portion of the list that way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2] = ['bagels', 'jam']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
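&lt;br /&gt;
Putting those slicing steps together, with the resulting values shown in the comments.  Note that a replacement need not be the same length as the slice it replaces (the extra item, 'eggs', is just for illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping = ['bread', 'marmalade', 'milk', 'tea']&lt;br /&gt;
&lt;br /&gt;
first_two = shopping[0:2]          # ['bread', 'marmalade']&lt;br /&gt;
shopping[0:2] = ['bagels', 'jam']  # ['bagels', 'jam', 'milk', 'tea']&lt;br /&gt;
shopping[2:3] = ['milk', 'eggs']   # one item replaced by two: the list grows&lt;br /&gt;
print(shopping)                    # ['bagels', 'jam', 'milk', 'eggs', 'tea']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;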
&lt;br /&gt;
Since a list is also an object, we have more handy methods, including:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.append(x) || Appends a new element '''x''' to the end of '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.count(x) || Returns the number of occurrences of '''x''' in '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.reverse() || Reverses the items of '''s''' in place&lt;br /&gt;
|-&lt;br /&gt;
|| s.sort([compfunc]) || Sorts items of '''s''' in place.  '''compfunc''' is an optional comparison function&lt;br /&gt;
|}&lt;br /&gt;
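&lt;br /&gt;
A short sketch of those methods in action (the list contents are arbitrary); each comment shows the result after that line runs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
s = ['tea', 'milk', 'bread', 'tea']&lt;br /&gt;
&lt;br /&gt;
s.append('jam')         # ['tea', 'milk', 'bread', 'tea', 'jam']&lt;br /&gt;
n_tea = s.count('tea')  # 2&lt;br /&gt;
s.reverse()             # ['jam', 'tea', 'bread', 'milk', 'tea']&lt;br /&gt;
s.sort()                # ['bread', 'jam', 'milk', 'tea', 'tea']&lt;br /&gt;
print(s)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;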
&lt;br /&gt;
Tuples are very similar to lists and support many of the same operations (indexing, slicing, concatenation etc.) but differ in that they are '''not mutable''' after creation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple = ('fred', 'ginger', 7, 2.5)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist = ['fred', 'ginger', 7, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist[2] = 8&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mylist&lt;br /&gt;
['fred', 'ginger', 8, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mytuple[2]    &lt;br /&gt;
7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple[2] = 8&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: 'tuple' object does not support item assignment&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
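A '''list comprehension''' builds a new list from an existing sequence in a single expression, optionally filtering the elements.  Here we double every number smaller than 20:&lt;br /&gt;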
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; numbers = [12, 3, 90, 40, 52, 11, 10]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled = [number * 2 for number in numbers if number &amp;lt; 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled&lt;br /&gt;
[24, 6, 22, 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
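&lt;br /&gt;
For comparison, here is the same computation written as an explicit loop. This is a sketch only; for something this short the comprehension above is usually preferred.&lt;br /&gt;
&lt;br /&gt;
```python
numbers = [12, 3, 90, 40, 52, 11, 10]

# Explicit-loop equivalent of the list comprehension above
small_numbers_doubled = []
for number in numbers:
    if 20 > number:  # i.e. keep only the numbers smaller than 20
        small_numbers_doubled.append(number * 2)
```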
&lt;br /&gt;
==Dictionaries==&lt;br /&gt;
&lt;br /&gt;
A dictionary is an associative array or hash table, containing '''key-value''' pairs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
mydict = {'thomas':'blue', 'james':'red', 'henry':'green'}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mydict['james']&lt;br /&gt;
red&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking values up by a meaningful key, rather than by an arbitrary integer index into a list, often makes code much more readable and intuitive.&lt;br /&gt;
&lt;br /&gt;
Some example dictionary methods are:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| m.keys() || Returns a list of the keys in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m.items() || Returns a list of the (key,value) pairs in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m[k] = x || Sets m[k] to x&lt;br /&gt;
|-&lt;br /&gt;
|| m.update(b) || Adds the key-value pairs from dictionary '''b''' to '''m''', overwriting the values of any keys already present&lt;br /&gt;
|}&lt;br /&gt;
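&lt;br /&gt;
This sketch shows these methods in action, reusing the engine colours from above. The names '''gordon''', '''percy''' and '''emily''' are made-up additions. It also shows '''get()''', which returns a default instead of raising a '''KeyError''' for a missing key, and the '''in''' test, which checks the keys.&lt;br /&gt;
&lt;br /&gt;
```python
mydict = {'thomas': 'blue', 'james': 'red', 'henry': 'green'}

mydict['gordon'] = 'blue'                              # add a new key-value pair
mydict.update({'james': 'crimson', 'percy': 'green'})  # the existing key 'james' is overwritten
missing = mydict.get('emily', 'unknown')               # default value instead of a KeyError
has_henry = 'henry' in mydict                          # membership tests look at the keys
names = sorted(mydict.keys())                          # sort: Python 2 dicts keep no insertion order
```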
&lt;br /&gt;
==Control Structures==&lt;br /&gt;
&lt;br /&gt;
Of course, we'll need conditionals and loops etc. to go beyond the simplest of scripts.  Here is an '''if-then-else''', python style:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
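def birds_sing():&lt;br /&gt;
    print &amp;quot;tweet tweet&amp;quot;  # hypothetical helper so the example runs&lt;br /&gt;
def birds_sleep():&lt;br /&gt;
    print &amp;quot;zzz&amp;quot;  # hypothetical helper so the example runs&lt;br /&gt;
sky = 'blue'&lt;br /&gt;
if sky == 'blue':&lt;br /&gt;
    birds_sing()&lt;br /&gt;
elif sky == 'black':&lt;br /&gt;
    birds_sleep()&lt;br /&gt;
else:&lt;br /&gt;
    pass  # do nothing&lt;br /&gt;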
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and a classic '''for loop''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
for ii in range(1,10):&lt;br /&gt;
    print ii&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1&lt;br /&gt;
...&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
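&lt;br /&gt;
The loop above prints 1 to 9 because '''range''' stops just before its second argument. This sketch shows a few more forms of '''range''', plus '''enumerate''', which supplies an index alongside each item and reappears in the performance example later on. The '''list()''' calls are only needed on Python 3, where '''range''' no longer returns a list.&lt;br /&gt;
&lt;br /&gt;
```python
up_to_nine = list(range(1, 10))    # [1, 2, ..., 9]: the stop value 10 is excluded
evens = list(range(0, 10, 2))      # [0, 2, 4, 6, 8]: the optional third argument is the step
countdown = list(range(5, 0, -1))  # [5, 4, 3, 2, 1]: a negative step counts down

labelled = []
for ii, name in enumerate(['fred', 'ginger']):  # enumerate yields (index, item) pairs
    labelled.append((ii, name))
```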
&lt;br /&gt;
We'll also see a '''while loop''' shoehorned into the next example.&lt;br /&gt;
&lt;br /&gt;
For our control statements, we can use comparison operators such as '''==''', '''!=''', '''&amp;gt;''', '''&amp;lt;''', '''&amp;lt;=''' and '''&amp;gt;=''', and logical operators such as '''and''', '''or''' and '''not'''.&lt;br /&gt;
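&lt;br /&gt;
This sketch combines these operators; the variable names are made up. Python also allows comparisons to be chained, which reads like the mathematical notation.&lt;br /&gt;
&lt;br /&gt;
```python
temperature = 18
raining = False

is_mild = 25 >= temperature >= 10     # chained: (25 >= temperature) and (temperature >= 10)
go_outside = is_mild and not raining  # True: mild and dry
stay_in = raining or not is_mild      # False: neither condition holds
```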
&lt;br /&gt;
==File Input and Output==&lt;br /&gt;
&lt;br /&gt;
Here's some code for printing the contents of a text file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;r&amp;quot;)&lt;br /&gt;
line = fp.readline()&lt;br /&gt;
while line:&lt;br /&gt;
    line = line.strip()&lt;br /&gt;
    print line&lt;br /&gt;
    line = fp.readline()&lt;br /&gt;
fp.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
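&lt;br /&gt;
The '''while''' loop above works, but a more idiomatic version uses a '''with''' block, which closes the file automatically, and iterates over the file object directly. To keep the sketch self-contained, it first writes a small '''foo.txt''' into a temporary directory.&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile

# Create a small sample file; foo.txt in a temporary directory stands in for your own file
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
with open(path, 'w') as fp:
    fp.write('first line\nsecond line\n')

lines = []
with open(path, 'r') as fp:     # the file is closed automatically when the block ends
    for line in fp:             # iterating over a file object yields one line at a time
        lines.append(line.strip())
```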
&lt;br /&gt;
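We could open a file for writing with the following (mode '''w''' empties any existing file of the same name; mode '''a''' appends to it instead):&lt;br /&gt;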
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;w&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
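&lt;br /&gt;
This sketch writes to a file, appends to it, and then reads it back. It uses a throwaway file in a temporary directory. Note that '''write()''' does not add newlines for you.&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'foo.txt')  # placeholder file name

with open(path, 'w') as fp:   # create the file, or empty an existing one
    fp.write('alpha\n')       # the newline must be written explicitly
with open(path, 'a') as fp:   # add to the end of the file
    fp.write('beta\n')

with open(path, 'r') as fp:
    contents = fp.read()      # the whole file as one string
```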
&lt;br /&gt;
==Command Line Parsing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # We can test on the length of argv&lt;br /&gt;
    if len(sys.argv) &amp;lt; 2:&lt;br /&gt;
        print &amp;quot;usage: to use this script...&amp;quot;&lt;br /&gt;
    else:&lt;br /&gt;
        ii = 0&lt;br /&gt;
        for arg in sys.argv:&lt;br /&gt;
            # (typically) argv[0] is bound to the script name&lt;br /&gt;
            print &amp;quot;arg&amp;quot;, ii, &amp;quot;is:&amp;quot;, arg&lt;br /&gt;
            ii = ii+1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py&lt;br /&gt;
usage: to use this script...&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py fred ginger&lt;br /&gt;
arg 0 is: ./cmdline.py&lt;br /&gt;
arg 1 is: fred&lt;br /&gt;
arg 2 is: ginger&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
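&lt;br /&gt;
Checking '''sys.argv''' by hand gets tedious as the options multiply. The standard library's '''argparse''' module (available from Python 2.7) handles this for you, including a generated '''-h''' help message. Below is a minimal sketch; the script's purpose and its '''names''' and '''--shout''' arguments are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import argparse

def build_parser():
    """Describe the arguments our (hypothetical) greeting script accepts."""
    parser = argparse.ArgumentParser(description='Print a greeting.')
    parser.add_argument('names', nargs='+', help='one or more names to greet')
    parser.add_argument('--shout', action='store_true',
                        help='print the greeting in upper case')
    return parser

def main(argv=None):
    """Parse argv (argparse uses sys.argv[1:] when argv is None) and build the greeting."""
    args = build_parser().parse_args(argv)
    greeting = 'Hello ' + ' and '.join(args.names)
    if args.shout:
        greeting = greeting.upper()
    return greeting
```
&lt;br /&gt;
Passing a list to '''main()''' makes the parser easy to try out interactively. In a real script, you would call '''main()''' with no arguments from inside the usual '''if __name__ == ...''' block. If a required argument is missing, argparse prints a usage message and exits.&lt;br /&gt;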
&lt;br /&gt;
=Object Oriented Programming in Python=&lt;br /&gt;
&lt;br /&gt;
Here is an example of using a class in python:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
class Radio:&lt;br /&gt;
    &amp;quot;A simple radio&amp;quot;&lt;br /&gt;
    def __init__(self,freq=0.0,name=&amp;quot;&amp;quot;):&lt;br /&gt;
        &amp;quot;Constructor method&amp;quot;&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
        self.name=name&lt;br /&gt;
    def tune(self,freq):&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
    def tuned_to(self):&lt;br /&gt;
        print self.name, &amp;quot;tuned to:&amp;quot;, self.__frequency&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # declare two radio instances&lt;br /&gt;
    car = Radio(name=&amp;quot;car&amp;quot;)&lt;br /&gt;
    kitchen = Radio(91.5,&amp;quot;kitchen&amp;quot;)&lt;br /&gt;
    # call some methods&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    kitchen.tuned_to()&lt;br /&gt;
    car.tune(89.3)&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    # Docstring: the string literal at the top of the class:&lt;br /&gt;
    print car.__doc__&lt;br /&gt;
    # NB members not private by default:&lt;br /&gt;
    print car.name&lt;br /&gt;
    # BUT a leading double underscore triggers name mangling: the&lt;br /&gt;
    # attribute is stored as _Radio__frequency, so this lookup fails&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running the script gives us:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
car tuned to: 0.0&lt;br /&gt;
kitchen tuned to: 91.5&lt;br /&gt;
car tuned to: 89.3&lt;br /&gt;
A simple radio&lt;br /&gt;
car&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;./foo.py&amp;quot;, line 27, in &amp;lt;module&amp;gt;&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
AttributeError: Radio instance has no attribute '__frequency'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
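&lt;br /&gt;
Name mangling hides the attribute rather than making it truly private, because Python simply renames it. This sketch uses a new-style class, which behaves the same on Python 2 and 3. It shows the mangled name and adds a read-only '''frequency''' property. The property is an illustrative addition, not part of the original class.&lt;br /&gt;
&lt;br /&gt;
```python
class Radio(object):
    "A simple radio (new-style class, so it behaves the same on Python 2 and 3)"
    def __init__(self, freq=0.0, name=""):
        self.__frequency = freq  # actually stored as self._Radio__frequency
        self.name = name

    def tune(self, freq):
        self.__frequency = freq

    @property
    def frequency(self):
        """Read-only public access to the mangled attribute (illustrative addition)."""
        return self.__frequency

car = Radio(name="car")
car.tune(89.3)
mangled = car._Radio__frequency           # the renamed attribute is still reachable
has_plain = hasattr(car, '__frequency')   # False: no attribute with the unmangled name
```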
&lt;br /&gt;
=Python for Shell Scripting=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from subprocess import call&lt;br /&gt;
call([&amp;quot;ls&amp;quot;, &amp;quot;-l&amp;quot;])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
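&lt;br /&gt;
'''call()''' returns the command's exit status, while '''check_output()''' (Python 2.7 onwards) captures what the command prints. In this sketch, the command is the current Python interpreter ('''sys.executable''') rather than a shell utility, so that it runs anywhere.&lt;br /&gt;
&lt;br /&gt;
```python
import subprocess
import sys

# call() waits for the command to finish and returns its exit status (3 here)
status = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])

# check_output() returns the command's standard output; universal_newlines=True
# gives text rather than bytes, and a non-zero exit status raises CalledProcessError
output = subprocess.check_output([sys.executable, '-c', 'print(42)'],
                                 universal_newlines=True)
```
&lt;br /&gt;
To capture the listing from '''ls -l''', pass its argument list to '''check_output()''' in the same way.&lt;br /&gt;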
&lt;br /&gt;
=Python as a Glue Language=&lt;br /&gt;
&lt;br /&gt;
* Calling R from python is possible using: http://rpy.sourceforge.net/index.html.&lt;br /&gt;
* Calling Matlab from python: http://mlabwrap.sourceforge.net.&lt;br /&gt;
* With SWIG you can make many bindings, including Python to C and C++: http://www.swig.org/.&lt;br /&gt;
* Or if Fortran is more your cup of tea, you can use f2py (now distributed as part of numpy): http://cens.ioc.ee/projects/f2py2e/.&lt;br /&gt;
* There are many more examples.&lt;br /&gt;
&lt;br /&gt;
=Using Packages=&lt;br /&gt;
&lt;br /&gt;
Python packages are great because they provide us with a whole lot of extra functionality--above and beyond the core language--that we didn't have to write and debug ourselves.&lt;br /&gt;
&lt;br /&gt;
Let's walk through a simple example using a package.  At an interactive prompt type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from random import randint&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will give us access to the '''randint(x,y)''' function, which returns a randomly chosen integer from the given range [x,y]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
4&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
1&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OK, so far so good.  One thing to note is that the above '''import''' statement has drawn the name ''randint'' into our current '''namespace'''.  What if we had already defined a function named ''randint''?  That could cause problems, so to protect ourselves there are several variants of the import statement.&lt;br /&gt;
&lt;br /&gt;
With a plain '''import''' statement, the package's functions are placed in a namespace with the same name as the package, so in this case we have to prefix them with that namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random.randint(0,10)&lt;br /&gt;
6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Should we so desire, we can take a little more control and choose the name of the namespace ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random as rnd&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; rnd.randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another--more 'devil-may-care'--approach is to do away with the separate namespace and pull everything from a given package into the current namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import *&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random()&lt;br /&gt;
0.3172268098313996&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(The '''random()''' function returns a randomly selected floating point number in the range [0, 1)--that is, between 0 and 1, including 0.0 but always smaller than 1.0.)&lt;br /&gt;
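&lt;br /&gt;
The numbers are pseudo-random, so seeding the generator with '''random.seed()''' makes a sequence repeatable. This is handy when debugging or when results need to be reproduced. A sketch follows; the seed value 42 is arbitrary.&lt;br /&gt;
&lt;br /&gt;
```python
import random

random.seed(42)                                    # fix the generator's starting state
first = [random.randint(0, 10) for _ in range(5)]
random.seed(42)                                    # same seed, so the same sequence again
second = [random.randint(0, 10) for _ in range(5)]

x = random.random()                                # a float in the range [0, 1)
```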
&lt;br /&gt;
==A Namespace Collision==&lt;br /&gt;
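&lt;br /&gt;
Below, importing ''randint'' from '''random''' silently replaces the function of the same name that we have just defined, so our call with no arguments now fails:&lt;br /&gt;
&lt;br /&gt;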
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def randint():&lt;br /&gt;
...     print &amp;quot;dummy function&amp;quot;&lt;br /&gt;
... &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
dummy function&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import randint&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: randint() takes exactly 3 arguments (1 given)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
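&lt;br /&gt;
A plain '''import random''' avoids the collision, because the module's function stays in its own namespace. Here is a sketch:&lt;br /&gt;
&lt;br /&gt;
```python
import random

def randint():
    # our own (hypothetical) function that happens to share a name with random.randint
    return 'dummy function'

mine = randint()                # calls our function
theirs = random.randint(0, 10)  # calls the module's function through its namespace
```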
&lt;br /&gt;
=Databases=&lt;br /&gt;
&lt;br /&gt;
==Simple Databases==&lt;br /&gt;
&lt;br /&gt;
Python's standard library includes modules for working with some simple databases.  In Python 2, the '''bsddb''' module allows you to access the popular '''Berkeley DB''' database from your python code (bsddb was deprecated in Python 2.6 and is not available in Python 3).&lt;br /&gt;
&lt;br /&gt;
The interface to the database provided by this module is very similar to the way in which we access a dictionary.  First, let's populate a database:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import bsddb&lt;br /&gt;
d = bsddb.btopen('engines.db')&lt;br /&gt;
d['thomas'] = 'blue'&lt;br /&gt;
d['james'] = 'red'&lt;br /&gt;
d['henry'] = 'green'&lt;br /&gt;
d.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now let's open the database again and query its contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d = bsddb.btopen('engines.db')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['henry', 'james', 'thomas']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.first()&lt;br /&gt;
('henry', 'green')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.last()&lt;br /&gt;
('thomas', 'blue')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour = d['james']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour&lt;br /&gt;
'red'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; del d['henry']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['james', 'thomas']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
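&lt;br /&gt;
On Python 3, a similar dictionary-style sketch can use the standard '''dbm''' module, which picks whichever DBM backend is installed. Keys and values come back as bytes, hence the '''decode()''' calls. The backends do not promise any key order, so the names are sorted. The file name is a placeholder in a temporary directory.&lt;br /&gt;
&lt;br /&gt;
```python
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'engines')  # placeholder location

db = dbm.open(path, 'c')       # 'c': create the database if it does not exist
db['thomas'] = 'blue'          # str keys and values are stored as bytes
db['james'] = 'red'
db['henry'] = 'green'
db.close()

db = dbm.open(path, 'r')       # reopen read-only
colour = db['james'].decode()  # values come back as bytes
names = sorted(k.decode() for k in db.keys())
db.close()
```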
&lt;br /&gt;
==Relational Databases==&lt;br /&gt;
&lt;br /&gt;
Relational databases give us more oomph.  '''SQLite''' is a useful relational database to consider because it is lightweight, requiring hardly any setup or management, yet it still understands queries written in SQL.  This makes it handy for simple examples of SQL database access from python, and a stepping stone toward more powerful database packages.&lt;br /&gt;
&lt;br /&gt;
Here is a script which will create a table called '''planets''' in the file '''pytest.db''' and populate it with details of the planets in our solar system:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
 &lt;br /&gt;
# create a table&lt;br /&gt;
cursor.execute(&amp;quot;&amp;quot;&amp;quot;CREATE TABLE planets&lt;br /&gt;
                  (Id INT, Name TEXT, Diameter REAL, &lt;br /&gt;
                   Mass REAL, Orbital_Period REAL)&amp;quot;&amp;quot;&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# insert a single record&lt;br /&gt;
cursor.execute(&amp;quot;INSERT INTO planets VALUES(1,'Mercury',0.382,0.06,0.24)&amp;quot;)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
 &lt;br /&gt;
# insert multiple records&lt;br /&gt;
other_planets = [(2,'Venus',0.949,0.82,0.72),&lt;br /&gt;
                 (3,'Earth',1.0,1.0,1.0),&lt;br /&gt;
                 (4,'Mars',0.532,0.11,1.52),&lt;br /&gt;
                 (5,'Jupiter',11.209,317.8,5.20),&lt;br /&gt;
                 (6,'Saturn',9.449,95.2,9.54),&lt;br /&gt;
                 (7,'Uranus',4.007,14.6,19.22),&lt;br /&gt;
                 (8,'Neptune',3.883,17.2,30.06),&lt;br /&gt;
                 (9,'Pluto',0.18,0.002,248.09)]&lt;br /&gt;
cursor.executemany(&amp;quot;INSERT INTO planets VALUES (?,?,?,?,?)&amp;quot;, other_planets)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
&lt;br /&gt;
# delete a record&lt;br /&gt;
sql = &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
DELETE FROM planets&lt;br /&gt;
WHERE Name = 'Pluto'&lt;br /&gt;
&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
cursor.execute(sql)  # poor old pluto! &lt;br /&gt;
conn.commit()&lt;br /&gt;
conn.close()  # release the database file&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And here is a short example script showing a couple of ways to interrogate the database: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the records in the table, ordered by Name:\n&amp;quot;&lt;br /&gt;
for row in cursor.execute(&amp;quot;SELECT rowid, * FROM planets ORDER BY Name&amp;quot;):&lt;br /&gt;
    print row&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;\n&amp;quot;&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the planets with a mass greater than or equal to that of Earth:\n&amp;quot;&lt;br /&gt;
sql = &amp;quot;SELECT * FROM planets WHERE Mass&amp;gt;=?&amp;quot;&lt;br /&gt;
cursor.execute(sql, (1.0,))  # parameters go in a sequence, here a 1-tuple&lt;br /&gt;
for row in cursor.fetchall():  # or use fetchone()&lt;br /&gt;
    print row&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The results of running the script are:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
All the records in the table, ordered by Name:&lt;br /&gt;
&lt;br /&gt;
(3, 3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, 5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(4, 4, u'Mars', 0.53200000000000003, 0.11, 1.52)&lt;br /&gt;
(1, 1, u'Mercury', 0.38200000000000001, 0.059999999999999998, 0.23999999999999999)&lt;br /&gt;
(8, 8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
(6, 6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, 7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(2, 2, u'Venus', 0.94899999999999995, 0.81999999999999995, 0.71999999999999997)&lt;br /&gt;
&lt;br /&gt;
All the planets with a mass greater than or equal to that of Earth:&lt;br /&gt;
&lt;br /&gt;
(3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
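&lt;br /&gt;
When a query depends on a value, pass the value separately through a '''?''' placeholder, as the second query above does, rather than pasting it into the SQL string. sqlite3 then handles the quoting, which also guards against SQL injection. This self-contained sketch uses an in-memory database and a cut-down version of the '''planets''' table.&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3

conn = sqlite3.connect(':memory:')  # a throwaway database held in RAM
cursor = conn.cursor()
cursor.execute('CREATE TABLE planets (Id INT, Name TEXT, Mass REAL)')
cursor.executemany('INSERT INTO planets VALUES (?, ?, ?)',
                   [(3, 'Earth', 1.0), (4, 'Mars', 0.11), (5, 'Jupiter', 317.8)])
conn.commit()

min_mass = 1.0  # the value to filter on, kept out of the SQL text
cursor.execute('SELECT Name FROM planets WHERE Mass >= ? ORDER BY Name',
               (min_mass,))
heavy = [row[0] for row in cursor.fetchall()]
conn.close()
```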
&lt;br /&gt;
For more information on using SQLite with Python, see, e.g.:&lt;br /&gt;
* http://zetcode.com/db/sqlitepythontutorial/&lt;br /&gt;
* http://www.blog.pythonlibrary.org/2012/07/18/python-a-simple-step-by-step-sqlite-tutorial/&lt;br /&gt;
&lt;br /&gt;
You can also connect to a MySQL database from python using, e.g. the [http://mysql-python.sourceforge.net/ python-mysqldb] package.  A snippet of python code for connecting to a database is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
import MySQLdb&lt;br /&gt;
&lt;br /&gt;
conn = MySQLdb.connect(host=&amp;quot;localhost&amp;quot;,   # your host, usually localhost&lt;br /&gt;
                       user=&amp;quot;gethin&amp;quot;,      # your username&lt;br /&gt;
                       passwd=&amp;quot;changeme&amp;quot;, # your password&lt;br /&gt;
                       db=&amp;quot;menagerie&amp;quot;)    # name of the database&lt;br /&gt;
&lt;br /&gt;
# Create a cursor object, as before with SQLite&lt;br /&gt;
cur = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
# and then you can submit your SQL command and fetch the results:&lt;br /&gt;
cur.execute(&amp;quot;SELECT * FROM YOUR_TABLE_NAME&amp;quot;)&lt;br /&gt;
for row in cur.fetchall():&lt;br /&gt;
    print row&lt;br /&gt;
conn.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Numpy=&lt;br /&gt;
&lt;br /&gt;
OK, let's move on to python's numerical processing capabilities.  We will start by looking at the '''numpy''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from numpy import *&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now that we have access to the functions from '''numpy''', let's create an array.  '''Note that a numpy array is a different type of object from a Python list'''.  A simple approach is to use the '''array''' function.  For example, we might enter:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = array([[1.0,0.0,0.0],[0.0,1.0,0.0],[0.0,0.0,1.0]])&lt;br /&gt;
b = array([[1,2,3],[4,5,6],[7,8,9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 1.,  0.,  0.],&lt;br /&gt;
       [ 0.,  1.,  0.],&lt;br /&gt;
       [ 0.,  0.,  1.]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b        &lt;br /&gt;
array([[1, 2, 3],&lt;br /&gt;
       [4, 5, 6],&lt;br /&gt;
       [7, 8, 9]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; transpose(b)&lt;br /&gt;
array([[1, 4, 7],&lt;br /&gt;
       [2, 5, 8],&lt;br /&gt;
       [3, 6, 9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Given an array, we may inquire about its shape:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
print a.shape&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we are told that it is a 2-dimensional array (i.e. an array of rank 2) and that the length of both dimensions is 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
(3, 3)&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
We can also apply operators to array objects.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = a * 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
array([[ 9.,  0.,  0.],&lt;br /&gt;
       [ 0.,  9.,  0.],&lt;br /&gt;
       [ 0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note, however, that most operations on numpy arrays are done element-wise''', which is '''different to a linear algebra operation that you may have been expecting.'''  We will return to linear algebra operations when we look at the '''scipy''' package.&lt;br /&gt;
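&lt;br /&gt;
To see the difference, this sketch compares the element-wise product with a matrix product from '''numpy.dot'''. On Python 3.5 and later, '''a @ b''' is equivalent for 2-D arrays.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b            # multiplies matching elements: [[5, 12], [21, 32]]
matrix_product = np.dot(a, b)  # rows times columns: [[19, 22], [43, 50]]
```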
&lt;br /&gt;
Should we so desire, we could re-shape the array.  One way to do this is to set its shape attribute directly:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (1,9)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 9.,  0.,  0.,  0.,  9.,  0.,  0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As with the list example, it can be useful to read or change the value of an individual element (or a sub-array).  Let's turn the array back into its rank-2 form and try it out:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (3,3)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1,1] = 777.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.    0.]&lt;br /&gt;
 [   0.    0.    9.]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1:,1:] = [[777.0, 777.0],[777.0, 777.0]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print a&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.  777.]&lt;br /&gt;
 [   0.  777.  777.]]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is all pretty handy so far, but specifying the value of each element explicitly could become a chore.  Happily, some helper functions exist to give you a head start with some building blocks.  For example, you can use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = ones((3,2))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = identity(2)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print b&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; big = resize(b, (6,6))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print big&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The use of '''resize''' in the last example illustrates a useful '''replicating feature''': the original elements are repeated to fill the larger shape.&lt;br /&gt;
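&lt;br /&gt;
Note that '''resize''' does not tile a block. It repeats the flattened elements in order, so successive rows can start at different points in the original sequence. This sketch contrasts it with '''numpy.tile'''. Be aware too that the array method '''b.resize(...)''' behaves differently again: it pads with zeros rather than repeating.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

b = np.identity(2)
big = np.resize(b, (6, 6))  # repeats b's elements 1, 0, 0, 1, 1, 0, ... to fill 6x6
tiled = np.tile(b, (3, 3))  # repeats the whole 2x2 block 3 times down and 3 across
```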
&lt;br /&gt;
A list of examples covering many of the functions and operations in numpy is available at: http://scipy.org/Numpy_Example_List.&lt;br /&gt;
&lt;br /&gt;
=Pylab and Matplotlib=&lt;br /&gt;
&lt;br /&gt;
The above examples are quite natty, but we have deliberately kept the array sizes small so that we can print the element values easily.  In practice, you may find that your array sizes are much larger and printing the values to the screen is impractical.  Fear not!  Python has many packages which help you plot your data, so that you can explore it.&lt;br /&gt;
&lt;br /&gt;
Using the pylab plotting interface, we can plot some curves and save the figure to a file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import pylab&lt;br /&gt;
from numpy import arange, pi, cos, sin, add, sqrt&lt;br /&gt;
t = arange(0.0, 3.0, 0.01)&lt;br /&gt;
c = cos(2 * pi * t)&lt;br /&gt;
s = sin(2 * pi * t)&lt;br /&gt;
pylab.ylabel('some numbers')&lt;br /&gt;
pylab.xlabel('some more numbers')&lt;br /&gt;
pylab.plot(t, c, 'r', lw=2)&lt;br /&gt;
pylab.plot(t, s, 'b', lw=2)&lt;br /&gt;
pylab.plot(t, c-s, 'gs', lw=2)&lt;br /&gt;
pylab.ylim(-1.5, 1.5)&lt;br /&gt;
pylab.title('sin and cos functions')&lt;br /&gt;
pylab.savefig('curves', dpi=300)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where '''curves.png''' looks like:&lt;br /&gt;
&lt;br /&gt;
[[Image:Curves.png|thumb|600px|none|Some nice curves]]&lt;br /&gt;
&lt;br /&gt;
You can view .png images from the Linux command line (including on BlueCrystal) using, e.g., the ImageMagick command '''display -resize 1000 curves.png'''&lt;br /&gt;
&lt;br /&gt;
We can also use Matplotlib directly for more control:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import matplotlib.pyplot as plt&lt;br /&gt;
from numpy import arange, meshgrid, sin, sqrt&lt;br /&gt;
x = arange(-5,10)&lt;br /&gt;
y = arange(-4,11)&lt;br /&gt;
# X[j,i] = x[i] and Y[j,i] = y[j], so build Z from X and Y to match&lt;br /&gt;
X, Y = meshgrid(x,y)&lt;br /&gt;
R = sqrt(X**2 + Y**2)&lt;br /&gt;
Z = sin(R)/R  # R is 0 at the origin, so Z is nan there (numpy warns)&lt;br /&gt;
plt.figure()&lt;br /&gt;
plt.contour(X,Y,Z)&lt;br /&gt;
plt.show()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and you should get a window similar to:&lt;br /&gt;
&lt;br /&gt;
[[Image:Sinc-matplotlib-contour.png|thumb|600px|none|A contour map of the sinc function]]&lt;br /&gt;
&lt;br /&gt;
Perhaps the best next step with matplotlib is to look at the gallery: http://matplotlib.org/gallery.html.&lt;br /&gt;
Just click on a figure and you will get the code used to generate it--a really great resource!&lt;br /&gt;
&lt;br /&gt;
==Input and Output==&lt;br /&gt;
&lt;br /&gt;
The foregoing is all very interesting, but life would be rather dull if you had to re-enter all your data by hand whenever you set to work with Python and numpy.  Therefore we need a means to save data to a file and load it again.  Happily, we can do this rather easily using a couple of routines from the '''pylab''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import *&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from pylab import load&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from pylab import save&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; data = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; save('myfile.txt', data)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; read_data = load(&amp;quot;myfile.txt&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: numpy's load() function will be shadowed''' in the above example.  One way to protect yourself against this is to make use of '''namespaces''': change your import command to '''import pylab''' and then use '''pylab.load(..)'''.&lt;br /&gt;
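&lt;br /&gt;
In more recent matplotlib releases, pylab's own text-based '''save''' and '''load''' were deprecated and then removed in favour of numpy's '''savetxt''' and '''loadtxt'''. Calling these through the '''np''' namespace also sidesteps the shadowing problem. Here is a sketch; the file is a placeholder in a temporary directory.&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile

import numpy as np

data = np.arange(9.0).reshape(3, 3)
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')  # placeholder location

np.savetxt(path, data)        # plain text, one row of the array per line
read_data = np.loadtxt(path)  # read back as an array of floats
```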
&lt;br /&gt;
=Scipy=&lt;br /&gt;
&lt;br /&gt;
* http://www.scipy.org/&lt;br /&gt;
* Good examples are at http://scipy-lectures.github.com/intro/scipy.html&lt;br /&gt;
* Many useful features:&lt;br /&gt;
** Integration &amp;amp; differentiation&lt;br /&gt;
** Optimisation (curve fitting, etc.)&lt;br /&gt;
** Fourier transforms&lt;br /&gt;
** Signal processing&lt;br /&gt;
** Statistical algorithms&lt;br /&gt;
** Much, much more...&lt;br /&gt;
* If you know Python, you can use SciPy&lt;br /&gt;
&lt;br /&gt;
==An example: Differentiation==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # derivative of x^2 at x=3&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from scipy import derivative&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2, 3)&lt;br /&gt;
6.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # also works with arrays&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import array&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_array = array([1,2,3])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2,my_array)&lt;br /&gt;
array([ 2., 4., 6.])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
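&lt;br /&gt;
The '''derivative''' helper used above has been deprecated in recent SciPy releases and is not available in the newest ones. What it computes is a central difference, which is easy to sketch with numpy. The function name and step size below are our own choices, not a SciPy API.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

def central_difference(f, x0, dx=1e-3):
    """Approximate f'(x0) as (f(x0 + dx) - f(x0 - dx)) / (2 dx); a hypothetical helper."""
    return (f(x0 + dx) - f(x0 - dx)) / (2.0 * dx)

slope = central_difference(lambda x: x**2, 3.0)                            # about 6.0
slopes = central_difference(lambda x: x**2, np.array([1.0, 2.0, 3.0]))   # about [2, 4, 6]
```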
&lt;br /&gt;
Google for many more examples pertaining to your favourite numerical procedure!&lt;br /&gt;
&lt;br /&gt;
=A Repository of Packages You Could Use=&lt;br /&gt;
&lt;br /&gt;
Now, we've touched on a couple, but there are thousands of python packages available.  Before you start writing your own function for X, check that someone hasn't contributed code for that already at http://pypi.python.org/pypi.&lt;br /&gt;
&lt;br /&gt;
'''pip''', the Python package manager, looks in PyPI by default when installing a package.  You can use the '''--user''' option to install python packages in your own user space.  See:&lt;br /&gt;
* https://pip.readthedocs.org/en/latest/&lt;br /&gt;
for more information on pip.&lt;br /&gt;
&lt;br /&gt;
=Writing Faster Python=&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB and R, one of the simplest ways in which you can write faster python code is to eliminate loops by vectorising your code.&lt;br /&gt;
&lt;br /&gt;
Consider the following two scripts.  First '''for-loop.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def filter(arr):&lt;br /&gt;
    for i, val in enumerate(arr):&lt;br /&gt;
        if val &amp;lt; 0.5:&lt;br /&gt;
            arr[i] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    filter(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and secondly, '''vectorised.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def filter(arr):&lt;br /&gt;
    arr[arr &amp;lt; 0.5] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    filter(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we now run these two scripts through the Linux command line '''time''' utility, we see that the vectorised code runs roughly eight times faster than the for loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ time ./for-loop.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.963s&lt;br /&gt;
user	0m0.952s&lt;br /&gt;
sys	0m0.012s&lt;br /&gt;
gethin@gethin-desktop:~$ time ./vectorised.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.116s&lt;br /&gt;
user	0m0.096s&lt;br /&gt;
sys	0m0.020s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
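&lt;br /&gt;
Timing whole scripts with '''time''' also counts starting python and importing numpy. The standard '''timeit''' module can time just the function calls. In this sketch, the function names are hypothetical. '''np.where''' is used as the vectorised form in place of the boolean-mask assignment above, so it returns a new array rather than modifying its input. Absolute timings will vary from machine to machine.&lt;br /&gt;
&lt;br /&gt;
```python
import timeit

import numpy as np

def zero_small_loop(arr):
    """Loop version: copy arr, then zero every value below 0.5."""
    out = arr.copy()
    for i, val in enumerate(out):
        if val >= 0.5:
            continue  # keep values of 0.5 and above
        out[i] = 0
    return out

def zero_small_vectorised(arr):
    """Vectorised version: one numpy call, no Python-level loop."""
    return np.where(arr >= 0.5, arr, 0.0)

arr = np.random.rand(100000)
t_loop = timeit.timeit(lambda: zero_small_loop(arr), number=3)
t_vec = timeit.timeit(lambda: zero_small_vectorised(arr), number=3)
print('loop: %.3f s, vectorised: %.3f s' % (t_loop, t_vec))
```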
&lt;br /&gt;
For some more tips on writing faster python code, and examples of how to use one of the python profiler modules, take a look at:&lt;br /&gt;
* https://wiki.python.org/moin/PythonSpeed/PerformanceTips&lt;br /&gt;
* http://technicaldiscovery.blogspot.co.uk/2011/06/speeding-up-python-numpy-cython-and.html&lt;br /&gt;
* http://www.huyng.com/posts/python-performance-analysis/&lt;br /&gt;
* http://www.appneta.com/2012/05/21/profiling-python-performance-lineprof-statprof-cprofile/&lt;br /&gt;
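&lt;br /&gt;
As a starting point, here is a minimal sketch that uses the standard '''cProfile''' and '''pstats''' modules to see where a program spends its time.  The functions '''slow_sum''', '''main''' and '''profile_main''' are hypothetical stand-ins for your own code:&lt;br /&gt;
&lt;br /&gt;
```python
import cProfile
import pstats


def slow_sum(n):
    # Deliberately loop-heavy so that it shows up in the profile
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return [slow_sum(100000) for _ in range(5)]


def profile_main():
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()
    # Sort by cumulative time and print the ten most expensive entries
    stats = pstats.Stats(profiler).sort_stats('cumulative')
    stats.print_stats(10)
    return stats


if __name__ == '__main__':
    profile_main()
```
&lt;br /&gt;
From the command line, '''python -m cProfile -s cumulative myscript.py''' produces a similar report without any changes to the script.&lt;br /&gt;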
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* http://docs.python.org/tutorial/&lt;br /&gt;
* http://wiki.python.org/moin/PythonBooks&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9418</id>
		<title>Python1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9418"/>
		<updated>2014-03-31T10:04:34Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Python as a Glue Languge */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Python for Scientists'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Python.png|thumb|1100px|none|http://xkcd.com/353/]]&lt;br /&gt;
&lt;br /&gt;
With thanks to Simon Metson and Mike Wallace for much of the following material.&lt;br /&gt;
&lt;br /&gt;
=Getting Started on BlueCrystal Phase-2=&lt;br /&gt;
&lt;br /&gt;
After you have logged in, type the following at the command line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module add languages/python-2.7.2.0&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should start up an interactive python session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Python 2.7.2 (default, Aug 25 2011, 10:51:03) &lt;br /&gt;
[GCC 4.3.3] on linux2&lt;br /&gt;
Type &amp;quot;help&amp;quot;, &amp;quot;copyright&amp;quot;, &amp;quot;credits&amp;quot; or &amp;quot;license&amp;quot; for more information.&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where we can type commands at the '''&amp;gt;&amp;gt;&amp;gt;''' prompt.&lt;br /&gt;
&lt;br /&gt;
=Python as a Calculator=&lt;br /&gt;
&lt;br /&gt;
To get started, let's just try a few commands out.  If you type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print &amp;quot;Hello!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hello!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you try:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print 5 + 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far so simple!  Here is a copy of a session containing a few more commands where we've set the values of some variables and also defined and run our own function: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; five = 5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; neuf = 9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print five + neuf&lt;br /&gt;
14&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def say_hello():&lt;br /&gt;
...     print &amp;quot;Hello, world!&amp;quot;&lt;br /&gt;
... # hit return here &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; say_hello()&lt;br /&gt;
Hello, world!&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can exit an interactive session at any time by typing '''Ctrl-D'''.&lt;br /&gt;
&lt;br /&gt;
=Getting Help=&lt;br /&gt;
&lt;br /&gt;
One of the good things about Python is that it has lots of useful online documentation.  ([[A_Good_Read|There are good books on the language too]].)  For example, take a look at: http://docs.python.org/.  You can also type '''help()''' at the interpreter prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; help()&lt;br /&gt;
&lt;br /&gt;
Welcome to Python 2.7!  This is the online help utility.&lt;br /&gt;
&lt;br /&gt;
If this is your first time using Python, you should definitely check out&lt;br /&gt;
the tutorial on the Internet at http://docs.python.org/tutorial/.&lt;br /&gt;
&lt;br /&gt;
Enter the name of any module, keyword, or topic to get help on writing&lt;br /&gt;
Python programs and using Python modules.  To quit this help utility and&lt;br /&gt;
return to the interpreter, just type &amp;quot;quit&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; keywords&lt;br /&gt;
&lt;br /&gt;
Here is a list of the Python keywords.  Enter any keyword to get more help.&lt;br /&gt;
&lt;br /&gt;
and                 elif                if                  print&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; if&lt;br /&gt;
The ``if`` statement&lt;br /&gt;
********************&lt;br /&gt;
&lt;br /&gt;
The ``if`` statement is used for conditional execution:&lt;br /&gt;
&lt;br /&gt;
   if_stmt ::= &amp;quot;if&amp;quot; expression &amp;quot;:&amp;quot; suite&lt;br /&gt;
               ( &amp;quot;elif&amp;quot; expression &amp;quot;:&amp;quot; suite )*&lt;br /&gt;
               [&amp;quot;else&amp;quot; &amp;quot;:&amp;quot; suite]&lt;br /&gt;
&lt;br /&gt;
It selects exactly one of the suites by evaluating the expressions one&lt;br /&gt;
by one until one is found to be true...&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; quit&lt;br /&gt;
&lt;br /&gt;
You are now leaving help and returning to the Python interpreter.&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Making a Script=&lt;br /&gt;
&lt;br /&gt;
An interactive session can be fun and useful for trying things out.  However--to save our fingers--we will typically want to execute a series of commands as a script, created using your favourite text editor.  Here are the contents of an example script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;Hello, from a python script!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Save this as '''myscript.py''' and ensure that it is executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod u+x myscript.py&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and now you can run it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[ggdagw@bigblue4 ~]$ ./myscript.py &lt;br /&gt;
Hello, from a python script!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Python and Whitespace=&lt;br /&gt;
&lt;br /&gt;
Love it or hate it, Python uses whitespace as part of its syntax: the indentation of a line determines which block it belongs to.  (The alternative is to mark blocks with explicit delimiters, such as the curly braces used in C.  Pick your poison.)  Indentation is therefore key in creating a valid python script.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
    print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will work, but:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
 print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
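will not, because the body of the '''else''' clause is not indented at all (the single-space indent under the '''if''' is, perhaps surprisingly, legal):&lt;br /&gt;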
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  File &amp;quot;./myscript.py&amp;quot;, line 7&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
        ^&lt;br /&gt;
IndentationError: expected an indented block&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is therefore a great advantage, when writing a python script, to use a text editor which has a dedicated python mode, such as '''emacs''', and will actively help you to keep your indentation correct.  See http://wiki.python.org/moin/PythonEditors for an extensive list.&lt;br /&gt;
&lt;br /&gt;
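You can check the indentation rule without creating a script file.  The built-in '''compile()''' function parses python source held in a string, and raises '''IndentationError''' for code laid out like the second example above.  A small sketch (the helper name '''indentation_ok''' is hypothetical):&lt;br /&gt;
&lt;br /&gt;
```python
def indentation_ok(source):
    # compile() only parses the code; nothing in source is executed
    try:
        compile(source, 'example.py', 'exec')
    except IndentationError:
        return False
    return True


good = 'if True:\n    x = 1\nelse:\n    x = 2\n'
bad = 'if True:\n x = 1\nelse:\nx = 2\n'

print(indentation_ok(good))  # True
print(indentation_ok(bad))   # False
```
&lt;br /&gt;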
=Some Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
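* Calculate the volume of a sphere (Hint: V = 4/3*pi*r^3.  In python, use '''**''' for powers, since '''^''' means bitwise XOR.  In Python 2, write '''4.0/3''' rather than '''4/3''', because the latter is integer division and gives 1.  '''pi''' is available as '''math.pi''')&lt;br /&gt;
* Concatenate two strings&lt;br /&gt;
* Write a recursive function to compute Fibonacci numbers (Hint: F(n) = F(n-1) + F(n-2), F(0) = 0 and F(1) = 1)&lt;br /&gt;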
&lt;br /&gt;
=Nuts and Bolts=&lt;br /&gt;
&lt;br /&gt;
==Types==&lt;br /&gt;
&lt;br /&gt;
Python has built-in types including integers, floats, booleans and complex numbers.  It is dynamically typed (you don't need a block of variable declarations at the top of your script, and a name can refer to a value of any type), but it is '''not weakly''' typed: it will not silently convert between incompatible types, as the last command below shows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex = 2 + 0.5j&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex&lt;br /&gt;
(2+0.5j)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.real&lt;br /&gt;
2.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.imag&lt;br /&gt;
0.5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name = 'fred'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; lucky = 7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name + lucky&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: cannot concatenate 'str' and 'int' objects&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
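&lt;br /&gt;
Because python will not convert between these types for you, you need to say explicitly what you mean, for example with the built-in '''str()''' and '''int()''' functions.  A short sketch, reusing the names from the session above:&lt;br /&gt;
&lt;br /&gt;
```python
name = 'fred'
lucky = 7

# Convert the integer to a string before concatenating
label = name + str(lucky)
print(label)  # fred7

# Or convert a numeric string to an integer before adding
total = int('35') + lucky
print(total)  # 42

# String formatting also performs the conversion for you
message = '%s has lucky number %d' % (name, lucky)
print(message)  # fred has lucky number 7
```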
&lt;br /&gt;
==Strings==&lt;br /&gt;
&lt;br /&gt;
The eagle-eyed will have spotted in a previous example that we could ask for the length of a character string, straight off the bat.  No need to write a counting routine ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print len(message)&lt;br /&gt;
11&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a character string.  The slice '''[:5]''' runs from the start of the string up to, but not including, index 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print message[:5]&lt;br /&gt;
happy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since a string is an '''object''' (in the object-oriented programming sense of the word, but more of that another time...) we can call a number of methods that operate on it.  A selected sample includes:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.find(sub) || Finds the first occurrence of the given substring&lt;br /&gt;
|-&lt;br /&gt;
|| s.islower() || Checks whether all cased characters in '''s''' are lowercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.upper() || Returns '''s''' converted to uppercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.strip() || Removes leading and trailing whitespace&lt;br /&gt;
|-&lt;br /&gt;
|| s.replace(old,new) || Replaces substring '''old''' with '''new'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.split([sep]) || Splits '''s''' using the (optional) delimiter '''sep''', or runs of whitespace by default.  Returns a list&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
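&lt;br /&gt;
A short sketch of the string methods listed above in action.  The sample string is arbitrary, and the comments show the value each call returns:&lt;br /&gt;
&lt;br /&gt;
```python
s = '  Happy days, happy nights  '

stripped = s.strip()                        # 'Happy days, happy nights'
position = stripped.find('happy')           # 12: find() is case-sensitive
missing = stripped.find('sad')              # -1 when the substring is absent
shouting = stripped.upper()                 # 'HAPPY DAYS, HAPPY NIGHTS'
all_lower = stripped.islower()              # False, because of the capital H
lazier = stripped.replace('happy', 'lazy')  # 'Happy days, lazy nights'
halves = stripped.split(', ')               # ['Happy days', 'happy nights']
words = stripped.split()                    # ['Happy', 'days,', 'happy', 'nights']
```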
&lt;br /&gt;
==Lists and Tuples==&lt;br /&gt;
&lt;br /&gt;
An example of a list is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping = ['bread', 'marmalade', 'milk', 'tea']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we can inquire about the length of that using the same function as before:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
len(shopping)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a list, as we did with a string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and even replace a portion of the list by assigning to a slice:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping[0:2] = ['bagels', 'jam']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since a list is also an object, we have more handy methods, including:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.append(x) || Appends a new element '''x''' to the end of '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.count(x) || Returns the number of occurrences of '''x''' in '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.reverse() || Reverses items of '''s''' in place&lt;br /&gt;
|-&lt;br /&gt;
|| s.sort([compfunc]) || Sorts items of '''s''' in place.  '''compfunc''' is an optional comparison function (Python 2 only; the '''key''' argument, e.g. '''s.sort(key=len)''', also works in Python 3)&lt;br /&gt;
|}&lt;br /&gt;
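&lt;br /&gt;
A short sketch of the list methods above, continuing with the shopping list.  Note that these methods change the list in place and return '''None''', so writing '''shopping = shopping.sort()''' would lose the list:&lt;br /&gt;
&lt;br /&gt;
```python
shopping = ['bread', 'marmalade', 'milk', 'tea']

shopping.append('milk')          # ['bread', 'marmalade', 'milk', 'tea', 'milk']
n_milk = shopping.count('milk')  # 2

shopping.reverse()  # now ['milk', 'tea', 'milk', 'marmalade', 'bread']
shopping.sort()     # now ['bread', 'marmalade', 'milk', 'milk', 'tea']

# The key argument sorts by a computed value, here the word length;
# python's sort is stable, so equal-length words keep their order
shopping.sort(key=len)  # now ['tea', 'milk', 'milk', 'bread', 'marmalade']

# sorted() leaves the original alone and returns a new list
alphabetical = sorted(shopping)  # ['bread', 'marmalade', 'milk', 'milk', 'tea']
```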
&lt;br /&gt;
Tuples are very similar to lists and support many of the same operations (indexing, slicing, concatenation etc.) but differ in that they are '''not mutable''' after creation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple = ('fred', 'ginger', 7, 2.5)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist = ['fred', 'ginger', 7, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist[2] = 8&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mylist&lt;br /&gt;
['fred', 'ginger', 8, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mytuple[2]    &lt;br /&gt;
7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple[2] = 8&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: 'tuple' object does not support item assignment&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
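A '''list comprehension''' builds a new list from an existing one in a single expression.  Here we double every number in the list that is smaller than 20:&lt;br /&gt;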
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; numbers = [12, 3, 90, 40, 52, 11, 10]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled = [number * 2 for number in numbers if number &amp;lt; 20]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled&lt;br /&gt;
[24, 6, 22, 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
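&lt;br /&gt;
To see what the comprehension does, here is the same calculation written both ways, as a comprehension and as an explicit loop.  (The test '''20 &amp;gt; number''' is simply '''number &amp;lt; 20''' written the other way round.)&lt;br /&gt;
&lt;br /&gt;
```python
numbers = [12, 3, 90, 40, 52, 11, 10]

# Comprehension: [expression for item in iterable if condition]
from_comprehension = [number * 2 for number in numbers if 20 > number]

# The same calculation as an explicit loop
from_loop = []
for number in numbers:
    if 20 > number:  # keep only the numbers below 20
        from_loop.append(number * 2)

print(from_comprehension == from_loop)  # True
```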
&lt;br /&gt;
==Dictionaries==&lt;br /&gt;
&lt;br /&gt;
A dictionary is an associative array or hash table, containing '''key-value''' pairs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
mydict = {'thomas':'blue', 'james':'red', 'henry':'green'}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mydict['james']&lt;br /&gt;
red&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking values up by a meaningful key, rather than by an arbitrary index into a list, makes for much more readable and intuitive code.&lt;br /&gt;
&lt;br /&gt;
Some example dictionary methods are:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| m.keys() || Returns a list of the keys in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m.items() || Returns a list of the (key,value) pairs in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m[k] = x || Sets m[k] to x&lt;br /&gt;
|-&lt;br /&gt;
|| m.update(b) || Adds objects from dictionary '''b''' to '''m'''&lt;br /&gt;
|}&lt;br /&gt;
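&lt;br /&gt;
A short sketch of these dictionary methods, plus '''get()''' and the '''in''' operator, which are also handy.  The extra engine names are made up for illustration:&lt;br /&gt;
&lt;br /&gt;
```python
mydict = {'thomas': 'blue', 'james': 'red', 'henry': 'green'}

# m[k] = x adds a new entry, or overwrites an existing one
mydict['percy'] = 'green'

# update() merges in another dictionary; 'james' is overwritten
mydict.update({'gordon': 'blue', 'james': 'crimson'})

# Depending on the python version, keys() and items() may not come back
# in a useful order, so sort them when the order matters
names = sorted(mydict.keys())   # ['gordon', 'henry', 'james', 'percy', 'thomas']
pairs = sorted(mydict.items())  # [('gordon', 'blue'), ('henry', 'green'), ...]

# get() returns a default, instead of raising KeyError, for a missing key
colour = mydict.get('edward', 'unknown')  # 'unknown'

# The in operator tests the keys
has_henry = 'henry' in mydict  # True
```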
&lt;br /&gt;
==Control Structures==&lt;br /&gt;
&lt;br /&gt;
Of course, we'll need conditionals and loops etc. to go beyond the simplest of scripts.  Here is an '''if-then-else''', python style:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
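# stand-in functions, so that the example runs&lt;br /&gt;
def birds_sing():&lt;br /&gt;
    print 'tweet tweet!'&lt;br /&gt;
def birds_sleep():&lt;br /&gt;
    print 'zzz...'&lt;br /&gt;
sky = 'blue'  # try 'black' or 'grey' too&lt;br /&gt;
if sky == 'blue':&lt;br /&gt;
    birds_sing()&lt;br /&gt;
elif sky == 'black':&lt;br /&gt;
    birds_sleep()&lt;br /&gt;
else:&lt;br /&gt;
    pass  # do nothing&lt;br /&gt;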
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and a classic '''for loop'''.  Note that '''range(1,10)''' starts at 1 and stops ''before'' 10:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
for ii in range(1,10):&lt;br /&gt;
    print ii&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1&lt;br /&gt;
...&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We'll also see a '''while loop''' shoehorned into the next example.&lt;br /&gt;
&lt;br /&gt;
For our control statements, we can use comparison operators such as '''==''', '''!=''', '''&amp;gt;''', '''&amp;lt;''', '''&amp;lt;=''', '''&amp;gt;=''', and logical operators such as '''and''', '''or''' and '''not'''.&lt;br /&gt;
&lt;br /&gt;
==File Input and Output==&lt;br /&gt;
&lt;br /&gt;
Here's some code for printing the contents of a text file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;r&amp;quot;)&lt;br /&gt;
line = fp.readline()&lt;br /&gt;
while line:&lt;br /&gt;
    line = line.strip()&lt;br /&gt;
    print line&lt;br /&gt;
    line = fp.readline()&lt;br /&gt;
fp.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We could open a file for writing with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;w&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
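&lt;br /&gt;
Opening a file with mode '''w''' creates it, or empties it if it already exists.  Rather than calling '''close()''' by hand, a common idiom is the '''with''' statement.  It closes the file automatically at the end of the block, even if an error occurs.  Iterating over the file object reads it one line at a time.  A minimal sketch (the file name '''foo.txt''' and its contents are just examples):&lt;br /&gt;
&lt;br /&gt;
```python
lines_out = ['first line', 'second line', 'third line']

# Write: 'w' creates the file, or truncates it if it already exists
with open('foo.txt', 'w') as fp:
    for text in lines_out:
        fp.write(text + '\n')

# Read: iterating over the file object yields one line at a time
lines_in = []
with open('foo.txt', 'r') as fp:
    for line in fp:
        lines_in.append(line.strip())

# The file was closed automatically at the end of each with block
print(lines_in == lines_out)  # True
```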
&lt;br /&gt;
==Command Line Parsing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # We can test on the length of argv&lt;br /&gt;
    if len(sys.argv) &amp;lt; 2:&lt;br /&gt;
        print &amp;quot;usage: to use this script...&amp;quot;&lt;br /&gt;
    else:&lt;br /&gt;
        ii = 0&lt;br /&gt;
        for arg in sys.argv:&lt;br /&gt;
            # (typically) argv[0] is bound to the script name&lt;br /&gt;
            print &amp;quot;arg&amp;quot;, ii, &amp;quot;is:&amp;quot;, arg&lt;br /&gt;
            ii = ii+1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py&lt;br /&gt;
usage: to use this script...&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py fred ginger&lt;br /&gt;
arg 0 is: ./cmdline.py&lt;br /&gt;
arg 1 is: fred&lt;br /&gt;
arg 2 is: ginger&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
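&lt;br /&gt;
For anything beyond a couple of arguments, the standard '''argparse''' module (available from Python 2.7) can check the arguments and generate the usage message for you.  A minimal sketch; the argument names '''names''' and '''--verbose''' are purely illustrative:&lt;br /&gt;
&lt;br /&gt;
```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description='Greet some people.')
    # One or more positional arguments, collected into a list
    parser.add_argument('names', nargs='+', help='names to greet')
    # An optional on/off flag
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='print extra information')
    # With argv=None, argparse reads sys.argv[1:]
    return parser.parse_args(argv)


if __name__ == '__main__':
    args = parse_args()
    for name in args.names:
        print('hello, ' + name)
    if args.verbose:
        print('greeted %d people' % len(args.names))
```
&lt;br /&gt;
If you run such a script with no names, '''argparse''' prints a usage message and exits with an error.  The '''-h''' option prints a help message built from the '''help''' strings.&lt;br /&gt;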
&lt;br /&gt;
=Object Oriented Programming in Python=&lt;br /&gt;
&lt;br /&gt;
Here is an example of using a class in python:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
class Radio:&lt;br /&gt;
    &amp;quot;A simple radio&amp;quot;&lt;br /&gt;
    def __init__(self,freq=0.0,name=&amp;quot;&amp;quot;):&lt;br /&gt;
        &amp;quot;Constructor method&amp;quot;&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
        self.name=name&lt;br /&gt;
    def tune(self,freq):&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
    def tuned_to(self):&lt;br /&gt;
        print self.name, &amp;quot;tuned to:&amp;quot;, self.__frequency&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # declare two radio instances&lt;br /&gt;
    car = Radio(name=&amp;quot;car&amp;quot;)&lt;br /&gt;
    kitchen = Radio(91.5,&amp;quot;kitchen&amp;quot;)&lt;br /&gt;
    # call some methods&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    kitchen.tuned_to()&lt;br /&gt;
    car.tune(89.3)&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    # Docstrings--double quotes at the top of the class:                        &lt;br /&gt;
    print car.__doc__&lt;br /&gt;
    # NB members not private by default:&lt;br /&gt;
    print car.name&lt;br /&gt;
    # BUT leading double underscores trigger name mangling&lt;br /&gt;
    # (stored as _Radio__frequency), so this lookup fails&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running the script gives us:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
car tuned to: 0.0&lt;br /&gt;
kitchen tuned to: 91.5&lt;br /&gt;
car tuned to: 89.3&lt;br /&gt;
A simple radio&lt;br /&gt;
car&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;./foo.py&amp;quot;, line 27, in &amp;lt;module&amp;gt;&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
AttributeError: Radio instance has no attribute '__frequency'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
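&lt;br /&gt;
The double underscores do not make '''__frequency''' truly private.  Inside the class, python renames the attribute to '''_Radio__frequency''' (''name mangling''), mainly to avoid accidental clashes with names in subclasses.  A short sketch demonstrating this with a cut-down version of the '''Radio''' class:&lt;br /&gt;
&lt;br /&gt;
```python
class Radio(object):
    'A simple radio'

    def __init__(self, freq=0.0, name=''):
        self.__frequency = freq  # actually stored as _Radio__frequency
        self.name = name

    def tune(self, freq):
        self.__frequency = freq


car = Radio(name='car')
car.tune(89.3)

# The mangled name is still reachable from outside the class...
print(car._Radio__frequency)  # 89.3
# ...and it is what appears in the instance's attribute dictionary
print(sorted(vars(car)))  # ['_Radio__frequency', 'name']
```
&lt;br /&gt;
By convention, a single leading underscore (e.g. '''_frequency''') marks an attribute as internal, without any renaming.&lt;br /&gt;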
&lt;br /&gt;
=Python for Shell Scripting=&lt;br /&gt;
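&lt;br /&gt;
The '''subprocess''' module lets a python script run other programs, much as a shell script would.  For example, to run '''ls -l''':&lt;br /&gt;
&lt;br /&gt;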
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from subprocess import call&lt;br /&gt;
call([&amp;quot;ls&amp;quot;, &amp;quot;-l&amp;quot;])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
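&lt;br /&gt;
'''call()''' returns the command's exit status, while the command's output goes straight to the terminal.  To capture the output inside your script, use '''check_output()''' (available from Python 2.7).  It returns the output instead, and raises '''CalledProcessError''' if the command fails.  A minimal sketch; to stay portable, it runs the current python interpreter (via '''sys.executable''') rather than a shell command:&lt;br /&gt;
&lt;br /&gt;
```python
import subprocess
import sys

# Run a command and get its exit status (0 usually means success)
status = subprocess.call([sys.executable, '-c', 'print(6 * 7)'])

# Run a command and capture what it prints (as bytes in Python 3)
output = subprocess.check_output([sys.executable, '-c', 'print(6 * 7)'])
answer = int(output.strip())  # 42

# A failing command raises CalledProcessError instead of returning
try:
    subprocess.check_output([sys.executable, '-c', 'import sys; sys.exit(3)'])
except subprocess.CalledProcessError as err:
    print('command failed with exit status %d' % err.returncode)
```
&lt;br /&gt;
Passing the command as a list of arguments, as here, runs the program directly without involving a shell.&lt;br /&gt;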
&lt;br /&gt;
=Python as a Glue Language=&lt;br /&gt;
&lt;br /&gt;
* Calling R from python is possible using: http://rpy.sourceforge.net/index.html.&lt;br /&gt;
* Calling MATLAB from python: http://mlabwrap.sourceforge.net.&lt;br /&gt;
* With SWIG you can make many bindings, including Python to C and C++: http://www.swig.org/.&lt;br /&gt;
* Or if Fortran is more your cup of tea, you can use f2py (now distributed as part of numpy): http://cens.ioc.ee/projects/f2py2e/.&lt;br /&gt;
* There are many more examples.&lt;br /&gt;
&lt;br /&gt;
=Using Packages=&lt;br /&gt;
&lt;br /&gt;
Python packages are great because they provide us with a whole lot of extra functionality--above and beyond the core language--that we didn't have to write and debug ourselves.&lt;br /&gt;
&lt;br /&gt;
Let's walk through a simple example using a package.  At an interactive prompt type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from random import randint&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will give us access to the '''randint(x,y)''' function, which returns a randomly chosen integer from the given range [x,y]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
4&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
1&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OK, so far so good.  One thing to note is that the above '''import''' statement has drawn the name ''randint'' into our current '''namespace'''.  What if we had already defined a function named ''randint''?  The import would silently replace it.  To protect ourselves from this kind of problem, there are several variants of the import statement.&lt;br /&gt;
&lt;br /&gt;
With a plain '''import''', the functions are placed in a namespace with the same name as the package, and to call them we have to prefix them with that namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random.randint(0,10)&lt;br /&gt;
6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we wish, we can take a little more control and choose the name of the namespace ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random as rnd&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; rnd.randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another--more 'devil-may-care'--approach is to do away with the separate namespace and pull everything from a given package into the current namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import *&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random()&lt;br /&gt;
0.3172268098313996&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(The '''random()''' function returns a randomly selected floating point number in the range [0, 1)--that is, between 0 and 1, including 0.0 but always smaller than 1.0.)&lt;br /&gt;
&lt;br /&gt;
==A Namespace Collision==&lt;br /&gt;
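&lt;br /&gt;
In the session below, '''from random import randint''' silently replaces our own ''randint'' function with the one from the '''random''' package, so our original call no longer works:&lt;br /&gt;
&lt;br /&gt;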
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def randint():&lt;br /&gt;
...     print &amp;quot;dummy function&amp;quot;&lt;br /&gt;
... &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
dummy function&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import randint&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: randint() takes exactly 3 arguments (1 given)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
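&lt;br /&gt;
The plain '''import random''' form (or '''import random as rnd''') avoids this collision, because the package's '''randint''' stays inside its own namespace.  A sketch, where our own '''randint''' is a hypothetical stand-in:&lt;br /&gt;
&lt;br /&gt;
```python
import random


def randint():
    # Our own (hypothetical) function, unrelated to the random package
    return 'dummy function'


mine = randint()                # 'dummy function'
theirs = random.randint(0, 10)  # an integer from 0 to 10 inclusive
```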
&lt;br /&gt;
=Databases=&lt;br /&gt;
&lt;br /&gt;
==Simple Databases==&lt;br /&gt;
&lt;br /&gt;
Python's standard library provides access to some simple databases.  In Python 2, the '''bsddb''' module allows you to access the widely used '''Berkeley DB''' database from your python code.&lt;br /&gt;
&lt;br /&gt;
The interface to the database provided by this module is very similar to the way in which we access a dictionary.  First, let's populate a database:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import bsddb&lt;br /&gt;
d = bsddb.btopen('engines.db')&lt;br /&gt;
d['thomas'] = 'blue'&lt;br /&gt;
d['james'] = 'red'&lt;br /&gt;
d['henry'] = 'green'&lt;br /&gt;
d.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now let's open the database again and query its contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d = bsddb.btopen('engines.db')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['henry', 'james', 'thomas']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.first()&lt;br /&gt;
('henry', 'green')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.last()&lt;br /&gt;
('thomas', 'blue')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour = d['james']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour&lt;br /&gt;
'red'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; del d['henry']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['james', 'thomas']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
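&lt;br /&gt;
Note that '''bsddb''' was deprecated in Python 2.6 and removed in Python 3.  If you need similar dictionary-like persistence there, the standard '''shelve''' module is one alternative (values can be any picklable python object).  A minimal sketch; the file name '''engines_shelf''' is just an example, and the exact files created on disk depend on the underlying dbm library:&lt;br /&gt;
&lt;br /&gt;
```python
import shelve

# Populate: a shelf behaves much like a dictionary with string keys
d = shelve.open('engines_shelf')
d['thomas'] = 'blue'
d['james'] = 'red'
d['henry'] = 'green'
d.close()

# Re-open and query
d = shelve.open('engines_shelf')
names = sorted(d.keys())      # ['henry', 'james', 'thomas']
colour = d['james']           # 'red'
del d['henry']
remaining = sorted(d.keys())  # ['james', 'thomas']
d.close()
```
&lt;br /&gt;
Unlike a B-tree opened with '''bsddb.btopen''', a shelf keeps no particular key order, hence the calls to '''sorted()'''.  It also has no '''first()''' or '''last()''' methods.&lt;br /&gt;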
&lt;br /&gt;
==Relational Databases==&lt;br /&gt;
&lt;br /&gt;
Relational databases give us more oomph.  '''SQLite''' is a useful relational database to consider.  It is light, requiring hardly anything in terms of setup or management, yet it still understands queries formulated in SQL, and python includes the '''sqlite3''' module for working with it.  This makes it useful for creating relatively simple examples of SQL access to a database in python, and a stepping stone toward more powerful database packages.&lt;br /&gt;
&lt;br /&gt;
Here is a script which will create a table called '''planets''' in the file '''pytest.db''' and populate it with details of the planets in our solar system:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
 &lt;br /&gt;
# create a table&lt;br /&gt;
cursor.execute(&amp;quot;&amp;quot;&amp;quot;CREATE TABLE planets&lt;br /&gt;
                  (Id INT, Name TEXT, Diameter REAL, &lt;br /&gt;
                   Mass REAL, Orbital_Period REAL)&amp;quot;&amp;quot;&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# insert a single record&lt;br /&gt;
cursor.execute(&amp;quot;INSERT INTO planets VALUES(1,'Mercury',0.382,0.06,0.24)&amp;quot;)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
 &lt;br /&gt;
# insert multiple records&lt;br /&gt;
other_planets = [(2,'Venus',0.949,0.82,0.72),&lt;br /&gt;
                 (3,'Earth',1.0,1.0,1.0),&lt;br /&gt;
                 (4,'Mars',0.532,0.11,1.52),&lt;br /&gt;
                 (5,'Jupiter',11.209,317.8,5.20),&lt;br /&gt;
                 (6,'Saturn',9.449,95.2,9.54),&lt;br /&gt;
                 (7,'Uranus',4.007,14.6,19.22),&lt;br /&gt;
                 (8,'Neptune',3.883,17.2,30.06),&lt;br /&gt;
                 (9,'Pluto',0.18,0.002,248.09)]&lt;br /&gt;
cursor.executemany(&amp;quot;INSERT INTO planets VALUES (?,?,?,?,?)&amp;quot;, other_planets)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
&lt;br /&gt;
# delete a record&lt;br /&gt;
sql = &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
DELETE FROM planets&lt;br /&gt;
WHERE Name = 'Pluto'&lt;br /&gt;
&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
cursor.execute(sql)  # poor old pluto! &lt;br /&gt;
conn.commit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And here is a short example script showing a couple of ways to interrogate the database: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the records in the table, ordered by Name:\n&amp;quot;&lt;br /&gt;
for row in cursor.execute(&amp;quot;SELECT rowid, * FROM planets ORDER BY Name&amp;quot;):&lt;br /&gt;
    print row&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;\n&amp;quot;&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;All the planets with a mass greater than or equal to that of Earth:\n&amp;quot;&lt;br /&gt;
sql = &amp;quot;SELECT * FROM planets WHERE Mass&amp;gt;=?&amp;quot;&lt;br /&gt;
cursor.execute(sql, (1.0,))  # parameters are passed as a sequence&lt;br /&gt;
for row in cursor.fetchall():  # or use fetchone()&lt;br /&gt;
    print row&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The results of running the script are:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
All the records in the table, ordered by Name:&lt;br /&gt;
&lt;br /&gt;
(3, 3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, 5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(4, 4, u'Mars', 0.53200000000000003, 0.11, 1.52)&lt;br /&gt;
(1, 1, u'Mercury', 0.38200000000000001, 0.059999999999999998, 0.23999999999999999)&lt;br /&gt;
(8, 8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
(6, 6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, 7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(2, 2, u'Venus', 0.94899999999999995, 0.81999999999999995, 0.71999999999999997)&lt;br /&gt;
&lt;br /&gt;
All the planets with a mass greater than or equal to that of Earth:&lt;br /&gt;
&lt;br /&gt;
(3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
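&lt;br /&gt;
Passing values through '''?''' placeholders, as in the second query above, lets the '''sqlite3''' module handle the quoting for you.  This is safer than pasting values into the SQL string yourself.  A self-contained sketch using an in-memory database (''':memory:''') with a cut-down '''planets''' table; the helper name '''planets_at_least''' is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3


def planets_at_least(conn, min_mass):
    # min_mass is bound to the ? placeholder, never pasted into the SQL text
    sql = 'SELECT Name FROM planets WHERE Mass >= ? ORDER BY Mass'
    return [row[0] for row in conn.execute(sql, (min_mass,))]


conn = sqlite3.connect(':memory:')  # nothing is written to disk
conn.execute('CREATE TABLE planets (Id INT, Name TEXT, Mass REAL)')

# Used as a context manager, the connection commits if the block succeeds
# and rolls back if an exception is raised
with conn:
    conn.executemany('INSERT INTO planets VALUES (?, ?, ?)',
                     [(3, 'Earth', 1.0), (4, 'Mars', 0.11),
                      (5, 'Jupiter', 317.8)])

heavy = planets_at_least(conn, 1.0)  # ['Earth', 'Jupiter']
conn.close()
```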
&lt;br /&gt;
For more information on using SQLite with Python, see, e.g.:&lt;br /&gt;
* http://zetcode.com/db/sqlitepythontutorial/&lt;br /&gt;
* http://www.blog.pythonlibrary.org/2012/07/18/python-a-simple-step-by-step-sqlite-tutorial/&lt;br /&gt;
&lt;br /&gt;
You can also connect to a MySQL database from python using, e.g. the [http://mysql-python.sourceforge.net/ python-mysqldb] package.  A snippet of python code for connecting to a database is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
import MySQLdb&lt;br /&gt;
&lt;br /&gt;
conn = MySQLdb.connect(host=&amp;quot;localhost&amp;quot;,   # your host, usually localhost&lt;br /&gt;
                       user=&amp;quot;gethin&amp;quot;,      # your username&lt;br /&gt;
                       passwd=&amp;quot;changeme&amp;quot;, # your password&lt;br /&gt;
                       db=&amp;quot;menagerie&amp;quot;)    # name of the database&lt;br /&gt;
&lt;br /&gt;
# Create a cursor object, as before with SQLite&lt;br /&gt;
cur = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
# and then you can submit your SQL command and fetch the results:&lt;br /&gt;
cur.execute(&amp;quot;SELECT * FROM YOUR_TABLE_NAME&amp;quot;)&lt;br /&gt;
for row in cur.fetchall():&lt;br /&gt;
    print(row)&lt;br /&gt;
&lt;br /&gt;
conn.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
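&lt;br /&gt;
Both '''sqlite3''' and '''MySQLdb''' follow Python's database API (DB-API 2.0), so the cursor calls are the same.  One difference to watch is the placeholder style: '''sqlite3''' uses '''?''' whereas '''MySQLdb''' uses '''%s'''.  As a sketch, the hypothetical helper '''fetch_all()''' below should work with either kind of connection; it is demonstrated with an in-memory SQLite database (and a made-up '''pets''' table) because that needs no server:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import sqlite3&lt;br /&gt;
&lt;br /&gt;
def fetch_all(conn, sql, params=()):&lt;br /&gt;
    # Run one query on a DB-API 2.0 connection and return all the rows&lt;br /&gt;
    cur = conn.cursor()&lt;br /&gt;
    try:&lt;br /&gt;
        cur.execute(sql, params)&lt;br /&gt;
        return cur.fetchall()&lt;br /&gt;
    finally:&lt;br /&gt;
        cur.close()  # release the cursor even if the query fails&lt;br /&gt;
&lt;br /&gt;
conn = sqlite3.connect(':memory:')&lt;br /&gt;
conn.execute('CREATE TABLE pets (name TEXT, species TEXT)')&lt;br /&gt;
conn.executemany('INSERT INTO pets VALUES (?, ?)',&lt;br /&gt;
                 [('Fluffy', 'cat'), ('Buffy', 'dog')])&lt;br /&gt;
&lt;br /&gt;
# With MySQLdb the placeholder would be %s instead of ?&lt;br /&gt;
rows = fetch_all(conn, 'SELECT name FROM pets WHERE species = ?', ('cat',))&lt;br /&gt;
print(rows)  # [('Fluffy',)]&lt;br /&gt;
conn.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;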
&lt;br /&gt;
=Numpy=&lt;br /&gt;
&lt;br /&gt;
OK, let's move on to Python's numerical processing capabilities.  We will start by looking at the '''numpy''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from numpy import *&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now that we have access to the functions from '''numpy''', let's create an array.  '''Note that a numpy array is an object of a different type from Python's built-in list'''.  A simple approach is to use the '''array''' function, passing it a (nested) list of values.  For example, we might enter:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = array([[1.0,0.0,0.0],[0.0,1.0,0.0],[0.0,0.0,1.0]])&lt;br /&gt;
b = array([[1,2,3],[4,5,6],[7,8,9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 1.,  0.,  0.],&lt;br /&gt;
       [ 0.,  1.,  0.],&lt;br /&gt;
       [ 0.,  0.,  1.]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b        &lt;br /&gt;
array([[1, 2, 3],&lt;br /&gt;
       [4, 5, 6],&lt;br /&gt;
       [7, 8, 9]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; transpose(b)&lt;br /&gt;
array([[1, 4, 7],&lt;br /&gt;
       [2, 5, 8],&lt;br /&gt;
       [3, 6, 9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
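&lt;br /&gt;
To see the difference between the two types concretely, the short sketch below (assuming Python 3 and numpy) compares a built-in list with a numpy array made from it: the '''+''' operator joins lists end to end but adds arrays element by element, and all the elements of an array share a single data type ('''dtype'''):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
lst = [1, 2, 3]        # a built-in Python list&lt;br /&gt;
arr = np.array(lst)    # a numpy array made from it&lt;br /&gt;
print(type(lst).__name__, type(arr).__name__)   # list ndarray&lt;br /&gt;
&lt;br /&gt;
doubled_list = lst + lst    # + joins lists end to end&lt;br /&gt;
doubled_arr = arr + arr     # + adds arrays element by element&lt;br /&gt;
print(doubled_list)            # [1, 2, 3, 1, 2, 3]&lt;br /&gt;
print(doubled_arr.tolist())    # [2, 4, 6]&lt;br /&gt;
&lt;br /&gt;
# All the elements of an array share a single data type&lt;br /&gt;
print(np.array([1.0, 0.0]).dtype)   # float64&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;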
&lt;br /&gt;
Given an array, we may inquire about its shape:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
print(a.shape)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we are told that it is a 2-dimensional array (i.e. an array of rank 2) and that the length of both dimensions is 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
(3, 3)&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
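&lt;br /&gt;
Besides '''shape''', a few related attributes are handy when inquiring about an array.  A quick sketch, using '''identity(3)''' to recreate the same values as '''a''' above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
a = np.identity(3)   # the same values as the array a above&lt;br /&gt;
print(a.shape)       # (3, 3)&lt;br /&gt;
print(a.ndim)        # 2: the number of dimensions (the &amp;quot;rank&amp;quot;)&lt;br /&gt;
print(a.size)        # 9: the total number of elements&lt;br /&gt;
print(len(a))        # 3: the length of the first dimension only&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;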
&lt;br /&gt;
We can also apply operators to array objects.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a = a * 9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 9.,  0.,  0.],&lt;br /&gt;
       [ 0.,  9.,  0.],&lt;br /&gt;
       [ 0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
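'''Note, however, that most operations on numpy arrays are done element-wise''', which is '''different from the linear algebra operations that you may have been expecting.'''  For example, '''a * b''' multiplies corresponding elements of two arrays; for the matrix product, use numpy's '''dot()''' function.  Further linear algebra routines are provided by the '''scipy''' package (see below).&lt;br /&gt;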
&lt;br /&gt;
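To make the distinction concrete, here is a short sketch contrasting element-wise multiplication with the matrix product computed by numpy's '''dot()''' function (with Python 3.5+ and numpy 1.10+, '''a @ b''' gives the same matrix product for 2-D arrays).  The two small arrays are arbitrary examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
a = np.array([[1, 2], [3, 4]])&lt;br /&gt;
b = np.array([[5, 6], [7, 8]])&lt;br /&gt;
&lt;br /&gt;
elementwise = a * b        # multiplies matching elements&lt;br /&gt;
product = np.dot(a, b)     # the matrix product: rows of a times columns of b&lt;br /&gt;
&lt;br /&gt;
print(elementwise.tolist())  # [[5, 12], [21, 32]]&lt;br /&gt;
print(product.tolist())      # [[19, 22], [43, 50]]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;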
Should we so desire, we could re-shape the array.  One way to do this is to set its shape attribute directly:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (1,9)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 9.,  0.,  0.,  0.,  9.,  0.,  0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
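&lt;br /&gt;
Setting '''shape''' changes the array in place.  An alternative is the '''reshape()''' method, which returns a new array object (usually a view sharing the same data) and leaves the original's shape alone; passing '''-1''' for one dimension asks numpy to work that dimension out.  A minimal sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
a = np.arange(9.0)           # 0.0, 1.0, ..., 8.0 with shape (9,)&lt;br /&gt;
row = a.reshape(1, 9)        # a new array object; a keeps its own shape&lt;br /&gt;
square = a.reshape(3, -1)    # -1 lets numpy work out that dimension (3 here)&lt;br /&gt;
&lt;br /&gt;
print(a.shape, row.shape, square.shape)   # (9,) (1, 9) (3, 3)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;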
&lt;br /&gt;
As with the list example, it can be useful to read or change the value of an element (or sub-array) individually.  Let's turn the array back to its rank-2 form and try it out:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (3,3)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1,1] = 777.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(a)&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.    0.]&lt;br /&gt;
 [   0.    0.    9.]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1:,1:] = [[777.0, 777.0],[777.0, 777.0]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(a)&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.  777.]&lt;br /&gt;
 [   0.  777.  777.]]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
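&lt;br /&gt;
One thing to be aware of: a slice such as '''a[1:,1:]''' is a ''view'' onto the original array rather than a copy, so assigning through it changes the original.  If you need an independent sub-array, call '''.copy()'''.  A minimal sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
a = np.zeros((3, 3))&lt;br /&gt;
corner = a[1:, 1:]       # a 2x2 view onto the bottom-right corner of a&lt;br /&gt;
corner[:] = 777.0        # writing through the view changes a itself&lt;br /&gt;
print(a[2, 2])           # 777.0&lt;br /&gt;
&lt;br /&gt;
independent = a[1:, 1:].copy()   # .copy() gives a separate array&lt;br /&gt;
independent[:] = 0.0&lt;br /&gt;
print(a[2, 2])           # still 777.0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;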
&lt;br /&gt;
This is all pretty handy so far, but specifying the value of each element explicitly could become a chore.  Happily, some helper functions exist to give you a head start with some building blocks.  For example, you can use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(b)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = ones((3,2))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(b)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = identity(2)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(b)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; big = resize(b, (6,6))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(big)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The use of '''resize''' in the last example illustrates a useful '''replicating feature'''.&lt;br /&gt;
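&lt;br /&gt;
It is worth knowing exactly what '''resize''' replicates: it repeats the ''flattened'' data, in order, until the new shape is filled, so a 2x2 identity is not simply repeated as a block.  If block repetition is what you want, numpy's '''tile()''' function does that.  A short sketch comparing the second row of each result:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
b = np.identity(2)&lt;br /&gt;
big = np.resize(b, (6, 6))    # repeats the flattened values 1, 0, 0, 1, 1, 0, 0, 1, ...&lt;br /&gt;
tiled = np.tile(b, (3, 3))    # repeats the whole 2x2 block 3 times down and 3 across&lt;br /&gt;
&lt;br /&gt;
print(big[1].tolist())     # [0.0, 1.0, 1.0, 0.0, 0.0, 1.0]&lt;br /&gt;
print(tiled[1].tolist())   # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;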
&lt;br /&gt;
A list of all the functions and operations contained within numpy is: http://scipy.org/Numpy_Example_List.&lt;br /&gt;
&lt;br /&gt;
=Pylab and Matplotlib=&lt;br /&gt;
&lt;br /&gt;
The above examples are quite natty, but we have deliberately kept the array sizes small so that we can print the element values easily.  In practice, you may find that your array sizes are much larger and printing the values to the screen is impractical.  Fear not!  Python has many packages which help you plot your data, so that you can explore it.&lt;br /&gt;
&lt;br /&gt;
Using the pylab plotting interface, we can create a simple line plot and save it to a file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import pylab&lt;br /&gt;
from numpy import arange, pi, cos, sin&lt;br /&gt;
t = arange(0.0, 3.0, 0.01)&lt;br /&gt;
c = cos(2 * pi * t)&lt;br /&gt;
s = sin(2 * pi * t)&lt;br /&gt;
pylab.ylabel('some numbers')&lt;br /&gt;
pylab.xlabel('some more numbers')&lt;br /&gt;
pylab.plot(t, c, 'r', lw=2)&lt;br /&gt;
pylab.plot(t, s, 'b', lw=2)&lt;br /&gt;
pylab.plot(t, c-s, 'gs', lw=2)&lt;br /&gt;
pylab.ylim(-1.5, 1.5)&lt;br /&gt;
pylab.title('sin and cos functions')&lt;br /&gt;
pylab.savefig('curves', dpi=300)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The resulting '''curves.png''' looks like:&lt;br /&gt;
&lt;br /&gt;
[[Image:Curves.png|thumb|600px|none|Some nice curves]]&lt;br /&gt;
&lt;br /&gt;
You can view .png images from the Linux command line (including on BlueCrystal) using, e.g., ImageMagick's '''display''' command: '''display -resize 1000 curves.png''' &lt;br /&gt;
&lt;br /&gt;
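On a machine without a graphical display, such as a cluster login node, one option is to select matplotlib's non-interactive '''Agg''' backend before importing '''pyplot'''.  The sketch below redraws the sine and cosine curves using pyplot's object-oriented interface ('''fig''' and '''ax''' objects), which scales better to figures with several panels; the output file name is only an example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import os&lt;br /&gt;
import matplotlib&lt;br /&gt;
matplotlib.use('Agg')            # a non-interactive backend: no display is needed&lt;br /&gt;
import matplotlib.pyplot as plt&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
t = np.arange(0.0, 3.0, 0.01)&lt;br /&gt;
fig, ax = plt.subplots()&lt;br /&gt;
ax.plot(t, np.cos(2 * np.pi * t), 'r', lw=2, label='cos')&lt;br /&gt;
ax.plot(t, np.sin(2 * np.pi * t), 'b', lw=2, label='sin')&lt;br /&gt;
ax.set_xlabel('some more numbers')&lt;br /&gt;
ax.set_ylabel('some numbers')&lt;br /&gt;
ax.set_ylim(-1.5, 1.5)&lt;br /&gt;
ax.set_title('sin and cos functions')&lt;br /&gt;
ax.legend()&lt;br /&gt;
&lt;br /&gt;
out_file = 'curves-oo.png'       # a hypothetical output file name&lt;br /&gt;
fig.savefig(out_file, dpi=100)&lt;br /&gt;
plt.close(fig)                   # free the memory used by the figure&lt;br /&gt;
saved = os.path.exists(out_file)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;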
We can also use Matplotlib directly for more control:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import matplotlib.pyplot as plt&lt;br /&gt;
from numpy import arange, meshgrid, sqrt, sinc, pi&lt;br /&gt;
x = arange(-5,10)&lt;br /&gt;
y = arange(-4,11)&lt;br /&gt;
X, Y = meshgrid(x,y)    # X varies along the columns, Y along the rows&lt;br /&gt;
R = sqrt(X**2 + Y**2)&lt;br /&gt;
Z = sinc(R/pi)          # numpy's sinc(t) is sin(pi*t)/(pi*t), so this is sin(R)/R (1 at R=0)&lt;br /&gt;
plt.figure()&lt;br /&gt;
plt.contour(X,Y,Z)&lt;br /&gt;
plt.show()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and you should get a window similar to:&lt;br /&gt;
&lt;br /&gt;
[[Image:Sinc-matplotlib-contour.png|thumb|600px|none|A contour map of the sinc function]]&lt;br /&gt;
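&lt;br /&gt;
A common pitfall with contour plots is the layout of the arrays that '''meshgrid''' returns: with its default settings, row ''i'' of '''X''' and '''Y''' corresponds to '''y[i]''' and column ''j'' to '''x[j]''', and '''contour(X, Y, Z)''' expects '''Z''' in the same layout.  The sketch below checks this, and shows that a grid built with '''add.outer(x**2, y**2)''' is the transpose of that layout, so it would need a '''.T''' before being plotted against meshgrid's '''X''' and '''Y''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
x = np.arange(-5, 10)    # 15 x-values&lt;br /&gt;
y = np.arange(-4, 11)    # 15 y-values&lt;br /&gt;
X, Y = np.meshgrid(x, y)&lt;br /&gt;
&lt;br /&gt;
# Default 'xy' indexing: row i follows y[i], column j follows x[j]&lt;br /&gt;
print(X[0, :3].tolist())   # [-5, -4, -3]&lt;br /&gt;
print(Y[:3, 0].tolist())   # [-4, -3, -2]&lt;br /&gt;
&lt;br /&gt;
# outer[j, i] pairs x[j] with y[i]: the transpose of meshgrid's layout&lt;br /&gt;
outer = np.add.outer(x**2, y**2)&lt;br /&gt;
print(np.array_equal(outer.T, X**2 + Y**2))   # True&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;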
&lt;br /&gt;
Perhaps the best next step with matplotlib is to look at the gallery: http://matplotlib.org/gallery.html.&lt;br /&gt;
Just click on a figure and you will get the code used to generate it: a really great resource!&lt;br /&gt;
&lt;br /&gt;
==Input and Output==&lt;br /&gt;
&lt;br /&gt;
The foregoing is all very interesting, but life would be rather dull if you had to re-enter all your data by hand whenever you set to work with Python and numpy.  Therefore we need a means to save data to a file and load it again.  Happily, numpy provides a pair of routines, '''savetxt()''' and '''loadtxt()''', that make this easy for plain text files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import zeros, savetxt, loadtxt&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; data = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; savetxt('myfile.txt', data)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; read_data = loadtxt(&amp;quot;myfile.txt&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Beware of name clashes''' when using '''from ... import *''': older versions of '''pylab''' provided their own text-based '''load()''' and '''save()''' functions (since removed), which would silently shadow numpy's functions of the same name; numpy's own '''save()''' and '''load()''' use a binary '''.npy''' format rather than text.  One way to protect yourself against this is to make use of '''namespaces''': for example, use '''import numpy as np''' and then call '''np.loadtxt(..)'''.&lt;br /&gt;
&lt;br /&gt;
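Text files are easy to inspect, but numpy also has a binary format.  The sketch below, assuming a reasonably recent numpy, contrasts '''numpy.savetxt()'''/'''numpy.loadtxt()''' (human-readable text, here with a comma delimiter and a header line) with '''numpy.save()'''/'''numpy.load()''' (the compact binary '''.npy''' format, which preserves the array's shape and data type exactly).  The file names are only examples, written to a temporary directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import os&lt;br /&gt;
import tempfile&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
data = np.arange(6.0).reshape(2, 3)&lt;br /&gt;
tmpdir = tempfile.mkdtemp()&lt;br /&gt;
&lt;br /&gt;
# Text: one row per line; the header is written as a comment line starting with #&lt;br /&gt;
txt_path = os.path.join(tmpdir, 'myfile.csv')&lt;br /&gt;
np.savetxt(txt_path, data, delimiter=',', header='a,b,c')&lt;br /&gt;
from_txt = np.loadtxt(txt_path, delimiter=',')   # comment lines are skipped&lt;br /&gt;
&lt;br /&gt;
# Binary: compact, and keeps the shape and dtype exactly&lt;br /&gt;
npy_path = os.path.join(tmpdir, 'myfile.npy')&lt;br /&gt;
np.save(npy_path, data)&lt;br /&gt;
from_npy = np.load(npy_path)&lt;br /&gt;
&lt;br /&gt;
print(np.array_equal(data, from_txt), np.array_equal(data, from_npy))  # True True&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;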
=Scipy=&lt;br /&gt;
&lt;br /&gt;
* http://www.scipy.org/&lt;br /&gt;
* ...and good examples at http://scipy-lectures.github.com/intro/scipy.html&lt;br /&gt;
* Many useful features:&lt;br /&gt;
** Integration &amp;amp; Differentiation&lt;br /&gt;
** Optimisation (curve fitting, etc)&lt;br /&gt;
** Fourier transforms&lt;br /&gt;
** Signal processing&lt;br /&gt;
** Statistical algorithms&lt;br /&gt;
** Much, much more...&lt;br /&gt;
* If you know Python you can use SciPy&lt;br /&gt;
&lt;br /&gt;
==An example: Differentiation==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # derivative of x^2 at x=3&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from scipy.misc import derivative&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2, 3)&lt;br /&gt;
6.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # also works with arrays&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import array&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_array = array([1,2,3])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2,my_array)&lt;br /&gt;
array([ 2., 4., 6.])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
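Note that '''derivative()''' has since been deprecated, and it is not available in recent SciPy releases, so on a current installation you may need an alternative.  Google for many more examples pertaining to your favourite numerical procedure!&lt;br /&gt;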
&lt;br /&gt;
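The idea behind '''derivative()''' can be reproduced in a few lines of numpy.  The hypothetical function '''central_difference()''' below is a minimal sketch of the central-difference formula, (f(x+dx) - f(x-dx)) / (2 dx), which is what '''scipy.misc.derivative()''' computes with its default settings (first derivative, three-point stencil, dx=1.0).  For a quadratic such as x**2 the formula is exact for any dx; for other functions a smaller dx is usually more accurate, up to the limits of floating-point rounding:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def central_difference(f, x, dx=1.0):&lt;br /&gt;
    # Approximate f'(x) by (f(x + dx) - f(x - dx)) / (2 * dx); works element-wise on arrays&lt;br /&gt;
    x = np.asarray(x, dtype=float)&lt;br /&gt;
    return (f(x + dx) - f(x - dx)) / (2.0 * dx)&lt;br /&gt;
&lt;br /&gt;
print(central_difference(lambda x: x**2, 3))          # 6.0&lt;br /&gt;
print(central_difference(lambda x: x**2, [1, 2, 3]))  # [2. 4. 6.]&lt;br /&gt;
print(central_difference(np.sin, 0.0, dx=1e-5))       # very close to cos(0) = 1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;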
=A Repository of Packages You Could Use=&lt;br /&gt;
&lt;br /&gt;
We've touched on only a couple, but there are thousands of Python packages available.  Before you start writing your own function for X, check whether someone has already contributed code for it at http://pypi.python.org/pypi.&lt;br /&gt;
&lt;br /&gt;
=Writing Faster Python=&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB and R, one of the simplest ways in which you can write faster python code is to eliminate loops by vectorising your code.&lt;br /&gt;
&lt;br /&gt;
Consider the following two scripts.  First '''for-loop.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def filter(arr):&lt;br /&gt;
    for i, val in enumerate(arr):&lt;br /&gt;
        if val &amp;lt; 0.5:&lt;br /&gt;
            arr[i] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    filter(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and secondly, '''vectorised.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def filter(arr):&lt;br /&gt;
    arr[arr &amp;lt; 0.5] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    filter(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
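If we now run these two scripts through the Linux command line '''time''' utility, we see that the vectorised code runs roughly eight times faster than the for loop.  Since these timings also include starting Python, importing numpy and generating the random array, the speed-up of the filtering step itself is larger still:&lt;br /&gt;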
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ time ./for-loop.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.963s&lt;br /&gt;
user	0m0.952s&lt;br /&gt;
sys	0m0.012s&lt;br /&gt;
gethin@gethin-desktop:~$ time ./vectorised.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.116s&lt;br /&gt;
user	0m0.096s&lt;br /&gt;
sys	0m0.020s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
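&lt;br /&gt;
Timing whole scripts with '''time''' also measures Python start-up and the import of numpy.  As a sketch, the standard library's '''timeit''' module can time just the filtering step inside one script; the function names (renamed so as not to shadow Python's built-in '''filter'''), array size and repeat count below are illustrative choices.  The last line shows '''numpy.where()''', an alternative that builds a new array rather than modifying its input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import timeit&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def loop_filter(arr):&lt;br /&gt;
    # Visit every element in a Python-level loop&lt;br /&gt;
    for i, val in enumerate(arr):&lt;br /&gt;
        if val &amp;lt; 0.5:&lt;br /&gt;
            arr[i] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
def vector_filter(arr):&lt;br /&gt;
    # A single boolean-mask assignment, carried out inside numpy&lt;br /&gt;
    arr[arr &amp;lt; 0.5] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
np.random.seed(0)&lt;br /&gt;
data = np.random.rand(100000)&lt;br /&gt;
&lt;br /&gt;
# Both versions should agree (each works on its own copy of the data)&lt;br /&gt;
same = np.array_equal(loop_filter(data.copy()), vector_filter(data.copy()))&lt;br /&gt;
&lt;br /&gt;
# Time only the filtering (plus a copy each time), excluding start-up and imports&lt;br /&gt;
t_loop = timeit.timeit(lambda: loop_filter(data.copy()), number=3)&lt;br /&gt;
t_vec = timeit.timeit(lambda: vector_filter(data.copy()), number=3)&lt;br /&gt;
print(same)&lt;br /&gt;
print('loop: %.3f s, vectorised: %.3f s' % (t_loop, t_vec))&lt;br /&gt;
&lt;br /&gt;
# numpy.where returns a new array and leaves data unchanged&lt;br /&gt;
result = np.where(data &amp;lt; 0.5, 0.0, data)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;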
&lt;br /&gt;
For some more tips on writing faster python code, and examples of how to use one of the python profiler modules, take a look at:&lt;br /&gt;
* https://wiki.python.org/moin/PythonSpeed/PerformanceTips&lt;br /&gt;
* http://technicaldiscovery.blogspot.co.uk/2011/06/speeding-up-python-numpy-cython-and.html&lt;br /&gt;
* http://www.huyng.com/posts/python-performance-analysis/&lt;br /&gt;
* http://www.appneta.com/2012/05/21/profiling-python-performance-lineprof-statprof-cprofile/&lt;br /&gt;
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* http://docs.python.org/tutorial/&lt;br /&gt;
* http://wiki.python.org/moin/PythonBooks&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9416</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9416"/>
		<updated>2014-03-07T12:15:13Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: Protected &amp;quot;R1&amp;quot; ([edit=sysop] (indefinite) [move=sysop] (indefinite))&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' will typically be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value: R treats even a single number as a vector (of length one), and the '[1]' indicates that the line in question displays the vector's values starting at the first index.  We can use the handy sequence function, '''seq()''', to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
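Note that, in the call to '''seq()''' above, '''=''' is used to name the function's arguments.  The usual assignment operator in R is '''&amp;lt;-''', although '''=''' can also be used for assignment at the top level.&lt;br /&gt;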
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; coords.bris &amp;lt;- c(51.5, -2.6)   # latitude, longitude in degrees (west is negative)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''c''' function combines its arguments into a vector, and '''matrix()''' fills the matrix from that vector column by column.  We can access portions of the matrix using the notation shown in the square brackets of the printout.  For example, we can access the first row using the '''[1,]''' notation, and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which have more than two dimensions, and '''lists''' to hold heterogeneous collections.&lt;br /&gt;
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
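We can access its items in several ways.  Note that single square brackets return a sub-list (printed with its '''$name''' label), whereas double square brackets, like '''$''', return the item itself:&lt;br /&gt;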
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
    country  gold  silver  bronze&lt;br /&gt;
1       USA    46      29      29&lt;br /&gt;
2     China    38      27      23&lt;br /&gt;
3        GB    29      17      19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access columns of a data frame using the '''$''' operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
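&lt;br /&gt;
In R 4.0.0 and later, '''data.frame()''' no longer converts character vectors to factors by default, so '''medals.2012$country''' is printed as a character vector, without the '''Levels:''' line.&lt;br /&gt;
&lt;br /&gt;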
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets; we'll use these to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  In this case, to show the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ (the examples on any help page can also be run directly using, e.g., '''example(filled.contour)'''):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes&lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots, complete with the R code required to create them (at the bottom of each page, after the comments), on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body of the function is a single expression, but they add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments are matched by position, unless their names are given, in which case they can be supplied in any order. &lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check on the contents of a function by just typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or, to check just the arguments, use the '''args''' function.  It returns a function with the same arguments but an empty body, which is why '''NULL''' is printed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Packages are listed at http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
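Let's install the '''multicore''' package, which gives us access to functions that can run on the multiple processors (cores) found in most computers these days.  (Note that '''multicore''' has since been withdrawn from CRAN; its '''mclapply()''' function is now provided by the '''parallel''' package, which is included with R, so on a current installation use '''library(parallel)''' instead of installing '''multicore'''.)&lt;br /&gt;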
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library (i.e. those available to be loaded into our workspace) with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load one into the current session, we type, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now for an example of using a function from the multicore package.  The '''lapply''' function, which is part of base R, maps a given function over a list of inputs, returning a list of the function's outputs.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  Perhaps the simplest function for reading them is '''read.table()''', which suits data laid out as a table with column headings.  For example, suppose we have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It will be a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
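This gives us the file '''medals.csv''', with the contents shown below.  Note that, by default, write.csv() also writes the row names as an unlabelled first column:&lt;br /&gt;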
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
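&lt;br /&gt;
When reading such a file back in, the unlabelled first column needs some care.  The sketch below assumes a hand-typed two-row version of the medals data and a temporary file.  By default '''read.csv()''' treats the row-name column as an ordinary variable, whereas passing '''row.names=1''' turns it back into row names:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country = c("USA", "China"), gold = c(46, 38),
                    silver = c(29, 27), bronze = c(29, 23))
path = tempfile(fileext = ".csv")
write.csv(medals, path)   # the first, unlabelled column holds the row names

# Read naively: the row-name column comes back as an extra variable
naive = read.csv(path)
ncol(naive)               # 5

# Tell read.csv() that the first column holds row names instead
back = read.csv(path, row.names = 1)
names(back)               # "country" "gold" "silver" "bronze"
```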
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
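The '''save()''' function will store one or more R objects in a binary file.  As the output of the Unix '''file''' command below shows, the file is compressed with gzip by default:&lt;br /&gt;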
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
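&lt;br /&gt;
Note that '''load()''' does not hand the object back to you.  Instead, it restores each object under the name it was saved with.  The sketch below illustrates this using a cut-down medals data frame and a temporary file, both of which are assumptions for the example:&lt;br /&gt;
&lt;br /&gt;
```r
medals.2012 = data.frame(country = c("USA", "China"), gold = c(46, 38))
path = tempfile(fileext = ".RData")
save(medals.2012, file = path)

rm(medals.2012)               # remove the object from the workspace
loaded = load(path)           # load() returns the names of the restored objects
print(loaded)                 # [1] "medals.2012"
print(exists("medals.2012"))  # [1] TRUE
```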
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide these.  Note that you will need the 'development' packages, which include the header files needed to compile the R package.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
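&lt;br /&gt;
'''sort()''' works on vectors.  To reorder the rows of a data frame, one common approach is to index it with '''order()''', which returns the row positions in sorted order.  The sketch below uses a hand-typed subset of the medals data:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(
  country = c("USA", "China", "Great Britain", "Germany"),
  gold    = c(46, 38, 29, 11),
  silver  = c(29, 27, 17, 19)
)
# order() gives the row indices that would sort the silver column;
# decreasing = TRUE puts the largest first
by.silver = medals[order(medals$silver, decreasing = TRUE), ]
print(by.silver$country)   # USA, China, Germany, Great Britain

# sort() accepts decreasing = TRUE for plain vectors too
sort(c("thomas", "henry", "gordon", "edward", "james"), decreasing = TRUE)
```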
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
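&lt;br /&gt;
In the calls above, '''prob = NULL''' is the default.  '''replace = TRUE''' allows the same engine to be drawn more than once; the default is '''replace = FALSE'''.  Because the draws are random, your results will differ from those shown, but calling '''set.seed()''' first makes them repeatable.  In the short sketch below, the seed value 2012 is arbitrary:&lt;br /&gt;
&lt;br /&gt;
```r
railway.engines = c("thomas", "henry", "gordon", "edward", "james")

set.seed(2012)                      # fix the random number stream
picks = sample(railway.engines, 3)  # three distinct engines (replace = FALSE by default)

set.seed(2012)
same = sample(railway.engines, 3)   # same seed, so the same draw
identical(picks, same)              # TRUE

shuffled = sample(railway.engines)  # a random permutation of all five engines
```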
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
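&lt;br /&gt;
The same help page covers '''cbind''', which combines columns rather than rows.  The sketch below appends a column of total medals to a hand-typed subset of the medals data:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country = c("USA", "China", "Great Britain"),
                    gold = c(46, 38, 29), silver = c(29, 27, 17),
                    bronze = c(29, 23, 19))
# rowSums() adds up the three medal columns for each row
total = rowSums(medals[, c("gold", "silver", "bronze")])
# cbind() on a data frame appends the vector as a new column called "total"
medals = cbind(medals, total)
print(medals)   # totals: 104, 88, 65
```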
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because '''bins''' is a factor, '''plot(bins)''' draws a bar chart of the number of values in each bin.  Plotting the data couldn't be simpler!&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
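&lt;br /&gt;
With '''breaks=3''', '''cut''' chooses three equal-width bins for you.  Alternatively, you can pass the bin edges explicitly and count the values in each bin with '''table()'''.  In the sketch below, the edges 80, 84, 88 and 92 are an arbitrary choice for illustration:&lt;br /&gt;
&lt;br /&gt;
```r
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)
# Supply the bin edges ourselves rather than asking for 3 equal-width bins
bins = cut(girls_2, breaks = c(80, 84, 88, 92))
levels(bins)   # (80,84] (84,88] (88,92]
table(bins)    # counts per bin: 3 2 5
```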
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
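&lt;br /&gt;
The object returned by '''lm''' holds more than the line drawn by '''abline'''.  The short sketch below extracts the fitted coefficients, the R-squared value and some predictions.  The speeds 10 and 20 are arbitrary choices.  In '''cars''', speed is in mph and stopping distance in feet:&lt;br /&gt;
&lt;br /&gt;
```r
res = lm(dist ~ speed, data = cars)
coef(res)                  # intercept and slope of the fitted line
summary(res)$r.squared     # proportion of the variance in dist explained by speed
# Predicted stopping distances (feet) for speeds of 10 and 20 mph
predict(res, newdata = data.frame(speed = c(10, 20)))
```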
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
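* You may wish to compare different methods of estimation.  The MASS package (loaded with '''library(MASS)''') provides the '''rlm''' (robust regression) and '''lqs''' (resistant regression) functions for fitting a line.  If you store the fits as '''res.lm''', '''res.rlm''' and '''res.lqs''', you can plot all the lines against the data using:&lt;br /&gt;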
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If weights are given, the function fits the line by minimising the weighted sum of squared residuals.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector of length 50 (one weight per row of cars), where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
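&lt;br /&gt;
Below is one possible starting point for the weighted least squares exercise, using the hint vectors above.  It is a sketch, not a model answer.  With every weight equal, the fitted line is identical to the ordinary least squares line, because scaling every squared residual by the same constant does not move the minimum.  Unequal weights such as '''w2''' give more influence to the later rows of '''cars''', which is sorted by speed:&lt;br /&gt;
&lt;br /&gt;
```r
w1 = rep(0.1, 50)                           # equal weights for the 50 rows of cars
w2 = seq(from = 0.02, to = 1.0, by = 0.02)  # weights rising with row number
fit.ols = lm(dist ~ speed, data = cars)
fit.w1  = lm(dist ~ speed, data = cars, weights = w1)
fit.w2  = lm(dist ~ speed, data = cars, weights = w2)

all.equal(coef(fit.ols), coef(fit.w1))  # TRUE: constant weights leave the line unchanged
coef(fit.w2)                            # pulled towards the heavily weighted rows

plot(cars)
abline(fit.ols, lty = 1)
abline(fit.w2, lty = 2)
```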
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
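&lt;br /&gt;
Here '''var.test''' is an F test of whether the two samples have equal variances.  Its large p-value gives no evidence that they differ, which supports using '''var.equal=TRUE''' in the two-sample t-test that follows.  Both functions return objects of class '''htest''', so individual results can be extracted by name.  In the sketch below, the 5% threshold is the usual convention, chosen for illustration:&lt;br /&gt;
&lt;br /&gt;
```r
boys_2  = c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)

v  = var.test(boys_2, girls_2)
tt = t.test(boys_2, girls_2, var.equal = TRUE, paired = FALSE)

v$p.value      # p-value of the F test
tt$statistic   # the t statistic
tt$estimate    # the two sample means
if (tt$p.value > 0.05) {
  message("No significant difference in means at the 5% level")
}
```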
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
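&lt;br /&gt;
From the table, 23 + 25 + 22 = 70 of the 75 test flowers were classified correctly, an accuracy of about 93%.  The same figure can be computed from the diagonal of the confusion matrix, as in the sketch below.  The seed is an arbitrary addition: '''knn''' breaks ties at random, so your exact counts may differ slightly:&lt;br /&gt;
&lt;br /&gt;
```r
library(class)
train = rbind(iris3[1:25, , 1], iris3[1:25, , 2], iris3[1:25, , 3])
test  = rbind(iris3[26:50, , 1], iris3[26:50, , 2], iris3[26:50, , 3])
# The test rows are in the same species order, so cl also gives their true labels
cl = factor(c(rep("s", 25), rep("c", 25), rep("v", 25)))

set.seed(1)   # make the random tie-breaking repeatable
pred = knn(train, test, cl, k = 3)
confusion = table(predicted = pred, actual = cl)

# Correct predictions lie on the diagonal of the confusion matrix
accuracy = sum(diag(confusion)) / sum(confusion)
accuracy
```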
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels absent and present, indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)  # provides rpart() and the kyphosis data set&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
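&lt;br /&gt;
To see how well a tree classifies, you can ask '''predict''' for class labels and cross-tabulate them against the observed outcome.  The sketch below does this for the first model, '''fit'''.  Note that these are predictions for the same 81 children the tree was grown on, so the agreement will look better than it would on new data:&lt;br /&gt;
&lt;br /&gt;
```r
library(rpart)
fit = rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# type = "class" returns the predicted level (absent/present) for each child
predicted = predict(fit, type = "class")
confusion = table(predicted, actual = kyphosis$Kyphosis)
print(confusion)

# printcp() lists the complexity parameter (cp) table used when pruning the tree
printcp(fit)
```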
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
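&lt;br /&gt;
Here '''array''' fills the matrix column by column, so the first row of '''A''' is (1, 3, -2).  It is good practice to check a solution by multiplying it back through, as in this short sketch:&lt;br /&gt;
&lt;br /&gt;
```r
A = array(c(1, 3, 2, 3, 5, 4, -2, 6, 3), dim = c(3, 3))  # filled column by column
b = c(5, 7, 8)
x = solve(A, b)

# A %*% x should reproduce b
all.equal(as.vector(A %*% x), b)   # TRUE

# With a single argument, solve() returns the inverse of A
A.inv = solve(A)
all.equal(A.inv %*% A, diag(3))    # TRUE, up to rounding error
```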
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
If you would prefer to noodle about with some real-world data, you could take a look at:&lt;br /&gt;
* http://www.theguardian.com/news/datablog/2010/oct/18/historic-government-spending-area#data&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll switch tack and focus on a question commonly asked by those beginning to use R in anger--'''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''.  In this section we'll consider the related tasks of finding which parts of your R code are responsible for the majority of the run-time, and what you can do about them.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have the gift(!) of constantly surprising us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it infinitely fast would save at most 10%, and you would have sweated for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time the evaluation of an expression, such as a call to a function, using '''system.time()''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
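&lt;br /&gt;
In the concrete sketch below, sorting a million uniform random numbers stands in for '''some.function()'''; this is an arbitrary workload.  '''system.time''' returns the user, system and elapsed (wall-clock) times in seconds, and these can be extracted by name:&lt;br /&gt;
&lt;br /&gt;
```r
# Time a concrete piece of work
timing = system.time(sort(runif(1e6)))
print(timing)

# The elapsed (wall-clock) time in seconds
timing[["elapsed"]]
```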
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Map a smoothing function over each element of the list&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which I ran by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
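The profile tells us that the function '''simpleLoess''' accounts for 88% of the run-time by itself ('''self.pct''') and almost 93% once the functions it calls are included ('''total.pct'''), whereas '''rnorm''' takes only 4%.&lt;br /&gt;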
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
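&lt;br /&gt;
A more conventional way to pre-allocate is to create the vector at its final length with '''numeric(n)''', which fills it with zeros.  The sketch below defines '''f3''' (a hypothetical name).  Unlike '''f1''' and '''f2''', it also returns '''v'''; as written, those two functions return the value of their for loop, which is '''NULL''', so their results are discarded:&lt;br /&gt;
&lt;br /&gt;
```r
f3 = function(n = 30000) {
  v = numeric(n)          # allocate a zero-filled vector of the final length up front
  for (i in seq_len(n))
    v[i] = i^2
  v                       # return the result so that it can be used
}
system.time(f3())
# Same values as squaring the whole sequence at once
identical(f3(), (1:30000)^2)   # TRUE
```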
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R will accept whole vectors as input, rather than just single values, and this may allow you to avoid a loop altogether.  The examples in the previous section used for loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in roughly half the time that the pre-allocated loop in '''f2''' took to process 30,000!&lt;br /&gt;
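&lt;br /&gt;
Loops containing an '''if''' test can often be vectorised too.  The sketch below uses a made-up vector '''x''' and replaces negative values with zero in three ways: with a loop, with '''ifelse()''' and with '''pmax()''':&lt;br /&gt;
&lt;br /&gt;
```r
x = c(-2.5, 0, 3.1, -0.4, 7)

# Loop version: one element at a time
clipped.loop = numeric(length(x))
for (i in seq_along(x)) {
  if (x[i] > 0) clipped.loop[i] = x[i] else clipped.loop[i] = 0
}

# Vectorised versions: ifelse() applies the test to every element at once,
# and pmax() takes the element-wise maximum of x and 0 directly
clipped.ifelse = ifelse(x > 0, x, 0)
clipped.pmax   = pmax(x, 0)

identical(clipped.loop, clipped.ifelse)  # TRUE
identical(clipped.loop, clipped.pmax)    # TRUE
```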
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as BlueCrystal:  &lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
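&lt;br /&gt;
The serial sketch below makes this concrete; it is not how any particular package implements it.  A hypothetical function '''work''' is first applied to a list of inputs with '''lapply'''.  The list is then cut into one contiguous chunk per worker, each chunk is processed separately, and the results are joined back together in the original order.  A parallel package would hand each chunk to a different core or node:&lt;br /&gt;
&lt;br /&gt;
```r
# A stand-in for an expensive function (hypothetical example)
work = function(x) sum(sqrt(abs(x)))
set.seed(1)
inputs = lapply(1:8, function(i) rnorm(1000))

# Serial version: apply one function to every element of the list
serial = lapply(inputs, work)

# Cut the list into contiguous chunks, one per (imaginary) worker,
# process each chunk, then join the results back together in order
n.workers = 2
chunk.id = cut(seq_along(inputs), n.workers, labels = FALSE)
chunks = split(inputs, chunk.id)
chunked = unlist(lapply(chunks, function(chunk) lapply(chunk, work)),
                 recursive = FALSE)

identical(unname(chunked), serial)   # TRUE: same answers, same order
```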
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the use of the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And used the following submission script to run it on bluecrystal phase2:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Disable PSM on the QLogic HCAs&lt;br /&gt;
export OMPI_MCA_mtl=^psm&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
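The '''parallel''' package is an amalgamation of functionality from the multicore and snow packages.  Unlike the multicore package, it can be used on an MS Windows machine.  However, its fork-based functions, such as '''mclapply''', cannot run in parallel there; on Windows, the snow-style socket clusters ('''makeCluster''' with '''parLapply''') provide the parallelism instead. &lt;br /&gt;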
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
detectCores()&lt;br /&gt;
# Create a list of 10 vectors, each holding 10,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over each element of the list.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using mclapply from parallel, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
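&lt;br /&gt;
For comparison, the snow-style interface in parallel can at least be tried on a single machine.  The minimal sketch below assumes a local socket (PSOCK) cluster of 2 workers, an arbitrary choice, and it has not been tried on BCp2.  For a cluster spanning several machines, '''makeCluster''' also accepts a vector of host names, which typically requires password-less ssh between them:&lt;br /&gt;
&lt;br /&gt;
```r
library(parallel)
# Start 2 separate R worker processes on this machine (works on MS Windows too)
cl = makeCluster(2)
data = lapply(1:10, function(x) rnorm(10000))
# parLapply() sends list elements to the workers and collects the results in order
smoothed = parLapply(cl, data, function(x) loess.smooth(x, x))
stopCluster(cl)   # always shut the workers down when finished
length(smoothed)  # 10: one result per list element
```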
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9415</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9415"/>
		<updated>2014-03-07T12:14:29Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: Unprotected &amp;quot;R1&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to override the usual order of precedence of the operators ('''/''' is evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R has no separate scalar type: a single number is simply a vector of length one.  The '''[1]''' indicates that the line in question displays the vector's values starting at the first index.  We can use the handy sequence function, '''seq()''', to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
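In the above examples, '''&amp;lt;-''' is used for assignment, while '''=''' is used to name the arguments passed to '''seq()'''.  '''=''' can also be used for assignment at the top level, although '''&amp;lt;-''' is the conventional choice in R.&lt;br /&gt;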
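&lt;br /&gt;
A short sketch illustrating the two roles of '''=''' (the variable names are purely illustrative).  At the top level it assigns, just like '''&amp;lt;-'''.  Inside a function call it only names an argument and creates no variable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
x = 5                                # top-level assignment with =&lt;br /&gt;
y &amp;lt;- 5                             # the conventional assignment operator&lt;br /&gt;
identical(x, y)                      # TRUE&lt;br /&gt;
odds = seq(from = 1, to = 9, by = 2) # here = names the arguments of seq()&lt;br /&gt;
odds                                 # 1 3 5 7 9&lt;br /&gt;
exists('from')                       # FALSE: naming an argument creates no variable&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;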
&lt;br /&gt;
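Vectors are commonly used data structures in R.  For example, here are the latitude and longitude of Bristol in decimal degrees (west of Greenwich is negative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; coords.bris &amp;lt;- c(51.5, -2.6)&lt;br /&gt;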
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c()''' function combines its arguments into a vector, and '''matrix()''' fills the matrix from that vector column by column.  We can access portions of the matrix using the indices shown in square brackets: the first row with '''[1,]''' and the second column with '''[,2]'''.  Since this is a 3x3 magic square, the numbers in each of these slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which generalise matrices to any number of dimensions, and '''lists''', which hold heterogeneous collections of objects.&lt;br /&gt;
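&lt;br /&gt;
Here is a minimal sketch of a three-dimensional array (the name '''cube''' is just for illustration).  Like '''matrix()''', '''array()''' fills its elements with the first index varying fastest:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
cube = array(1:24, dim = c(2, 3, 4))  # 2 rows, 3 columns, 4 layers&lt;br /&gt;
dim(cube)                             # 2 3 4&lt;br /&gt;
cube[1, 1, 2]                         # 7: the first element of the second layer&lt;br /&gt;
cube[, , 1]                           # the first layer, a 2 x 3 matrix&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;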
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
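We can access its items in several ways: by name, using '''$''', or by position.  Note that single brackets return a sub-list, while double brackets return the item itself:&lt;br /&gt;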
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
  country gold silver bronze&lt;br /&gt;
1     USA   46     29     29&lt;br /&gt;
2   China   38     27     23&lt;br /&gt;
3      GB   29     17     19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
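We can access columns of a data frame using the '''$''' operator.  The output below comes from an R version before 4.0.0.  In those versions, '''data.frame()''' converted character vectors to factors by default, which is why a '''Levels''' line appears.  From R 4.0.0 onwards, '''country''' remains a character vector unless you pass '''stringsAsFactors=TRUE'''.&lt;br /&gt;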
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
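&lt;br /&gt;
To get the same behaviour in any version of R, you can set '''stringsAsFactors''' explicitly.  The sketch below also shows how to check a data frame's dimensions, and how to select a value using a condition on the rows and a name for the column.  The data frame name '''medals''' is illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
country = c('USA', 'China', 'GB')&lt;br /&gt;
gold = c(46, 38, 29)&lt;br /&gt;
medals = data.frame(country, gold, stringsAsFactors = FALSE)&lt;br /&gt;
class(medals$country)                   # 'character'&lt;br /&gt;
dim(medals)                             # 3 rows, 2 columns&lt;br /&gt;
medals[medals$country == 'GB', 'gold']  # 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;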
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One of the things that makes R popular is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  Here we use one to show the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are a useful way to graph the quartiles of some data.  In this case, we plot the fuel efficiencies of the cars in the built-in '''mtcars''' data set (1973-74 models, tested by the US magazine Motor Trend):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility, which you can invoke by prefixing a function name with '''?'''.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
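&lt;br /&gt;
Here is a minimal version of that help-page example.  It uses only the built-in '''volcano''' data set, an 87 x 61 matrix of heights in metres on a 10m grid:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
dim(volcano)  # 87 61&lt;br /&gt;
# asp = 1 keeps the aspect ratio true, so the volcano is not stretched&lt;br /&gt;
filled.contour(volcano, color.palette = terrain.colors, asp = 1,&lt;br /&gt;
               plot.title = title(main = 'Maunga Whau volcano'))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;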
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different numbers of cylinders&lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots on the following web page, complete with the R code required to create them (at the bottom of each page, after the comments):&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
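&lt;br /&gt;
When looping over the elements of a vector, '''seq_along()''' is a safer choice than '''1:length()'''.  If the vector happens to be empty, '''1:length()''' counts ''down'' from 1 to 0, so the loop body runs twice instead of not at all.  A short sketch (the vector names are illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
engines = c('thomas', 'henry', 'gordon')&lt;br /&gt;
for (ii in seq_along(engines)) print(paste(ii, engines[ii]))&lt;br /&gt;
empty = character(0)&lt;br /&gt;
1:length(empty)     # 1 0: probably not what you wanted&lt;br /&gt;
seq_along(empty)    # integer(0): a loop over this never runs&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;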
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ('''{}''') are optional when the body of the function is a single expression, but they add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3, y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Unnamed arguments are matched by position.  If we name the arguments, we can give them in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  By default, a function returns the value of the last expression it evaluates, but we can also use '''return()''' to state explicitly which value is returned:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3, y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check on the contents of a function by just typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3, y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or just check the arguments, using the '''args()''' function.  This returns a copy of the function's argument list attached to an empty body, which is why '''NULL''' is printed after it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
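&lt;br /&gt;
If you want to work with a function's parts programmatically, '''formals()''' returns its arguments (with their defaults) and '''body()''' returns its code.  In the sketch below, '''hypot3''' is redefined so that the block stands alone:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
hypot3 = function(x = 3, y = 4) {&lt;br /&gt;
  x_sq = x^2&lt;br /&gt;
  y_sq = y^2&lt;br /&gt;
  return(sqrt(x_sq + y_sq))&lt;br /&gt;
}&lt;br /&gt;
names(formals(hypot3))  # 'x' 'y'&lt;br /&gt;
formals(hypot3)$y       # 4, the default value of y&lt;br /&gt;
body(hypot3)            # the code between the braces&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;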
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Contributed packages are listed on CRAN, the Comprehensive R Archive Network: http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
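Let's install the '''multicore''' package, which gives us access to functions that run on the multiple processors found in most computers these days.  (Note: '''multicore''' has since been withdrawn from CRAN.  Its '''mclapply()''' function is now part of the '''parallel''' package, which ships with R from version 2.14.0.  On a current installation you can skip the installation step and use '''library(parallel)''' in place of '''library(multicore)''' below.)&lt;br /&gt;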
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library, i.e. those available to be loaded into our workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load one of them into the current session, we type, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
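&lt;br /&gt;
To see which packages are currently attached to the session (as opposed to merely installed), use '''search()'''.  The sketch below uses '''parallel''', which ships with R, so it runs on a current installation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
'package:parallel' %in% search()                 # TRUE: attached&lt;br /&gt;
'parallel' %in% rownames(installed.packages())   # TRUE: installed&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;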
&lt;br /&gt;
Now for an example of using a function from the multicore package.  The '''lapply''' function, which is part of base R, maps a given function over a list of inputs and returns a list of the function's outputs.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
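Now we can do the same work in parallel, using '''mclapply()'''.  It works by forking the R process, which is not supported on Windows; there, '''mclapply()''' only works with '''mc.cores=1''', i.e. in serial.&lt;br /&gt;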
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
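&lt;br /&gt;
The sketch below checks that the parallel version gives exactly the same answer as the serial one.  The number of cores, '''n.cores''', is an illustrative assumption; '''detectCores()''' (also in '''parallel''') reports how many your machine has:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# Forking is unavailable on Windows, where mclapply() needs mc.cores = 1&lt;br /&gt;
n.cores = if (.Platform$OS.type == 'windows') 1 else 2&lt;br /&gt;
squares.serial = lapply(1:3, function(x) {x^2})&lt;br /&gt;
squares.parallel = mclapply(1:3, function(x) {x^2}, mc.cores = n.cores)&lt;br /&gt;
identical(squares.serial, squares.parallel)  # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;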
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  Perhaps the simplest function for reading them is '''read.table()''', which suits data laid out in a file like a table, with column headings.  If I have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is then a simple matter to use '''read.table()''' to load the data into R.  The '''header=TRUE''' argument tells it that the first line holds the column names:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the file '''medals.csv''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
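&lt;br /&gt;
Note the extra first column.  By default, '''write.csv()''' also writes the row names, which come back as an extra column (named '''X''') when the file is read in again.  Passing '''row.names=FALSE''' avoids this.  The sketch below uses a temporary file; the names '''medals''' and '''csv.file''' are illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals = data.frame(country = c('USA', 'China', 'Great Britain'),&lt;br /&gt;
                    gold = c(46, 38, 29), stringsAsFactors = FALSE)&lt;br /&gt;
csv.file = tempfile(fileext = '.csv')&lt;br /&gt;
write.csv(medals, csv.file)                     # row names included&lt;br /&gt;
names(read.csv(csv.file))                       # 'X' 'country' 'gold'&lt;br /&gt;
write.csv(medals, csv.file, row.names = FALSE)&lt;br /&gt;
names(read.csv(csv.file))                       # 'country' 'gold'&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;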
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store an R data structure in binary form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
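&lt;br /&gt;
As the Linux '''file''' command confirms, the saved file is gzip-compressed binary data:&lt;br /&gt;
&lt;br /&gt;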
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
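There is, of course, a corresponding function to load such data.  '''load()''' restores the saved objects into the workspace under their original names (here, '''medals.2012'''):&lt;br /&gt;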
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
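&lt;br /&gt;
If you would rather choose the name of the restored object yourself, '''saveRDS()''' and '''readRDS()''' store and retrieve a single object.  The sketch below uses a temporary file, and the names are illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals = data.frame(country = c('USA', 'China'), gold = c(46, 38),&lt;br /&gt;
                    stringsAsFactors = FALSE)&lt;br /&gt;
rds.file = tempfile(fileext = '.rds')&lt;br /&gt;
saveRDS(medals, rds.file)&lt;br /&gt;
medals.copy = readRDS(rds.file)   # assign the result to any name you like&lt;br /&gt;
identical(medals, medals.copy)    # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;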
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  (It has since been archived on CRAN; its successor, '''ncdf4''', is installed in the same way.)  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide these; note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
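&lt;br /&gt;
'''sort()''' returns the sorted values themselves.  To sort the rows of a data frame by one of its columns, use '''order()''', which returns the row indices in sorted order.  Here is a sketch with a small, illustrative data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals = data.frame(country = c('GB', 'USA', 'China'), gold = c(29, 46, 38),&lt;br /&gt;
                    stringsAsFactors = FALSE)&lt;br /&gt;
order(medals$gold, decreasing = TRUE)    # 2 3 1&lt;br /&gt;
medals.sorted = medals[order(medals$gold, decreasing = TRUE), ]&lt;br /&gt;
medals.sorted$country                    # 'USA' 'China' 'GB'&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;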
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
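&lt;br /&gt;
Since the results are random, you may want to make them reproducible with '''set.seed()'''.  Calling '''sample()''' with just a vector, and no size, returns a random permutation of it.  A sketch (the seed value is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines = c('thomas', 'henry', 'gordon', 'edward', 'james')&lt;br /&gt;
set.seed(42)&lt;br /&gt;
first.draw = sample(railway.engines, 3)   # 3 engines, without replacement&lt;br /&gt;
set.seed(42)&lt;br /&gt;
second.draw = sample(railway.engines, 3)&lt;br /&gt;
identical(first.draw, second.draw)        # TRUE: same seed, same sample&lt;br /&gt;
shuffled = sample(railway.engines)        # a random permutation&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;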
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames (their columns must have the same names):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
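&lt;br /&gt;
The same help page covers '''cbind''', which combines columns instead.  Here is a sketch with small, illustrative data frames that have the same number of rows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals = data.frame(country = c('USA', 'China'), gold = c(46, 38),&lt;br /&gt;
                    stringsAsFactors = FALSE)&lt;br /&gt;
more.medals = data.frame(silver = c(29, 27), bronze = c(29, 23))&lt;br /&gt;
combined = cbind(medals, more.medals)&lt;br /&gt;
names(combined)   # 'country' 'gold' 'silver' 'bronze'&lt;br /&gt;
combined$total = combined$gold + combined$silver + combined$bronze  # 104 88&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;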
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Plotting the data couldn't be simpler with '''plot(bins)'''!&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
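&lt;br /&gt;
Instead of asking for a number of bins, you can give '''cut''' the break points yourself, and count the members of each bin with '''table'''.  In the sketch below, the break points are chosen purely for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
bins = cut(girls_2, breaks = c(80, 84, 88, 92))  # intervals are closed on the right&lt;br /&gt;
table(bins)   # (80,84]: 3, (84,88]: 2, (88,92]: 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;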
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
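&lt;br /&gt;
The object returned by '''lm''' holds much more than the line itself.  For example, '''coef''' extracts the fitted intercept and slope, and '''summary''' reports standard errors and R-squared.  A sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res = lm(dist ~ speed, data = cars)&lt;br /&gt;
coef(res)                # intercept about -17.58, slope about 3.93&lt;br /&gt;
summary(res)$r.squared   # proportion of the variance in dist explained by speed&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;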
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
* You may wish to compare different methods of estimation.  From the '''MASS''' package, you can fit a line with the '''rlm''' and '''lqs''' functions.  You can then plot all the lines against the data using:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
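&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; res.lm &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm &amp;lt;- rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs &amp;lt;- lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)  # adds to the plot(cars) figure above&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;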
&amp;lt;/source&amp;gt;&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If weights are given, the function fits the line by minimising the weighted sum of squared residuals.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector, length 50, where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
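&lt;br /&gt;
As a starting point for the exercise, here is a sketch of passing such weights to '''lm'''.  Multiplying all the weights by the same constant does not change the fit, so a uniform weight vector reproduces the unweighted line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
w1 = rep(0.1, 50)                            # uniform weights, one per row of cars&lt;br /&gt;
w2 = seq(from = 0.02, to = 1.0, by = 0.02)   # increasing weights&lt;br /&gt;
fit.plain = lm(dist ~ speed, data = cars)&lt;br /&gt;
fit.w1 = lm(dist ~ speed, data = cars, weights = w1)&lt;br /&gt;
fit.w2 = lm(dist ~ speed, data = cars, weights = w2)&lt;br /&gt;
all.equal(coef(fit.plain), coef(fit.w1))     # TRUE&lt;br /&gt;
coef(fit.w2)                                 # a different line&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;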
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
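&lt;br /&gt;
Reading the output: the F test's large p-value (0.98) gives no evidence that the two variances differ, so assuming equal variances ('''var.equal=TRUE''') in the t-test is reasonable.  The t-test's p-value (0.41) in turn gives no evidence of a difference between the two means at conventional significance levels.  Both functions return a list, so individual results can be extracted for further use.  A sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2 = c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
res = t.test(boys_2, girls_2, var.equal = TRUE, paired = FALSE)&lt;br /&gt;
res$statistic   # t, about 0.843&lt;br /&gt;
res$parameter   # df = 18&lt;br /&gt;
res$p.value&lt;br /&gt;
res$estimate    # the two sample means, 87.48 and 86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;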
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?  (Because '''knn''' breaks ties at random, your counts may differ slightly from those below.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
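&lt;br /&gt;
The diagonal of this table counts the correct predictions.  With the counts above, the classifier gets (23+25+22)/75 of the test set right, i.e. about 93%.  The sketch below computes this directly.  The '''set.seed()''' call is our addition, not part of the example above, and makes the random tie-breaking reproducible:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
set.seed(1)  # knn breaks ties at random&lt;br /&gt;
train = rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test = rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl = factor(c(rep('s',25), rep('c',25), rep('v',25)))&lt;br /&gt;
pred = knn(train, test, cl, k = 3)&lt;br /&gt;
conf.mat = table(predicted = pred, actual = cl)&lt;br /&gt;
accuracy = sum(diag(conf.mat)) / sum(conf.mat)  # fraction classified correctly&lt;br /&gt;
accuracy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;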
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The '''kyphosis''' data frame, from the '''rpart''' package, has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of spinal deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)  # provides rpart() and the kyphosis data&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
# the same model, with given prior probabilities and the information (entropy) splitting index&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
# a larger complexity parameter, cp (default 0.01), gives a smaller tree (fit3 is not plotted below)&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
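&lt;br /&gt;
To see how well the tree fits, calling '''predict''' with '''type=&amp;quot;class&amp;quot;''' gives the predicted class for each row.  In the sketch below, the predictions are for the same data the tree was grown on, so the resulting accuracy will be optimistic:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit = rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
pred = predict(fit, type = 'class')   # predicted class for each training row&lt;br /&gt;
conf.mat = table(predicted = pred, actual = kyphosis$Kyphosis)&lt;br /&gt;
conf.mat&lt;br /&gt;
sum(diag(conf.mat)) / nrow(kyphosis)  # resubstitution accuracy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;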
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
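&lt;br /&gt;
Note that '''array()''' fills '''A''' column by column, so its first row is 1, 3, -2.  The sketch below checks the solution by multiplying back with the matrix product operator, '''%*%'''.  It also computes the inverse, which '''solve()''' returns when given only '''A''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
A = array(c(1,3,2,3,5,4,-2,6,3), dim = c(3,3))&lt;br /&gt;
b = c(5,7,8)&lt;br /&gt;
x = solve(A, b)          # -15 8 2&lt;br /&gt;
A %*% x                  # recovers b, as a 3 x 1 matrix&lt;br /&gt;
A.inv = solve(A)         # the inverse of A&lt;br /&gt;
round(A.inv %*% A, 10)   # the 3 x 3 identity matrix&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;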
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
If you would prefer to noodle about with some real-world data, you could take a look at:&lt;br /&gt;
* http://www.theguardian.com/news/datablog/2010/oct/18/historic-government-spending-area#data&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll change tack and focus on a question commonly asked by those beginning to use R in anger: '''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''.  In this section we'll consider the related tasks of finding which parts of your R code are responsible for the majority of the run-time, and what you can do about it.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have the gift(!) of constantly surprising us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it infinitely fast would cut the total run-time by only 10%, and we would have sweated for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time the application of a function (here '''some.function()''' stands for whatever you want to time):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
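&lt;br /&gt;
As a concrete illustration, the snippet below times an arbitrary piece of work: sorting a million random numbers, ten times over.  '''system.time()''' reports the user, system and elapsed (wall-clock) times in seconds, and the elapsed figure is usually the one to watch.  The numbers you get will depend on your machine:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Time an arbitrary workload: sort one million uniform random numbers, 10 times&lt;br /&gt;
timing &amp;lt;- system.time(for (i in 1:10) sort(runif(1e6)))&lt;br /&gt;
timing                # prints the user, system and elapsed times&lt;br /&gt;
timing[[&amp;quot;elapsed&amp;quot;]]   # just the wall-clock seconds, as a plain number&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;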
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Smooth each vector in turn (in serial)&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which I ran by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
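The profile tells us that the function '''simpleLoess''' (which is called from '''loess.smooth''') accounts for 88% of the run-time, whereas '''rnorm''' takes only 4%.  Any optimisation effort should therefore be focused on the smoothing step.&lt;br /&gt;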
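&lt;br /&gt;
'''summaryRprof()''' actually returns a list, and the table above is its '''by.self''' component.  Its '''by.total''' component ranks functions by the time spent in them ''including'' the functions they call, which can help you trace a hot-spot in library code back to the function of your own that triggered it.  A minimal sketch, in which the workload and the temporary output file are arbitrary choices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Profile an arbitrary workload, writing the samples to a temporary file&lt;br /&gt;
prof.file &amp;lt;- tempfile(fileext = &amp;quot;.out&amp;quot;)&lt;br /&gt;
Rprof(prof.file, interval = 0.01)   # sample the call stack every 10 ms&lt;br /&gt;
for (i in 1:20) sort(runif(1e6))&lt;br /&gt;
Rprof(NULL)                         # switch the profiler off again&lt;br /&gt;
prof &amp;lt;- summaryRprof(prof.file)&lt;br /&gt;
head(prof$by.self)    # time spent in each function's own code&lt;br /&gt;
head(prof$by.total)   # time including the functions each one calls&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;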
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
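As in other interpreted languages, such as MATLAB, one of the simplest ways to speed up your R code is to pre-allocate the storage for variables whenever possible.  Growing a vector one element at a time can force R to allocate a new, larger block of memory and copy the existing contents on every iteration.  To see the benefits of pre-allocation, consider the following two functions:&lt;br /&gt;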
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
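Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x30 speed-up''' (1.76 s versus 0.05 s elapsed).  Your mileage will vary depending upon the details of your code and your version of R: since R 3.4.0, growing a vector by assigning past its end is considerably cheaper, so the gap is likely to be smaller on a modern installation.&lt;br /&gt;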
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
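&lt;br /&gt;
A slightly more idiomatic way to pre-allocate is to create a vector of the final length directly, with '''numeric()''' (or '''character()''', '''logical()''', etc.).  The sketch below, '''f3''', is a hypothetical variant of '''f2''' that also returns its result, so that we can check it against the vectorised expression used in the next section:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
f3 &amp;lt;- function(n = 30000) {&lt;br /&gt;
  v &amp;lt;- numeric(n)        # allocate all n elements (as zeros) once, up front&lt;br /&gt;
  for (i in seq_len(n))&lt;br /&gt;
    v[i] &amp;lt;- i^2          # fill in the pre-allocated slot&lt;br /&gt;
  v                      # return the filled vector&lt;br /&gt;
}&lt;br /&gt;
system.time(v3 &amp;lt;- f3())&lt;br /&gt;
identical(v3, (1:30000)^2)   # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;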
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many of R's functions and operators accept whole vectors as input, rather than just single values, which often lets you avoid writing a loop at all.  The examples in the previous section used for loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in about half the time that the pre-allocated loop took to process 30,000!&lt;br /&gt;
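&lt;br /&gt;
Vectorisation is not limited to arithmetic.  As an illustration with made-up data, the loop below replaces negative values with zero, while '''ifelse()''' and '''pmax()''' do the same job on the whole vector in a single call:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
x &amp;lt;- c(-2, 0.5, 3, -1)&lt;br /&gt;
# Loop version: visit each element in turn&lt;br /&gt;
y.loop &amp;lt;- numeric(length(x))&lt;br /&gt;
for (i in seq_along(x))&lt;br /&gt;
  y.loop[i] &amp;lt;- if (x[i] &amp;lt; 0) 0 else x[i]&lt;br /&gt;
# Vectorised versions: operate on the whole vector at once&lt;br /&gt;
y.ifelse &amp;lt;- ifelse(x &amp;lt; 0, 0, x)&lt;br /&gt;
y.pmax &amp;lt;- pmax(x, 0)&lt;br /&gt;
identical(y.loop, y.ifelse)   # TRUE&lt;br /&gt;
identical(y.loop, y.pmax)     # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;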
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to rewrite the portions of your R code that profiling shows to be slow in a compiled language, such as C or Fortran, and call them from R.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
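&lt;br /&gt;
To give a flavour of the mechanics, here is a minimal sketch using R's own '''.C()''' interface.  It assumes that a working C compiler is installed (so that '''R CMD SHLIB''' works), and the file and routine name '''square''' are hypothetical.  The script writes a tiny C routine to a temporary directory, compiles it into a shared library, loads the library and calls the routine:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Write a tiny (hypothetical) C routine that squares a vector of doubles in place&lt;br /&gt;
src.dir &amp;lt;- tempdir()&lt;br /&gt;
writeLines(c(&lt;br /&gt;
  &amp;quot;void square(double *x, int *n) {&amp;quot;,&lt;br /&gt;
  &amp;quot;  for (int i = 0; i &amp;lt; *n; i++) x[i] = x[i] * x[i];&amp;quot;,&lt;br /&gt;
  &amp;quot;}&amp;quot;), file.path(src.dir, &amp;quot;square.c&amp;quot;))&lt;br /&gt;
# Compile it into a shared library with R CMD SHLIB (needs a C compiler)&lt;br /&gt;
old.wd &amp;lt;- setwd(src.dir)&lt;br /&gt;
system2(file.path(R.home(&amp;quot;bin&amp;quot;), &amp;quot;R&amp;quot;), c(&amp;quot;CMD&amp;quot;, &amp;quot;SHLIB&amp;quot;, &amp;quot;square.c&amp;quot;))&lt;br /&gt;
setwd(old.wd)&lt;br /&gt;
# Load the library, call the routine, then unload it again&lt;br /&gt;
lib &amp;lt;- file.path(src.dir, paste0(&amp;quot;square&amp;quot;, .Platform$dynlib.ext))&lt;br /&gt;
dyn.load(lib)&lt;br /&gt;
out &amp;lt;- .C(&amp;quot;square&amp;quot;, x = as.double(1:5), n = as.integer(5))&lt;br /&gt;
dyn.unload(lib)&lt;br /&gt;
out$x   # .C() returns its (modified) arguments as a list: 1 4 9 16 25&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;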
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as BlueCrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to each element of a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
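&lt;br /&gt;
Before reaching for any parallel package, it can help to restructure your work into exactly this shape in plain, serial R.  In the sketch below, '''one.task''' is a hypothetical stand-in for your real per-item computation; each call depends only on its own input, which is what makes the list safe to split across workers:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# A hypothetical, self-contained unit of work: summarise one batch of numbers&lt;br /&gt;
one.task &amp;lt;- function(seed) {&lt;br /&gt;
  set.seed(seed)            # each task seeds itself, so results are reproducible&lt;br /&gt;
  x &amp;lt;- rnorm(1000)&lt;br /&gt;
  c(mean = mean(x), sd = sd(x))&lt;br /&gt;
}&lt;br /&gt;
inputs &amp;lt;- as.list(1:8)      # one list element per independent task&lt;br /&gt;
results &amp;lt;- lapply(inputs, one.task)&lt;br /&gt;
# Combine the per-task results into a matrix, one row per task&lt;br /&gt;
summary.table &amp;lt;- do.call(rbind, results)&lt;br /&gt;
# A parallel mapper, e.g. mclapply(inputs, one.task), could later replace lapply()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;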
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.  (Its functionality has since been incorporated into the '''parallel''' package, described below.)&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the use of the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I then used the following submission script to run it on BlueCrystal Phase 2 (BCp2):&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
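module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! Load the MPI library that Rmpi needs (see above)&lt;br /&gt;
module add ofed/openmpi/gcc/64/1.4.2-qlc&lt;br /&gt;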
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Disable PSM on the QLogic HCAs&lt;br /&gt;
export OMPI_MCA_mtl=^psm&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
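&lt;br /&gt;
Notice that in the output above all four workers returned the same value from '''runif(1)''', which suggests that they started with identical random number generator states.  For simulation work you would normally want each worker to draw from its own independent stream.  A minimal sketch of one way to arrange that, using '''clusterSetRNGStream()''' from the '''parallel''' package on a local two-worker socket cluster rather than Rmpi (a substitution made so that the example runs on a single machine):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
cl &amp;lt;- makeCluster(2)                    # two local worker processes (socket cluster)&lt;br /&gt;
clusterSetRNGStream(cl, iseed = 123)    # a separate L'Ecuyer-CMRG stream for each worker&lt;br /&gt;
draws &amp;lt;- unlist(clusterCall(cl, runif, 1))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
draws                                   # two different numbers, one per worker&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;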
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
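&lt;br /&gt;
One thing the example above does not show is that each worker is a separate R process, so it cannot see variables defined in the master session unless you send them explicitly.  A sketch using '''clusterExport()''' and '''parLapply()''': functions with these names also exist in '''snow''', but the sketch uses the '''parallel''' package and a local socket cluster so that it runs on a single machine, and the variable '''span.value''' is purely illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
cl &amp;lt;- makeCluster(2)              # two local worker processes&lt;br /&gt;
span.value &amp;lt;- 0.5                 # defined in the master session only&lt;br /&gt;
# Copy span.value into the global environment of each worker&lt;br /&gt;
clusterExport(cl, &amp;quot;span.value&amp;quot;, envir = environment())&lt;br /&gt;
data &amp;lt;- lapply(1:4, function(i) rnorm(1000))&lt;br /&gt;
x &amp;lt;- parLapply(cl, data, function(v) loess.smooth(v, v, span = span.value))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
length(x)   # 4 smoothed results, one per input vector&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;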
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
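The '''parallel''' package, which ships with R from version 2.14.0, is an amalgamation of functionality from the multicore and snow packages.  Its socket-cluster functions (such as '''makeCluster''' and '''parLapply''') work on MS Windows, but the fork-based '''mclapply''' does not: on Windows it can only be run with '''mc.cores=1''', i.e. in serial.&lt;br /&gt;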
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
detectCores()&lt;br /&gt;
# Create a list of 10 vectors, each holding 10,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Smooth each vector.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using mclapply, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
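&lt;br /&gt;
Because '''mclapply()''' relies on forking, a script that must also run on MS Windows can fall back to a socket cluster there.  A hedged sketch of that pattern, in which the cap of two cores and the helper name '''smooth.one''' are arbitrary choices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(i) rnorm(10000))&lt;br /&gt;
smooth.one &amp;lt;- function(v) loess.smooth(v, v)&lt;br /&gt;
# Use at most two cores; detectCores() can return NA, hence na.rm = TRUE&lt;br /&gt;
n.cores &amp;lt;- min(2, detectCores(), na.rm = TRUE)&lt;br /&gt;
if (.Platform$OS.type == &amp;quot;windows&amp;quot;) {&lt;br /&gt;
  cl &amp;lt;- makeCluster(n.cores)               # socket cluster: works on Windows&lt;br /&gt;
  x &amp;lt;- parLapply(cl, data, smooth.one)&lt;br /&gt;
  stopCluster(cl)&lt;br /&gt;
} else {&lt;br /&gt;
  x &amp;lt;- mclapply(data, smooth.one, mc.cores = n.cores)   # fork-based&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;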
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9414</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9414"/>
		<updated>2014-03-07T12:11:02Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: Changed protection level for &amp;quot;R1&amp;quot; ([edit=sysop] (indefinite) [move=sysop] (indefinite))&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to R itself; it is not an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' will normally be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R treats even a single number as a vector (of length one), and the '''[1]''' indicates that the line in question displays the vector's values starting at the first index.  We can use the handy sequence function, '''seq()''', to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
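In the above examples, '''&amp;lt;-''' is used for assignment, while '''=''' is used to pass named arguments (such as '''from''' and '''to''') to a function.  At the top level, '''=''' can also be used for assignment, but '''&amp;lt;-''' is the conventional choice.&lt;br /&gt;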
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
coords.bris &amp;lt;- c(51.5, -2.6)   # latitude, longitude in degrees (west is negative)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c''' function combines the arguments given in the parentheses into a vector, and '''matrix()''' fills the matrix column by column (add '''byrow=TRUE''' to fill it row by row instead).  We can access portions of the matrix using the indices shown in square brackets in the printed output.  For example, we can access the first row using the '''[1,]''' notation, and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which have more than two dimensions, and '''lists''' to hold heterogeneous collections.&lt;br /&gt;
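&lt;br /&gt;
For instance, a three-dimensional array can be built from a vector of values and a vector of dimensions.  As with matrices, R fills the array column-first, i.e. the first index varies fastest:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
a &amp;lt;- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers&lt;br /&gt;
dim(a)       # 2 3 4&lt;br /&gt;
a[2, 3, 4]   # 24, the last element&lt;br /&gt;
a[, , 1]     # the first 2 x 3 layer, holding 1 to 6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;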
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
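We can access its items in several ways.  Note that single square brackets return a sub-list, whereas double square brackets (or the '''$''' operator) return the item itself:&lt;br /&gt;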
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
    country  gold  silver  bronze&lt;br /&gt;
1       USA    46      29      29&lt;br /&gt;
2     China    38      27      23&lt;br /&gt;
3        GB    29      17      19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
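We can access columns of a data frame using the '''$''' operator (the '''Levels''' line below appears because, before R 4.0.0, '''data.frame()''' converted character vectors into factors by default; from R 4.0.0 onwards the country column stays a plain character vector):&lt;br /&gt;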
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
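&lt;br /&gt;
Columns can also be created with '''$''', and rows selected with a logical condition inside the square brackets.  A small illustrative sketch, which rebuilds the medals table so that it runs on its own:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;),&lt;br /&gt;
                          gold = c(46, 38, 29),&lt;br /&gt;
                          silver = c(29, 27, 17),&lt;br /&gt;
                          bronze = c(29, 23, 19))&lt;br /&gt;
# Add a new column holding each country's total medal count&lt;br /&gt;
medals.2012$total &amp;lt;- medals.2012$gold + medals.2012$silver + medals.2012$bronze&lt;br /&gt;
medals.2012$total                     # 104 88 65&lt;br /&gt;
# Keep only the rows with more than 30 golds (USA and China)&lt;br /&gt;
medals.2012[medals.2012$gold &amp;gt; 30, ]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;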
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets--we'll use these to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  Here we use one to show the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;gt; example(filled.contour)   # run the examples from the help page&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different numbers of cylinders&lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots--complete with the R code required to create the plots (at the bottom of the page, after the comments)--on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body is a single expression, as here, but they add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can see that unnamed arguments are matched by position, whereas named arguments can be given in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check on the contents of a function by typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or just check its arguments, using the '''args''' function.  This returns a function with the same arguments but an empty body, which is why '''NULL''' is printed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Packages contributed by the R community are listed on CRAN: http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
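Let's install the '''multicore''' package, which gives us access to functions that will run on the multiple processors we often find in our computers these days.  (The multicore package has since been archived on CRAN in favour of the '''parallel''' package, which ships with R, so on a recent installation you may need to use '''library(parallel)''' instead.)&lt;br /&gt;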
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages that are installed in our library, i.e. available to be loaded into our workspace (use '''search()''' to see which are currently attached):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load one into the current session, we type e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, an example of using a function from the multicore package.  The '''lapply''' function, which is included in the standard R core, will map a given function over a list of inputs, giving a list of the function outputs in return.  For example, we can map a squaring function over the list of integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  If your data is organised in a file so that it looks like a table with column headings, perhaps the simplest function to use is '''read.table()'''.  For example, suppose I have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is then a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
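&lt;br /&gt;
As a minimal sketch of a round trip, the code below writes a data frame with '''write.table()''' and reads it back with '''read.table()'''.  To keep it self-contained, it builds a cut-down copy of '''medals.2012''' and writes to a temporary file rather than to '''medals.txt'''.  Setting '''row.names = FALSE''' stops R from adding its row numbers as an extra column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;Great Britain&amp;quot;),&lt;br /&gt;
                          gold = c(46, 38, 29), silver = c(29, 27, 17),&lt;br /&gt;
                          bronze = c(29, 23, 19), stringsAsFactors = FALSE)&lt;br /&gt;
txt.file &amp;lt;- tempfile(fileext = &amp;quot;.txt&amp;quot;)&lt;br /&gt;
# Character columns are quoted by default, so names with spaces survive the trip&lt;br /&gt;
write.table(medals.2012, txt.file, row.names = FALSE)&lt;br /&gt;
medals.copy &amp;lt;- read.table(txt.file, header = TRUE, stringsAsFactors = FALSE)&lt;br /&gt;
medals.copy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;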
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Gives us the file, '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
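&lt;br /&gt;
To read such a file back, '''read.csv()''' needs to be told that the unnamed first column holds row names.  Alternatively, you can leave the row names out when writing.  The minimal sketch below shows both approaches, again using a temporary file and a cut-down, self-contained copy of '''medals.2012''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;Great Britain&amp;quot;),&lt;br /&gt;
                          gold = c(46, 38, 29), silver = c(29, 27, 17),&lt;br /&gt;
                          bronze = c(29, 23, 19), stringsAsFactors = FALSE)&lt;br /&gt;
csv.file &amp;lt;- tempfile(fileext = &amp;quot;.csv&amp;quot;)&lt;br /&gt;
# With row names (as above): turn the first column back into row names&lt;br /&gt;
write.csv(medals.2012, csv.file)&lt;br /&gt;
with.rownames &amp;lt;- read.csv(csv.file, row.names = 1, stringsAsFactors = FALSE)&lt;br /&gt;
# Without row names: nothing extra to strip off when reading&lt;br /&gt;
write.csv(medals.2012, csv.file, row.names = FALSE)&lt;br /&gt;
without.rownames &amp;lt;- read.csv(csv.file, stringsAsFactors = FALSE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;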
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store an R data structure in binary form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
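&lt;br /&gt;
By default the saved file is compressed with gzip, as the Unix '''file''' command confirms:&lt;br /&gt;
&lt;br /&gt;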
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
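&lt;br /&gt;
Note that '''load()''' restores objects under the names they were saved with, replacing any existing objects of the same name.  It also invisibly returns those names.  Here is a minimal sketch using a temporary file and a cut-down copy of '''medals.2012''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold = c(46, 38))&lt;br /&gt;
rdata.file &amp;lt;- tempfile(fileext = &amp;quot;.RData&amp;quot;)&lt;br /&gt;
save(medals.2012, file = rdata.file)&lt;br /&gt;
rm(medals.2012)                  # remove the object from the workspace&lt;br /&gt;
loaded &amp;lt;- load(rdata.file)      # ...and restore it under its saved name&lt;br /&gt;
loaded                           # the names of the restored objects&lt;br /&gt;
exists(&amp;quot;medals.2012&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;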
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide this.  Note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
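&lt;br /&gt;
'''sort()''' also accepts '''decreasing = TRUE'''.  To sort the rows of a data frame, the usual approach is '''order()''', which returns the permutation of indices that would sort a column.  The sketch below applies it to a small, cut-down copy of the medals table:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
sort(railway.engines, decreasing = TRUE)&lt;br /&gt;
medals &amp;lt;- data.frame(country = c(&amp;quot;Germany&amp;quot;, &amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;),&lt;br /&gt;
                     gold = c(11, 46, 38), stringsAsFactors = FALSE)&lt;br /&gt;
# order() gives the row indices that would sort the gold column&lt;br /&gt;
by.gold &amp;lt;- medals[order(medals$gold, decreasing = TRUE), ]&lt;br /&gt;
by.gold&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;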
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
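&lt;br /&gt;
Because the draws are random, your results will differ from those above.  To make a random sample reproducible, set the random number generator's seed with '''set.seed()''' first.  Also, '''sample()''' draws without replacement by default, so the sketch below picks three distinct engines.  The seed value 42 is arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
set.seed(42)&lt;br /&gt;
first.draw &amp;lt;- sample(railway.engines, 3)    # 3 distinct engines (replace = FALSE)&lt;br /&gt;
set.seed(42)&lt;br /&gt;
second.draw &amp;lt;- sample(railway.engines, 3)&lt;br /&gt;
identical(first.draw, second.draw)           # same seed, same sample&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;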
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
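&lt;br /&gt;
The same help page covers '''cbind''', which combines columns instead of rows.  As a minimal sketch, we can add a column of medal totals to a cut-down copy of the medals table:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold = c(46, 38),&lt;br /&gt;
                     silver = c(29, 27), bronze = c(29, 23))&lt;br /&gt;
total &amp;lt;- medals$gold + medals$silver + medals$bronze&lt;br /&gt;
# cbind() appends the vector as a new column, named after the variable&lt;br /&gt;
medals.total &amp;lt;- cbind(medals, total)&lt;br /&gt;
medals.total&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;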
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Plotting the binned data couldn't be simpler: '''plot(bins)''' draws a bar chart of the number of values in each bin.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
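&lt;br /&gt;
Rather than asking for a number of equal-width bins, you can pass '''breaks''' as a vector of cut points, and '''table()''' will then count the values in each bin.  The break points 80, 85 and 90 below are chosen purely for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
# Intervals are open on the left and closed on the right by default&lt;br /&gt;
bins &amp;lt;- cut(girls_2, breaks = c(80, 85, 90))&lt;br /&gt;
table(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;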
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
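&lt;br /&gt;
The fitted model object can also be interrogated directly.  Here is a short sketch using standard accessor functions:&lt;br /&gt;
* '''coef()''' gives the intercept and slope.&lt;br /&gt;
* '''summary()''' gives the usual regression table, including R-squared.&lt;br /&gt;
* '''predict()''' gives the fitted stopping distance at new speeds (10 and 20 are arbitrary example values).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res &amp;lt;- lm(dist ~ speed, data = cars)&lt;br /&gt;
coef(res)                  # intercept and slope of the fitted line&lt;br /&gt;
summary(res)$r.squared     # proportion of the variance in dist explained&lt;br /&gt;
predict(res, newdata = data.frame(speed = c(10, 20)))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;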
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
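* You may wish to compare different methods of estimation.  From the MASS package, you can fit a line with the '''rlm''' and '''lqs''' functions.  Once you have plotted the data with '''plot(cars)''', you can add all three fitted lines using:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; res.lm &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm &amp;lt;- rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs &amp;lt;- lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;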
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If weights are given, the function fits the line by weighted least squares, i.e. it minimises the sum of the squared residuals, each multiplied by its weight.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector, length 50, where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
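&lt;br /&gt;
As a starting point, the sketch below checks one property of weighted least squares: multiplying every weight by the same constant leaves the fitted line unchanged.  So '''w1''', without its modified first element, reproduces the ordinary fit.  The sketch then fits with the increasing weights '''w2'''.  Since '''cars''' is sorted by speed, this gives the faster cars more influence:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
fit.ols &amp;lt;- lm(dist ~ speed, data = cars)&lt;br /&gt;
w1 &amp;lt;- rep(0.1, 50)                  # one weight per row of cars&lt;br /&gt;
fit.w1 &amp;lt;- lm(dist ~ speed, data = cars, weights = w1)&lt;br /&gt;
# Equal weights, whatever their size, give the ordinary least squares line&lt;br /&gt;
all.equal(coef(fit.ols), coef(fit.w1))&lt;br /&gt;
w2 &amp;lt;- seq(from = 0.02, to = 1.0, by = 0.02)&lt;br /&gt;
fit.w2 &amp;lt;- lm(dist ~ speed, data = cars, weights = w2)&lt;br /&gt;
coef(fit.w2)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;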
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
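&lt;br /&gt;
The objects returned by '''var.test()''' and '''t.test()''' are lists, so individual results can be pulled out by name, which is handy in scripts.  Here is a minimal sketch; the 5% significance level is a conventional choice, not something the test itself imposes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2 &amp;lt;- c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
res &amp;lt;- t.test(boys_2, girls_2, var.equal = TRUE, paired = FALSE)&lt;br /&gt;
res$statistic   # the t statistic&lt;br /&gt;
res$p.value     # two-sided p-value&lt;br /&gt;
res$conf.int    # 95% confidence interval for the difference in means&lt;br /&gt;
alpha &amp;lt;- 0.05&lt;br /&gt;
if (res$p.value &amp;lt; alpha) {&lt;br /&gt;
  message(&amp;quot;Reject H0: the means differ at the 5% level&amp;quot;)&lt;br /&gt;
} else {&lt;br /&gt;
  message(&amp;quot;No evidence at the 5% level that the means differ&amp;quot;)&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;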
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
The famous (Fisher's or Anderson's) iris data set, available in R as '''iris''' (and as the 3-dimensional array '''iris3''' used below), gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
The '''knn()''' function in the '''class''' package performs k-nearest neighbour classification of a test set from a training set.  For each row of the test set, it finds the k nearest training set vectors (in Euclidean distance), and the classification is decided by majority vote, with ties broken at random.  If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?  (Because ties are broken at random, your table may differ slightly.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
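&lt;br /&gt;
The overall accuracy is the proportion of test cases on the diagonal of that table.  For the table shown above, this is (23+25+22)/75, about 0.93.  The minimal sketch below re-runs the example so that it is self-contained.  It adds '''set.seed()''' because '''knn()''' breaks ties at random; the seed value is arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
set.seed(1)&lt;br /&gt;
pred &amp;lt;- knn(train, test, cl, k = 3)&lt;br /&gt;
confusion &amp;lt;- table(predicted = pred, actual = cl)&lt;br /&gt;
# Correct predictions lie on the diagonal of the confusion matrix&lt;br /&gt;
accuracy &amp;lt;- sum(diag(confusion)) / sum(confusion)&lt;br /&gt;
accuracy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;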
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame, from the '''rpart''' package, has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
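&lt;br /&gt;
To see how well a tree fits, a sketch along these lines prints the complexity table, which can guide pruning with '''prune()'''.  It then compares the predicted and actual classes.  Note that this measures accuracy on the same data used to grow the tree, so it will tend to be optimistic:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
# Complexity table: cross-validated error (xerror) for each tree size&lt;br /&gt;
printcp(fit)&lt;br /&gt;
# Predicted class for each of the 81 children used to grow the tree&lt;br /&gt;
pred &amp;lt;- predict(fit, type = &amp;quot;class&amp;quot;)&lt;br /&gt;
table(predicted = pred, actual = kyphosis$Kyphosis)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;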
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
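&lt;br /&gt;
A quick sanity check is to multiply the solution back through with the matrix product operator '''%*%'''.  Calling '''solve()''' with only the matrix returns its inverse.  However, solving the system directly, as above, is generally preferred to forming the inverse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
b &amp;lt;- c(5,7,8)&lt;br /&gt;
x &amp;lt;- solve(A, b)&lt;br /&gt;
# Substituting the solution back in should reproduce b&lt;br /&gt;
all.equal(as.vector(A %*% x), b)&lt;br /&gt;
# With a single argument, solve() returns the inverse of A&lt;br /&gt;
A.inv &amp;lt;- solve(A)&lt;br /&gt;
all.equal(A %*% A.inv, diag(3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;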
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
If you would prefer to noodle about with some real-world data, you could take a look at:&lt;br /&gt;
* http://www.theguardian.com/news/datablog/2010/oct/18/historic-government-spending-area#data&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll switch tack and focus on a question commonly asked by those beginning to use R in anger: '''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''.  In this section we'll consider the related tasks of finding which parts of your R code are responsible for the majority of the run-time, and what you can do about it.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have the gift(!) to constantly surprise us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it run infinitely fast would save at most 10%, and we would have sweated for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time the evaluation of an expression with '''system.time()''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
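&lt;br /&gt;
Here '''some.function()''' stands for whatever you want to time.  The value returned by '''system.time()''' is a named vector, so you can pull out the elapsed (wall-clock) time.  The rough sketch below does this, and repeats the measurement a few times to smooth out noise.  Sorting a million random numbers is just an arbitrary workload:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Time a single evaluation; the result is a named vector of class proc_time&lt;br /&gt;
timing &amp;lt;- system.time(sorted &amp;lt;- sort(runif(1e6)))&lt;br /&gt;
timing[[&amp;quot;elapsed&amp;quot;]]     # wall-clock seconds&lt;br /&gt;
# Timings are noisy, so repeat the measurement and take the median&lt;br /&gt;
elapsed &amp;lt;- replicate(5, system.time(sort(runif(1e6)))[[&amp;quot;elapsed&amp;quot;]])&lt;br /&gt;
median(elapsed)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;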
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is shown below.  Note that profiling is switched off with '''Rprof(NULL)'''.  Calling '''Rprof()''' with no arguments would instead start a new profile, written to the file '''Rprof.out'''.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each of 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Map a smoothing function over the list (in serial)&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which I ran by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The profile tells us that the function '''simpleLoess''' accounts for 88% of the run-time on its own (the self.pct column), whereas '''rnorm''' takes only 4%.&lt;br /&gt;
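&lt;br /&gt;
The summary is an ordinary R list, so it can also be examined programmatically.  Below is a sketch of the same profiling run.  It writes to a temporary file and uses smaller vectors (50,000 numbers, an arbitrary choice) so that it finishes quickly.  The '''by.self''' table should be ordered by self time, so its top rows are the hot spots:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
prof.file &amp;lt;- tempfile(fileext = &amp;quot;.out&amp;quot;)&lt;br /&gt;
Rprof(filename = prof.file)     # start profiling&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(50000)})&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)                     # stop profiling&lt;br /&gt;
prof &amp;lt;- summaryRprof(filename = prof.file)&lt;br /&gt;
head(prof$by.self, 3)           # the functions with the most self time&lt;br /&gt;
prof$sampling.time              # total seconds covered by the samples&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;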
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
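&lt;br /&gt;
A more idiomatic way to pre-allocate is to create a vector of the required length up front with '''numeric(n)''', which is filled with zeros.  In this sketch, '''f3''' is a hypothetical variant of '''f2''' that also returns the vector.  It uses '''seq_len(n)''' instead of '''1:n''' so that the loop is skipped cleanly when n is 0:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
f3 &amp;lt;- function(n) {&lt;br /&gt;
  v &amp;lt;- numeric(n)          # pre-allocated vector of n zeros&lt;br /&gt;
  for (i in seq_len(n))&lt;br /&gt;
    v[i] &amp;lt;- i^2&lt;br /&gt;
  v&lt;br /&gt;
}&lt;br /&gt;
system.time(f3(30000))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;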
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R accept whole vectors as input, rather than just single values, and this often lets you avoid a loop altogether.  The examples in the previous section used for loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
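&lt;br /&gt;
To convince yourself that the vectorised form gives exactly the same answer as a loop, a sketch like the following times both and compares the results.  The size n = 100,000 is arbitrary, and your timings will vary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
n &amp;lt;- 100000&lt;br /&gt;
loop.time &amp;lt;- system.time({&lt;br /&gt;
  loop.squares &amp;lt;- numeric(n)              # pre-allocated storage&lt;br /&gt;
  for (i in seq_len(n)) loop.squares[i] &amp;lt;- i^2&lt;br /&gt;
})&lt;br /&gt;
vec.time &amp;lt;- system.time(vec.squares &amp;lt;- (1:n)^2)&lt;br /&gt;
identical(loop.squares, vec.squares)       # same numbers either way&lt;br /&gt;
rbind(loop = loop.time, vectorised = vec.time)[, &amp;quot;elapsed&amp;quot;]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;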
&lt;br /&gt;
Here we've been able to square 1,000,000 items in about half the time it took '''f2()''' to process 30,000!&lt;br /&gt;
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as bluecrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the use of the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And used the following submission script to run it on bluecrystal phase2:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Disable PSM on the QLogic HCAs&lt;br /&gt;
export OMPI_MCA_mtl=^psm&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
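The '''parallel''' package, included with R, is an amalgamation of functionality from the multicore and snow packages.  Unlike the multicore package, it can be used on an MS Windows machine.  However, its fork-based functions such as '''mclapply''' cannot use more than one core there, so on Windows you would use its snow-style socket clusters instead.&lt;br /&gt;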
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
parallel::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
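&lt;br /&gt;
For completeness, here is a minimal sketch of the '''parallel''' package's own cluster functions.  It runs on a single machine with 2 local worker processes (an arbitrary choice) and has not been tried across BCp2 nodes.  It also uses '''clusterSetRNGStream()''' to give each worker an independent random number stream.  This avoids the situation in the Rmpi output above, where every worker returned the same random number.  The seed 123 is arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
cl &amp;lt;- makeCluster(2)                   # 2 local worker processes (socket cluster)&lt;br /&gt;
clusterSetRNGStream(cl, iseed = 123)    # independent, reproducible RNG streams&lt;br /&gt;
draws &amp;lt;- unlist(clusterEvalQ(cl, runif(1)))  # one random number per worker&lt;br /&gt;
draws                                   # the values should differ&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
x &amp;lt;- parLapply(cl, data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;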
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9413</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9413"/>
		<updated>2014-03-07T12:10:06Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: Changed protection level for &amp;quot;R1&amp;quot; ([edit=autoconfirmed] (indefinite) [move=autoconfirmed] (indefinite))&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to override the usual order of precedence of the operators (without them, '''/''' would be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R has no separate scalar type: a single number is simply a vector of length one, and the '''[1]''' indicates that the first value shown on that line is element 1 of the vector.  We can use the handy sequence function, '''seq()''', to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the above example, '''=''' is used inside the call to '''seq()''' to name its arguments.  For assignment, the '''&amp;lt;-''' operator is conventional, although '''=''' also works at the top level.&lt;br /&gt;
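&lt;br /&gt;
A quick check that the two forms of top-level assignment behave the same (the variable names '''a''' and '''b''' are arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
a &amp;lt;- seq(from=1, to=9, by=2)   # here '=' names the arguments of seq()&lt;br /&gt;
b = seq(1, 9, 2)                 # '=' used for assignment; arguments matched by position&lt;br /&gt;
identical(a, b)                  # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;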
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
coords.bris &amp;lt;- c(51.5, -2.6)   # latitude, longitude in degrees (west is negative)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c()''' function combines its arguments into a vector, which '''matrix()''' then uses to fill the matrix column by column.  We can access portions of the matrix using the syntax shown in the square brackets of the printed output.  For example, we can access the first row using the '''[1,]''' notation, and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
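&lt;br /&gt;
Every row, column and diagonal of a magic square has the same sum, which we can check in one go with the built-in '''rowSums()''', '''colSums()''' and '''diag()''' functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8), nrow=3, ncol=3)&lt;br /&gt;
rowSums(magic)            # 15 15 15&lt;br /&gt;
colSums(magic)            # 15 15 15&lt;br /&gt;
sum(diag(magic))          # main diagonal: 2 + 5 + 8 = 15&lt;br /&gt;
sum(diag(magic[, 3:1]))   # anti-diagonal (columns reversed): 4 + 5 + 6 = 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;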
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which have more than two dimensions, and '''lists''' to hold heterogeneous collections.&lt;br /&gt;
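&lt;br /&gt;
For example, a three-dimensional array can be built from a vector of values and a vector of dimensions.  Like matrices, arrays are filled with the first index varying fastest (the name '''cube''' is our own):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
cube &amp;lt;- array(1:24, dim=c(2,3,4))   # 2 rows, 3 columns, 4 layers&lt;br /&gt;
dim(cube)       # 2 3 4&lt;br /&gt;
cube[2,3,4]     # the last element: 24&lt;br /&gt;
cube[,,1]       # the first 2x3 layer, returned as a matrix&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;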
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access its items in several ways:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
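&lt;br /&gt;
The difference between the last two forms is worth spelling out: single brackets return a sub-list (here of length one), while double brackets (or '''$''') return the element itself:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
class(list.r4[1])           # &amp;quot;list&amp;quot;&lt;br /&gt;
class(list.r4[[1]])         # &amp;quot;character&amp;quot;&lt;br /&gt;
list.r4[[&amp;quot;frequency&amp;quot;]]   # elements can also be selected by name: &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;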
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
  country gold silver bronze&lt;br /&gt;
1     USA   46     29     29&lt;br /&gt;
2   China   38     27     23&lt;br /&gt;
3      GB   29     17     19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
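We can access columns of a data frame using the '''$''' operator.  In R versions before 4.0.0, '''data.frame()''' converted character vectors to factors by default, hence the '''Levels''' line in the output below (from 4.0.0 onwards, '''country''' stays a plain character vector):&lt;br /&gt;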
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
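&lt;br /&gt;
Columns can also be added with the same operator.  As a small example, a total medal count computed from the existing columns (the column name '''total''' is our own choice):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country=c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;),&lt;br /&gt;
                          gold=c(46, 38, 29), silver=c(29, 27, 17), bronze=c(29, 23, 19))&lt;br /&gt;
medals.2012$total &amp;lt;- medals.2012$gold + medals.2012$silver + medals.2012$bronze&lt;br /&gt;
medals.2012$total   # 104  88  65&lt;br /&gt;
ncol(medals.2012)   # now 5 columns&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;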
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  Here we show the relative proportions (in percent) of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
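&lt;br /&gt;
Plots need not go to the screen.  As a sketch, the same bar chart can be written to a PDF file by opening the '''pdf()''' graphics device first and closing it with '''dev.off()'''; here we also mark the monthly mean with '''abline()''' (the file name is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
names(bristol.precip) &amp;lt;- month.abb        # built-in month abbreviations, &amp;quot;Jan&amp;quot; to &amp;quot;Dec&amp;quot;&lt;br /&gt;
names(which.max(bristol.precip))           # the wettest month: &amp;quot;Nov&amp;quot;&lt;br /&gt;
pdf(file=&amp;quot;bristol-precip.pdf&amp;quot;)            # subsequent plots are drawn into this file&lt;br /&gt;
barplot(bristol.precip, main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
        ylab=&amp;quot;Mean precipitation (mm)&amp;quot;, ylim=c(0,100), col=&amp;quot;darkblue&amp;quot;)&lt;br /&gt;
abline(h=mean(bristol.precip), lty=2)      # dashed line at the monthly mean (70.7 mm)&lt;br /&gt;
dev.off()                                  # close the file&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;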
&lt;br /&gt;
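[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are a useful way to graph the quartiles of some data.  In this case, the fuel efficiencies of the 32 cars in the built-in '''mtcars''' data set (taken from the 1974 ''Motor Trend'' US magazine), grouped by number of cylinders:&lt;br /&gt;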
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
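&lt;br /&gt;
The examples on any help page can be run directly with '''example()''', e.g. '''example(filled.contour)'''.  A pared-down version of the volcano plot, using the built-in '''volcano''' matrix of heights (the colour palette and title are our own choices), might look like:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
dim(volcano)   # an 87 x 61 grid of heights&lt;br /&gt;
# With a single matrix argument, its values are plotted over a unit square;&lt;br /&gt;
# asp=1 keeps the aspect ratio of the grid&lt;br /&gt;
filled.contour(volcano, color.palette=terrain.colors, asp=1,&lt;br /&gt;
               plot.title=title(main=&amp;quot;Maunga Whau Volcano&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;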
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots, complete with the R code required to create them (at the bottom of each page, after the comments), on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
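&lt;br /&gt;
Loops are often used to accumulate a result.  Because most R arithmetic is vectorised, the same result can usually be had without an explicit loop, which is generally shorter and faster:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Sum of the squares of 1 to 10, with an explicit loop...&lt;br /&gt;
total &amp;lt;- 0&lt;br /&gt;
for (ii in 1:10) total &amp;lt;- total + ii^2&lt;br /&gt;
total            # 385&lt;br /&gt;
# ...and the vectorised equivalent&lt;br /&gt;
sum((1:10)^2)    # 385&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;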
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
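&lt;br /&gt;
The output above will differ each time it is run, since '''runif()''' draws random numbers.  A deterministic example: counting how many doublings it takes for 1 to reach at least 1000 (the variable names are our own):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
x &amp;lt;- 1&lt;br /&gt;
n.doublings &amp;lt;- 0&lt;br /&gt;
while (x &amp;lt; 1000) {&lt;br /&gt;
  x &amp;lt;- 2 * x&lt;br /&gt;
  n.doublings &amp;lt;- n.doublings + 1&lt;br /&gt;
}&lt;br /&gt;
n.doublings   # 10&lt;br /&gt;
x             # 1024&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;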
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional, but add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can see that arguments are matched by position unless they are named, in which case they can be given in any order.&lt;br /&gt;
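&lt;br /&gt;
Defaults can also be overridden selectively by naming just one argument.  And since '''^''' and '''sqrt()''' work element-wise on vectors, the function handles several triangles at once (the side lengths below are our own examples):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
hypot2 &amp;lt;- function(x=3, y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
hypot2(y=0)                        # x keeps its default value: 3&lt;br /&gt;
hypot2(c(3, 5, 8), c(4, 12, 15))   # three triangles at once: 5 13 17&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;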
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check the contents of a function by typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or just check the arguments, using the '''args''' function.  This returns a function with the same arguments but an empty body, which is why '''NULL''' is printed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
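&lt;br /&gt;
The individual pieces of a function can also be pulled out programmatically, with '''formals()''' for the arguments (and their defaults) and '''body()''' for the code:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
hypot3 &amp;lt;- function(x=3, y=4) {&lt;br /&gt;
  x_sq &amp;lt;- x^2&lt;br /&gt;
  y_sq &amp;lt;- y^2&lt;br /&gt;
  return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
names(formals(hypot3))   # &amp;quot;x&amp;quot; &amp;quot;y&amp;quot;&lt;br /&gt;
formals(hypot3)$y        # the default value of y: 4&lt;br /&gt;
body(hypot3)             # the statements between the braces&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;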
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Contributed packages are listed on CRAN: http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
Let's install the '''multicore''' package, which provides functions that run on the multiple processor cores found in most computers these days (much of the same functionality is also available in the '''parallel''' package, which has shipped with R since version 2.14.0):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library, i.e. those available to load into our workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load (attach) one of them into the current session, we type e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
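&lt;br /&gt;
Two related checks: '''search()''' lists the packages currently attached to the session, and '''requireNamespace()''' tests whether a package is installed without attaching it, which is handy in scripts (MASS is used here purely as an example package name):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
search()   # attached packages, e.g. &amp;quot;package:stats&amp;quot;&lt;br /&gt;
if (requireNamespace(&amp;quot;MASS&amp;quot;, quietly=TRUE)) {&lt;br /&gt;
  message(&amp;quot;MASS is installed&amp;quot;)&lt;br /&gt;
} else {&lt;br /&gt;
  message(&amp;quot;MASS is not installed&amp;quot;)&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;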
&lt;br /&gt;
Now, an example of using a function from the multicore package.  The '''lapply''' function, which is part of base R, maps a given function over a list of inputs, returning a list of the function's outputs.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
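&lt;br /&gt;
With the '''parallel''' package, which ships with R, the same call is available without installing anything.  A sketch that also runs on MS Windows, where forking is unavailable and '''mc.cores''' must be 1 (so the work runs serially there); the choice of 2 cores elsewhere is arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
n.cores &amp;lt;- if (.Platform$OS.type == &amp;quot;windows&amp;quot;) 1 else 2&lt;br /&gt;
squares &amp;lt;- mclapply(1:3, function(x) {x^2}, mc.cores=n.cores)&lt;br /&gt;
unlist(squares)   # 1 4 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;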
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  Perhaps the simplest function for reading them is '''read.table()''', which suits data organised so that it looks like a table with column headings.  If I have a text file with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It will be a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the file '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
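&lt;br /&gt;
The unnamed first column holds the data frame's row names.  If you don't want it, pass '''row.names=FALSE'''; reading the file back with '''read.csv()''' then recovers just the original columns.  A self-contained sketch, rebuilding a small version of the table first (the file name is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country=c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;Great Britain&amp;quot;),&lt;br /&gt;
                     gold=c(46, 38, 29), silver=c(29, 27, 17), bronze=c(29, 23, 19))&lt;br /&gt;
write.csv(medals, &amp;quot;medals-norownames.csv&amp;quot;, row.names=FALSE)&lt;br /&gt;
medals.in &amp;lt;- read.csv(&amp;quot;medals-norownames.csv&amp;quot;)&lt;br /&gt;
names(medals.in)   # &amp;quot;country&amp;quot; &amp;quot;gold&amp;quot; &amp;quot;silver&amp;quot; &amp;quot;bronze&amp;quot;&lt;br /&gt;
medals.in$gold     # 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;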
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store an R data structure in binary form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
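&lt;br /&gt;
Note that '''load()''' restores objects under the names they were saved with, silently overwriting any existing objects of the same name.  For a single object, '''saveRDS()''' and '''readRDS()''' are an alternative that lets you choose the name when reading (the names below are our own):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country=c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold=c(46, 38))&lt;br /&gt;
saveRDS(medals, file=&amp;quot;medals.rds&amp;quot;)&lt;br /&gt;
my.medals &amp;lt;- readRDS(&amp;quot;medals.rds&amp;quot;)   # assign the result to any name we like&lt;br /&gt;
identical(medals, my.medals)           # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;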
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide this.  Note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
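&lt;br /&gt;
'''sort()''' also takes a '''decreasing''' argument.  To sort a data frame by one of its columns, use '''order()''', which returns the permutation of indices that would sort it (the small medal table is rebuilt here so the example stands alone):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
sort(railway.engines, decreasing=TRUE)   # &amp;quot;thomas&amp;quot; &amp;quot;james&amp;quot; &amp;quot;henry&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;edward&amp;quot;&lt;br /&gt;
medals &amp;lt;- data.frame(country=c(&amp;quot;GB&amp;quot;, &amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold=c(29, 46, 38))&lt;br /&gt;
order(medals$gold)                                # 1 3 2&lt;br /&gt;
medals[order(medals$gold, decreasing=TRUE), ]     # rows in the order USA, China, GB&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;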
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
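&lt;br /&gt;
Because the draws are random, they will differ from run to run.  Setting the random number seed with '''set.seed()''' makes them reproducible (the seed value 42 is arbitrary), and leaving out the size gives a random permutation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
set.seed(42)&lt;br /&gt;
first &amp;lt;- sample(railway.engines, 3, replace=TRUE)&lt;br /&gt;
set.seed(42)&lt;br /&gt;
second &amp;lt;- sample(railway.engines, 3, replace=TRUE)&lt;br /&gt;
identical(first, second)   # TRUE: same seed, same draws&lt;br /&gt;
sample(railway.engines)    # all five engines, in a random order&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;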
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
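&lt;br /&gt;
'''cbind()''', documented on the same page, combines columns instead.  For instance, appending a column of total medals computed with '''rowSums()''' over the three medal columns (a cut-down table is rebuilt here so the example stands alone; the column name '''total''' is our own):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country=c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;),&lt;br /&gt;
                     gold=c(11, 8, 8, 7), silver=c(11, 9, 4, 16), bronze=c(12, 11, 5, 12))&lt;br /&gt;
medals &amp;lt;- cbind(medals, total=rowSums(medals[, c(&amp;quot;gold&amp;quot;, &amp;quot;silver&amp;quot;, &amp;quot;bronze&amp;quot;)]))&lt;br /&gt;
medals$total   # 34 28 17 35&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;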
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Plotting the data couldn't be simpler with '''plot(bins)'''!&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
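&lt;br /&gt;
The number of values in each bin can be tabulated with '''table()'''.  Bin edges can also be chosen explicitly by passing a vector of '''breaks''' (the edges 80, 84, 88 and 92 here are our own choice):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
table(cut(girls_2, breaks=3))                   # 3, 2 and 5 values per bin&lt;br /&gt;
table(cut(girls_2, breaks=c(80, 84, 88, 92)))   # (80,84] (84,88] (88,92]: again 3, 2 and 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;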
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
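&lt;br /&gt;
The fitted intercept and slope can be extracted with '''coef()''', and '''summary()''' gives standard errors, R-squared and so on.  For the cars data (speed in mph, stopping distance in feet), the slope says that stopping distance rises by roughly 3.9 feet for each extra mile per hour:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
coef(res)      # intercept about -17.58, slope about 3.93&lt;br /&gt;
summary(res)   # fuller report, including the R-squared of the fit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;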
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
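* You may wish to compare different methods of estimation.  From the MASS package, you can fit a line with the '''rlm''' and '''lqs''' functions.  With the scatter plot of '''cars''' from above still open, you can fit and plot all the lines against the data using:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; res.lm &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm &amp;lt;- rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs &amp;lt;- lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;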
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If given, the function fits the line by weighted least squares, i.e. it minimises the weighted sum of squared residuals.  Experiment with different linear model fits, given different weighting vectors (the cars data has 50 rows, so you will need 50 weights).  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector, length 50, where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
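&lt;br /&gt;
As a starting point for the exercise, a sketch using the two hint vectors.  Multiplying all the weights by the same constant leaves the fit unchanged, so '''w1''' only moves the line once its first element has been changed.  The rows of '''cars''' are ordered by speed, so '''w2''' gives more weight to the faster cars:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
w1 &amp;lt;- rep(0.1, 50)&lt;br /&gt;
all.equal(coef(lm(dist ~ speed, data=cars, weights=w1)), coef(res))   # TRUE: equal weights, same line&lt;br /&gt;
w1[1] &amp;lt;- 10                                # give the first point far more influence&lt;br /&gt;
res.w1 &amp;lt;- lm(dist ~ speed, data=cars, weights=w1)&lt;br /&gt;
w2 &amp;lt;- seq(from=0.02, to=1.0, by=0.02)      # weights increasing with row number&lt;br /&gt;
res.w2 &amp;lt;- lm(dist ~ speed, data=cars, weights=w2)&lt;br /&gt;
plot(cars)&lt;br /&gt;
abline(res, lty=1)&lt;br /&gt;
abline(res.w1, lty=2)&lt;br /&gt;
abline(res.w2, lty=3)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;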
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
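&lt;br /&gt;
The test results are returned as a list, so individual values can be used in further calculations.  Dropping '''var.equal=TRUE''' gives Welch's t-test, which doesn't assume equal variances; with equal group sizes its t statistic matches the pooled one, and only the degrees of freedom (and hence the p-value) change slightly:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2 &amp;lt;- c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
welch &amp;lt;- t.test(boys_2, girls_2)   # var.equal=FALSE is the default&lt;br /&gt;
welch$statistic        # t = 0.8429, as before&lt;br /&gt;
welch$parameter        # Welch-Satterthwaite degrees of freedom (slightly below 18)&lt;br /&gt;
welch$p.value &amp;lt; 0.05   # FALSE: no significant difference at the 5% level&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;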
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
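&lt;br /&gt;
The overall accuracy is the proportion of test flowers on the diagonal of this table.  Since '''knn()''' breaks ties at random, setting a seed first makes the result repeatable (the seed is arbitrary, and the exact counts may differ slightly from those above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
set.seed(1)&lt;br /&gt;
tab &amp;lt;- table(predicted=knn(train, test, cl, k=3), actual=cl)&lt;br /&gt;
sum(diag(tab)) / sum(tab)   # for the table above: (23 + 25 + 22) / 75, about 0.93&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;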
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
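&lt;br /&gt;
To see how well the first tree fits, one option is to compare its predicted classes with the observed ones.  This is only a rough check, since it reuses the training data rather than a separate test set:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
predicted &amp;lt;- predict(fit, type = &amp;quot;class&amp;quot;)   # most probable class in each leaf&lt;br /&gt;
table(predicted, actual = kyphosis$Kyphosis)&lt;br /&gt;
mean(predicted == kyphosis$Kyphosis)         # proportion classified correctly&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;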
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
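&lt;br /&gt;
It's worth checking a solution by multiplying back with the matrix-product operator '''%*%'''.  Note that '''array()''' (like '''matrix()''') fills column by column, so the first row of '''A''' is 1, 3, -2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
b &amp;lt;- c(5,7,8)&lt;br /&gt;
x &amp;lt;- solve(A, b)&lt;br /&gt;
A %*% x                            # reproduces b, as a 3x1 matrix&lt;br /&gt;
all.equal(as.vector(A %*% x), b)   # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;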
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
If you would prefer to noodle about with some real-world data, you could take a look at:&lt;br /&gt;
* http://www.theguardian.com/news/datablog/2010/oct/18/historic-government-spending-area#data&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll switch tack and focus on a question commonly asked by those beginning to use R in anger--'''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''.  In this section we'll consider the related tasks of finding which parts of your R code are responsible for the majority of the run-time, and what you can do about them.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have a gift(!) for constantly surprising us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it infinitely fast would cut the total run-time by only 10%: a lot of sweat for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time a call to a function using '''system.time''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
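&lt;br /&gt;
'''system.time''' returns an object whose '''elapsed''' element is the wall-clock time in seconds, while '''user''' and '''system''' are CPU times.  The sketch below times an arbitrary piece of work, sorting a million random numbers, chosen purely for illustration.  It then extracts the elapsed time as a plain number so that it can be stored or compared:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Time an illustrative task: sorting one million uniform random numbers&lt;br /&gt;
timing &amp;lt;- system.time(sorted &amp;lt;- sort(runif(1e6)))&lt;br /&gt;
print(timing)&lt;br /&gt;
# Wall-clock seconds as a plain number&lt;br /&gt;
elapsed &amp;lt;- timing[[&amp;quot;elapsed&amp;quot;]]&lt;br /&gt;
elapsed&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;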
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Apply loess.smooth to each vector in the list&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown (an excerpt from the '''by.self''' table):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
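The profile tells us that the function '''simpleLoess''' takes 88% of the run-time (92.73% if we include the functions it calls), whereas '''rnorm''' takes only 4%.  So any effort spent optimising should go into the smoothing, not into generating the random numbers.&lt;br /&gt;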
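&lt;br /&gt;
Because '''summaryRprof''' returns a list, its tables can also be inspected from within R, e.g. to pick out the top few entries automatically.  The sketch below is self-contained.  It repeats the profile.r workload, but writes the profile to a temporary file (an arbitrary choice) rather than ~/rprof.out:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
prof.file &amp;lt;- tempfile(fileext = &amp;quot;.out&amp;quot;)&lt;br /&gt;
Rprof(filename = prof.file)        # start profiling&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)                        # stop profiling&lt;br /&gt;
prof &amp;lt;- summaryRprof(filename = prof.file)&lt;br /&gt;
# by.self is sorted by self.time, so the biggest hot spots come first&lt;br /&gt;
head(prof$by.self, 3)&lt;br /&gt;
# Total sampled run-time, in seconds&lt;br /&gt;
prof$sampling.time&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;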
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timing calls to each of them shows that pre-allocating memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code (and on your version of R).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
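&lt;br /&gt;
Setting '''length(v)''' as in '''f2''' works, but a more common idiom is to create a vector of the right length and type up front with '''numeric()''' (or '''integer()''', '''character()''', etc.).  Below is a sketch of '''f2''' rewritten that way.  The name '''f3''', the argument '''n''' and the explicit return value are additions for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
f3 &amp;lt;- function(n = 30000) {&lt;br /&gt;
  v &amp;lt;- numeric(n)       # pre-allocate n zeros of type double&lt;br /&gt;
  for (i in seq_len(n))&lt;br /&gt;
    v[i] &amp;lt;- i^2&lt;br /&gt;
  v                      # return the filled vector&lt;br /&gt;
}&lt;br /&gt;
system.time(v3 &amp;lt;- f3())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;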
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R accept whole vectors as input, rather than just single values, which often lets you avoid writing a loop at all.  The examples in the previous section used '''for''' loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in about half the time that the pre-allocated loop took to process 30,000!&lt;br /&gt;
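&lt;br /&gt;
Vectorisation often goes beyond simple arithmetic: logical indexing and summary functions such as '''sum''' can replace loops that contain an '''if''' test.  The sketch below is an illustrative example (the task and data are made up).  It adds up the squares of the positive values in a vector, first with a loop and then in vectorised form.  It then checks that both give the same answer, up to floating-point round-off:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
x &amp;lt;- rnorm(1e6)   # illustrative data&lt;br /&gt;
# Loop: add up the squares of the positive values one element at a time&lt;br /&gt;
system.time({&lt;br /&gt;
  total.loop &amp;lt;- 0&lt;br /&gt;
  for (xi in x) if (xi &amp;gt; 0) total.loop &amp;lt;- total.loop + xi^2&lt;br /&gt;
})&lt;br /&gt;
# Vectorised: logical indexing picks out the positive values, sum() adds them&lt;br /&gt;
system.time(total.vec &amp;lt;- sum(x[x &amp;gt; 0]^2))&lt;br /&gt;
# Same answer, up to floating-point round-off&lt;br /&gt;
all.equal(total.loop, total.vec)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;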
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as BlueCrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to each element of a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the use of the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I then used the following submission script to run it on BlueCrystal Phase 2 (BCp2):&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
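&lt;br /&gt;
Timings like these are commonly summarised by two figures:&lt;br /&gt;
* the speed-up: serial elapsed time divided by parallel elapsed time&lt;br /&gt;
* the parallel efficiency: speed-up divided by the number of cores used&lt;br /&gt;
&lt;br /&gt;
The worked sketch below uses the elapsed times from the output above and assumes that all 8 reported cores were used.  For this run it gives a speed-up of about 6.6 and an efficiency of about 83%.  Timings this short are noisy, so treat the figures as indicative only:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
t.serial   &amp;lt;- 0.749   # elapsed seconds for lapply (from the output above)&lt;br /&gt;
t.parallel &amp;lt;- 0.113   # elapsed seconds for mclapply (from the output above)&lt;br /&gt;
n.cores    &amp;lt;- 8       # assumption: all detected cores were used&lt;br /&gt;
speedup    &amp;lt;- t.serial / t.parallel&lt;br /&gt;
efficiency &amp;lt;- speedup / n.cores&lt;br /&gt;
round(c(speedup = speedup, efficiency = efficiency), 2)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;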
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Disable PSM on the QLogic HCAs&lt;br /&gt;
export OMPI_MCA_mtl=^psm&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
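&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notice that all four slaves returned the same value for '''runif(1)''', which suggests that they started with identical random number generator states.  If your parallel work involves random numbers, make sure each process is given its own seed or random number stream before you start.&lt;br /&gt;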
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
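The '''parallel''' package, which ships with R itself, is an amalgamation of functionality from the multicore and snow packages.  The forking-based shared memory functions it inherits from multicore (such as '''mclapply''') still cannot run in parallel on MS Windows.  The cluster functions it inherits from snow (such as '''makeCluster''' and '''parLapply'''), however, can.&lt;br /&gt;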
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
detectCores()&lt;br /&gt;
# Create a list of 10 vectors, each holding 10,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Apply loess.smooth to each vector.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel, using forked processes within a node&lt;br /&gt;
# (mclapply in 'parallel' uses only 2 cores unless mc.cores is given)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}, mc.cores = detectCores()))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
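&lt;br /&gt;
On a single machine (including an MS Windows one), the snow-derived half of the parallel package can be used by starting a local socket cluster of R worker processes.  Below is a minimal sketch, assuming two local workers (an arbitrary number) and reusing the loess.smooth workload from above.  It also gives each worker its own reproducible random number stream, which avoids the identical random numbers seen in the Rmpi example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# Start 2 local R worker processes, connected by sockets (also works on MS Windows)&lt;br /&gt;
cl &amp;lt;- makeCluster(2)&lt;br /&gt;
# Give each worker its own (reproducible) L'Ecuyer random number stream&lt;br /&gt;
clusterSetRNGStream(cl, iseed = 42)&lt;br /&gt;
draws &amp;lt;- unlist(clusterEvalQ(cl, runif(1)))   # one draw per worker: the values differ&lt;br /&gt;
draws&lt;br /&gt;
# Same workload as before, split across the workers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
system.time(x &amp;lt;- parLapply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;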
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=MATLAB1&amp;diff=9412</id>
		<title>MATLAB1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=MATLAB1&amp;diff=9412"/>
		<updated>2014-03-07T12:08:30Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Finding where your code is slow */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''An Introduction to MATLAB'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
Rather than re-invent the wheel, we'll use some tried and tested tutorial material.  The following notes from the Maths department at the University of Dundee are concise, comprehensive, but also easy to read:&lt;br /&gt;
http://www.maths.dundee.ac.uk/ftp/na-reports/MatlabNotes.pdf&lt;br /&gt;
&lt;br /&gt;
Once you have read through and understood the above notes, you might like to try your hand at some example exercises:&lt;br /&gt;
* easier ones: http://www.facstaff.bucknell.edu/maneval/help211/basicexercises.html&lt;br /&gt;
* harder ones: http://www.cl.cam.ac.uk/teaching/2006/UnixTools/matlab-answers.pdf&lt;br /&gt;
&lt;br /&gt;
=Hints and Tips on Performance=&lt;br /&gt;
&lt;br /&gt;
A common query is, '''&amp;quot;How can I speed up my MATLAB code?&amp;quot;'''.  People often go on to say that it ran fine when they were developing their code, but now that their ambition has grown and they are working on larger problems, they end up waiting for days to get a result.  This is sometimes followed up by, &amp;quot;it'll run faster on the HPC system, right?&amp;quot;  Well, not necessarily.&lt;br /&gt;
&lt;br /&gt;
Let's try to pick some of this apart.&lt;br /&gt;
&lt;br /&gt;
There are several aspects of MATLAB code that can really limit its performance.  '''for''' loops are a common limiting factor, as is allocating memory on the fly.  These limitations can often be addressed by:&lt;br /&gt;
&lt;br /&gt;
* Pre-allocating memory, where appropriate.&lt;br /&gt;
* Replacing loops over the elements of a vector or matrix with:&lt;br /&gt;
** Scalar and array operations.&lt;br /&gt;
** Built-in functions which take vectors or matrices as arguments.&lt;br /&gt;
&lt;br /&gt;
However, before we get into examples of improved code, we need to determine '''where''' your code is spending the majority of its time.  It would not be sensible to invest lots of effort in rewriting a section of your program which took only 1% of the overall runtime.  Accordingly, the next section focuses on methods for finding ''hot spots'' in your code.&lt;br /&gt;
&lt;br /&gt;
==Finding where your code is slow==&lt;br /&gt;
&lt;br /&gt;
Possibly the simplest way to assess the performance of a sequence of MATLAB operations is to employ the timing functions '''tic''' and '''toc'''.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
tic;&lt;br /&gt;
n=1500;&lt;br /&gt;
A=rand(n);&lt;br /&gt;
B=pinv(A);&lt;br /&gt;
toc&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gives the result:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Elapsed time is 2.163306 seconds.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
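&lt;br /&gt;
A single measurement can be noisy, so it can help to repeat a timing a few times and average the results.  The sketch below does this using the form of '''tic''' that returns a timer value.  That value is passed to '''toc''', which keeps this timer separate from any other tic/toc pair.  The number of repetitions, '''nRuns''', is an arbitrary choice for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
n = 1500;&lt;br /&gt;
A = rand(n);&lt;br /&gt;
nRuns = 3;             % number of repeated timings (illustrative choice)&lt;br /&gt;
t = zeros(nRuns, 1);   % pre-allocated storage for the timings&lt;br /&gt;
for k = 1:nRuns&lt;br /&gt;
  tStart = tic;        % start a named timer&lt;br /&gt;
  B = pinv(A);&lt;br /&gt;
  t(k) = toc(tStart);  % elapsed seconds for this run&lt;br /&gt;
end&lt;br /&gt;
fprintf('Mean time for pinv over %d runs: %f seconds\n', nRuns, mean(t));&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;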
&lt;br /&gt;
A more detailed analysis can be elicited from the MATLAB profiler.  Let's suppose we have a function which converts Cartesian to polar coordinates:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
function [r,theta] = cart2plr(x,y)&lt;br /&gt;
%   cart2plr  Convert Cartesian coordinates to polar coordinates&lt;br /&gt;
%&lt;br /&gt;
%   [r,theta] = cart2plr(x,y) computes r and theta with&lt;br /&gt;
%&lt;br /&gt;
%       r = sqrt(x^2 + y^2);&lt;br /&gt;
%       theta = atan2(y,x);&lt;br /&gt;
&lt;br /&gt;
r = sqrt(x^2 + y^2);&lt;br /&gt;
theta = atan2(y,x);&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we call that function a number of times in the following script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
profile on&lt;br /&gt;
for i=1:3000&lt;br /&gt;
  cart2plr(rand(),rand());&lt;br /&gt;
end&lt;br /&gt;
profile off&lt;br /&gt;
profile viewer&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We will be able to see the following analysis in the profile viewer window:&lt;br /&gt;
&lt;br /&gt;
[[Image:MATLAB-Profiler.png|thumb|800px|none|The MATLAB profiler]]&lt;br /&gt;
&lt;br /&gt;
==Preallocation of Vectors==&lt;br /&gt;
&lt;br /&gt;
Memory allocation is an expensive operation.  MATLAB will allow us to assign values to an array inside a loop, where the array keeps growing to accommodate all the iterations of the loop.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
for i=1:1000&lt;br /&gt;
  vec(i) = i^2;&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
However, this flexibility will come at the cost of performance, as the frequent resizing of the container ''vec'' will incur many requests for additional memory for storage.  Therefore, it is wise to pre-allocate storage, if you can predict ahead of time how large the container needs to be.  This is probably the simplest way in which you can speed up your MATLAB code.&lt;br /&gt;
&lt;br /&gt;
To demonstrate the benefit of pre-allocation, consider the following two MATLAB scripts.&lt;br /&gt;
&lt;br /&gt;
'''noprealloc.m''':&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
tic;&lt;br /&gt;
for i=1:3000,&lt;br /&gt;
  for j=1:3000,&lt;br /&gt;
    x(i,j)=i+j;&lt;br /&gt;
  end&lt;br /&gt;
end&lt;br /&gt;
toc&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''prealloc.m''':&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
tic;&lt;br /&gt;
x=zeros(3000);&lt;br /&gt;
for i=1:3000,&lt;br /&gt;
  for j=1:3000,&lt;br /&gt;
    x(i,j)=i+j;&lt;br /&gt;
  end&lt;br /&gt;
end&lt;br /&gt;
toc&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When we run these two scripts (on BCp2), we see a ''significant'' difference in the runtime:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; noprealloc&lt;br /&gt;
Elapsed time is 14.317089 seconds.&lt;br /&gt;
&amp;gt;&amp;gt; prealloc  &lt;br /&gt;
Elapsed time is 0.279115 seconds.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
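&lt;br /&gt;
Pre-allocation removes the cost of repeatedly growing '''x''', but the double loop is still there.  For this particular example the loops can be removed altogether (a theme of the next two sections).  This works because '''x(i,j) = i+j''' is just a column vector of row indices added to a row vector of column indices.  The sketch below uses '''bsxfun''', which expands the two vectors against each other.  In MATLAB R2016b and later, implicit expansion means that (1:n)' + (1:n) gives the same result:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
tic;&lt;br /&gt;
n = 3000;&lt;br /&gt;
% Add a 3000 x 1 column of row indices to a 1 x 3000 row of column indices,&lt;br /&gt;
% giving the same 3000 x 3000 matrix as prealloc.m, with x(i,j) = i + j&lt;br /&gt;
x = bsxfun(@plus, (1:n)', 1:n);&lt;br /&gt;
toc&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;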
&lt;br /&gt;
==Scalar and Array Operators==&lt;br /&gt;
&lt;br /&gt;
For example, if you would like to apply a scalar operation to every element of a vector, '''vec''' (say, multiply each element by 3), then you do not need to write a loop.&lt;br /&gt;
&lt;br /&gt;
Replace:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
for i = 1:length(vec)&lt;br /&gt;
  vec(i) = vec(i) * 3;&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
with:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
vec = vec*3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Similarly, if you have two vectors or matrices '''of the same size''', you can perform element-by-element operations using, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
m3 = m1 - m2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that array versions of the multiplication, division and exponentiation operators are '''.*''', '''./''' and '''.^''', respectively.&lt;br /&gt;
&lt;br /&gt;
Many built-in functions, such as '''sqrt''', '''exp''' and '''sin''', also work element by element.  So if you wish to apply one of them to all the elements of a vector or array, you can simply pass the whole array as the argument.  If you write your own functions, ensure that the operators you use inside them can handle vectors or matrices (e.g. use '''.^''' rather than '''^''').&lt;br /&gt;
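&lt;br /&gt;
For instance, '''cart2plr''' from the profiling example uses '''^''', which is the ''matrix'' power operator, so it only works for scalar inputs.  Swapping in the element-wise '''.^''' lets a single call process whole vectors of points, replacing the 3000-iteration loop in the profiling script.  Below is a sketch using an anonymous function (the names '''toRadius''', '''x''' and '''y''' are illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
% Element-wise version of r = sqrt(x^2 + y^2): note .^ rather than ^&lt;br /&gt;
toRadius = @(x, y) sqrt(x.^2 + y.^2);&lt;br /&gt;
x = rand(3000, 1);&lt;br /&gt;
y = rand(3000, 1);&lt;br /&gt;
% One call each handles all 3000 points at once&lt;br /&gt;
r = toRadius(x, y);&lt;br /&gt;
theta = atan2(y, x);   % atan2 already works element by element&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;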
&lt;br /&gt;
==Built-in Functions==&lt;br /&gt;
&lt;br /&gt;
MATLAB contains a number of built-in functions which can save you from writing a loop.  Examples include:&lt;br /&gt;
&lt;br /&gt;
* '''sum''' and '''prod''': compute the sum or product, respectively, of all the elements of a vector.&lt;br /&gt;
* '''cumsum''' and '''cumprod''': both return a vector and are the cumulative counterparts of '''sum''' and '''prod'''.&lt;br /&gt;
* '''min''' and '''max''': return the smallest and largest elements of a vector.&lt;br /&gt;
* '''any''' and '''all''': return true if any or all, respectively, of the elements of a vector are true (nonzero).&lt;br /&gt;
* '''find''': returns the indices of the nonzero (true) elements of a vector.  For example, '''find(vec &amp;gt; 7)''' returns the indices of all elements of vec that are greater than 7.&lt;br /&gt;
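&lt;br /&gt;
Here is a short illustration of these functions on a small, made-up vector.  The values in the comments are what each call returns for this particular '''vec''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
vec = [3 9 1 12 7 8];      % illustrative data&lt;br /&gt;
total   = sum(vec);        % 40&lt;br /&gt;
running = cumsum(vec);     % [3 12 13 25 32 40]&lt;br /&gt;
biggest = max(vec);        % 12&lt;br /&gt;
anyBig  = any(vec &amp;gt; 10);  % true: 12 exceeds 10&lt;br /&gt;
allPos  = all(vec &amp;gt; 0);   % true: every element is positive&lt;br /&gt;
idx     = find(vec &amp;gt; 7);  % [2 4 6], i.e. the positions of 9, 12 and 8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;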
&lt;br /&gt;
==MEX Files==&lt;br /&gt;
&lt;br /&gt;
Another route to higher performance is to outsource an identified bottleneck in your MATLAB code to a piece of compiled code written in C/C++ or Fortran.  This is the MEX file approach.  A good introduction to creating MEX files is:&lt;br /&gt;
&lt;br /&gt;
* http://classes.soe.ucsc.edu/ee264/Fall11/cmex.pdf&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=Matlab as a Calculator=&lt;br /&gt;
&lt;br /&gt;
==The Golden Ratio==&lt;br /&gt;
&lt;br /&gt;
http://en.wikipedia.org/wiki/Golden_ratio&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
phi = (1 + sqrt(5))/2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
phi =&lt;br /&gt;
    1.6180&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
format long&lt;br /&gt;
phi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
phi =&lt;br /&gt;
    1.618033988749895&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
φ² − φ − 1 = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
p = [1 -1 -1]&lt;br /&gt;
r = roots(p)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
r =&lt;br /&gt;
  -0.618033988749895&lt;br /&gt;
   1.618033988749895&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Vectors=&lt;br /&gt;
&lt;br /&gt;
separated by either commas or spaces:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
v = [ 1 3, sqrt(5)]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
v =&lt;br /&gt;
1.0000   3.0000   2.2361&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
length(v)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ans =&lt;br /&gt;
3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;matlab&amp;quot;&amp;gt;&lt;br /&gt;
v = [ 1; 3; sqrt(5)]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
v =&lt;br /&gt;
    1.0000&lt;br /&gt;
    3.0000&lt;br /&gt;
    2.2361&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Fortran1&amp;diff=9409</id>
		<title>Fortran1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Fortran1&amp;diff=9409"/>
		<updated>2014-03-03T15:14:49Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Containers and the Types of Things */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Pragmatic Programming]]&lt;br /&gt;
'''Fortran1: The Basics'''&lt;br /&gt;
=Getting the content for the practical=&lt;br /&gt;
We'll forge our path through the verdant garden of '''Fortran90''' using a number of examples.  To get your copy of these examples from the version control repository, log in to your favourite Linux machine (perhaps dylan), and type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn co https://svn.ggy.bris.ac.uk/subversion-open/fortran1/trunk fortran1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=hello, world=&lt;br /&gt;
&lt;br /&gt;
Without further ado, and in keeping with the most venerable of traditions, let's meet our first example--&amp;quot;hello, world&amp;quot;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
cd fortran1/examples/example1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can compile the program by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
gfortran hello_world.f90 -o hello_world.exe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and run it by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
./hello_world.exe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Bingo!'''  You've just compiled and run what is, perhaps, your first Fortran90 program.  Hurrah! we're on our way:)  Everybody whoop!  Yeehah!&lt;br /&gt;
&lt;br /&gt;
OK, OK...you'd better rein in your excitement.  This is serious you know!:)&lt;br /&gt;
&lt;br /&gt;
Enough of the magic; let's look inside the source code file.  View the contents of hello_world.f90 using '''cat''', '''less''', '''more''' or your favourite text editor, and you'll see:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
!&lt;br /&gt;
! This is a comment line.&lt;br /&gt;
! Below is a simple 'hello, world' program written in Fortran90.&lt;br /&gt;
! It illustrates creating a main 'program' unit together&lt;br /&gt;
! with good habits, such as using 'implicit none' and comments.&lt;br /&gt;
!&lt;br /&gt;
program hello_world&lt;br /&gt;
  implicit none&lt;br /&gt;
  write(*,*)  &amp;quot;hello, world&amp;quot;&lt;br /&gt;
end program hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We have:&lt;br /&gt;
#some comment lines, giving us a helpful narrative&lt;br /&gt;
#the start of the '''main program unit'''&lt;br /&gt;
#the '''implicit none''' statement (more of that in the next section, but suffice to say, every well dressed Fortran program should have one)&lt;br /&gt;
#a '''write''' statement, printing our greeting to the screen&lt;br /&gt;
#and last, but not least, the end of the main program.&lt;br /&gt;
&lt;br /&gt;
This is all pretty straightforward, right?  Open up your text editor and try changing the greeting, just for the heck of it.  Recompile using the '''gfortran''' command above, and re-run it.  We'll adopt a similar strategy for all the other examples we'll meet.  If you ever want to get back to the original version of a program, just type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
svn revert hello_world.f90&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Although this has all been fairly painless, we have made a very significant step--we are now editing, compiling and running Fortran programs.  All the rest is basically just details!:)&lt;br /&gt;
&lt;br /&gt;
=Containers and the Types of Things=&lt;br /&gt;
&lt;br /&gt;
As fun as &amp;quot;hello, world&amp;quot; was, let's spice things up a little.  For instance, let's introduce some variables.  We'll need to move to the next example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
cd ../example2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Fortran90 has several types of built-in, or '''intrinsic''', variables.  Take a look in '''basic_types.f90''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  character         :: sex       ! a letter e.g. 'm' or 'f'&lt;br /&gt;
  character(len=12) :: name      ! a string&lt;br /&gt;
  logical           :: wed       ! married?&lt;br /&gt;
  integer           :: numBooks  ! must be a whole number&lt;br /&gt;
  real              :: height    ! e.g. 1.83 m (good practice: include units in comments)&lt;br /&gt;
  complex           :: z         ! real and imaginary parts&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This set of types suffices for a great many programs.  The above are all single entities; we'll meet arrays of things in a couple of examples' time.  In ''Fortran2'', we'll also meet user-defined types.  These allow us to group instances of intrinsic types together, forming new kinds of thing: new types.  User-defined types are the ''bee's knees'' and can make programs much easier to work with.  We'll leave the details to that later course, however.&lt;br /&gt;
&lt;br /&gt;
The above snippet shows some variable '''declarations''', along with some helpful comments.  It's good practice to comment your declarations, as a programmer new to your code (or even yourself in a couple of months' time) can have a hard time figuring out what is supposed to be stored in such-and-such a variable.  While we're on the topic, it's also good practice to give your variables meaningful names, even if they are long.  Trust me: a bit more typing now, perhaps, but a lot less head-scratching later on over what is stored in the inspiringly named '''xbNew''', or '''z2'''!&lt;br /&gt;
&lt;br /&gt;
It's often a good idea to give variables an initial value when we declare them (working with uninitialised variables is another common source of bugs).  This time we're looking at '''more_types.f90''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  character         :: nucleotide = 'A'            ! DNA has A,C,G &amp;amp; T&lt;br /&gt;
  character(len=50) :: infile = 'yourData.nc', outfile = &amp;quot;myData.nc&amp;quot;&lt;br /&gt;
  logical           :: initialised = .true.        ! or .false.&lt;br /&gt;
  real              :: solConst       = 1.37       ! Solar 'constant' in kW/m^2&lt;br /&gt;
  complex           :: sqrtMinusOne = (0.0,1.0)    ! (real,imag), sqrd gives (-1.0,0.0)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Fortran90 also allows us to give variables certain '''attributes'''.  For example, from '''pitfalls.f90''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  real,parameter    :: pi = 3.14159  ! a fixed constant&lt;br /&gt;
  real(kind=8)      :: totPrecip     ! this is preferred to 'double precision'&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''parameter''' attribute tells you, me and Fortran that '''pi''' is a '''constant'''.  It's fixed, and it's a compile-time error if we try to change it.  This is a good thing, since the compiler then catches a whole class of nasty bugs for us before the program ever runs.  We never want pi to be anything other than pi, right?!  Assigning '''parameter''' attributes to quantities we know are constant is an example of '''defensive programming''', or ''bug avoidance''!&lt;br /&gt;
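
To see '''parameter''' at work, here's a minimal sketch of our own.  The program name ''constants_demo'' is made up for illustration; it isn't one of the course files.  It builds one constant from another, and the commented-out line at the bottom is the sort of assignment the compiler should refuse:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program constants_demo&lt;br /&gt;
  implicit none&lt;br /&gt;
  real, parameter :: pi     = 3.14159   ! a fixed constant&lt;br /&gt;
  real, parameter :: twoPi  = 2.0 * pi  ! parameters can be built from other parameters&lt;br /&gt;
  real            :: radius = 2.0       ! an ordinary variable, free to change&lt;br /&gt;

  write (*,*) 'circumference for radius 2:', twoPi * radius&lt;br /&gt;

  ! pi has the parameter attribute, so uncommenting the next line&lt;br /&gt;
  ! should make the compiler reject the program:&lt;br /&gt;
  ! pi = 3.0&lt;br /&gt;
end program constants_demo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;

Try uncommenting that last line and re-compiling to see what your compiler has to say about it.&lt;br /&gt;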
&lt;br /&gt;
By default, reals in Fortran are typically represented using 4 bytes of memory.  With most compilers, adding '''(kind=8)''' gives us an 8-byte real, often referred to as a '''double precision''' real.  (Strictly speaking, kind numbers are compiler-specific; the intrinsic '''selected_real_kind''' is the portable way to ask for a given precision.)  Fortran does have a ''double precision'' type, but the '''kind''' attribute is preferred.  (Many compilers also support promoting all the default, 4-byte reals and integers in your program to 8 bytes through flags, typically named ''-r8'' and ''-i8'', respectively.)  8-byte reals can be useful as accumulators, since they reduce the build-up of rounding errors.&lt;br /&gt;
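
Here's a rough sketch of why the extra bytes can matter for accumulators.  Again, this is a made-up program rather than one of the course files.  It uses '''selected_real_kind''' to ask for kinds with at least 6 and 15 significant digits, then adds 0.1 one million times in each.  The exact answer is 100000.  Expect the 4-byte sum to drift noticeably, although the precise value you see may depend on your compiler and flags:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program precision_demo&lt;br /&gt;
  implicit none&lt;br /&gt;
  ! selected_real_kind(p) returns a kind with at least p significant decimal digits&lt;br /&gt;
  integer, parameter :: sp = selected_real_kind(6)&lt;br /&gt;
  integer, parameter :: dp = selected_real_kind(15)&lt;br /&gt;
  real(kind=sp) :: sumSingle = 0.0_sp&lt;br /&gt;
  real(kind=dp) :: sumDouble = 0.0_dp&lt;br /&gt;
  integer       :: ii&lt;br /&gt;

  ! add 0.1 a million times; the exact answer would be 100000&lt;br /&gt;
  do ii = 1, 1000000&lt;br /&gt;
     sumSingle = sumSingle + 0.1_sp&lt;br /&gt;
     sumDouble = sumDouble + 0.1_dp&lt;br /&gt;
  end do&lt;br /&gt;

  write (*,*) 'single precision sum:', sumSingle&lt;br /&gt;
  write (*,*) 'double precision sum:', sumDouble&lt;br /&gt;
  write (*,*) 'significant digits (single, double):', precision(sumSingle), precision(sumDouble)&lt;br /&gt;
end program precision_demo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;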
&lt;br /&gt;
The third program illustrates some arithmetic pitfalls--'''beware!''':&lt;br /&gt;
* '''integer division''' and its truncation&lt;br /&gt;
* '''casting''' as a solution to mismatched types&lt;br /&gt;
* (integer) '''overflow'''&lt;br /&gt;
* (real number) '''underflow''' &lt;br /&gt;
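
If you'd like a taste of these pitfalls before opening '''pitfalls.f90''', here's a small stand-alone sketch of our own (it is not the course file itself).  It uses the intrinsics '''huge''' and '''tiny''' to find the limits of the default integer and real types.  We only ''describe'' integer overflow in a comment rather than trigger it, since the standard doesn't say what should happen:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program arithmetic_pitfalls&lt;br /&gt;
  implicit none&lt;br /&gt;
  integer :: numerator = 7, denominator = 2&lt;br /&gt;
  integer :: big&lt;br /&gt;
  real    :: small&lt;br /&gt;

  ! integer division truncates: 7/2 gives 3, not 3.5&lt;br /&gt;
  write (*,*) '7/2 with integers:', numerator / denominator&lt;br /&gt;
  ! casting the operands to real gives the expected 3.5&lt;br /&gt;
  write (*,*) '7/2 after casting:', real(numerator) / real(denominator)&lt;br /&gt;

  ! huge() gives the largest value the type can hold; big + 1 would overflow,&lt;br /&gt;
  ! which standard Fortran leaves undefined (it often wraps to a negative number)&lt;br /&gt;
  big = huge(big)&lt;br /&gt;
  write (*,*) 'largest default integer:', big&lt;br /&gt;

  ! tiny() gives the smallest positive normal real; on IEEE hardware, dividing it&lt;br /&gt;
  ! by 1.0e10 goes below even the denormal range, so the result underflows to zero&lt;br /&gt;
  small = tiny(small)&lt;br /&gt;
  write (*,*) 'smallest normal real:', small&lt;br /&gt;
  write (*,*) 'tiny / 1.0e10:', small / 1.0e10&lt;br /&gt;
end program arithmetic_pitfalls&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;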
&lt;br /&gt;
Let's have a play with the programs.  You can compile them by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and run them by typing, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
./basic_types.exe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
./more_types.exe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
./pitfalls.exe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Although we compiled our first example ''by hand'', we'll be using '''make''' to compile the rest of our example programs, so you won't have to worry about that side of things.  (If you'd like to know more about make, you can take a look at [[Make|our course on make]], presented in a very similar style to this here excursion into Fortran90.)&lt;br /&gt;
&lt;br /&gt;
Now modify the programs (remembering '''svn revert''' followed by the file name, e.g. '''svn revert basic_types.f90''', if you make a mess).  Try giving values to variables of various types and also using operators such as:&lt;br /&gt;
&lt;br /&gt;
* '''arithmetic''': +, -, *, /, ** (exponentiation)&lt;br /&gt;
* '''functions''': sin, cos, floor (rounding down)&lt;br /&gt;
* '''logic''': .and., .or., .not., .eqv., .neqv.&lt;br /&gt;
* and you'll meet many more in the future.&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
* Calculate the [http://en.wikipedia.org/wiki/Sphere#Surface_area_of_a_sphere surface area of a sphere].&lt;br /&gt;
* Is sine of pi divided by four really the same as one over root two?&lt;br /&gt;
* What is the truth table of the NAND operator?&lt;br /&gt;
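
As a warm-up, here's a made-up program that tries out a few of these operators without giving the exercise answers away:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program operators_demo&lt;br /&gt;
  implicit none&lt;br /&gt;
  real    :: x = 2.0, angle = 0.5&lt;br /&gt;
  logical :: p = .true., q = .false.&lt;br /&gt;

  write (*,*) 'x**3 is', x**3                        ! exponentiation&lt;br /&gt;
  write (*,*) 'floor(-2.5) is', floor(-2.5)          ! rounds down, to -3&lt;br /&gt;
  write (*,*) 'sin**2 + cos**2 is', sin(angle)**2 + cos(angle)**2  ! (very close to) 1&lt;br /&gt;
  write (*,*) 'p .and. q is', p .and. q&lt;br /&gt;
  write (*,*) 'p .neqv. q is', p .neqv. q            ! true when exactly one is true&lt;br /&gt;
  write (*,*) '.not. p is', .not. p&lt;br /&gt;
end program operators_demo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;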
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Now, about that mysterious '''implicit none''' which we keep seeing at the start of our programs.  Let me tell you a story:  Once upon a time, the kings and queens of the garden of Fortran, being a generous and well-meaning bunch, decided to save the programmers the bother of specifying the type of their variables.  &amp;quot;Don't bother!&amp;quot;, they said, &amp;quot;just be sure to give them appropriate names, and we'll sort out the rest.&amp;quot;  &amp;quot;Thank you.  Thank you very much&amp;quot;, said the programmers, and it was decreed that the names of integers would start with the letters i, j, k, l, m or n, and the names of reals would start with any other letter.  Anyhow, this all seemed like a great wheeze and everybody was very happy.  This lasted for a while, but over time the programmers got complacent and forgot how to name things, and it all got rather messy.  Integers became reals, reals became integers and before they knew it, the programmers had '''bugs all over the place!'''  Boo.  The kings and queens conferred on the matter and realised that they had made a grave error in their gift of implicit typing.  However, they couldn't undo what they had done.  Instead, they had to persuade the programmers to give it up voluntarily.  &amp;quot;Anything, anything!&amp;quot;, they pleaded, &amp;quot;to get rid of '''all these bugs!'''&amp;quot;, and so it came to pass that every good programmer agreed to put '''implicit none''' at the top of every program they wrote, and they all lived happily ever after.&lt;br /&gt;
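
To see the villain of the story in action, here's a deliberately bad sketch: a made-up program with '''implicit none''' left out on purpose.  A single typo silently creates a brand-new variable, and a name starting with ''l'' quietly becomes an integer.  Adding '''implicit none''' should turn both slips into compile-time errors:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program implicit_typing_bug&lt;br /&gt;
  ! Deliberately NO implicit none: undeclared names get their type from their&lt;br /&gt;
  ! first letter (i to n gives integer, anything else gives real).&lt;br /&gt;
  total = 0.0&lt;br /&gt;
  do n = 1, 3&lt;br /&gt;
     totl = total + 1.5   ! typo: totl is silently created as a new real variable&lt;br /&gt;
  end do&lt;br /&gt;
  write (*,*) 'total should be 4.5, but is:', total&lt;br /&gt;
  length = 2.7            ! length starts with l, so it is an integer: 2.7 becomes 2&lt;br /&gt;
  write (*,*) 'length is:', length&lt;br /&gt;
end program implicit_typing_bug&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;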
&lt;br /&gt;
=If, Do, Select and Other Ways to Control the Flow=&lt;br /&gt;
&lt;br /&gt;
Programs are like cooking recipes.  We've covered the how much of this and how much of that part.  However, we also need to cover the doing bit--do this and then do that, and for how long etc.  This is generically termed '''control flow'''.  Fortran gives us a fairly rich language with which to describe how we would like things done.  Next example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Take a look inside '''control.f90'''.  We have some variable declarations and then we encounter our first '''conditional''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  if (initialised .eqv. .true.) then&lt;br /&gt;
     write (*,*) &amp;quot;The variable 'area' is initialised and has the value:&amp;quot;, area&lt;br /&gt;
  else&lt;br /&gt;
     write (*,*) &amp;quot;The variable 'area' is NOT initialised and has the value:&amp;quot;, area&lt;br /&gt;
  end if&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
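This is fairly self-explanatory: '''if'''..something is the case..'''then'''..'''else'''..  You can also have an '''else if''' clause; in fact, you can have as many of those as you like.  You can also have as many statements inside each clause as you like.  Talk about spoiled!  (Incidentally, since '''initialised''' is already a logical, ''if (initialised) then'' means exactly the same as the test above.)&lt;br /&gt;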
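
For instance, a hypothetical mark-to-grade converter with several '''else if''' clauses might look like the sketch below.  The first test that is true wins, and the remaining clauses are skipped:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program grade_demo&lt;br /&gt;
  implicit none&lt;br /&gt;
  integer :: mark = 63   ! a made-up exam mark&lt;br /&gt;

  if (mark .ge. 70) then&lt;br /&gt;
     write (*,*) 'first'&lt;br /&gt;
  else if (mark .ge. 60) then&lt;br /&gt;
     write (*,*) 'upper second'&lt;br /&gt;
  else if (mark .ge. 50) then&lt;br /&gt;
     write (*,*) 'lower second'&lt;br /&gt;
  else&lt;br /&gt;
     write (*,*) 'below 50'&lt;br /&gt;
  end if&lt;br /&gt;
end program grade_demo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;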
&lt;br /&gt;
'''Select''' is another control structure:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  select case (nucleotide)&lt;br /&gt;
     case ('A')&lt;br /&gt;
        write (*,*) &amp;quot;nucleotide is Adenine&amp;quot;&lt;br /&gt;
     case ('G')&lt;br /&gt;
        write (*,*) &amp;quot;nucleotide is Guanine&amp;quot;&lt;br /&gt;
     case ('T')&lt;br /&gt;
        write (*,*) &amp;quot;nucleotide is Thymine&amp;quot;&lt;br /&gt;
     case ('C')&lt;br /&gt;
        write (*,*) &amp;quot;nucleotide is Cytosine&amp;quot;&lt;br /&gt;
     case default&lt;br /&gt;
        write (*,*) &amp;quot;default is the catch-all.  'Fall-through' can be a nasty bug.&amp;quot;&lt;br /&gt;
        stop&lt;br /&gt;
     end select&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is a neat way of saying &amp;quot;if..then..else if..else..&amp;quot;.  The '''default''' clause at the bottom is important.  Leaving it off can lead to '''fall-through''', where a value matches none of the cases and the whole construct is silently skipped.  This is rarely what you want and can lead to nasty bugs.&lt;br /&gt;
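
A '''case''' can also match a list of values or a range, written with a colon.  Here's a sketch of our own.  Note that the ''(:-1)'' range means ''anything up to and including -1'':&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program select_ranges&lt;br /&gt;
  implicit none&lt;br /&gt;
  integer :: ii&lt;br /&gt;

  do ii = -1, 12, 4          ! ii takes the values -1, 3, 7, 11&lt;br /&gt;
     select case (ii)&lt;br /&gt;
     case (:-1)&lt;br /&gt;
        write (*,*) ii, 'is negative'&lt;br /&gt;
     case (0)&lt;br /&gt;
        write (*,*) ii, 'is zero'&lt;br /&gt;
     case (1:9)&lt;br /&gt;
        write (*,*) ii, 'is a single digit'&lt;br /&gt;
     case default            ! the catch-all: anything 10 or above&lt;br /&gt;
        write (*,*) ii, 'is 10 or more'&lt;br /&gt;
     end select&lt;br /&gt;
  end do&lt;br /&gt;
end program select_ranges&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;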
&lt;br /&gt;
Our first '''do loop''' is of the form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  do ii=1,5&lt;br /&gt;
     write (*,*) &amp;quot;Do loop counter ii is:&amp;quot;, ii&lt;br /&gt;
  end do&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, this is fairly readable.  '''ii''' is first given the value 1, the body of the loop is executed and then we go back to the top again.  Except this time we '''increment''' the counter (ii) by the '''default amount''', which is 1.  When incrementing takes '''ii''' past 5, we stop the loop and move on to the next statement after the '''end do'''.  You're allowed as many statements inside the loop as you like.  Indeed, you're allowed more loops, conditions, loops in loops, just about anything you can think of!  '''Beware''', however: debugging a huge construct of nested this, that and the other can be beyond the limits of human patience.  Keep your programs simple and you will be happier for it.&lt;br /&gt;
&lt;br /&gt;
The other loop examples show variations in the stopping condition and '''stride''' (i.e. how much we increment by), including counting backwards, and stopping before we've even started!&lt;br /&gt;
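
If you'd like to check your understanding before running the example, here's a sketch of those variations in a made-up program of our own.  The comments give the values we'd expect the counter to take:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program loop_variations&lt;br /&gt;
  implicit none&lt;br /&gt;
  integer :: ii&lt;br /&gt;

  do ii = 1, 10, 3      ! stride of 3: 1, 4, 7, 10&lt;br /&gt;
     write (*,*) 'stride 3:', ii&lt;br /&gt;
  end do&lt;br /&gt;

  do ii = 5, 1, -2      ! a negative stride counts backwards: 5, 3, 1&lt;br /&gt;
     write (*,*) 'backwards:', ii&lt;br /&gt;
  end do&lt;br /&gt;

  do ii = 5, 1          ! default stride is +1, so this loop runs zero times&lt;br /&gt;
     write (*,*) 'you will never see this line'&lt;br /&gt;
  end do&lt;br /&gt;
end program loop_variations&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;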
&lt;br /&gt;
You'll notice that all the loops we've seen thus far run for a pre-determined number of iterations.  What if we don't know how many iterations we want ahead of time?  Some languages, such as C, include a '''while loop''' for this purpose.  Fortran90 has a '''do while''' loop too, but a common idiom is to use a plain '''do''' and omit any start and end values.  Note that we must then include an '''exit condition''' if we want such a loop to terminate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  threshold = 0.5&lt;br /&gt;
  ii = 0&lt;br /&gt;
  do  &lt;br /&gt;
     call random_number(random_value)  ! standard intrinsic; rand() is a compiler extension&lt;br /&gt;
     if (random_value .gt. threshold) then&lt;br /&gt;
        print*, 'counter is:', ii, random_value, '&amp;gt;', threshold, 'stopping.'&lt;br /&gt;
        exit&lt;br /&gt;
     end if&lt;br /&gt;
     print*, 'counter is:', ii, random_value, '&amp;lt;', threshold&lt;br /&gt;
     ii = ii+1&lt;br /&gt;
  end do&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
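
The same job can be done with '''do while''', which tests its condition at the top of each pass.  Here's a rough equivalent of the loop above, written as a made-up stand-alone program.  It uses the standard intrinsic subroutine '''random_number''', which sets its argument to a value in [0,1):&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program do_while_demo&lt;br /&gt;
  implicit none&lt;br /&gt;
  real    :: threshold = 0.5&lt;br /&gt;
  real    :: random_value&lt;br /&gt;
  integer :: ii = 0&lt;br /&gt;

  call random_number(random_value)        ! a value in [0,1)&lt;br /&gt;
  do while (random_value .le. threshold)  ! tested at the top of each pass&lt;br /&gt;
     write (*,*) 'counter is:', ii, random_value, 'is not above', threshold&lt;br /&gt;
     ii = ii + 1&lt;br /&gt;
     call random_number(random_value)&lt;br /&gt;
  end do&lt;br /&gt;
  write (*,*) 'stopping after', ii, 'values at or below the threshold'&lt;br /&gt;
end program do_while_demo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;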
&lt;br /&gt;
As before, compile it, run it and generally muck about.  These are only a few of the control structures provided to us by Fortran.  You'll find that you can do most things with these three, however.&lt;br /&gt;
&lt;br /&gt;
Before leaving this example, let's consider '''if''' tests that compare floating-point numbers for equality.  Remember that there is an infinite set of real numbers, so a computer can only approximate them.  For example, '''how would a computer represent 10/3'''?  It has limited precision.  It follows that we should be careful when we need to test whether a real number is equal to some value, such as '''3.3''' (see the last section of the program).  A common way around this problem is to subtract one real from the other and compare the '''absolute''' value of the result to some small threshold (to account for rounding errors).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  if ((abs(val-ref)) .lt. 0.0001) then&lt;br /&gt;
     ...&lt;br /&gt;
  end if&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
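
Here's a small, self-contained sketch of the problem and the fix.  It is a made-up program, not the course's '''control.f90'''.  In 8-byte arithmetic, 0.1 + 0.2 is not exactly 0.3, because neither 0.1 nor 0.2 can be stored exactly in binary.  The tolerance of 1.0e-9 is an arbitrary choice for numbers of order one.  For much larger or smaller values you'd want to scale it, for example by the size of ''ref'':&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program compare_reals&lt;br /&gt;
  implicit none&lt;br /&gt;
  integer, parameter       :: dp = kind(1.0d0)        ! the kind of a double precision literal&lt;br /&gt;
  real(kind=dp), parameter :: tolerance = 1.0e-9_dp   ! arbitrary; suits values of order one&lt;br /&gt;
  real(kind=dp)            :: val, ref&lt;br /&gt;

  val = 0.1_dp + 0.2_dp&lt;br /&gt;
  ref = 0.3_dp&lt;br /&gt;

  ! exact equality fails: 0.1 and 0.2 are only approximated in binary&lt;br /&gt;
  write (*,*) 'val .eq. ref:', val .eq. ref&lt;br /&gt;
  ! comparing the absolute difference with a small tolerance succeeds&lt;br /&gt;
  write (*,*) 'abs(val-ref) .lt. tolerance:', abs(val - ref) .lt. tolerance&lt;br /&gt;
end program compare_reals&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;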
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* Write a select statement which prints out the names of the numbers 1 to 10 in different languages.  Write a loop to trigger each of these print statements.&lt;br /&gt;
* Write a nested loop: Write a pair of nested do loops which count between 1 and 3.  Print the values of the two counters in the inner-most loop.  Think about where your 'end do's should go.&lt;br /&gt;
* Write a nested if statement: For example, create character variables called 'vehicle', 'colour' and 'size'.  We could represent a small red car as; vehicle = 'c', colour = 'r' and size = 's'.  Arrange for your nested if to print 'eureka' given a big green train.  Think about where your 'end if' statements should go.&lt;br /&gt;
&lt;br /&gt;
=Not one, Many!=&lt;br /&gt;
&lt;br /&gt;
That was fun.  Back to thinking about variables for a moment:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Last time we declared just one thing of a given type.  Sometimes we're greedy!  Sometimes we want more!  To be fair, some things are naturally represented by a vector or a matrix.  Think of values on a grid, solutions to linear systems, points in space, transformations such as scaling or rotation of vectors.  For these sorts of things the kings and queens of Fortran gave us programmers '''arrays'''.  Take a look inside '''static_array.f90''': &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  real, dimension(4)   :: tinyGrid = (/1.0, 2.0, 3.0, 4.0/)&lt;br /&gt;
  real, dimension(2,2) :: square = 0.0  ! 2 rows, 2 columns, init to all zeros&lt;br /&gt;
  real, dimension(3,2) :: rectangle     ! 3 rows, 2 columns, uninit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The syntax here reads, &amp;quot;we'll have a one-dimensional array (i.e. a vector) of 4 reals called tinyGrid, please, and we'll set the initial values of the cells in that array to be 1.0, 2.0, 3.0 and 4.0, respectively.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
For the second and third declarations, we're asking for two-dimensional arrays: one with two rows and two columns, called ''square'', and one with three rows and two columns, which we're calling ''rectangle''.&lt;br /&gt;
&lt;br /&gt;
The program then goes on to print out the contents of ''tinyGrid''.&lt;br /&gt;
&lt;br /&gt;
If we want to access a single element of an array, we can do so by specifying its indices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  write (*,*) &amp;quot;square(1,2) is the top right corner:&amp;quot;, square(1,2)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Fortran90 provides a couple of handy '''intrinsic routines''' for determining the '''size''' (how many cells in total) and the '''shape''' (number of dimensions and the ''extent'' of each dimension) of an array.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  write (*,*) &amp;quot;size(rectangle) gives total number of elements:&amp;quot;, size(rectangle)&lt;br /&gt;
  write (*,*) &amp;quot;shape(rectangle) gives rank and extent:&amp;quot;, shape(rectangle)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Fortran90 also allows us to '''reshape''' an array on-the-fly.  Using this intrinsic, we can copy the values from ''tinyGrid'' into ''square''.  Neat.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  square = reshape(tinyGrid,(/2,2/))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
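Fortran also provides us with a rather rich set of operators (+, -, *, / etc.) for array-valued variables.  Note that these act '''element-by-element''': ''a*b'' multiplies matching elements, while the intrinsics '''matmul''' and '''dot_product''' give the matrix and dot products.  Have a go at playing with these.  If you know some linear algebra, you're going to have a great time with this example!&lt;br /&gt;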
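
To get you started, here's a sketch of our own that shows the element-by-element operators alongside the linear-algebra intrinsics '''dot_product''', '''matmul''' and '''transpose'''.  Notice that a scalar on the right-hand side of an array assignment is copied into every element:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program array_ops_demo&lt;br /&gt;
  implicit none&lt;br /&gt;
  real :: u(3) = (/1.0, 2.0, 3.0/), v(3) = (/4.0, 5.0, 6.0/)&lt;br /&gt;
  real :: a(2,2), b(2,2)&lt;br /&gt;

  a = reshape((/1.0, 2.0, 3.0, 4.0/), (/2, 2/))&lt;br /&gt;
  b = 2.0                                   ! a scalar is copied into every element&lt;br /&gt;

  write (*,*) 'u + v        :', u + v       ! element-by-element sum&lt;br /&gt;
  write (*,*) 'u * v        :', u * v       ! element-by-element product, NOT a dot product&lt;br /&gt;
  write (*,*) 'dot_product  :', dot_product(u, v)   ! 4 + 10 + 18 = 32&lt;br /&gt;
  write (*,*) 'a * b        :', a * b       ! element-by-element again&lt;br /&gt;
  write (*,*) 'matmul(a, b) :', matmul(a, b)        ! the true matrix product&lt;br /&gt;
  write (*,*) 'transpose(a) :', transpose(a)&lt;br /&gt;
end program array_ops_demo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;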
&lt;br /&gt;
One thing to bear in mind when we consider 2D arrays is that Fortran stores them in memory as a 1D array and 'unwraps' them according to '''column-major order''':&lt;br /&gt;
&lt;br /&gt;
[[Image:columnMajor.jpg|300px|thumbnail|centre|a 2D array 'unwrapped' into a 1D array using column-major order.]]&lt;br /&gt;
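
A quick way to convince yourself of this ordering is a sketch like the one below, which is a made-up program.  '''reshape''' fills the array column by column, and printing the whole array lists the elements in memory order.  One practical consequence: in nested loops over a 2D array, it is usually faster to let the ''first'' index vary in the inner-most loop:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program column_major_demo&lt;br /&gt;
  implicit none&lt;br /&gt;
  integer :: grid(3,2)&lt;br /&gt;
  integer :: row&lt;br /&gt;

  ! reshape fills column by column: 1,2,3 go down column 1 and 4,5,6 down column 2&lt;br /&gt;
  grid = reshape((/1, 2, 3, 4, 5, 6/), (/3, 2/))&lt;br /&gt;

  do row = 1, 3&lt;br /&gt;
     write (*,*) 'row', row, ':', grid(row,:)   ! rows are 1 4, then 2 5, then 3 6&lt;br /&gt;
  end do&lt;br /&gt;

  ! printing the whole array walks through memory, i.e. in column-major order&lt;br /&gt;
  write (*,*) 'memory order:', grid&lt;br /&gt;
end program column_major_demo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;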
&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* Create a 3x3 grid to store the outcome of a game of noughts-and-crosses (tic-tac-toe), populate it and print the grid to screen.&lt;br /&gt;
* Fortran allows you to '''slice''' arrays.  For example the second column of the 2d-array 'a' is a(:,2).  Print the third row from the grid above.&lt;br /&gt;
* Add two vectors together.  Is this algebraically correct?  What about adding two matrices?&lt;br /&gt;
* Create two 2-d arrays.  Populate one randomly and create another as a mask.  Combine the two matrices and print the result. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Why all the '''static''' malarkey?  Because Fortran90 also allows us to say that we want an array, but we don't know how big we want it to be yet.  &amp;quot;We'll decide that at run-time&amp;quot;, we programmers say.  This can be handy if you're reading in some data, say a payroll, and you don't know how many employees you'll have from one year to the next.  Fortran90 calls these '''allocatable arrays''' and we'll meet them in ''Fortran2''.&lt;br /&gt;
&lt;br /&gt;
=If Things get Hectic, Outsource!=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
cd ../example5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, as we get more ambitious, the size of our program grows.  Before long, it can get unwieldy.  Also we may find that we repeat ourselves.  We do the same thing twice, three times.  Heck, many times!  Now is the time to start breaking your program into chunks, re-using some from time-to-time, making it more manageable.  Fortran gives us two routes to chunk-ification, '''functions''' and '''subroutines'''.  &lt;br /&gt;
&lt;br /&gt;
Let's deal with subroutines first.  In '''procedures.f90''', scroll down a bit and you can see:   &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
subroutine mirror(len,inArray,outArray)&lt;br /&gt;
&lt;br /&gt;
  implicit none&lt;br /&gt;
&lt;br /&gt;
  ! dummy variables&lt;br /&gt;
  integer,                 intent(in)  :: len&lt;br /&gt;
  character,dimension(len),intent(in)  :: inArray&lt;br /&gt;
  character,dimension(len),intent(out) :: outArray&lt;br /&gt;
&lt;br /&gt;
  ! local variables&lt;br /&gt;
  integer :: ii&lt;br /&gt;
  integer :: lucky = 3  ! notice scope of this identically named variable&lt;br /&gt;
&lt;br /&gt;
  do ii=1,len&lt;br /&gt;
     outArray(len+1-ii) = inArray(ii)&lt;br /&gt;
  end do&lt;br /&gt;
&lt;br /&gt;
  write (*,*) &amp;quot;'lucky' _inside_ subroutine is:&amp;quot;, lucky&lt;br /&gt;
&lt;br /&gt;
end subroutine mirror&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, note that this is '''outside''' of the main program unit.  (In principle, we could hive this off into another source code file, but we'll leave that discussion until ''Fortran2''.)  Notice also that we have a shiny '''implicit none''' resplendent at the top of the subroutine.  Overall, it looks pretty similar to how a main program unit might look, but with the addition of '''arguments''': those fellows in the parentheses after the subroutine name.  The declaration part also lists those arguments, and we've commented that these are so-called '''dummy variables'''.  We also see an attribute that we've not seen before, called '''intent'''.  This is a very handy tool for defensive programming (remember, aka ''bug avoidance'').  Using ''intent'', we can say that the integer ''len'' is an input, and as such we're not going to try to change it.  Likewise for the character array ''inArray''.  It would be a compile-time error if we did.  We also state that the character array ''outArray'' is an output, and we're going to give it a new value ''come what may''!  We also have some variables that are '''local'''.  Interestingly enough, one of our local variables, the integer called ''lucky'', has exactly the same name as a variable in the main program unit.  When we run the program, however, we will see that the two do not interfere with each other.  This is down to their '''scope'''.  The scope of lucky in the main program is all and only the main program unit, and the scope of lucky in the subroutine is all and only the subroutine.  We say that the main program unit and the subroutine have different '''name spaces'''.&lt;br /&gt;
&lt;br /&gt;
Well, we've seen a lot of new syntax and concepts in all that.  Useful ones, though.  This program is small and artificial, so it's hard to see the benefits just yet.  You will, however, as your programs grow.  The subroutine is '''called''' from the main program, funnily enough by a '''call''' statement.  Notice how the arguments passed to the subroutine in the call statement also have different names in the main program and the subroutine.  That's scope again.&lt;br /&gt;
&lt;br /&gt;
Functions are similar to, yet different from, subroutines.  Notice that we don't call them in the same way.  We still pass arguments, but a function '''returns''' a value, and so we can place a function call on the right-hand side (RHS) of an assignment, or anywhere else an expression is allowed.  Typically we would write a function for a smaller body of work, which we can then invoke in a neater way from our main program.&lt;br /&gt;
&lt;br /&gt;
Looking at the body of the function:   &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
function isPrime(num)&lt;br /&gt;
  &lt;br /&gt;
  implicit none&lt;br /&gt;
  &lt;br /&gt;
  ! return and dummy variables&lt;br /&gt;
  logical             :: isPrime&lt;br /&gt;
  integer, intent(in) :: num&lt;br /&gt;
&lt;br /&gt;
  select case (num)&lt;br /&gt;
     case (2,3,5,7,11,13,17,19,23,29)&lt;br /&gt;
        isPrime = .true.&lt;br /&gt;
     case default&lt;br /&gt;
        isPrime = .false.&lt;br /&gt;
     end select&lt;br /&gt;
&lt;br /&gt;
end function isPrime&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
we can see that we have a declaration for a variable with the same name as the function.  The type of this variable is the '''return type''' of the function.  Indeed, the value of this variable when we reach the bottom of the function is the value passed back to the calling routine.  '''Note that we can call functions and subroutines from other functions and subroutines''', in a nested fashion.&lt;br /&gt;
&lt;br /&gt;
The last thing of note is the funky '''interface''' structure at the top of the main program:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  interface&lt;br /&gt;
     function isPrime(num)&lt;br /&gt;
       logical :: isPrime&lt;br /&gt;
       integer, intent(in) :: num&lt;br /&gt;
     end function isPrime&lt;br /&gt;
  end interface&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Our function '''definition''' is outside of the main program and so is said to be '''external'''.  The main program unit needs to know about it, however, and an interface structure is a good way to do this as it prompts Fortran to check that all the arguments match up between the call and the definition.  It's the way to do it..for now.  (We'll meet Fortran90 modules in ''Fortran2'', which will give us a neater way to perform the same checks.)&lt;br /&gt;
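
Putting the pieces together, here's a small stand-alone sketch of our own.  The names ''toFahrenheit'' and ''swap'' are made up for illustration.  It has an external function and an external subroutine, and an '''interface''' block describes each of them to the main program.  It also uses ''intent(inout)'', which we haven't met yet: the argument is read on the way in and updated on the way out:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program procedures_demo&lt;br /&gt;
  implicit none&lt;br /&gt;

  ! interfaces let the compiler check our calls against the external definitions below&lt;br /&gt;
  interface&lt;br /&gt;
     function toFahrenheit(celsius)&lt;br /&gt;
       real             :: toFahrenheit&lt;br /&gt;
       real, intent(in) :: celsius&lt;br /&gt;
     end function toFahrenheit&lt;br /&gt;
     subroutine swap(a, b)&lt;br /&gt;
       real, intent(inout) :: a, b&lt;br /&gt;
     end subroutine swap&lt;br /&gt;
  end interface&lt;br /&gt;

  real :: low = 30.0, high = 10.0&lt;br /&gt;

  ! a function returns a value, so it can sit inside an expression&lt;br /&gt;
  write (*,*) 'boiling point in F:', toFahrenheit(100.0)&lt;br /&gt;
  ! a subroutine is invoked with call and works through its arguments&lt;br /&gt;
  call swap(low, high)&lt;br /&gt;
  write (*,*) 'after swap, low and high are:', low, high&lt;br /&gt;
end program procedures_demo&lt;br /&gt;

function toFahrenheit(celsius)&lt;br /&gt;
  implicit none&lt;br /&gt;
  real             :: toFahrenheit   ! the result variable, named after the function&lt;br /&gt;
  real, intent(in) :: celsius&lt;br /&gt;
  toFahrenheit = celsius * 9.0 / 5.0 + 32.0&lt;br /&gt;
end function toFahrenheit&lt;br /&gt;

subroutine swap(a, b)&lt;br /&gt;
  implicit none&lt;br /&gt;
  real, intent(inout) :: a, b       ! read on entry, updated on exit&lt;br /&gt;
  real                :: tmp        ! a local variable&lt;br /&gt;
  tmp = a&lt;br /&gt;
  a   = b&lt;br /&gt;
  b   = tmp&lt;br /&gt;
end subroutine swap&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;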
&lt;br /&gt;
Try running the program and writing some functions and subroutines of your own.&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* Write a simple error handling subroutine that '''stop'''s the program after printing a message, which is passed in as an argument.&lt;br /&gt;
* Does the area of a circle increase linearly with an increase in radius?  How about its circumference?  Write some functions and a loop to investigate.&lt;br /&gt;
* What happens if you write a function which calls itself?&lt;br /&gt;
&lt;br /&gt;
=Input and output=&lt;br /&gt;
&lt;br /&gt;
OK, say we want some '''permanence'''.  Perhaps we want to record the outputs of our program for some time, or we want to read the same values into a program each time that we run it.  Storing data in files is the way to go.&lt;br /&gt;
&lt;br /&gt;
==File i/o ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example6&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this example we'll see how to read and write data to and from files.  The code for this example is contained in '''file_access.f90'''.&lt;br /&gt;
&lt;br /&gt;
Before we can read from or write to a file, we must first open it.  Funnily enough, we can do this using the '''open''' statement, e.g.: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
open(unit=19,file=&amp;quot;output.txt&amp;quot;,form='formatted',status='old',iostat=ios)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''unit''' is a positive integer given to the file so that we can refer to it later.  Some unit numbers are preconnected, so we should avoid them: 5 (keyboard) and 6 (screen) almost everywhere, 0 (standard error) on many systems, and, on some compilers, 101 (stdout) &amp;amp; 102 (punch card!).&lt;br /&gt;
* '''file''' is a character string (a literal or a variable) containing the name of the file to open.&lt;br /&gt;
* '''form''' is the file format: 'formatted' (text file) or 'unformatted' (binary file).&lt;br /&gt;
* '''status''' specifies what we expect of the file:&lt;br /&gt;
*# '''old''' the file must already exist&lt;br /&gt;
*# '''new''' the file must not exist before it is opened&lt;br /&gt;
*# '''replace''' any existing file will be overwritten (otherwise a new one is created)&lt;br /&gt;
* '''iostat''' names an integer variable that is set to zero if the operation succeeds, and to a non-zero value in case of an error, e.g. if the file cannot be opened.&lt;br /&gt;
&lt;br /&gt;
When you have finished with a file, you must '''close''' it:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
close(19)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that if you need to go back to the beginning of a file, you don't have to close it and open it again; you can '''rewind''' it:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
rewind(19)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To '''write''' to a file, the syntax is, e.g.:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
write(unit=19,fmt=*) array1, array2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where,&lt;br /&gt;
* '''array1''' &amp;amp; '''array2''' are variables we wish to write to the file.&lt;br /&gt;
* '''unit''' is the unit number of the file we want to write to. &lt;br /&gt;
* '''fmt''' is a '''format string'''.  We have chosen '*', which gives us default settings.  However, we can gain greater control by specifying in detail what we will be writing and how we would like it formatted.  Take a look in the example '''file_access.f90''' for examples. &lt;br /&gt;
&lt;br /&gt;
Similarly, we use a '''read''' statement to extract data from a file, e.g.: &lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
read(unit=19,fmt=*,iostat=ios) var1, var2&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Note that it is very important to use the '''iostat''' specifier to make sure that we handle any errors, and don't press on regardless into oblivion!  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  if (ios /= 0) then&lt;br /&gt;
     print*,'ERROR: could not open file'&lt;br /&gt;
     stop&lt;br /&gt;
  end if&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
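
To see '''open''', '''write''', '''rewind''', '''read''' and '''close''' working together, here's a rough sketch.  The program and the file name ''numbers.txt'' are made up.  It relies on the rule that a '''read''' sets a negative '''iostat''' at end-of-file and a positive one on an error, which lets a plain '''do''' loop keep reading until the data run out:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program file_roundtrip&lt;br /&gt;
  implicit none&lt;br /&gt;
  integer :: ios, ii, val, total&lt;br /&gt;

  open(unit=19,file='numbers.txt',status='replace',action='readwrite',form='formatted',iostat=ios)&lt;br /&gt;
  if (ios /= 0) then&lt;br /&gt;
     print*,'ERROR: could not open numbers.txt'&lt;br /&gt;
     stop&lt;br /&gt;
  end if&lt;br /&gt;

  do ii = 1, 5&lt;br /&gt;
     write(unit=19,fmt=*) ii * 10      ! one value per line: 10, 20, ..., 50&lt;br /&gt;
  end do&lt;br /&gt;

  rewind(19)                           ! back to the start, without closing and re-opening&lt;br /&gt;

  total = 0&lt;br /&gt;
  do&lt;br /&gt;
     read(unit=19,fmt=*,iostat=ios) val&lt;br /&gt;
     if (ios .lt. 0) exit              ! negative iostat: end-of-file reached&lt;br /&gt;
     if (ios .gt. 0) then              ! positive iostat: a genuine read error&lt;br /&gt;
        print*,'ERROR: could not read numbers.txt'&lt;br /&gt;
        stop&lt;br /&gt;
     end if&lt;br /&gt;
     total = total + val&lt;br /&gt;
  end do&lt;br /&gt;
  close(19)&lt;br /&gt;

  print*,'sum of values read back:', total   ! 10+20+30+40+50 = 150&lt;br /&gt;
end program file_roundtrip&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;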
&lt;br /&gt;
Among other things, the program '''file_access.exe''' writes the same data to (i) a text file and (ii) a binary file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  open(unit=56,file='output.txt',status='replace',iostat=ios,form='formatted')&lt;br /&gt;
  if (ios /= 0) then&lt;br /&gt;
     print*,'ERROR: could not open output.txt'&lt;br /&gt;
     stop&lt;br /&gt;
  end if&lt;br /&gt;
&lt;br /&gt;
  open(unit=57,file='output.bin',status='replace',iostat=ios,form='unformatted')&lt;br /&gt;
  if (ios /= 0) then&lt;br /&gt;
     print*,'ERROR: could not open output.bin'&lt;br /&gt;
     stop&lt;br /&gt;
  end if&lt;br /&gt;
&lt;br /&gt;
  ! write size1 and size2 to output files (and then compare sizes)&lt;br /&gt;
  write(57) array1&lt;br /&gt;
  write(57) array2&lt;br /&gt;
  close(57)&lt;br /&gt;
&lt;br /&gt;
  write(56,*) array1&lt;br /&gt;
  write(56,*) array2&lt;br /&gt;
  close(56)  &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
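If you do a long listing ('ls -l'), you'll see the size difference.  The binary file is just over a quarter of the size of the text file, since it stores the raw bytes of each number rather than writing out every digit as a character:&lt;br /&gt;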
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-rw-r--r-- 1 fred users 220032 Feb  8 11:01 output.bin&lt;br /&gt;
-rw-r--r-- 1 fred users 825002 Feb  8 11:01 output.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Using character strings as file names==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example7&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''filename.f90''' in &amp;lt;tt&amp;gt;example7&amp;lt;/tt&amp;gt; shows a little trick for people who output a lot of data and need to manipulate a lot of files. It is possible to use a write statement to output a number (or anything else) to a character string and to subsequently use that string as a filename to open, close and output to many files on the fly:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  character(len=8) :: filename&lt;br /&gt;
  integer          :: ii, ios&lt;br /&gt;
&lt;br /&gt;
  ! define a format to write a filename&lt;br /&gt;
  10 format('output',i2.2)&lt;br /&gt;
&lt;br /&gt;
  do ii=1,20&lt;br /&gt;
    write (unit=filename,fmt=10) ii&lt;br /&gt;
    open(20,file=filename,status='replace',form='formatted',iostat=ios)&lt;br /&gt;
    ...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
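
Below is a slightly more modern variation, sketched as a made-up stand-alone program.  It puts the format string directly in the '''write''' statement instead of using a labelled '''format''', and it adds a ''.txt'' extension.  Since our character variable is longer than the name we build, we '''trim''' the trailing blanks before using it.  We also '''close''' each file so that its unit number can be re-used:&lt;br /&gt;

&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
program numbered_files&lt;br /&gt;
  implicit none&lt;br /&gt;
  character(len=20) :: filename&lt;br /&gt;
  integer           :: ii, ios&lt;br /&gt;

  do ii = 1, 3&lt;br /&gt;
     ! an internal write formats into a character variable instead of a file:&lt;br /&gt;
     ! output01.txt, output02.txt, output03.txt&lt;br /&gt;
     write (filename, '(a,i2.2,a)') 'output', ii, '.txt'&lt;br /&gt;
     open(unit=20,file=trim(filename),status='replace',form='formatted',iostat=ios)&lt;br /&gt;
     if (ios /= 0) then&lt;br /&gt;
        print*,'ERROR: could not open ', trim(filename)&lt;br /&gt;
        stop&lt;br /&gt;
     end if&lt;br /&gt;
     write (20,*) 'this is file number', ii&lt;br /&gt;
     close(20)                        ! free unit 20 before re-using it for the next file&lt;br /&gt;
     print*,'wrote ', trim(filename)&lt;br /&gt;
  end do&lt;br /&gt;
end program numbered_files&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;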
&lt;br /&gt;
==Namelists==&lt;br /&gt;
And all of a sudden, we're at our last example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example8&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In '''namelists.f90''', we'll look at another approach to file input and output.  Fortran provides a way of grouping variables together into a set, called a '''namelist''', which is input to or output from our program ''en masse''.  This is a common situation, for example when reading the parameters into a model.  The statement:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
namelist /my_bundle/ numBooks,initialised,name,vec&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
sets up a namelist called '''my_bundle''' containing those four variables.&lt;br /&gt;
&lt;br /&gt;
Fortran further provides built-in mechanisms for reading or writing a namelist from or to a file.&lt;br /&gt;
&lt;br /&gt;
First, we must open the file of course:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  open(unit=56,file='input.nml',status='old',iostat=ios)&lt;br /&gt;
  if (ios /= 0) then&lt;br /&gt;
     print*,'ERROR: could not open namelist file'&lt;br /&gt;
     stop&lt;br /&gt;
  end if&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that we told Fortran that this is an '''old''' file, i.e. that it already exists, and that we check the error code.  If the '''open''' operation fails, we've asked the program to halt with an error.&lt;br /&gt;
&lt;br /&gt;
Now, assuming that we've opened the file OK, we proceed to read its contents.  Fortran makes this rather easy for us, given that the information is contained in a namelist:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;fortran&amp;quot;&amp;gt;&lt;br /&gt;
  read(UNIT=56,NML=my_bundle,IOSTAT=ios)&lt;br /&gt;
  if (ios /= 0) then&lt;br /&gt;
     print*,'ERROR: could not read example namelist'&lt;br /&gt;
     stop&lt;br /&gt;
  else&lt;br /&gt;
     close(56)&lt;br /&gt;
  end if&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the '''read''' statement, we told Fortran that we wanted to read a namelist and that the group of variables should be flagged in the file as '''my_bundle'''.  Again, we've checked the error status and decided to halt with an error should the read statement fail for any reason.  Take a look at the contents of '''input.nml'''.  Notice that it is ASCII text, and that the variables (with their values) do not need to be listed in the file in the same order as in the program:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;amp;my_bundle&lt;br /&gt;
numBooks=6&lt;br /&gt;
vec=1.0 2.0 3.0 4.0&lt;br /&gt;
initialised=.false.&lt;br /&gt;
name='romeo'&lt;br /&gt;
&amp;amp;end&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The program proceeds to print the values to screen, demonstrating that they have indeed come from the file.  It then assigns new values to the variables and writes the modified values to a new file, called '''modified.nml'''.  Compare the two ASCII text files.  Try running the program a second time and you will receive an error, telling you that the output could not be written.  This is because a file called modified.nml already exists, and we expressly stated that it was to be a '''new''' file.  Delete modified.nml, try again and it will succeed.  Try changing the namelist, values, filenames etc.  Go for it, make a mess!  You'll learn a lot from it:)&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* Write some code to read in a colour (red, blue, green etc.) from a namelist and then print out the names of all the [http://en.wikipedia.org/wiki/Railway_engines_(Thomas_and_Friends) railway engines] (thomas, percy, gordon etc.) that match.&lt;br /&gt;
* Read in the dimensions of an ellipse from file and print out its area.&lt;br /&gt;
&lt;br /&gt;
= To go further = &lt;br /&gt;
The  [[:category:Pragmatic Programming | Pragmatic Programming]] course continues with [[Linux2]], a look at some of the more advanced but very useful Linux concepts.&lt;br /&gt;
&lt;br /&gt;
Now that you are getting familiar with Fortran, go a bit further by reading [[Fortran2]].&lt;br /&gt;
&lt;br /&gt;
A useful Fortran textbook is called '''Fortran 90 Programming''' by Ellis, Philips &amp;amp; Lahey.  Take a look at the [[A_Good_Read|'A Good Read?']] page for more details.&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9408</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9408"/>
		<updated>2014-03-03T12:08:54Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Suggested Exercises */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' would otherwise be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R has no separate scalar type: a single number is simply a vector of length one.  The '''[1]''' indicates that the line in question displays the vector starting at its first element.  We can use the handy sequence function, '''seq()''', to create a vector containing more than one element.  When the output wraps, each new line is labelled with the index of its first element (here '''[26]'''):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
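Note that '''&amp;lt;-''' is the assignment operator, whereas the '''=''' signs inside the call to '''seq()''' match named arguments ('''from''', '''to''' and '''by''').  At the top level, '''=''' can also be used for assignment, although '''&amp;lt;-''' is the more conventional choice.&lt;br /&gt;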
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; coords.bris &amp;lt;- c(51.5, -2.6)  # latitude, longitude in degrees (west is negative)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c()''' function combines its arguments into a vector.  By default, '''matrix()''' fills the matrix column by column, which is why the first three values form the first column.  We can access portions of a matrix using the index notation shown in the square brackets.  For example, we can access the first row using '''[1,]''', and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in every row and column (and both diagonals) sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which have more than two dimensions, and '''lists''' to hold heterogeneous collections.&lt;br /&gt;
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
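We can access its items in several ways.  The '''$''' operator and double square brackets return the item itself, whereas single square brackets return a smaller list containing that item:&lt;br /&gt;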
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
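&lt;br /&gt;
The arrays mentioned above can be sketched in the same way.  The following minimal example builds a three-dimensional array with '''array()''' (the name '''cube''' is purely illustrative).  Like '''matrix()''', '''array()''' fills its elements with the first index varying fastest:&lt;br /&gt;
&lt;br /&gt;
```r
# A 2 x 3 x 4 array holding the numbers 1 to 24
cube = array(1:24, dim = c(2, 3, 4))

print(dim(cube))       # the extent of each dimension: 2 3 4
print(cube[2, 3, 4])   # the last element, 24
print(cube[, , 1])     # the first 2 x 3 "slice" is an ordinary matrix
```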
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
    country  gold  silver  bronze&lt;br /&gt;
1       USA    46      29      29&lt;br /&gt;
2     China    38      27      23&lt;br /&gt;
3        GB    29      17      19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
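We can access columns of a data frame using the '''$''' operator.  (The '''Levels''' line appears because, before R 4.0.0, '''data.frame()''' converted character vectors to factors by default; newer versions of R print a plain character vector instead.)&lt;br /&gt;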
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
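&lt;br /&gt;
Rows can be selected in a similar way, by giving a logical condition before the comma inside square brackets.  A minimal sketch, which rebuilds the same small data frame so that it runs on its own (the name '''big.winners''' is illustrative):&lt;br /&gt;
&lt;br /&gt;
```r
country = c("USA", "China", "GB")
gold = c(46, 38, 29)
silver = c(29, 27, 17)
bronze = c(29, 23, 19)
medals.2012 = data.frame(country, gold, silver, bronze)

# Keep only the rows where more than 30 gold medals were won;
# the empty slot after the comma means "all columns"
big.winners = medals.2012[medals.2012$gold > 30, ]
print(big.winners)
print(nrow(big.winners))   # 2 (USA and China)
```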
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  In this case, to show the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ (the help page examples can be run with '''example(filled.contour)'''):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots--complete with the R code required to create the plots (at the bottom of the page, after the comments)--on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
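&lt;br /&gt;
Explicit loops work, but in R many of them can be replaced by ''vectorised'' operations.  These act on a whole vector at once and are usually shorter and faster.  A minimal comparison (the variable names are illustrative):&lt;br /&gt;
&lt;br /&gt;
```r
# Loop version: fill a pre-allocated vector one element at a time
squares.loop = numeric(10)
for (ii in 1:10) squares.loop[ii] = ii^2

# Vectorised version: ^ is applied to every element of 1:10 at once
squares.vec = (1:10)^2

print(squares.vec)
print(identical(squares.loop, squares.vec))
```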
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body is a single expression, but add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Unnamed arguments are matched by position, whereas named arguments can be given in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check the contents of a function by typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or, to check just the arguments, use the '''args''' function.  It returns a function with the same arguments but an empty body, which is why '''NULL''' is printed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
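&lt;br /&gt;
By default, an R function returns the value of the last expression it evaluates, so '''return()''' is mostly useful for leaving a function early.  Below is a small sketch (the name '''hypot4''' is hypothetical) that gives up straight away on missing input:&lt;br /&gt;
&lt;br /&gt;
```r
# Hypothetical variant of hypot3 that returns NA early if either side is missing
hypot4 = function(x = 3, y = 4) {
  if (is.na(x) || is.na(y)) return(NA_real_)   # early exit
  sqrt(x^2 + y^2)                               # last expression: the return value
}

print(hypot4(6, 8))    # 10
print(hypot4(NA, 8))   # NA
```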
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Contributed packages are listed on CRAN, at http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
Let's install the '''multicore''' package, which gives us access to functions that can run on the multiple processors (cores) found in most computers these days:   &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library, i.e. those available to load into our workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load (attach) one of them into the current session, we type e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, an example of using a function from the multicore package.  The '''lapply''' function, which is included in the standard R core, maps a given function over a list of inputs, returning a list of the function's outputs.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
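&lt;br /&gt;
The '''multicore''' package has since been superseded.  From R version 2.14.0 onwards, the '''parallel''' package that ships with R provides the same '''mclapply()''' function, so no installation is needed.  A minimal sketch, assuming two cores are available.  On Windows, '''mclapply()''' cannot fork processes, so the sketch falls back to a single core there:&lt;br /&gt;
&lt;br /&gt;
```r
library(parallel)

# Forking is not available on Windows, where mclapply() only supports one core
n.cores = if (.Platform$OS.type == "windows") 1 else 2

# Same work as the lapply() call, spread over n.cores processes
squares = mclapply(1:3, function(x) x^2, mc.cores = n.cores)
print(unlist(squares))   # 1 4 9
```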
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  If your data is organised like a table with column headings, perhaps the simplest function to use is '''read.table()'''.  Suppose we have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It will be a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Gives us the file, '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
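&lt;br /&gt;
Reading the file back is the mirror image.  '''write.csv()''' stores the row names in an unnamed first column, so the sketch below passes '''row.names=1''' to '''read.csv()''' to turn that column back into row names.  It uses a temporary file (via '''tempfile()''') and a cut-down, illustrative data frame so as not to overwrite anything:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country = c("USA", "China"), gold = c(46, 38))

path = tempfile(fileext = ".csv")   # a throwaway file name
write.csv(medals, path)             # first column holds the row names "1", "2"

# Use the first column as row names rather than as data
medals.back = read.csv(path, row.names = 1)
print(medals.back)
```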
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store an R data structure in binary form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
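&lt;br /&gt;
Note that '''load()''' restores objects under the names they were saved with, and (invisibly) returns those names.  A minimal sketch, using a temporary file and an illustrative data frame:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country = c("USA", "China"), gold = c(46, 38))
path = tempfile(fileext = ".RData")

save(medals, file = path)
rm(medals)                # remove the object from the workspace...
restored = load(path)     # ...and bring it back from the file
print(restored)           # "medals": the names of the objects that were loaded
print(medals$gold)        # 46 38
```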
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide this.  Note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
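&lt;br /&gt;
Since each call draws at random, your results will differ from those above.  As a sketch, '''set.seed()''' makes the draws reproducible (the seed value 42 is arbitrary).  Leaving '''replace''' at its default of '''FALSE''' samples without replacement, so no engine is picked twice.  Calling '''sample()''' with just the vector returns a random permutation:&lt;br /&gt;
&lt;br /&gt;
```r
railway.engines = c("thomas", "henry", "gordon", "edward", "james")

set.seed(42)                          # fix the random number stream
pick.a = sample(railway.engines, 3)   # three different engines
set.seed(42)
pick.b = sample(railway.engines, 3)   # same seed, so the same three engines

shuffled = sample(railway.engines)    # all five, in a random order
print(pick.a)
print(shuffled)
```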
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames (the column names must match):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
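&lt;br /&gt;
The help page linked above also covers '''cbind''', which combines columns instead of rows.  A minimal sketch (the '''total''' column is illustrative, not part of the original data):&lt;br /&gt;
&lt;br /&gt;
```r
country = c("France", "Italy")
gold = c(11, 8)
silver = c(11, 9)
bronze = c(12, 11)
extras = data.frame(country, gold, silver, bronze)

# Add a new column holding each country's total medal count
extras = cbind(extras, total = extras$gold + extras$silver + extras$bronze)
print(extras)
```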
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Plotting the data couldn't be simpler: because '''bins''' is a factor, '''plot(bins)''' draws a bar chart of the number of values in each bin!&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
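&lt;br /&gt;
To see the counts behind that bar chart, '''table()''' tallies how many values fall into each bin.  A short sketch using the same data:&lt;br /&gt;
&lt;br /&gt;
```r
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)
bins = cut(girls_2, breaks = 3)   # three equal-width intervals

counts = table(bins)   # one count per factor level (bin)
print(counts)          # 3, 2 and 5 values respectively
```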
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
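&lt;br /&gt;
The fitted model holds more than the line drawn by '''abline()'''.  As a sketch, '''coef()''' extracts the intercept and slope, and '''predict()''' evaluates the fitted line at new speeds.  In the cars data, speed is in mph and distance in ft; the speed of 20 mph used here is an arbitrary choice:&lt;br /&gt;
&lt;br /&gt;
```r
res = lm(dist ~ speed, data = cars)

print(coef(res))   # intercept about -17.58 ft, slope about 3.93 ft per mph

# Predicted stopping distance (ft) at a speed of 20 mph
print(predict(res, newdata = data.frame(speed = 20)))

print(summary(res))   # standard errors, t-values, R-squared, ...
```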
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
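* You may wish to compare different methods of estimation.  The MASS package (load it with '''library(MASS)''') provides the '''rlm''' (robust) and '''lqs''' (resistant) functions, which also fit lines.  You can fit all three models and plot the lines against the data using:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; res.lm &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm &amp;lt;- rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs &amp;lt;- lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;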
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If given, the function fits the line by weighted least squares, i.e. it chooses the coefficients that minimise the sum of the squared residuals, each multiplied by its weight.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector of length 50 (one weight per row of '''cars'''), where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
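&lt;br /&gt;
Both '''var.test''' and '''t.test''' return an object of class '''htest''', which is a list.  The numbers in the printed reports can therefore also be extracted programmatically.  The sketch below uses the F test to choose between the pooled-variance t-test and Welch's t-test (the default when '''var.equal''' is not given).  The 0.05 threshold is an illustrative assumption, not a recommendation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2 &amp;lt;- c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
# F test for equality of variances; keep its p-value&lt;br /&gt;
f.res &amp;lt;- var.test(boys_2, girls_2)&lt;br /&gt;
equal.var &amp;lt;- f.res$p.value &amp;gt; 0.05   # 0.05 is an illustrative choice&lt;br /&gt;
# Pooled t-test if the variances look equal, Welch's t-test otherwise&lt;br /&gt;
t.res &amp;lt;- t.test(boys_2, girls_2, var.equal=equal.var)&lt;br /&gt;
t.res$statistic   # the t statistic&lt;br /&gt;
t.res$parameter   # degrees of freedom&lt;br /&gt;
t.res$p.value&lt;br /&gt;
t.res$estimate    # the two sample means&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;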
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
The famous (Fisher's or Anderson's) iris data set is available in R both as the data frame '''iris''' and as the 3-dimensional array '''iris3''' used below.  It gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
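&lt;br /&gt;
The correct classifications lie on the diagonal of this table.  In the run shown, 70 of the 75 test flowers (about 93%) were classified correctly.  Because ties are broken at random, your table may differ slightly from run to run.  The sketch below computes the accuracy directly and tries a few values of '''k'''.  The helper function '''knn.accuracy''' and the particular values of '''k''' are arbitrary choices made for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
# Hypothetical helper: fraction of test flowers whose predicted species is correct&lt;br /&gt;
knn.accuracy &amp;lt;- function(k) {&lt;br /&gt;
  pred &amp;lt;- knn(train, test, cl, k=k)&lt;br /&gt;
  conf &amp;lt;- table(predicted=pred, actual=cl)&lt;br /&gt;
  sum(diag(conf)) / sum(conf)   # correct predictions are on the diagonal&lt;br /&gt;
}&lt;br /&gt;
set.seed(1)   # knn breaks ties at random&lt;br /&gt;
ks &amp;lt;- c(1, 3, 5, 7, 9)&lt;br /&gt;
acc &amp;lt;- sapply(ks, knn.accuracy)&lt;br /&gt;
names(acc) &amp;lt;- paste0(&amp;quot;k=&amp;quot;, ks)&lt;br /&gt;
acc&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;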
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The '''kyphosis''' data frame, supplied with the '''rpart''' package, has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)   # provides rpart() and the kyphosis data&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
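&lt;br /&gt;
Once fitted, a tree can be inspected and used for prediction.  The sketch below uses '''printcp''' to show the complexity-parameter table, which includes the cross-validated error.  It then uses '''predict''' with '''type=&amp;quot;class&amp;quot;''' to classify each child.  Predicting on the training data, as here, gives an optimistic picture of how well the tree would classify new children:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
# Complexity table; the xerror column comes from cross-validation&lt;br /&gt;
printcp(fit)&lt;br /&gt;
# Predicted class for each of the 81 children in the training data&lt;br /&gt;
pred &amp;lt;- predict(fit, type = &amp;quot;class&amp;quot;)&lt;br /&gt;
table(predicted = pred, actual = kyphosis$Kyphosis)&lt;br /&gt;
# A smaller tree, pruned at the same cp value used for fit3 above&lt;br /&gt;
fit.pruned &amp;lt;- prune(fit, cp = 0.05)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;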
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
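&lt;br /&gt;
Note that '''array''' (like '''matrix''') fills its values column by column, so the first column of '''A''' is (1, 3, 2).  The system being solved is therefore:&lt;br /&gt;
&lt;br /&gt;
* x + 3y - 2z = 5&lt;br /&gt;
* 3x + 5y + 6z = 7&lt;br /&gt;
* 2x + 4y + 3z = 8&lt;br /&gt;
&lt;br /&gt;
A quick way to check a solution is to multiply it back through.  The sketch below does that and also shows that '''solve''' with a single argument returns the inverse matrix:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
A &amp;lt;- matrix(c(1,3,2,3,5,4,-2,6,3), nrow=3)   # same matrix, filled column by column&lt;br /&gt;
b &amp;lt;- c(5,7,8)&lt;br /&gt;
x &amp;lt;- solve(A, b)&lt;br /&gt;
x                                    # -15 8 2, as above&lt;br /&gt;
A %*% x                              # reproduces b, as a 3 x 1 matrix&lt;br /&gt;
all.equal(as.vector(A %*% x), b)&lt;br /&gt;
# With a single argument, solve() returns the inverse of A&lt;br /&gt;
A.inv &amp;lt;- solve(A)&lt;br /&gt;
all.equal(A.inv %*% A, diag(3))      # the identity matrix, up to rounding&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;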
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
If you would prefer to noodle about with some real-world data, you could take a look at:&lt;br /&gt;
* http://www.theguardian.com/news/datablog/2010/oct/18/historic-government-spending-area#data&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll change tack and focus on a question commonly asked by those beginning to use R in anger--'''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''.  In this section we'll consider two related tasks: finding which bits of your R code are responsible for the majority of the run-time, and what you can do about it.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have the gift(!) to constantly surprise us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it infinitely fast would save at most 10%: a lot of sweat for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time the application of a function using '''system.time''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
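&lt;br /&gt;
'''system.time''' returns the user, system and elapsed (wall-clock) times in seconds, and the result can be indexed by name.  Very fast expressions may report a time of zero.  A common trick is therefore to time many repetitions with '''replicate''' and divide.  The expressions and repeat count below are arbitrary choices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Time one call that sorts a million random numbers&lt;br /&gt;
t1 &amp;lt;- system.time(sort(runif(1e6)))&lt;br /&gt;
t1[&amp;quot;elapsed&amp;quot;]                      # wall-clock seconds&lt;br /&gt;
# A single tiny call is too quick to measure reliably, so time 10,000 of them&lt;br /&gt;
t.many &amp;lt;- system.time(replicate(10000, sqrt(2)))&lt;br /&gt;
t.many[&amp;quot;elapsed&amp;quot;] / 10000           # approximate seconds per call&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;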
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)   # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Apply loess.smooth to each vector in turn&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which I ran by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
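The profile tells us that the function '''simpleLoess''' (called by '''loess.smooth''') accounts for 88% of the run-time by itself ('''self.pct'''), or 92.73% once the functions it calls are included ('''total.pct'''), whereas '''rnorm''' takes only 4%.&lt;br /&gt;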
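&lt;br /&gt;
'''summaryRprof''' actually returns a list.  Its '''by.self''' and '''by.total''' components are the two orderings of the table above.  The sketch below profiles the same workload and inspects the results.  Writing to a temporary file and sampling every 0.01 seconds (the default is 0.02) are arbitrary choices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
prof.file &amp;lt;- tempfile(fileext = &amp;quot;.out&amp;quot;)&lt;br /&gt;
# Sample the call stack every 0.01 s&lt;br /&gt;
Rprof(filename = prof.file, interval = 0.01)&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)                        # stop profiling&lt;br /&gt;
prof &amp;lt;- summaryRprof(filename = prof.file)&lt;br /&gt;
head(prof$by.self)                 # ordered by time spent in each function itself&lt;br /&gt;
head(prof$by.total)                # ordered by time including the functions it calls&lt;br /&gt;
prof$sampling.time                 # total seconds sampled&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;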
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
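&lt;br /&gt;
The idiomatic way to preallocate is to create a vector of the final length up front with '''numeric''' (or '''character''', '''logical''' or '''vector'''), which fills it with zeros (or the type's equivalent).  The sketch below is a third version, '''f3''', which also returns the filled vector.  By contrast, '''f1''' and '''f2''' above return nothing useful, since the value of a '''for''' loop is '''NULL'''.  The gap between growing and preallocating may be smaller in more recent versions of R, so it is worth timing on your own installation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
f3 &amp;lt;- function(n = 30000) {&lt;br /&gt;
  v &amp;lt;- numeric(n)         # preallocate n zeros&lt;br /&gt;
  for (i in seq_len(n))   # seq_len(n) is safe even when n is 0&lt;br /&gt;
    v[i] &amp;lt;- i^2&lt;br /&gt;
  v                       # return the filled vector&lt;br /&gt;
}&lt;br /&gt;
f3(5)                     # 1 4 9 16 25&lt;br /&gt;
system.time(f3())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;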
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R accept whole vectors or arrays as input, rather than just single values, and act on every element at once, which lets you dispense with an explicit loop.  The examples in the previous section used '''for''' loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in about half the time that the preallocated loop, '''f2''', took to process 30,000!&lt;br /&gt;
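&lt;br /&gt;
Vectorisation also extends to conditional logic: '''ifelse''' tests a condition on every element of a vector at once.  In the small sketch below, the input vector and the rule (square the positive values, set the rest to zero) are made up for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
x &amp;lt;- c(-2, -1, 0, 1, 2)&lt;br /&gt;
# Loop version, with a preallocated result&lt;br /&gt;
out.loop &amp;lt;- numeric(length(x))&lt;br /&gt;
for (i in seq_along(x))&lt;br /&gt;
  out.loop[i] &amp;lt;- if (x[i] &amp;gt; 0) x[i]^2 else 0&lt;br /&gt;
# Vectorised version: the condition and both branches act element-wise&lt;br /&gt;
out.vec &amp;lt;- ifelse(x &amp;gt; 0, x^2, 0)&lt;br /&gt;
out.vec                          # 0 0 0 1 4&lt;br /&gt;
identical(out.loop, out.vec)     # TRUE&lt;br /&gt;
# For large inputs, compare the timings on your own machine&lt;br /&gt;
big &amp;lt;- rnorm(1e6)&lt;br /&gt;
system.time(ifelse(big &amp;gt; 0, big^2, 0))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;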
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
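&lt;br /&gt;
As a taster, here is a minimal sketch using '''.C''', the simplest of R's interfaces to compiled code (the '''.Call''' interface is more flexible).  '''.C''' passes R vectors to a C function as pointers and returns a list of the (possibly modified) arguments.  The sketch assumes that a C compiler is available to '''R CMD SHLIB''' (on Windows this means installing Rtools).  The C function '''square_vec''' and the temporary file names are made up for this example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Write a small, hypothetical C function to a temporary file&lt;br /&gt;
src &amp;lt;- tempfile(fileext = &amp;quot;.c&amp;quot;)&lt;br /&gt;
lib &amp;lt;- sub(&amp;quot;\\.c$&amp;quot;, .Platform$dynlib.ext, src)&lt;br /&gt;
writeLines(c(&lt;br /&gt;
  &amp;quot;/* square each element of x in place */&amp;quot;,&lt;br /&gt;
  &amp;quot;void square_vec(double *x, int *n) {&amp;quot;,&lt;br /&gt;
  &amp;quot;  int i;&amp;quot;,&lt;br /&gt;
  &amp;quot;  for (i = 0; i &amp;lt; *n; i++) x[i] = x[i] * x[i];&amp;quot;,&lt;br /&gt;
  &amp;quot;}&amp;quot;), src)&lt;br /&gt;
# Compile it into a shared library with R CMD SHLIB (needs a C compiler)&lt;br /&gt;
status &amp;lt;- system2(file.path(R.home(&amp;quot;bin&amp;quot;), &amp;quot;R&amp;quot;),&lt;br /&gt;
                  c(&amp;quot;CMD&amp;quot;, &amp;quot;SHLIB&amp;quot;, &amp;quot;-o&amp;quot;, shQuote(lib), shQuote(src)))&lt;br /&gt;
stopifnot(status == 0)&lt;br /&gt;
dyn.load(lib)&lt;br /&gt;
# Arguments must have exactly the C types: double for x, integer for n&lt;br /&gt;
res &amp;lt;- .C(&amp;quot;square_vec&amp;quot;, x = as.double(1:5), n = length(1:5))&lt;br /&gt;
res$x                                 # 1 4 9 16 25&lt;br /&gt;
dyn.unload(lib)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;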
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as bluecrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
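&lt;br /&gt;
Recasting a loop into this shape is often the main piece of work.  The sketch below uses a made-up task: the mean of 1,000 random draws for each of 10 different means.  It writes the same computation first as a loop, then as a function mapped over a list.  Only the second form can be handed directly to the parallel mappers described below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
set.seed(1)&lt;br /&gt;
# Loop form: each iteration is independent of the others&lt;br /&gt;
res.loop &amp;lt;- vector(&amp;quot;list&amp;quot;, 10)&lt;br /&gt;
for (i in 1:10)&lt;br /&gt;
  res.loop[[i]] &amp;lt;- mean(rnorm(1000, mean=i))&lt;br /&gt;
# The same work as &amp;quot;a function applied to a list of inputs&amp;quot;&lt;br /&gt;
inputs &amp;lt;- as.list(1:10)&lt;br /&gt;
work &amp;lt;- function(m) mean(rnorm(1000, mean=m))&lt;br /&gt;
res.list &amp;lt;- lapply(inputs, work)&lt;br /&gt;
# The same call shape works with, e.g., parallel::mclapply(inputs, work)&lt;br /&gt;
unlist(res.list)                  # close to 1, 2, ..., 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;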
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
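The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers, and that its functionality has since been incorporated into the '''parallel''' package, described below.&lt;br /&gt;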
&lt;br /&gt;
As an example, let's look at the use of the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And used the following submission script to run it on bluecrystal phase2:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Disable PSM on the QLogic HCAs&lt;br /&gt;
export OMPI_MCA_mtl=^psm&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
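The '''parallel''' package, included with R since version 2.14.0, is an amalgamation of functionality from the multicore and snow packages.  Unlike the multicore package, it can be loaded on an MS Windows machine.  Note, however, that its fork-based functions, such as '''mclapply''', only run serially there.  On Windows, parallelism instead comes from the snow-style socket clusters created with '''makeCluster'''.&lt;br /&gt;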
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using mclapply, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
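&lt;br /&gt;
In the '''parallel''' package, '''mclapply''' uses '''mc.cores = getOption(&amp;quot;mc.cores&amp;quot;, 2L)''' by default, i.e. only two cores unless told otherwise.  The sketch below asks for all the detected cores explicitly.  It falls back to a single core on Windows, where mclapply cannot fork, or if the number of cores cannot be determined:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
n.cores &amp;lt;- detectCores()&lt;br /&gt;
if (is.na(n.cores) || .Platform$OS.type == &amp;quot;windows&amp;quot;) n.cores &amp;lt;- 1L&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Use every detected core, rather than the default of two&lt;br /&gt;
x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}, mc.cores=n.cores)&lt;br /&gt;
length(x)   # one smoothed result per input vector&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On a shared cluster node you may prefer to match '''mc.cores''' to the number of cores your job actually requested (e.g. ppn=8 in the PBS script above), rather than to everything '''detectCores''' reports.&lt;br /&gt;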
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
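&lt;br /&gt;
On a single machine, including MS Windows, the snow-style interface in '''parallel''' can be used with a socket ('''PSOCK''') cluster of worker R processes.  The sketch below uses two workers, an arbitrary choice, and has not been tried across BCp2 nodes.  It also calls '''clusterSetRNGStream''' to give each worker its own reproducible random-number stream.  Recall that the four Rmpi slaves above all returned the same value from '''runif(1)''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
cl &amp;lt;- makeCluster(2)             # start 2 worker R processes on this machine&lt;br /&gt;
clusterSetRNGStream(cl, 123)     # independent, reproducible RNG streams per worker&lt;br /&gt;
clusterCall(cl, function() Sys.info()[[&amp;quot;nodename&amp;quot;]])&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
x &amp;lt;- parLapply(cl, data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
# Each worker draws from its own stream, so these two values should differ&lt;br /&gt;
r &amp;lt;- unlist(clusterCall(cl, runif, 1))&lt;br /&gt;
r&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;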
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9407</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9407"/>
		<updated>2014-03-03T12:00:59Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Suggested Exercises */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' will typically be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R treats even a single number as a vector (of length one), and the '''[1]''' indicates that the line in question displays the vector's values starting at the first index.  We can use the handy sequence function, '''seq''', to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
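In these examples the '''&amp;lt;-''' operator performs assignment, while the '''=''' signs inside the call to '''seq''' simply name the function's arguments.  At the top level, '''=''' can also be used for assignment (e.g. '''x = 5'''), but '''&amp;lt;-''' is the more common convention.&lt;br /&gt;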
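&lt;br /&gt;
The distinction matters when the two operators appear inside a function call.  A small sketch, with arbitrary variable names:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
x &amp;lt;- 5&lt;br /&gt;
m &amp;lt;- median(x = 1:3)   # &amp;quot;=&amp;quot; only names median's argument: our x is untouched&lt;br /&gt;
x                      # 5&lt;br /&gt;
m                      # 2&lt;br /&gt;
y = 10                 # at the top level, &amp;quot;=&amp;quot; assigns, just like &amp;quot;&amp;lt;-&amp;quot;&lt;br /&gt;
y                      # 10&lt;br /&gt;
# Inside a call, &amp;quot;&amp;lt;-&amp;quot; assigns in the calling environment,&lt;br /&gt;
# so z still exists after system.time() has finished&lt;br /&gt;
system.time(z &amp;lt;- sqrt(16))&lt;br /&gt;
z                      # 4&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;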
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
coords.bris &amp;lt;- c(51.5, 2.6)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c''' function combines the arguments given in the parentheses into a vector, and '''matrix''' fills that vector into the matrix column by column (add '''byrow=TRUE''' to fill it row by row instead).  We can access portions of the matrix using the syntax shown in the square brackets.  For example, we can access the first row using the '''[1,]''' notation, and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
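&lt;br /&gt;
Rather than checking one slice at a time, '''rowSums''', '''colSums''' and '''diag''' can check every row, column and diagonal at once.  A short sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8), nrow=3, ncol=3)&lt;br /&gt;
rowSums(magic)             # 15 15 15&lt;br /&gt;
colSums(magic)             # 15 15 15&lt;br /&gt;
sum(diag(magic))           # main diagonal: 2 + 5 + 8 = 15&lt;br /&gt;
sum(diag(magic[, 3:1]))    # anti-diagonal: 4 + 5 + 6 = 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;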
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which can have more than two dimensions, and '''lists''', which hold heterogeneous collections.&lt;br /&gt;
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access its items in several ways:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
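&lt;br /&gt;
Note the difference between the last two examples.  Single square brackets return a smaller list, while double square brackets (or '''$''') extract the element itself.  The frequency was also stored as a character string, so it must be converted before doing arithmetic with it.  A short sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
class(list.r4[1])            # &amp;quot;list&amp;quot;: a sub-list containing one element&lt;br /&gt;
class(list.r4[[1]])          # &amp;quot;character&amp;quot;: the element itself&lt;br /&gt;
list.r4[[&amp;quot;frequency&amp;quot;]]       # elements can also be extracted by name&lt;br /&gt;
# Convert the string to a number before doing arithmetic with it&lt;br /&gt;
as.numeric(list.r4$frequency) + 0.1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;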
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
    country  gold  silver  bronze&lt;br /&gt;
1       USA    46      29      29&lt;br /&gt;
2     China    38      27      23&lt;br /&gt;
3        GB    29      17      19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access columns of a data frame using the '''$''' operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
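&lt;br /&gt;
The '''Levels:''' line appears because, in the version of R used here, '''data.frame''' converted character vectors into factors by default.  Since R 4.0.0 the default has been '''stringsAsFactors = FALSE''', so you would instead see a plain character vector.  Stating the argument explicitly makes the code behave the same in both versions.  The sketch below does this, and also adds a derived total column and selects rows by a condition:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze, stringsAsFactors=FALSE)&lt;br /&gt;
class(medals.2012$country)     # &amp;quot;character&amp;quot;&lt;br /&gt;
# Add a column computed from the others&lt;br /&gt;
medals.2012$total &amp;lt;- medals.2012$gold + medals.2012$silver + medals.2012$bronze&lt;br /&gt;
medals.2012$total              # 104 88 65&lt;br /&gt;
# Rows where more than 30 golds were won&lt;br /&gt;
medals.2012[medals.2012$gold &amp;gt; 30, ]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;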
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets--we'll use these to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  In this case, to show the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
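&lt;br /&gt;
'''example(filled.contour)''' runs the examples from that help page directly.  The '''volcano''' data set is a built-in matrix of heights (in metres) on a 10 m by 10 m grid, so a basic version of the plot needs only one call.  The colour palette and title below are arbitrary choices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
dim(volcano)    # 87 61: an 87 x 61 matrix of heights&lt;br /&gt;
# asp=1 keeps the grid square, so the volcano is not stretched&lt;br /&gt;
filled.contour(volcano, color.palette=terrain.colors, asp=1,&lt;br /&gt;
               main=&amp;quot;Maunga Whau Volcano&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;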
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots--complete with the R code required to create the plots (at the bottom of the page, after the comments)--on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
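&lt;br /&gt;
The following sketch illustrates two further points about '''for''' loops.  First, a loop can step directly through the elements of any vector, not just through a sequence of integers.  Second, '''seq_len(n)''' is a safer way than '''1:n''' to count from 1 to n.  When n is zero, '''1:0''' counts ''down'' (giving 1, 0), whereas '''seq_len(0)''' is empty:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Step through the elements of a vector, keeping a running total&lt;br /&gt;
total = 0&lt;br /&gt;
for (x in c(2, 4, 6)) total = total + x&lt;br /&gt;
total&lt;br /&gt;
# 1:0 counts down, so this loop body runs twice (printing 1 then 0)...&lt;br /&gt;
for (ii in 1:0) print(ii)&lt;br /&gt;
# ...whereas seq_len(0) is empty, so this loop body never runs&lt;br /&gt;
for (ii in seq_len(0)) print(ii)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;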
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body of the function is a single expression, as here, but they add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3, y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments are matched by position unless they are named, in which case they can be given in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
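&lt;br /&gt;
The '''return''' keyword is optional.  Without it, a function returns the value of the last expression it evaluates.  To hand back several values at once, bundle them into a list.  Here is a short sketch (the names '''hypot4''' and '''triangle''' are just for illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# No return(): the value of the last expression, sqrt(...), is returned&lt;br /&gt;
hypot4 = function(x=3, y=4) {&lt;br /&gt;
  x_sq = x^2&lt;br /&gt;
  y_sq = y^2&lt;br /&gt;
  sqrt(x_sq + y_sq)&lt;br /&gt;
}&lt;br /&gt;
hypot4(6, 8)&lt;br /&gt;
# Return several values at once as a named list&lt;br /&gt;
triangle = function(x=3, y=4) {&lt;br /&gt;
  list(x=x, y=y, hypotenuse=sqrt(x^2 + y^2))&lt;br /&gt;
}&lt;br /&gt;
triangle()$hypotenuse&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;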
&lt;br /&gt;
You can check the contents of a function by typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can check just its arguments using the '''args''' function.  This returns a function with the same arguments but an empty (NULL) body:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Contributed packages are listed on CRAN, at http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
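Let's install the '''multicore''' package, which provides functions that can make use of the multiple processor cores found in most modern computers.  (In R 2.14.0 and later, the same functionality, including '''mclapply''', is also provided by the '''parallel''' package that ships with R.)&lt;br /&gt;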
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library, i.e. those that are available to be loaded into our workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load one of them into the current session, we type, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, an example of using a function from the multicore package.  The '''lapply''' function, which is part of base R, maps a given function over a list of inputs and returns a list of the function's outputs.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
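&lt;br /&gt;
Since R 2.14.0, '''mclapply''' has also been available from the '''parallel''' package that ships with R.  The sketch below assumes a Unix-like system, because on MS Windows '''mclapply''' cannot use more than one core.  It uses the '''mc.cores''' argument to choose how many cores to use, and checks that the parallel result is identical to the serial one:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# How many cores does this machine report?&lt;br /&gt;
detectCores()&lt;br /&gt;
squares.serial = lapply(1:3, function(x) {x^2})&lt;br /&gt;
# Spread the same work over 2 cores&lt;br /&gt;
squares.parallel = mclapply(1:3, function(x) {x^2}, mc.cores=2)&lt;br /&gt;
identical(squares.serial, squares.parallel)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For a job this small, the overhead of starting the worker processes outweighs any gain.  Parallelism pays off when each function call does a substantial amount of work.&lt;br /&gt;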
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  Perhaps the simplest function for reading them is '''read.table()''', which handles data laid out as a table, optionally with a first line of column headings.  Suppose we have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is then a simple matter to load the data into R with '''read.table()''', using '''header=TRUE''' to say that the first line holds the column headings:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the file '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
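&lt;br /&gt;
Note the extra first column: by default, '''write.csv()''' also writes the row names.  The sketch below uses a temporary file and a cut-down copy of the medal table.  It passes '''row.names=FALSE''' to leave the row names out, then reads the file back with '''read.csv()''', which assumes a header line by default:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals = data.frame(country=c('USA', 'China', 'Great Britain'),&lt;br /&gt;
                    gold=c(46, 38, 29), silver=c(29, 27, 17), bronze=c(29, 23, 19))&lt;br /&gt;
# A temporary file name, so we don't clutter the working directory&lt;br /&gt;
csv.file = tempfile(fileext='.csv')&lt;br /&gt;
write.csv(medals, csv.file, row.names=FALSE)&lt;br /&gt;
# Show the raw lines of the file: no unnamed row-name column this time&lt;br /&gt;
readLines(csv.file)&lt;br /&gt;
medals.back = read.csv(csv.file)&lt;br /&gt;
medals.back&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;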
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store an R data structure in binary form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
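&lt;br /&gt;
By default, the file is compressed with gzip, as the Unix '''file''' command confirms:&lt;br /&gt;
&lt;br /&gt;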
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
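&lt;br /&gt;
'''load()''' restores each object under the name it was saved with, overwriting any object of the same name in the workspace.  It also (invisibly) returns the names of the objects it restored.  Here is a self-contained sketch, using a small stand-in for '''medals.2012''' and a temporary file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 = data.frame(country=c('USA', 'China'), gold=c(46, 38))&lt;br /&gt;
rdata.file = tempfile(fileext='.RData')&lt;br /&gt;
save(medals.2012, file=rdata.file)&lt;br /&gt;
# Remove the object from the workspace...&lt;br /&gt;
rm(medals.2012)&lt;br /&gt;
# ...and restore it; load() returns the names of the restored objects&lt;br /&gt;
restored = load(rdata.file)&lt;br /&gt;
restored&lt;br /&gt;
medals.2012&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;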
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide these.  Note that you will need the 'development' packages, which include the header files.  (The newer '''ncdf4''' package offers a similar interface and also supports the NetCDF-4 file format.)  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
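&lt;br /&gt;
'''sort()''' can also sort into descending order.  To sort the rows of a data frame by one of its columns, use '''order()''', which returns the positions of the elements in sorted order.  Here is a sketch with a cut-down medal table:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines = c('thomas', 'henry', 'gordon', 'edward', 'james')&lt;br /&gt;
sort(railway.engines, decreasing=TRUE)&lt;br /&gt;
medals = data.frame(country=c('Germany', 'USA', 'China'), gold=c(11, 46, 38))&lt;br /&gt;
# order() gives the row positions that would sort the gold column: 2 3 1&lt;br /&gt;
order(medals$gold, decreasing=TRUE)&lt;br /&gt;
# Use those positions to reorder the rows&lt;br /&gt;
medals[order(medals$gold, decreasing=TRUE), ]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;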
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
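&lt;br /&gt;
By default, '''sample()''' draws ''without'' replacement, so drawing all five engines gives a random permutation.  Calling '''set.seed()''' first makes the 'random' draws reproducible, which is useful when others need to repeat your analysis.  The seed value 2012 below is arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines = c('thomas', 'henry', 'gordon', 'edward', 'james')&lt;br /&gt;
# A random permutation of all five engines&lt;br /&gt;
sample(railway.engines)&lt;br /&gt;
# Three different engines, reproducibly&lt;br /&gt;
set.seed(2012)&lt;br /&gt;
first = sample(railway.engines, 3)&lt;br /&gt;
set.seed(2012)&lt;br /&gt;
second = sample(railway.engines, 3)&lt;br /&gt;
identical(first, second)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;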
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
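&lt;br /&gt;
The companion function, '''cbind''', combines columns instead.  The sketch below uses it to add a '''total''' column to a cut-down copy of the medal table.  (When '''rbind''' is used with data frames, the columns are matched by name, so both data frames need the same column names.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals = data.frame(country=c('USA', 'China'),&lt;br /&gt;
                    gold=c(46, 38), silver=c(29, 27), bronze=c(29, 23))&lt;br /&gt;
# Vector arithmetic adds up the medals for each row&lt;br /&gt;
total = medals$gold + medals$silver + medals$bronze&lt;br /&gt;
medals = cbind(medals, total)&lt;br /&gt;
medals&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;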
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because '''bins''' is a factor, plotting it couldn't be simpler: '''plot(bins)''' draws a bar chart of the number of values in each bin.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
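&lt;br /&gt;
To see how many values fall into each bin, use '''table()'''.  You can also give the bin edges explicitly instead of the number of bins.  By default each interval is closed on the right, so (80,85] contains 85 but not 80.  The edges 80, 85 and 90 below are just an illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
bins=cut(girls_2, breaks=3)&lt;br /&gt;
# Count the values in each of the three equal-width bins&lt;br /&gt;
table(bins)&lt;br /&gt;
# Bins with explicit edges&lt;br /&gt;
bins2=cut(girls_2, breaks=c(80, 85, 90))&lt;br /&gt;
table(bins2)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For these data, the first table gives counts of 3, 2 and 5, and the second gives counts of 3 and 7.&lt;br /&gt;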
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
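&lt;br /&gt;
The object returned by '''lm''' holds much more than the line itself.  For example, '''coef()''' extracts the fitted intercept and slope, '''summary()''' reports standard errors and R-squared, and '''predict()''' evaluates the fitted line at new values.  The speed of 21 below is just an illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res=lm(dist ~ speed, data=cars)&lt;br /&gt;
# Intercept and slope of the fitted line&lt;br /&gt;
coef(res)&lt;br /&gt;
# Standard errors, t values, R-squared, etc.&lt;br /&gt;
summary(res)&lt;br /&gt;
# Predicted stopping distance at a speed of 21&lt;br /&gt;
predict(res, data.frame(speed=21))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For the '''cars''' data, the fitted line is approximately dist = -17.58 + 3.93 * speed.&lt;br /&gt;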
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
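* You may wish to compare different methods of estimation.  The MASS package provides the '''rlm''' (robust regression) and '''lqs''' (resistant regression) functions for fitting a line.  Having plotted the data with '''plot(cars)''' as above, you can fit each model and add all the lines to the plot using:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; res.lm=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm=rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs=lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;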
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If weights are given, the line is fitted by weighted least squares, i.e. by minimising the sum of the squared residuals, each multiplied by its weight.  Experiment with fitting linear models using different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector of length 50 (one weight for each row of '''cars'''), where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
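&lt;br /&gt;
The sketch below shows what '''weights''' does.  It fits a weighted line using the second hint's weights, then recomputes the same coefficients directly from the weighted least-squares normal equations, t(X) W X b = t(X) W y.  Here X holds a column of ones and the speeds, W is a diagonal matrix of the weights, and y holds the stopping distances.  The names X, W, y and b are ours, not part of '''lm''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
w2=seq(from=0.02, to=1.0, by=0.02)&lt;br /&gt;
res.w=lm(dist ~ speed, data=cars, weights=w2)&lt;br /&gt;
# Design matrix: a column of ones (intercept) and the speeds&lt;br /&gt;
X=cbind(1, cars$speed)&lt;br /&gt;
W=diag(w2)&lt;br /&gt;
y=cars$dist&lt;br /&gt;
# Solve the weighted normal equations for the intercept and slope&lt;br /&gt;
b=solve(t(X) %*% W %*% X, t(X) %*% W %*% y)&lt;br /&gt;
# The two sets of coefficients should agree&lt;br /&gt;
cbind(lm=coef(res.w), normal.equations=as.vector(b))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;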
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
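&lt;br /&gt;
Here we compare two samples, '''boys_2''' and '''girls_2'''.  An F test ('''var.test''') first checks whether their variances differ.  It finds no significant difference (p = 0.98), so we then compare their means using a two-sample t test that assumes equal variances ('''var.equal=TRUE''').  The large p-value (0.41) means there is no significant evidence that the means differ:&lt;br /&gt;
&lt;br /&gt;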
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
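&lt;br /&gt;
The results of these tests are lists, so you can pick out individual values, e.g. to use them later in a script.  Note also that '''t.test''' performs Welch's t test, which does not assume equal variances, unless '''var.equal=TRUE''' is given:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
res.t=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
# Individual components of the test result&lt;br /&gt;
res.t$p.value&lt;br /&gt;
res.t$estimate&lt;br /&gt;
res.t$conf.int&lt;br /&gt;
# The default, Welch's t test, for comparison&lt;br /&gt;
t.test(boys_2, girls_2)$p.value&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;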
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?  The table compares the predicted species (rows) with the actual species (columns):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
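&lt;br /&gt;
The correct predictions lie on the diagonal of the table, so the proportion classified correctly is the sum of the diagonal divided by the total.  For the table above, this is (23 + 25 + 22) / 75, or about 0.93.  The sketch below repeats the example and performs that calculation.  Because '''knn''' breaks ties at random, your table may differ slightly; calling '''set.seed()''' first makes the result repeatable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train = rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test = rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl = factor(c(rep('s',25), rep('c',25), rep('v',25)))&lt;br /&gt;
set.seed(1)&lt;br /&gt;
iris3.knn = knn(train, test, cl, k = 3)&lt;br /&gt;
confusion = table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
# Fraction of test flowers whose species was predicted correctly&lt;br /&gt;
accuracy = sum(diag(confusion)) / sum(confusion)&lt;br /&gt;
accuracy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;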
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The '''kyphosis''' data frame, from the '''rpart''' package, has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels ''absent'' and ''present'', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
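&lt;br /&gt;
To see how well the first tree classifies the children it was built from, use '''predict()''' with '''type=&amp;quot;class&amp;quot;'''.  This returns the predicted class for each row, which we can tabulate against the actual outcome.  (Accuracy measured on the training data like this is usually optimistic.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit = rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
# Predicted class (absent/present) for each of the 81 children&lt;br /&gt;
predicted = predict(fit, type = 'class')&lt;br /&gt;
table(predicted = predicted, actual = kyphosis$Kyphosis)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;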
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
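&lt;br /&gt;
Note that '''array()''' fills A column by column, so its first row is (1, 3, -2).  We can check the solution by multiplying it back with the matrix product operator, '''%*%'''.  With a single argument, '''solve()''' returns the inverse of A instead.  For solving equations, '''solve(A, b)''' is generally preferable to computing the inverse explicitly:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
A = array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
b = c(5,7,8)&lt;br /&gt;
x = solve(A, b)&lt;br /&gt;
# A times x should reproduce b&lt;br /&gt;
A %*% x&lt;br /&gt;
# The inverse of A; solve(A) %*% A is (numerically) the identity matrix&lt;br /&gt;
solve(A)&lt;br /&gt;
solve(A) %*% A&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;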
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
If you would prefer to noodle about with some real-world data, you could take a look at:&lt;br /&gt;
* http://www.ukpublicspending.co.uk&lt;br /&gt;
where you can download CSV files, once you have selected the data series you are interested in.&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll switch tack and focus on a question commonly asked by those beginning to use R in anger: '''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''  In this section we'll consider two related tasks: finding which parts of your R code are responsible for most of the run-time, and what you can do about it.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential first to identify which portions of our R code are responsible for most of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have a gift(!) for surprising us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it infinitely fast would cut the total run-time by only 10%: a lot of sweat for very little gain.&lt;br /&gt;
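&lt;br /&gt;
This is Amdahl's law at work.  If a fraction p of the run-time is made s times faster, the overall speed-up is 1 / ((1 - p) + p / s).  Here is a small sketch (the function name '''overall.speedup''' is ours):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Overall speed-up when a fraction p of the run-time is made s times faster&lt;br /&gt;
overall.speedup = function(p, s) {&lt;br /&gt;
  1 / ((1 - p) + p / s)&lt;br /&gt;
}&lt;br /&gt;
# Making a 10% portion infinitely fast: about 1.11, i.e. at most 10% less run-time&lt;br /&gt;
overall.speedup(0.1, Inf)&lt;br /&gt;
# Making a 90% portion 10 times faster: about 5.26&lt;br /&gt;
overall.speedup(0.9, 10)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;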
&lt;br /&gt;
The simplest method of investigation is to time a call to a function, here a placeholder '''some.function()''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Map a smoothing function over each vector in the list&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The profile tells us that the function '''simpleLoess''' takes 88% of the run-time on its own (nearly 93% including the functions it calls), whereas '''rnorm''' takes only 4%.&lt;br /&gt;
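&lt;br /&gt;
'''summaryRprof()''' returns a list, so you can also inspect the profile from within R.  Its '''by.self''' element ranks functions by the time spent in their own code, and its '''by.total''' element ranks them by the time including the functions they call.  Here is a cut-down sketch of '''profile.r''' that writes the profile to a temporary file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
prof.file = tempfile()&lt;br /&gt;
Rprof(prof.file)&lt;br /&gt;
data = lapply(1:4, function(x) {rnorm(100000)})&lt;br /&gt;
x = lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
# Stop profiling&lt;br /&gt;
Rprof(NULL)&lt;br /&gt;
prof = summaryRprof(prof.file)&lt;br /&gt;
# The top few functions, ranked by time spent in their own code&lt;br /&gt;
head(prof$by.self)&lt;br /&gt;
# Total time covered by the profiler's samples, in seconds&lt;br /&gt;
prof$sampling.time&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;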
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, one of the simplest ways to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
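Timing calls to each of them shows that pre-allocating memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code and on your version of R.  Since R 3.4.0, growing a vector one element at a time has been made considerably cheaper, so the gap may be much smaller.&lt;br /&gt;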
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
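&lt;br /&gt;
A more idiomatic way to pre-allocate is '''numeric(n)''', which creates a vector of n zeros in one go.  The sketch below defines '''f3''' (the name is ours), which, unlike '''f1''' and '''f2''', also returns the filled vector.  It also shows '''vapply()''', which allocates and fills the result for you:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
f3 = function(n) {&lt;br /&gt;
  # Pre-allocate n zeros, then fill them in&lt;br /&gt;
  v = numeric(n)&lt;br /&gt;
  for (i in seq_len(n))&lt;br /&gt;
    v[i] = i^2&lt;br /&gt;
  v&lt;br /&gt;
}&lt;br /&gt;
system.time(f3(30000))&lt;br /&gt;
# vapply() allocates the result itself; numeric(1) says each element is one number&lt;br /&gt;
v4 = vapply(seq_len(30000), function(i) i^2, numeric(1))&lt;br /&gt;
identical(f3(30000), v4)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;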
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R accept whole vectors as input, not just single values, and this may allow you to do without a loop.  The examples in the previous section used '''for''' loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in about half the time the pre-allocated loop took to process 30,000!&lt;br /&gt;
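&lt;br /&gt;
The vectorised version gives exactly the same numbers as the loop.  Many other built-in functions are vectorised too, such as '''cumsum()''' and '''ifelse()'''.  '''ifelse()''' applies a test to every element at once, as this sketch shows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
n = 30000&lt;br /&gt;
# Loop version, pre-allocated&lt;br /&gt;
squares.loop = numeric(n)&lt;br /&gt;
for (i in 1:n) squares.loop[i] = i^2&lt;br /&gt;
# Vectorised version&lt;br /&gt;
squares.vec = (1:n)^2&lt;br /&gt;
identical(squares.loop, squares.vec)&lt;br /&gt;
# Running totals: 1 3 6 10 15&lt;br /&gt;
cumsum(1:5)&lt;br /&gt;
# Element-wise choice: 'odd' 'odd' 'even'&lt;br /&gt;
ifelse(c(1, 5, 10) %% 2 == 0, 'even', 'odd')&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;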
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as bluecrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the use of the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And used the following submission script to run it on bluecrystal phase2:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
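&lt;br /&gt;
From these timings, the parallel version ran 0.749 / 0.113, or about 6.6 times faster on 8 cores, a parallel efficiency of about 0.83.  The sketch below makes the same measurement using the '''parallel''' package (included with R since 2.14.0), and computes the speed-up for you.  Like multicore, it assumes a Unix-like system:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# Use every core reported, falling back to 1 if the count is unavailable&lt;br /&gt;
n.cores = max(1L, detectCores(), na.rm=TRUE)&lt;br /&gt;
data = lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
t.serial = system.time(lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
t.parallel = system.time(mclapply(data, function(x) {loess.smooth(x,x)}, mc.cores=n.cores))&lt;br /&gt;
speedup = t.serial[['elapsed']] / t.parallel[['elapsed']]&lt;br /&gt;
speedup&lt;br /&gt;
# Parallel efficiency: 1 would mean perfect use of every core&lt;br /&gt;
speedup / n.cores&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;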
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Disable PSM on the QLogic HCAs&lt;br /&gt;
export OMPI_MCA_mtl=^psm&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a list of 10 vectors, each holding 10,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the list.  First in serial...&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a list of 10 vectors, each holding 10,000 random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the list.  First in serial...&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
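The '''parallel''' package, included with R since version 2.14.0, combines functionality from the multicore and snow packages.  Note, however, that its fork-based shared memory functions, such as '''mclapply()''', cannot run in parallel on MS Windows (as was also the case with the multicore package).  On Windows, you need a socket cluster created with '''makeCluster()''' instead.&lt;br /&gt;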
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
n.cores &amp;lt;- detectCores()&lt;br /&gt;
# Create a list of 10 vectors, each holding 10,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the list.  First in serial...&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using mclapply, within a node);&lt;br /&gt;
# mclapply() defaults to 2 cores, so request all of them explicitly&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}, mc.cores=n.cores))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
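&lt;br /&gt;
On a single machine, a socket cluster from the '''parallel''' package may be a useful starting point.  What follows is a minimal sketch that has not been tested on BCp2.  It starts two local worker processes (the worker count is an arbitrary choice), maps the same '''loess.smooth()''' workload over them with '''parLapply()''', and then shuts them down.  It does not solve the multi-node case described above.&lt;br /&gt;
&lt;br /&gt;
```r
library(parallel)

# Start two worker R processes on this machine (a socket, or "PSOCK", cluster).
# The number of workers is an arbitrary choice for illustration.
cl = makeCluster(2)

# The same workload as above: a list of 10 vectors of 10,000 values each
data = lapply(1:10, function(i) rnorm(10000))

# parLapply() divides the list between the workers and gathers the results
smoothed = parLapply(cl, data, function(x) loess.smooth(x, x))

# Always release the workers when finished
stopCluster(cl)

length(smoothed)   # one smoothed curve per input vector
```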
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9406</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9406"/>
		<updated>2014-03-03T11:31:08Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Rmpi */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
On a Linux system, you will typically start R from the command line.  On a Windows machine or a Mac, you will typically start R in some form of GUI.  Either way, you will have an R command prompt, and all the examples below will work there.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to override the usual order of precedence of the operators (without them, '''/''' would be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R has no separate scalar type: a single number is simply a vector of length one.  The '''[1]''' indicates that the line in question displays the vector's values starting at its first element.  We can use the handy '''seq()''' (sequence) function to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
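Note that the '''=''' signs in the call to '''seq()''' above name the function's arguments; they do not assign anything.  At the top level, '''=''' can also be used for assignment, but '''&amp;lt;-''' is the conventional assignment operator in R.&lt;br /&gt;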
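&lt;br /&gt;
The two uses of '''=''' can be told apart in a short sketch.  The variable names '''x''' and '''y''' are illustrative only.&lt;br /&gt;
&lt;br /&gt;
```r
x = 5                          # at the top level, '=' assigns a value to x
y = seq(from=1, to=x, by=2)    # inside a call, '=' only names the arguments
y                              # 1 3 5

# No variable called 'from' has been created in the workspace
exists("from", envir=globalenv(), inherits=FALSE)
```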
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; coords.bris &amp;lt;- c(51.5, -2.6)  # latitude, longitude in degrees (west is negative)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c()''' function combines its arguments into a vector.  Note that '''matrix()''' fills the matrix column by column unless '''byrow=TRUE''' is given.  We can access portions of the matrix using the syntax shown in the square brackets.  For example, we can access the first row using the '''[1,]''' notation, and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
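&lt;br /&gt;
Rather than checking one slice at a time, you can use the built-in '''rowSums()''', '''colSums()''' and '''diag()''' functions.  They confirm that every row, column and diagonal of the square sums to 15.  A short sketch:&lt;br /&gt;
&lt;br /&gt;
```r
magic = matrix(data=c(2,7,6,9,5,1,4,3,8), nrow=3, ncol=3)

rowSums(magic)             # sum of each row
colSums(magic)             # sum of each column
sum(diag(magic))           # main diagonal: 2 + 5 + 8
sum(diag(magic[, 3:1]))    # anti-diagonal, found by reversing the column order
```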
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which have more than two dimensions, and '''lists''' to hold heterogeneous collections.&lt;br /&gt;
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access its items in several ways.  '''$''' and '''[[ ]]''' extract a single element, whereas '''[ ]''' returns a sub-list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
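&lt;br /&gt;
The difference between single and double brackets shows up in the class of the result.  A sketch using the same list:&lt;br /&gt;
&lt;br /&gt;
```r
list.r4 = list(name="Radio4", frequency="93.7")

class(list.r4[1])       # "list": single brackets return a sub-list
class(list.r4[[1]])     # "character": double brackets return the element itself
list.r4[["frequency"]]  # elements can also be selected by name
names(list.r4)          # "name" "frequency"
```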
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
  country gold silver bronze&lt;br /&gt;
1     USA   46     29     29&lt;br /&gt;
2   China   38     27     23&lt;br /&gt;
3      GB   29     17     19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
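We can access columns of a data frame using the '''$''' operator.  (In the version of R used here, text columns are converted to factors, which is why a '''Levels''' line appears below.  Since R 4.0.0, '''data.frame()''' keeps them as character vectors by default.)&lt;br /&gt;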
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
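&lt;br /&gt;
You can also combine columns to make new ones, and select rows with a logical condition.  Here is a sketch using the medal table above.  The '''total''' column is our own addition, and '''stringsAsFactors=FALSE''' keeps the country names as text in any R version.&lt;br /&gt;
&lt;br /&gt;
```r
medals.2012 = data.frame(country=c("USA", "China", "GB"),
                         gold=c(46, 38, 29),
                         silver=c(29, 27, 17),
                         bronze=c(29, 23, 19),
                         stringsAsFactors=FALSE)

# Add a column holding each country's total medal count
medals.2012$total = medals.2012$gold + medals.2012$silver + medals.2012$bronze

# Select the rows (and all columns) where more than 30 golds were won
medals.2012[medals.2012$gold > 30, ]
```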
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  Here, one shows the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
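&lt;br /&gt;
Because the vector is named, you can look up its elements by month, and summary functions such as '''sum()''' and '''which.max()''' work on it directly.  A sketch using the same figures, with R's built-in '''month.abb''' supplying the month names:&lt;br /&gt;
&lt;br /&gt;
```r
bristol.precip = c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)
names(bristol.precip) = month.abb   # built-in "Jan" ... "Dec"

bristol.precip["Oct"]               # look up a single month by name
sum(bristol.precip)                 # annual total (mm)
names(which.max(bristol.precip))    # the wettest month
```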
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
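&lt;br /&gt;
You can get the numbers behind the boxes with '''tapply()''', which applies a function to each group of values.  A sketch follows.  Note that '''boxplot()''' draws its boxes from Tukey's hinges, as returned by '''fivenum()''', and these may differ slightly from the quartiles computed by '''quantile()'''.&lt;br /&gt;
&lt;br /&gt;
```r
# Median miles per gallon for each number of cylinders
meds = tapply(mtcars$mpg, mtcars$cyl, median)
meds

# Five-number summaries (minimum, lower hinge, median, upper hinge, maximum)
tapply(mtcars$mpg, mtcars$cyl, fivenum)
```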
&lt;br /&gt;
R includes a very useful help facility.  The help page for the '''filled.contour()''' plotting function includes an example of its use to plot the topography of a volcano in Auckland, NZ.  '''example()''' will run it for you:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;gt; example(filled.contour)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots--complete with the R code required to create the plots (at the bottom of the page, after the comments)--on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
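&lt;br /&gt;
Many loops in R can be replaced by a single vectorised expression, which is usually shorter and faster.  As a sketch, here is the sum of the squares of 1 to 10 computed both ways:&lt;br /&gt;
&lt;br /&gt;
```r
# With an explicit loop
total = 0
for (ii in seq(1, 10)) total = total + ii^2
total

# The vectorised equivalent: square every element, then add them up
sum(seq(1, 10)^2)
```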
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
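&lt;br /&gt;
An alternative is a '''repeat''' loop exited with '''break'''.  It always runs its body at least once, so you need not draw the first random number before the loop starts.  The sketch below also counts the draws ('''draws''' is our own variable), and '''set.seed()''' makes the run reproducible:&lt;br /&gt;
&lt;br /&gt;
```r
set.seed(1)        # fix the random number stream so the run is repeatable
draws = 0
repeat {
  ii = runif(1, 0, 1)
  draws = draws + 1
  if (ii >= 0.5) break   # stop at the first value of at least 0.5
}
c(draws=draws, last=ii)
```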
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body is a single expression, but they add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
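&lt;br /&gt;
Because '''sqrt()''', '''^''' and '''+''' all work element-wise, '''hypotenuse()''' also accepts vectors and computes several hypotenuses in one call:&lt;br /&gt;
&lt;br /&gt;
```r
hypotenuse = function(x, y) {sqrt(x^2 + y^2)}

# Sides (3, 4), (5, 12) and (8, 15) are paired element by element
hypotenuse(c(3, 5, 8), c(4, 12, 15))
```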
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Unnamed arguments are matched by position.  When the argument names are given, you can supply them in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check the contents of a function by just typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or you can check just the arguments, using the '''args''' function.  (The body is not shown: '''args''' returns a function with the same arguments but a NULL body):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
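&lt;br /&gt;
You can also retrieve the argument list and the body separately, with '''formals()''' and '''body()''':&lt;br /&gt;
&lt;br /&gt;
```r
hypot3 = function(x=3, y=4) {
  x_sq = x^2
  y_sq = y^2
  return( sqrt(x_sq + y_sq) )
}

names(formals(hypot3))   # the argument names: "x" "y"
formals(hypot3)$y        # the default value of y: 4
body(hypot3)             # the code between the braces
```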
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Add-on packages are listed on CRAN, the Comprehensive R Archive Network: http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
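Let's install the '''multicore''' package, which gives us access to R functions that run on the multiple processors commonly found in today's computers.  (The multicore package has since been superseded by the '''parallel''' package, which is included with R and provides the same '''mclapply()''' function.)&lt;br /&gt;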
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library, i.e. those available to load:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load (attach) one into the current session, we type e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now for an example of using a function from the multicore package.  The '''lapply''' function, which is included in the standard R core, maps a given function over a list of inputs and returns a list of the function's outputs.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
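&lt;br /&gt;
Here is a sketch of the same call using the '''parallel''' package, which now provides '''mclapply()'''.  By default '''mclapply()''' uses '''getOption(&amp;quot;mc.cores&amp;quot;, 2)''' cores, so this sketch sets the core count explicitly.  Forking is not available on Windows, where the count must be 1.&lt;br /&gt;
&lt;br /&gt;
```r
library(parallel)

# Forking is not supported on Windows, so fall back to a single core there
n.cores = if (.Platform$OS.type == "windows") 1 else 2

squares = mclapply(1:3, function(x) {x^2}, mc.cores=n.cores)

# The result is the same list that lapply() returns
identical(squares, lapply(1:3, function(x) {x^2}))
```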
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  Perhaps the simplest function to use is '''read.table()''', which suits data organised in a file as a table with column headings.  Suppose I have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It will be a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
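&lt;br /&gt;
To try '''read.table()''' without creating the file by hand, the sketch below writes the first few rows of the table to a temporary file and reads them back.  '''tempfile()''' and '''writeLines()''' are standard R functions.  '''stringsAsFactors=FALSE''' keeps the country names as text in any R version.&lt;br /&gt;
&lt;br /&gt;
```r
# Write a small whitespace-separated table, with quoted names, to a temporary file
path = tempfile(fileext=".txt")
writeLines(c('country              gold silver bronze',
             '"USA"                46   29     29',
             '"China"              38   27     23',
             '"Great Britain"      29   17     19'), path)

medals = read.table(path, header=TRUE, stringsAsFactors=FALSE)
str(medals)   # 3 rows; country is character, the counts are integers
```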
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the file '''medals.csv''', whose first, unnamed column holds the data frame's row names:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
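&lt;br /&gt;
Passing '''row.names=FALSE''' to '''write.csv()''' leaves out the row-name column, and '''read.csv()''' reads the file back.  A round-trip sketch with a temporary file:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country=c("USA", "China", "Great Britain"),
                    gold=c(46L, 38L, 29L), silver=c(29L, 27L, 17L), bronze=c(29L, 23L, 19L),
                    stringsAsFactors=FALSE)

path = tempfile(fileext=".csv")
write.csv(medals, path, row.names=FALSE)   # no row-name column this time
readLines(path)[1]                         # the header line

medals.back = read.csv(path, stringsAsFactors=FALSE)
all.equal(medals, medals.back)             # TRUE: nothing lost in the round trip
```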
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store an R data structure in binary form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
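&lt;br /&gt;
By default, '''save()''' compresses the file, as the Unix '''file''' command confirms:&lt;br /&gt;
&lt;br /&gt;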
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data.  '''load()''' restores the saved objects under their original names (here, '''medals.2012'''):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
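&lt;br /&gt;
For a single object, '''saveRDS()''' and '''readRDS()''' are an alternative.  '''readRDS()''' returns the object, so you can give it any name when you read it back.  A sketch:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country=c("USA", "China"), gold=c(46, 38), stringsAsFactors=FALSE)

path = tempfile(fileext=".rds")
saveRDS(medals, path)           # one object per file, compressed by default

copy.of.medals = readRDS(path)  # assign the result to a name of our choosing
identical(medals, copy.of.medals)
```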
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide these.  Note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN.  (The '''ncdf''' package has since been superseded by '''ncdf4''', which also supports the NetCDF-4 format.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
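&lt;br /&gt;
'''sort()''' also accepts '''decreasing=TRUE'''.  '''order()''' instead returns the indices that would sort a vector, which is how you sort a data frame by one of its columns.  In the sketch below, the '''number''' column holds the engines' running numbers:&lt;br /&gt;
&lt;br /&gt;
```r
railway.engines = c("thomas", "henry", "gordon", "edward", "james")
sort(railway.engines, decreasing=TRUE)

# order() gives positions rather than values
order(railway.engines)

# Sort a data frame by one of its columns
engines = data.frame(name=railway.engines, number=c(1, 3, 4, 2, 5), stringsAsFactors=FALSE)
engines[order(engines$number), ]
```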
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
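&lt;br /&gt;
Since each draw is random, your results will differ from those above.  Calling '''set.seed()''' first makes the draws reproducible.  If you leave out the size and '''replace''' arguments, '''sample()''' returns a random permutation.  A sketch (the seed value is arbitrary):&lt;br /&gt;
&lt;br /&gt;
```r
railway.engines = c("thomas", "henry", "gordon", "edward", "james")

set.seed(2012)
first = sample(railway.engines, 3, replace=TRUE)
set.seed(2012)
second = sample(railway.engines, 3, replace=TRUE)
identical(first, second)            # TRUE: same seed, same draws

shuffled = sample(railway.engines)  # each engine exactly once, in random order
```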
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames, which must have the same column names:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
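&lt;br /&gt;
'''cbind()''' is the column-wise counterpart.  A sketch that appends a '''total''' column (our own addition) computed with '''rowSums()''':&lt;br /&gt;
&lt;br /&gt;
```r
country = c("France", "Italy", "Hungary", "Australia")
extras.2012 = data.frame(country, gold=c(11, 8, 8, 7), silver=c(11, 9, 4, 16),
                         bronze=c(12, 11, 5, 12), stringsAsFactors=FALSE)

# Sum the medal columns row by row, then bind the result on as a new column
total = rowSums(extras.2012[, c("gold", "silver", "bronze")])
cbind(extras.2012, total)
```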
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Plotting the binned data couldn't be simpler: '''plot(bins)''' draws a bar chart of the number of values in each bin.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
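&lt;br /&gt;
'''table()''' gives the counts behind the plot.  You can also give the breakpoints explicitly; the cut points in the sketch below are arbitrary choices for illustration:&lt;br /&gt;
&lt;br /&gt;
```r
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)

table(cut(girls_2, breaks=3))    # counts in three equal-width bins

# Explicit (arbitrary) breakpoints give the intervals (80,85] and (85,90]
table(cut(girls_2, breaks=c(80, 85, 90)))
```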
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
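&lt;br /&gt;
You can inspect the fitted model with '''coef()''' and use it for prediction with '''predict()'''.  The '''cars''' data record speed in mph and stopping distance in feet.  In the sketch, the speed of 20 is an arbitrary example value.&lt;br /&gt;
&lt;br /&gt;
```r
res = lm(dist ~ speed, data=cars)

coef(res)    # intercept and slope of the fitted line

# Predicted stopping distance at a speed of 20
predict(res, newdata=data.frame(speed=20))
```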
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
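* You may wish to compare different methods of estimation.  The MASS package (load it first with '''library(MASS)''') lets you fit a line with the '''rlm''' and '''lqs''' functions.  Having stored the '''lm''', '''rlm''' and '''lqs''' fits as '''res.lm''', '''res.rlm''' and '''res.lqs''', you can plot all the lines against the data using:&lt;br /&gt;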
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If weights are given, the function fits the line by weighted least squares, minimising the weighted sum of squared residuals.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1 &amp;lt;- rep(0.1,50)''' will give you a vector of length 50, where each element has a value of 0.1.  '''w1[1] &amp;lt;- 10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
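&lt;br /&gt;
To see what '''lm''' does when it is given weights, here is a minimal sketch in C++ of a weighted least-squares straight-line fit, ''y'' = ''a'' + ''b'' ''x''.  It chooses ''a'' and ''b'' to minimise the sum over ''i'' of ''w_i'' (''y_i'' - ''a'' - ''b'' ''x_i'')^2.  The '''wls_line''' function and '''Line''' struct are hypothetical names, used only for illustration.  With all weights equal, the fit reduces to the ordinary least-squares line that '''lm(dist ~ speed, data=cars)''' produces.  R itself does not use these closed-form sums: it works through a QR decomposition.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cassert&amp;gt;&lt;br /&gt;
#include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Intercept (a) and slope (b) of the fitted line y = a + b*x.&lt;br /&gt;
struct Line&lt;br /&gt;
{&lt;br /&gt;
  double a;&lt;br /&gt;
  double b;&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
// Weighted least-squares fit: choose a and b to minimise&lt;br /&gt;
// sum_i w[i] * (y[i] - a - b*x[i])^2.&lt;br /&gt;
Line wls_line(const std::vector&amp;lt;double&amp;gt;&amp;amp; x,&lt;br /&gt;
              const std::vector&amp;lt;double&amp;gt;&amp;amp; y,&lt;br /&gt;
              const std::vector&amp;lt;double&amp;gt;&amp;amp; w)&lt;br /&gt;
{&lt;br /&gt;
  assert(x.size() == y.size());&lt;br /&gt;
  assert(x.size() == w.size());&lt;br /&gt;
  // Weighted means of x and y.&lt;br /&gt;
  double sw = 0.0, swx = 0.0, swy = 0.0;&lt;br /&gt;
  for (std::size_t i = 0; i &amp;lt; x.size(); ++i) {&lt;br /&gt;
    sw  += w[i];&lt;br /&gt;
    swx += w[i] * x[i];&lt;br /&gt;
    swy += w[i] * y[i];&lt;br /&gt;
  }&lt;br /&gt;
  assert(sw &amp;gt; 0.0);&lt;br /&gt;
  const double xbar = swx / sw;&lt;br /&gt;
  const double ybar = swy / sw;&lt;br /&gt;
  // Slope: weighted covariance of x and y over weighted variance of x.&lt;br /&gt;
  double sxy = 0.0, sxx = 0.0;&lt;br /&gt;
  for (std::size_t i = 0; i &amp;lt; x.size(); ++i) {&lt;br /&gt;
    sxy += w[i] * (x[i] - xbar) * (y[i] - ybar);&lt;br /&gt;
    sxx += w[i] * (x[i] - xbar) * (x[i] - xbar);&lt;br /&gt;
  }&lt;br /&gt;
  assert(sxx &amp;gt; 0.0);  // needs at least two distinct x values with non-zero weight&lt;br /&gt;
  const double b = sxy / sxx;&lt;br /&gt;
  // The fitted line passes through the weighted mean point (xbar, ybar).&lt;br /&gt;
  return Line{ybar - b * xbar, b};&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, '''wls_line({0, 1, 2}, {1, 3, 5}, {1, 1, 1})''' recovers ''a'' = 1 and ''b'' = 2, because those points lie exactly on a line.  Setting a weight to zero removes that point from the fit altogether.&lt;br /&gt;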
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
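&lt;br /&gt;
The example below compares two samples, '''boys_2''' and '''girls_2'''.  It first uses an F test ('''var.test''') to check whether the two samples have similar variances.  The large p-value (0.9786) gives no evidence that the variances differ, which justifies setting '''var.equal=TRUE''' in the pooled two-sample t-test that follows.  The t-test's p-value (0.4103) in turn gives no evidence, at the usual 5% level, that the two means differ.&lt;br /&gt;
&lt;br /&gt;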
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
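&lt;br /&gt;
The two test statistics printed above are simple to compute by hand.  Here is a minimal sketch in C++; the function names are hypothetical.  F is the ratio of the two sample variances.  The pooled two-sample t statistic divides the difference in sample means by its standard error, ''s_p'' sqrt(1/''n_x'' + 1/''n_y''), where the pooled variance is ''s_p''^2 = ((''n_x'' - 1) ''s_x''^2 + (''n_y'' - 1) ''s_y''^2) / (''n_x'' + ''n_y'' - 2).  Turning these statistics into p-values would also require the F and t distribution functions, which this sketch does not attempt.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cassert&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// The two samples from the R example above.&lt;br /&gt;
const std::vector&amp;lt;double&amp;gt; boys_2  = {90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91.0, 87.4};&lt;br /&gt;
const std::vector&amp;lt;double&amp;gt; girls_2 = {83.8, 86.2, 85.1, 88.6, 83.0, 88.9, 89.7, 81.3, 88.7, 88.4};&lt;br /&gt;
&lt;br /&gt;
// Arithmetic mean of a sample.&lt;br /&gt;
double mean(const std::vector&amp;lt;double&amp;gt;&amp;amp; v)&lt;br /&gt;
{&lt;br /&gt;
  assert(!v.empty());&lt;br /&gt;
  double sum = 0.0;&lt;br /&gt;
  for (double x : v) sum += x;&lt;br /&gt;
  return sum / v.size();&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Sample variance, with the n - 1 denominator used by R's var().&lt;br /&gt;
double sample_var(const std::vector&amp;lt;double&amp;gt;&amp;amp; v)&lt;br /&gt;
{&lt;br /&gt;
  assert(v.size() &amp;gt; 1);&lt;br /&gt;
  const double m = mean(v);&lt;br /&gt;
  double ss = 0.0;&lt;br /&gt;
  for (double x : v) ss += (x - m) * (x - m);&lt;br /&gt;
  return ss / (v.size() - 1);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// F statistic, as in var.test(): the ratio of the two sample variances.&lt;br /&gt;
double f_ratio(const std::vector&amp;lt;double&amp;gt;&amp;amp; x, const std::vector&amp;lt;double&amp;gt;&amp;amp; y)&lt;br /&gt;
{&lt;br /&gt;
  return sample_var(x) / sample_var(y);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// t statistic, as in t.test(x, y, var.equal=TRUE): the difference in means&lt;br /&gt;
// divided by its standard error, using the pooled variance.&lt;br /&gt;
double pooled_t(const std::vector&amp;lt;double&amp;gt;&amp;amp; x, const std::vector&amp;lt;double&amp;gt;&amp;amp; y)&lt;br /&gt;
{&lt;br /&gt;
  const double nx = x.size(), ny = y.size();&lt;br /&gt;
  const double sp2 = ((nx - 1) * sample_var(x) + (ny - 1) * sample_var(y)) / (nx + ny - 2);&lt;br /&gt;
  return (mean(x) - mean(y)) / std::sqrt(sp2 * (1.0 / nx + 1.0 / ny));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For the data above, '''f_ratio(boys_2, girls_2)''' comes out at about 1.0186 and '''pooled_t(boys_2, girls_2)''' at about 0.8429, in line with R's output.  The t statistic has ''n_x'' + ''n_y'' - 2 = 18 degrees of freedom.&lt;br /&gt;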
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
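How did we do?  In the table below, rows give the predicted species and columns give the actual species, so the off-diagonal entries are misclassifications.  All 25 setosa flowers were classified correctly, while 5 of the 50 versicolor and virginica flowers were mistaken for each other (70 of 75 correct overall).  Because ties are broken at random, your counts may differ slightly between runs.&lt;br /&gt;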
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
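&lt;br /&gt;
The k-nearest-neighbour rule described above is simple enough to write out in full.  Here is a minimal sketch in C++ that classifies a single query point; the '''knn_classify''' function and '''Point''' type are hypothetical names.  It differs from R's '''knn''' in two ways.  It breaks voting ties deterministically, in favour of the label whose tied member is nearest, rather than at random.  It also does not bring in extra candidates that are tied for the kth place.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;algorithm&amp;gt;&lt;br /&gt;
#include &amp;lt;cassert&amp;gt;&lt;br /&gt;
#include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
#include &amp;lt;map&amp;gt;&lt;br /&gt;
#include &amp;lt;numeric&amp;gt;&lt;br /&gt;
#include &amp;lt;string&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
using Point = std::vector&amp;lt;double&amp;gt;;&lt;br /&gt;
&lt;br /&gt;
// Squared Euclidean distance; it ranks neighbours in the same order as the distance.&lt;br /&gt;
double sq_dist(const Point&amp;amp; p, const Point&amp;amp; q)&lt;br /&gt;
{&lt;br /&gt;
  assert(p.size() == q.size());&lt;br /&gt;
  double d = 0.0;&lt;br /&gt;
  for (std::size_t i = 0; i &amp;lt; p.size(); ++i) d += (p[i] - q[i]) * (p[i] - q[i]);&lt;br /&gt;
  return d;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Classify 'query' by a majority vote among its k nearest training points.&lt;br /&gt;
std::string knn_classify(const std::vector&amp;lt;Point&amp;gt;&amp;amp; train,&lt;br /&gt;
                         const std::vector&amp;lt;std::string&amp;gt;&amp;amp; labels,&lt;br /&gt;
                         const Point&amp;amp; query, std::size_t k)&lt;br /&gt;
{&lt;br /&gt;
  assert(train.size() == labels.size());&lt;br /&gt;
  assert(k &amp;gt; 0);&lt;br /&gt;
  assert(k &amp;lt;= train.size());&lt;br /&gt;
  // Distance from the query to every training point.&lt;br /&gt;
  std::vector&amp;lt;double&amp;gt; d(train.size());&lt;br /&gt;
  for (std::size_t i = 0; i &amp;lt; train.size(); ++i) d[i] = sq_dist(train[i], query);&lt;br /&gt;
  // Training indices sorted nearest first.&lt;br /&gt;
  std::vector&amp;lt;std::size_t&amp;gt; idx(train.size());&lt;br /&gt;
  std::iota(idx.begin(), idx.end(), std::size_t(0));&lt;br /&gt;
  std::stable_sort(idx.begin(), idx.end(),&lt;br /&gt;
                   [&amp;amp;d](std::size_t a, std::size_t b) { return d[a] &amp;lt; d[b]; });&lt;br /&gt;
  // Count the votes of the k nearest neighbours.&lt;br /&gt;
  std::map&amp;lt;std::string, int&amp;gt; votes;&lt;br /&gt;
  for (std::size_t i = 0; i &amp;lt; k; ++i) ++votes[labels[idx[i]]];&lt;br /&gt;
  // Most votes wins; on a tie, keep the label met first (i.e. the nearer one).&lt;br /&gt;
  std::string best = labels[idx[0]];&lt;br /&gt;
  for (std::size_t i = 1; i &amp;lt; k; ++i) {&lt;br /&gt;
    const std::string&amp;amp; lab = labels[idx[i]];&lt;br /&gt;
    if (votes[lab] &amp;gt; votes[best]) best = lab;&lt;br /&gt;
  }&lt;br /&gt;
  return best;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To mimic the iris example, you would call it once for each of the 75 test rows, passing the 75 training rows and their labels.&lt;br /&gt;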
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: the child's age in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
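library(rpart)  # provides rpart() and the kyphosis data&lt;br /&gt;
# default classification tree (Gini splitting)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
# with prior class probabilities and information (entropy) splitting&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
# a larger complexity parameter (cp) gives a smaller tree (not plotted below)&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;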
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
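&lt;br /&gt;
At each node, '''rpart''' picks the split that most reduces a measure of node impurity.  By default that measure is the Gini index; when '''split = &amp;quot;information&amp;quot;''' is given, as for '''fit2''' above, it is the entropy.  Here is a minimal sketch in C++ of both measures for a node with given class counts, plus the Gini reduction for a candidate split into two children.  The function names are hypothetical.  rpart's real split search also takes account of priors, losses and observation weights, none of which this sketch handles.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cassert&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Total number of observations in a node.&lt;br /&gt;
double node_size(const std::vector&amp;lt;double&amp;gt;&amp;amp; counts)&lt;br /&gt;
{&lt;br /&gt;
  double n = 0.0;&lt;br /&gt;
  for (double c : counts) n += c;&lt;br /&gt;
  return n;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Gini index: 1 - sum_k p_k^2, where p_k is the proportion of class k.&lt;br /&gt;
double gini(const std::vector&amp;lt;double&amp;gt;&amp;amp; counts)&lt;br /&gt;
{&lt;br /&gt;
  const double n = node_size(counts);&lt;br /&gt;
  assert(n &amp;gt; 0.0);&lt;br /&gt;
  double sum_sq = 0.0;&lt;br /&gt;
  for (double c : counts) sum_sq += (c / n) * (c / n);&lt;br /&gt;
  return 1.0 - sum_sq;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Entropy: -sum_k p_k log(p_k), skipping empty classes (0 log 0 = 0).&lt;br /&gt;
double entropy(const std::vector&amp;lt;double&amp;gt;&amp;amp; counts)&lt;br /&gt;
{&lt;br /&gt;
  const double n = node_size(counts);&lt;br /&gt;
  assert(n &amp;gt; 0.0);&lt;br /&gt;
  double h = 0.0;&lt;br /&gt;
  for (double c : counts)&lt;br /&gt;
    if (c &amp;gt; 0.0) h -= (c / n) * std::log(c / n);&lt;br /&gt;
  return h;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Reduction in Gini impurity when a parent node is split into two (non-empty)&lt;br /&gt;
// children, each child's impurity weighted by its share of the observations.&lt;br /&gt;
double gini_gain(const std::vector&amp;lt;double&amp;gt;&amp;amp; left, const std::vector&amp;lt;double&amp;gt;&amp;amp; right)&lt;br /&gt;
{&lt;br /&gt;
  assert(left.size() == right.size());&lt;br /&gt;
  std::vector&amp;lt;double&amp;gt; parent(left.size());&lt;br /&gt;
  for (std::size_t k = 0; k &amp;lt; left.size(); ++k) parent[k] = left[k] + right[k];&lt;br /&gt;
  const double nl = node_size(left), nr = node_size(right);&lt;br /&gt;
  return gini(parent) - (nl * gini(left) + nr * gini(right)) / (nl + nr);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For instance, '''gini_gain({10, 0}, {0, 10})''' is 0.5.  The parent node, holding 10 observations of each class, has a Gini index of 0.5, and both children are pure.&lt;br /&gt;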
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
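'''solve(A, b)''' returns the vector ''x'' that satisfies ''Ax'' = ''b''.  Bear in mind that '''array''' fills in its values column by column, so the first three values given for A below form its first column, not its first row.  See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;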
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
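&lt;br /&gt;
R's '''solve''' hands this job to the LAPACK library, which uses an LU decomposition with partial pivoting.  Here is a minimal C++ sketch of the same idea: Gaussian elimination with partial pivoting, followed by back substitution.  The '''gauss_solve''' name is hypothetical, and this is not R's own code.  Written row by row, the matrix above is A = [1 3 -2; 3 5 6; 2 4 3].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cassert&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
#include &amp;lt;utility&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
using Matrix = std::vector&amp;lt;std::vector&amp;lt;double&amp;gt;&amp;gt;;&lt;br /&gt;
&lt;br /&gt;
// Solve A x = b by Gaussian elimination with partial pivoting.&lt;br /&gt;
// A and b are taken by value because they are overwritten.&lt;br /&gt;
std::vector&amp;lt;double&amp;gt; gauss_solve(Matrix A, std::vector&amp;lt;double&amp;gt; b)&lt;br /&gt;
{&lt;br /&gt;
  const std::size_t n = b.size();&lt;br /&gt;
  assert(A.size() == n);&lt;br /&gt;
  for (std::size_t col = 0; col &amp;lt; n; ++col) {&lt;br /&gt;
    // Partial pivoting: bring up the row with the largest |entry| in this column.&lt;br /&gt;
    std::size_t piv = col;&lt;br /&gt;
    for (std::size_t r = col + 1; r &amp;lt; n; ++r)&lt;br /&gt;
      if (std::fabs(A[r][col]) &amp;gt; std::fabs(A[piv][col])) piv = r;&lt;br /&gt;
    assert(A[piv][col] != 0.0);  // otherwise the matrix is singular&lt;br /&gt;
    std::swap(A[col], A[piv]);&lt;br /&gt;
    std::swap(b[col], b[piv]);&lt;br /&gt;
    // Eliminate the entries below the pivot.&lt;br /&gt;
    for (std::size_t r = col + 1; r &amp;lt; n; ++r) {&lt;br /&gt;
      const double f = A[r][col] / A[col][col];&lt;br /&gt;
      for (std::size_t c = col; c &amp;lt; n; ++c) A[r][c] -= f * A[col][c];&lt;br /&gt;
      b[r] -= f * b[col];&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
  // Back substitution on the resulting upper-triangular system.&lt;br /&gt;
  std::vector&amp;lt;double&amp;gt; x(n);&lt;br /&gt;
  for (std::size_t i = n; i-- &amp;gt; 0;) {&lt;br /&gt;
    double s = b[i];&lt;br /&gt;
    for (std::size_t c = i + 1; c &amp;lt; n; ++c) s -= A[i][c] * x[c];&lt;br /&gt;
    x[i] = s / A[i][i];&lt;br /&gt;
  }&lt;br /&gt;
  return x;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Called with that matrix and ''b'' = (5, 7, 8), '''gauss_solve''' should give ''x'' = (-15, 8, 2), as R does, up to rounding.&lt;br /&gt;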
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll switch tack and focus on a question commonly asked by those beginning to use R in anger: '''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''  In this section we'll consider two related tasks: finding which parts of your R code are responsible for most of the run-time, and working out what you can do about it.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have the gift(!) of constantly surprising us.  Suppose that portion accounted for, say, 10% of the run-time.  Even making it infinitely fast would then cut the total run-time by only 10%, and we would have sweated for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time a call to a function using '''system.time''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)  # switch profiling off&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Map a smoothing function over the list (in serial)&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)  # switch profiling off&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran this by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
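The profile tells us that the function '''simpleLoess''' (called by '''loess.smooth''') accounts for 88% of the run-time on its own (self.pct), or about 93% including the functions it calls (total.pct).  By contrast, '''rnorm''' takes only 4%, so '''simpleLoess''' is where any optimisation effort should go.&lt;br /&gt;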
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x35 speed-up''' (1.762 versus 0.05 seconds elapsed).  Your mileage will vary depending upon the details of your code and your version of R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R will accept whole vectors as input, rather than just single values, and this may allow you to do without a loop.  The examples in the previous section used for loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in about half the time it took the pre-allocated loop, '''f2''', to process 30,000!&lt;br /&gt;
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
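&lt;br /&gt;
As a minimal sketch of what such outsourcing can look like, here is a small C++ function written for R's '''.C''' interface.  Under that interface, every argument arrives as a pointer to the data of an R vector, and results are passed back by overwriting an argument.  The file and function names ('''square.cpp''', '''square_vec''') are hypothetical.  The '''extern &amp;quot;C&amp;quot;''' declaration stops the C++ compiler from mangling the function's name, so that R can find it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// square.cpp (hypothetical file name)&lt;br /&gt;
#include &amp;lt;cassert&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Square each of the *n elements of x in place.  Following the .C()&lt;br /&gt;
// convention, all arguments are pointers and nothing is returned.&lt;br /&gt;
extern &amp;quot;C&amp;quot; void square_vec(double* x, int* n)&lt;br /&gt;
{&lt;br /&gt;
  assert(x != nullptr);  // development-time checks only&lt;br /&gt;
  assert(n != nullptr);&lt;br /&gt;
  for (int i = 0; i &amp;lt; *n; ++i) x[i] = x[i] * x[i];&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Assuming a working compiler set-up, the steps would be as follows.  Build the function into a shared library with '''R CMD SHLIB square.cpp'''.  Load it with '''dyn.load(&amp;quot;square.so&amp;quot;)'''; the file extension varies by platform.  Then call it as '''.C(&amp;quot;square_vec&amp;quot;, as.double(x), as.integer(length(x)))'''.  The first element of the list that '''.C''' returns would then hold the squares of '''x'''.&lt;br /&gt;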
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as BlueCrystal:  &lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the use of the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.   I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And used the following submission script to run it on bluecrystal phase2:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
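#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! Load the MPI library that Rmpi needs (see above)&lt;br /&gt;
module add ofed/openmpi/gcc/64/1.4.2-qlc&lt;br /&gt;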
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Disable PSM on the QLogic HCAs&lt;br /&gt;
export OMPI_MCA_mtl=^psm&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
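The '''parallel''' package, included with R since version 2.14.0, is an amalgamation of functionality from the multicore and snow packages.  Unlike the multicore package, it is available on MS Windows machines.  Note, however, that its fork-based functions, such as '''mclapply''', cannot run in parallel on Windows.  There you would use a snow-style cluster instead ('''makeCluster''' with '''parLapply''').&lt;br /&gt;
&lt;br /&gt;
A trivial translation of our previous multicore example is given below.  Unlike multicore's version, the parallel package's '''mclapply''' uses only 2 cores by default (unless the '''mc.cores''' option has been set).  We therefore pass the number of detected cores via its '''mc.cores''' argument:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
ncores &amp;lt;- detectCores()&lt;br /&gt;
ncores&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using mclapply, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}, mc.cores=ncores))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;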
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=CtoC%2B%2B&amp;diff=9405</id>
		<title>CtoC++</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=CtoC%2B%2B&amp;diff=9405"/>
		<updated>2014-02-14T12:18:47Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Templates and the Standard Template Library */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Pragmatic Programming]]&lt;br /&gt;
'''CtoC++: Upgrading to Object Oriented C'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
This tutorial carries on where [[StartingC]] left off.&lt;br /&gt;
&lt;br /&gt;
To get the material, cut and paste the contents of the box below onto your command line.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn co https://svn.ggy.bris.ac.uk/subversion-open/CtoC++/trunk ./CtoC++&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this tutorial we will assume basic linux skills as outlined in [[Linux1]].&lt;br /&gt;
&lt;br /&gt;
=Cutting to the Chase: Classes and Encapsulation=&lt;br /&gt;
&lt;br /&gt;
So, here we are contemplating C++.  We've got to grips with most of the C language in [[StartingC]] and it looked alright.  Definitely serviceable.  What's all the fuss about C++?  Well, I believe that most of the fuss is about '''encapsulation'''.  We saw the benefit of collecting together related variables into structures in C, true?  Well, C++ goes further and allows us to collect together not only related variables, but also the functions which use those variables.  Such a bundle is called a '''class'''.  An '''instance of a class''' is called an '''object''', and it comes preloaded with all the variables and functions (aka '''methods''') that you'll need when considering said object.&lt;br /&gt;
&lt;br /&gt;
What may have seemed like the relatively small enhancement of adding methods to the encapsulation has, in fact, resulted in a sea-change.  No longer are we thinking about a program in terms of the variables and the functions, but instead we're thinking about '''objects''' (planets, radios, payrolls and the like) and how they interact with other objects.  Hence the term '''object oriented programming''' (OOP).&lt;br /&gt;
&lt;br /&gt;
Things are typically invented for a reason and C++ is no different.  The problem with the traditional procedural programming model, such as standard C, is that as our programs grow we end up with more and more variables which are used by more and more functions.  These functions and variables typically end up all mixed together in the same scope, or '''namespace''', often the global scope or that of the top-level function, 'main'.  Modification and maintenance of the program become harder and harder, since it becomes more difficult to keep track of which variables are used by which functions.  Overall, our program begins to resemble spaghetti--not a renowned building material!  Instead, we would like to work with something more amenable to our aims.  We would like components which are easily combined, modified or even replaced completely.  A more modular paradigm suggests itself.  We want the programming equivalent of Lego!&lt;br /&gt;
&lt;br /&gt;
[[Image:spaghetti.jpg|300px|thumbnail|centre|less like this...]]&lt;br /&gt;
[[Image:Rube-goldberg-toothpaste.jpg|365px|thumbnail|centre|or this...]]&lt;br /&gt;
[[Image:Lego.jpg|365px|thumbnail|centre|..and more like this.]]&lt;br /&gt;
&lt;br /&gt;
We'll see in the following examples that the OOP approach, and in particular the mindset of encapsulation, provides us with the modular building blocks that we are after.  Repeat after me, &amp;quot;'''encapsulation is the best thing since sliced-bread!'''&amp;quot; :-)&lt;br /&gt;
&lt;br /&gt;
OK, enough of the spiel, let's get our hands dirty with an actual example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd CtoC++/examples/example1&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first chunk of code to greet you inside '''class.cc''' (we'll use .cc to denote C++ source code files) is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
//&lt;br /&gt;
// This is a C++ comment line&lt;br /&gt;
//&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;        // A useful C++ library&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;           // The standard C math library&lt;br /&gt;
#include &amp;lt;cstdlib&amp;gt;         // for EXIT_SUCCESS&lt;br /&gt;
&lt;br /&gt;
// declare a namespace in which to keep &lt;br /&gt;
// some handy scientific constants&lt;br /&gt;
namespace scientific&lt;br /&gt;
{&lt;br /&gt;
  const double pi            = 3.14159265; // note the use of 'const'&lt;br /&gt;
  const double grav_constant = 6.673e-11;  // universal gravitational constant (m3 kg-1 s-2) &lt;br /&gt;
  const int    sec_per_day   = 86400;      // number of seconds in 24 hours&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// avail ourselves of a couple of namespaces&lt;br /&gt;
// via the 'using' directive&lt;br /&gt;
using namespace std;           // allows us to use 'cout', for example&lt;br /&gt;
using namespace scientific;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What's new?  Well, first up, we see that the comment syntax has changed and that we can use just a leading double forward slash ('''//''') to signal a note from the author.  '''#include''' is familiar, except that we've dropped the '''.h''' from inside the angle brackets.  Headers inherited from the C library also gain a leading '''c''', so that math.h becomes '''cmath'''.&lt;br /&gt;
&lt;br /&gt;
The next block is a '''namespace''' declaration.  The concept of a namespace is common to a number of programming languages, and here we're setting one up called '''scientific''' and using it to store some handy constants.  We can enclose anything we like in a namespace.  We can refer to something inside a namespace explicitly, as in '''scientific::pi''', or make all of its contents available without qualification via the '''using''' directive.  In this case we're doing the latter for the standard library's namespace, '''std''' (we'll be doing that a lot!), and also for our scientific one.  The idea behind namespaces is to reduce the risk of a clash of names when programs get large.  They're handy.&lt;br /&gt;
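&lt;br /&gt;
To make the name-clash point concrete, here is a minimal sketch with two namespaces that each define a constant called '''pi'''.  The namespace '''rough''' is hypothetical, made up for this example.  Qualifying each name with '''::''' keeps the two constants apart.  With '''using namespace''' directives for both namespaces, a bare '''pi''' would be ambiguous and the compiler would reject it.  A using-declaration, by contrast, brings in just the one name you ask for.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cassert&amp;gt;&lt;br /&gt;
&lt;br /&gt;
namespace scientific&lt;br /&gt;
{&lt;br /&gt;
  const double pi = 3.14159265;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
namespace rough   // hypothetical second namespace, for illustration&lt;br /&gt;
{&lt;br /&gt;
  const double pi = 3.0;  // a deliberately crude value&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Qualified names keep the two constants apart.&lt;br /&gt;
double circumference_precise(const double r) { return 2.0 * scientific::pi * r; }&lt;br /&gt;
double circumference_rough(const double r)   { return 2.0 * rough::pi * r; }&lt;br /&gt;
&lt;br /&gt;
// A using-declaration brings in just one name, so a bare 'pi' here is unambiguous.&lt;br /&gt;
double half_turn_radians()&lt;br /&gt;
{&lt;br /&gt;
  using scientific::pi;&lt;br /&gt;
  assert(pi &amp;gt; rough::pi);&lt;br /&gt;
  return pi;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;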
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Next up in the source code is the class declaration (and definition, as it happens) itself:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class satellite&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
&lt;br /&gt;
  // Private members of a class cannot be accessed&lt;br /&gt;
  // from outside the class.&lt;br /&gt;
  double period;        // time taken to orbit e.g. earth (s) &lt;br /&gt;
  double sma_of_orbit;  // semi-major axis of satellite's orbit (m)&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
&lt;br /&gt;
  // Public members of the class are visible to the&lt;br /&gt;
  // rest of the program.&lt;br /&gt;
&lt;br /&gt;
  // Method to assign values to private variables. &lt;br /&gt;
  void set(const double prd, const double sma)&lt;br /&gt;
  {&lt;br /&gt;
    period = prd;&lt;br /&gt;
    sma_of_orbit = sma;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Method to compute mass of a celestial body&lt;br /&gt;
  // given the period of a satellite which orbits&lt;br /&gt;
  // it and the semi-major axis of that orbit.&lt;br /&gt;
  // See Kepler's laws of planetary motion.&lt;br /&gt;
  double mass_of_attractor(void) const&lt;br /&gt;
  {&lt;br /&gt;
    return (4.0 * pow(sma_of_orbit,3) * pow(pi,2)) / (pow(period,2) * grav_constant);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
You can see that the class called '''satellite''' contains some variables and also some methods.  The contents of the class are also separated into two sections by the keywords '''private''' and '''public'''.  We've declared our variables to be private (cannot be seen from outside the class) and our methods to be public (are visible from outside).  In doing so, we've set up an '''interface''' (i.e. the public methods) through which other parts of the program can interact with this class.  In this case, the program at large can call '''set()''', providing information about the satellite's orbit as it does so, and also '''mass_of_attractor()''' in order to discover the mass of whatever the satellite is orbiting.&lt;br /&gt;
&lt;br /&gt;
[[Image:oop_interface_1.jpg|500px|thumbnail|centre|One object interacts with another via its interface.]]&lt;br /&gt;
&lt;br /&gt;
The existence of an interface simplifies the ways in which the object interacts with the rest of the program, and means that alterations to the program are much easier to make.  For example, you can make changes to the internals of a class without fear that you will unwittingly break some part of the program outside it.  Indeed, we could entirely re-write the contents of a (perhaps complex) class and, as long as the interface remains unchanged, the rest of the program need never know!  This is quite a boon for scientific software, which is often altered more frequently than other kinds of software.&lt;br /&gt;
&lt;br /&gt;
[[Image:oop_interface_2.jpg|500px|thumbnail|centre|Given a consistent interface, we can change one object without changing the others.]]&lt;br /&gt;
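&lt;br /&gt;
As a minimal sketch of that idea, here is an alternative, self-contained version of the satellite class that stores the orbital period internally in days rather than seconds.  Its public interface, '''set()''' and '''mass_of_attractor()''', is exactly as before.  Code that calls those two methods, such as the main function below, would therefore not need to change.  The constants are repeated from '''class.cc''' so that the block stands alone.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cassert&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
namespace scientific&lt;br /&gt;
{&lt;br /&gt;
  const double pi            = 3.14159265;&lt;br /&gt;
  const double grav_constant = 6.673e-11;  // universal gravitational constant (m3 kg-1 s-2)&lt;br /&gt;
  const int    sec_per_day   = 86400;      // number of seconds in 24 hours&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
class satellite&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
&lt;br /&gt;
  // Changed internals: the period is now held in days, not seconds.&lt;br /&gt;
  double period_days;&lt;br /&gt;
  double sma_of_orbit;  // semi-major axis of satellite's orbit (m)&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
&lt;br /&gt;
  // Unchanged interface: callers still pass the period in seconds.&lt;br /&gt;
  void set(const double prd, const double sma)&lt;br /&gt;
  {&lt;br /&gt;
    assert(prd &amp;gt; 0.0);&lt;br /&gt;
    period_days  = prd / scientific::sec_per_day;&lt;br /&gt;
    sma_of_orbit = sma;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Unchanged interface: same result as before, up to rounding.&lt;br /&gt;
  double mass_of_attractor(void) const&lt;br /&gt;
  {&lt;br /&gt;
    const double period = period_days * scientific::sec_per_day;  // back to seconds&lt;br /&gt;
    return (4.0 * std::pow(sma_of_orbit, 3) * std::pow(scientific::pi, 2))&lt;br /&gt;
           / (std::pow(period, 2) * scientific::grav_constant);&lt;br /&gt;
  }&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;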
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Last up is our glue code, or main function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
int main (void)&lt;br /&gt;
{&lt;br /&gt;
  // Declare an 'instance' of the satellite class,&lt;br /&gt;
  // called 'moon'.&lt;br /&gt;
  satellite moon;&lt;br /&gt;
&lt;br /&gt;
  cout &amp;lt;&amp;lt; &amp;quot;== Welcome to the intro to classes program! ==&amp;quot; &amp;lt;&amp;lt; endl &amp;lt;&amp;lt; endl;&lt;br /&gt;
&lt;br /&gt;
  // Set some values pertaining to the moon.&lt;br /&gt;
  moon.set((27.322*sec_per_day),384399e3);&lt;br /&gt;
&lt;br /&gt;
  // Call a method of the satellite class&lt;br /&gt;
  // and report results to the 'stdout' stream.&lt;br /&gt;
  cout &amp;lt;&amp;lt; &amp;quot;Mass of the Earth (kg) is: &amp;quot; &amp;lt;&amp;lt; moon.mass_of_attractor() &amp;lt;&amp;lt; endl;&lt;br /&gt;
&lt;br /&gt;
  return EXIT_SUCCESS;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
in which we declare in instance of our satellite class, the '''moon''' object, call set() and finally mass_of_attractor(), noting the dot ('''.''') operator for accessing members of the class.&lt;br /&gt;
&lt;br /&gt;
The way in which we print to stdout is also different in C++.  Here we have used the stream insertion operator ('''&amp;lt;&amp;lt;''', the left shift operator overloaded for streams) to send values to the '''cout''' output stream, together with the '''endl''' manipulator, which writes a newline and flushes the stream.&lt;br /&gt;
&lt;br /&gt;
You can run the program--and weigh the Earth!--by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./class.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(The eagle-eyed amongst you will note that we have a small error in our calculation of mass.  The intrigued amongst the cohort of eagles may be relieved to see that [http://en.wikipedia.org/wiki/Kepler%27s_laws_of_planetary_motion Kepler's law] gives the combined mass of the moon and Earth in this case, and that if we subtract off the mass of the moon, we get closer to the actual mass of the Earth--phew!)&lt;br /&gt;
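&lt;br /&gt;
For the curious, here is a minimal sketch of that correction.  It lifts the formula from mass_of_attractor() into a free function for brevity (the name '''combined_mass()''' is our own, not part of the tutorial code).  The mass of the moon, 7.342e22 kg, is an approximate, commonly quoted value that does not appear in the original program.  The combined mass should come out at roughly 6.03e24 kg and the Earth alone at roughly 5.96e24 kg, against a textbook value of about 5.97e24 kg.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Kepler's third law, as used in mass_of_attractor():&lt;br /&gt;
//   G * (M + m) = 4 * pi^2 * a^3 / T^2&lt;br /&gt;
// so this returns the COMBINED mass M + m of the two bodies (kg).&lt;br /&gt;
// (combined_mass is a hypothetical name, not part of the tutorial code.)&lt;br /&gt;
double combined_mass(const double period, const double sma)&lt;br /&gt;
{&lt;br /&gt;
  const double pi            = 3.14159265;&lt;br /&gt;
  const double grav_constant = 6.673e-11;  // as in the scientific namespace&lt;br /&gt;
  return (4.0 * std::pow(sma, 3) * std::pow(pi, 2)) / (std::pow(period, 2) * grav_constant);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  const double period    = 27.322 * 86400.0;  // sidereal month (s)&lt;br /&gt;
  const double sma       = 384399e3;          // semi-major axis of the moon's orbit (m)&lt;br /&gt;
  const double moon_mass = 7.342e22;          // assumed, approximate mass of the moon (kg)&lt;br /&gt;
&lt;br /&gt;
  const double total = combined_mass(period, sma);&lt;br /&gt;
  std::printf(&amp;quot;Earth + moon (kg): %.3e\n&amp;quot;, total);&lt;br /&gt;
  std::printf(&amp;quot;Earth alone  (kg): %.3e\n&amp;quot;, total - moon_mass);&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;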
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
* Try modifying the main program, so that you weigh the Sun, instead of the Earth.  The following pages give you details of [http://en.wikipedia.org/wiki/Earth the orbit of the Earth] and [http://en.wikipedia.org/wiki/Sun the mass of the Sun], to check.&lt;br /&gt;
* Add a new method to the satellite class to compute the [http://en.wikipedia.org/wiki/Orbital_speed#Mean_orbital_speed mean orbital speed] of the satellite, and perhaps another to compute the satellite's speed at various points along its orbit.&lt;br /&gt;
* Add a whole new class to the program.  This is just for practice, so it could be a very simple one.  How about a class to represent a 2-d vector (i.e. on the x-y plane), which has a method to report the magnitude of that vector?&lt;br /&gt;
&lt;br /&gt;
[[Image:2D-vec-schematic.jpg|150px|thumbnail|centre|Coordinates and magnitude of a 2D vector.]]&lt;br /&gt;
&lt;br /&gt;
=More on Methods=&lt;br /&gt;
&lt;br /&gt;
OK.  We've bundled up some methods and variables into a class.  This is all to the good.  However, we haven't delved too deeply into all the features that C++ provides with regard to methods.  Let's rectify that right now.  We'll make a start by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example2&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this directory, you'll see that we've split our program across three files:&lt;br /&gt;
# '''methods.h''', containing the declarations (names and types of arguments) for our enhanced satellite class,&lt;br /&gt;
# '''methods.cc''', containing the 'meat' of the methods and,&lt;br /&gt;
# '''main.cc''', containing the main function, inside which we put our class through its paces.&lt;br /&gt;
&lt;br /&gt;
Looking inside the header file, you'll see our scientific namespace again, as well as the class declaration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class satellite&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
&lt;br /&gt;
  char         *name;         // name of satellite&lt;br /&gt;
  unsigned int iNameLen;      // length of name string&lt;br /&gt;
  double       period;        // time taken to orbit e.g. earth (s) &lt;br /&gt;
  double       sma_of_orbit;  // semi-major axis of satellite's orbit (m)&lt;br /&gt;
&lt;br /&gt;
  // copy method&lt;br /&gt;
  void copy(const satellite&amp;amp; _stllt);&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
&lt;br /&gt;
  // default constructor&lt;br /&gt;
  // Note same name as class&lt;br /&gt;
  satellite(void);&lt;br /&gt;
&lt;br /&gt;
  // constructor with arguments&lt;br /&gt;
  satellite(const char *nm, const double prd, const double sma);&lt;br /&gt;
&lt;br /&gt;
  // copy constructor&lt;br /&gt;
  satellite(const satellite&amp;amp; _stllt);&lt;br /&gt;
&lt;br /&gt;
  // assignment operator&lt;br /&gt;
  satellite&amp;amp; operator=(const satellite&amp;amp; _stllt);&lt;br /&gt;
&lt;br /&gt;
  // previous mass calculation method&lt;br /&gt;
  double mass_of_attractor(void) const;&lt;br /&gt;
&lt;br /&gt;
  // previous set method&lt;br /&gt;
  void set(const char *nm, const double prd, const double sma);  &lt;br /&gt;
&lt;br /&gt;
  // display method&lt;br /&gt;
  void display(void) const;&lt;br /&gt;
&lt;br /&gt;
  // destructor&lt;br /&gt;
  ~satellite();&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This time around we have some extra members:&lt;br /&gt;
* We have a character pointer called '''name''', along with an integer to store the length of the character array, once some memory has been allocated.&lt;br /&gt;
* We have a number of '''constructor''' methods, which we immediately see are special since their (shared) name matches the name of the class.&lt;br /&gt;
* We have a '''destructor''', whose name also matches the class name, but with a leading tilde ('''~''').&lt;br /&gt;
* We have a private method called '''copy''',&lt;br /&gt;
* a '''display''' method and also &lt;br /&gt;
* an assignment operator ('''=''').&lt;br /&gt;
&lt;br /&gt;
Let's go through these in turn.&lt;br /&gt;
&lt;br /&gt;
Constructors are invoked when a new object is created.  The two relevant lines in '''main.cc''' are:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
  satellite moon1;  // default constructor&lt;br /&gt;
  satellite moon2(&amp;quot;moon2&amp;quot;,(27.322*sec_per_day),384399e3);  // constructor with args&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've declared two instances of the satellite class and--imaginatively enough--called them '''moon1''' and '''moon2'''.  We created moon1 using the '''default constructor''' (no arguments follow the variable name), whose internals we can find inside '''methods.cc''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// default constructor&lt;br /&gt;
satellite::satellite()&lt;br /&gt;
{&lt;br /&gt;
  iNameLen = 0;&lt;br /&gt;
  name = new char[iNameLen + 1];&lt;br /&gt;
  (*name) = '\0';  // empty string&lt;br /&gt;
  period = 0.0;&lt;br /&gt;
  sma_of_orbit = 0.0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
As its name suggests, this method sets up an object with default values (zeros, an empty string etc.) in lieu of any specific information.&lt;br /&gt;
&lt;br /&gt;
'''moon2''' was created using a constructor which takes arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// constructor with arguments&lt;br /&gt;
satellite::satellite(const char *nm, const double prd, const double sma)&lt;br /&gt;
{&lt;br /&gt;
  set(nm, prd, sma);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This method accepts the name of the satellite instance, together with values for the period and the semi-major axis.  Given these, it merely calls the '''set()''' method, which is sensible since this method has all the functionality that we desire, and it's a bad idea to duplicate the code.&lt;br /&gt;
&lt;br /&gt;
We can see that these two methods have exactly the same name and differ only in their argument lists.  This is an example of what's called '''overloading''', which can be highly desirable when designing clear and simple class interfaces.  We can overload both methods and operators.&lt;br /&gt;
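&lt;br /&gt;
Overloading is not restricted to methods; ordinary functions can share a name too.  In the minimal sketch below (the function '''area()''' is our own invention, not part of the tutorial code), the compiler picks whichever version has a parameter list that matches the call:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Two functions sharing one name, distinguished only by their argument lists.&lt;br /&gt;
double area(const double radius)                      // circle&lt;br /&gt;
{&lt;br /&gt;
  return 3.14159265 * radius * radius;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
double area(const double width, const double height)  // rectangle&lt;br /&gt;
{&lt;br /&gt;
  return width * height;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  std::printf(&amp;quot;circle:    %f\n&amp;quot;, area(1.0));       // calls area(double)&lt;br /&gt;
  std::printf(&amp;quot;rectangle: %f\n&amp;quot;, area(2.0, 3.0));  // calls area(double, double)&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;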
&lt;br /&gt;
You will see that we also have what we've labelled as a '''copy constructor''', which takes another instance of the satellite class as its argument and creates a new object in its image.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// copy constructor&lt;br /&gt;
satellite::satellite(const satellite&amp;amp; _stllt) : name(NULL) {copy(_stllt);}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This method makes use of a '''member initializer''' and calls the private copy method (not available from outside the class, but callable from other members).  Member initializers are carried out before the body of the constructor runs, and members are always initialised in the order in which they are declared in the class, not the order in which they appear in the initializer list.  In this case, we've set '''name''' to '''NULL''' so that the copy method, which deletes any existing name string before allocating a new one, does not try to free an uninitialised pointer.&lt;br /&gt;
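&lt;br /&gt;
The ordering rule can be seen directly in the sketch below, in which the hypothetical '''tracer''' members record when they are constructed.  Even though the initializer list names '''second''' before '''first''', first is built first because it is declared first.  Compilers such as g++ will warn about the mismatch when run with -Wall (via -Wreorder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
#include &amp;lt;string&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Records the order in which tracer objects are constructed.&lt;br /&gt;
std::vector&amp;lt;std::string&amp;gt; construction_log;&lt;br /&gt;
&lt;br /&gt;
struct tracer&lt;br /&gt;
{&lt;br /&gt;
  explicit tracer(const char *label) { construction_log.push_back(label); }&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
struct demo&lt;br /&gt;
{&lt;br /&gt;
  tracer first;   // declared first...&lt;br /&gt;
  tracer second;  // ...and second&lt;br /&gt;
&lt;br /&gt;
  // Initializer list deliberately written in the 'wrong' order.&lt;br /&gt;
  demo() : second(&amp;quot;second&amp;quot;), first(&amp;quot;first&amp;quot;) {}&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  demo d;&lt;br /&gt;
  for (std::size_t i = 0; i &amp;lt; construction_log.size(); ++i) {&lt;br /&gt;
    std::printf(&amp;quot;%s\n&amp;quot;, construction_log[i].c_str());  // first, then second&lt;br /&gt;
  }&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;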
&lt;br /&gt;
If we don't write them ourselves, C++ implicitly provides a copy constructor, an assignment operator and a destructor.  The implicit copy operations perform a member-by-member, or '''shallow''', copy, which is fine for classes that do not make use of dynamic memory allocation.  However, for classes which own dynamically allocated memory, we must write our own '''deep''' copying methods.  For example, our copy method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
void satellite::copy(const satellite&amp;amp; _stllt)&lt;br /&gt;
{&lt;br /&gt;
  if (name != NULL) {&lt;br /&gt;
    delete[] name;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  iNameLen = _stllt.iNameLen;&lt;br /&gt;
  name = new char[iNameLen + 1];&lt;br /&gt;
  strcpy(name,_stllt.name);&lt;br /&gt;
  period = _stllt.period;&lt;br /&gt;
  sma_of_orbit = _stllt.sma_of_orbit;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The copy needs to be deep.  If we were not careful, we would end up with two objects containing pointers to the same block of memory (holding the 'name' character string): changing one object's name would change the other's, and both destructors would try to delete the same memory.  That would not be at all what we wanted!  Instead, we allocate some new memory and call '''strcpy()''' from the standard C library to copy the string into it.  Copying the values of the numerical variables is easy.  We've made use of the C++ memory allocation operator '''new''', which we can all agree is far simpler than 'malloc()'.  Correspondingly, '''delete''' replaces 'free()'; note that memory allocated with '''new[]''' must be released with '''delete[]''', as it is here.&lt;br /&gt;
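&lt;br /&gt;
As an aside, much of this bookkeeping can be avoided by letting a standard library type own the memory.  The sketch below is a hypothetical, cut-down variant of our class ('''named_satellite''' is our own name, not part of the tutorial code) which stores its name in a '''std::string'''.  Because std::string performs its own deep copies, the copy constructor and assignment operator that the compiler generates already do the right thing, and no destructor needs writing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
#include &amp;lt;string&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Hypothetical variant of the satellite class which keeps its name&lt;br /&gt;
// in a std::string instead of a hand-managed char array.&lt;br /&gt;
class named_satellite&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
  std::string name;&lt;br /&gt;
  double      sma_of_orbit;  // semi-major axis of orbit (m)&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
  named_satellite(const std::string&amp;amp; nm, const double sma)&lt;br /&gt;
    : name(nm), sma_of_orbit(sma) {}&lt;br /&gt;
&lt;br /&gt;
  void rename(const std::string&amp;amp; nm) { name = nm; }&lt;br /&gt;
  const std::string&amp;amp; get_name(void) const { return name; }&lt;br /&gt;
  double get_sma(void) const { return sma_of_orbit; }&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  named_satellite moon1(&amp;quot;moon&amp;quot;, 384399e3);&lt;br /&gt;
  named_satellite moon2(moon1);  // compiler-generated copy constructor&lt;br /&gt;
  moon2.rename(&amp;quot;moon2&amp;quot;);         // alters moon2's own copy of the string only&lt;br /&gt;
&lt;br /&gt;
  std::printf(&amp;quot;%s %s\n&amp;quot;, moon1.get_name().c_str(), moon2.get_name().c_str());  // moon moon2&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;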
&lt;br /&gt;
None of the other methods warrant any comment, except for the assignment operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
satellite&amp;amp; satellite::operator=(const satellite&amp;amp; _stllt)&lt;br /&gt;
{&lt;br /&gt;
  // assignment to self test&lt;br /&gt;
  if (this == &amp;amp;_stllt) {&lt;br /&gt;
    return (*this);&lt;br /&gt;
  }&lt;br /&gt;
  else {&lt;br /&gt;
    copy(_stllt);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  return (*this);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, we've overloaded the '''=''' operator and given it particular instructions when faced with instances of the satellite class on either side of it, such as the statement:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
moon1 = moon2;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Using this method, we've ensured that a deep copy takes place, where the name string is handled appropriately.&lt;br /&gt;
&lt;br /&gt;
Good eh?  Now we see the way to create full and convenient interfaces to our classes.  To run the program, type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./methods.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* Experiment with the copy constructor.  For example, is it legal syntax to add the declaration '''satellite moon3(moon2);''' towards the end of the main function?&lt;br /&gt;
* Method arguments can have defaults attached, e.g. '''satellite(const char *nm, const double prd=0.0, const double sma=0.0)'''.  Experiment with the constructor with arguments.  How much more flexibility can you introduce to the interface?  '''Note''' that default values should be given only in the declaration of the class (i.e. in the header file), and that arguments with defaults must all come at the right-hand end of the argument list.&lt;br /&gt;
* Can you define other methods/operators for this class?  How about 'less than' (&amp;lt;) or 'greater than' (&amp;gt;) operators.  If two satellites were to collide and coalesce, what could a plus (+) operator do?&lt;br /&gt;
&lt;br /&gt;
'''Hints''':  My template for the 'less-than' operator is below.  The argument '_stllt' acts as the RHS of the comparison, and the object through which the method is invoked is the LHS.  (A similar template will hold for the plus operator, except that this method must return a new instance of the class by value.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
bool satellite::operator&amp;lt;(const satellite&amp;amp; _stllt) const&lt;br /&gt;
{&lt;br /&gt;
  if (_stllt.sma_of_orbit &amp;gt; sma_of_orbit) {&lt;br /&gt;
    return true;&lt;br /&gt;
  }&lt;br /&gt;
  else {&lt;br /&gt;
    return false;&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Is this good enough?  Note that since the argument and the '''this''' object are of the same class, the instance on the LHS of the comparison can access the private data members of the one on the RHS: access control in C++ applies per class, not per object.&lt;br /&gt;
&lt;br /&gt;
=Templates and the Standard Template Library=&lt;br /&gt;
&lt;br /&gt;
OK, so things are going swimmingly.  We're using classes for encapsulation.  We've considered the interface to a class in some detail and seen how we can improve the way that instances of a class interact with the rest of the program.  This is all excellent, '''but...'''  You knew there was a wrinkle on the horizon, eh?&lt;br /&gt;
&lt;br /&gt;
Let's take a moment to think about '''data structures'''.  The way we store data can make a huge difference to a program.  Given the right data structures, solving an involved problem can be a pleasure, if not a cinch.  Given the wrong data structures, the whole enterprise can be a chore!&lt;br /&gt;
&lt;br /&gt;
So far, we've hardly stopped to think about data structures.  We've seen single variables and arrays of said variables.  As an improvement, we've also seen structures and even arrays of structures.  There are a great many more possibilities, however.  We can have [http://en.wikipedia.org/wiki/Stack_(data_structure) stacks], [http://en.wikipedia.org/wiki/Queue_(data_structure) queues], [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikipedia.org/wiki/Binary_tree binary trees], sets, strings, vectors, matrices and many, many more.  All these data structures are designed to highlight certain properties of some stored data and so make certain operations as easy as possible.  &lt;br /&gt;
&lt;br /&gt;
For example, a tree structure is good for representing a search through a state-space.  If you wanted to program a computer to play chess, you could represent the state of the board at a node.  Different moves from a given state would be the branches.  As you can see, by using a tree we can hold a number of different move sequences in memory at the same time.  We can pick and advance any stored state by another move.  We can also prune away a whole 'subtree' of moves, should it prove ill-advised, according to some criterion.&lt;br /&gt;
&lt;br /&gt;
For our code example, let's consider one of the simpler structures--a stack.  To create a stack of boxes we would take a box and set it down.  We take another box and place it on top of the first, and so on.  In order to get at the first box, we need to take all the other boxes off it.  The image below shows such a stack.&lt;br /&gt;
&lt;br /&gt;
[[Image:stack-drawers.jpg|300px|thumbnail|centre|A stack, in this case of boxes.]]&lt;br /&gt;
&lt;br /&gt;
Sometimes, this is exactly the way in which we want to store our data.  If we were modelling the deposition and erosion of sediments on the sea floor, for example, a stack would be just the ticket.&lt;br /&gt;
&lt;br /&gt;
OK, ok, this is all well and good, but where's the wrinkle?  Well, let's say we want a stack of real numbers at one point of a program, and a stack of integers at another.  Does that mean that we would need to write two different classes, with all their associated interface gubbins, one for the doubles and one for the integers?  That would be a pain!&lt;br /&gt;
&lt;br /&gt;
Fear not!  We can write a '''template''' class instead.  Templates are neat, as we '''do not need to specify the type''' of thing that will be found in a stack until the point where we declare an instance of said stack.  In order to illustrate this approach, we have a small example of what we will call a LIFO stack.  LIFO stands for 'Last In, First Out'.&lt;br /&gt;
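&lt;br /&gt;
Before turning to the stack, note that the same idea works for plain functions as well as classes.  In this minimal sketch (the function '''larger()''' is our own example, not part of the tutorial code), one template definition serves ints, doubles and chars alike, with the compiler deducing TYPE from the arguments at each call:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// One definition; the compiler generates a separate version&lt;br /&gt;
// of larger() for each TYPE it is actually used with.&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
TYPE larger(const TYPE&amp;amp; a, const TYPE&amp;amp; b)&lt;br /&gt;
{&lt;br /&gt;
  return (b &amp;lt; a) ? a : b;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  std::printf(&amp;quot;%d\n&amp;quot;, larger(3, 7));       // TYPE deduced as int&lt;br /&gt;
  std::printf(&amp;quot;%f\n&amp;quot;, larger(2.5, -1.0));  // TYPE deduced as double&lt;br /&gt;
  std::printf(&amp;quot;%c\n&amp;quot;, larger('P', 'Q'));   // TYPE deduced as char&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;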
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example3&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
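Inside '''lifo.h''', you'll see the declaration (and definition) of our template class.  The definitions of a template's methods usually live in the header, because the compiler needs to see them wherever the template is used with a particular type:&lt;br /&gt;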
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
class Stack&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
 private:&lt;br /&gt;
&lt;br /&gt;
  int size;        // number of elements in stack&lt;br /&gt;
  int head;        // index of element in the head of the stack&lt;br /&gt;
  TYPE* stackPtr;  // pointer to the stack&lt;br /&gt;
  &lt;br /&gt;
 public:&lt;br /&gt;
  // constructor with default of 10 items in the stack&lt;br /&gt;
  Stack(int s=10);&lt;br /&gt;
  // destructor&lt;br /&gt;
  ~Stack() { delete[] stackPtr; }&lt;br /&gt;
  // method for adding an item &lt;br /&gt;
  bool push(const TYPE&amp;amp; item);&lt;br /&gt;
  // method for removing an item &lt;br /&gt;
  bool pop(void);&lt;br /&gt;
  // method to report top item in stack&lt;br /&gt;
  TYPE top(void) const;&lt;br /&gt;
  // method to test if stack is empty&lt;br /&gt;
  bool empty(void) const;&lt;br /&gt;
  // method to test if stack is full&lt;br /&gt;
  bool full(void) const;&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note the use of the placeholder name '''TYPE''' in the angle brackets (this name could be anything, but TYPE in capitals stands out nicely).  The interface to the class contains methods for construction and destruction, as well as the basic modes of operation: '''push'''ing items on to the stack and '''pop'''ping them off it.  We have a method to report what's on the top of the stack and a couple more to report whether the stack is 'full' or 'empty'.&lt;br /&gt;
&lt;br /&gt;
Feel free to browse the details of the implementation, but we'll skip over them here.  They are relatively rudimentary and no doubt could tolerate a good deal of improvement.  The short piece of glue code is contained in '''main.cc''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdlib&amp;gt;&lt;br /&gt;
#include &amp;quot;lifo.h&amp;quot;&lt;br /&gt;
&lt;br /&gt;
using namespace std;&lt;br /&gt;
&lt;br /&gt;
int main (void)&lt;br /&gt;
{&lt;br /&gt;
  Stack&amp;lt;char&amp;gt; charLifo;&lt;br /&gt;
  Stack&amp;lt;int&amp;gt;  intLifo;&lt;br /&gt;
&lt;br /&gt;
  cout &amp;lt;&amp;lt; &amp;quot;== Welcome to the simple LIFO Stack Program! ==&amp;quot; &amp;lt;&amp;lt; endl &amp;lt;&amp;lt; endl;&lt;br /&gt;
&lt;br /&gt;
  charLifo.push('P');&lt;br /&gt;
  charLifo.push('Q');&lt;br /&gt;
  charLifo.push('R');&lt;br /&gt;
  charLifo.push('S');&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can run the example program by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./lifo.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
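&lt;br /&gt;
We haven't reproduced the method definitions from '''lifo.h''' here.  For readers who'd like to see one way of filling them in, below is a sketch that matches the declared interface.  It is not necessarily what the repository contains, and it assumes that '''head''' holds -1 when the stack is empty.  Note that, like our earlier satellite class, this Stack owns dynamically allocated memory, so copying one safely would also need a deep-copying copy constructor and assignment operator, which we've left out for brevity:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
class Stack&lt;br /&gt;
{&lt;br /&gt;
 private:&lt;br /&gt;
  int size;        // number of elements in stack&lt;br /&gt;
  int head;        // index of the top element (-1 when empty; our assumption)&lt;br /&gt;
  TYPE* stackPtr;  // pointer to the stack&lt;br /&gt;
&lt;br /&gt;
 public:&lt;br /&gt;
  Stack(int s=10);&lt;br /&gt;
  ~Stack() { delete[] stackPtr; }&lt;br /&gt;
  bool push(const TYPE&amp;amp; item);&lt;br /&gt;
  bool pop(void);&lt;br /&gt;
  TYPE top(void) const;&lt;br /&gt;
  bool empty(void) const;&lt;br /&gt;
  bool full(void) const;&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
// Each method of a class template is itself a template,&lt;br /&gt;
// hence the repeated 'template &amp;lt;class TYPE&amp;gt;' prefix.&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
Stack&amp;lt;TYPE&amp;gt;::Stack(int s) : size(s &amp;gt; 0 ? s : 10), head(-1)&lt;br /&gt;
{&lt;br /&gt;
  stackPtr = new TYPE[size];&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
bool Stack&amp;lt;TYPE&amp;gt;::push(const TYPE&amp;amp; item)&lt;br /&gt;
{&lt;br /&gt;
  if (full()) {&lt;br /&gt;
    return false;             // no room left&lt;br /&gt;
  }&lt;br /&gt;
  stackPtr[++head] = item;    // move the head up, then store the item&lt;br /&gt;
  return true;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
bool Stack&amp;lt;TYPE&amp;gt;::pop(void)&lt;br /&gt;
{&lt;br /&gt;
  if (empty()) {&lt;br /&gt;
    return false;             // nothing to remove&lt;br /&gt;
  }&lt;br /&gt;
  --head;&lt;br /&gt;
  return true;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
TYPE Stack&amp;lt;TYPE&amp;gt;::top(void) const&lt;br /&gt;
{&lt;br /&gt;
  return stackPtr[head];      // caller should check empty() first&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
bool Stack&amp;lt;TYPE&amp;gt;::empty(void) const { return head == -1; }&lt;br /&gt;
&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
bool Stack&amp;lt;TYPE&amp;gt;::full(void) const { return head == size - 1; }&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  Stack&amp;lt;char&amp;gt; charLifo;&lt;br /&gt;
  charLifo.push('P');&lt;br /&gt;
  charLifo.push('Q');&lt;br /&gt;
  charLifo.push('R');&lt;br /&gt;
&lt;br /&gt;
  while (!charLifo.empty()) {&lt;br /&gt;
    std::printf(&amp;quot;%c\n&amp;quot;, charLifo.top());  // R, then Q, then P&lt;br /&gt;
    charLifo.pop();&lt;br /&gt;
  }&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;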
&lt;br /&gt;
One reason why we haven't laboured too hard over our stack implementation is that C++ provides us with the '''Standard Template Library''', or '''STL''' for short.  This contains tried and tested implementations of many of the data structures and algorithms that we would like.  All there, provided to us for free!&lt;br /&gt;
&lt;br /&gt;
An example of using a stack from the STL is in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example4&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This time, all we need is in '''main.cc''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;stack&amp;gt;&lt;br /&gt;
using namespace std;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  stack&amp;lt;char&amp;gt; charStack;  // a stack of characters&lt;br /&gt;
  stack&amp;lt;int&amp;gt;  intStack;   // a stack of integers&lt;br /&gt;
...&lt;br /&gt;
  // populate the character stack&lt;br /&gt;
  charStack.push('A');&lt;br /&gt;
  charStack.push('B');&lt;br /&gt;
  charStack.push('C');&lt;br /&gt;
  charStack.push('D');&lt;br /&gt;
...&lt;br /&gt;
  // ditto for the integer stack&lt;br /&gt;
  intStack.push(1);&lt;br /&gt;
  intStack.push(2);&lt;br /&gt;
  intStack.push(3);&lt;br /&gt;
  intStack.push(4);&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and you can run the program in the usual way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./stack.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
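&lt;br /&gt;
Since main.cc is only excerpted above, here is a short, self-contained sketch of the same idea (the helper '''drain()''' is our own, not part of the example code).  One detail worth knowing about std::stack is that pop() removes the top item but does not return it, so we read the item with top() first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
#include &amp;lt;stack&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Empties a copy of the stack, returning the items in the&lt;br /&gt;
// order in which they come off (newest first).&lt;br /&gt;
std::vector&amp;lt;int&amp;gt; drain(std::stack&amp;lt;int&amp;gt; s)&lt;br /&gt;
{&lt;br /&gt;
  std::vector&amp;lt;int&amp;gt; out;&lt;br /&gt;
  while (!s.empty()) {&lt;br /&gt;
    out.push_back(s.top());  // read the newest item...&lt;br /&gt;
    s.pop();                 // ...then remove it (pop() returns nothing)&lt;br /&gt;
  }&lt;br /&gt;
  return out;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  std::stack&amp;lt;int&amp;gt; intStack;&lt;br /&gt;
  for (int i = 1; i &amp;lt;= 4; ++i) {&lt;br /&gt;
    intStack.push(i);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  const std::vector&amp;lt;int&amp;gt; order = drain(intStack);&lt;br /&gt;
  for (std::size_t i = 0; i &amp;lt; order.size(); ++i) {&lt;br /&gt;
    std::printf(&amp;quot;%d\n&amp;quot;, order[i]);  // 4, 3, 2, 1: last in, first out&lt;br /&gt;
  }&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;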
&lt;br /&gt;
For good measure, you will also see an example of a '''list''' sourced from the STL, and an associated '''iterator''' for cycling through the members of said list.  Iterators allow us to cycle over members of a data structure without having to know the details of how that particular data structure is implemented.&lt;br /&gt;
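&lt;br /&gt;
To see why that's useful, consider the sketch below ('''sum_all()''' is our own example, and the numbers are made up).  Because it only uses the iterators that every standard container provides, the very same function sums the contents of a list and of a vector, even though the two store their elements quite differently:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
#include &amp;lt;list&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Sums any standard container of numbers, visiting each&lt;br /&gt;
// element via the container's const_iterator.&lt;br /&gt;
template &amp;lt;class CONTAINER&amp;gt;&lt;br /&gt;
double sum_all(const CONTAINER&amp;amp; c)&lt;br /&gt;
{&lt;br /&gt;
  double total = 0.0;&lt;br /&gt;
  for (typename CONTAINER::const_iterator it = c.begin(); it != c.end(); ++it) {&lt;br /&gt;
    total += *it;&lt;br /&gt;
  }&lt;br /&gt;
  return total;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  std::list&amp;lt;double&amp;gt; layers;   // e.g. sediment layer thicknesses (m); made-up values&lt;br /&gt;
  layers.push_back(0.5);&lt;br /&gt;
  layers.push_back(1.25);&lt;br /&gt;
  layers.push_front(0.75);    // lists allow cheap insertion at either end&lt;br /&gt;
&lt;br /&gt;
  std::vector&amp;lt;double&amp;gt; samples(3, 2.0);  // three elements, each 2.0&lt;br /&gt;
&lt;br /&gt;
  std::printf(&amp;quot;list total:   %f\n&amp;quot;, sum_all(layers));   // 2.500000&lt;br /&gt;
  std::printf(&amp;quot;vector total: %f\n&amp;quot;, sum_all(samples));  // 6.000000&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;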
&lt;br /&gt;
To learn more about the STL, you can take a look at, e.g. [http://www.sgi.com/tech/stl SGI's page] or [http://en.wikipedia.org/wiki/Standard_Template_Library that on Wikipedia].  [http://oreilly.com/pub/topic/cprog O'Reilly], of course, have a few good books on the topic too.&lt;br /&gt;
&lt;br /&gt;
The Boost libraries, listed at http://www.boost.org, augment the STL with many more useful algorithms and datatypes.  With the STL, Boost etc., the sky is the limit!&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* Modify the program in example4 to make use of other members of the STL, such as a queue and perhaps a linked list.&lt;br /&gt;
* Those who are really looking for a challenge can get to grips with maps (e.g. '''std::map''', or the hash-table-based '''std::unordered_map''' in C++11) and binary trees!&lt;br /&gt;
* Why not go the whole hog and write your own binary tree and iterator (depth-first or breadth-first search)?  You'll learn a lot!&lt;br /&gt;
&lt;br /&gt;
=Inheritance=&lt;br /&gt;
&lt;br /&gt;
The last topic that we will look at is '''inheritance'''.  This is a mechanism through which you can declare a new class--called the '''derived class'''--to be a specialisation of another class--called the '''base class'''.  In line with the spirit of the '''pragmatic programming''' tutorials, we will not linger on this topic as we believe that while it is certainly neat, it may be of limited use for our scientific projects.&lt;br /&gt;
&lt;br /&gt;
In this example, we will consider the simplest, but quite likely the most often used, form of inheritance: '''public''' inheritance from a single base class.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example5&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In '''inheritance.h''', we see a simple base class declared:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class celestial_body&lt;br /&gt;
{&lt;br /&gt;
 private:&lt;br /&gt;
&lt;br /&gt;
  double equitorial_radius;&lt;br /&gt;
  double polar_radius;&lt;br /&gt;
  &lt;br /&gt;
 public:&lt;br /&gt;
  // Note, not using a constructor...&lt;br /&gt;
  // a set method instead, which we can use to access&lt;br /&gt;
  // variables from the derived class.&lt;br /&gt;
  void set(const double eq_rad, const double pol_rad);&lt;br /&gt;
&lt;br /&gt;
  // volume &lt;br /&gt;
  double volume(void) const;&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is followed by a derived class which builds on the concept of a celestial body, adding space to store information about its orbit along with additional methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class satellite : public celestial_body&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
&lt;br /&gt;
  double period;&lt;br /&gt;
  double sma_of_orbit;&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
&lt;br /&gt;
  // constructor&lt;br /&gt;
  satellite(const double prd = 0.0, const double sma = 0.0,&lt;br /&gt;
            const double eq_rad = 0.0, const double pol_rad = 0.0);&lt;br /&gt;
  // mass&lt;br /&gt;
  double mass_of_attractor(void) const;&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In '''inheritance.cc''', you will see that we call the 'set()' method of the base class from the constructor of the derived class.  This highlights that what is private in the base class is inaccessible to the derived class, and so an appropriate interface is required even within a chain of parents and children.&lt;br /&gt;
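&lt;br /&gt;
inheritance.cc itself isn't shown here, so below is a self-contained sketch of how the pieces could fit together; it is not necessarily the repository's code.  We've assumed that volume() treats the body as a spheroid, V = (4/3) pi a^2 c for equatorial radius a and polar radius c, and the moon's radii in main() are approximate values:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
const double pi            = 3.14159265;&lt;br /&gt;
const double grav_constant = 6.673e-11;&lt;br /&gt;
&lt;br /&gt;
class celestial_body&lt;br /&gt;
{&lt;br /&gt;
 private:&lt;br /&gt;
  double equitorial_radius;   // spelling as in the tutorial's header&lt;br /&gt;
  double polar_radius;&lt;br /&gt;
&lt;br /&gt;
 public:&lt;br /&gt;
  void set(const double eq_rad, const double pol_rad)&lt;br /&gt;
  {&lt;br /&gt;
    equitorial_radius = eq_rad;&lt;br /&gt;
    polar_radius      = pol_rad;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Volume of a spheroid: (4/3) * pi * a^2 * c  (assumed shape)&lt;br /&gt;
  double volume(void) const&lt;br /&gt;
  {&lt;br /&gt;
    return 4.0 / 3.0 * pi * equitorial_radius * equitorial_radius * polar_radius;&lt;br /&gt;
  }&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
class satellite : public celestial_body&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
  double period;&lt;br /&gt;
  double sma_of_orbit;&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
  satellite(const double prd = 0.0, const double sma = 0.0,&lt;br /&gt;
            const double eq_rad = 0.0, const double pol_rad = 0.0)&lt;br /&gt;
    : period(prd), sma_of_orbit(sma)&lt;br /&gt;
  {&lt;br /&gt;
    // The radii are private to the base class, so we must&lt;br /&gt;
    // go through its public set() method instead.&lt;br /&gt;
    set(eq_rad, pol_rad);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  double mass_of_attractor(void) const&lt;br /&gt;
  {&lt;br /&gt;
    return (4.0 * std::pow(sma_of_orbit, 3) * std::pow(pi, 2)) / (std::pow(period, 2) * grav_constant);&lt;br /&gt;
  }&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  // Equatorial and polar radii of the moon (m): approximate, assumed values.&lt;br /&gt;
  satellite moon2(27.322 * 86400.0, 384399e3, 1738.1e3, 1736.0e3);&lt;br /&gt;
  std::printf(&amp;quot;Volume of moon2 (m^3) is: %e\n&amp;quot;, moon2.volume());  // inherited method&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;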
&lt;br /&gt;
In '''main.cc''', we see that through the process of inheritance, we can call the '''volume()''' method (declared in the base class) from an instance of the derived class:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
cout &amp;lt;&amp;lt; &amp;quot;Volume of moon2 (m^3) is: &amp;quot; &amp;lt;&amp;lt; moon2.volume() &amp;lt;&amp;lt; endl;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that we have declared two instances of the satellite class.  The constructor for 'moon2' is given all the relevant information, whereas that for 'moon1' relies on default values for the size settings.  To run the program type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./inheritance.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Image:class-hierarchy.jpg|300px|thumbnail|centre|A class hierarchy.]]&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* There is a good deal more to discuss on the topic of inheritance, but I will leave researching those details as an exercise to the reader for the moment.&lt;br /&gt;
&lt;br /&gt;
=A Good Read?=&lt;br /&gt;
&lt;br /&gt;
* [[A_Good_Read|References]] for further reading.&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=CtoC%2B%2B&amp;diff=9404</id>
		<title>CtoC++</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=CtoC%2B%2B&amp;diff=9404"/>
		<updated>2014-02-14T12:14:43Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Templates and the Standard Template Library */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Pragmatic Programming]]&lt;br /&gt;
'''CtoC++: Upgrading to Object Oriented C'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
This tutorial carries on where [[StartingC]] left off.&lt;br /&gt;
&lt;br /&gt;
To get the material, cut and paste the contents of the box below onto your command line.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn co https://svn.ggy.bris.ac.uk/subversion-open/CtoC++/trunk ./CtoC++&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this tutorial we will assume basic linux skills as outlined in [[Linux1]].&lt;br /&gt;
&lt;br /&gt;
=Cutting to the Chase: Classes and Encapsulation=&lt;br /&gt;
&lt;br /&gt;
So, here we are contemplating C++.  We've got to grips with most of the C language in [[StartingC]] and it looked alright.  Definitely serviceable.  What's all the fuss about C++?  Well, I believe that most of the fuss is about '''encapsulation'''.  We saw the benefit of collecting together related variables into structures in C, true?  Well, C++ goes further and allows us to collect together not only related variables, but also the functions which use them.  An '''instance of a class''' is called an '''object''' and it comes preloaded with all the variables and functions (aka '''methods''') that you'll need when considering said object.&lt;br /&gt;
&lt;br /&gt;
What may have seemed like the relatively small enhancement of adding methods to the encapsulation has, in fact, resulted in a sea-change.  No longer are we thinking about a program in terms of the variables and the functions, but instead we're thinking about '''objects''' (planets, radios, payrolls and the like) and how they interact with other objects.  Hence the term '''object oriented programming''' (OOP).&lt;br /&gt;
&lt;br /&gt;
Things are typically invented for a reason and C++ is no different.  The problem with the traditional procedural programming model, such as that of standard C, is that as our programs grow we end up with more and more variables which are used by more and more functions.  These functions and variables are typically all mixed up in the scope, or '''namespace''', of the top-level function, called 'main'.  Modification and maintenance of the program becomes harder and harder since it becomes more difficult to keep track of which variables are used by which functions.  Overall our program begins to resemble spaghetti--not a renowned building material!  Instead, we would like to work with something more amenable to our aims.  We would like components which are easily combined, modified or even replaced completely.  A more modular paradigm suggests itself.  We want the programming equivalent of Lego!&lt;br /&gt;
&lt;br /&gt;
[[Image:spaghetti.jpg|300px|thumbnail|centre|less like this...]]&lt;br /&gt;
[[Image:Rube-goldberg-toothpaste.jpg|365px|thumbnail|centre|or this...]]&lt;br /&gt;
[[Image:Lego.jpg|365px|thumbnail|centre|..and more like this.]]&lt;br /&gt;
&lt;br /&gt;
We'll see in the following examples that the OOP approach, and in particular the mindset of encapsulation, provides us with the modular building blocks that we are after.  Repeat after me, &amp;quot;'''encapsulation is the best thing since sliced-bread!'''&amp;quot; :-)&lt;br /&gt;
&lt;br /&gt;
OK, enough of the spiel, let's get our hands dirty with an actual example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd CtoC++/examples/example1&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first chunk of code to greet you inside '''class.cc''' (we'll use .cc to denote C++ source code files) is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
//&lt;br /&gt;
// This is a C++ comment line&lt;br /&gt;
//&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;        // C++ input/output streams (cout, endl)&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;           // C++ version of the standard C math library&lt;br /&gt;
#include &amp;lt;cstdlib&amp;gt;         // for EXIT_SUCCESS&lt;br /&gt;
&lt;br /&gt;
// declare a namespace in which to keep &lt;br /&gt;
// some handy scientific constants&lt;br /&gt;
namespace scientific&lt;br /&gt;
{&lt;br /&gt;
  const double pi            = 3.14159265; // note the use of 'const'&lt;br /&gt;
  const double grav_constant = 6.673e-11;  // universal gravitational constant (m3 kg-1 s-2)&lt;br /&gt;
  const int    sec_per_day   = 86400;      // number of seconds in 24 hours&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// avail ourselves of a couple of namespaces&lt;br /&gt;
// via the 'using' directive&lt;br /&gt;
using namespace std;           // allows us to use 'cout', for example&lt;br /&gt;
using namespace scientific;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What's new?  Well, first up, we see that the comment syntax has changed and that we can use just a leading double forward slash ('''//''') to signal a note from the author.  '''#include''' is familiar, except that we've dropped the '''.h'''s from inside the angle brackets, and the C library header math.h has become '''cmath'''.&lt;br /&gt;
&lt;br /&gt;
The next block is a '''namespace''' declaration.  The concept of a namespace is common to a number of programming languages and here we're setting one up called '''scientific''' and using it to store some handy constants.  We can enclose anything we like in a namespace.  We can refer to its contents with qualified names, such as '''scientific::pi''', or bring them all into scope via the '''using''' directive.  In this case we're accessing an intrinsic namespace called '''std''' (standard)--we'll be doing that a lot!--and also our scientific one.  The idea behind namespaces is to reduce the risk of a clash of names when programs get large.  They're handy.&lt;br /&gt;
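&lt;br /&gt;
As a minimal illustration of the kind of clash that namespaces prevent (the second namespace, '''geometric''', is hypothetical), two constants can both be called pi so long as they live in different namespaces.  Bringing in both with '''using namespace''' would make a bare pi ambiguous, so here we use qualified names and a single using-declaration instead:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
namespace scientific&lt;br /&gt;
{&lt;br /&gt;
  const double pi = 3.14159265;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
namespace geometric  // hypothetical second namespace, for illustration&lt;br /&gt;
{&lt;br /&gt;
  const double pi = 3.14;  // a deliberately cruder value&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  // Qualified names state exactly which pi we mean.&lt;br /&gt;
  std::printf(&amp;quot;%f\n&amp;quot;, scientific::pi);  // 3.141593&lt;br /&gt;
  std::printf(&amp;quot;%f\n&amp;quot;, geometric::pi);   // 3.140000&lt;br /&gt;
&lt;br /&gt;
  // A using-declaration brings in just one name.&lt;br /&gt;
  using scientific::pi;&lt;br /&gt;
  std::printf(&amp;quot;%f\n&amp;quot;, pi);              // scientific::pi again&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;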
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Next up in the source code is the class declaration (and definition, as it happens) itself:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class satellite&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
&lt;br /&gt;
  // Private members of a class cannot be accessed&lt;br /&gt;
  // from outside the class.&lt;br /&gt;
  double period;        // time taken to orbit e.g. earth (s) &lt;br /&gt;
  double sma_of_orbit;  // semi-major axis of satellite's orbit (m)&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
&lt;br /&gt;
  // Public members of the class are visible to the&lt;br /&gt;
  // rest of the program.&lt;br /&gt;
&lt;br /&gt;
  // Method to assign values to private variables. &lt;br /&gt;
  void set(const double prd, const double sma)&lt;br /&gt;
  {&lt;br /&gt;
    period = prd;&lt;br /&gt;
    sma_of_orbit = sma;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Method to compute mass of a celestial body&lt;br /&gt;
  // given the period of a satellite which orbits&lt;br /&gt;
  // it and the semi-major axis of that orbit.&lt;br /&gt;
  // See Kepler's laws of planetary motion.&lt;br /&gt;
  double mass_of_attractor(void) const&lt;br /&gt;
  {&lt;br /&gt;
    return (4.0 * pow(sma_of_orbit,3) * pow(pi,2)) / (pow(period,2) * grav_constant);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
You can see that the class called '''satellite''' contains some variables and also some methods.  The contents of the class are also separated into two sections by the keywords '''private''' and '''public'''.  We've declared our variables to be private (they cannot be seen from outside the class) and our methods to be public (they are visible from outside).  In doing so, we've set up an '''interface''' (i.e. the public methods) through which other parts of the program can interact with this class.  In this case, the program at large can call '''set()''', providing information about the satellite's orbit as it does so, and also '''mass_of_attractor()''' in order to discover the mass of whatever the satellite is orbiting.&lt;br /&gt;
&lt;br /&gt;
[[Image:oop_interface_1.jpg|500px|thumbnail|centre|One object interacts with another via its interface.]]&lt;br /&gt;
&lt;br /&gt;
The existence of an interface simplifies the ways in which the object interacts with the rest of the program and makes alterations to the program much easier.  For example, you can make changes to the internals of a class without fear that you will unwittingly break some other part of the program, so long as the interface is left alone.  Indeed, we could entirely re-write the contents of a (perhaps complex) class and, as long as the interface remains unchanged, the rest of the program need never know!  This is quite a boon for scientific software, which is often altered far more frequently than other kinds of software.&lt;br /&gt;
&lt;br /&gt;
[[Image:oop_interface_2.jpg|500px|thumbnail|centre|Given a consistent interface, we can change one object without changing the others.]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Last up is our glue code, or main function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
int main (void)&lt;br /&gt;
{&lt;br /&gt;
  // Declare an 'instance' of the satellite class,&lt;br /&gt;
  // called 'moon'.&lt;br /&gt;
  satellite moon;&lt;br /&gt;
&lt;br /&gt;
  cout &amp;lt;&amp;lt; &amp;quot;== Welcome to the intro to classes program! ==&amp;quot; &amp;lt;&amp;lt; endl &amp;lt;&amp;lt; endl;&lt;br /&gt;
&lt;br /&gt;
  // Set some values pertaining to the moon.&lt;br /&gt;
  moon.set((27.322*sec_per_day),384399e3);&lt;br /&gt;
&lt;br /&gt;
  // Call a method of the satellite class&lt;br /&gt;
  // and report results to the 'stdout' stream.&lt;br /&gt;
  cout &amp;lt;&amp;lt; &amp;quot;Mass of the Earth (kg) is: &amp;quot; &amp;lt;&amp;lt; moon.mass_of_attractor() &amp;lt;&amp;lt; endl;&lt;br /&gt;
&lt;br /&gt;
  return EXIT_SUCCESS;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
in which we declare an instance of our satellite class, the '''moon''' object, then call '''set()''' and finally '''mass_of_attractor()''', noting the dot ('''.''') operator for accessing members of the class.&lt;br /&gt;
&lt;br /&gt;
The way in which we print to stdout is also different in C++.  Here we have used the stream insertion operator ('''&amp;lt;&amp;lt;''', the left shift operator overloaded for streams) together with the '''cout''' output stream, and the '''endl''' manipulator, which ends the line and flushes the stream.&lt;br /&gt;
&lt;br /&gt;
You can run the program--and weigh the Earth!--by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./class.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(The eagle-eyed amongst you will note that we have a small error in our calculation of mass.  The intrigued amongst the cohort of eagles may be relieved to see that [http://en.wikipedia.org/wiki/Kepler%27s_laws_of_planetary_motion Kepler's law] gives the combined mass of the moon and Earth in this case, and that if we subtract off the mass of the moon, we get closer to the actual mass of the Earth--phew!)&lt;br /&gt;
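&lt;br /&gt;
To put numbers on this, here is a hedged sketch (the function names and the Moon's mass of 7.342e22 kg are our additions, not part of '''class.cc''').  It repeats the calculation from '''mass_of_attractor()''', M + m = 4 pi^2 a^3 / (G T^2), which gives the combined mass of the Earth and the Moon, and then subtracts the Moon's mass m:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Kepler's third law, as used in mass_of_attractor(): this is&lt;br /&gt;
// the *combined* mass of the body and its satellite (kg).&lt;br /&gt;
//   a : semi-major axis of the orbit (m)&lt;br /&gt;
//   T : orbital period (s)&lt;br /&gt;
double combined_mass(const double a, const double T)&lt;br /&gt;
{&lt;br /&gt;
  const double pi            = 3.14159265;&lt;br /&gt;
  const double grav_constant = 6.673e-11;  // m3 kg-1 s-2, as in class.cc&lt;br /&gt;
  return (4.0 * std::pow(a, 3) * std::pow(pi, 2)) / (std::pow(T, 2) * grav_constant);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Assumed value for the mass of the Moon (kg).&lt;br /&gt;
const double moon_mass = 7.342e22;&lt;br /&gt;
&lt;br /&gt;
// Estimate of the Earth's mass alone, using the values from main().&lt;br /&gt;
double earth_mass_estimate(void)&lt;br /&gt;
{&lt;br /&gt;
  const double T = 27.322 * 86400.0;  // orbital period of the Moon (s)&lt;br /&gt;
  const double a = 384399e3;          // semi-major axis (m)&lt;br /&gt;
  return combined_mass(a, T) - moon_mass;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With the same inputs as '''main()''', the uncorrected figure is roughly 6.03e24 kg and the corrected estimate roughly 5.96e24 kg.  The corrected value is noticeably closer to the accepted value of about 5.97e24 kg.&lt;br /&gt;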
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
* Try modifying the main program, so that you weigh the Sun, instead of the Earth.  The following pages give you details of [http://en.wikipedia.org/wiki/Earth the orbit of the Earth] and [http://en.wikipedia.org/wiki/Sun the mass of the Sun], to check.&lt;br /&gt;
* Add a new method to the satellite class to compute the [http://en.wikipedia.org/wiki/Orbital_speed#Mean_orbital_speed mean orbital speed] of the satellite, and perhaps another to compute the satellite's speed at various points along its orbit?&lt;br /&gt;
* Add a whole new class to the program.  This is just for practice, so it could be a very simple one.  How about a class to represent a 2-d vector (i.e. on the x-y plane), which has a method to report the magnitude of that vector?&lt;br /&gt;
&lt;br /&gt;
[[Image:2D-vec-schematic.jpg|150px|thumbnail|centre|Coordinates and magnitude of a 2D vector.]]&lt;br /&gt;
&lt;br /&gt;
=More on Methods=&lt;br /&gt;
&lt;br /&gt;
OK.  We've bundled up some methods and variables into a class.  This is all to the good.  However, we haven't delved too deeply into all the features that C++ provides with regard to methods.  Let's rectify that right now.  We'll make a start by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example2&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this directory, you'll see that we've split our program across three files:&lt;br /&gt;
# '''methods.h''', containing the declarations (names and types of arguments) for our enhanced satellite class,&lt;br /&gt;
# '''methods.cc''', containing the 'meat' of the methods and,&lt;br /&gt;
# '''main.cc''', containing the main function inside which we put our class through its paces.&lt;br /&gt;
&lt;br /&gt;
Looking inside the header file, you'll see our scientific namespace again, as well as the class declaration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class satellite&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
&lt;br /&gt;
  char         *name;         // name of satellite&lt;br /&gt;
  unsigned int iNameLen;      // length of name string&lt;br /&gt;
  double       period;        // time taken to orbit e.g. earth (s) &lt;br /&gt;
  double       sma_of_orbit;  // semi-major axis of satellite's orbit (m)&lt;br /&gt;
&lt;br /&gt;
  // copy method&lt;br /&gt;
  void copy(const satellite&amp;amp; _stllt);&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
&lt;br /&gt;
  // default constructor&lt;br /&gt;
  // Note same name as class&lt;br /&gt;
  satellite(void);&lt;br /&gt;
&lt;br /&gt;
  // constructor with arguments&lt;br /&gt;
  satellite(const char *nm, const double prd, const double sma);&lt;br /&gt;
&lt;br /&gt;
  // copy constructor&lt;br /&gt;
  satellite(const satellite&amp;amp; _stllt);&lt;br /&gt;
&lt;br /&gt;
  // assignment operator&lt;br /&gt;
  satellite&amp;amp; operator=(const satellite&amp;amp; _stllt);&lt;br /&gt;
&lt;br /&gt;
  // previous mass calculation method&lt;br /&gt;
  double mass_of_attractor(void) const;&lt;br /&gt;
&lt;br /&gt;
  // previous set method&lt;br /&gt;
  void set(const char *nm, const double prd, const double sma);  &lt;br /&gt;
&lt;br /&gt;
  // display method&lt;br /&gt;
  void display(void) const;&lt;br /&gt;
&lt;br /&gt;
  // default destructor&lt;br /&gt;
  ~satellite();&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This time around we have some extra members:&lt;br /&gt;
* We have a character pointer called '''name''', along with an integer to store the length of the character array, once some memory has been allocated.&lt;br /&gt;
* We have a number of '''constructor''' methods, which we immediately see are special since their (shared) name matches the name of the class.&lt;br /&gt;
* We have a '''destructor''', whose name also matches the class name, but with a leading twiddle ('''~''').&lt;br /&gt;
* We have a private method called '''copy''',&lt;br /&gt;
* a '''display''' method and also &lt;br /&gt;
* an assignment operator ('''=''').&lt;br /&gt;
&lt;br /&gt;
Let's go through these in turn.&lt;br /&gt;
&lt;br /&gt;
Constructors are invoked when a new object is created.  The two relevant lines in '''main.cc''' are:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
  satellite moon1;  // default constructor&lt;br /&gt;
  satellite moon2(&amp;quot;moon2&amp;quot;,(27.322*sec_per_day),384399e3);  // constructor with args&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've declared two instances of the satellite class and--imaginatively enough--called them '''moon1''' and '''moon2'''.  We created moon1 using the '''default constructor''' (no arguments follow the variable name), whose internals we can find inside '''methods.cc''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// default constructor&lt;br /&gt;
satellite::satellite()&lt;br /&gt;
{&lt;br /&gt;
  iNameLen = 0;&lt;br /&gt;
  name = new char[iNameLen + 1];&lt;br /&gt;
  (*name) = '\0';  // empty string&lt;br /&gt;
  period = 0.0;&lt;br /&gt;
  sma_of_orbit = 0.0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
As its name suggests, this method sets up an object with default values (zero values, empty strings etc.) in lieu of any specific information.&lt;br /&gt;
&lt;br /&gt;
'''moon2''' was created using a constructor which takes arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// constructor with arguments&lt;br /&gt;
satellite::satellite(const char *nm, const double prd, const double sma)&lt;br /&gt;
{&lt;br /&gt;
  set(nm, prd, sma);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This method accepts the name of the satellite instance, together with values for the period and the semi-major axis.  Given these, it merely calls the '''set()''' method, which is sensible since this method has all the functionality that we desire, and it's a bad idea to duplicate the code.&lt;br /&gt;
&lt;br /&gt;
We can see that these two methods have exactly the same name and differ only in their argument lists.  This is an example of what's called '''overloading''', which can be highly desirable when designing clear and simple class interfaces.  We can overload both methods and operators.&lt;br /&gt;
&lt;br /&gt;
You will see that we also have what we've labelled as a '''copy constructor''', which takes another instance of the satellite class as its argument and creates a new object in its image.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// copy constructor&lt;br /&gt;
satellite::satellite(const satellite&amp;amp; _stllt) : name(NULL) {copy(_stllt);}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This method makes use of a '''member initializer''' and calls the private copy method (not available from outside the class, but callable from other members).  Member initializers are carried out before the body of the constructor runs, and members are always initialised in the order in which they are declared in the class.  In this case, we've set '''name''' to '''NULL''' so that the copy method, which deletes any existing name string before allocating a new one, does not try to delete an uninitialised pointer.&lt;br /&gt;
&lt;br /&gt;
If we don't write them ourselves, C++ implicitly provides a copy constructor, an assignment operator and a destructor.  The implicit copying methods simply copy each member in turn (a '''shallow''' copy), which is fine for classes that do not make use of dynamic memory allocation.  However, for more complex classes, we must write our own '''deep''' copying methods.  For example, our copy method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
void satellite::copy(const satellite&amp;amp; _stllt)&lt;br /&gt;
{&lt;br /&gt;
  if (name != NULL) {&lt;br /&gt;
    delete[] name;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  iNameLen = _stllt.iNameLen;&lt;br /&gt;
  name = new char[iNameLen + 1];&lt;br /&gt;
  strcpy(name,_stllt.name);&lt;br /&gt;
  period = _stllt.period;&lt;br /&gt;
  sma_of_orbit = _stllt.sma_of_orbit;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The copy needs to be deep because, if we were not careful, we would end up with two objects containing pointers to the same block of memory (holding the 'name' character string), and both destructors would then try to free that same memory.  That would not be at all what we wanted!  Instead we allocate some new memory and call '''strcpy()''', a string copying function from the standard C library (declared in '''cstring''').  Copying the values of the numerical variables is easy.  We've made use of the C++ '''new''' operator for memory allocation, which we can all agree is far simpler than 'malloc()'.  Correspondingly '''delete''' (or '''delete[]''' for arrays, as here) replaces 'free()'.&lt;br /&gt;
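&lt;br /&gt;
To see why the deep copy matters, here is a minimal sketch using a hypothetical class called '''label''' (not part of the example code) that owns a heap-allocated string, much as satellite does.  Its copy constructor allocates fresh memory, so the copy and the original hold separate buffers with the same contents.  To keep the sketch short, assignment is disabled by declaring '''operator=''' private and never defining it, a common pre-C++11 idiom:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cstring&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// A hypothetical class owning a heap-allocated C string.&lt;br /&gt;
class label&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
  char *text;&lt;br /&gt;
&lt;br /&gt;
  // Declared but never defined: assignment is disabled in this sketch.&lt;br /&gt;
  label&amp;amp; operator=(const label&amp;amp; _lbl);&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
  label(const char *t)&lt;br /&gt;
  {&lt;br /&gt;
    text = new char[std::strlen(t) + 1];&lt;br /&gt;
    std::strcpy(text, t);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Deep copy: allocate new memory and copy the characters,&lt;br /&gt;
  // rather than copying the pointer itself.&lt;br /&gt;
  label(const label&amp;amp; _lbl)&lt;br /&gt;
  {&lt;br /&gt;
    text = new char[std::strlen(_lbl.text) + 1];&lt;br /&gt;
    std::strcpy(text, _lbl.text);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Each object frees only its own buffer.&lt;br /&gt;
  ~label() { delete[] text; }&lt;br /&gt;
&lt;br /&gt;
  const char *c_str(void) const { return text; }&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Had the copy constructor simply written '''text = _lbl.text;''', both pointers would be equal, and the second destructor to run would call '''delete[]''' on memory that had already been freed.&lt;br /&gt;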
&lt;br /&gt;
None of the other methods warrant any comment, except for the assignment operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
satellite&amp;amp; satellite::operator=(const satellite&amp;amp; _stllt)&lt;br /&gt;
{&lt;br /&gt;
  // assignment to self test&lt;br /&gt;
  if (this == &amp;amp;_stllt) {&lt;br /&gt;
    return (*this);&lt;br /&gt;
  }&lt;br /&gt;
  else {&lt;br /&gt;
    copy(_stllt);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  return (*this);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, we've overloaded the '''=''' operator and given it particular instructions when faced with instances of the satellite class on either side of it, such as the statement:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
moon1 = moon2;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Using this method, we've ensured that a deep copy takes place, where the name string is handled appropriately.&lt;br /&gt;
&lt;br /&gt;
Good eh?  Now we see the way to create full and convenient interfaces to our classes.  To run the program, type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./methods.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* Experiment with the copy constructor.  For example, is it legal syntax to add the declaration '''satellite moon3(moon2);''' towards the end of the main function?&lt;br /&gt;
* Method arguments can have defaults attached, e.g. '''satellite(const char *nm, const double prd=0.0, const double sma=0.0)'''.  Experiment with the constructor with arguments.  How much more flexibility can you introduce to the interface?  '''Note''' that your default values should only be added to the declaration of the class (i.e. inserted in the header file), and that only the rightmost arguments in the list can have defaults (once one argument has a default, all those after it must have one too).&lt;br /&gt;
* Can you define other methods/operators for this class?  How about 'less than' (&amp;lt;) or 'greater than' (&amp;gt;) operators.  If two satellites were to collide and coalesce, what could a plus (+) operator do?&lt;br /&gt;
&lt;br /&gt;
'''Hints''':  My template for the 'less-than' operator is below.  The argument '_stllt' will act as the RHS of the comparison, and the object through which the method is invoked will be the LHS.  (A similar template will hold for the plus operator, except that this method must return a new instance of the class by value.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
bool satellite::operator&amp;lt;(const satellite&amp;amp; _stllt) const&lt;br /&gt;
{&lt;br /&gt;
  if (_stllt.sma_of_orbit &amp;gt; sma_of_orbit) {&lt;br /&gt;
    return true;&lt;br /&gt;
  }&lt;br /&gt;
  else {&lt;br /&gt;
    return false;&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Is this good enough?  Note that since the class of the argument and '''this''' are the same, the instance on the LHS of the comparison can access the private data members of that on the RHS.&lt;br /&gt;
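&lt;br /&gt;
Here is one hedged answer, sketched for a hypothetical, cut-down class called '''orbit''' rather than the full satellite class.  The comparison is written as a single boolean expression, and 'greater than' is defined in terms of 'less than' so that the two operators can never disagree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// A hypothetical class holding only a semi-major axis.&lt;br /&gt;
class orbit&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
  double sma_of_orbit;  // semi-major axis of orbit (m)&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
  orbit(const double sma = 0.0) : sma_of_orbit(sma) {}&lt;br /&gt;
&lt;br /&gt;
  // 'Less than' means a smaller semi-major axis.&lt;br /&gt;
  bool operator&amp;lt;(const orbit&amp;amp; _orb) const&lt;br /&gt;
  {&lt;br /&gt;
    return sma_of_orbit &amp;lt; _orb.sma_of_orbit;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // a &amp;gt; b holds exactly when b &amp;lt; a, so reuse operator&amp;lt;.&lt;br /&gt;
  bool operator&amp;gt;(const orbit&amp;amp; _orb) const&lt;br /&gt;
  {&lt;br /&gt;
    return _orb &amp;lt; *this;&lt;br /&gt;
  }&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;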
&lt;br /&gt;
=Templates and the Standard Template Library=&lt;br /&gt;
&lt;br /&gt;
OK, so things are going swimmingly.  We're using classes for encapsulation.  We've considered the interface to a class in some detail and seen how we can improve the way that instances of a class interact with the rest of the program.  This is all excellent, '''but...'''  You knew there was a wrinkle on the horizon, eh?&lt;br /&gt;
&lt;br /&gt;
Let's take a moment to think about '''data structures'''.  The way we store data can make a huge difference to a program.  Given the right data structures, solving an involved problem can be a pleasure, if not a cinch.  Given the wrong data structures, the whole enterprise can be a chore!&lt;br /&gt;
&lt;br /&gt;
So far, we've hardly stopped to think about data structures.  We've seen single variables and arrays of said variables.  As an improvement, we've also seen structures and even arrays of structures.  There are a great many more possibilities, however.  We can have [http://en.wikipedia.org/wiki/Stack_(data_structure) stacks], [http://en.wikipedia.org/wiki/Queue_(data_structure) queues], [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikipedia.org/wiki/Binary_tree binary trees], sets, strings, vectors, matrices and many, many more.  All these data structures are designed to highlight certain properties of some stored data and so make certain operations as easy as possible.  &lt;br /&gt;
&lt;br /&gt;
For example, a tree structure is good for representing a search through a state-space.  If you wanted to program a computer to play chess, you could represent the state of the board at a node.  Different moves from a given state would be the branches.  As you can see, by using a tree we can hold a number of different move sequences in memory at the same time.  We can pick and advance any stored state by another move.  We can also prune away a whole 'subtree' of moves, should it prove ill-advised, according to some criterion.&lt;br /&gt;
&lt;br /&gt;
For our code example, let's consider one of the simpler structures--a stack.  To create a stack of boxes we would take a box and set it down.  We take another box and place it on top of the first, and so on.  In order to get at the first box, we need to take all the other boxes off it.  The image below shows such a stack.&lt;br /&gt;
&lt;br /&gt;
[[Image:stack-drawers.jpg|300px|thumbnail|centre|A stack, in this case of boxes.]]&lt;br /&gt;
&lt;br /&gt;
Sometimes, this is exactly the way in which we want to store our data.  If we were modelling the deposition and erosion of sediments on the sea floor, for example, a stack would be just the ticket.&lt;br /&gt;
&lt;br /&gt;
OK, ok, this is all well and good, but where's the wrinkle?  Well, let's say we want a stack of real numbers at one point of a program, and a stack of integers at another.  Does that mean that we would need to write two different classes, with all their associated interface gubbins, one for the doubles and one for the integers?  That would be a pain!&lt;br /&gt;
&lt;br /&gt;
Fear not!  We can write a '''template''' class instead.  Templates are neat, as we '''do not need to specify the type''' of thing that will be found in a stack until the point where we declare an instance of said stack.  In order to illustrate this approach, we have a small example of what we will call a LIFO stack.  LIFO stands for 'Last In, First Out'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example3&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
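Inside '''lifo.h''', you'll see the declaration (and definition: the member functions of a template are usually defined in the header, since the compiler needs to see the full definition wherever the template is used) of our template class:&lt;br /&gt;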
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
class Stack&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
 private:&lt;br /&gt;
&lt;br /&gt;
  int size;        // number of elements in stack&lt;br /&gt;
  int head;        // index of element in the head of the stack&lt;br /&gt;
  TYPE* stackPtr;  // pointer to the stack&lt;br /&gt;
  &lt;br /&gt;
 public:&lt;br /&gt;
  // constructor with default of 10 items in the stack&lt;br /&gt;
  Stack(int s=10);&lt;br /&gt;
  // destructor&lt;br /&gt;
  ~Stack() { delete[] stackPtr; }&lt;br /&gt;
  // method for adding an item &lt;br /&gt;
  bool push(const TYPE&amp;amp; item);&lt;br /&gt;
  // method for removing an item &lt;br /&gt;
  bool pop(void);&lt;br /&gt;
  // method to report top item in stack&lt;br /&gt;
  TYPE top(void) const;&lt;br /&gt;
  // method to test if stack is empty&lt;br /&gt;
  bool empty(void) const;&lt;br /&gt;
  // method to test if stack is full&lt;br /&gt;
  bool full(void) const;&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note the use of the wildcard name '''TYPE''' in the angle brackets (this name could be anything, but TYPE in capitals stands out nicely).  The interface to the class contains methods for construction and destruction, as well as the basic modes of operation: '''push'''ing items on to the stack and '''pop'''ping them off.  We have a method to report what's on the top of the stack and a couple more to report whether the stack is 'full' or 'empty'.&lt;br /&gt;
&lt;br /&gt;
Feel free to browse the details of the implementation, but we'll skip over them here.  They are relatively rudimentary and no doubt could tolerate a good deal of improvement.  The short piece of glue code is contained in '''main.cc''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdlib&amp;gt;&lt;br /&gt;
#include &amp;quot;lifo.h&amp;quot;&lt;br /&gt;
&lt;br /&gt;
using namespace std;&lt;br /&gt;
&lt;br /&gt;
int main (void)&lt;br /&gt;
{&lt;br /&gt;
  Stack&amp;lt;char&amp;gt; charLifo;&lt;br /&gt;
  Stack&amp;lt;int&amp;gt;  intLifo;&lt;br /&gt;
&lt;br /&gt;
  cout &amp;lt;&amp;lt; &amp;quot;== Welcome to the simple LIFO Stack Program! ==&amp;quot; &amp;lt;&amp;lt; endl &amp;lt;&amp;lt; endl;&lt;br /&gt;
&lt;br /&gt;
  charLifo.push('P');&lt;br /&gt;
  charLifo.push('Q');&lt;br /&gt;
  charLifo.push('R');&lt;br /&gt;
  charLifo.push('S');&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can run the example program by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./lifo.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
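&lt;br /&gt;
We skipped over the implementation in '''lifo.h''' above.  For the curious, here is one possible version of the member definitions.  It is a hedged sketch of our own and may differ from the file in the repository.  It repeats the class declaration so that it stands alone, uses '''head == -1''' to mean 'empty', and disables copying (declared private, never defined) since the class owns raw memory.  Note that each member defined outside the class needs its own '''template &amp;lt;class TYPE&amp;gt;''' prefix and a '''Stack&amp;lt;TYPE&amp;gt;::''' qualification:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Declaration as in lifo.h, plus private copy operations.&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
class Stack&lt;br /&gt;
{&lt;br /&gt;
 private:&lt;br /&gt;
  int size;        // number of elements in stack&lt;br /&gt;
  int head;        // index of element in the head of the stack&lt;br /&gt;
  TYPE* stackPtr;  // pointer to the stack&lt;br /&gt;
&lt;br /&gt;
  // Declared but never defined: copying is disabled.&lt;br /&gt;
  Stack(const Stack&amp;amp; _stk);&lt;br /&gt;
  Stack&amp;amp; operator=(const Stack&amp;amp; _stk);&lt;br /&gt;
&lt;br /&gt;
 public:&lt;br /&gt;
  Stack(int s=10);&lt;br /&gt;
  ~Stack() { delete[] stackPtr; }&lt;br /&gt;
  bool push(const TYPE&amp;amp; item);&lt;br /&gt;
  bool pop(void);&lt;br /&gt;
  TYPE top(void) const;&lt;br /&gt;
  bool empty(void) const;&lt;br /&gt;
  bool full(void) const;&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
// Fall back to 10 items if given a non-positive size.&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
Stack&amp;lt;TYPE&amp;gt;::Stack(int s) : size(s &amp;gt; 0 ? s : 10), head(-1)&lt;br /&gt;
{&lt;br /&gt;
  stackPtr = new TYPE[size];&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Refuse to push on to a full stack.&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
bool Stack&amp;lt;TYPE&amp;gt;::push(const TYPE&amp;amp; item)&lt;br /&gt;
{&lt;br /&gt;
  if (full()) return false;&lt;br /&gt;
  stackPtr[++head] = item;&lt;br /&gt;
  return true;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Refuse to pop from an empty stack.&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
bool Stack&amp;lt;TYPE&amp;gt;::pop(void)&lt;br /&gt;
{&lt;br /&gt;
  if (empty()) return false;&lt;br /&gt;
  --head;&lt;br /&gt;
  return true;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Check empty() first: top() of an empty stack is undefined here.&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
TYPE Stack&amp;lt;TYPE&amp;gt;::top(void) const&lt;br /&gt;
{&lt;br /&gt;
  return stackPtr[head];&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
bool Stack&amp;lt;TYPE&amp;gt;::empty(void) const { return head == -1; }&lt;br /&gt;
&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
bool Stack&amp;lt;TYPE&amp;gt;::full(void) const { return head == size - 1; }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;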
&lt;br /&gt;
One of the reasons why we haven't laboured too hard over our stack implementation is that C++ provides us with something called the '''Standard Template Library''', or '''STL''' for short.  This contains tried and tested implementations of many of the data structures and algorithms that we would like.  All there, provided to us for free!&lt;br /&gt;
&lt;br /&gt;
An example of using a stack from the STL is in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example4&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This time, all we need is in '''main.cc''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;stack&amp;gt;&lt;br /&gt;
using namespace std;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  stack&amp;lt;char&amp;gt; charStack;  // a stack of characters&lt;br /&gt;
  stack&amp;lt;int&amp;gt;  intStack;   // a stack of integers&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and you can run the program in the usual way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./stack.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
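&lt;br /&gt;
The STL stack offers much the same operations as our home-grown one, with one difference worth knowing: '''pop()''' removes the top item but returns nothing, so we read the item with '''top()''' first.  Here is a minimal sketch (the function name '''reverse_with_stack''' is ours) that uses the Last In, First Out behaviour to reverse a string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;stack&amp;gt;&lt;br /&gt;
#include &amp;lt;string&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Push each character on, then pop them all off again.&lt;br /&gt;
// Last In, First Out means they come off in reverse order.&lt;br /&gt;
std::string reverse_with_stack(const std::string&amp;amp; input)&lt;br /&gt;
{&lt;br /&gt;
  std::stack&amp;lt;char&amp;gt; charStack;&lt;br /&gt;
  for (std::string::size_type i = 0; i &amp;lt; input.size(); ++i) {&lt;br /&gt;
    charStack.push(input[i]);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  std::string output;&lt;br /&gt;
  while (!charStack.empty()) {&lt;br /&gt;
    output += charStack.top();  // read the top item...&lt;br /&gt;
    charStack.pop();            // ...then remove it&lt;br /&gt;
  }&lt;br /&gt;
  return output;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;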
&lt;br /&gt;
For good measure, you will also see an example of a '''list''' sourced from the STL, and an associated '''iterator''' for cycling through the members of said list.  Iterators allow us to cycle over members of a data structure without having to know the details of how that particular data structure is implemented.&lt;br /&gt;
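&lt;br /&gt;
As a self-contained sketch (not the code in example4; the function names are ours), here is a '''list''' of integers and an iterator walking over it.  Notice that the loop uses only '''begin()''', '''end()''', '''++''' and '''*''', and so knows nothing about how the list is stored:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;list&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Build a small list: push_back adds to the end, push_front to the start.&lt;br /&gt;
std::list&amp;lt;int&amp;gt; make_list(void)&lt;br /&gt;
{&lt;br /&gt;
  std::list&amp;lt;int&amp;gt; values;&lt;br /&gt;
  values.push_back(2);&lt;br /&gt;
  values.push_back(3);&lt;br /&gt;
  values.push_front(1);  // the list now holds 1, 2, 3&lt;br /&gt;
  return values;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Sum the members of a list by walking over it with an iterator.&lt;br /&gt;
int sum_of(const std::list&amp;lt;int&amp;gt;&amp;amp; values)&lt;br /&gt;
{&lt;br /&gt;
  int total = 0;&lt;br /&gt;
  // A const_iterator, since we only read the list.&lt;br /&gt;
  for (std::list&amp;lt;int&amp;gt;::const_iterator it = values.begin();&lt;br /&gt;
       it != values.end(); ++it) {&lt;br /&gt;
    total += *it;&lt;br /&gt;
  }&lt;br /&gt;
  return total;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The same loop, with '''std::list''' swapped for, say, '''std::vector''', would work unchanged, which is exactly the point of iterators.&lt;br /&gt;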
&lt;br /&gt;
To learn more about the STL, you can take a look at, e.g. [http://www.sgi.com/tech/stl SGI's page] or [http://en.wikipedia.org/wiki/Standard_Template_Library that on Wikipedia].  [http://oreilly.com/pub/topic/cprog O'Reilly], of course, have a few good books on the topic too.&lt;br /&gt;
&lt;br /&gt;
The Boost libraries, at http://www.boost.org, augment the STL with many more useful algorithms and data types.  With the STL, Boost etc., the sky is the limit!&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* Modify the program in example4 to make use of other members of the STL, such as a queue and perhaps a linked list.&lt;br /&gt;
* Those who are really looking for a challenge can get to grips with hash tables (maps) and binary trees!&lt;br /&gt;
* Why not go the whole hog and write your own binary tree and iterator (depth-first or breadth-first)?  You'll learn a lot!&lt;br /&gt;
&lt;br /&gt;
=Inheritance=&lt;br /&gt;
&lt;br /&gt;
The last topic that we will look at is '''inheritance'''.  This is a mechanism through which you can declare a new class--called the '''derived class'''--to be a specialisation of another class--called the '''base class'''.  In line with the spirit of the '''pragmatic programming''' tutorials, we will not linger on this topic as we believe that while it is certainly neat, it may be of limited use for our scientific projects.&lt;br /&gt;
&lt;br /&gt;
In this example, we will consider the simplest, but quite likely the most often used, form of inheritance--'''public''' inheritance from a single parent base class.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example5&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In '''inheritance.h''', we see a simple base class declared:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class celestial_body&lt;br /&gt;
{&lt;br /&gt;
 private:&lt;br /&gt;
&lt;br /&gt;
  double equitorial_radius;&lt;br /&gt;
  double polar_radius;&lt;br /&gt;
  &lt;br /&gt;
 public:&lt;br /&gt;
  // Note, not using a constructor...&lt;br /&gt;
  // a set method instead, which we can use to access&lt;br /&gt;
  // variables from the derived class.&lt;br /&gt;
  void set(const double eq_rad, const double pol_rad);&lt;br /&gt;
&lt;br /&gt;
  // volume &lt;br /&gt;
  double volume(void) const;&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Followed by a derived class which builds on the concept of a celestial body and adds space to store information about its orbit, along with additional methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class satellite : public celestial_body&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
&lt;br /&gt;
  double period;&lt;br /&gt;
  double sma_of_orbit;&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
&lt;br /&gt;
  // constructor&lt;br /&gt;
  satellite(const double prd = 0.0, const double sma = 0.0,&lt;br /&gt;
            const double eq_rad = 0.0, const double pol_rad = 0.0);&lt;br /&gt;
  // mass&lt;br /&gt;
  double mass_of_attractor(void) const;&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In '''inheritance.cc''', you will see that we call the 'set()' method in the base class from the constructor of the derived class.  This highlights that what is private in the base class is hidden from the derived class and so an appropriate interface is required even within a chain of parents and children.&lt;br /&gt;
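&lt;br /&gt;
As a hedged sketch of how the pieces might fit together (our own version, which may differ from '''inheritance.cc'''), the code below assumes that '''volume()''' treats the body as an oblate spheroid, V = (4/3) pi a^2 c, with a the equatorial and c the polar radius.  It also leaves out '''mass_of_attractor()''', which is as before.  The derived class's constructor sets its own members with member initializers, but must go through the public '''set()''' for the private radii of the base class:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Cut-down copies of the two classes in inheritance.h&lt;br /&gt;
// (mass_of_attractor() omitted for brevity).&lt;br /&gt;
class celestial_body&lt;br /&gt;
{&lt;br /&gt;
 private:&lt;br /&gt;
  double equitorial_radius;&lt;br /&gt;
  double polar_radius;&lt;br /&gt;
&lt;br /&gt;
 public:&lt;br /&gt;
  void set(const double eq_rad, const double pol_rad)&lt;br /&gt;
  {&lt;br /&gt;
    equitorial_radius = eq_rad;&lt;br /&gt;
    polar_radius      = pol_rad;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Assumed: volume of an oblate spheroid, (4/3) pi a^2 c (m^3).&lt;br /&gt;
  double volume(void) const&lt;br /&gt;
  {&lt;br /&gt;
    const double pi = 3.14159265;&lt;br /&gt;
    return (4.0 / 3.0) * pi * equitorial_radius * equitorial_radius * polar_radius;&lt;br /&gt;
  }&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
class satellite : public celestial_body&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
  double period;&lt;br /&gt;
  double sma_of_orbit;&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
  satellite(const double prd = 0.0, const double sma = 0.0,&lt;br /&gt;
            const double eq_rad = 0.0, const double pol_rad = 0.0)&lt;br /&gt;
    : period(prd), sma_of_orbit(sma)&lt;br /&gt;
  {&lt;br /&gt;
    // The radii are private to celestial_body, so we cannot&lt;br /&gt;
    // assign them directly; call the inherited set() instead.&lt;br /&gt;
    set(eq_rad, pol_rad);&lt;br /&gt;
  }&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;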
&lt;br /&gt;
In '''main.cc''', we see that through the process of inheritance, we can call the '''volume()''' method (declared in the base class) from an instance of the derived class:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
cout &amp;lt;&amp;lt; &amp;quot;Volume of moon2 (m^3) is: &amp;quot; &amp;lt;&amp;lt; moon2.volume() &amp;lt;&amp;lt; endl;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that we have declared two instances of the class satellite.  The constructor for 'moon2' is given all the relevant information, whereas that for 'moon1' relies on default values for the size settings.  To run the program type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./inheritance.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Image:class-hierarchy.jpg|300px|thumbnail|centre|A class hierarchy.]]&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* There is a good deal more to discuss on the topic of inheritance, but I will leave researching those details as an exercise to the reader for the moment.&lt;br /&gt;
&lt;br /&gt;
=A Good Read?=&lt;br /&gt;
&lt;br /&gt;
* [[A_Good_Read|References]] for further reading.&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=CtoC%2B%2B&amp;diff=9403</id>
		<title>CtoC++</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=CtoC%2B%2B&amp;diff=9403"/>
		<updated>2014-02-14T12:14:16Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Templates and the Standard Template Library */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Pragmatic Programming]]&lt;br /&gt;
'''CtoC++: Upgrading to Object Oriented C'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
This tutorial carries on where [[StartingC]] left off.&lt;br /&gt;
&lt;br /&gt;
To get the material, cut and paste the contents of the box below onto your command line.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
svn co https://svn.ggy.bris.ac.uk/subversion-open/CtoC++/trunk ./CtoC++&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this tutorial we will assume basic linux skills as outlined in [[Linux1]].&lt;br /&gt;
&lt;br /&gt;
=Cutting to the Chase: Classes and Encapsulation=&lt;br /&gt;
&lt;br /&gt;
So, here we are contemplating C++.  We've got to grips with most of the C language in [[StartingC]] and it looked alright.  Definitely serviceable.  What's all the fuss about C++?  Well, I believe that most of the fuss is about '''encapsulation'''.  We saw the benefit of collecting together related variables into structures in C, true?  Well, C++ goes further and allows us to collect together not only related variables, but also functions which use those variables too.  An '''instance of a class''' is called an '''object''' and it comes preloaded with all the variables and functions (aka '''methods''') that you'll need when considering said object.&lt;br /&gt;
&lt;br /&gt;
What may have seemed like the relatively small enhancement of adding methods to the encapsulation has, in fact, resulted in a sea-change.  No longer are we thinking about a program in terms of the variables and the functions, but instead we're thinking about '''objects''' (planets, radios, payrolls and the like) and how they interact with other objects.  Hence the term '''object oriented programming''' (OOP).&lt;br /&gt;
&lt;br /&gt;
Things are typically invented for a reason and C++ is no different.  The problem with the traditional procedural programming model, as used in standard C, is that as our programs grow we end up with more and more variables which are used by more and more functions.  These functions and variables are typically all mixed up in the scope, or '''namespace''', of the top-level function, called 'main'.  Modification and maintenance of the program become harder and harder, since it becomes more difficult to keep track of which variables are used by which functions.  Overall our program begins to resemble spaghetti, which is not a renowned building material!  Instead, we would like to work with something more amenable to our aims.  We would like components which are easily combined, modified or even replaced completely.  A more modular paradigm suggests itself.  We want the programming equivalent of Lego!&lt;br /&gt;
&lt;br /&gt;
[[Image:spaghetti.jpg|300px|thumbnail|centre|less like this...]]&lt;br /&gt;
[[Image:Rube-goldberg-toothpaste.jpg|365px|thumbnail|centre|or this...]]&lt;br /&gt;
[[Image:Lego.jpg|365px|thumbnail|centre|..and more like this.]]&lt;br /&gt;
&lt;br /&gt;
We'll see in the following examples that the OOP approach, and in particular the mindset of encapsulation, provides us with the modular building blocks that we are after.  Repeat after me, &amp;quot;'''encapsulation is the best thing since sliced-bread!'''&amp;quot; :-)&lt;br /&gt;
&lt;br /&gt;
OK, enough of the spiel, let's get our hands dirty with an actual example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd CtoC++/examples/example1&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first chunk of code to greet you inside '''class.cc''' (we'll use .cc to denote C++ source code files) is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
//&lt;br /&gt;
// This is a C++ comment line&lt;br /&gt;
//&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;        // A useful C++ library&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;           // The standard C math library&lt;br /&gt;
#include &amp;lt;cstdlib&amp;gt;         // For EXIT_SUCCESS&lt;br /&gt;
&lt;br /&gt;
// declare a namespace in which to keep &lt;br /&gt;
// some handy scientific constants&lt;br /&gt;
namespace scientific&lt;br /&gt;
{&lt;br /&gt;
  const double pi            = 3.14159265; // note the use of 'const'&lt;br /&gt;
  const double grav_constant = 6.673e-11;  // universal gravitational constant (m3 kg-1 s-2)&lt;br /&gt;
  const int    sec_per_day   = 86400;      // number of seconds in 24 hours&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// avail ourselves of a couple of namespaces&lt;br /&gt;
// via the 'using' directive&lt;br /&gt;
using namespace std;           // allows us to use 'cout', for example&lt;br /&gt;
using namespace scientific;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What's new?  Well, first up, we see that the comment syntax has changed and that we can use just a leading double forward slash ('''//''') to signal a note from the author; the comment runs to the end of the line.  '''#include''' is familiar, except that the standard headers have no '''.h''' inside the angle brackets, and those inherited from the C library gain a leading '''c''' (so math.h becomes '''cmath''').&lt;br /&gt;
&lt;br /&gt;
The next block is a '''namespace''' declaration.  The concept of a namespace is common to a number of programming languages and here we're setting one up called '''scientific''' and using it to store some handy constants.  We can enclose anything we like in a namespace.  We can refer to a single member of a namespace by prefixing it with the namespace name and the scope-resolution operator ('''::'''), as in '''scientific::pi''', or make all of its contents visible at once with the '''using''' directive.  In this case we're using the directive for the standard library's namespace, '''std''' (we'll be doing that a lot!), and also for our scientific one.  The idea behind namespaces is to reduce the risk of a clash of names when programs get large.  They're handy.&lt;br /&gt;
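&lt;br /&gt;
As a small sketch of the two ways of reaching into a namespace, the code below reuses the '''scientific''' namespace from above (keeping only pi).  The function names '''circle_area_qualified()''' and '''circle_area()''' are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
namespace scientific&lt;br /&gt;
{&lt;br /&gt;
  const double pi = 3.14159265;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Option 1: name the namespace explicitly with the scope-resolution&lt;br /&gt;
// operator '::' each time a member is used.&lt;br /&gt;
double circle_area_qualified(const double r)&lt;br /&gt;
{&lt;br /&gt;
  return scientific::pi * std::pow(r, 2);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Option 2: a 'using' directive placed inside the function brings&lt;br /&gt;
// every name in 'scientific' into view, but only within this function.&lt;br /&gt;
double circle_area(const double r)&lt;br /&gt;
{&lt;br /&gt;
  using namespace scientific;&lt;br /&gt;
  return pi * std::pow(r, 2);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Both functions compute the same thing.  Placing the '''using''' directive inside a function, rather than at the top of the file as '''class.cc''' does, limits how far its names spread.  This becomes more valuable as a program grows.&lt;br /&gt;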
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Next up in the source code is the class declaration (and definition, as it happens) itself:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class satellite&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
&lt;br /&gt;
  // Private members of a class cannot be accessed&lt;br /&gt;
  // from outside the class.&lt;br /&gt;
  double period;        // time taken to orbit e.g. earth (s) &lt;br /&gt;
  double sma_of_orbit;  // semi-major axis of satellite's orbit (m)&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
&lt;br /&gt;
  // Public members of the class are visible to the&lt;br /&gt;
  // rest of the program.&lt;br /&gt;
&lt;br /&gt;
  // Method to assign values to private variables. &lt;br /&gt;
  void set(const double prd, const double sma)&lt;br /&gt;
  {&lt;br /&gt;
    period = prd;&lt;br /&gt;
    sma_of_orbit = sma;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Method to compute mass of a celestial body&lt;br /&gt;
  // given the period of a satellite which orbits&lt;br /&gt;
  // it and the semi-major axis of that orbit.&lt;br /&gt;
  // See Kepler's laws of planetary motion.&lt;br /&gt;
  double mass_of_attractor(void) const&lt;br /&gt;
  {&lt;br /&gt;
    return (4.0 * pow(sma_of_orbit,3) * pow(pi,2)) / (pow(period,2) * grav_constant);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
You can see that the class called '''satellite''' contains some variables and also some methods.  The contents of the class are also separated into two sections by the keywords '''private''' and '''public'''.  We've declared our variables to be private (they cannot be accessed from outside the class) and our methods to be public (they are visible from outside).  In doing so, we've set up an '''interface''' (i.e. the public methods) through which other parts of the program can interact with this class.  In this case, the program at large can call '''set()''', providing information about the satellite's orbit as it does so, and also '''mass_of_attractor()''' in order to discover the mass of whatever the satellite is orbiting.&lt;br /&gt;
&lt;br /&gt;
[[Image:oop_interface_1.jpg|500px|thumbnail|centre|One object interacts with another via its interface.]]&lt;br /&gt;
&lt;br /&gt;
The existence of an interface simplifies the ways in which the object interacts with the rest of the program and makes alterations to the program much easier.  For example, you can change the internals of a class without fear that you will unwittingly break some other part of the program.  Indeed, we could entirely re-write the contents of a (perhaps complex) class and, as long as the interface remains unchanged, the rest of the program need never know!  This is quite a boon for scientific software, which often changes more rapidly than other kinds of software.&lt;br /&gt;
&lt;br /&gt;
[[Image:oop_interface_2.jpg|500px|thumbnail|centre|Given a consistent interface, we can change one object without changing the others.]]&lt;br /&gt;
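&lt;br /&gt;
To make this concrete, here is a sketch of a hypothetical rewrite of the '''satellite''' class from '''class.cc''' in which the period is stored internally in days rather than seconds.  The change of units is invented purely for illustration.  The point is that '''set()''' still accepts seconds and '''mass_of_attractor()''' still returns kilograms, so the '''main()''' function shown below would not need to change.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
const double pi            = 3.14159265;&lt;br /&gt;
const double grav_constant = 6.673e-11;  // m3 kg-1 s-2&lt;br /&gt;
const int    sec_per_day   = 86400;&lt;br /&gt;
&lt;br /&gt;
class satellite&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
&lt;br /&gt;
  // Changed internal representation: period now held in days.&lt;br /&gt;
  double period_days;   // time taken to orbit (days)&lt;br /&gt;
  double sma_of_orbit;  // semi-major axis of satellite's orbit (m)&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
&lt;br /&gt;
  // Unchanged interface: the period is still passed in seconds.&lt;br /&gt;
  void set(const double prd, const double sma)&lt;br /&gt;
  {&lt;br /&gt;
    period_days  = prd / sec_per_day;&lt;br /&gt;
    sma_of_orbit = sma;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Unchanged interface: still returns the mass in kg.&lt;br /&gt;
  double mass_of_attractor(void) const&lt;br /&gt;
  {&lt;br /&gt;
    const double period = period_days * sec_per_day;  // convert back to seconds&lt;br /&gt;
    return (4.0 * std::pow(sma_of_orbit,3) * std::pow(pi,2)) / (std::pow(period,2) * grav_constant);&lt;br /&gt;
  }&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;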
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Last up is our glue code, or main function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
int main (void)&lt;br /&gt;
{&lt;br /&gt;
  // Declare an 'instance' of the satellite class,&lt;br /&gt;
  // called 'moon'.&lt;br /&gt;
  satellite moon;&lt;br /&gt;
&lt;br /&gt;
  cout &amp;lt;&amp;lt; &amp;quot;== Welcome to the intro to classes program! ==&amp;quot; &amp;lt;&amp;lt; endl &amp;lt;&amp;lt; endl;&lt;br /&gt;
&lt;br /&gt;
  // Set some values pertaining to the moon.&lt;br /&gt;
  moon.set((27.322*sec_per_day),384399e3);&lt;br /&gt;
&lt;br /&gt;
  // Call a method of the satellite class&lt;br /&gt;
  // and report results to the 'stdout' stream.&lt;br /&gt;
  cout &amp;lt;&amp;lt; &amp;quot;Mass of the Earth (kg) is: &amp;quot; &amp;lt;&amp;lt; moon.mass_of_attractor() &amp;lt;&amp;lt; endl;&lt;br /&gt;
&lt;br /&gt;
  return EXIT_SUCCESS;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
in which we declare an instance of our satellite class, the '''moon''' object, call set() and finally mass_of_attractor(), noting the dot ('''.''') operator for accessing members of an object.&lt;br /&gt;
&lt;br /&gt;
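The way in which we print to stdout is also different in C++.  Here we have used the '''&amp;lt;&amp;lt;''' operator (the left shift operator, overloaded for output streams, where it is known as the insertion operator) together with the '''cout''' output stream, and also '''endl''', which ends the line and flushes the stream.&lt;br /&gt;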
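&lt;br /&gt;
Each use of '''&amp;lt;&amp;lt;''' hands back the stream it wrote to, which is why several can be chained on one line.  The sketch below shows the same chaining with a '''std::ostringstream''' (from the standard '''sstream''' header), which collects the text in a string instead of printing it.  The function name '''mass_report()''' is made up for illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;sstream&amp;gt;&lt;br /&gt;
#include &amp;lt;string&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Build the same kind of message that class.cc sends to 'cout',&lt;br /&gt;
// but write it into a string stream and return the resulting string.&lt;br /&gt;
std::string mass_report(const double mass_kg)&lt;br /&gt;
{&lt;br /&gt;
  std::ostringstream out;&lt;br /&gt;
  // 'out &amp;lt;&amp;lt; a' returns 'out', so '&amp;lt;&amp;lt; b' is then applied to it too.&lt;br /&gt;
  out &amp;lt;&amp;lt; &amp;quot;Mass of the Earth (kg) is: &amp;quot; &amp;lt;&amp;lt; mass_kg;&lt;br /&gt;
  return out.str();&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, '''mass_report(2.5)''' returns the string &amp;quot;Mass of the Earth (kg) is: 2.5&amp;quot;.&lt;br /&gt;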
&lt;br /&gt;
You can run the program--and weigh the Earth!--by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./class.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(The eagle-eyed amongst you will note that we have a small error in our calculation of mass.  The intrigued amongst the cohort of eagles may be relieved to see that [http://en.wikipedia.org/wiki/Kepler%27s_laws_of_planetary_motion Kepler's law] gives the combined mass of the moon and Earth in this case, and that if we subtract off the mass of the moon, we get closer to the actual mass of the Earth--phew!)&lt;br /&gt;
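&lt;br /&gt;
The correction can be sketched in a few lines.  Kepler's third law, in the form used by '''mass_of_attractor()''', gives the combined mass of the two bodies, so subtracting the satellite's own mass leaves an estimate of the attractor's mass.  The mass of the moon used below (about 7.342e22 kg) is an assumed value from standard references, not something taken from the tutorial's code.  The function names are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Kepler's third law: M + m = 4 pi^2 a^3 / (G T^2), where T is the&lt;br /&gt;
// orbital period (s) and a is the semi-major axis (m).&lt;br /&gt;
double combined_mass(const double period, const double sma)&lt;br /&gt;
{&lt;br /&gt;
  const double pi            = 3.14159265;&lt;br /&gt;
  const double grav_constant = 6.673e-11;  // m3 kg-1 s-2&lt;br /&gt;
  return (4.0 * std::pow(sma,3) * std::pow(pi,2)) / (std::pow(period,2) * grav_constant);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
// Subtract the moon's own mass from the combined mass to estimate&lt;br /&gt;
// the mass of the Earth alone.&lt;br /&gt;
double earth_mass_estimate(void)&lt;br /&gt;
{&lt;br /&gt;
  const double moon_mass   = 7.342e22;  // kg (assumed reference value)&lt;br /&gt;
  const int    sec_per_day = 86400;&lt;br /&gt;
  return combined_mass(27.322 * sec_per_day, 384399e3) - moon_mass;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With these inputs the estimate comes out at roughly 5.96e24 kg, within about 1% of the commonly quoted mass of the Earth (about 5.97e24 kg).&lt;br /&gt;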
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
* Try modifying the main program so that you weigh the Sun instead of the Earth.  The following pages give details of [http://en.wikipedia.org/wiki/Earth the orbit of the Earth] and [http://en.wikipedia.org/wiki/Sun the mass of the Sun], so that you can check your answer.&lt;br /&gt;
* Add a new method to the satellite class to compute the [http://en.wikipedia.org/wiki/Orbital_speed#Mean_orbital_speed mean orbital speed] of the satellite, and perhaps another to compute the satellite's speed at various points along its orbit?&lt;br /&gt;
* Add a whole new class to the program.  This is just for practice, so it could be a very simple one.  How about a class to represent a 2-d vector (i.e. on the x-y plane), which has a method to report the magnitude of that vector?&lt;br /&gt;
&lt;br /&gt;
[[Image:2D-vec-schematic.jpg|150px|thumbnail|centre|Coordinates and magnitude of a 2D vector.]]&lt;br /&gt;
&lt;br /&gt;
=More on Methods=&lt;br /&gt;
&lt;br /&gt;
OK.  We've bundled up some methods and variables into a class.  This is all to the good.  However, we haven't delved too deeply into all the features that C++ provides with regards to methods.  Let's rectify that right now.  We'll make a start by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example2&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this directory, you'll see that we've split our program over three files:&lt;br /&gt;
# '''methods.h''', containing the declarations (names and types of arguments) for our enhanced satellite class,&lt;br /&gt;
# '''methods.cc''', containing the 'meat' of the methods, and&lt;br /&gt;
# '''main.cc''', containing the main function, inside which we put our class through its paces.&lt;br /&gt;
&lt;br /&gt;
Looking inside the header file, you'll see our scientific namespace again, as well as the class declaration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class satellite&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
&lt;br /&gt;
  char         *name;         // name of satellite&lt;br /&gt;
  unsigned int iNameLen;      // length of name string&lt;br /&gt;
  double       period;        // time taken to orbit e.g. earth (s) &lt;br /&gt;
  double       sma_of_orbit;  // semi-major axis of satellite's orbit (m)&lt;br /&gt;
&lt;br /&gt;
  // copy method&lt;br /&gt;
  void copy(const satellite&amp;amp; _stllt);&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
&lt;br /&gt;
  // default constructor&lt;br /&gt;
  // Note same name as class&lt;br /&gt;
  satellite(void);&lt;br /&gt;
&lt;br /&gt;
  // constructor with arguments&lt;br /&gt;
  satellite(const char *nm, const double prd, const double sma);&lt;br /&gt;
&lt;br /&gt;
  // copy constructor&lt;br /&gt;
  satellite(const satellite&amp;amp; _stllt);&lt;br /&gt;
&lt;br /&gt;
  // assignment operator&lt;br /&gt;
  satellite&amp;amp; operator=(const satellite&amp;amp; _stllt);&lt;br /&gt;
&lt;br /&gt;
  // previous mass calculation method&lt;br /&gt;
  double mass_of_attractor(void) const;&lt;br /&gt;
&lt;br /&gt;
  // previous set method&lt;br /&gt;
  void set(const char *nm, const double prd, const double sma);  &lt;br /&gt;
&lt;br /&gt;
  // display method&lt;br /&gt;
  void display(void) const;&lt;br /&gt;
&lt;br /&gt;
  // default destructor&lt;br /&gt;
  ~satellite();&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This time around we have some extra members:&lt;br /&gt;
* We have a character pointer called '''name''', along with an integer to store the length of the character array, once some memory has been allocated.&lt;br /&gt;
* We have a number of '''constructor''' methods, which we immediately see are special since their (shared) name matches the name of the class.&lt;br /&gt;
* We have a '''destructor''', whose name also matches the class name, but with a leading twiddle ('''~''').&lt;br /&gt;
* We have a private method called '''copy'''.&lt;br /&gt;
* We have a '''display''' method.&lt;br /&gt;
* We have an assignment operator ('''=''').&lt;br /&gt;
&lt;br /&gt;
Let's go through these in turn.&lt;br /&gt;
&lt;br /&gt;
Constructors are invoked when a new object is created.  The two relevant lines in '''main.cc''' are:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
  satellite moon1;  // default constructor&lt;br /&gt;
  satellite moon2(&amp;quot;moon2&amp;quot;,(27.322*sec_per_day),384399e3);  // constructor with args&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've declared two instances of the satellite class and, imaginatively enough, called them '''moon1''' and '''moon2'''.  We created moon1 using the '''default constructor''' (no arguments follow the variable name), the internals of which we can find inside '''methods.cc''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// default constructor&lt;br /&gt;
satellite::satellite()&lt;br /&gt;
{&lt;br /&gt;
  iNameLen = 0;&lt;br /&gt;
  name = new char[iNameLen + 1];&lt;br /&gt;
  (*name) = '\0';  // empty string&lt;br /&gt;
  period = 0.0;&lt;br /&gt;
  sma_of_orbit = 0.0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
As its name suggests, this method sets up an object with default values (zero values, empty strings etc.) in lieu of any specific information.&lt;br /&gt;
&lt;br /&gt;
'''moon2''' was created using a constructor which takes arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// constructor with arguments&lt;br /&gt;
satellite::satellite(const char *nm, const double prd, const double sma)&lt;br /&gt;
{&lt;br /&gt;
  set(nm, prd, sma);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This method accepts the name of the satellite instance, together with values for the period and the semi-major axis.  Given these, it merely calls the '''set()''' method, which is sensible since this method has all the functionality that we desire, and it's a bad idea to duplicate the code.&lt;br /&gt;
&lt;br /&gt;
We can see that these two methods have exactly the same name and differ only in their argument lists.  This is an example of what's called '''overloading''', which can be highly desirable when designing clear and simple class interfaces.  We can overload both methods and operators.&lt;br /&gt;
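&lt;br /&gt;
Overloading is not limited to constructors; any function or method name can be shared in the same way.  In the sketch below (the function '''describe()''' is made up for illustration), the compiler chooses which version to call by matching the number and types of the arguments at each call site.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;string&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Three functions with the same name, distinguished only by their&lt;br /&gt;
// argument lists.  Parameter names are omitted as they are unused.&lt;br /&gt;
std::string describe(int)&lt;br /&gt;
{&lt;br /&gt;
  return &amp;quot;an integer&amp;quot;;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
std::string describe(double)&lt;br /&gt;
{&lt;br /&gt;
  return &amp;quot;a double&amp;quot;;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
std::string describe(const char *, int)&lt;br /&gt;
{&lt;br /&gt;
  return &amp;quot;a string and an integer&amp;quot;;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So '''describe(3)''' returns &amp;quot;an integer&amp;quot;, while '''describe(3.0)''' returns &amp;quot;a double&amp;quot;.&lt;br /&gt;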
&lt;br /&gt;
You will see that we also have what we've labelled as a '''copy constructor''', which takes another instance of the satellite class as its argument, and creates a new object in its image.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// copy constructor&lt;br /&gt;
satellite::satellite(const satellite&amp;amp; _stllt) : name(NULL) {copy(_stllt);}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This method makes use of a '''member initializer''' list (the part after the colon) and calls the private copy method (not available from outside the class, but callable from other members).  Member initializers are carried out before the body of the constructor runs, and members are always initialised in the order in which they are declared in the class, whatever order they appear in the list.  In this case, we've set '''name''' equal to '''NULL''' so that the copy method, which deletes any existing name before allocating a new one, does not try to free an uninitialised pointer.&lt;br /&gt;
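&lt;br /&gt;
The ordering rule is worth a small sketch, since it is easy to trip over.  In the hypothetical class below, '''first''' is declared before '''second''', so '''first''' has already been set by the time the initializer for '''second''' reads it.  If the two declarations were swapped, '''second''' would be computed from an uninitialised value, whatever order the initializer list was written in.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Hypothetical class used only to illustrate initialisation order.&lt;br /&gt;
class pair_of_values&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
  int first;   // declared first, so initialised first&lt;br /&gt;
  int second;  // declared second, so initialised second&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
  // The initializers run before the (empty) constructor body.&lt;br /&gt;
  pair_of_values(const int v) : first(v), second(first * 2) {}&lt;br /&gt;
&lt;br /&gt;
  int get_first(void) const  { return first; }&lt;br /&gt;
  int get_second(void) const { return second; }&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;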
&lt;br /&gt;
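If we don't write our own, C++ implicitly provides a copy constructor, an assignment operator and a destructor.  The implicit copy operations simply copy each member in turn, which for a pointer means copying the address rather than the data it points to (a '''shallow''' copy), and the implicit destructor does not free any memory that the object allocated with '''new'''.  This is fine for classes which do not make use of dynamic memory allocation.  However, for more complex classes, we must write our own methods which make a '''deep''' copy.  For example, our copy method:&lt;br /&gt;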
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
void satellite::copy(const satellite&amp;amp; _stllt)&lt;br /&gt;
{&lt;br /&gt;
  if (name != NULL) {&lt;br /&gt;
    delete[] name;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  iNameLen = _stllt.iNameLen;&lt;br /&gt;
  name = new char[iNameLen + 1];&lt;br /&gt;
  strcpy(name,_stllt.name);&lt;br /&gt;
  period = _stllt.period;&lt;br /&gt;
  sma_of_orbit = _stllt.sma_of_orbit;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The copy needs to be deep.  If we were not careful, we would end up with two objects containing pointers to the same block of memory (holding the 'name' character string): changing one object's name would change the other's and, if each object frees its name when it is destroyed, the same memory would be freed twice.  That would not be at all what we wanted!  Instead we allocate some new memory and call a string copying function, '''strcpy()''', from the standard C library (declared in '''cstring''').  Copying the values of the numerical variables is easy.  We've made use of the C++ memory allocation operator '''new''' (here in its array form, '''new[]'''), which we can all agree is far simpler than 'malloc()'.  Correspondingly, '''delete''' (or '''delete[]''' for arrays) replaces 'free()'.&lt;br /&gt;
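&lt;br /&gt;
As an aside, a common alternative to managing the character array by hand is to store the name in a '''std::string''' (from the standard '''string''' header), which manages its own memory.  In that case the compiler-generated copy constructor, assignment operator and destructor already copy the name deeply and free it correctly, so no '''copy()''' method is needed.  The cut-down class below is a sketch of that alternative, not the approach taken in '''methods.cc'''.  Its name and methods are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;string&amp;gt;&lt;br /&gt;
&lt;br /&gt;
class named_satellite&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
  std::string name;    // std::string manages its own memory&lt;br /&gt;
  double      period;  // time taken to orbit (s)&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
  named_satellite(const std::string &amp;amp;nm, const double prd)&lt;br /&gt;
    : name(nm), period(prd) {}&lt;br /&gt;
&lt;br /&gt;
  // No copy constructor, assignment operator or destructor written:&lt;br /&gt;
  // the implicit ones copy and release 'name' correctly.&lt;br /&gt;
&lt;br /&gt;
  void rename(const std::string &amp;amp;nm) { name = nm; }&lt;br /&gt;
  const std::string &amp;amp;get_name(void) const { return name; }&lt;br /&gt;
  double get_period(void) const { return period; }&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copying a '''named_satellite''' and then renaming the copy leaves the original's name untouched.&lt;br /&gt;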
&lt;br /&gt;
None of the other methods warrant any comment, except for the assignment operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
satellite&amp;amp; satellite::operator=(const satellite&amp;amp; _stllt)&lt;br /&gt;
{&lt;br /&gt;
  // assignment to self test&lt;br /&gt;
  if (this == &amp;amp;_stllt) {&lt;br /&gt;
    return (*this);&lt;br /&gt;
  }&lt;br /&gt;
  else {&lt;br /&gt;
    copy(_stllt);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  return (*this);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, we've overloaded the '''=''' operator and given it particular instructions when faced with instances of the satellite class on either side of it, such as the statement:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
moon1 = moon2;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Using this method, we've ensured that a deep copy takes place, where the name string is handled appropriately.&lt;br /&gt;
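&lt;br /&gt;
The operator returns a reference to '''*this''' so that assignments can be chained, just as they can for built-in types (e.g. '''a = b = c;''').  Here is a minimal sketch using a hypothetical '''counter''' class which holds a single integer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class counter&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
  int value;&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
  counter(const int v = 0) : value(v) {}&lt;br /&gt;
&lt;br /&gt;
  counter&amp;amp; operator=(const counter&amp;amp; rhs)&lt;br /&gt;
  {&lt;br /&gt;
    if (this != &amp;amp;rhs) {  // nothing to do on assignment to self&lt;br /&gt;
      value = rhs.value;&lt;br /&gt;
    }&lt;br /&gt;
    return (*this);       // returning *this allows a = b = c;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  int get(void) const { return value; }&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because assignment groups from right to left, '''a = b = c;''' first assigns '''c''' to '''b''' and then assigns the result (a reference to '''b''') to '''a'''.&lt;br /&gt;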
&lt;br /&gt;
Good eh?  Now we see the way to create full and convenient interfaces to our classes.  To run the program, type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./methods.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* Experiment with the copy constructor.  For example, is it legal syntax to add the declaration '''satellite moon3(moon2);''' towards the end of the main function?&lt;br /&gt;
* Method arguments can have defaults attached, e.g. '''satellite(const char *nm, const double prd=0.0, const double sma=0.0)'''.  Experiment with the constructor with arguments.  How much more flexibility can you introduce to the interface?  '''Note''' that your default values should only be added to the declaration of the class (i.e. inserted in the header file), and that arguments with defaults must come after all those without (i.e. they must be the rightmost in the list).&lt;br /&gt;
* Can you define other methods/operators for this class?  How about 'less than' (&amp;lt;) or 'greater than' (&amp;gt;) operators.  If two satellites were to collide and coalesce, what could a plus (+) operator do?&lt;br /&gt;
&lt;br /&gt;
'''Hints''':  My template for the 'less-than' operator is below.  The argument '_stllt' will act as the RHS of the comparison.  The object through which the method is invoked will be the LHS.  (A similar template will hold for the plus operator, except that this method must return a new instance of the class by value.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
bool satellite::operator&amp;lt;(const satellite&amp;amp; _stllt) const&lt;br /&gt;
{&lt;br /&gt;
  if (_stllt.sma_of_orbit &amp;gt; sma_of_orbit) {&lt;br /&gt;
    return true;&lt;br /&gt;
  }&lt;br /&gt;
  else {&lt;br /&gt;
    return false;&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Is this good enough?  Note that since the argument and '''this''' are instances of the same class, the method can access the private data members of the instance on the RHS, not just those of the LHS.&lt;br /&gt;
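&lt;br /&gt;
One practical pay-off of defining '''operator&amp;lt;''' is that standard algorithms can use it.  The sketch below uses a cut-down, hypothetical version of the satellite class holding only the semi-major axis.  It returns the result of the comparison directly, which is equivalent to the if/else in the hint, and sorts a '''std::vector''' of satellites with '''std::sort''' from the '''algorithm''' header.  The helper '''sort_by_orbit()''' is made up for illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;algorithm&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
class satellite&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
  double sma_of_orbit;  // semi-major axis of satellite's orbit (m)&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
  satellite(const double sma = 0.0) : sma_of_orbit(sma) {}&lt;br /&gt;
&lt;br /&gt;
  // 'less than' means 'orbits closer in'.&lt;br /&gt;
  bool operator&amp;lt;(const satellite&amp;amp; _stllt) const&lt;br /&gt;
  {&lt;br /&gt;
    return sma_of_orbit &amp;lt; _stllt.sma_of_orbit;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  double get_sma(void) const { return sma_of_orbit; }&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
// std::sort uses operator&amp;lt; to order the satellites, innermost first.&lt;br /&gt;
void sort_by_orbit(std::vector&amp;lt;satellite&amp;gt;&amp;amp; sats)&lt;br /&gt;
{&lt;br /&gt;
  std::sort(sats.begin(), sats.end());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;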
&lt;br /&gt;
=Templates and the Standard Template Library=&lt;br /&gt;
&lt;br /&gt;
OK, so things are going swimmingly.  We're using classes for encapsulation.  We've considered the interface to a class in some detail and seen how we can improve the way that instances of a class interact with the rest of the program.  This is all excellent, '''but...'''  You knew there was a wrinkle on the horizon, eh?&lt;br /&gt;
&lt;br /&gt;
Let's take a moment to think about '''data structures'''.  The way we store data can make a huge difference to a program.  Given the right data structures, solving an involved problem can be a pleasure, if not a cinch.  Given the wrong data structures, the whole enterprise can be a chore!&lt;br /&gt;
&lt;br /&gt;
So far, we've hardly stopped to think about data structures.  We've seen single variables and arrays of said variables.  As an improvement, we've also seen structures and even arrays of structures.  There are a great many more possibilities, however.  We can have [http://en.wikipedia.org/wiki/Stack_(data_structure) stacks], [http://en.wikipedia.org/wiki/Queue_(data_structure) queues], [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikipedia.org/wiki/Binary_tree binary trees], sets, strings, vectors, matrices and many, many more.  All these data structures are designed to highlight certain properties of some stored data and so make certain operations as easy as possible.  &lt;br /&gt;
&lt;br /&gt;
For example, a tree structure is good for representing a search through a state-space.  If you wanted to program a computer to play chess, you could represent the state of the board at a node.  Different moves from a given state would be the branches.  As you can see, by using a tree we can hold a number of different move sequences in memory at the same time.  We can pick and advance any stored state by another move.  We can also prune away a whole 'subtree' of moves, should it prove ill-advised, according to some criterion.&lt;br /&gt;
&lt;br /&gt;
For our code example, let's consider one of the simpler structures: a stack.  To create a stack of boxes we would take a box and set it down.  We take another box and place it on top of the first, and so on.  In order to get at the first box, we need to take all the other boxes off it.  The image below shows such a stack.&lt;br /&gt;
&lt;br /&gt;
[[Image:stack-drawers.jpg|300px|thumbnail|centre|A stack, in this case of boxes.]]&lt;br /&gt;
&lt;br /&gt;
Sometimes, this is exactly the way in which we want to store our data.  If we were modelling the deposition and erosion of sediments on the sea floor, for example, a stack would be just the ticket.&lt;br /&gt;
&lt;br /&gt;
OK, ok, this is all well and good, but where's the wrinkle?  Well, let's say we want a stack of real numbers at one point of a program, and a stack of integers at another.  Does that mean that we would need to write two different classes, with all their associated interface gubbins, one for the doubles and one for the integers?  That would be a pain!&lt;br /&gt;
&lt;br /&gt;
Fear not!  We can write a '''template''' class instead.  Templates are neat, as we '''do not need to specify the type''' of thing that will be found in a stack until the point where we declare an instance of said stack.  In order to illustrate this approach, we have a small example of what we will call a LIFO stack.  LIFO stands for 'Last In, First Out'.&lt;br /&gt;
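&lt;br /&gt;
Functions can be templates too, and a function template is perhaps the quickest way to see the idea in action.  In the sketch below, '''larger()''' is made up for illustration and is not part of the example code.  '''TYPE''' stands in for whatever type the caller uses, and the compiler generates a separate version of the function for each type it meets.  The only requirement on '''TYPE''' here is that two values can be compared with '''&amp;lt;'''.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Return the larger of two values of any type that supports '&amp;lt;'.&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
TYPE larger(const TYPE&amp;amp; a, const TYPE&amp;amp; b)&lt;br /&gt;
{&lt;br /&gt;
  return (a &amp;lt; b) ? b : a;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So '''larger(3, 7)''' gives 7 and '''larger('P', 'S')''' gives 'S', both from a single piece of source code.&lt;br /&gt;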
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example3&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
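Inside '''lifo.h''', you'll see the declaration (and the definition too: the compiler generally needs to see a template's complete definition wherever the template is used, so template code usually lives in the header) of our template class:&lt;br /&gt;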
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
class Stack&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
 private:&lt;br /&gt;
&lt;br /&gt;
  int size;        // number of elements in stack&lt;br /&gt;
  int head;        // index of element in the head of the stack&lt;br /&gt;
  TYPE* stackPtr;  // pointer to the stack&lt;br /&gt;
  &lt;br /&gt;
 public:&lt;br /&gt;
  // constructor with default of 10 items in the stack&lt;br /&gt;
  Stack(int s=10);&lt;br /&gt;
  // destructor&lt;br /&gt;
  ~Stack() { delete[] stackPtr; }&lt;br /&gt;
  // method for adding an item &lt;br /&gt;
  bool push(const TYPE&amp;amp; item);&lt;br /&gt;
  // method for removing an item &lt;br /&gt;
  bool pop(void);&lt;br /&gt;
  // method to report top item in stack&lt;br /&gt;
  TYPE top(void) const;&lt;br /&gt;
  // method to test if stack is empty&lt;br /&gt;
  bool empty(void) const;&lt;br /&gt;
  // method to test if stack is full&lt;br /&gt;
  bool full(void) const;&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note the use of the placeholder name '''TYPE''' in the angle brackets (this name could be anything, but TYPE in capitals stands out nicely).  The interface to the class contains methods for construction and destruction, as well as the basic modes of operation: '''push'''ing items on to the stack and '''pop'''ping them off.  We have a method to report what's on the top of the stack and a couple more to report whether the stack is 'full' or 'empty'.&lt;br /&gt;
&lt;br /&gt;
Feel free to browse the details of the implementation, but we'll skip over them here.  They are relatively rudimentary and no doubt could tolerate a good deal of improvement.  The short piece of glue code is contained in '''main.cc''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdlib&amp;gt;&lt;br /&gt;
#include &amp;quot;lifo.h&amp;quot;&lt;br /&gt;
&lt;br /&gt;
using namespace std;&lt;br /&gt;
&lt;br /&gt;
int main (void)&lt;br /&gt;
{&lt;br /&gt;
  Stack&amp;lt;char&amp;gt; charLifo;&lt;br /&gt;
  Stack&amp;lt;int&amp;gt;  intLifo;&lt;br /&gt;
&lt;br /&gt;
  cout &amp;lt;&amp;lt; &amp;quot;== Welcome to the simple LIFO Stack Program! ==&amp;quot; &amp;lt;&amp;lt; endl &amp;lt;&amp;lt; endl;&lt;br /&gt;
&lt;br /&gt;
  charLifo.push('P');&lt;br /&gt;
  charLifo.push('Q');&lt;br /&gt;
  charLifo.push('R');&lt;br /&gt;
  charLifo.push('S');&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can run the example program by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./lifo.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
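&lt;br /&gt;
Since the implementation in '''lifo.h''' is not reproduced here, the following is one possible way of filling in the interface declared above.  It is a sketch, not the code in the repository.  The stack is an array of '''size''' slots, '''head''' holds the index of the top item, and the value -1 is assumed to mean 'empty'.  Copying is deliberately disabled, for the reasons discussed under deep copies above, by declaring the copy constructor and assignment operator private and never defining them.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
template &amp;lt;class TYPE&amp;gt;&lt;br /&gt;
class Stack&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
 private:&lt;br /&gt;
&lt;br /&gt;
  int size;        // number of elements in stack&lt;br /&gt;
  int head;        // index of element in the head of the stack (-1 if empty)&lt;br /&gt;
  TYPE* stackPtr;  // pointer to the stack&lt;br /&gt;
&lt;br /&gt;
  // Copying would share stackPtr between two stacks, so forbid it.&lt;br /&gt;
  Stack(const Stack&amp;amp;);&lt;br /&gt;
  Stack&amp;amp; operator=(const Stack&amp;amp;);&lt;br /&gt;
&lt;br /&gt;
 public:&lt;br /&gt;
  // Members are initialised in declaration order, so 'size' is set&lt;br /&gt;
  // before it is used to allocate the array.  A non-positive size&lt;br /&gt;
  // falls back to 10 (an assumption of this sketch).&lt;br /&gt;
  Stack(int s=10) : size(s &amp;gt; 0 ? s : 10), head(-1), stackPtr(new TYPE[size]) {}&lt;br /&gt;
  ~Stack() { delete[] stackPtr; }&lt;br /&gt;
&lt;br /&gt;
  // Add an item; returns false (and does nothing) if the stack is full.&lt;br /&gt;
  bool push(const TYPE&amp;amp; item)&lt;br /&gt;
  {&lt;br /&gt;
    if (full()) return false;&lt;br /&gt;
    stackPtr[++head] = item;&lt;br /&gt;
    return true;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Remove the top item; returns false if the stack is empty.&lt;br /&gt;
  bool pop(void)&lt;br /&gt;
  {&lt;br /&gt;
    if (empty()) return false;&lt;br /&gt;
    --head;&lt;br /&gt;
    return true;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Report the top item.  Check empty() first: this sketch does not&lt;br /&gt;
  // guard against reading from an empty stack.&lt;br /&gt;
  TYPE top(void) const { return stackPtr[head]; }&lt;br /&gt;
&lt;br /&gt;
  bool empty(void) const { return head == -1; }&lt;br /&gt;
  bool full(void) const  { return head == size - 1; }&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With this version, pushing 'P', 'Q', 'R' and 'S' onto a '''Stack&amp;lt;char&amp;gt;''' and then calling '''top()''' gives 'S', the last item in.&lt;br /&gt;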
&lt;br /&gt;
One of the reasons why we haven't laboured too hard over our stack implementation is that C++ provides us with something called the '''Standard Template Library''', or '''STL''' for short.  This contains tried and tested implementations of many of the data structures and algorithms that we would like.  All there, provided to us for free!&lt;br /&gt;
&lt;br /&gt;
An example of using a stack from the STL is in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example4&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This time, all we need is in '''main.cc''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;stack&amp;gt;&lt;br /&gt;
using namespace std;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
  stack&amp;lt;char&amp;gt; charStack;  // a stack of characters&lt;br /&gt;
  stack&amp;lt;int&amp;gt;  intStack;   // a stack of integers&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and you can run the program in the usual way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./stack.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
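&lt;br /&gt;
As a sketch of the STL version in action (the helper '''reverse_word()''' is made up for illustration), the code below pushes the characters of a word onto a '''std::stack''' and pops them off again.  Because a stack is last in, first out, the word comes back reversed.  Note that, unlike the '''pop()''' declared in our own '''lifo.h''', '''std::stack::pop()''' returns nothing, so the top item is read with '''top()''' before it is removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;stack&amp;gt;&lt;br /&gt;
#include &amp;lt;string&amp;gt;&lt;br /&gt;
&lt;br /&gt;
std::string reverse_word(const std::string&amp;amp; word)&lt;br /&gt;
{&lt;br /&gt;
  std::stack&amp;lt;char&amp;gt; charStack;  // a stack of characters&lt;br /&gt;
&lt;br /&gt;
  for (std::string::size_type i = 0; i &amp;lt; word.size(); ++i) {&lt;br /&gt;
    charStack.push(word[i]);      // last character ends up on top&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  std::string reversed;&lt;br /&gt;
  while (!charStack.empty()) {&lt;br /&gt;
    reversed += charStack.top();  // read the top item...&lt;br /&gt;
    charStack.pop();              // ...then remove it&lt;br /&gt;
  }&lt;br /&gt;
  return reversed;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;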
&lt;br /&gt;
For good measure, you will also see an example of a '''list''' sourced from the STL, and an associated '''iterator''' for cycling through the members of said list.  Iterators allow us to cycle over members of a data structure without having to know the details of how that particular data structure is implemented.&lt;br /&gt;
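&lt;br /&gt;
Here is a minimal sketch of that idea, using a made-up helper called '''sum_list()'''.  The loop walks through a '''std::list''' with an iterator, advancing it with '''++''' and reading each element with '''*''', without needing to know how the list is linked together internally.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;list&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double sum_list(const std::list&amp;lt;double&amp;gt;&amp;amp; values)&lt;br /&gt;
{&lt;br /&gt;
  double total = 0.0;&lt;br /&gt;
  // begin() refers to the first element; end() refers to one past the last.&lt;br /&gt;
  for (std::list&amp;lt;double&amp;gt;::const_iterator it = values.begin();&lt;br /&gt;
       it != values.end(); ++it) {&lt;br /&gt;
    total += *it;  // dereference the iterator to get the element&lt;br /&gt;
  }&lt;br /&gt;
  return total;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Changing '''std::list''' to '''std::vector''' (and including the '''vector''' header instead) would leave the loop itself unchanged, which is the point of iterators.&lt;br /&gt;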
&lt;br /&gt;
To learn more about the STL, you can take a look at, e.g. [http://www.sgi.com/tech/stl SGI's page] or [http://en.wikipedia.org/wiki/Standard_Template_Library that on Wikipedia].  [http://oreilly.com/pub/topic/cprog O'Reilly], of course, have a few good books on the topic too.&lt;br /&gt;
&lt;br /&gt;
Other libraries that augment the STL are listed on http://www.boost.org.  This collection contains many more useful algorithms and datatypes. With the STL, Boost etc., the sky is the limit!&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* Modify the program in example4 to make use of other members of the STL, such as a queue and perhaps a linked list.&lt;br /&gt;
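* Those who are really looking for a challenge can get to grips with associative containers, such as the STL's '''map''' (a sorted key-value store) and '''unordered_map''' (a hash table, added in C++11), and with binary trees!&lt;br /&gt;
* Why not go the whole hog and write your own binary tree and iterator (depth-first or breadth-first traversal)?  You'll learn a lot!&lt;br /&gt;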
&lt;br /&gt;
=Inheritance=&lt;br /&gt;
&lt;br /&gt;
The last topic that we will look at is '''inheritance'''.  This is a mechanism through which you can declare a new class--called the '''derived class'''--to be a specialisation of another class--called the '''base class'''.  In line with the spirit of the '''pragmatic programming''' tutorials, we will not linger on this topic as we believe that while it is certainly neat, it may be of limited use for our scientific projects.&lt;br /&gt;
&lt;br /&gt;
In this example, we will consider the simplest, but quite likely the most often used, form of inheritance: '''public''' inheritance from a single base class.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ../example5&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In '''inheritance.h''', we see a simple base class declared:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class celestial_body&lt;br /&gt;
{&lt;br /&gt;
 private:&lt;br /&gt;
&lt;br /&gt;
  double equitorial_radius;&lt;br /&gt;
  double polar_radius;&lt;br /&gt;
  &lt;br /&gt;
 public:&lt;br /&gt;
  // Note, not using a constructor...&lt;br /&gt;
  // a set method instead, which we can use to access&lt;br /&gt;
  // variables from the derived class.&lt;br /&gt;
  void set(const double eq_rad, const double pol_rad);&lt;br /&gt;
&lt;br /&gt;
  // volume &lt;br /&gt;
  double volume(void) const;&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is followed by a derived class, which builds on the concept of a celestial body and adds space to store information about its orbit, together with additional methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
class satellite : public celestial_body&lt;br /&gt;
{&lt;br /&gt;
private:&lt;br /&gt;
&lt;br /&gt;
  double period;&lt;br /&gt;
  double sma_of_orbit;&lt;br /&gt;
&lt;br /&gt;
public:&lt;br /&gt;
&lt;br /&gt;
  // constructor&lt;br /&gt;
  satellite(const double prd = 0.0, const double sma = 0.0,&lt;br /&gt;
            const double eq_rad = 0.0, const double pol_rad = 0.0);&lt;br /&gt;
  // mass&lt;br /&gt;
  double mass_of_attractor(void) const;&lt;br /&gt;
&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In '''inheritance.cc''', you will see that we call the '''set()''' method of the base class from the constructor of the derived class.  This is necessary because the private members of the base class are hidden from the derived class, so an appropriate interface is required even within a chain of parents and children.&lt;br /&gt;
&lt;br /&gt;
In '''main.cc''', we see that through the process of inheritance, we can call the '''volume()''' method (declared in the base class) from an instance of the derived class:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
cout &amp;lt;&amp;lt; &amp;quot;Volume of moon2 (m^3) is: &amp;quot; &amp;lt;&amp;lt; moon2.volume() &amp;lt;&amp;lt; endl;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that we have declared two instances of the class '''satellite'''.  The constructor for 'moon2' is given all the relevant information, whereas that for 'moon1' relies on the default values for the size settings.  To run the program, type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./inheritance.exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Image:class-hierarchy.jpg|300px|thumbnail|centre|A class hierarchy.]]&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
&lt;br /&gt;
* There is a good deal more to say about inheritance (for example, '''protected''' members, '''virtual''' functions and multiple inheritance), but researching those details is left as an exercise for the reader for the moment.&lt;br /&gt;
&lt;br /&gt;
=A Good Read?=&lt;br /&gt;
&lt;br /&gt;
* [[A_Good_Read|References]] for further reading.&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9398</id>
		<title>Python1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=Python1&amp;diff=9398"/>
		<updated>2014-01-21T13:57:59Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Pylab and Matplotlib */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Python for Scientists'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
[[Image:Python.png|thumb|1100px|none|http://xkcd.com/353/]]&lt;br /&gt;
&lt;br /&gt;
With thanks to Simon Metson and Mike Wallace for much of the following material.&lt;br /&gt;
&lt;br /&gt;
=Getting Started on BlueCrystal Phase-2=&lt;br /&gt;
&lt;br /&gt;
After you have logged in, type the following at the command line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module add languages/python-2.7.2.0&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should start up an interactive python session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Python 2.7.2 (default, Aug 25 2011, 10:51:03) &lt;br /&gt;
[GCC 4.3.3] on linux2&lt;br /&gt;
Type &amp;quot;help&amp;quot;, &amp;quot;copyright&amp;quot;, &amp;quot;credits&amp;quot; or &amp;quot;license&amp;quot; for more information.&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where we can type commands at the '''&amp;gt;&amp;gt;&amp;gt;''' prompt.&lt;br /&gt;
&lt;br /&gt;
=Python as a Calculator=&lt;br /&gt;
&lt;br /&gt;
To get started, let's just try a few commands out.  If you type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print &amp;quot;Hello!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hello!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you try:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print 5 + 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'll get:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far so simple!  Here is a copy of a session containing a few more commands where we've set the values of some variables and also defined and run our own function: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; five = 5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; neuf = 9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print five + neuf&lt;br /&gt;
14&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def say_hello():&lt;br /&gt;
...     print &amp;quot;Hello, world!&amp;quot;&lt;br /&gt;
... # hit return here &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; say_hello()&lt;br /&gt;
Hello, world!&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can exit an interactive session at any time by typing '''Ctrl-D'''.&lt;br /&gt;
&lt;br /&gt;
=Getting Help=&lt;br /&gt;
&lt;br /&gt;
One of the good things about Python is that it has lots of useful online documentation.  ([[A_Good_Read|There are good books on the language too]].)  For example, take a look at: http://docs.python.org/.  You can also type '''help()''' at the interpreter prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; help()&lt;br /&gt;
&lt;br /&gt;
Welcome to Python 2.7!  This is the online help utility.&lt;br /&gt;
&lt;br /&gt;
If this is your first time using Python, you should definitely check out&lt;br /&gt;
the tutorial on the Internet at http://docs.python.org/tutorial/.&lt;br /&gt;
&lt;br /&gt;
Enter the name of any module, keyword, or topic to get help on writing&lt;br /&gt;
Python programs and using Python modules.  To quit this help utility and&lt;br /&gt;
return to the interpreter, just type &amp;quot;quit&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; keywords&lt;br /&gt;
&lt;br /&gt;
Here is a list of the Python keywords.  Enter any keyword to get more help.&lt;br /&gt;
&lt;br /&gt;
and                 elif                if                  print&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; if&lt;br /&gt;
The ``if`` statement&lt;br /&gt;
********************&lt;br /&gt;
&lt;br /&gt;
The ``if`` statement is used for conditional execution:&lt;br /&gt;
&lt;br /&gt;
   if_stmt ::= &amp;quot;if&amp;quot; expression &amp;quot;:&amp;quot; suite&lt;br /&gt;
               ( &amp;quot;elif&amp;quot; expression &amp;quot;:&amp;quot; suite )*&lt;br /&gt;
               [&amp;quot;else&amp;quot; &amp;quot;:&amp;quot; suite]&lt;br /&gt;
&lt;br /&gt;
It selects exactly one of the suites by evaluating the expressions one&lt;br /&gt;
by one until one is found to be true...&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
help&amp;gt; quit&lt;br /&gt;
&lt;br /&gt;
You are now leaving help and returning to the Python interpreter.&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Making a Script=&lt;br /&gt;
&lt;br /&gt;
An interactive session can be fun and useful for trying things out.  However, to save our fingers, we will typically want to execute a series of commands as a script, created using your favourite text editor.  Here are the contents of an example script, '''myscript.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
print &amp;quot;Hello, from a python script!&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Ensure that your script is executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod u+x myscript.py&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and now you can run it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[ggdagw@bigblue4 ~]$ ./myscript.py &lt;br /&gt;
Hello, from a python script!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Python and Whitespace=&lt;br /&gt;
&lt;br /&gt;
Love it or hate it, Python incorporates whitespace in its syntax: the indentation of a line determines which block it belongs to.  (The alternative is to demarcate blocks with some other syntax, such as the curly braces used in C.  Pick your poison.)  Consistent indentation is therefore key to creating a valid Python script.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
    print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will work, but:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
if len(message) &amp;gt; 10:&lt;br /&gt;
 print &amp;quot;longer..&amp;quot;&lt;br /&gt;
else:&lt;br /&gt;
print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
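will not, because the body of the '''else''' clause is not indented (the single-space indentation of the first '''print''' is legal, since Python only requires the lines within a block to be indented consistently):&lt;br /&gt;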
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  File &amp;quot;./myscript.py&amp;quot;, line 7&lt;br /&gt;
    print &amp;quot;shorter..&amp;quot;&lt;br /&gt;
        ^&lt;br /&gt;
IndentationError: expected an indented block&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is therefore a great advantage, when writing a Python script, to use a text editor which has a dedicated Python mode (such as '''emacs''') and will actively help you to keep your indentation correct.  See http://wiki.python.org/moin/PythonEditors for an extensive list.&lt;br /&gt;
&lt;br /&gt;
=Some Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
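* Calculate the volume of a sphere (Hint: V = 4/3*pi*r^3.  In Python, use '''**''' for powers, since '''^''' is bitwise XOR; and beware that in Python 2, 4/3 is integer division and evaluates to 1, so write 4.0/3.0.)&lt;br /&gt;
* Concatenate two strings&lt;br /&gt;
* Write a recursive function to compute Fibonacci numbers (Hint: F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1)&lt;br /&gt;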
&lt;br /&gt;
=Nuts and Bolts=&lt;br /&gt;
&lt;br /&gt;
==Types==&lt;br /&gt;
&lt;br /&gt;
Python has intrinsic types including integers, floats, booleans and complex numbers.  It is dynamically typed (types belong to values rather than to variables and are checked at run time, so you don't need a block of variable declarations at the top of your script), but it is '''not weakly''' typed: it will not silently convert between unrelated types.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex = 2 + 0.5j&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex&lt;br /&gt;
(2+0.5j)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.real&lt;br /&gt;
2.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_complex.imag&lt;br /&gt;
0.5&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name = 'fred'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; lucky = 7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; name + lucky&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: cannot concatenate 'str' and 'int' objects&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
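&lt;br /&gt;
Where a conversion is wanted, it has to be asked for explicitly.  A minimal sketch, reusing the variables from the session above (it uses the single-argument form of '''print(...)''', which behaves the same way in Python 2 and Python 3):&lt;br /&gt;
&lt;br /&gt;
```python
name = 'fred'
lucky = 7
# str() converts the integer to a string, so the concatenation is now legal
label = name + str(lucky)
print(label)  # fred7
# int() converts in the other direction, parsing a string as an integer
total = int('35') + lucky
print(total)  # 42
```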
&lt;br /&gt;
==Strings==&lt;br /&gt;
&lt;br /&gt;
The eagle-eyed will have spotted in a previous example that we could ask for the length of a character string straight off the bat, with no need to write a counting routine ourselves:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; message = &amp;quot;happy days!&amp;quot;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print len(message)&lt;br /&gt;
11&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a character string.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print message[:5]&lt;br /&gt;
happy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
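&lt;br /&gt;
Slices take the general form '''[start:stop:step]''', where '''stop''' itself is excluded and any of the three may be omitted; negative indices count back from the end of the string.  A few more examples on the same string, as a sketch:&lt;br /&gt;
&lt;br /&gt;
```python
message = "happy days!"
first_word = message[:5]    # characters 0 to 4: 'happy'
second_word = message[6:]   # index 6 to the end: 'days!'
last_char = message[-1]     # negative indices count from the end: '!'
backwards = message[::-1]   # a step of -1 walks the string in reverse
print(backwards)            # !syad yppah
```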
&lt;br /&gt;
Since a string is an '''object''' (in the object-oriented programming sense of the word, but more of that another time...), we can call a number of methods that operate on it.  A selection includes:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.find(sub) || Returns the index of the first occurrence of the substring '''sub''', or -1 if it is not found&lt;br /&gt;
|-&lt;br /&gt;
|| s.islower() || Checks whether all cased characters in '''s''' are lowercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.upper() || Returns a copy of '''s''' converted to uppercase&lt;br /&gt;
|-&lt;br /&gt;
|| s.strip() || Returns a copy of '''s''' with leading and trailing whitespace removed&lt;br /&gt;
|-&lt;br /&gt;
|| s.replace(old,new) || Returns a copy of '''s''' with every occurrence of the substring '''old''' replaced by '''new'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.split([sep]) || Splits '''s''' into a list of substrings, using the (optional) '''sep''' as the delimiter; by default, it splits on runs of whitespace&lt;br /&gt;
|}&lt;br /&gt;
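&lt;br /&gt;
Strings cannot be changed in place, so methods such as '''upper()''', '''strip()''' and '''replace()''' return new strings and leave the original alone.  A short sketch exercising the methods in the table (the example strings are just illustrative):&lt;br /&gt;
&lt;br /&gt;
```python
s = "  Happy days!  "
t = s.strip()                      # 'Happy days!' (s itself is unchanged)
position = t.find("days")          # 6
missing = t.find("nights")         # -1, since the substring is absent
shout = t.upper()                  # 'HAPPY DAYS!'
lazy = t.replace("Happy", "Lazy")  # 'Lazy days!'
words = t.split()                  # ['Happy', 'days!']
fields = "1.0,2.5,3.7".split(",")  # ['1.0', '2.5', '3.7']
print(words)
```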
&lt;br /&gt;
==Lists and Tuples==&lt;br /&gt;
&lt;br /&gt;
An example of a list is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
shopping = ['bread', 'marmalade', 'milk', 'tea']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we can inquire about the length of that using the same function as before:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; len(shopping)&lt;br /&gt;
4&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can also take '''slices''' of a list, as we did with a string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; shopping[0:2]&lt;br /&gt;
['bread', 'marmalade']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and even replace a portion of the list by assigning to a slice:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
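&amp;gt;&amp;gt;&amp;gt; shopping[0:2] = ['bagels', 'jam']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; shopping&lt;br /&gt;
['bagels', 'jam', 'milk', 'tea']&lt;br /&gt;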
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since a list is also an object, we have more handy methods, including:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| s.append(x) || Appends a new element '''x''' to the end of '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.count(x) || Returns the number of occurrences of '''x''' in '''s'''&lt;br /&gt;
|-&lt;br /&gt;
|| s.reverse() || Reverses the items of '''s''' in place&lt;br /&gt;
|-&lt;br /&gt;
|| s.sort([compfunc]) || Sorts the items of '''s''' in place.  '''compfunc''' is an optional comparison function&lt;br /&gt;
|}&lt;br /&gt;
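&lt;br /&gt;
The in-place methods modify the list itself and return '''None''', so writing '''x = s.sort()''' leaves '''x''' set to '''None'''; the built-in '''sorted()''' returns a new sorted list instead.  A short sketch using the shopping list from above:&lt;br /&gt;
&lt;br /&gt;
```python
shopping = ['bread', 'marmalade', 'milk', 'tea']
shopping.append('eggs')           # ['bread', 'marmalade', 'milk', 'tea', 'eggs']
n_tea = shopping.count('tea')     # 1
shopping.reverse()                # ['eggs', 'tea', 'milk', 'marmalade', 'bread']
result = shopping.sort()          # sorts in place and returns None
print(shopping)                   # ['bread', 'eggs', 'marmalade', 'milk', 'tea']
ordered = sorted(['tea', 'jam'])  # a new list: ['jam', 'tea']
```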
&lt;br /&gt;
Tuples are very similar to lists and support many of the same operations (indexing, slicing, concatenation etc.) but differ in that they are '''not mutable''' after creation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple = ('fred', 'ginger', 7, 2.5)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist = ['fred', 'ginger', 7, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mylist[2] = 8&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mylist&lt;br /&gt;
['fred', 'ginger', 8, 2.5]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mytuple[2]    &lt;br /&gt;
7&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; mytuple[2] = 8&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: 'tuple' object does not support item assignment&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
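A '''list comprehension''' builds a new list from an existing sequence in a single expression, optionally filtering the elements.  Here we double every number smaller than 20:&lt;br /&gt;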
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; numbers = [12, 3, 90, 40, 52, 11, 10]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled = [number * 2 for number in numbers if number &amp;lt; 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; small_numbers_doubled&lt;br /&gt;
[24, 6, 22, 20]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
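&lt;br /&gt;
The comprehension is shorthand for a loop that builds up a list.  As a sketch, here is the same calculation written out longhand, with the filter expressed as a '''continue''' that skips the numbers which fail it:&lt;br /&gt;
&lt;br /&gt;
```python
numbers = [12, 3, 90, 40, 52, 11, 10]
small_numbers_doubled = []
for number in numbers:
    if number >= 20:
        continue                              # skip numbers that fail the filter
    small_numbers_doubled.append(number * 2)
print(small_numbers_doubled)                  # [24, 6, 22, 20]
```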
&lt;br /&gt;
==Dictionaries==&lt;br /&gt;
&lt;br /&gt;
A dictionary is an associative array or hash table, containing '''key-value''' pairs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
mydict = {'thomas':'blue', 'james':'red', 'henry':'green'}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print mydict['james']&lt;br /&gt;
red&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can write much more user-friendly and intuitive code using dictionaries, rather than arbitrary indexes into a list.&lt;br /&gt;
&lt;br /&gt;
Some example dictionary methods are:&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|| m.keys() || Returns a list of the keys in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m.items() || Returns a list of the (key,value) pairs in '''m'''&lt;br /&gt;
|-&lt;br /&gt;
|| m[k] = x || Sets m[k] to x&lt;br /&gt;
|-&lt;br /&gt;
|| m.update(b) || Adds the key-value pairs from dictionary '''b''' to '''m''', overwriting the values of any keys already present&lt;br /&gt;
|}&lt;br /&gt;
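&lt;br /&gt;
A short sketch of these methods, plus '''get()''', which returns a default value instead of raising a '''KeyError''' when a key is missing.  The names are just illustrative; note that in Python 2 the order of '''keys()''' is arbitrary, hence the call to '''sorted()''':&lt;br /&gt;
&lt;br /&gt;
```python
engines = {'thomas': 'blue', 'james': 'red', 'henry': 'green'}
engines['gordon'] = 'blue'                              # add a new key-value pair
engines.update({'james': 'crimson', 'percy': 'green'})  # overwrite 'james', add 'percy'
names = sorted(engines.keys())
for name in names:
    print(name + ' is ' + engines[name])
colour = engines.get('edward', 'unknown')               # no KeyError for a missing key
```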
&lt;br /&gt;
==Control Structures==&lt;br /&gt;
&lt;br /&gt;
Of course, we'll need conditionals and loops etc. to go beyond the simplest of scripts.  Here is an '''if-then-else''', python style:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
if sky == 'blue':&lt;br /&gt;
    birds_sing()&lt;br /&gt;
elif sky == 'black':&lt;br /&gt;
    birds_sleep()&lt;br /&gt;
else:&lt;br /&gt;
    pass  # do nothing&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and a classic '''for loop''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
for ii in range(1,10):&lt;br /&gt;
    print ii&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1&lt;br /&gt;
...&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
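&lt;br /&gt;
Note that '''range(1,10)''' stops one short of its second argument.  A sketch of a few other common loop patterns (the shopping list is just illustrative):&lt;br /&gt;
&lt;br /&gt;
```python
print(list(range(1, 10)))     # [1, 2, 3, 4, 5, 6, 7, 8, 9]: the end value 10 is excluded
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]: an optional third argument sets the step
shopping = ['bread', 'milk', 'tea']
# Loop directly over the items; enumerate() supplies a running index as well
for index, item in enumerate(shopping):
    print(str(index) + ': ' + item)
```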
&lt;br /&gt;
We'll also see a '''while loop''' shoehorned into the next example.&lt;br /&gt;
&lt;br /&gt;
In our control statements, we can use comparison operators such as '''==''', '''!=''', '''&amp;gt;''', '''&amp;lt;''', '''&amp;lt;=''' and '''&amp;gt;=''', and logical operators such as '''and''', '''or''' and '''not'''.&lt;br /&gt;
&lt;br /&gt;
==File Input and Output==&lt;br /&gt;
&lt;br /&gt;
Here's some code for printing the contents of a text file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;r&amp;quot;)&lt;br /&gt;
line = fp.readline()&lt;br /&gt;
while line:&lt;br /&gt;
    line = line.strip()&lt;br /&gt;
    print line&lt;br /&gt;
    line = fp.readline()&lt;br /&gt;
fp.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
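We could open a file for writing with the following (beware: mode '''w''' truncates any existing file of that name; use mode '''a''' to append to it instead):&lt;br /&gt;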
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
fp = open(&amp;quot;foo.txt&amp;quot;,&amp;quot;w&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
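&lt;br /&gt;
From Python 2.6 onwards, the '''with''' statement offers a tidier pattern: the file is closed automatically when the block ends, even if an error occurs, and a file object can be looped over line by line directly.  A sketch which writes a small '''foo.txt''' and then reads it back:&lt;br /&gt;
&lt;br /&gt;
```python
# Write a small file; it is closed automatically at the end of the block
with open("foo.txt", "w") as fp:
    fp.write("first line\n")
    fp.write("second line\n")

# Read it back, iterating over the file object one line at a time
lines = []
with open("foo.txt", "r") as fp:
    for line in fp:
        lines.append(line.strip())
        print(line.strip())
```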
&lt;br /&gt;
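==Command Line Parsing==&lt;br /&gt;
&lt;br /&gt;
The arguments given to a script on the command line are available to it as a list of strings, '''sys.argv''':&lt;br /&gt;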
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # We can test on the length of argv&lt;br /&gt;
    if len(sys.argv) &amp;lt; 2:&lt;br /&gt;
        print &amp;quot;usage: to use this script...&amp;quot;&lt;br /&gt;
    else:&lt;br /&gt;
        for ii, arg in enumerate(sys.argv):&lt;br /&gt;
            # (typically) argv[0] is bound to the script name&lt;br /&gt;
            print &amp;quot;arg&amp;quot;, ii, &amp;quot;is:&amp;quot;, arg&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py&lt;br /&gt;
usage: to use this script...&lt;br /&gt;
gethin@gethin-desktop:~$ ./cmdline.py fred ginger&lt;br /&gt;
arg 0 is: ./cmdline.py&lt;br /&gt;
arg 1 is: fred&lt;br /&gt;
arg 2 is: ginger&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
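&lt;br /&gt;
For anything beyond a couple of positional arguments, the '''argparse''' module (in the standard library from Python 2.7) handles options, type conversion and '''--help''' messages for you.  A minimal sketch; the function name '''parse_command_line''' and the options it defines are just illustrative:&lt;br /&gt;
&lt;br /&gt;
```python
import argparse

def parse_command_line(argv=None):
    """Parse argv (argparse uses sys.argv[1:] when argv is None) for a toy greeting script."""
    parser = argparse.ArgumentParser(description="Greet some people.")
    parser.add_argument("names", nargs="+", help="one or more names to greet")
    parser.add_argument("-n", "--repeat", type=int, default=1,
                        help="number of times to greet each person")
    return parser.parse_args(argv)

# An explicit list is passed here purely for demonstration
args = parse_command_line(["fred", "ginger", "-n", "2"])
for name in args.names:
    for _ in range(args.repeat):
        print("Hello, " + name)
```

In a real script you would call '''parse_command_line()''' with no argument so that the actual command line is used; running it with no names then prints a usage message and exits.&lt;br /&gt;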
&lt;br /&gt;
=Object Oriented Programming in Python=&lt;br /&gt;
&lt;br /&gt;
Here is an example of using a class in python:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
class Radio:&lt;br /&gt;
    &amp;quot;A simple radio&amp;quot;&lt;br /&gt;
    def __init__(self,freq=0.0,name=&amp;quot;&amp;quot;):&lt;br /&gt;
        &amp;quot;Constructor method&amp;quot;&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
        self.name=name&lt;br /&gt;
    def tune(self,freq):&lt;br /&gt;
        self.__frequency=freq&lt;br /&gt;
    def tuned_to(self):&lt;br /&gt;
        print self.name, &amp;quot;tuned to:&amp;quot;, self.__frequency&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # declare two radio instances&lt;br /&gt;
    car = Radio(name=&amp;quot;car&amp;quot;)&lt;br /&gt;
    kitchen = Radio(91.5,&amp;quot;kitchen&amp;quot;)&lt;br /&gt;
    # call some methods&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    kitchen.tuned_to()&lt;br /&gt;
    car.tune(89.3)&lt;br /&gt;
    car.tuned_to()&lt;br /&gt;
    # Docstrings--double quotes at the top of the class:                        &lt;br /&gt;
    print car.__doc__&lt;br /&gt;
    # NB members not private by default:&lt;br /&gt;
    print car.name&lt;br /&gt;
    # BUT leading double underscores trigger name mangling (the attribute is&lt;br /&gt;
    # stored as _Radio__frequency), so this lookup fails&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running the script gives us:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
car tuned to: 0.0&lt;br /&gt;
kitchen tuned to: 91.5&lt;br /&gt;
car tuned to: 89.3&lt;br /&gt;
A simple radio&lt;br /&gt;
car&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;./foo.py&amp;quot;, line 27, in &amp;lt;module&amp;gt;&lt;br /&gt;
    print car.__frequency&lt;br /&gt;
AttributeError: Radio instance has no attribute '__frequency'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
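&lt;br /&gt;
Name mangling is a safeguard against accidental name clashes (for instance, with attributes defined in a subclass) rather than true privacy: the attribute is still reachable under its mangled name.  A sketch, using a new-style class (one that inherits from '''object''') and a hypothetical accessor method '''frequency()''':&lt;br /&gt;
&lt;br /&gt;
```python
class Radio(object):
    "A simple radio"
    def __init__(self, freq=0.0, name=""):
        self.__frequency = freq   # mangled to self._Radio__frequency
        self.name = name

    def tune(self, freq):
        self.__frequency = freq

    def frequency(self):
        # hypothetical accessor: the usual way to expose a 'private' value
        return self.__frequency

car = Radio(name="car")
car.tune(89.3)
print(car.frequency())              # 89.3
print(car._Radio__frequency)        # 89.3: the mangled name still works
print(hasattr(car, "__frequency"))  # False
```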
&lt;br /&gt;
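=Python for Shell Scripting=&lt;br /&gt;
&lt;br /&gt;
The '''subprocess''' module lets a Python script run other programs, much as a shell script would.  For example, to run '''ls -l''':&lt;br /&gt;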
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from subprocess import call&lt;br /&gt;
call([&amp;quot;ls&amp;quot;, &amp;quot;-l&amp;quot;])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
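&lt;br /&gt;
'''call()''' returns the exit status of the command (0 conventionally means success), while '''check_output()''' (Python 2.7 onwards) captures the command's standard output instead of printing it.  A sketch; it runs the Python interpreter itself ('''sys.executable''') as the child process so that it doesn't depend on any particular shell commands being installed:&lt;br /&gt;
&lt;br /&gt;
```python
import subprocess
import sys

# call() waits for the command to finish and returns its exit status
status = subprocess.call([sys.executable, "-c", "import sys; sys.exit(3)"])
print(status)  # 3

# check_output() returns the command's standard output (bytes in Python 3)
# and raises CalledProcessError if the command exits with a non-zero status
output = subprocess.check_output([sys.executable, "-c", "print('hello from a child')"])
text = output.decode("utf-8").strip()
print(text)    # hello from a child
```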
&lt;br /&gt;
=Python as a Glue Language=&lt;br /&gt;
&lt;br /&gt;
* Calling R from python is possible using: http://rpy.sourceforge.net/index.html.&lt;br /&gt;
* Calling Matlab from python: http://mlabwrap.sourceforge.net.&lt;br /&gt;
* With SWIG you can make many bindings, including Python to C and C++: http://www.swig.org/.&lt;br /&gt;
* Or if Fortran is more your cup-of-tea, you can use f2py: http://www.scipy.org/F2py.&lt;br /&gt;
* There are many more examples.&lt;br /&gt;
&lt;br /&gt;
=Using Packages=&lt;br /&gt;
&lt;br /&gt;
Python packages are great because they provide us with a whole lot of extra functionality--above and beyond the core language--that we didn't have to write and debug ourselves.&lt;br /&gt;
&lt;br /&gt;
Let's walk through a simple example using a package.  At an interactive prompt type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from random import randint&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will give us access to the '''randint(x,y)''' function, which returns a randomly chosen integer from the given range [x,y]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
4&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
1&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OK, so far so good.  One thing to note is that the above '''import''' statement has drawn the name ''randint'' into our current '''namespace'''.  What if we had already defined a function named ''randint''?  The import would silently replace it.  To protect ourselves from this kind of problem, there are several variants of the import statement.&lt;br /&gt;
&lt;br /&gt;
If we import the module itself, its functions are placed in a namespace with the same name as the module, and to call them we have to prefix them with that namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random.randint(0,10)&lt;br /&gt;
6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Should we desire, we can apply a little more control and specify the namespace for the import ourselves: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; import random as rnd&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; rnd.randint(0,10)&lt;br /&gt;
3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another--more 'devil-may-care'--approach is to do away with the separate namespace and pull everything from a given package into the current namespace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import *&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
9&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; random()&lt;br /&gt;
0.3172268098313996&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(The '''random()''' function returns a randomly selected floating point number in the range [0, 1)--that is, between 0 and 1, including 0.0 but always smaller than 1.0.)&lt;br /&gt;
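&lt;br /&gt;
For scientific work it is often useful to make a 'random' run reproducible.  Seeding the generator with '''random.seed()''' does this: the same seed gives the same sequence of numbers (though not necessarily the same sequence across different Python versions).  A sketch, to be tried in a fresh session, since '''from random import *''' above rebinds the name '''random''' to the function rather than the module:&lt;br /&gt;
&lt;br /&gt;
```python
import random

random.seed(42)                                    # fix the starting state
first = [random.randint(0, 10) for _ in range(5)]
random.seed(42)                                    # reset to the same state...
second = [random.randint(0, 10) for _ in range(5)]
print(first == second)                             # True: ...so the sequence repeats
```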
&lt;br /&gt;
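==A Namespace Collision==&lt;br /&gt;
&lt;br /&gt;
The following session shows the problem in action: importing '''randint''' from '''random''' silently replaces the function of the same name that we had defined ourselves.&lt;br /&gt;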
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; def randint():&lt;br /&gt;
...     print &amp;quot;dummy function&amp;quot;&lt;br /&gt;
... &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
dummy function&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from random import randint&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint()&lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, line 1, in &amp;lt;module&amp;gt;&lt;br /&gt;
TypeError: randint() takes exactly 3 arguments (1 given)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; randint(0,10)&lt;br /&gt;
0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
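&lt;br /&gt;
Keeping the module's functions in their own namespace avoids the collision, so that both functions remain usable.  A sketch, in which our own '''randint''' is a hypothetical stand-in:&lt;br /&gt;
&lt;br /&gt;
```python
import random as rnd

def randint():
    """Our own function, which happens to share its name with random.randint."""
    return "dummy function"

print(randint())            # our function is untouched: dummy function
value = rnd.randint(0, 10)  # the library's version, reached via the rnd namespace
print(value)
```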
&lt;br /&gt;
=Databases=&lt;br /&gt;
&lt;br /&gt;
==Simple Databases==&lt;br /&gt;
&lt;br /&gt;
Python provides access to some databases through its standard library.  For example, the '''bsddb''' module allows you to access the popular '''Berkeley DB''' database from your Python code.  (Note that '''bsddb''' has been deprecated since Python 2.6 and is not included in Python 3.)&lt;br /&gt;
&lt;br /&gt;
The interface to the database provided by this module is very similar to the way in which we access a dictionary.  First, let's populate a database:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import bsddb&lt;br /&gt;
d = bsddb.btopen('engines.db')&lt;br /&gt;
d['thomas'] = 'blue'&lt;br /&gt;
d['james'] = 'red'&lt;br /&gt;
d['henry'] = 'green'&lt;br /&gt;
d.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now let's open the database again and query its contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d = bsddb.btopen('engines.db')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['henry', 'james', 'thomas']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.first()&lt;br /&gt;
('henry', 'green')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.last()&lt;br /&gt;
('thomas', 'blue')&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour = d['james']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; colour&lt;br /&gt;
'red'&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; del d['henry']&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; d.keys()&lt;br /&gt;
['james', 'thomas']&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
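&lt;br /&gt;
A similar dictionary-style store, available in both Python 2 and Python 3, is the standard '''shelve''' module; unlike '''bsddb''', it accepts any picklable Python object as a value, though keys must still be strings.  A sketch, where the filename '''engines_shelf''' is just illustrative (the underlying database may add its own file extension):&lt;br /&gt;
&lt;br /&gt;
```python
import shelve

db = shelve.open("engines_shelf", flag="n")  # "n": always start with a new, empty store
db['thomas'] = 'blue'
db['james'] = 'red'
db['henry'] = {'colour': 'green'}            # values may be any picklable object
db.close()

db = shelve.open("engines_shelf")            # reopen and query, much like a dictionary
names = sorted(db.keys())
colour = db['james']
del db['henry']
remaining = sorted(db.keys())
db.close()
print(names)
```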
&lt;br /&gt;
==Relational Databases==&lt;br /&gt;
&lt;br /&gt;
Relational databases give us more oomph.  '''SQLite''' is a useful relational database to consider as it is light, in that it requires hardly anything in terms of setup or management, yet still understands queries formulated in SQL.  As such it is useful for creating relatively simple examples of SQL access to a database in python and is a stepping stone toward more powerful database packages.&lt;br /&gt;
&lt;br /&gt;
Here is a script which will create a table called '''planets''' in the file '''pytest.db''' and populate it with details of the planets in our solar system (diameter, mass and orbital period, each relative to Earth):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
 &lt;br /&gt;
# create a table&lt;br /&gt;
cursor.execute(&amp;quot;&amp;quot;&amp;quot;CREATE TABLE planets&lt;br /&gt;
                  (Id INT, Name TEXT, Diameter REAL, &lt;br /&gt;
                   Mass REAL, Orbital_Period REAL)&amp;quot;&amp;quot;&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# insert a single record&lt;br /&gt;
cursor.execute(&amp;quot;INSERT INTO planets VALUES(1,'Mercury',0.382,0.06,0.24)&amp;quot;)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
 &lt;br /&gt;
# insert multiple records&lt;br /&gt;
other_planets = [(2,'Venus',0.949,0.82,0.72),&lt;br /&gt;
                 (3,'Earth',1.0,1.0,1.0),&lt;br /&gt;
                 (4,'Mars',0.532,0.11,1.52),&lt;br /&gt;
                 (5,'Jupiter',11.209,317.8,5.20),&lt;br /&gt;
                 (6,'Saturn',9.449,95.2,9.54),&lt;br /&gt;
                 (7,'Uranus',4.007,14.6,19.22),&lt;br /&gt;
                 (8,'Neptune',3.883,17.2,30.06),&lt;br /&gt;
                 (9,'Pluto',0.18,0.002,248.09)]&lt;br /&gt;
cursor.executemany(&amp;quot;INSERT INTO planets VALUES (?,?,?,?,?)&amp;quot;, other_planets)&lt;br /&gt;
conn.commit() # save data to file&lt;br /&gt;
&lt;br /&gt;
# delete a record&lt;br /&gt;
sql = &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
DELETE FROM planets&lt;br /&gt;
WHERE Name = 'Pluto'&lt;br /&gt;
&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
cursor.execute(sql)  # poor old pluto! &lt;br /&gt;
conn.commit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And here is a short example script showing a couple of ways to interrogate the database: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
#&lt;br /&gt;
# Example python script using sqlite3 package&lt;br /&gt;
# to connect to an SQLite database.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
import sqlite3&lt;br /&gt;
 &lt;br /&gt;
conn = sqlite3.connect('pytest.db') # or use :memory: to put it in RAM&lt;br /&gt;
&lt;br /&gt;
cursor = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
print(&amp;quot;All the records in the table, ordered by Name:\n&amp;quot;)&lt;br /&gt;
for row in cursor.execute(&amp;quot;SELECT rowid, * FROM planets ORDER BY Name&amp;quot;):&lt;br /&gt;
    print(row)&lt;br /&gt;
&lt;br /&gt;
print(&amp;quot;\n&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
print(&amp;quot;All the planets with a mass greater than or equal to that of Earth:\n&amp;quot;)&lt;br /&gt;
sql = &amp;quot;SELECT * FROM planets WHERE Mass&amp;gt;=?&amp;quot;&lt;br /&gt;
cursor.execute(sql, (1.0,))    # parameters go in a sequence: here a one-element tuple&lt;br /&gt;
for row in cursor.fetchall():  # or use fetchone()&lt;br /&gt;
    print(row)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The results of running the script (under Python 2; Python 3 prints the names without the u prefix) are:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
All the records in the table, ordered by Name:&lt;br /&gt;
&lt;br /&gt;
(3, 3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, 5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(4, 4, u'Mars', 0.53200000000000003, 0.11, 1.52)&lt;br /&gt;
(1, 1, u'Mercury', 0.38200000000000001, 0.059999999999999998, 0.23999999999999999)&lt;br /&gt;
(8, 8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
(6, 6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, 7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(2, 2, u'Venus', 0.94899999999999995, 0.81999999999999995, 0.71999999999999997)&lt;br /&gt;
&lt;br /&gt;
All the planets with a mass greater than or equal to that of Earth:&lt;br /&gt;
&lt;br /&gt;
(3, u'Earth', 1.0, 1.0, 1.0)&lt;br /&gt;
(5, u'Jupiter', 11.209, 317.80000000000001, 5.2000000000000002)&lt;br /&gt;
(6, u'Saturn', 9.4489999999999998, 95.200000000000003, 9.5399999999999991)&lt;br /&gt;
(7, u'Uranus', 4.0069999999999997, 14.6, 19.219999999999999)&lt;br /&gt;
(8, u'Neptune', 3.883, 17.199999999999999, 30.059999999999999)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
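&lt;br /&gt;
By default each row comes back as a tuple.  If you would rather refer to columns by name, '''sqlite3''' lets you set the connection's '''row_factory''' to '''sqlite3.Row'''.  The sketch below assumes Python 3 and uses a cut-down, hypothetical two-column table held in memory; it also passes the threshold as a number in a one-element tuple, rather than as a string:&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # rows can now be indexed by column name
conn.execute('CREATE TABLE planets (Name TEXT, Mass REAL)')
conn.executemany('INSERT INTO planets VALUES (?, ?)',
                 [('Mars', 0.11), ('Earth', 1.0), ('Jupiter', 317.8)])

# Parameters must be a sequence, hence the one-element tuple (1.0,)
rows = conn.execute('SELECT Name, Mass FROM planets WHERE Mass >= ? ORDER BY Mass',
                    (1.0,)).fetchall()
heavy = [(row['Name'], row['Mass']) for row in rows]
print(heavy)  # [('Earth', 1.0), ('Jupiter', 317.8)]

# fetchone() returns the first matching row (or None if nothing matches)
lightest_name = conn.execute('SELECT Name FROM planets ORDER BY Mass').fetchone()['Name']
print(lightest_name)  # Mars
conn.close()
```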
&lt;br /&gt;
For more information on using SQLite with Python, see, e.g.:&lt;br /&gt;
* http://zetcode.com/db/sqlitepythontutorial/&lt;br /&gt;
* http://www.blog.pythonlibrary.org/2012/07/18/python-a-simple-step-by-step-sqlite-tutorial/&lt;br /&gt;
&lt;br /&gt;
You can also connect to a MySQL database from python using, e.g. the [http://mysql-python.sourceforge.net/ python-mysqldb] package.  A snippet of python code for connecting to a database is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
import MySQLdb&lt;br /&gt;
&lt;br /&gt;
conn = MySQLdb.connect(host=&amp;quot;localhost&amp;quot;,   # your host, usually localhost&lt;br /&gt;
                       user=&amp;quot;gethin&amp;quot;,      # your username&lt;br /&gt;
                       passwd=&amp;quot;changeme&amp;quot;,  # your password&lt;br /&gt;
                       db=&amp;quot;menagerie&amp;quot;)     # name of the database&lt;br /&gt;
&lt;br /&gt;
# Create a cursor object, as before with SQLite&lt;br /&gt;
cur = conn.cursor()&lt;br /&gt;
&lt;br /&gt;
# and then you can submit your SQL command and fetch the results:&lt;br /&gt;
cur.execute(&amp;quot;SELECT * FROM YOUR_TABLE_NAME&amp;quot;)&lt;br /&gt;
for row in cur.fetchall():&lt;br /&gt;
    print(row)&lt;br /&gt;
&lt;br /&gt;
conn.close()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Numpy=&lt;br /&gt;
&lt;br /&gt;
OK, let's move on to looking at python's numerical processing capabilities.  We will start by looking at the '''numpy''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from numpy import *&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now that we have access to the functions from '''numpy''', let's create an array.  '''Note that a numpy array is an object of a different type to a built-in Python list'''.  A simple approach is to use the '''array''' function, which builds an array from a (nested) list.  For example we might enter:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = array([[1.0,0.0,0.0],[0.0,1.0,0.0],[0.0,0.0,1.0]])&lt;br /&gt;
b = array([[1,2,3],[4,5,6],[7,8,9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 1.,  0.,  0.],&lt;br /&gt;
       [ 0.,  1.,  0.],&lt;br /&gt;
       [ 0.,  0.,  1.]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b        &lt;br /&gt;
array([[1, 2, 3],&lt;br /&gt;
       [4, 5, 6],&lt;br /&gt;
       [7, 8, 9]])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; transpose(b)&lt;br /&gt;
array([[1, 4, 7],&lt;br /&gt;
       [2, 5, 8],&lt;br /&gt;
       [3, 6, 9]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
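&lt;br /&gt;
Note that '''a''' and '''b''' hold different element types: '''a''' was built from floating point numbers and '''b''' from integers.  numpy records this in the array's '''dtype''' attribute, and it matters because, for instance, a float assigned into an integer array is silently truncated.  A minimal sketch, assuming Python 3:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# dtype.kind is 'f' for floating point and 'i' for signed integers
print(a.dtype.kind, b.dtype.kind)  # f i

# Assigning a float into an integer array truncates it
b[0, 0] = 2.7
print(b[0, 0])  # 2

# astype() returns a converted copy that can hold fractional values
c = b.astype(float)
c[0, 0] = 2.7
print(c[0, 0])  # 2.7
```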
&lt;br /&gt;
Given an array, we may inquire about its shape:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
print(a.shape)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and we are told that it is a 2-dimensional array (i.e. an array of rank 2) and that the length of both dimensions is 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
(3, 3)&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
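&lt;br /&gt;
numpy calls the number of dimensions '''ndim''' (the rank in the sense used above), and '''size''' gives the total number of elements.  A quick check, using '''identity()''' to rebuild the same 3x3 array:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

a = np.identity(3)  # the same 3x3 identity array as above
print(a.shape)      # (3, 3)
print(a.ndim)       # 2: the number of dimensions
print(a.size)       # 9: total number of elements, the product of the shape
print(len(a))       # 3: len() only reports the length of the first dimension
```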
&lt;br /&gt;
We can also apply operators to array objects.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
a = a * 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
array([[ 9.,  0.,  0.],&lt;br /&gt;
       [ 0.,  9.,  0.],&lt;br /&gt;
       [ 0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note, however, that most operations on numpy arrays are done element-wise''', which is '''different to a linear algebra operation that you may have been expecting.'''  We will return to linear algebra operations when we look at the '''scipy''' package.&lt;br /&gt;
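&lt;br /&gt;
If you do want the matrix product rather than the element-wise one, numpy itself provides '''dot()'''.  A small sketch comparing the two, using a diagonal matrix like '''a''' above and the '''b''' array from earlier:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

a = np.identity(3) * 9
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

elementwise = a * b    # multiplies matching elements: only the diagonal survives
matrix = np.dot(a, b)  # the linear algebra product (a @ b in Python 3.5+)

print(elementwise)
print(matrix)          # equal to 9 * b, since a is 9 times the identity
```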
&lt;br /&gt;
Should we so desire, we could reshape the array.  One way to do this is to set its shape attribute directly:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (1,9)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a&lt;br /&gt;
array([[ 9.,  0.,  0.,  0.,  9.,  0.,  0.,  0.,  9.]])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
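&lt;br /&gt;
Setting '''shape''' changes the array in place.  The '''reshape()''' method instead returns a new array object with the requested shape (a view onto the same data where possible), and a '''-1''' lets numpy work out one dimension for you.  A short sketch:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

a = np.identity(3) * 9
flat = a.reshape(1, 9)      # new array object; a keeps its (3, 3) shape
print(a.shape, flat.shape)  # (3, 3) (1, 9)

col = a.reshape(-1, 1)      # -1 asks numpy to infer that dimension: (9, 1)
print(col.shape)

# For a contiguous array like this one, reshape returns a view sharing a's data
flat[0, 4] = 777.0
print(a[1, 1])              # 777.0
```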
&lt;br /&gt;
As with the list example, it can be useful to read or change the value of an element (or sub-array) individually.  Let's turn the array back to its rank-2 form and try it out:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a.shape = (3,3)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1,1] = 777.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(a)&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.    0.]&lt;br /&gt;
 [   0.    0.    9.]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; a[1:,1:] = [[777.0, 777.0],[777.0, 777.0]]&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(a)&lt;br /&gt;
[[   9.    0.    0.]&lt;br /&gt;
 [   0.  777.  777.]&lt;br /&gt;
 [   0.  777.  777.]]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
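&lt;br /&gt;
Slices such as '''a[1:,1:]''' are views onto the original array rather than copies, so changing a slice changes the array it came from; '''copy()''' gives an independent array.  The right-hand side of an assignment can also be a single number, which numpy broadcasts over the whole slice:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

a = np.zeros((3, 3))
block = a[1:, 1:]        # basic slicing returns a view onto a, not a copy
block[:] = 777.0         # a single value is broadcast to the whole 2x2 block
print(a)                 # the lower-right 2x2 corner of a is now 777.0

safe = a[1:, 1:].copy()  # an explicit copy no longer shares data with a
safe[:] = 0.0
print(a[2, 2])           # 777.0: changing the copy left a alone
```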
&lt;br /&gt;
This is all pretty handy so far, but specifying the value of each element explicitly could become a chore.  Happily, some helper functions exist to give you a head start with some building blocks.  For example, you can use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(b)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = ones((3,2))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(b)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; b = identity(2)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(b)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; big = resize(b, (6,6))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(big)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The use of '''resize''' in the last example illustrates a useful '''replicating feature'''.&lt;br /&gt;
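&lt;br /&gt;
'''resize()''' fills the new shape by repeating the elements of '''b''' in row-major order, so the pattern shifts along each row of '''big'''.  If you want whole copies of a block laid side by side, numpy's '''tile()''' may be closer to what you need.  (The '''resize()''' method of an array object behaves differently again: it pads with zeros rather than repeating.)  A short comparison:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

b = np.identity(2)
big = np.resize(b, (6, 6))  # repeats b's 4 elements, in order, to fill 36 slots
tiled = np.tile(b, (3, 3))  # places 3 x 3 whole copies of the 2x2 block

print(big)    # rows alternate between [1 0 0 1 1 0] and [0 1 1 0 0 1]
print(tiled)  # rows alternate between [1 0 1 0 1 0] and [0 1 0 1 0 1]
```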
&lt;br /&gt;
A list of all the functions and operations contained within numpy is: http://scipy.org/Numpy_Example_List.&lt;br /&gt;
&lt;br /&gt;
=Pylab and Matplotlib=&lt;br /&gt;
&lt;br /&gt;
The above examples are quite natty, but we have deliberately kept the array sizes small so that we can print the element values easily.  In practice, you may find that your array sizes are much larger and printing the values to the screen is impractical.  Fear not!  Python has many packages which help you plot your data, so that you can explore it.&lt;br /&gt;
&lt;br /&gt;
Using the pylab plotting interface, we can plot a few curves and save the figure to a file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import pylab&lt;br /&gt;
from numpy import arange, pi, cos, sin&lt;br /&gt;
t = arange(0.0, 3.0, 0.01)&lt;br /&gt;
c = cos(2 * pi * t)&lt;br /&gt;
s = sin(2 * pi * t)&lt;br /&gt;
pylab.ylabel('some numbers')&lt;br /&gt;
pylab.xlabel('some more numbers')&lt;br /&gt;
pylab.plot(t, c, 'r', lw=2)&lt;br /&gt;
pylab.plot(t, s, 'b', lw=2)&lt;br /&gt;
pylab.plot(t, c-s, 'gs', lw=2)&lt;br /&gt;
pylab.ylim(-1.5, 1.5)&lt;br /&gt;
pylab.title('sin and cos functions')&lt;br /&gt;
pylab.savefig('curves', dpi=300)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where '''curves.png''' looks like:&lt;br /&gt;
&lt;br /&gt;
[[Image:Curves.png|thumb|600px|none|Some nice curves]]&lt;br /&gt;
&lt;br /&gt;
You can open .png images from the Linux command line (including on BlueCrystal) using, e.g., ImageMagick's '''display''' command: '''display -resize 1000 curves.png''' &lt;br /&gt;
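&lt;br /&gt;
If you are working on a machine without a display, such as a remote login to a cluster, you can tell matplotlib to use its non-interactive '''Agg''' backend before importing '''pyplot''', and '''savefig()''' will still write the image.  The sketch below redraws the same curves that way, using the object-oriented '''fig, ax''' interface (matplotlib's own documentation now recommends '''pyplot''' over '''pylab''').  The legend labels and the temporary output location are illustrative choices:&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 3.0, 0.01)
c = np.cos(2 * np.pi * t)
s = np.sin(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, c, 'r', lw=2, label='cos')
ax.plot(t, s, 'b', lw=2, label='sin')
ax.plot(t, c - s, 'gs', label='cos - sin')  # 'gs' draws green square markers
ax.set_xlabel('some more numbers')
ax.set_ylabel('some numbers')
ax.set_ylim(-1.5, 1.5)
ax.set_title('sin and cos functions')
ax.legend()

# Write into a fresh temporary directory rather than the current one
outfile = os.path.join(tempfile.mkdtemp(), 'curves.png')
fig.savefig(outfile, dpi=100)
plt.close(fig)
print(os.path.getsize(outfile) > 0)  # True
```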
&lt;br /&gt;
We can also use Matplotlib directly for more control:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import matplotlib.pyplot as plt&lt;br /&gt;
from numpy import arange, meshgrid, sin, sqrt&lt;br /&gt;
x = arange(-5,10)&lt;br /&gt;
y = arange(-4,11)&lt;br /&gt;
X, Y = meshgrid(x,y)   # X[i,j] is x[j] and Y[i,j] is y[i]&lt;br /&gt;
R = sqrt(X**2 + Y**2)&lt;br /&gt;
Z = sin(R)/R           # 0/0 at the origin gives nan (with a warning), which contour leaves out&lt;br /&gt;
plt.figure()&lt;br /&gt;
plt.contour(X,Y,Z)&lt;br /&gt;
plt.show()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and you should get a window similar to:&lt;br /&gt;
&lt;br /&gt;
[[Image:Sinc-matplotlib-contour.png|thumb|600px|none|A contour map of the sinc function]]&lt;br /&gt;
&lt;br /&gt;
Perhaps the best next step with matplotlib is to look at the gallery: http://matplotlib.org/gallery.html.&lt;br /&gt;
Just click on a figure and you will get the code used to generate it--a really great resource!&lt;br /&gt;
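&lt;br /&gt;
For '''contour(X, Y, Z)''' to put each value in the right place, '''Z[i, j]''' has to be the value at the point '''(X[i, j], Y[i, j])'''.  This numpy-only sketch shows the default layout that '''meshgrid()''' produces: rows follow y and columns follow x, which is easy to get transposed when Z is built from x and y separately:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

x = np.arange(-5, 10)  # 15 values
y = np.arange(-4, 11)  # 15 values
X, Y = np.meshgrid(x, y)

# Default ('xy') indexing: X[i, j] == x[j] and Y[i, j] == y[i]
print(X.shape)             # (15, 15)
print(X[0, :3], Y[:3, 0])  # [-5 -4 -3] [-4 -3 -2]

# With a y of different length, the (len(y), len(x)) ordering becomes visible
X2, Y2 = np.meshgrid(x, np.arange(3))
print(X2.shape)            # (3, 15)
```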
&lt;br /&gt;
==Input and Output==&lt;br /&gt;
&lt;br /&gt;
The foregoing is all very interesting, but life would be rather dull if you had to re-enter all your data by hand whenever you set to work with Python and numpy.  Therefore we need a means to save data to a file and load it again.  Happily, we can do this rather easily using a couple of routines from the '''pylab''' package:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import *&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from pylab import load&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from pylab import save&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; data = zeros((3,3))&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; save('myfile.txt', data)&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; read_data = load(&amp;quot;myfile.txt&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
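'''Warning:''' in the above example, pylab's '''load()''' shadows numpy's own '''load()''' function.  One way to protect yourself against this is to make use of '''namespaces''': modify your import command to '''import pylab''' and then use '''pylab.load(..)'''.  Note also that these text-file '''load()''' and '''save()''' routines come from older releases of matplotlib; recent releases no longer provide these text-file versions, and numpy's '''savetxt()''' and '''loadtxt()''' do the same job.&lt;br /&gt;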
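&lt;br /&gt;
A minimal sketch of saving and reloading an array with numpy's text-file routines, assuming a reasonably recent numpy; the file names and header text are illustrative.  '''savetxt()''' writes one array row per line and '''loadtxt()''' skips lines starting with '''#''', so a header written by '''savetxt()''' is read back as a comment.  numpy's binary '''save()''' and '''load()''' are also shown, since they give exact round trips and are faster for large arrays:&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile

import numpy as np

data = np.arange(9.0).reshape(3, 3)

# Plain text: human-readable, one array row per line
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
np.savetxt(path, data, fmt='%.6f', header='three columns of example data')
read_data = np.loadtxt(path)  # the '# ...' header line is skipped as a comment
print(np.array_equal(data, read_data))  # True

# Binary .npy: np.save appends '.npy' when the name does not already end with it
npy_path = os.path.join(tempfile.mkdtemp(), 'myfile')
np.save(npy_path, data)
reloaded = np.load(npy_path + '.npy')
print(np.array_equal(data, reloaded))  # True
```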
&lt;br /&gt;
=Scipy=&lt;br /&gt;
&lt;br /&gt;
* http://www.scipy.org/&lt;br /&gt;
* Good examples at http://scipy-lectures.github.com/intro/scipy.html&lt;br /&gt;
* Many useful features:&lt;br /&gt;
** Integration &amp;amp; Differentiation&lt;br /&gt;
** Optimisation (curve fitting, etc.)&lt;br /&gt;
** Fourier transforms&lt;br /&gt;
** Signal processing&lt;br /&gt;
** Statistical algorithms&lt;br /&gt;
** Much, much more...&lt;br /&gt;
* If you know Python you can use SciPy&lt;br /&gt;
&lt;br /&gt;
==An example: Differentiation==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # derivative of x^2 at x=3&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from scipy.misc import derivative&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2, 3)&lt;br /&gt;
6.0&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; # also works with arrays&lt;br /&gt;
...&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; from numpy import array&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; my_array = array([1,2,3])&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; derivative(lambda x: x**2,my_array)&lt;br /&gt;
array([ 2., 4., 6.])&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
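&lt;br /&gt;
'''scipy.misc.derivative''' was deprecated in SciPy 1.10 and is not available in current releases.  With its default step '''dx=1.0''' it used a central difference, (f(x+dx) - f(x-dx))/(2 dx), which happens to be exact for a quadratic, hence the tidy 6.0 above.  A hand-written version works on scalars and numpy arrays alike; the name '''central_difference''' and the default step below are hypothetical choices:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np


def central_difference(f, x0, dx=1e-3):
    # Central difference: for smooth f the error shrinks roughly like dx**2
    return (f(x0 + dx) - f(x0 - dx)) / (2.0 * dx)


d_scalar = central_difference(lambda x: x**2, 3.0)
d_array = central_difference(lambda x: x**2, np.array([1.0, 2.0, 3.0]))
d_sin = central_difference(np.sin, 0.0)  # should be close to cos(0) = 1

print(d_scalar, d_array, d_sin)
```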
&lt;br /&gt;
Google for many more examples pertaining to your favourite numerical procedure!&lt;br /&gt;
&lt;br /&gt;
=A Repository of Packages You Could Use=&lt;br /&gt;
&lt;br /&gt;
Now, we've touched on a couple, but there are thousands of python packages available.  Before you start writing your own function for X, check whether someone has already contributed code for it at http://pypi.python.org/pypi.&lt;br /&gt;
&lt;br /&gt;
=Writing Faster Python=&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB and R, one of the simplest ways in which you can write faster python code is to eliminate loops by vectorising your code.&lt;br /&gt;
&lt;br /&gt;
Consider the following two scripts.  First '''for-loop.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def filter(arr):&lt;br /&gt;
    for i, val in enumerate(arr):&lt;br /&gt;
        if val &amp;lt; 0.5:&lt;br /&gt;
            arr[i] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    filter(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and secondly, '''vectorised.py''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
arr = np.random.rand(1000000)&lt;br /&gt;
&lt;br /&gt;
def filter(arr):&lt;br /&gt;
    arr[arr &amp;lt; 0.5] = 0&lt;br /&gt;
    return arr&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    filter(arr)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we now run these two scripts through the Linux command line '''time''' utility, we see that the vectorised code runs roughly eight times faster than the for loop, even though these timings also include starting Python and creating the array:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ time ./for-loop.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.963s&lt;br /&gt;
user	0m0.952s&lt;br /&gt;
sys	0m0.012s&lt;br /&gt;
gethin@gethin-desktop:~$ time ./vectorised.py &lt;br /&gt;
&lt;br /&gt;
real	0m0.116s&lt;br /&gt;
user	0m0.096s&lt;br /&gt;
sys	0m0.020s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
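&lt;br /&gt;
The timings above also include starting Python, importing numpy and building the array.  To time just the function calls, the standard-library '''timeit''' module can be used, as in the sketch below.  The function names are hypothetical (they also avoid shadowing Python's built-in '''filter()'''), the array is smaller to keep the loop quick, and the vectorised version uses '''np.where()''', which returns a new array rather than modifying its input in place.  Both versions set every value below 0.5 to zero:&lt;br /&gt;
&lt;br /&gt;
```python
import timeit

import numpy as np

rng = np.random.RandomState(42)  # fixed seed so runs are repeatable
data = rng.rand(100000)


def zero_small_loop(arr):
    arr = arr.copy()  # work on a copy so repeated timings see the same input
    for i, val in enumerate(arr):
        if 0.5 > val:  # i.e. val is below 0.5
            arr[i] = 0.0
    return arr


def zero_small_vectorised(arr):
    # Keep arr where the condition holds, 0.0 elsewhere, all in compiled code
    return np.where(arr >= 0.5, arr, 0.0)


# Check that both versions agree before comparing their speed
same = np.array_equal(zero_small_loop(data), zero_small_vectorised(data))

# timeit repeats each call and excludes the import and set-up costs
t_loop = timeit.timeit(lambda: zero_small_loop(data), number=3)
t_vec = timeit.timeit(lambda: zero_small_vectorised(data), number=3)
print(same, 'loop: %.3f s, vectorised: %.3f s' % (t_loop, t_vec))
```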
&lt;br /&gt;
For some more tips on writing faster python code, and examples of how to use one of the python profiler modules, take a look at:&lt;br /&gt;
* https://wiki.python.org/moin/PythonSpeed/PerformanceTips&lt;br /&gt;
* http://technicaldiscovery.blogspot.co.uk/2011/06/speeding-up-python-numpy-cython-and.html&lt;br /&gt;
* http://www.huyng.com/posts/python-performance-analysis/&lt;br /&gt;
* http://www.appneta.com/2012/05/21/profiling-python-performance-lineprof-statprof-cprofile/&lt;br /&gt;
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* http://docs.python.org/tutorial/&lt;br /&gt;
* http://wiki.python.org/moin/PythonBooks&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9377</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9377"/>
		<updated>2013-11-28T17:06:24Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Binning Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' would otherwise be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R treats a single number as a vector of length one, and the '[1]' indicates that the line in question is displaying the vector's values starting at the first index.  We can use the handy '''seq()''' (sequence) function to create a vector containing more than one element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
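Note that '''&amp;lt;-''' is R's usual assignment operator.  Inside a function call, as in '''seq(from=1, to=67, by=2)''', the '''=''' sign instead matches values to named arguments, although at the top level '''=''' can also be used for assignment.&lt;br /&gt;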
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; coords.bris &amp;lt;- c(51.5, 2.6)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c''' function combines the arguments given in the parentheses into a vector, which '''matrix()''' fills column by column.  We can access portions of the matrix using the syntax shown in the square brackets.  For example, we can access the first row using the '''[1,]''' notation, and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which have more than two dimensions, and '''lists''' to hold heterogeneous collections.&lt;br /&gt;
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access its items in several ways.  Single brackets return a sub-list, whereas double brackets (or the '''$''' operator) return the item itself:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
    country  gold  silver  bronze&lt;br /&gt;
1       USA    46      29      29&lt;br /&gt;
2     China    38      27      23&lt;br /&gt;
3        GB    29      17      19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
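We can access columns of a data frame using the '''$''' operator (in versions of R before 4.0, '''data.frame()''' converts character vectors to factors by default, hence the '''Levels''' line in the output below):&lt;br /&gt;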
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One reason for R's popularity is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  Here, one shows the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
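R includes a very useful help facility, reached by typing '''?''' before a function's name.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ; '''example(filled.contour)''' will run that example for you:&lt;br /&gt;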
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots, complete with the R code required to create them (at the bottom of each page, after the comments), on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body is a single expression, but add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The arguments are matched by position unless they are named, in which case they can be given in any order. &lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check on the contents of a function by just typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or just check the arguments, using the '''args''' function.  This returns a function with the same arguments but an empty body, which is why '''NULL''' is printed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Packages are listed on CRAN: http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
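Let's install the '''multicore''' package, which gives us access to functions within R that will run on the multiple processors we often find in our computers these days.  (The multicore package has since been withdrawn from CRAN; its '''mclapply()''' function now lives in the '''parallel''' package, which ships with R from version 2.14.0, so on a recent R you can skip the installation and use '''library(parallel)''' instead.)   &lt;br /&gt;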
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library, and so available to load into our workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load (attach) one into our session, we type e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, an example of using a function from the multicore package.  The '''lapply''' function, which is included in the standard R core, will map a given function over a list of inputs, giving a list of the function outputs in return.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
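Now, we can do the same work in parallel using '''mclapply()''', which relies on forking the R process and so, on Windows, can only use a single core:&lt;br /&gt;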
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
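&lt;br /&gt;
A minimal sketch using the '''parallel''' package (included with R), which provides the same '''mclapply''' function: it checks that the parallel map returns the same list as '''lapply'''.  The use of two cores is an arbitrary choice, and since '''mclapply''' works by forking processes, which MS Windows does not support, the sketch falls back to a single core there:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# Fork-based mclapply cannot use more than one core on MS Windows&lt;br /&gt;
n.cores &amp;lt;- if (.Platform$OS.type == &amp;quot;windows&amp;quot;) 1L else 2L&lt;br /&gt;
serial &amp;lt;- lapply(1:3, function(x) {x^2})&lt;br /&gt;
in.parallel &amp;lt;- mclapply(1:3, function(x) {x^2}, mc.cores = n.cores)&lt;br /&gt;
# Both approaches give the same list of squares&lt;br /&gt;
identical(serial, in.parallel)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;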
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  Perhaps the simplest function for reading them is '''read.table()''', which suits data organised in a file like a table with column headings.  Suppose I have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is then a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the file '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
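&lt;br /&gt;
The first, unnamed column holds the data frame's row names.  A minimal sketch (using a self-contained, three-country copy of the table and an example file name, '''medals-no-rownames.csv'''): passing '''row.names = FALSE''' to '''write.csv()''' leaves that column out, and '''read.csv()''' reads the file straight back into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;Great Britain&amp;quot;),&lt;br /&gt;
                     gold = c(46, 38, 29), silver = c(29, 27, 17),&lt;br /&gt;
                     bronze = c(29, 23, 19), stringsAsFactors = FALSE)&lt;br /&gt;
# Omit the row names, so the file holds just the four named columns&lt;br /&gt;
write.csv(medals, &amp;quot;medals-no-rownames.csv&amp;quot;, row.names = FALSE)&lt;br /&gt;
readLines(&amp;quot;medals-no-rownames.csv&amp;quot;)&lt;br /&gt;
# Read it back in; the column names come from the header line&lt;br /&gt;
medals.in &amp;lt;- read.csv(&amp;quot;medals-no-rownames.csv&amp;quot;, stringsAsFactors = FALSE)&lt;br /&gt;
medals.in&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;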
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store R objects in a binary file (gzip-compressed by default):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
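&lt;br /&gt;
Note that '''load()''' restores objects under the names they were saved with, rather than returning them for assignment.  A minimal sketch (the object '''medal.counts''' and file '''counts.RData''' are just examples): we save a vector, remove it from the workspace and load it back; '''load()''' invisibly returns the names of the objects it restored:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medal.counts &amp;lt;- c(gold = 46, silver = 29, bronze = 29)&lt;br /&gt;
save(medal.counts, file = &amp;quot;counts.RData&amp;quot;)&lt;br /&gt;
# Remove the object, so that it no longer exists in the workspace&lt;br /&gt;
rm(medal.counts)&lt;br /&gt;
exists(&amp;quot;medal.counts&amp;quot;)&lt;br /&gt;
# load() brings it back under its original name&lt;br /&gt;
restored &amp;lt;- load(&amp;quot;counts.RData&amp;quot;)&lt;br /&gt;
restored&lt;br /&gt;
medal.counts&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;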
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files (it has since been superseded on CRAN by the '''ncdf4''' package).  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide these; note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
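&lt;br /&gt;
'''sort''' returns the sorted values themselves.  To sort the rows of a data frame by one of its columns, '''order''' is the usual tool: it returns the permutation of row indices that would sort the column.  A minimal sketch, using a small, self-contained copy of the medals table:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;Great Britain&amp;quot;),&lt;br /&gt;
                     silver = c(29, 27, 17), stringsAsFactors = FALSE)&lt;br /&gt;
# decreasing = TRUE reverses the sort order&lt;br /&gt;
sort(medals$silver, decreasing = TRUE)&lt;br /&gt;
# order() gives the row indices that sort the silver column (here 3, 2, 1)&lt;br /&gt;
order(medals$silver)&lt;br /&gt;
# ... which we can use to reorder whole rows of the data frame&lt;br /&gt;
medals[order(medals$silver), ]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;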
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
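&lt;br /&gt;
Two related idioms, sketched below: calling '''set.seed''' first makes a sequence of 'random' draws reproducible, and sampling without replacement (the default) gives a random permutation, which is a handy way to split data into training and test sets.  The seed value and the 3-to-2 split are arbitrary choices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
# The same seed gives the same draws&lt;br /&gt;
set.seed(42)&lt;br /&gt;
first &amp;lt;- sample(railway.engines, 3)&lt;br /&gt;
set.seed(42)&lt;br /&gt;
second &amp;lt;- sample(railway.engines, 3)&lt;br /&gt;
identical(first, second)&lt;br /&gt;
# Without replacement, sample() returns a random permutation of its input&lt;br /&gt;
shuffled &amp;lt;- sample(railway.engines)&lt;br /&gt;
# Use a random subset of positions for training and the rest for testing&lt;br /&gt;
train.idx &amp;lt;- sample(length(railway.engines), 3)&lt;br /&gt;
train &amp;lt;- railway.engines[train.idx]&lt;br /&gt;
test &amp;lt;- railway.engines[-train.idx]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;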
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames (which must have the same column names):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;gt; plot(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Plotting the data couldn't be simpler: '''plot(bins)''' draws a bar chart of the number of values in each bin.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
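&lt;br /&gt;
Rather than asking for a number of equal-width bins, you can give '''cut''' the break points and labels yourself, and count the values falling in each bin with '''table'''.  A minimal sketch; the break points and labels are arbitrary choices for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
# Intervals are closed on the right by default: (80,85] and (85,90]&lt;br /&gt;
bins &amp;lt;- cut(girls_2, breaks = c(80, 85, 90), labels = c(&amp;quot;up to 85&amp;quot;, &amp;quot;over 85&amp;quot;))&lt;br /&gt;
# Counts per bin: 3 values up to 85, and 7 over 85&lt;br /&gt;
table(bins)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;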
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
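&lt;br /&gt;
The fitted model object holds much more than the line.  A brief sketch of some standard accessors: '''coef''' gives the intercept and slope, '''summary''' adds standard errors and R-squared, and '''predict''' evaluates the fitted line at new speeds (the speeds 10 and 20 are arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res &amp;lt;- lm(dist ~ speed, data = cars)&lt;br /&gt;
# Intercept and slope of the fitted line (roughly -17.58 and 3.93)&lt;br /&gt;
coef(res)&lt;br /&gt;
summary(res)&lt;br /&gt;
# Predicted stopping distances for two new speeds&lt;br /&gt;
predict(res, newdata = data.frame(speed = c(10, 20)))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;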
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
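* You may wish to compare different methods of estimation.  From the MASS package, you can fit a line with the robust '''rlm''' and resistant '''lqs''' functions.  Having stored the fits, you can plot all the lines against the data using:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; res.lm &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm &amp;lt;- rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs &amp;lt;- lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;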
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(..., weights=...)'''.  If given, the function fits the line by weighted least squares, i.e. it minimises the sum of the squared residuals, each multiplied by its weight.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights (the cars data set has 50 observations):&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector, length 50, where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
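&lt;br /&gt;
As a starting point for the exercise, a minimal sketch using the hint vectors above: a constant weight vector such as '''w1''' gives exactly the ordinary least-squares line (only the relative sizes of the weights matter), whereas up-weighting a single point, or using the increasing weights '''w2''', pulls the line towards the favoured observations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res &amp;lt;- lm(dist ~ speed, data = cars)&lt;br /&gt;
w1 &amp;lt;- rep(0.1, 50)&lt;br /&gt;
# Constant weights leave the fitted coefficients unchanged&lt;br /&gt;
res.w1 &amp;lt;- lm(dist ~ speed, data = cars, weights = w1)&lt;br /&gt;
all.equal(coef(res), coef(res.w1))&lt;br /&gt;
# Give the first observation much more influence than the rest&lt;br /&gt;
w1[1] &amp;lt;- 10&lt;br /&gt;
res.w1b &amp;lt;- lm(dist ~ speed, data = cars, weights = w1)&lt;br /&gt;
# cars is ordered by speed, so increasing weights favour the faster cars&lt;br /&gt;
w2 &amp;lt;- seq(from = 0.02, to = 1.0, by = 0.02)&lt;br /&gt;
res.w2 &amp;lt;- lm(dist ~ speed, data = cars, weights = w2)&lt;br /&gt;
plot(cars)&lt;br /&gt;
abline(res, lty = 1)&lt;br /&gt;
abline(res.w1b, lty = 2)&lt;br /&gt;
abline(res.w2, lty = 3)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;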
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
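&lt;br /&gt;
The test results are returned as objects whose components can be used in further calculations.  A short sketch: extracting the p-value and confidence interval and, for comparison, running '''t.test''' with its default '''var.equal=FALSE''', which gives Welch's t-test and does not assume equal variances:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2 &amp;lt;- c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
res &amp;lt;- t.test(boys_2, girls_2, var.equal = TRUE, paired = FALSE)&lt;br /&gt;
# Components of the returned &amp;quot;htest&amp;quot; object&lt;br /&gt;
res$p.value&lt;br /&gt;
res$conf.int&lt;br /&gt;
# Welch's two-sample t-test (the default)&lt;br /&gt;
welch &amp;lt;- t.test(boys_2, girls_2)&lt;br /&gt;
welch$p.value&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;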
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set, available in R both as the data frame '''iris''' and as the 3-dimensional array '''iris3''' used below, gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
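&lt;br /&gt;
The diagonal of this confusion table counts the correct predictions, so the overall accuracy is the sum of the diagonal divided by the number of test cases: (23 + 25 + 22) / 75, or about 93% here.  A small sketch that computes this directly (the '''set.seed''' call is our addition, so that the random tie-breaking in '''knn''' is reproducible; your table may therefore differ slightly from the one above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
set.seed(1)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
predicted &amp;lt;- knn(train, test, cl, k = 3)&lt;br /&gt;
confusion &amp;lt;- table(predicted = predicted, actual = cl)&lt;br /&gt;
# Proportion of test cases on the diagonal, i.e. classified correctly&lt;br /&gt;
accuracy &amp;lt;- sum(diag(confusion)) / sum(confusion)&lt;br /&gt;
accuracy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;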
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
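&lt;br /&gt;
To see how well the first tree reproduces the data it was grown from, a minimal sketch using '''predict''' with '''type = &amp;quot;class&amp;quot;'''.  Note that this is accuracy on the training data, which will flatter the model compared with its performance on unseen cases:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
# Predicted class (absent/present) for each of the 81 children&lt;br /&gt;
predicted &amp;lt;- predict(fit, type = &amp;quot;class&amp;quot;)&lt;br /&gt;
table(predicted = predicted, actual = kyphosis$Kyphosis)&lt;br /&gt;
# Proportion classified correctly&lt;br /&gt;
mean(predicted == kyphosis$Kyphosis)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;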
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
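&lt;br /&gt;
Note that '''array''' (like '''matrix''') fills its elements column by column, so the first column of '''A''' above is (1, 3, 2).  A short sketch that builds the same matrix row by row with '''matrix(..., byrow = TRUE)''' and checks the solution by multiplying back with '''%*%''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# The same coefficients as above, entered one equation (row) at a time&lt;br /&gt;
A &amp;lt;- matrix(c(1, 3, -2,&lt;br /&gt;
              3, 5,  6,&lt;br /&gt;
              2, 4,  3), nrow = 3, byrow = TRUE)&lt;br /&gt;
b &amp;lt;- c(5, 7, 8)&lt;br /&gt;
x &amp;lt;- solve(A, b)&lt;br /&gt;
x&lt;br /&gt;
# A %*% x should reproduce b (up to rounding error)&lt;br /&gt;
all.equal(as.vector(A %*% x), b)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;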
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll switch tack and focus on a question commonly asked by those beginning to use R in anger: '''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''  In this section we'll consider the related tasks of finding which parts of your R code are responsible for the majority of the run-time, and what you can do about it.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have a gift(!) for constantly surprising us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it infinitely fast would save only 10%, and we would have sweated for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time the application of a function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
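&lt;br /&gt;
For instance, a minimal sketch timing the sorting of a million random numbers (an arbitrary stand-in for '''some.function()'''); '''system.time''' returns an object of class '''proc_time''', from which the wall-clock time can be extracted:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
timing &amp;lt;- system.time(sorted &amp;lt;- sort(runif(1e6)))&lt;br /&gt;
timing&lt;br /&gt;
# Elapsed (wall-clock) time in seconds&lt;br /&gt;
timing[[&amp;quot;elapsed&amp;quot;]]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;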
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)   # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each of 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Map a smoothing function over the list (in serial)&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which I ran by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The profile tells us that the function '''simpleLoess''' (called by '''loess.smooth''') takes 88% of the run-time, whereas '''rnorm''' takes only 4%.&lt;br /&gt;
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
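&lt;br /&gt;
A slightly more idiomatic way to pre-allocate is to create a vector of the right length and type directly, with '''numeric(n)'''.  The sketch below (the function name '''f3''' is ours) also returns the vector, which '''f1''' and '''f2''' do not, so that we can check the result:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
f3 &amp;lt;- function(n = 30000) {&lt;br /&gt;
  # numeric(n) creates a vector of n zeros in a single allocation&lt;br /&gt;
  v &amp;lt;- numeric(n)&lt;br /&gt;
  for (i in seq_len(n))&lt;br /&gt;
    v[i] &amp;lt;- i^2&lt;br /&gt;
  v&lt;br /&gt;
}&lt;br /&gt;
system.time(v3 &amp;lt;- f3())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;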
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R will accept whole vectors as input, rather than just single values, and this may allow you to avoid a loop altogether.  The examples in the previous section used '''for''' loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in about half the time the pre-allocated loop took to process 30,000!&lt;br /&gt;
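&lt;br /&gt;
The same idea extends to reductions: many summary functions, such as '''sum''', take a whole vector at once.  A small sketch comparing an explicit loop with the vectorised equivalent for a sum of squares (the function name '''loop.sum''' is ours, and timings will vary between machines, so none are shown):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
x &amp;lt;- runif(1000000)&lt;br /&gt;
# Loop version: accumulate one element at a time&lt;br /&gt;
loop.sum &amp;lt;- function(x) {&lt;br /&gt;
  total &amp;lt;- 0&lt;br /&gt;
  for (xi in x)&lt;br /&gt;
    total &amp;lt;- total + xi^2&lt;br /&gt;
  total&lt;br /&gt;
}&lt;br /&gt;
system.time(s1 &amp;lt;- loop.sum(x))&lt;br /&gt;
# Vectorised version: square the whole vector, then sum it&lt;br /&gt;
system.time(s2 &amp;lt;- sum(x^2))&lt;br /&gt;
# The results agree up to floating-point rounding&lt;br /&gt;
all.equal(s1, s2)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;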
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as bluecrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the use of the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And used the following submission script to run it on bluecrystal phase2:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
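The '''parallel''' package, which is included with R (since version 2.14.0), is an amalgamation of functionality from the multicore and snow packages.  Its snow-derived functions, such as '''makeCluster''' and '''parLapply''', also run on an MS Windows machine; its fork-based functions derived from multicore, such as '''mclapply''', do not run in parallel there.&lt;br /&gt;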
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
parallel::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using mclapply, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
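&lt;br /&gt;
For what it's worth, here is a minimal sketch of the parallel package's snow-style interface on a single machine (it has not been tried on BCp2, and the choice of two worker processes and of the seed is arbitrary).  It starts a socket cluster with '''makeCluster''', gives each worker an independent random-number stream with '''clusterSetRNGStream''' (compare the identical '''runif''' values returned by the Rmpi workers above), and maps a function over a list with '''parLapply''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# Start a socket cluster of 2 worker R processes on this machine&lt;br /&gt;
cl &amp;lt;- makeCluster(2)&lt;br /&gt;
# Give each worker its own reproducible random-number stream&lt;br /&gt;
clusterSetRNGStream(cl, iseed = 123)&lt;br /&gt;
# Each worker draws one random number; the two streams differ&lt;br /&gt;
draws &amp;lt;- unlist(clusterCall(cl, function() runif(1)))&lt;br /&gt;
draws&lt;br /&gt;
# parLapply splits the list over the workers, much like snow's clusterApply&lt;br /&gt;
squares &amp;lt;- parLapply(cl, 1:3, function(x) x^2)&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;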
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9376</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9376"/>
		<updated>2013-11-28T17:01:54Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Preparing Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you start R, you will have access to an R command prompt, and all of the examples below will work there.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' will typically be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R treats even a single number as a vector, here of length one, and the '''[1]''' indicates that the line in question begins with the first element of that vector.  We can use the handy sequence function, '''seq()''', to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
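Note that the '''=''' signs in the call to '''seq()''' above are not assignments: they match values to the function's named arguments, '''from''', '''to''' and '''by'''.  Assignment is usually done with the '''&amp;lt;-''' operator, although '''=''' can also be used for assignment at the top level.&lt;br /&gt;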
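&lt;br /&gt;
A short sketch distinguishing the two uses of '''=''' (the variable names are arbitrary): at the top level it assigns, like '''&amp;lt;-''', while inside a function call it matches a value to a named argument:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
a &amp;lt;- 3                   # assignment with the arrow operator&lt;br /&gt;
b = 4                    # = also assigns when used at the top level&lt;br /&gt;
s &amp;lt;- seq(from=a, to=b)   # here = matches values to seq()'s named arguments&lt;br /&gt;
s                        # the vector 3 4&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;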
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
coords.bris &amp;lt;- c(51.5, 2.6)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''c''' function combines its arguments into a vector, which '''matrix()''' then uses to fill the matrix column by column.  We can access portions of the matrix using the indices shown in the square brackets.  For example, we can access the first row using the '''[1,]''' notation, and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
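&lt;br /&gt;
Rather than checking one slice at a time, we can check every row, every column and both diagonals at once.  A short sketch using the base functions '''rowSums()''', '''colSums()''' and '''diag()''' (reversing the column order picks out the other diagonal):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8), nrow=3, ncol=3)&lt;br /&gt;
rowSums(magic)            # each row sums to 15&lt;br /&gt;
colSums(magic)            # each column sums to 15&lt;br /&gt;
sum(diag(magic))          # main diagonal: 2 + 5 + 8 = 15&lt;br /&gt;
sum(diag(magic[, 3:1]))   # other diagonal, via reversed columns: 4 + 5 + 6 = 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;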
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which can have more than two dimensions, and '''lists''' to hold heterogeneous collections.&lt;br /&gt;
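&lt;br /&gt;
For example, a three-dimensional array can be built with the '''array()''' function and indexed with one subscript per dimension (the dimensions below are arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
cube &amp;lt;- array(1:24, dim=c(2,3,4))   # 2 rows, 3 columns, 4 layers&lt;br /&gt;
dim(cube)                           # the dimensions: 2 3 4&lt;br /&gt;
cube[2,3,4]                         # the last element, 24&lt;br /&gt;
cube[,,1]                           # the first layer, a 2 x 3 matrix&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;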
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
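We can access its items in several ways.  Note that single brackets return a sub-list, whereas double brackets (like the '''$''' operator) return the item itself:&lt;br /&gt;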
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
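&lt;br /&gt;
The difference between single and double brackets is easiest to see by asking for the '''class()''' of the result.  A small sketch, reusing the list above (names can also be used inside double brackets):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
class(list.r4[1])         # &amp;quot;list&amp;quot;: single brackets give a sub-list&lt;br /&gt;
class(list.r4[[1]])       # &amp;quot;character&amp;quot;: double brackets give the item itself&lt;br /&gt;
list.r4[[&amp;quot;frequency&amp;quot;]]    # the same as list.r4$frequency&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;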
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
  country gold silver bronze&lt;br /&gt;
1     USA   46     29     29&lt;br /&gt;
2   China   38     27     23&lt;br /&gt;
3      GB   29     17     19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access columns of a data frame using the '''$''' operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
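&lt;br /&gt;
Rows can be selected with a logical condition inside the square brackets, and '''str()''' gives a compact summary of a data frame's columns.  The following sketch rebuilds the medals table.  (The '''Levels''' line above appears because older versions of R converted strings to factors by default; since R 4.0.0, '''data.frame()''' keeps them as character vectors unless '''stringsAsFactors=TRUE''' is given.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country=c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;),&lt;br /&gt;
                          gold=c(46, 38, 29),&lt;br /&gt;
                          silver=c(29, 27, 17),&lt;br /&gt;
                          bronze=c(29, 23, 19))&lt;br /&gt;
str(medals.2012)                         # column types and first few values&lt;br /&gt;
medals.2012[medals.2012$gold &amp;gt; 30, ]     # only the rows with more than 30 golds&lt;br /&gt;
nrow(medals.2012)                        # the number of rows, 3&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;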
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  Here we show the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of the Maunga Whau volcano in Auckland, NZ:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
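&lt;br /&gt;
The examples on that help page can be run with '''example(filled.contour)'''.  Alternatively, a sketch of a direct call using the built-in '''volcano''' matrix of heights (the colour palette and title are our own choices):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Filled contour plot of the volcano height matrix, with a terrain colour scale&lt;br /&gt;
filled.contour(volcano, color.palette=terrain.colors, asp=1,&lt;br /&gt;
               plot.title=title(main=&amp;quot;Maunga Whau volcano&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;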
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
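&lt;br /&gt;
Each of these plots can be written to an image file, rather than to the screen, by opening a file-based graphics device first.  A minimal sketch using the '''png()''' device (the file name and size here are arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Open a PNG graphics device; subsequent plotting goes to this file&lt;br /&gt;
png(filename=&amp;quot;vapour-pressure.png&amp;quot;, width=480, height=480)&lt;br /&gt;
plot(pressure)&lt;br /&gt;
# Close the device, which writes the file to disk&lt;br /&gt;
dev.off()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;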
&lt;br /&gt;
There are many more example plots--complete with the R code required to create the plots (at the bottom of the page, after the comments)--on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
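&lt;br /&gt;
Loops are often used to fill in a vector.  A short sketch that stores the first ten square numbers; creating the result vector in advance with '''numeric()''' avoids growing it one element at a time, and '''seq_along()''' generates one index per element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
squares &amp;lt;- numeric(10)              # a vector of ten zeros&lt;br /&gt;
for (ii in seq_along(squares)) {&lt;br /&gt;
  squares[ii] &amp;lt;- ii^2&lt;br /&gt;
}&lt;br /&gt;
squares                             # 1 4 9 16 25 36 49 64 81 100&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;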
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body is a single expression, but add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
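&lt;br /&gt;
Because '''^''' and '''sqrt()''' work element by element, '''hypotenuse()''' also accepts vectors, returning one result per pair of sides.  A short sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
hypotenuse(c(3, 5, 8), c(4, 12, 15))   # 5 13 17&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;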
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Unnamed arguments are matched by position, but when the argument names are given, the arguments can be supplied in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  A function returns the value of the last expression it evaluates, but we can also use the '''return''' keyword to state explicitly which value is returned:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can inspect the contents of a function by typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or check just its arguments, using the '''args''' function.  This returns a function with the same arguments but an empty body, which is why '''NULL''' is printed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
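&lt;br /&gt;
The individual parts of a function can also be extracted with '''formals()''' (the arguments and their defaults) and '''body()''' (the code).  A sketch, redefining '''hypot3''' so that it stands alone:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
hypot3 &amp;lt;- function(x=3, y=4) {&lt;br /&gt;
  x_sq &amp;lt;- x^2&lt;br /&gt;
  y_sq &amp;lt;- y^2&lt;br /&gt;
  return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
names(formals(hypot3))    # the argument names, &amp;quot;x&amp;quot; and &amp;quot;y&amp;quot;&lt;br /&gt;
formals(hypot3)$y         # the default value of y, 4&lt;br /&gt;
body(hypot3)              # the code between the braces&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;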
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Contributed packages are listed on CRAN, at http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
Let's install the '''multicore''' package, which gives us access to functions that run on the multiple processor cores found in most computers these days:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in the library (or libraries) available to us; to see which packages are attached to the current session, use '''search()''' instead:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To attach an installed package to the current session, so that its functions can be used, we type e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, an example of using a function from the multicore package.  The '''lapply''' function, which is included in the standard R core, maps a given function over a list of inputs, returning a list of the function's outputs.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
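&lt;br /&gt;
When each result is a single value, a plain vector is often more convenient than a list.  A sketch of two base R options: '''sapply()''' works like '''lapply()''' but simplifies the result where it can, and '''unlist()''' flattens a list after the fact:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
sapply(1:3, function(x) {x^2})           # the vector 1 4 9&lt;br /&gt;
unlist(lapply(1:3, function(x) {x^2}))   # the same, by flattening the list&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;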
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
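&lt;br /&gt;
The '''multicore''' package has since been archived on CRAN, and '''mclapply()''' is also provided by the '''parallel''' package that ships with R.  A sketch of the equivalent call, assuming a Unix-alike such as Linux or Mac OS X (on MS Windows, '''mclapply()''' only accepts '''mc.cores=1'''); the choice of two cores is arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# Map the squaring function over 1:3 using up to 2 forked worker processes&lt;br /&gt;
res &amp;lt;- mclapply(1:3, function(x) {x^2}, mc.cores=2)&lt;br /&gt;
unlist(res)        # 1 4 9&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;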
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  Perhaps the simplest function for reading them is '''read.table()''', which suits data organised into a table with column headings.  If I have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is then a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
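&lt;br /&gt;
A quick way to try the pair of functions is a round trip through a temporary file.  A sketch, using a small hypothetical data frame and the base function '''tempfile()''' for the file name:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country=c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold=c(46, 38))&lt;br /&gt;
tf &amp;lt;- tempfile(fileext=&amp;quot;.txt&amp;quot;)&lt;br /&gt;
write.table(medals, tf)                       # row names are written too, by default&lt;br /&gt;
medals.back &amp;lt;- read.table(tf, header=TRUE)    # the unnamed first column becomes the row names&lt;br /&gt;
medals.back&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;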
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the file '''medals.csv''', with the contents below; note that the row names are written as an unnamed first column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
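&lt;br /&gt;
The file can be read back with '''read.csv()'''.  Because of that unnamed first column, a sketch of two common options: tell '''read.csv()''' to use it as the row names, or avoid writing it at all with '''row.names=FALSE''' (the temporary files below are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country=c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold=c(46, 38))&lt;br /&gt;
tf1 &amp;lt;- tempfile(fileext=&amp;quot;.csv&amp;quot;)&lt;br /&gt;
write.csv(medals, tf1)&lt;br /&gt;
back1 &amp;lt;- read.csv(tf1, row.names=1)       # first column becomes the row names&lt;br /&gt;
tf2 &amp;lt;- tempfile(fileext=&amp;quot;.csv&amp;quot;)&lt;br /&gt;
write.csv(medals, tf2, row.names=FALSE)   # no row-name column is written&lt;br /&gt;
back2 &amp;lt;- read.csv(tf2)&lt;br /&gt;
names(back1)                              # &amp;quot;country&amp;quot; &amp;quot;gold&amp;quot;&lt;br /&gt;
names(back2)                              # &amp;quot;country&amp;quot; &amp;quot;gold&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;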
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store one or more R objects in a binary file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
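&lt;br /&gt;
Note that '''load()''' restores objects under the names they were saved with.  For a single object, the base functions '''saveRDS()''' and '''readRDS()''' are an alternative that lets you choose the name when reading it back.  A sketch, with a placeholder data frame and temporary file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country=c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold=c(46, 38))&lt;br /&gt;
tf &amp;lt;- tempfile(fileext=&amp;quot;.rds&amp;quot;)&lt;br /&gt;
saveRDS(medals, file=tf)&lt;br /&gt;
my.medals &amp;lt;- readRDS(tf)      # the object is assigned to a name of our choosing&lt;br /&gt;
identical(medals, my.medals)  # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;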
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide this.  Note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
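&lt;br /&gt;
'''sort()''' also accepts '''decreasing=TRUE'''.  To sort the rows of a data frame by one of its columns, one common approach is to pass the row indices returned by '''order()''' back into the square brackets.  A sketch, using a small medals table:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
sort(railway.engines, decreasing=TRUE)   # reverse alphabetical order&lt;br /&gt;
medals &amp;lt;- data.frame(country=c(&amp;quot;GB&amp;quot;, &amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold=c(29, 46, 38))&lt;br /&gt;
order(medals$gold)                       # 1 3 2: row indices in increasing order of gold&lt;br /&gt;
medals[order(medals$gold, decreasing=TRUE), ]   # rows for USA, China, GB&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;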
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
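&lt;br /&gt;
Because the draws are random, they will differ from run to run.  A sketch of three further options: '''set.seed()''' makes the draws reproducible, drawing several items without replacement gives a selection with no repeats, and '''prob''' weights the draws (the weights below are purely illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
set.seed(42)                               # any fixed seed gives repeatable draws&lt;br /&gt;
sample(railway.engines, 3)                 # three different engines&lt;br /&gt;
sample(railway.engines)                    # a random permutation of all five&lt;br /&gt;
sample(railway.engines, 10, replace=TRUE,&lt;br /&gt;
       prob=c(0.6, 0.1, 0.1, 0.1, 0.1))    # &amp;quot;thomas&amp;quot; is the most likely draw&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;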
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames, which must have the same column names:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
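&lt;br /&gt;
The companion function '''cbind()''' combines columns instead.  A sketch that adds a column of total medals to a small data frame (the column name '''total''' is our own):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals &amp;lt;- data.frame(country=c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold=c(46, 38),&lt;br /&gt;
                     silver=c(29, 27), bronze=c(29, 23))&lt;br /&gt;
# Append a new column holding the sum of the three medal columns&lt;br /&gt;
medals &amp;lt;- cbind(medals, total=medals$gold + medals$silver + medals$bronze)&lt;br /&gt;
medals&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;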
&lt;br /&gt;
===Binning Data===&lt;br /&gt;
&lt;br /&gt;
Using '''cut''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; bins=cut(girls_2, breaks=3)&lt;br /&gt;
&amp;gt; bins&lt;br /&gt;
 [1] (81.3,84.1] (84.1,86.9] (84.1,86.9] (86.9,89.7] (81.3,84.1] (86.9,89.7]&lt;br /&gt;
 [7] (86.9,89.7] (81.3,84.1] (86.9,89.7] (86.9,89.7]&lt;br /&gt;
Levels: (81.3,84.1] (84.1,86.9] (86.9,89.7]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html&lt;br /&gt;
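&lt;br /&gt;
'''table()''' counts how many values fall in each bin.  Bin edges and labels can also be chosen explicitly; in this sketch the edges at 80, 85 and 90 and the labels are our own choices (by default each interval includes its right-hand edge):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
table(cut(girls_2, breaks=3))    # 3, 2 and 5 values in the three bins&lt;br /&gt;
bins2 &amp;lt;- cut(girls_2, breaks=c(80, 85, 90), labels=c(&amp;quot;low&amp;quot;, &amp;quot;high&amp;quot;))&lt;br /&gt;
table(bins2)                     # low: 3, high: 7&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;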
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
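&lt;br /&gt;
The fitted model object holds much more than the line.  A sketch of a few standard ways to inspect it: '''coef()''' returns the intercept and slope, '''summary()''' adds standard errors and R-squared, and '''predict()''' evaluates the line at new speeds (the speed of 21 mph is an arbitrary example):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
coef(res)                                     # intercept about -17.58, slope about 3.93&lt;br /&gt;
summary(res)                                  # standard errors, t values, R-squared&lt;br /&gt;
predict(res, newdata=data.frame(speed=21))    # predicted stopping distance at 21 mph&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;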
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
* You may wish to compare different methods of estimation.  From the MASS package, you can fit a line with the '''rlm''' (robust regression) and '''lqs''' (resistant regression) functions.  You can plot all the lines against the data using:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
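&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; res.lm &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm &amp;lt;- rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs &amp;lt;- lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;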
&amp;lt;/source&amp;gt;&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If given, the function fits the line by weighted least squares, i.e. by minimising the sum of the squared residuals, each multiplied by its weight.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector of length 50, where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
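&lt;br /&gt;
Whatever weights you choose, they are passed through the '''weights''' argument, one weight per row of the data.  A sketch using the increasing weights '''w2''' from the hint above; since the rows of '''cars''' are ordered by speed, this gives the faster cars more influence on the fit:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
w2 &amp;lt;- seq(from=0.02, to=1.0, by=0.02)           # 50 weights, one per row of cars&lt;br /&gt;
res.w &amp;lt;- lm(dist ~ speed, data=cars, weights=w2)&lt;br /&gt;
plot(cars)&lt;br /&gt;
abline(lm(dist ~ speed, data=cars), lty=1)      # unweighted fit&lt;br /&gt;
abline(res.w, lty=2)                            # weighted fit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;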
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
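&lt;br /&gt;
The F test above gives no evidence that the two variances differ, which is why '''var.equal=TRUE''' was passed to '''t.test()'''.  By default '''t.test()''' instead performs Welch's test, which does not assume equal variances.  The returned objects are lists, so individual results can be extracted by name; a sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2 &amp;lt;- c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
res &amp;lt;- t.test(boys_2, girls_2, var.equal=TRUE)&lt;br /&gt;
res$p.value                            # about 0.41&lt;br /&gt;
res$conf.int                           # the 95 percent confidence interval&lt;br /&gt;
res.welch &amp;lt;- t.test(boys_2, girls_2)   # Welch two sample t-test&lt;br /&gt;
res.welch$p.value&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;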
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
The '''knn()''' function, from the '''class''' package, performs k-nearest neighbour classification of a test set from a training set: for each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random.  If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
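&lt;br /&gt;
The diagonal of the table holds the correct predictions, here 70 of the 75 test flowers.  A sketch that computes the proportion classified correctly directly (since '''knn()''' breaks ties at random, the figure may vary slightly between runs):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3)&lt;br /&gt;
mean(iris3.knn == cl)      # proportion correct; 70/75 is about 0.93&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;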
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The '''kyphosis''' data frame, from the '''rpart''' package, has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
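library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
# fit2: prior probabilities of 0.65 (absent) and 0.35 (present), information splitting index&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
# fit3: a larger complexity parameter (cp) than the default of 0.01, giving a smaller tree&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;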
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
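&lt;br /&gt;
As with k nearest neighbours, we can compare the tree's predictions with the actual outcomes.  A sketch using '''predict()''' with '''type=&amp;quot;class&amp;quot;''' on the training data itself (which will flatter the tree, since it has already seen these cases):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
pred &amp;lt;- predict(fit, type = &amp;quot;class&amp;quot;)     # predicted class for each training case&lt;br /&gt;
table(predicted = pred, actual = kyphosis$Kyphosis)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;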
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
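&lt;br /&gt;
It is worth checking the solution by multiplying it back through with the matrix product operator '''%*%'''.  Omitting '''b''' makes '''solve()''' return the inverse of '''A''' instead; a sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
b &amp;lt;- c(5,7,8)&lt;br /&gt;
x &amp;lt;- solve(A, b)&lt;br /&gt;
A %*% x          # gives back b, as a one-column matrix&lt;br /&gt;
solve(A) %*% b   # the same solution via the inverse (usually less efficient)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;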
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll switch tack and focus on a question commonly asked by those beginning to use R in anger--'''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''.  In this section we'll consider the related tasks of finding which parts of your R code are responsible for the majority of the run-time, and what you can do about them.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have a gift(!) for constantly surprising us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it infinitely fast would cut the total run-time by at most 10%, and we would have sweated for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time a call to a function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
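&lt;br /&gt;
'''system.time()''' reports the user CPU time, the system CPU time and the elapsed (wall-clock) time, all in seconds. In the sketch below, '''slow.sum''' is a hypothetical stand-in for '''some.function'''. Wrapping the call in braces lets us keep its result as well as its timing:&lt;br /&gt;
&lt;br /&gt;
```r
# Hypothetical function standing in for some.function(): sums 1..n with a loop
slow.sum = function(n) {
  total = 0
  for (i in seq_len(n)) total = total + i
  total
}

# The braces make 'result = ...' an assignment rather than a named argument
timing = system.time({ result = slow.sum(1e6) })
print(timing)

# Pick out just the wall-clock time, in seconds
elapsed = unname(timing["elapsed"])
print(elapsed)
```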
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)  # start profiling, writing samples to ~/rprof.out&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)  # summarise the samples&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Map a smoothing function over each vector in the list&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
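The profile tells us that the function '''simpleLoess''' (called by '''loess.smooth''') accounts for 88% of the run-time in its own code, whereas '''rnorm''' takes only 4%.  The '''self''' columns count only the time spent in a function's own code, while the '''total''' columns also include the time spent in the functions it calls.&lt;br /&gt;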
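&lt;br /&gt;
The value returned by '''summaryRprof()''' is a list, so the profile can also be inspected programmatically. The sketch below writes the profile to a temporary file rather than to your home directory. The '''by.self''' data frame ranks functions by the time spent in their own code, and '''by.total''' also counts time spent in the functions they call:&lt;br /&gt;
&lt;br /&gt;
```r
# Write the profile to a temporary file, sampling every 0.01 seconds
prof.file = tempfile(fileext = ".out")
Rprof(prof.file, interval = 0.01)

data = lapply(1:10, function(i) rnorm(100000))
smoothed = lapply(data, function(v) loess.smooth(v, v))

Rprof(NULL)  # stop profiling

prof = summaryRprof(prof.file)
print(head(prof$by.self))    # time in each function's own code
print(head(prof$by.total))   # time including the functions it calls
print(prof$sampling.time)    # total time covered by the samples, in seconds
unlink(prof.file)            # tidy up the temporary file
```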
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
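&lt;br /&gt;
'''numeric()''', '''character()''', '''logical()''' and '''vector()''' create pre-allocated vectors of a given length in a single step. This is a more idiomatic alternative to setting '''length()''' after the fact, as '''f2''' does. The same idea applies to lists whose elements are filled in by a loop. A minimal sketch:&lt;br /&gt;
&lt;br /&gt;
```r
n = 30000

# numeric(n) allocates a double vector of n zeros in one step
v = numeric(n)
for (i in seq_len(n)) v[i] = i^2

# vector("list", k) allocates a list of k NULL elements, to be filled in later
k = 5
results = vector("list", k)
for (i in seq_len(k)) results[i] = list(rep(i, i))

str(results)
```

On recent versions of R the gap may be much smaller than the x30 shown above. Since R 3.4.0, assigning beyond the end of a vector over-allocates memory, so growing a vector one element at a time is considerably cheaper than it used to be. Pre-allocation remains good practice, and it also makes the intended size of the result explicit.&lt;br /&gt;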
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R accept whole vectors as input, rather than just single values, and this may allow you to avoid a loop altogether.  The examples in the previous section used '''for''' loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in half the time it took to process 30,000!&lt;br /&gt;
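&lt;br /&gt;
Many common loop patterns have vectorised equivalents in base R. The sketch below checks that the loop and vectorised versions of the squaring example agree. It also shows a few other vectorised functions that can replace explicit loops:&lt;br /&gt;
&lt;br /&gt;
```r
n = 30000

# Loop version (pre-allocated) and vectorised version of the squaring example
loop.sq = numeric(n)
for (i in seq_len(n)) loop.sq[i] = i^2
vec.sq = (1:n)^2
print(identical(loop.sq, vec.sq))  # TRUE: same values, no explicit loop

x = c(-2, 5, -1, 3)
print(pmax(x, 0))     # element-wise maximum, replacing an if/else inside a loop: 0 5 0 3
print(cumsum(1:5))    # running totals: 1 3 6 10 15
m = matrix(1:6, nrow = 2)
print(rowSums(m))     # sum of each row: 9 12
```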
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as BlueCrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to each element of a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
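&lt;br /&gt;
This pattern can be illustrated without any parallel hardware. The sketch below uses '''splitIndices()''' from the '''parallel''' package, which is distributed with R, to divide a list of inputs into three chunks, one per hypothetical worker. It applies the function to each chunk in turn, then checks that recombining the chunks gives the same answer as a single '''lapply()''' call. A parallel package would instead hand each chunk to a different core or node:&lt;br /&gt;
&lt;br /&gt;
```r
library(parallel)

inputs = as.list(1:12)
f = function(x) x^2

# The usual serial map
serial = lapply(inputs, f)

# Split the input indices into 3 roughly equal chunks, one per (hypothetical) worker
chunks = splitIndices(length(inputs), 3)
print(chunks)

# Each "worker" maps f over its own chunk; here the chunks are simply processed in turn
per.worker = lapply(chunks, function(idx) lapply(inputs[idx], f))

# Concatenate the partial results back into a single list
combined = do.call(c, per.worker)
print(identical(serial, combined))
```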
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.  (Its functionality has since been incorporated into the '''parallel''' package, discussed below.)&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the use of the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I then used the following submission script to run it on BlueCrystal Phase 2 (BCp2):&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
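&lt;br /&gt;
Comparing the '''elapsed''' (wall-clock) times, the '''mclapply''' version finished in 0.113 seconds, compared with 0.749 seconds for '''lapply'''. That is a speed-up of roughly 6.6 times using the 8 available cores.&lt;br /&gt;
&lt;br /&gt;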
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
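&lt;br /&gt;
Note that all four slaves returned the same value from '''runif(1)''', because each worker's random number generator started from the same state.  If your workers need independent random numbers, their generators must be seeded separately (Rmpi provides '''mpi.setup.rngstream()''' for this purpose).&lt;br /&gt;
&lt;br /&gt;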
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
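&lt;br /&gt;
Here the cluster of 3 workers reduced the elapsed time from 0.715 seconds to 0.260 seconds, a speed-up of roughly 2.75 times.&lt;br /&gt;
&lt;br /&gt;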
==Parallel==&lt;br /&gt;
&lt;br /&gt;
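The '''parallel''' package, distributed with R since version 2.14.0, is an amalgamation of functionality from the multicore and snow packages.  Its snow-derived cluster functions (such as '''makeCluster''' and '''parLapply''') run on MS Windows machines.  However, its fork-based functions (such as '''mclapply''') still cannot run in parallel there, just as with the multicore package.&lt;br /&gt;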
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
parallel::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using mclapply, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
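&lt;br /&gt;
When the mapped function draws random numbers, '''mclapply()''' only gives reproducible results if the workers' random number streams are seeded. This is a minimal sketch, assuming a Unix-like system; forking is unavailable on MS Windows, so there the code falls back to a single core. Calling '''set.seed()''' with the L'Ecuyer-CMRG generator before '''mclapply()''' gives each forked worker its own stream. Repeating the same seed, with the same number of cores, repeats the results. The helper '''run.once''' is our own, for illustration:&lt;br /&gt;
&lt;br /&gt;
```r
library(parallel)

# mclapply() relies on forking, which is unavailable on MS Windows
n.cores = if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical helper: seed the parallel-safe generator, then draw one number per element
run.once = function(seed) {
  # L'Ecuyer-CMRG provides independent random number streams for forked workers
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  mclapply(1:4, function(i) runif(1), mc.cores = n.cores)
}

run1 = run.once(42)
run2 = run.once(42)
print(identical(run1, run2))  # same seed and same number of cores give the same numbers
```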
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
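&lt;br /&gt;
On a single machine, however, the '''parallel''' package's snow-style functions can be used with a local socket cluster, which also works on MS Windows. This is a minimal sketch, assuming two local worker processes are acceptable. Workers start with empty workspaces, so any objects the mapped function needs must be sent to them explicitly, here with '''clusterExport()'''. The variable '''span.value''' is our own, for illustration:&lt;br /&gt;
&lt;br /&gt;
```r
library(parallel)

# Start two local R worker processes connected by sockets
cl = makeCluster(2, type = "PSOCK")

# Workers start with empty workspaces, so export the objects the function needs
span.value = 0.5
clusterExport(cl, "span.value")

data = lapply(1:4, function(i) rnorm(1000))
smoothed = parLapply(cl, data, function(x) loess.smooth(x, x, span = span.value))

stopCluster(cl)  # always shut the workers down when finished
print(length(smoothed))
```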
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9375</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9375"/>
		<updated>2013-11-28T16:36:33Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Preparing Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' would otherwise be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R has no true scalars: even a single number entered at the console is stored as a vector of length one.  The '[1]' indicates that the line in question displays the vector's values starting at the first index.  We can use the handy sequence function, '''seq()''', to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
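Note that in the '''seq()''' call above, '''=''' is used to name the function's arguments ('''from''', '''to''' and '''by'''), whereas '''&amp;lt;-''' is used for assignment.  At the top level of your code, '''=''' can also be used for assignment, but '''&amp;lt;-''' is the more common convention.&lt;br /&gt;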
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
coords.bris &amp;lt;- c(51.5, 2.6)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here, the '''c''' function combines the arguments given in the parentheses into a vector, which '''matrix''' fills column by column.  We can access portions of the matrix using square brackets.  For example, we can access the first row using the '''[1,]''' notation, and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
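&lt;br /&gt;
Rather than checking one slice at a time, we can use '''rowSums()''', '''colSums()''' and '''diag()''' to check every row, column and diagonal of the magic square at once. A short sketch:&lt;br /&gt;
&lt;br /&gt;
```r
magic = matrix(data = c(2, 7, 6, 9, 5, 1, 4, 3, 8), nrow = 3, ncol = 3)

print(rowSums(magic))               # each row sums to 15
print(colSums(magic))               # each column sums to 15
print(sum(diag(magic)))             # main diagonal: 2 + 5 + 8
# Anti-diagonal: index the matrix with a two-column matrix of (row, column) pairs
print(sum(magic[cbind(1:3, 3:1)]))  # 4 + 5 + 6
```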
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which have more than two dimensions, and '''lists''' to hold heterogeneous collections.&lt;br /&gt;
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
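We can access its items in several ways.  Note that single square brackets return a sub-list (complete with the item's name), whereas double square brackets or the '''$''' operator return the item itself:&lt;br /&gt;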
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
    country  gold  silver  bronze&lt;br /&gt;
1       USA    46      29      29&lt;br /&gt;
2     China    38      27      23&lt;br /&gt;
3        GB    29      17      19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
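We can access columns of a data frame using the '''$''' operator.  (The '''Levels''' line in the output below appears because, before R version 4.0.0, '''data.frame()''' converted character vectors to factors by default; in R 4.0.0 and later, '''country''' is a plain character vector.)&lt;br /&gt;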
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
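&lt;br /&gt;
Data frames can also be extended and subset using the '''$''' operator and square brackets. This short sketch uses the same medal data. Here '''stringsAsFactors = FALSE''' keeps the country names as plain character strings, whatever version of R you are using:&lt;br /&gt;
&lt;br /&gt;
```r
country = c("USA", "China", "GB")
gold = c(46, 38, 29)
silver = c(29, 27, 17)
bronze = c(29, 23, 19)
medals.2012 = data.frame(country, gold, silver, bronze, stringsAsFactors = FALSE)

# Add a new column computed from existing ones
medals.2012$total = medals.2012$gold + medals.2012$silver + medals.2012$bronze

# Select rows with a logical condition, and columns by name
gb.row = medals.2012[medals.2012$country == "GB", c("country", "total")]
print(gb.row)

# Simple summaries
print(mean(medals.2012$gold))
print(nrow(medals.2012))
```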
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets--we'll use these to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  Here, one shows the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
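&lt;br /&gt;
The examples on a help page can be run with '''example()''', e.g. '''example(filled.contour)'''. Alternatively, a simple version of the volcano plot can be drawn directly from the built-in '''volcano''' data set, an 87 x 61 matrix of heights on a regular grid. The colour palette and title in this sketch are our own choices:&lt;br /&gt;
&lt;br /&gt;
```r
# volcano: built-in matrix of heights for Maunga Whau (Mt Eden), Auckland
print(dim(volcano))

# Filled contour plot; asp = 1 keeps the grid cells square
filled.contour(volcano, color.palette = terrain.colors, asp = 1,
               main = "Maunga Whau volcano")
```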
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots--complete with the R code required to create the plots (at the bottom of the page, after the comments)--on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
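&lt;br /&gt;
R also provides '''next''' (skip to the next iteration), '''break''' (leave the loop) and '''repeat''' (loop until a '''break''' is reached). The sketch below also shows why '''seq_along()''' is a safer way than '''1:length(x)''' to loop over the elements of a vector that might be empty:&lt;br /&gt;
&lt;br /&gt;
```r
# next: sum only the odd numbers from 1 to 10
total = 0
for (ii in 1:10) {
  if (ii %% 2 == 0) next   # skip even values
  total = total + ii
}
print(total)   # 1 + 3 + 5 + 7 + 9 = 25

# repeat/break: count the Collatz steps needed to reach 1, starting from 27
n = 27
steps = 0
repeat {
  if (n == 1) break
  n = if (n %% 2 == 0) n / 2 else 3 * n + 1
  steps = steps + 1
}
print(steps)

# 1:length(x) misbehaves for an empty vector; seq_along(x) does not
empty = c()
print(1:length(empty))           # 1 0, i.e. two unwanted iterations
print(length(seq_along(empty)))  # 0 iterations
```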
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body of the function is a single expression, but they add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can see that arguments are matched by position unless they are named, in which case they can be given in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check on the contents of a function by just typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or check just the arguments, using the '''args''' function.  ('''args''' returns a function with the same arguments but an empty body, which is why '''NULL''' is printed):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
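&lt;br /&gt;
'''formals()''' and '''body()''' give programmatic access to a function's arguments and body. The special argument '''...''' lets a function accept any number of arguments. In the sketch below, '''hypot.n''' is our own illustrative extension of the hypotenuse example to any number of dimensions:&lt;br /&gt;
&lt;br /&gt;
```r
hypot3 = function(x = 3, y = 4) {
  x_sq = x^2
  y_sq = y^2
  return(sqrt(x_sq + y_sq))
}

print(names(formals(hypot3)))  # the argument names: "x" "y"
print(body(hypot3))            # the code inside the braces

# '...' collects any number of arguments; c(...) turns them into a vector
hypot.n = function(...) sqrt(sum(c(...)^2))
print(hypot.n(3, 4, 12))       # sqrt(9 + 16 + 144) = 13
```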
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Packages are listed at http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
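Let's install the '''multicore''' package, which will give us access to functions within R that run on the multiple processors we often find in our computers these days.  (Note: '''multicore''' is no longer available from CRAN; its '''mclapply''' function is now provided by the '''parallel''' package, which is distributed with R.)&lt;br /&gt;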
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library, i.e. those available to load into our workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load one of them into the current session, we type, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
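&lt;br /&gt;
Listing the library shows what is installed, not what is currently in use. To see which packages are attached to the current session, or to check whether a package is installed without attaching it, we can use:&lt;br /&gt;
&lt;br /&gt;
```r
# Names of the packages currently attached to the session
print(.packages())

# TRUE if the package is installed and can be loaded (this does not attach it)
print(requireNamespace("parallel", quietly = TRUE))

# The full search path, including attached packages
print(search())
```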
&lt;br /&gt;
Now, an example of using a function from the multicore package.  The '''lapply''' function, which is included in the standard R core, will map a given function over a list of inputs, giving a list of the function outputs in return.  For example, we can map a squaring function over the list of integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
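&lt;br /&gt;
If a plain vector is more convenient than a list, the base R functions '''sapply()''' and '''vapply()''' apply the function in the same way and then simplify the result.  A short sketch:&lt;br /&gt;
&lt;br /&gt;
```r
# sapply() applies the function like lapply(), then simplifies the list to a vector
squares = sapply(1:3, function(x) {x^2})
print(squares)   # [1] 1 4 9

# vapply() does the same, but also checks that every result is a single number
squares2 = vapply(1:3, function(x) {x^2}, numeric(1))
```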
&lt;br /&gt;
We can do the same work in parallel using the multicore package's '''mclapply''' function, which returns the same list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
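&lt;br /&gt;
The '''mclapply''' function is also available in the '''parallel''' package that ships with R (from version 2.14.0), so the same call can be made without installing anything extra.  This is a sketch assuming a Linux or Mac machine: mclapply works by forking, which MS Windows does not support, so there '''mc.cores''' must be 1.&lt;br /&gt;
&lt;br /&gt;
```r
library(parallel)

# mc.cores sets how many forked worker processes to use (must be 1 on MS Windows)
squares = mclapply(1:3, function(x) {x^2}, mc.cores = 2)

# The result is the same list that lapply() returns
identical(squares, lapply(1:3, function(x) {x^2}))
```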
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides several useful functions for reading data from, and writing data to, files.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  If your data is organised in a file as a table with column headings, perhaps the simplest function to use is '''read.table()'''.  Suppose I have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is then a simple matter to load the data into R using '''read.table()'''.  The argument '''header=TRUE''' tells R that the first line holds the column names:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
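&lt;br /&gt;
As a quick, self-contained check, the sketch below builds a cut-down copy of the medals table in code, writes it with '''write.table()''' to a temporary file, and reads it back.  The temporary file stands in for '''medals.txt''':&lt;br /&gt;
&lt;br /&gt;
```r
# A cut-down copy of the medals table (values from the text file above)
medals = data.frame(country = c("USA", "China", "Great Britain"),
                    gold = c(46, 38, 29),
                    silver = c(29, 27, 17),
                    bronze = c(29, 23, 19))

# Write to a temporary file; row.names = FALSE stops write.table() adding
# an extra column of row numbers
path = tempfile(fileext = ".txt")
write.table(medals, path, row.names = FALSE)

# Read it back; header = TRUE takes the column names from the first line
medals.back = read.table(path, header = TRUE)
print(medals.back)
```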
&lt;br /&gt;
CSV files can be handled by passing '''sep=&amp;quot;,&amp;quot;''' to read.table().  For convenience, however, R also provides '''read.csv()''' and '''write.csv()''' functions.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the file '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
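&lt;br /&gt;
The empty first heading in '''medals.csv''' comes from the row names, which '''write.csv()''' writes by default.  When the file is read back with '''read.csv()''', that column reappears under the made-up name X.  A minimal sketch using a temporary file, showing how '''row.names = FALSE''' avoids this:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country = c("USA", "China"), gold = c(46, 38))
path = tempfile(fileext = ".csv")

# By default the row names are written as an unnamed first column ...
write.csv(medals, path)
cols.default = names(read.csv(path))     # "X" "country" "gold"

# ... which row.names = FALSE suppresses
write.csv(medals, path, row.names = FALSE)
cols.clean = names(read.csv(path))       # "country" "gold"
```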
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function stores one or more R objects in a binary file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
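&lt;br /&gt;
The Unix '''file''' command shows that the resulting file is gzip-compressed:&lt;br /&gt;
&lt;br /&gt;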
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding '''load()''' function, which restores the saved objects under their original names:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
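&lt;br /&gt;
A self-contained sketch of the round trip follows.  It uses a temporary file in place of '''medals.RData''' and a small stand-in data frame.  It also shows the related base functions '''saveRDS()''' and '''readRDS()''', which store a single object and let you choose its name when reading it back:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country = c("USA", "China"), gold = c(46, 38))
path = tempfile(fileext = ".RData")
save(medals, file = path)

# Remove the object, then restore it; load() invisibly returns the names it restored
rm(medals)
restored = load(path)
print(restored)                  # "medals"

# saveRDS()/readRDS() store a single object, and you choose its name when reading
rds.path = tempfile(fileext = ".rds")
saveRDS(medals, rds.path)
medals.copy = readRDS(rds.path)
identical(medals, medals.copy)   # TRUE
```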
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read data from, or write data to, a database directly, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files (it has since been superseded on CRAN by the '''ncdf4''' package).  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, these are conveniently available from the standard package managers.  Note that you will need the 'development' packages, which include the header files needed to compile the R package.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
Using '''sort''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
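&lt;br /&gt;
'''sort''' returns the sorted values themselves.  To sort a whole data frame by one of its columns, the usual base R tool is '''order''', which returns the row indices that would sort that column.  A small sketch, using a few rows of the medals data:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country = c("Germany", "USA", "China"),
                    gold = c(11, 46, 38))

# sort() returns the sorted values
sort(medals$gold, decreasing = TRUE)         # 46 38 11

# order() returns row indices, so every column is reordered together
medals.sorted = medals[order(medals$gold, decreasing = TRUE), ]
as.character(medals.sorted$country)          # "USA" "China" "Germany"
```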
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
Using '''sample''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
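&lt;br /&gt;
Each call above gives a different result.  When you need repeatable "random" draws, for example to reproduce an analysis, fix the seed with '''set.seed()''' first.  The sketch below also samples without replacement, which is '''sample()''''s default.  The seed value 42 is arbitrary:&lt;br /&gt;
&lt;br /&gt;
```r
railway.engines = c("thomas", "henry", "gordon", "edward", "james")

# Fixing the seed makes the draws repeatable
set.seed(42)
first = sample(railway.engines, 3)
set.seed(42)
second = sample(railway.engines, 3)
identical(first, second)      # TRUE

# Without replacement (the default), sample() never repeats an element,
# so sampling every element gives a random permutation
shuffled = sample(railway.engines)
```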
&lt;br /&gt;
===Combining===&lt;br /&gt;
&lt;br /&gt;
Using '''rbind''' to combine the rows of two data frames:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Italy&amp;quot;, &amp;quot;Hungary&amp;quot;, &amp;quot;Australia&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(11, 8, 8, 7)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(11, 9, 4, 16)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(12, 11, 5, 12)&lt;br /&gt;
&amp;gt; extras.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; rbind(medals.2012, extras.2012)&lt;br /&gt;
              country gold silver bronze&lt;br /&gt;
1                 USA   46     29     29&lt;br /&gt;
2               China   38     27     23&lt;br /&gt;
3       Great Britain   29     17     19&lt;br /&gt;
4  Russian Federation   24     26     32&lt;br /&gt;
5   Republic of Korea   13      8      7&lt;br /&gt;
6             Germany   11     19     14&lt;br /&gt;
7              France   11     11     12&lt;br /&gt;
8               Italy    8      9     11&lt;br /&gt;
9             Hungary    8      4      5&lt;br /&gt;
10          Australia    7     16     12&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html&lt;br /&gt;
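&lt;br /&gt;
The same help page covers '''cbind''', which combines columns rather than rows.  A small sketch with cut-down versions of the two medal tables; the '''year''' column is added purely for illustration:&lt;br /&gt;
&lt;br /&gt;
```r
medals = data.frame(country = c("USA", "China"), gold = c(46, 38))
extras = data.frame(country = c("France", "Italy"), gold = c(11, 8))

# rbind() stacks rows; the data frames' columns are matched by name
all.medals = rbind(medals, extras)

# cbind() adds columns; a single value is recycled down every row
all.medals = cbind(all.medals, year = 2012)
dim(all.medals)    # 4 rows, 3 columns
```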
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
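&lt;br /&gt;
The object returned by '''lm''' holds much more than the line drawn by '''abline'''.  A short sketch of pulling out the fitted coefficients, the R-squared value and a prediction.  In the cars data, speed is in mph and stopping distance in feet; the speed of 20 is an arbitrary example:&lt;br /&gt;
&lt;br /&gt;
```r
res = lm(dist ~ speed, data = cars)

# Intercept and slope of the fitted line
coef(res)

# Proportion of the variance in stopping distance explained by speed
summary(res)$r.squared

# Predicted stopping distance for a car travelling at speed 20
predict(res, newdata = data.frame(speed = 20))
```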
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
* You may wish to compare different methods of estimation.  From the MASS package, you can fit a line with the '''rlm''' and '''lqs''' functions.  Once you have stored the three fits as res.lm, res.rlm and res.lqs, you can plot all the lines against the data using:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If given, the function will fit the line that minimises the weighted sum of squared residuals, i.e. the sum over all points of each weight multiplied by that point's squared residual.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector of length 50, where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
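&lt;br /&gt;
To see what the weights do, the sketch below fits a weighted line with the '''w2''' vector from the hint above.  It then checks the result against the weighted normal equations, (X' W X) beta = X' W y, solved directly.  The matrix names X and W are introduced here for illustration only:&lt;br /&gt;
&lt;br /&gt;
```r
w2 = seq(from = 0.02, to = 1.0, by = 0.02)   # 50 weights, one per row of cars
res.w = lm(dist ~ speed, data = cars, weights = w2)

# Weighted least squares minimises sum(w * residual^2); its solution
# satisfies the weighted normal equations (X' W X) beta = X' W y
X = cbind(1, cars$speed)    # design matrix: a column of 1s for the intercept, then speed
W = diag(w2)                # weights on the diagonal
beta = solve(t(X) %*% W %*% X, t(X) %*% W %*% cars$dist)

# The two sets of coefficients should agree
all.equal(unname(coef(res.w)), as.vector(beta))
```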
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
Here we first use an F test ('''var.test''') to check whether the two samples have similar variances.  Since its p-value is large, we then run a two-sample t-test that assumes equal variances ('''var.equal=TRUE'''):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
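&lt;br /&gt;
The printed report is convenient to read, but the value returned by '''t.test''' is a list (of class htest), so its parts can be used directly in later calculations.  A sketch, which also runs Welch's version of the test (the default when '''var.equal''' is not set):&lt;br /&gt;
&lt;br /&gt;
```r
boys_2 = c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)
girls_2 = c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)

res = t.test(boys_2, girls_2, var.equal = TRUE, paired = FALSE)
res$statistic    # the t statistic
res$p.value      # the p-value as a plain number
res$estimate     # the two sample means
res$conf.int     # 95 percent confidence interval for the difference in means

# Welch's test (var.equal = FALSE, the default) drops the equal-variance assumption
welch = t.test(boys_2, girls_2)
welch$p.value
```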
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
The '''knn''' function from the '''class''' package performs k-nearest neighbour classification of a test set from a training set: for each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
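&lt;br /&gt;
Correct predictions lie on the diagonal of this table, so for the run shown the accuracy is (23 + 25 + 22) / 75, or about 0.93.  A sketch of computing this directly.  Because '''knn''' breaks ties at random, the seed (an arbitrary choice here) is fixed so that the result is repeatable:&lt;br /&gt;
&lt;br /&gt;
```r
library(class)
train = rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
test = rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
cl = factor(c(rep("s",25), rep("c",25), rep("v",25)))

# knn() breaks ties at random, so fix the seed for a repeatable result
set.seed(1)
pred = knn(train, test, cl, k = 3)
confusion = table(predicted = pred, actual = cl)

# Correct predictions sit on the diagonal of the confusion table
accuracy = sum(diag(confusion)) / sum(confusion)
accuracy
```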
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame, supplied with the '''rpart''' package, has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
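&lt;br /&gt;
A fitted tree can also be used to make predictions.  The sketch below predicts the class for each child in the data set and tabulates the results against the truth.  Since the tree was fitted to these same rows, the resulting accuracy is an optimistic estimate; a held-out test set, as in the kNN example, would give a fairer one:&lt;br /&gt;
&lt;br /&gt;
```r
library(rpart)
fit = rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Predict the class (absent/present) for each child in the training data
pred = predict(fit, type = "class")
confusion = table(predicted = pred, actual = kyphosis$Kyphosis)
print(confusion)

# Training-set accuracy (optimistic, since these rows were used to fit the tree)
mean(pred == kyphosis$Kyphosis)
```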
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
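&lt;br /&gt;
Because R fills arrays column by column, it is easy to enter the transpose of the intended matrix by mistake.  The sketch below builds the same A with '''matrix()''', which fills in the same order, and checks the solution by multiplying it back:&lt;br /&gt;
&lt;br /&gt;
```r
# matrix() (like array()) fills column by column, so this A has
# rows (1, 3, -2), (3, 5, 6) and (2, 4, 3)
A = matrix(c(1,3,2,3,5,4,-2,6,3), nrow = 3)
b = c(5, 7, 8)
x = solve(A, b)
x                                   # -15 8 2

# Check: multiplying back should reproduce b
all.equal(as.vector(A %*% x), b)
```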
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll change tack and focus on a question commonly asked by those beginning to use R in anger: '''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''  In this section we'll consider the related tasks of finding which parts of your R code are responsible for the majority of the run-time, and what you can do about them.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have the gift(!) of constantly surprising us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it infinitely fast would save at most 10%, and we would have sweated for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time a function call using '''system.time()''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
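&lt;br /&gt;
'''system.time()''' reports three figures: the CPU time used by R itself (user), the CPU time used by the operating system on R's behalf (system), and the wall-clock time (elapsed).  A small sketch, timing an arbitrary piece of work (sorting random numbers):&lt;br /&gt;
&lt;br /&gt;
```r
# Time five sorts of a million random numbers
timing = system.time(for (i in 1:5) sort(runif(1e6)))
print(timing)

# Individual figures can be extracted by name, e.g. the wall-clock seconds
timing[["elapsed"]]
```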
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a 10 x 100,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which I ran by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
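The profile tells us that the code of the function '''simpleLoess''' itself accounts for 88% of the run-time (self.pct), or nearly 93% once the functions it calls are included (total.pct), whereas '''rnorm''' takes only 4%.  simpleLoess is therefore where any optimisation effort should be focused.&lt;br /&gt;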
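&lt;br /&gt;
'''summaryRprof()''' returns a list, so the profile can also be examined programmatically.  Its '''by.self''' and '''by.total''' data frames correspond to the self and total columns above.  A sketch using a temporary file and a shorter sampling interval than the default.  Both choices are assumptions made to keep the example self-contained:&lt;br /&gt;
&lt;br /&gt;
```r
prof.file = tempfile(fileext = ".out")

Rprof(filename = prof.file, interval = 0.01)   # sample the call stack every 10 ms
data = lapply(1:10, function(x) {rnorm(20000)})
x = lapply(data, function(x) {loess.smooth(x, x)})
Rprof(NULL)                                      # stop profiling

prof = summaryRprof(filename = prof.file)
# by.self: time spent in each function's own code
head(prof$by.self)
# by.total: time including the functions it calls
head(prof$by.total)
```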
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other interpreted languages, such as MATLAB, one of the simplest ways to speed up your R code is to pre-allocate the storage for variables whenever possible, rather than growing them one element at a time (which can force R to copy the whole vector repeatedly).  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
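Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code, and upon your version of R: more recent versions grow vectors more efficiently, so the gap is likely to be smaller.&lt;br /&gt;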
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
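&lt;br /&gt;
A common, slightly tidier idiom for pre-allocation is '''numeric(n)''', which creates a vector of n zeros.  Note also that f1 and f2 throw their results away, because a for loop returns NULL.  The hypothetical f3 below pre-allocates with numeric(), loops with seq_len(), and returns the vector it fills:&lt;br /&gt;
&lt;br /&gt;
```r
f3 = function(n = 30000) {
  v = numeric(n)            # pre-allocate n doubles, all initially zero
  for (i in seq_len(n))     # seq_len(n) is safe even when n is 0, unlike 1:n
    v[i] = i^2
  v                         # return the result so that the work is not thrown away
}

system.time(f3())
```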
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R accept whole vectors as input, rather than just single values, which often lets you avoid writing a loop at all.  The examples in the previous section used for loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in about half the time it took the pre-allocated loop to process 30,000!&lt;br /&gt;
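&lt;br /&gt;
Vectorisation is not limited to arithmetic.  As an illustrative sketch (the clipping task is made up for the example), here is a loop that replaces negative values with zero, alongside vectorised equivalents using the base functions '''pmax()''' and '''ifelse()''' and logical indexing:&lt;br /&gt;
&lt;br /&gt;
```r
x = c(-2, 5, -1, 3)

# Loop version: clip negative values to zero, one element at a time
y.loop = numeric(length(x))
for (i in seq_along(x))
  y.loop[i] = if (x[i] > 0) x[i] else 0

# Vectorised versions: each call processes the whole vector at once
y.pmax = pmax(x, 0)
y.ifelse = ifelse(x > 0, x, 0)

# Logical indexing is vectorised too: sum only the positive values
total.positive = sum(x[x > 0])    # 8
```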
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to rewrite the portions of your R code that profiling shows to be slow in a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as BlueCrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to a list of inputs, and then to split the list over several CPU cores or cluster worker nodes.&lt;br /&gt;
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply mapper, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I then used the following submission script to run it on BlueCrystal phase 2 (BCp2):&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
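&lt;br /&gt;
Notice that all four workers returned the same "random" number, 0.5154871, so their random number generators were not producing independent streams.  If each worker should draw different numbers, it needs its own stream.  The sketch below swaps in the '''parallel''' package rather than Rmpi, using '''clusterSetRNGStream()''' on a small local socket cluster.  The seed value of 123 and the two-worker cluster are arbitrary choices for illustration:&lt;br /&gt;
&lt;br /&gt;
```r
library(parallel)

cl = makeCluster(2)                     # two local worker R processes (socket cluster)
clusterSetRNGStream(cl, iseed = 123)    # give each worker its own L'Ecuyer-CMRG stream
draws = unlist(clusterCall(cl, runif, 1))
stopCluster(cl)

print(draws)    # one value per worker, and the two values differ
```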
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low-level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
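The '''parallel''' package, which ships with R from version 2.14.0, is an amalgamation of functionality from the multicore and snow packages.  Its snow-style functions (such as '''makeCluster''' and '''parLapply''') run on MS Windows as well as on Linux and Mac OS.  Note, however, that its fork-based functions, such as '''mclapply''', still cannot run in parallel on MS Windows.&lt;br /&gt;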
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using mclapply from the parallel package, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
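&lt;br /&gt;
On a single machine, however, the parallel package's snow-style interface is straightforward.  The sketch below runs the same smoothing task on a local two-worker socket cluster with '''parLapply'''; it has not been tested on BCp2.  Because socket clusters start separate R processes rather than forking, this approach also works on MS Windows:&lt;br /&gt;
&lt;br /&gt;
```r
library(parallel)

data = lapply(1:10, function(x) {rnorm(10000)})

# makeCluster() starts separate R worker processes (a socket cluster by default)
cl = makeCluster(2)
x = parLapply(cl, data, function(x) {loess.smooth(x, x)})
stopCluster(cl)

length(x)    # 10: one smoothed curve per input vector
```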
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9374</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9374"/>
		<updated>2013-11-28T16:26:48Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Examples of Common Tasks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line; on a Windows machine or a Mac, you will typically start R in some form of GUI.  Either way, you will end up at an R command prompt, and all of the examples below will work there.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' is evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  R has no separate scalar type: a single number entered at the console is a vector of length one.  The '''[1]''' indicates that the first value printed on that line is element 1 of the vector.  We can use the handy sequence function, '''seq()''', to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
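Note that the '''=''' signs inside the call to '''seq()''' name the function's arguments; they do not perform assignment.  Both '''&amp;lt;-''' and '''=''' can be used for assignment at the top level (e.g. '''x = 5'''), but '''&amp;lt;-''' is the conventional choice in R.&lt;br /&gt;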
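&lt;br /&gt;
Because even a single number is a vector, the same functions work on '''phi''' and '''odds''' alike.  A short sketch, with the values R returns given in the comments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
length(phi)   # 1: a single number is a vector of length one&lt;br /&gt;
length(odds)  # 34&lt;br /&gt;
odds[1:3]     # 1 3 5, since indexing starts at 1&lt;br /&gt;
sum(odds)     # 1156, i.e. 34^2, as the first n odd numbers sum to n^2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;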
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
coords.bris &amp;lt;- c(51.5, -2.6)  # latitude, longitude in degrees (west is negative)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here, the '''c()''' function combines the arguments given in the parentheses into a vector, and '''matrix()''' fills the matrix from that vector column by column.  We can access portions of the matrix using the syntax shown in the square brackets.  For example, we can access the first row using the '''[1,]''' notation, and similarly the second column using '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which can have more than two dimensions, and '''lists''', which hold heterogeneous collections of items.&lt;br /&gt;
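&lt;br /&gt;
As a brief sketch, here is a 2 x 3 x 4 array.  Like '''matrix()''', '''array()''' fills its elements in order, varying the first index fastest:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
a &amp;lt;- array(1:24, dim=c(2,3,4))&lt;br /&gt;
dim(a)      # 2 3 4&lt;br /&gt;
a[2,3,4]    # 24, the last element&lt;br /&gt;
a[1,2,3]    # 15&lt;br /&gt;
a[,,1]      # the first 2 x 3 slice: a matrix holding 1 to 6&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;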
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
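We can access its items in several ways.  Note that single brackets return a sub-list (name included), whereas '''$''' and double brackets return the item itself:&lt;br /&gt;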
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
  country gold silver bronze&lt;br /&gt;
1     USA   46     29     29&lt;br /&gt;
2   China   38     27     23&lt;br /&gt;
3      GB   29     17     19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
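We can access columns of a data frame using the '''$''' operator.  (The output below comes from an R release of the time.  Since R 4.0.0, '''data.frame()''' no longer converts strings to factors by default, so '''country''' is printed as a character vector, without the '''Levels:''' line.)&lt;br /&gt;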
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
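&lt;br /&gt;
Columns can also be added, and rows selected with a logical condition.  A minimal sketch, rebuilding the same data frame so that it runs on its own (the '''total''' column is our own illustrative addition):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
# Add a column holding each country's total medal count&lt;br /&gt;
medals.2012$total &amp;lt;- medals.2012$gold + medals.2012$silver + medals.2012$bronze&lt;br /&gt;
medals.2012$total                   # 104 88 65&lt;br /&gt;
# Select the rows where more than 30 golds were won (USA and China)&lt;br /&gt;
medals.2012[medals.2012$gold &amp;gt; 30, ]&lt;br /&gt;
# Select a single value by row number and column name&lt;br /&gt;
medals.2012[2, &amp;quot;silver&amp;quot;]            # 27&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;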
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One feature that makes R popular is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as the built-in '''pressure''' data set charting the vapour pressure of mercury against temperature, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  Here, we use one to show the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are a useful way to show the median and quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility.  For example, the help page for the '''filled.contour()''' plotting function includes an example of its use to plot the topography of a volcano in Auckland, NZ:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;gt; example(filled.contour)  # runs the examples from the help page&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots, complete with the R code required to create them (at the bottom of the page, after the comments), on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
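&lt;br /&gt;
The plots above are drawn in an on-screen window.  To write a plot to a file instead, one approach is to open a file-based graphics device, draw the plot as usual, and then close the device with '''dev.off()'''.  A minimal sketch using '''pdf()''' (the '''png()''' device follows the same pattern, with its width and height given in pixels); the file name is a hypothetical example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Hypothetical output file, placed in R's temporary directory&lt;br /&gt;
out.file &amp;lt;- file.path(tempdir(), &amp;quot;vapour-pressure.pdf&amp;quot;)&lt;br /&gt;
pdf(out.file, width=5, height=5)  # open the device (size in inches)&lt;br /&gt;
plot(pressure)                    # draw onto it, exactly as before&lt;br /&gt;
dev.off()                         # close the device, which completes the file&lt;br /&gt;
file.exists(out.file)             # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;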
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
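&lt;br /&gt;
A '''for''' loop can also step through the elements of any vector directly, rather than through a sequence of indices.  As a sketch, here we accumulate the annual total of the Bristol monthly precipitation figures used for the bar chart above, and compare it with R's built-in (and vectorised) '''sum()''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
total &amp;lt;- 0&lt;br /&gt;
for (p in bristol.precip) total &amp;lt;- total + p&lt;br /&gt;
total                 # 848.4 (mm)&lt;br /&gt;
sum(bristol.precip)   # the same value, without an explicit loop&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;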
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional when the body is a single expression, as here, but they add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
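&lt;br /&gt;
Because '''^''', '''+''' and '''sqrt()''' all work element by element on vectors, '''hypotenuse()''' will happily accept vectors too, giving one result per pair of inputs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
hypotenuse(c(3, 5, 8), c(4, 12, 15))  # 5 13 17&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;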
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, unnamed arguments are matched by position; if the arguments are named, they can be given in any order. &lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check on the contents of a function by just typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or just check the arguments, using the '''args()''' function.  This returns a function with the same arguments but an empty body, which is why '''NULL''' is printed after the argument list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
The packages that extend R are listed on CRAN, the Comprehensive R Archive Network: http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
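Let's install the '''multicore''' package, which provides functions that can run on the multiple processor cores found in most computers these days.  (The multicore package has since been withdrawn from CRAN.  Its '''mclapply()''' function is now part of the '''parallel''' package, which ships with R 2.14.0 and later, so on a recent R you can skip the installation and use '''library(parallel)''' instead.)&lt;br /&gt;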
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library (i.e. those available to be loaded into our workspace) using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load one of them into the current session, we type, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, an example of using a function from the multicore package.  The '''lapply''' function, which is included in the standard R core, will map a given function over a list of inputs, giving a list of the function outputs in return.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
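&lt;br /&gt;
As a quick sanity check, the parallel version should return exactly the same list as the serial one.  The sketch below assumes a Unix-like system, since '''mclapply()''' relies on forking processes, which Windows does not support.  It loads '''mclapply()''' from the '''parallel''' package, where it lives in current versions of R, and the choice of 2 cores is arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
serial &amp;lt;- lapply(1:3, function(x) {x^2})&lt;br /&gt;
par.res &amp;lt;- mclapply(1:3, function(x) {x^2}, mc.cores=2)&lt;br /&gt;
identical(serial, par.res)  # TRUE: the same list, computed by 2 forked processes&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;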
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  If your data is laid out in a file as a table with column headings, perhaps the simplest function to use is '''read.table()'''.  Suppose we have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is then a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be easily handled by specifying '''sep=&amp;quot;,&amp;quot;''' as an argument to read.table().  However, for convenience, there are also '''read.csv()''' and '''write.csv()''' functions defined.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the file '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
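&lt;br /&gt;
As the output shows, '''write.csv()''' also writes the row names as an extra, unnamed first column.  A minimal sketch of a round trip that leaves them out, using '''row.names=FALSE''', and reads the file back in with '''read.csv()''' (the temporary file name is just for illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country=c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;Great Britain&amp;quot;),&lt;br /&gt;
                          gold=c(46, 38, 29),&lt;br /&gt;
                          silver=c(29, 27, 17),&lt;br /&gt;
                          bronze=c(29, 23, 19))&lt;br /&gt;
csv.file &amp;lt;- tempfile(fileext=&amp;quot;.csv&amp;quot;)&lt;br /&gt;
write.csv(medals.2012, csv.file, row.names=FALSE)&lt;br /&gt;
readLines(csv.file)[1]          # the header line, now without the leading empty field&lt;br /&gt;
medals.again &amp;lt;- read.csv(csv.file)&lt;br /&gt;
names(medals.again)             # country gold silver bronze&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;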
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store an R data structure in binary form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
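&lt;br /&gt;
By default, '''save()''' compresses the file it writes, as the Unix '''file''' command confirms:&lt;br /&gt;
&lt;br /&gt;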
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function to load such data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
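&lt;br /&gt;
Note that '''load()''' restores objects under the names they were saved with (overwriting any existing objects of the same name) and invisibly returns those names.  A self-contained sketch, using a temporary file for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country=c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;), gold=c(46, 38, 29))&lt;br /&gt;
rdata.file &amp;lt;- tempfile(fileext=&amp;quot;.RData&amp;quot;)&lt;br /&gt;
save(medals.2012, file=rdata.file)&lt;br /&gt;
saved.copy &amp;lt;- medals.2012&lt;br /&gt;
rm(medals.2012)                     # remove it from the workspace...&lt;br /&gt;
restored &amp;lt;- load(rdata.file)        # ...and bring it back under its old name&lt;br /&gt;
restored                            # medals.2012&lt;br /&gt;
identical(medals.2012, saved.copy)  # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;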
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide these; note that you will need the 'development' packages (e.g. '''libnetcdf-dev''' on Debian and Ubuntu).  (The '''ncdf''' package has since been archived on CRAN in favour of its successor, '''ncdf4''', which is installed in the same way.)  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Preparing Data==&lt;br /&gt;
&lt;br /&gt;
===Sorting===&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sort(railway.engines)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot; &amp;quot;gordon&amp;quot; &amp;quot;henry&amp;quot;  &amp;quot;james&amp;quot;  &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html&lt;br /&gt;
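&lt;br /&gt;
'''sort()''' returns the sorted values themselves, whereas '''order()''' returns the positions that would sort them.  The latter is the usual way to sort the rows of a data frame by one of its columns.  A sketch, using a cut-down version of the medals data from earlier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
sort(railway.engines, decreasing=TRUE)  # thomas james henry gordon edward&lt;br /&gt;
order(railway.engines)                  # 4 3 2 5 1&lt;br /&gt;
# Sort the rows of a data frame by its bronze column&lt;br /&gt;
medals &amp;lt;- data.frame(country=c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;), bronze=c(29, 23, 19))&lt;br /&gt;
medals[order(medals$bronze), ]          # rows for GB, China, then USA&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;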
&lt;br /&gt;
===Random Sampling===&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html&lt;br /&gt;
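&lt;br /&gt;
'''sample()''' can also draw several items at once, with or without replacement, and can weight the draws via '''prob'''.  Calling '''set.seed()''' first makes the 'random' results reproducible.  A sketch (the seed and the weights are arbitrary choices):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
set.seed(42)                         # any fixed seed gives repeatable draws&lt;br /&gt;
three &amp;lt;- sample(railway.engines, 3)  # three different engines (replace = FALSE is the default)&lt;br /&gt;
shuffled &amp;lt;- sample(railway.engines)  # a random permutation of all five&lt;br /&gt;
# Ten draws with replacement, where thomas is six times as likely as each of the others&lt;br /&gt;
weighted &amp;lt;- sample(railway.engines, 10, replace = TRUE, prob = c(0.6, 0.1, 0.1, 0.1, 0.1))&lt;br /&gt;
table(weighted)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;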
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
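&lt;br /&gt;
The fitted intercept and slope can be extracted from the model object with '''coef()''', and '''summary()''' adds standard errors, R-squared and so on.  A short sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
res &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
coef(res)     # intercept about -17.58, slope about 3.93&lt;br /&gt;
summary(res)  # coefficient table, residual standard error, R-squared, ...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;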
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
* You may wish to compare different methods of estimation.  From the MASS package, you can fit a line with the '''rlm''' and '''lqs''' functions.  You can then plot all the lines against the data using:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
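&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; res.lm &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm &amp;lt;- rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs &amp;lt;- lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;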
&amp;lt;/source&amp;gt;&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(..., weights=...)'''.  If given, the function finds the line of best fit by minimising the weighted sum of squared residuals, i.e. the sum over all points of w_i*(y_i - yhat_i)^2, rather than the plain sum of squares.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights:&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector of length 50 (one weight per row of '''cars'''), where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
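&lt;br /&gt;
As a starting point, here is a sketch that fits both an ordinary and a weighted model using the '''w2''' weights.  It then checks the weighted fit against the weighted normal equations, (X'WX)b = X'Wy, where X holds a column of ones and the speeds, W is the diagonal matrix of weights and y holds the stopping distances:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
w2 &amp;lt;- seq(from=0.02, to=1.0, by=0.02)  # 50 weights, one per row of cars&lt;br /&gt;
fit.ols &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
fit.wls &amp;lt;- lm(dist ~ speed, data=cars, weights=w2)&lt;br /&gt;
coef(fit.ols)&lt;br /&gt;
coef(fit.wls)&lt;br /&gt;
# Solve the weighted normal equations directly.&lt;br /&gt;
# (w2 * X multiplies row i of X by w2[i], i.e. it computes W %*% X.)&lt;br /&gt;
X &amp;lt;- cbind(1, cars$speed)&lt;br /&gt;
b &amp;lt;- solve(t(X) %*% (w2 * X), t(X) %*% (w2 * cars$dist))&lt;br /&gt;
all.equal(unname(coef(fit.wls)), as.vector(b))  # TRUE&lt;br /&gt;
# Compare the two lines against the data&lt;br /&gt;
plot(cars)&lt;br /&gt;
abline(fit.ols, lty=1)&lt;br /&gt;
abline(fit.wls, lty=2)&lt;br /&gt;
legend(x=5, y=100, legend=c(&amp;quot;unweighted&amp;quot;, &amp;quot;weighted (w2)&amp;quot;), lty=c(1, 2))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;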
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
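&lt;br /&gt;
Here we first use an F test ('''var.test()''') to check whether two samples have similar variances and then, since they appear to, a two-sample t-test with '''var.equal=TRUE''' to compare their means.  With p-values of 0.98 and 0.41 respectively, neither test finds evidence of a difference at the usual 5% level:&lt;br /&gt;
&lt;br /&gt;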
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
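&lt;br /&gt;
To see what '''t.test()''' is computing when '''var.equal=TRUE''', the pooled-variance t statistic, t = (mean(x) - mean(y)) / (s_p * sqrt(1/n_x + 1/n_y)), where s_p^2 = ((n_x - 1)s_x^2 + (n_y - 1)s_y^2) / (n_x + n_y - 2), can be computed by hand.  A sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2 &amp;lt;- c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
n.x &amp;lt;- length(boys_2)&lt;br /&gt;
n.y &amp;lt;- length(girls_2)&lt;br /&gt;
# Pooled estimate of the common variance&lt;br /&gt;
s2.p &amp;lt;- ((n.x - 1)*var(boys_2) + (n.y - 1)*var(girls_2)) / (n.x + n.y - 2)&lt;br /&gt;
t.stat &amp;lt;- (mean(boys_2) - mean(girls_2)) / sqrt(s2.p * (1/n.x + 1/n.y))&lt;br /&gt;
dof &amp;lt;- n.x + n.y - 2&lt;br /&gt;
p.value &amp;lt;- 2 * pt(-abs(t.stat), dof)  # two-sided p-value&lt;br /&gt;
c(t=t.stat, df=dof, p=p.value)         # t about 0.843, df 18, p about 0.410&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;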
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa (s), versicolor (c), and virginica (v).&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
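&lt;br /&gt;
To summarise the table as a single figure, take the proportion of correct predictions: the sum of the table's diagonal divided by its total.  For the table above that is (23+25+22)/75, or about 0.93.  Because ties are broken at random, your figure may differ slightly.  A sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
confusion &amp;lt;- table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
# Correct predictions lie on the diagonal of the table&lt;br /&gt;
accuracy &amp;lt;- sum(diag(confusion)) / sum(confusion)&lt;br /&gt;
accuracy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;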
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame, supplied with the '''rpart''' package, has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
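&lt;br /&gt;
To see how well the first tree fits the data it was grown on, we can ask for the predicted class of each child with '''predict(..., type=&amp;quot;class&amp;quot;)''' and cross-tabulate the predictions against the actual outcomes.  Bear in mind that accuracy measured on the training data is optimistic; a separate test set, as in the k-nearest-neighbour example, gives a fairer estimate.  A sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
predicted &amp;lt;- predict(fit, type = &amp;quot;class&amp;quot;)  # one predicted class per row of kyphosis&lt;br /&gt;
table(predicted = predicted, actual = kyphosis$Kyphosis)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;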
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
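&lt;br /&gt;
Since '''array()''' fills column by column, '''A''' here is the matrix with rows (1, 3, -2), (3, 5, 6) and (2, 4, 3).  A quick way to check the solution is to multiply it back by '''A''' using the matrix product operator, '''%*%''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
b &amp;lt;- c(5,7,8)&lt;br /&gt;
x &amp;lt;- solve(A, b)&lt;br /&gt;
x        # -15 8 2&lt;br /&gt;
A %*% x  # a 3 x 1 matrix holding 5, 7 and 8, i.e. b&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;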
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll change tack and focus on a question commonly asked by those beginning to use R in anger: '''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''.  In this section we'll consider the related tasks of finding which parts of your R code are responsible for the majority of the run-time, and what you can do about it.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have the gift(!) of constantly surprising us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it infinitely fast would save at most 10%, a poor return on all that sweat.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time a single call using '''system.time()''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
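&lt;br /&gt;
'''system.time()''' reports the user, system and elapsed (wall-clock) times in seconds.  A small sketch, where the workload (sorting a million random numbers) is just an arbitrary illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
timing &amp;lt;- system.time(sort(runif(1e6)))&lt;br /&gt;
timing               # user, system and elapsed times, in seconds&lt;br /&gt;
timing[[&amp;quot;elapsed&amp;quot;]]  # just the wall-clock time&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;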
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Map a smoothing function over the list&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following breakdown (an excerpt of the '''by.self''' table):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
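The profile tells us that the function '''simpleLoess''' (called internally by '''loess.smooth''') accounts for 88% of the sampled run-time on its own, whereas '''rnorm''' takes only 4%.  The '''self''' columns count time spent in a function's own code; the '''total''' columns also include time spent in the functions it calls.&lt;br /&gt;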
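&lt;br /&gt;
'''summaryRprof()''' returns a list: the table above is its '''by.self''' component, while '''by.total''' ranks functions by time including their callees.  A sketch that repeats the profiling above but writes to a temporary file (the use of '''tempfile()''' is an illustrative choice, not part of the original script) and inspects both tables:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
prof.file &amp;lt;- tempfile(fileext = &amp;quot;.out&amp;quot;)&lt;br /&gt;
Rprof(prof.file)&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(i) {rnorm(100000)})&lt;br /&gt;
x &amp;lt;- lapply(data, function(v) {loess.smooth(v, v)})&lt;br /&gt;
Rprof(NULL)  # stop profiling&lt;br /&gt;
prof &amp;lt;- summaryRprof(prof.file)&lt;br /&gt;
head(prof$by.self, 3)   # ranked by time in each function's own code&lt;br /&gt;
head(prof$by.total, 3)  # ranked by time including the functions it calls&lt;br /&gt;
prof$sampling.time      # total seconds covered by the samples&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;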
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
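&lt;br /&gt;
The functions above discard their results, since the loop is the last thing they evaluate.  The sketch below returns '''v''' so that we can check both approaches compute the same thing, and uses '''numeric(n)''', which allocates a vector of n zeros in one step.  The names '''grow''' and '''prealloc''' are illustrative.  Newer versions of R extend vectors more efficiently than the version used here, so the gap you measure may be smaller than the ~30x reported above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
grow &amp;lt;- function(n) {&lt;br /&gt;
  v &amp;lt;- c()                  # starts empty; extended on every assignment&lt;br /&gt;
  for (i in 1:n) v[i] &amp;lt;- i^2&lt;br /&gt;
  v&lt;br /&gt;
}&lt;br /&gt;
prealloc &amp;lt;- function(n) {&lt;br /&gt;
  v &amp;lt;- numeric(n)           # all n elements allocated up front&lt;br /&gt;
  for (i in 1:n) v[i] &amp;lt;- i^2&lt;br /&gt;
  v&lt;br /&gt;
}&lt;br /&gt;
identical(grow(30000), prealloc(30000))  # TRUE: same result either way&lt;br /&gt;
system.time(grow(30000))&lt;br /&gt;
system.time(prealloc(30000))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;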
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R accept whole vectors as input, rather than just single values, which lets you avoid writing an explicit loop.  The examples in the previous section used '''for''' loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in roughly half the time that the pre-allocated loop took to process 30,000!&lt;br /&gt;
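&lt;br /&gt;
Vectorisation is not limited to arithmetic: comparison operators and functions such as '''ifelse()''' and '''pmax()''' also work element by element.  A small sketch comparing a loop with two vectorised equivalents, using made-up input values:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
x &amp;lt;- c(-2, -1, 0, 1, 2)&lt;br /&gt;
# Loop version: replace negative values with zero&lt;br /&gt;
y.loop &amp;lt;- numeric(length(x))&lt;br /&gt;
for (i in seq_along(x)) y.loop[i] &amp;lt;- if (x[i] &amp;gt; 0) x[i] else 0&lt;br /&gt;
# Vectorised versions: one call each, no explicit loop&lt;br /&gt;
y.vec &amp;lt;- ifelse(x &amp;gt; 0, x, 0)&lt;br /&gt;
y.max &amp;lt;- pmax(x, 0)&lt;br /&gt;
identical(y.loop, y.vec)  # TRUE&lt;br /&gt;
identical(y.loop, y.max)  # TRUE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;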
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to get more speed is to outsource portions of R code that are found to be slow to a compiled language, such as C or Fortran.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as BlueCrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to each element of a list of inputs, and then to split that list across several CPU cores or cluster worker nodes.&lt;br /&gt;
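&lt;br /&gt;
Restructuring a loop into this form is often the first step.  A minimal serial sketch, in which the hypothetical function '''sim.one''' stands in for one independent unit of work (the numbers differ from run to run because they are random):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Loop version: each iteration is independent of the others&lt;br /&gt;
results &amp;lt;- numeric(5)&lt;br /&gt;
for (i in 1:5) results[i] &amp;lt;- mean(rnorm(1000, mean = i))&lt;br /&gt;
# The same work as a function applied to each element of a list of inputs&lt;br /&gt;
sim.one &amp;lt;- function(i) {mean(rnorm(1000, mean = i))}&lt;br /&gt;
results.list &amp;lt;- lapply(1:5, sim.one)&lt;br /&gt;
results2 &amp;lt;- unlist(results.list)  # back to a plain numeric vector&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once the work is expressed with '''lapply()''', swapping in a parallel mapper such as '''mclapply()''' or '''clusterApply()''' is a one-line change.&lt;br /&gt;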
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
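The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.  (multicore has since been withdrawn from CRAN; its '''mclapply''' function lives on in the '''parallel''' package described below.)&lt;br /&gt;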
&lt;br /&gt;
As an example, let's look at the package's '''mclapply''' function, a multicore equivalent of R's built-in list-apply function, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I then used the following submission script to run it on BlueCrystal Phase 2 (BCp2):&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and got the following output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
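&lt;br /&gt;
Notice that all four workers returned exactly the same &amp;quot;random&amp;quot; number: nothing guarantees that separately started R processes draw from different random-number streams.  The sketch below swaps in the '''parallel''' package's '''clusterSetRNGStream()''' to give each worker its own independent stream, run on a small cluster of local worker processes rather than on BCp2.  The seed of 123 and the cluster size of 2 are arbitrary choices.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
cl &amp;lt;- makeCluster(2)                  # two local worker processes&lt;br /&gt;
clusterSetRNGStream(cl, iseed = 123)  # give each worker its own RNG stream&lt;br /&gt;
r &amp;lt;- unlist(clusterEvalQ(cl, runif(1)))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
r  # two different values, reproducible for a given iseed&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;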
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program that I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
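&lt;br /&gt;
Functions sent to cluster workers run in separate R processes, so any global variables they refer to must first be copied to the workers; snow provides '''clusterExport()''' for this.  The sketch below uses the '''parallel''' package's function of the same name on two local workers; the variable '''my.span''' is a hypothetical tuning parameter.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
cl &amp;lt;- makeCluster(2)&lt;br /&gt;
my.span &amp;lt;- 0.5  # a global setting used inside the worker function&lt;br /&gt;
data &amp;lt;- lapply(1:4, function(i) {rnorm(1000)})&lt;br /&gt;
# without this line, the workers would fail because my.span does not exist there&lt;br /&gt;
clusterExport(cl, &amp;quot;my.span&amp;quot;)&lt;br /&gt;
x &amp;lt;- parLapply(cl, data, function(v) {loess.smooth(v, v, span = my.span)})&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;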
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
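The '''parallel''' package, included with R since version 2.14.0, is an amalgamation of functionality from the multicore and snow packages.  Unlike multicore, it works on MS Windows, although there '''mclapply''' cannot fork new processes and so only runs in serial (with mc.cores = 1); on Windows, use a socket cluster created with '''makeCluster()''' instead.&lt;br /&gt;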
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
parallel::detectCores()&lt;br /&gt;
# Create a list of 10 vectors, each of 10,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the list.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using mclapply from parallel, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
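&lt;br /&gt;
On a single machine, including one running MS Windows, the same job can be run on a socket cluster of local worker processes using '''parLapply()'''.  A minimal sketch; it runs all workers on the local machine rather than across BCp2 nodes, and the choice of two workers is arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# Create a list of 10 vectors, each of 10,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
cl &amp;lt;- makeCluster(2)  # two local worker processes, connected by sockets&lt;br /&gt;
system.time(x &amp;lt;- parLapply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)       # always shut the workers down when finished&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;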
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
	<entry>
		<id>https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9373</id>
		<title>R1</title>
		<link rel="alternate" type="text/html" href="https://source.geography.bristol.ac.uk/mediawiki/index.php?title=R1&amp;diff=9373"/>
		<updated>2013-11-28T16:18:11Z</updated>

		<summary type="html">&lt;p&gt;GethinWilliams: /* Examples of Common Tasks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[category:Pragmatic Programming]]&lt;br /&gt;
'''Open Source Statistics with R'''&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
R is a mature, open-source (i.e. free!) statistics package, with an intuitive interface, excellent graphics and a vibrant community constantly adding new methods for the statistical investigation of your data to the library of packages available.&lt;br /&gt;
&lt;br /&gt;
The goal of this tutorial is to introduce you to the R package, and not to be an introductory course in statistics.&lt;br /&gt;
&lt;br /&gt;
If you are working on a Linux system, you will typically start R from the command line.  On a Windows machine, or a Mac, you will typically start up R in some form of GUI.  However you get R started, you will have access to an R command prompt.  The good news is that the examples below will all work at the R command prompt, however you gained access to it.&lt;br /&gt;
&lt;br /&gt;
Further resources:&lt;br /&gt;
&lt;br /&gt;
* The R manual is a great resource for learning R: http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf&lt;br /&gt;
* Some excellent examples of using R can also be found at: http://msenux.redwoods.edu/math/R/ and http://www.r-tutor.com/&lt;br /&gt;
&lt;br /&gt;
=Getting Started=&lt;br /&gt;
&lt;br /&gt;
The very simplest thing we can do with R is to perform some arithmetic at the command prompt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; phi &amp;lt;- (1+sqrt(5))/2&lt;br /&gt;
&amp;gt; phi&lt;br /&gt;
[1] 1.618034&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Parentheses are used to modify the usual order of precedence of the operators ('''/''' will be evaluated before '''+''').  Note the '''[1]''' accompanying the returned value.  All numbers entered at the console are interpreted as vectors, and the '''[1]''' indicates that the line in question displays the vector's values starting at the first index.  We can use the handy sequence function, '''seq()''', to create a vector containing more than a single element:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; odds &amp;lt;- seq(from=1, to=67, by=2)&lt;br /&gt;
&amp;gt; odds&lt;br /&gt;
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49&lt;br /&gt;
[26] 51 53 55 57 59 61 63 65 67&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
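In the above example, '''&amp;lt;-''' is used for assignment, while '''=''' is used to name the arguments passed to '''seq()'''.  At the top level, '''=''' can also be used for assignment (e.g. '''x = 5'''), although '''&amp;lt;-''' is the more conventional choice in R.&lt;br /&gt;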
&lt;br /&gt;
Vectors are commonly used data structures in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
coords.bris &amp;lt;- c(51.5, 2.6)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As are matrices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8),nrow=3,ncol=3)&lt;br /&gt;
&amp;gt; magic&lt;br /&gt;
     [,1] [,2] [,3]&lt;br /&gt;
[1,]    2    9    4&lt;br /&gt;
[2,]    7    5    3&lt;br /&gt;
[3,]    6    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here, the '''c()''' function combines its arguments into a vector, which '''matrix()''' then fills into the matrix column by column.  We can access portions of the matrix using square brackets: for example, the first row with '''[1,]''', and the second column with '''[,2]'''.  Since this is a 3x3 magic square, the numbers in both slices should sum to 15:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; sum(magic[1,])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;gt; sum(magic[,2])&lt;br /&gt;
[1] 15&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
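&lt;br /&gt;
The remaining magic-square properties can be checked just as easily with '''rowSums()''', '''colSums()''' and '''diag()'''.  Here '''magic[, 3:1]''' reverses the column order, so that its diagonal is the original anti-diagonal.  A short sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
magic &amp;lt;- matrix(data=c(2,7,6,9,5,1,4,3,8), nrow=3, ncol=3)&lt;br /&gt;
rowSums(magic)           # 15 15 15&lt;br /&gt;
colSums(magic)           # 15 15 15&lt;br /&gt;
sum(diag(magic))         # 15 (main diagonal: 2, 5, 8)&lt;br /&gt;
sum(diag(magic[, 3:1]))  # 15 (anti-diagonal: 4, 5, 6)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;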
&lt;br /&gt;
Single elements and ranges can also be accessed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; magic[2,2]&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; magic[2:3,2:3]&lt;br /&gt;
     [,1] [,2]&lt;br /&gt;
[1,]    5    3&lt;br /&gt;
[2,]    1    8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R also provides '''arrays''', which can have more than two dimensions, and '''lists''', which hold heterogeneous collections of objects.&lt;br /&gt;
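&lt;br /&gt;
For example, a three-dimensional array can be built with '''array()''', giving the size of each dimension in '''dim'''; like '''matrix()''', it fills in its values in column order.  A minimal sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
a &amp;lt;- array(1:24, dim = c(2, 3, 4))  # 2 rows, 3 columns, 4 layers&lt;br /&gt;
dim(a)      # 2 3 4&lt;br /&gt;
a[2, 3, 4]  # 24, the last element&lt;br /&gt;
a[, , 1]    # the first layer, an ordinary 2 x 3 matrix&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;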
&lt;br /&gt;
An example list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access its items in several ways: single brackets return a sub-list, whereas double brackets (or the '''$''' operator) return the item itself:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; list.r4$frequency&lt;br /&gt;
[1] &amp;quot;93.7&amp;quot;&lt;br /&gt;
&amp;gt; list.r4[1]&lt;br /&gt;
$name&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; list.r4[[1]]&lt;br /&gt;
[1] &amp;quot;Radio4&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
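&lt;br /&gt;
The difference between single and double brackets shows up in the class of what comes back.  A quick check:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
list.r4 &amp;lt;- list(name=&amp;quot;Radio4&amp;quot;, frequency=&amp;quot;93.7&amp;quot;)&lt;br /&gt;
class(list.r4[1])              # &amp;quot;list&amp;quot;: a one-item sub-list&lt;br /&gt;
class(list.r4[[1]])            # &amp;quot;character&amp;quot;: the item itself&lt;br /&gt;
list.r4[[&amp;quot;frequency&amp;quot;]]    # items can also be extracted by name&lt;br /&gt;
as.numeric(list.r4$frequency)  # 93.7, converted from the stored string&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;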
&lt;br /&gt;
A very commonly used data structure is the '''data frame''', which R uses to store tabular data.  Given several vectors of equal length, we can collate them into a data frame:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
&amp;gt; gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
&amp;gt; silver &amp;lt;- c(29, 27, 17)&lt;br /&gt;
&amp;gt; bronze &amp;lt;- c(29, 23, 19)&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- data.frame(country, gold, silver, bronze)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
    country  gold  silver  bronze&lt;br /&gt;
1       USA    46      29      29&lt;br /&gt;
2     China    38      27      23&lt;br /&gt;
3        GB    29      17      19&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can access columns of a data frame using the '''$''' operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012$country&lt;br /&gt;
[1] USA   China GB   &lt;br /&gt;
Levels: China GB USA&lt;br /&gt;
&amp;gt; medals.2012$gold&lt;br /&gt;
[1] 46 38 29&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
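&lt;br /&gt;
The '''Levels:''' line appears because, in versions of R before 4.0.0, '''data.frame()''' converts character vectors into factors by default.  From R 4.0.0 onwards the default is '''stringsAsFactors = FALSE''', and '''medals.2012$country''' prints as plain strings.  Setting the argument explicitly gives the same behaviour on any version:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
country &amp;lt;- c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;GB&amp;quot;)&lt;br /&gt;
gold &amp;lt;- c(46, 38, 29)&lt;br /&gt;
medals.chr &amp;lt;- data.frame(country, gold, stringsAsFactors = FALSE)&lt;br /&gt;
medals.fac &amp;lt;- data.frame(country, gold, stringsAsFactors = TRUE)&lt;br /&gt;
class(medals.chr$country)   # &amp;quot;character&amp;quot;&lt;br /&gt;
levels(medals.fac$country)  # &amp;quot;China&amp;quot; &amp;quot;GB&amp;quot; &amp;quot;USA&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;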
&lt;br /&gt;
=Standard Graphics: A taster=&lt;br /&gt;
&lt;br /&gt;
One aspect that makes R popular is its graphing functions.  R also has some very handy built-in data sets, which we'll use to demonstrate just a small fraction of R's graphing abilities.&lt;br /&gt;
&lt;br /&gt;
First up is the humble '''plot()''' function.  Given a data frame of points, such as one charting the relationship between temperature and the vapour pressure of mercury, it will give us a (handily labelled) scatter plot: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(pressure)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the gallery below for all the plots created in this section.&lt;br /&gt;
&lt;br /&gt;
The plot function will also accept a time-series (another class of object recognised by R) and will sensibly join the points with a line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(co2)&lt;br /&gt;
&amp;gt; class(co2)&lt;br /&gt;
[1] &amp;quot;ts&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pie charts are easily constructed.  Here, we use one to show the relative proportions of electricity generated from different sources in the UK in 2011 (source: https://www.gov.uk/government/.../5942-uk-energy-in-brief-2012.pdf):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; uk.electricity.sources.2011 &amp;lt;- c(41,29,18,5,4,2,1)&lt;br /&gt;
&amp;gt; names(uk.electricity.sources.2011) &amp;lt;- c(&amp;quot;Gas&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Nuclear&amp;quot;, &amp;quot;Hydro &amp;amp; other&amp;quot;, &amp;quot;Wind&amp;quot;, &amp;quot;Imports&amp;quot;, &amp;quot;Oil&amp;quot;)&lt;br /&gt;
&amp;gt; pie(uk.electricity.sources.2011, main=&amp;quot;UK Electricity Generating Mix, 2011&amp;quot;, col=rainbow(7))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, let's create a bar chart of monthly average precipitation falling here in the fair city of Bristol (source: http://www.worldweatheronline.com):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; bristol.precip &amp;lt;- c(82.9, 56.1, 59.2, 69, 50.8, 50.9, 50.8, 74.8, 74.7, 91.1, 94.5, 93.6)&lt;br /&gt;
&amp;gt; names(bristol.precip) &amp;lt;- c(&amp;quot;Jan&amp;quot;, &amp;quot;Feb&amp;quot;, &amp;quot;Mar&amp;quot;, &amp;quot;Apr&amp;quot;, &amp;quot;May&amp;quot;, &amp;quot;Jun&amp;quot;, &amp;quot;Jul&amp;quot;, &amp;quot;Aug&amp;quot;, &amp;quot;Sep&amp;quot;, &amp;quot;Oct&amp;quot;, &amp;quot;Nov&amp;quot;, &amp;quot;Dec&amp;quot;)&lt;br /&gt;
&amp;gt; barplot(bristol.precip,&lt;br /&gt;
+ main=&amp;quot;Average Monthly Precipitation in Bristol&amp;quot;,&lt;br /&gt;
+ ylab=&amp;quot;Mean precipitation (mm)&amp;quot;,&lt;br /&gt;
+ ylim=c(0,100),&lt;br /&gt;
+ col=c(&amp;quot;darkblue&amp;quot;))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Box_plot 'Box and whisker' plots] are useful ways to graph the quartiles of some data.  In this case, the fuel efficiencies of various US cars, circa 1974:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boxplot(mpg~cyl,data=mtcars, main=&amp;quot;Car Mileage Data&amp;quot;,&lt;br /&gt;
+    xlab=&amp;quot;Number of Cylinders&amp;quot;, ylab=&amp;quot;Miles Per Gallon&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R includes a very useful help facility.  In the case of the '''filled.contour()''' plotting function, the help page includes an example of its use to plot the topography of a volcano in Auckland, NZ:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ?filled.contour&lt;br /&gt;
&amp;gt; example(filled.contour)  # run the examples from the help page&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery widths=300px heights=300px perrow=3&amp;gt;&lt;br /&gt;
File:Vapour-pressure.png|Vapour pressure of mercury against temperature&lt;br /&gt;
File:Mauna-loa.png|CO2 concentrations measured at Mauna-Loa between 1959 and 1997&lt;br /&gt;
File:Pie.png|The UK's electricity generating mix, 2011&lt;br /&gt;
File:Barplot.png|Average monthly precipitation in Bristol&lt;br /&gt;
File:Boxplot.png|Range of fuel efficiencies for different engine sizes &lt;br /&gt;
File:Maunga-Whau.png|Topography of Maunga Whau volcano in Auckland&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are many more example plots, complete with the R code required to create them (at the bottom of the page, after the comments), on the following web page:&lt;br /&gt;
* http://gallery.r-enthusiasts.com/thumbs.php&lt;br /&gt;
&lt;br /&gt;
=Loops=&lt;br /&gt;
&lt;br /&gt;
A simple '''for''' loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(1,10)) print(ii)&lt;br /&gt;
[1] 1&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 3&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 5&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 7&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 9&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some more exotic counting:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; for (ii in seq(from=10, to=0, by=-2)) print(ii)&lt;br /&gt;
[1] 10&lt;br /&gt;
[1] 8&lt;br /&gt;
[1] 6&lt;br /&gt;
[1] 4&lt;br /&gt;
[1] 2&lt;br /&gt;
[1] 0&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
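&lt;br /&gt;
When a loop runs over the indices of a vector, '''seq_along()''' is safer than '''1:length(x)''': if the vector happens to be empty, '''1:length(x)''' evaluates to '''1:0''', i.e. c(1, 0), and the loop body runs twice.  A short demonstration with an empty vector:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
x &amp;lt;- c()                              # an empty vector (NULL)&lt;br /&gt;
1:length(x)                            # 1 0: counts down from 1 to 0&lt;br /&gt;
seq_along(x)                           # integer(0): no indices at all&lt;br /&gt;
for (ii in seq_along(x)) print(x[ii])  # prints nothing&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;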
&lt;br /&gt;
'''while''' loops are for when we don't know the number of iterations in advance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; ii &amp;lt;- runif(1,0,1)&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
&amp;gt; while (ii &amp;lt; 0.5) {print(ii); ii &amp;lt;- runif(1,0,1)}&lt;br /&gt;
[1] 0.3998513&lt;br /&gt;
[1] 0.05469244&lt;br /&gt;
&amp;gt; ii&lt;br /&gt;
[1] 0.8265036&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Functions=&lt;br /&gt;
&lt;br /&gt;
You can define your own functions in R, using the '''function''' keyword.  For example, Pythagoras' Theorem:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse &amp;lt;- function(x, y) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The braces ({}) are optional here, since the body is a single expression, but they add clarity.&lt;br /&gt;
&lt;br /&gt;
To call the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypotenuse(3,4)&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can provide default values for the arguments, which can be overridden for any given invocation of the function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot2 &amp;lt;- function(x=3 ,y=4) {sqrt(x^2 + y^2)}&lt;br /&gt;
&amp;gt; hypot2()&lt;br /&gt;
[1] 5&lt;br /&gt;
&amp;gt; hypot2(12,16)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;gt; hypot2(y=16, x=12)&lt;br /&gt;
[1] 20&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments are matched by position unless they are named, in which case they can be given in any order.&lt;br /&gt;
&lt;br /&gt;
Longer functions can be spread over several lines.  We can also use the '''return''' keyword to control which value is returned by the function (without it, a function returns the value of the last expression it evaluates):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3 &amp;lt;- function(x=3 ,y=4) {&lt;br /&gt;
+ x_sq &amp;lt;- x^2&lt;br /&gt;
+ y_sq &amp;lt;- y^2&lt;br /&gt;
+ return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;gt; hypot3(6,8)&lt;br /&gt;
[1] 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
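&lt;br /&gt;
A function returns a single object, so a common way to return several values is to bundle them into a named list.  A sketch using a hypothetical function, '''triangle''', built on the same calculation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
triangle &amp;lt;- function(x=3, y=4) {&lt;br /&gt;
  h &amp;lt;- sqrt(x^2 + y^2)&lt;br /&gt;
  # return both the hypotenuse and the perimeter, by name&lt;br /&gt;
  list(hypotenuse = h, perimeter = x + y + h)&lt;br /&gt;
}&lt;br /&gt;
t1 &amp;lt;- triangle(6, 8)&lt;br /&gt;
t1$hypotenuse  # 10&lt;br /&gt;
t1$perimeter   # 24&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;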
&lt;br /&gt;
You can check the contents of a function by typing its name (without parentheses):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; hypot3&lt;br /&gt;
function(x=3 ,y=4) {&lt;br /&gt;
x_sq &amp;lt;- x^2&lt;br /&gt;
y_sq &amp;lt;- y^2&lt;br /&gt;
return( sqrt(x_sq + y_sq) )}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or we can check just the arguments, using the '''args''' function.  This returns a function with the same arguments but an empty body, which is why '''NULL''' is printed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; args(hypot3)&lt;br /&gt;
function (x = 3, y = 4) &lt;br /&gt;
NULL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Packages=&lt;br /&gt;
&lt;br /&gt;
Contributed packages are listed on CRAN, the Comprehensive R Archive Network: http://cran.r-project.org/&lt;br /&gt;
&lt;br /&gt;
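Let's install the '''multicore''' package, which gives us access to functions that can run on the multiple processor cores we often find in our computers these days.  (multicore has since been withdrawn from CRAN; on current versions of R, the same '''mclapply()''' function is provided by the '''parallel''' package, which ships with R and needs no installation.)&lt;br /&gt;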
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;multicore&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Et voila!  It is done.&lt;br /&gt;
&lt;br /&gt;
We can list the packages installed in our library, i.e. those available to load into our workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load (attach) one of them into our workspace, we type, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, an example of using a function from the multicore package.  The '''lapply''' function, which is included in the standard R core, will map a given function over a list of inputs, giving a list of the function outputs in return.  For example, we can map a squaring function over the integers from 1 to 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; lapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which gives us the list:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[1]]&lt;br /&gt;
[1] 1&lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
[1] 4&lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
[1] 9&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, we can do the same work in parallel using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; mclapply(1:3, function(x) {x^2})&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
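&lt;br /&gt;
Since '''mclapply''' returns its results in the same order as '''lapply''', the two can be compared directly.  The sketch below takes '''mclapply''' from the '''parallel''' package; because forking is unavailable on MS Windows, it falls back to a single core there (the choice of two cores elsewhere is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
squares.serial &amp;lt;- lapply(1:3, function(x) {x^2})&lt;br /&gt;
# forking is unavailable on Windows, where mclapply needs mc.cores = 1&lt;br /&gt;
n.cores &amp;lt;- if (.Platform$OS.type == &amp;quot;windows&amp;quot;) 1 else 2&lt;br /&gt;
squares.parallel &amp;lt;- mclapply(1:3, function(x) {x^2}, mc.cores = n.cores)&lt;br /&gt;
identical(squares.serial, squares.parallel)  # TRUE: same values, same order&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;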
&lt;br /&gt;
=Reading Data from File=&lt;br /&gt;
&lt;br /&gt;
R provides some very useful functions for reading and writing data from/to file.&lt;br /&gt;
&lt;br /&gt;
==Text Files==&lt;br /&gt;
&lt;br /&gt;
Let's start with text files.  Perhaps the simplest function for reading them is '''read.table()''', which suits data laid out like a table with column headings.  If I have a text file, '''medals.txt''', with the following contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
country              gold silver bronze&lt;br /&gt;
&amp;quot;USA&amp;quot;                46   29     29&lt;br /&gt;
&amp;quot;China&amp;quot;              38   27     23&lt;br /&gt;
&amp;quot;Great Britain&amp;quot;      29   17     19&lt;br /&gt;
&amp;quot;Russian Federation&amp;quot; 24   26     32&lt;br /&gt;
&amp;quot;Republic of Korea&amp;quot;  13   8      7&lt;br /&gt;
&amp;quot;Germany&amp;quot;            11   19     14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is then a simple matter to use the '''read.table()''' function to load the data into R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; medals.2012 &amp;lt;- read.table(&amp;quot;medals.txt&amp;quot;, header=TRUE)&lt;br /&gt;
&amp;gt; medals.2012&lt;br /&gt;
             country gold silver bronze&lt;br /&gt;
1                USA   46     29     29&lt;br /&gt;
2              China   38     27     23&lt;br /&gt;
3      Great Britain   29     17     19&lt;br /&gt;
4 Russian Federation   24     26     32&lt;br /&gt;
5  Republic of Korea   13      8      7&lt;br /&gt;
6            Germany   11     19     14&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is a corresponding '''write.table()''' function to export the contents of a data frame into a text file.&lt;br /&gt;
&lt;br /&gt;
CSV files can be handled by passing '''sep=&amp;quot;,&amp;quot;''' as an argument to '''read.table()'''.  For convenience, however, R also provides the '''read.csv()''' function (which also assumes a header line by default) and the '''write.csv()''' function.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; write.csv(medals.2012,&amp;quot;medals.csv&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the file '''medals.csv''', with the contents:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;&amp;quot;,&amp;quot;country&amp;quot;,&amp;quot;gold&amp;quot;,&amp;quot;silver&amp;quot;,&amp;quot;bronze&amp;quot;&lt;br /&gt;
&amp;quot;1&amp;quot;,&amp;quot;USA&amp;quot;,46,29,29&lt;br /&gt;
&amp;quot;2&amp;quot;,&amp;quot;China&amp;quot;,38,27,23&lt;br /&gt;
&amp;quot;3&amp;quot;,&amp;quot;Great Britain&amp;quot;,29,17,19&lt;br /&gt;
&amp;quot;4&amp;quot;,&amp;quot;Russian Federation&amp;quot;,24,26,32&lt;br /&gt;
&amp;quot;5&amp;quot;,&amp;quot;Republic of Korea&amp;quot;,13,8,7&lt;br /&gt;
&amp;quot;6&amp;quot;,&amp;quot;Germany&amp;quot;,11,19,14&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
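&lt;br /&gt;
Note the unnamed first column in '''medals.csv''': by default, '''write.csv()''' also writes the data frame's row names.  The following sketch shows two ways of avoiding an extra column when the file is read back in.  It rebuilds a three-row version of the medals data in memory, in place of reading '''medals.txt''', and writes to a temporary file rather than to '''medals.csv''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Rebuild part of the medals table in memory (stands in for medals.txt)&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;, &amp;quot;Great Britain&amp;quot;),&lt;br /&gt;
                          gold    = c(46, 38, 29),&lt;br /&gt;
                          silver  = c(29, 27, 17),&lt;br /&gt;
                          bronze  = c(29, 23, 19))&lt;br /&gt;
csv.file &amp;lt;- tempfile(fileext = &amp;quot;.csv&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# Default: the row names (1, 2, 3) are written as an unnamed first column,&lt;br /&gt;
# which read.csv() then reads back as a column called &amp;quot;X&amp;quot;&lt;br /&gt;
write.csv(medals.2012, csv.file)&lt;br /&gt;
with.names &amp;lt;- read.csv(csv.file)&lt;br /&gt;
print(names(with.names))&lt;br /&gt;
&lt;br /&gt;
# Option 1: use the first column as row names when reading&lt;br /&gt;
as.row.names &amp;lt;- read.csv(csv.file, row.names = 1)&lt;br /&gt;
&lt;br /&gt;
# Option 2: don't write the row names in the first place&lt;br /&gt;
write.csv(medals.2012, csv.file, row.names = FALSE)&lt;br /&gt;
no.names &amp;lt;- read.csv(csv.file)&lt;br /&gt;
print(names(no.names))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;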
&lt;br /&gt;
==Binary Files==&lt;br /&gt;
&lt;br /&gt;
The '''save()''' function will store one or more R objects in a binary file, compressed by default, as the Unix '''file''' command confirms:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; save(medals.2012,file=&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gethin@gethin-desktop:~$ file medals.RData &lt;br /&gt;
medals.RData: gzip compressed data, from Unix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is, of course, a corresponding function, '''load()''', which restores the saved objects into your workspace under their original names:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; load(&amp;quot;medals.RData&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
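&lt;br /&gt;
A minimal sketch of the round trip, using a small stand-in data frame, a hypothetical second object called '''note''', and a temporary file.  It shows that '''save()''' can store several objects at once and that '''load()''' recreates them under their original names:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# A small data frame standing in for medals.2012, plus a second object&lt;br /&gt;
medals.2012 &amp;lt;- data.frame(country = c(&amp;quot;USA&amp;quot;, &amp;quot;China&amp;quot;), gold = c(46, 38))&lt;br /&gt;
note &amp;lt;- &amp;quot;London 2012&amp;quot;&lt;br /&gt;
rdata.file &amp;lt;- tempfile(fileext = &amp;quot;.RData&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# save() stores the named objects (gzip-compressed by default)&lt;br /&gt;
save(medals.2012, note, file = rdata.file)&lt;br /&gt;
&lt;br /&gt;
# Remove both objects, then restore them from the file&lt;br /&gt;
rm(medals.2012, note)&lt;br /&gt;
restored &amp;lt;- load(rdata.file)&lt;br /&gt;
# load() invisibly returns the names of the objects it recreated&lt;br /&gt;
print(restored)&lt;br /&gt;
print(note)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;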
&lt;br /&gt;
==Databases==&lt;br /&gt;
&lt;br /&gt;
If you would like to read and write data directly from/to a database, there are several packages to help you.  See http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases for more information.&lt;br /&gt;
&lt;br /&gt;
==NetCDF==&lt;br /&gt;
&lt;br /&gt;
The [http://cran.r-project.org/web/packages/ncdf/index.html '''ncdf''' package] provides an interface to NetCDF files.  Before installing the package, you will need the Unidata NetCDF libraries installed on your system.  On Linux, the standard package managers conveniently provide this.  Note that you will need the 'development' packages.  Once the prerequisites are satisfied, you can use the standard R command to install the package from CRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;ncdf&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Examples of Common Tasks=&lt;br /&gt;
&lt;br /&gt;
==Random Sampling==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;edward&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;thomas&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;gordon&amp;quot;&lt;br /&gt;
&amp;gt; sample(railway.engines, 1, replace = TRUE, prob = NULL)&lt;br /&gt;
[1] &amp;quot;james&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
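&lt;br /&gt;
Rather than calling '''sample()''' repeatedly, you can draw several items in one call, and '''set.seed()''' makes the draws reproducible.  A short sketch; the seed value and the weights passed to '''prob''' are arbitrary illustrative choices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
railway.engines &amp;lt;- c(&amp;quot;thomas&amp;quot;, &amp;quot;henry&amp;quot;, &amp;quot;gordon&amp;quot;, &amp;quot;edward&amp;quot;, &amp;quot;james&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# Fix the state of the random number generator so the draws are reproducible&lt;br /&gt;
set.seed(42)&lt;br /&gt;
# Six draws in one call; with replacement, the same engine may appear twice&lt;br /&gt;
six.draws &amp;lt;- sample(railway.engines, 6, replace = TRUE)&lt;br /&gt;
&lt;br /&gt;
# Without replacement (the default), sampling all five gives a random permutation&lt;br /&gt;
shuffled &amp;lt;- sample(railway.engines)&lt;br /&gt;
&lt;br /&gt;
# Weighted sampling: prob gives 'thomas' half of the probability mass&lt;br /&gt;
weighted &amp;lt;- sample(railway.engines, 1000, replace = TRUE,&lt;br /&gt;
                   prob = c(0.5, 0.125, 0.125, 0.125, 0.125))&lt;br /&gt;
print(table(weighted))&lt;br /&gt;
&lt;br /&gt;
# Resetting the seed reproduces exactly the same draws&lt;br /&gt;
set.seed(42)&lt;br /&gt;
again &amp;lt;- sample(railway.engines, 6, replace = TRUE)&lt;br /&gt;
print(identical(six.draws, again))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;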
&lt;br /&gt;
==Linear Regression==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; res=lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; abline(res)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-lm(cars)-abline.png|400px|thumbnail|center|linear regression of stopping distance against speed from the built-in data set, cars]]&lt;br /&gt;
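&lt;br /&gt;
Beyond drawing the fitted line, the object returned by '''lm()''' can be queried directly.  A short sketch; the two speeds passed to '''predict()''' are arbitrary illustrative values:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Fit stopping distance (ft) as a linear function of speed (mph)&lt;br /&gt;
res &amp;lt;- lm(dist ~ speed, data = cars)&lt;br /&gt;
# Intercept and slope of the fitted line&lt;br /&gt;
print(coef(res))&lt;br /&gt;
# Standard errors, t statistics, R-squared and more&lt;br /&gt;
print(summary(res))&lt;br /&gt;
# Predicted stopping distances at two illustrative speeds&lt;br /&gt;
print(predict(res, newdata = data.frame(speed = c(10, 20))))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;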
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
'''Exercises'''&lt;br /&gt;
* You may wish to compare different methods of estimation.  From the MASS package, you can fit a line with the '''rlm''' (robust regression) and '''lqs''' (resistant regression) functions.  You can then plot all the lines against the data using:&lt;br /&gt;
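&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; library(MASS)&lt;br /&gt;
&amp;gt; res.lm &amp;lt;- lm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.rlm &amp;lt;- rlm(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; res.lqs &amp;lt;- lqs(dist ~ speed, data=cars)&lt;br /&gt;
&amp;gt; plot(cars)&lt;br /&gt;
&amp;gt; abline(res.lm, lty=1)&lt;br /&gt;
&amp;gt; abline(res.rlm, lty=2)&lt;br /&gt;
&amp;gt; abline(res.lqs, lty=3)&lt;br /&gt;
&amp;gt; legend(x=5, y=100, legend=c(&amp;quot;lm&amp;quot;,&amp;quot;rlm&amp;quot;,&amp;quot;lqs&amp;quot;), lty=c(1,2,3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;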
See: http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html and http://stat.ethz.ch/R-manual/R-devel/RHOME/library/MASS/html/lqs.html.&lt;br /&gt;
&lt;br /&gt;
* Weighted least squares.  The '''lm''' function will accept a vector of weights, '''lm(... weights=...)'''.  If given, the function will fit the line by weighted least squares, minimising the weighted sum of squared residuals.  Experiment with different linear model fits, given different weighting vectors.  Some handy hints for creating a vector of weights (one weight for each of the 50 rows of '''cars'''):&lt;br /&gt;
** '''w1&amp;lt;-rep(0.1,50)''' will give you a vector, length 50, where each element has a value of 0.1.  '''w1[1]&amp;lt;-10''' will then give the first element of the vector a value of 10.&lt;br /&gt;
** '''w2&amp;lt;-seq(from=0.02, to=1.0, by=0.02)''' provides a vector containing a sequence of values from 0.02 to 1.0 in steps of 0.02 (handily, again 50 in total).&lt;br /&gt;
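&lt;br /&gt;
A sketch of this exercise using the hints above; the object names such as '''fit.w1''' are illustrative.  It also shows that multiplying every weight by the same constant leaves the fit unchanged, since only the relative sizes of the weights matter:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Weight vectors built as suggested above (cars has 50 rows)&lt;br /&gt;
w1 &amp;lt;- rep(0.1, 50)&lt;br /&gt;
w1[1] &amp;lt;- 10          # the first observation now dominates&lt;br /&gt;
# cars is ordered by speed, so w2 favours the faster cars&lt;br /&gt;
w2 &amp;lt;- seq(from = 0.02, to = 1.0, by = 0.02)&lt;br /&gt;
&lt;br /&gt;
fit.ols   &amp;lt;- lm(dist ~ speed, data = cars)&lt;br /&gt;
fit.w1    &amp;lt;- lm(dist ~ speed, data = cars, weights = w1)&lt;br /&gt;
fit.w2    &amp;lt;- lm(dist ~ speed, data = cars, weights = w2)&lt;br /&gt;
# Equal weights reproduce the ordinary least-squares fit&lt;br /&gt;
fit.equal &amp;lt;- lm(dist ~ speed, data = cars, weights = rep(0.1, 50))&lt;br /&gt;
&lt;br /&gt;
print(rbind(ols = coef(fit.ols), w1 = coef(fit.w1),&lt;br /&gt;
            w2 = coef(fit.w2), equal = coef(fit.equal)))&lt;br /&gt;
&lt;br /&gt;
plot(cars)&lt;br /&gt;
abline(fit.ols, lty = 1)&lt;br /&gt;
abline(fit.w1, lty = 2)&lt;br /&gt;
abline(fit.w2, lty = 3)&lt;br /&gt;
legend(x = 5, y = 100, legend = c(&amp;quot;unweighted&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;w2&amp;quot;), lty = c(1, 2, 3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;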
&lt;br /&gt;
==Significance Testing==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; boys_2=c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
&amp;gt; girls_2=c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&amp;gt; res=var.test(boys_2,girls_2)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	F test to compare two variances&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
F = 1.0186, num df = 9, denom df = 9, p-value = 0.9786&lt;br /&gt;
alternative hypothesis: true ratio of variances is not equal to 1 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 0.2529956 4.1007126 &lt;br /&gt;
sample estimates:&lt;br /&gt;
ratio of variances &lt;br /&gt;
          1.018559 &lt;br /&gt;
&amp;gt; res=t.test(boys_2, girls_2, var.equal=TRUE, paired=FALSE)&lt;br /&gt;
&amp;gt; res&lt;br /&gt;
&lt;br /&gt;
	Two Sample t-test&lt;br /&gt;
&lt;br /&gt;
data:  boys_2 and girls_2 &lt;br /&gt;
t = 0.8429, df = 18, p-value = 0.4103&lt;br /&gt;
alternative hypothesis: true difference in means is not equal to 0 &lt;br /&gt;
95 percent confidence interval:&lt;br /&gt;
 -1.656675  3.876675 &lt;br /&gt;
sample estimates:&lt;br /&gt;
mean of x mean of y &lt;br /&gt;
    87.48     86.37&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
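&lt;br /&gt;
In this example both p-values (0.98 and 0.41) are well above 0.05, so neither test gives evidence of a difference between the boys and girls.  The two steps can also be chained in a script, taking the decision from the F test's p-value rather than reading it off by eye.  A sketch; the 0.05 threshold is an assumption, and this two-stage approach is one common convention rather than the only sound choice:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
boys_2  &amp;lt;- c(90.2, 91.4, 86.4, 87.6, 86.7, 88.1, 82.2, 83.8, 91, 87.4)&lt;br /&gt;
girls_2 &amp;lt;- c(83.8, 86.2, 85.1, 88.6, 83, 88.9, 89.7, 81.3, 88.7, 88.4)&lt;br /&gt;
&lt;br /&gt;
# Step 1: F test of the null hypothesis that the two variances are equal&lt;br /&gt;
var.res &amp;lt;- var.test(boys_2, girls_2)&lt;br /&gt;
&lt;br /&gt;
# Step 2: use the pooled-variance t-test only if the F test gives no evidence&lt;br /&gt;
# against equal variances (0.05 is a conventional threshold, not a rule);&lt;br /&gt;
# otherwise fall back to Welch's t-test, which is t.test()'s default&lt;br /&gt;
equal.var &amp;lt;- var.res$p.value &amp;gt; 0.05&lt;br /&gt;
t.res &amp;lt;- t.test(boys_2, girls_2, var.equal = equal.var, paired = FALSE)&lt;br /&gt;
&lt;br /&gt;
# Both results are &amp;quot;htest&amp;quot; objects, so their components can be used directly&lt;br /&gt;
print(c(F.p.value = var.res$p.value,&lt;br /&gt;
        t.statistic = unname(t.res$statistic),&lt;br /&gt;
        t.p.value = t.res$p.value))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;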
&lt;br /&gt;
==Classification==&lt;br /&gt;
&lt;br /&gt;
===k Nearest Neighbours===&lt;br /&gt;
&lt;br /&gt;
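We'll use the famous (Fisher's or Anderson's) iris data set, which gives the measurements in centimetres of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris.  The species are ''Iris setosa'' (s), ''versicolor'' (c), and ''virginica'' (v).  The example below uses '''iris3''', the version of the data arranged as a 50 x 4 x 3 array (flowers x variables x species).&lt;br /&gt;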
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html&lt;br /&gt;
&lt;br /&gt;
k-nearest neighbour classification for test set from training set: For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html&lt;br /&gt;
&lt;br /&gt;
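&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
# Use the first 25 flowers of each species as the training set...&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
# ...and the remaining 25 of each species as the test set&lt;br /&gt;
test &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
# Species labels; train and test are in the same species order, so cl serves for both&lt;br /&gt;
cl &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
# Classify each test flower by a majority vote of its 3 nearest training flowers&lt;br /&gt;
iris3.knn &amp;lt;- knn(train, test, cl, k = 3, prob=TRUE)&lt;br /&gt;
# Confusion matrix: predicted species against actual species&lt;br /&gt;
table(predicted=iris3.knn, actual=cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;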
&lt;br /&gt;
How did we do?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
         actual&lt;br /&gt;
predicted  c  s  v&lt;br /&gt;
        c 23  0  3&lt;br /&gt;
        s  0 25  0&lt;br /&gt;
        v  2  0 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
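&lt;br /&gt;
The overall accuracy is the fraction of test flowers on the diagonal of the table: (23 + 25 + 22) / 75 ≈ 0.93.  The sketch below computes this for several values of k.  The helper '''accuracy()''' and the list of k values are illustrative assumptions; '''cl''' is reused as the true test labels because the test set is in the same species order as the training set:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(class)&lt;br /&gt;
train &amp;lt;- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])&lt;br /&gt;
test  &amp;lt;- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])&lt;br /&gt;
cl    &amp;lt;- factor(c(rep(&amp;quot;s&amp;quot;,25), rep(&amp;quot;c&amp;quot;,25), rep(&amp;quot;v&amp;quot;,25)))&lt;br /&gt;
&lt;br /&gt;
# Hypothetical helper: the fraction of predictions that match the true labels&lt;br /&gt;
accuracy &amp;lt;- function(predicted, actual) mean(predicted == actual)&lt;br /&gt;
&lt;br /&gt;
set.seed(1)   # knn() breaks ties at random, so fix the seed for repeatability&lt;br /&gt;
ks   &amp;lt;- c(1, 3, 5, 7, 9)&lt;br /&gt;
accs &amp;lt;- sapply(ks, function(k) accuracy(knn(train, test, cl, k = k), cl))&lt;br /&gt;
names(accs) &amp;lt;- paste0(&amp;quot;k=&amp;quot;, ks)&lt;br /&gt;
print(round(accs, 3))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;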
&lt;br /&gt;
===Classification Trees===&lt;br /&gt;
&lt;br /&gt;
The kyphosis data frame, from the '''rpart''' package, has 81 rows and 4 columns, representing data on children who have had corrective spinal surgery.&lt;br /&gt;
&lt;br /&gt;
This data frame contains the following columns:&lt;br /&gt;
* Kyphosis: a factor with levels '''absent''' and '''present''', indicating whether a kyphosis (a type of deformation) was present after the operation.&lt;br /&gt;
* Age: in months&lt;br /&gt;
* Number: the number of vertebrae involved&lt;br /&gt;
* Start: the number of the first (topmost) vertebra operated on.&lt;br /&gt;
&lt;br /&gt;
See: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/kyphosis.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
fit2 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              parms = list(prior = c(.65,.35), split = &amp;quot;information&amp;quot;))&lt;br /&gt;
fit3 &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,&lt;br /&gt;
              control = rpart.control(cp = 0.05))&lt;br /&gt;
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped&lt;br /&gt;
plot(fit)&lt;br /&gt;
text(fit, use.n = TRUE)&lt;br /&gt;
plot(fit2)&lt;br /&gt;
text(fit2, use.n = TRUE)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:R-classification-tree.png|500px|thumbnail|center|Classification tree for the kyphosis data frame.]]&lt;br /&gt;
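&lt;br /&gt;
Once a tree has been grown, you can inspect its complexity table, use it for prediction and prune it.  A sketch; the names '''predicted''' and '''pruned''' are illustrative, and the predictions are made on the same data used to grow the tree, so the resulting table gives an optimistic picture of its accuracy:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(rpart)&lt;br /&gt;
fit &amp;lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)&lt;br /&gt;
&lt;br /&gt;
# Complexity-parameter table: error estimates for trees of different sizes&lt;br /&gt;
printcp(fit)&lt;br /&gt;
&lt;br /&gt;
# Predicted class for each child (the training data, hence optimistic)&lt;br /&gt;
predicted &amp;lt;- predict(fit, newdata = kyphosis, type = &amp;quot;class&amp;quot;)&lt;br /&gt;
print(table(predicted = predicted, actual = kyphosis$Kyphosis))&lt;br /&gt;
&lt;br /&gt;
# Prune back to the complexity threshold used for fit3 above&lt;br /&gt;
pruned &amp;lt;- prune(fit, cp = 0.05)&lt;br /&gt;
print(pruned)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;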
&lt;br /&gt;
==Solving Systems of Linear Equations==&lt;br /&gt;
&lt;br /&gt;
See, e.g.: https://source.ggy.bris.ac.uk/wiki/NumMethodsPDEs&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
&amp;gt; b &amp;lt;- c(5,7,8)&lt;br /&gt;
&amp;gt; solve(A,b)&lt;br /&gt;
[1] -15   8   2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
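&lt;br /&gt;
Note that '''array()''' fills its values column by column, so the first three numbers form the first ''column'' of A, not its first row.  The sketch below builds the same matrix row by row with '''matrix(..., byrow = TRUE)''' and checks the solution by substituting it back:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# array() fills by column, so A has rows (1, 3, -2), (3, 5, 6) and (2, 4, 3)&lt;br /&gt;
A &amp;lt;- array(c(1,3,2,3,5,4,-2,6,3), dim=c(3,3))&lt;br /&gt;
b &amp;lt;- c(5,7,8)&lt;br /&gt;
&lt;br /&gt;
# The same matrix, written out row by row&lt;br /&gt;
A.rows &amp;lt;- matrix(c(1, 3, -2,&lt;br /&gt;
                   3, 5,  6,&lt;br /&gt;
                   2, 4,  3), nrow = 3, byrow = TRUE)&lt;br /&gt;
print(identical(A, A.rows))&lt;br /&gt;
&lt;br /&gt;
x &amp;lt;- solve(A, b)&lt;br /&gt;
# Substituting back: A %*% x should reproduce b&lt;br /&gt;
print(A %*% x)&lt;br /&gt;
&lt;br /&gt;
# With a single argument, solve() returns the inverse of A&lt;br /&gt;
A.inv &amp;lt;- solve(A)&lt;br /&gt;
print(round(A.inv %*% A, 10))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;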
&lt;br /&gt;
=Suggested Exercises=&lt;br /&gt;
&lt;br /&gt;
If you would like to work through some exercises, with model answers included, you could take a look at:&lt;br /&gt;
* http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/reed/rexercises.pdf&lt;br /&gt;
&lt;br /&gt;
=Writing Faster R Code=&lt;br /&gt;
&lt;br /&gt;
In the above sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language.  In the remaining sections, we'll switch tack and focus on a question commonly asked by those beginning to use R in anger: '''&amp;quot;My R code is slow.  How can I speed it up?&amp;quot;'''  In this section we'll consider two related tasks: finding which parts of your R code are responsible for the majority of the run-time, and what you can do about them.&lt;br /&gt;
&lt;br /&gt;
==Profiling &amp;amp; Timing==&lt;br /&gt;
&lt;br /&gt;
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of our R code are responsible for the majority of the run-time.  We could spend ages optimising a portion that we ''think'' may be running slowly, but computers have the gift(!) of constantly surprising us.  If that portion of the program accounted for, say, 10% of the run-time, then even making it twice as fast would cut the total run-time by only 5%, and we will have sweated for very little gain.&lt;br /&gt;
&lt;br /&gt;
The simplest method of investigation is to time a call to a function using '''system.time()''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
system.time(some.function())&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
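&lt;br /&gt;
'''system.time()''' evaluates its argument once and reports the time taken.  A short sketch; the sorting and square-root expressions are arbitrary stand-ins for your own code:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# system.time() returns an object of class &amp;quot;proc_time&amp;quot;&lt;br /&gt;
timing &amp;lt;- system.time(for (i in 1:5) sort(runif(1e6)))&lt;br /&gt;
print(timing)&lt;br /&gt;
&lt;br /&gt;
# &amp;quot;elapsed&amp;quot; is wall-clock time; &amp;quot;user&amp;quot; and &amp;quot;system&amp;quot; are CPU times&lt;br /&gt;
elapsed &amp;lt;- timing[[&amp;quot;elapsed&amp;quot;]]&lt;br /&gt;
cat(&amp;quot;Elapsed seconds:&amp;quot;, elapsed, &amp;quot;\n&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# Timings vary from run to run, so it can help to repeat a measurement&lt;br /&gt;
reps &amp;lt;- replicate(3, system.time(sum(sqrt(1:1e6)))[[&amp;quot;elapsed&amp;quot;]])&lt;br /&gt;
print(reps)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;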
&lt;br /&gt;
You can get a more detailed analysis of a block of code using the built-in R profiler.  The general pattern of invocation is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Do some work&lt;br /&gt;
Rprof()&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, here's an R script, '''profile.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
Rprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
# Create a list of 10 vectors, each holding 100,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(100000)})&lt;br /&gt;
# Apply a loess smoother to each vector in the list&lt;br /&gt;
x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)})&lt;br /&gt;
Rprof()&lt;br /&gt;
summaryRprof(filename=&amp;quot;~/rprof.out&amp;quot;)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which I ran by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R CMD BATCH profile.r&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output file, '''profile.r.Rout''', I found the following break down:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
               self.time self.pct total.time total.pct&lt;br /&gt;
&amp;quot;simpleLoess&amp;quot;       4.84    88.00       5.10     92.73&lt;br /&gt;
&amp;quot;rnorm&amp;quot;             0.22     4.00       0.22      4.00&lt;br /&gt;
&amp;quot;loess.smooth&amp;quot;      0.18     3.27       5.28     96.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
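The profile tells us that the function '''simpleLoess''' (called by '''loess.smooth''') takes 88% of the run-time, whereas '''rnorm''' takes only 4%.  Any effort to speed up this script should therefore focus on the smoothing step.&lt;br /&gt;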
&lt;br /&gt;
==Preallocation of Memory==&lt;br /&gt;
&lt;br /&gt;
As with other scripting languages, such as MATLAB, one of the simplest ways to speed up your R code is to pre-allocate the storage for variables whenever possible, rather than growing them one element at a time.  To see the benefits of this, consider the following two functions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f1 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c()&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; f2 &amp;lt;- function() {&lt;br /&gt;
+ v &amp;lt;- c(NA)&lt;br /&gt;
+ length(v) &amp;lt;- 30000&lt;br /&gt;
+ for (i in 1:30000)&lt;br /&gt;
+   v[i] &amp;lt;- i^2&lt;br /&gt;
+ }&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~'''x30 speed-up'''.  Your mileage will vary depending upon the details of your code.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(f1())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  1.720   0.040   1.762&lt;br /&gt;
&amp;gt; system.time(f2())&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.052   0.000   0.05&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
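&lt;br /&gt;
A perhaps more idiomatic way to preallocate is '''numeric(n)''' (or, more generally, '''vector(mode, length)''').  The sketch below compares it with growing a vector; the function names are illustrative, and the size of the speed-up will depend on your version of R and your machine:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
n &amp;lt;- 30000&lt;br /&gt;
# Growing the vector one element at a time, as f1 does&lt;br /&gt;
grow &amp;lt;- function(n) {&lt;br /&gt;
  v &amp;lt;- c()&lt;br /&gt;
  for (i in 1:n) v[i] &amp;lt;- i^2&lt;br /&gt;
  v&lt;br /&gt;
}&lt;br /&gt;
# Preallocating with numeric(n), a zero-filled double vector of length n&lt;br /&gt;
prealloc &amp;lt;- function(n) {&lt;br /&gt;
  v &amp;lt;- numeric(n)&lt;br /&gt;
  for (i in seq_len(n)) v[i] &amp;lt;- i^2&lt;br /&gt;
  v&lt;br /&gt;
}&lt;br /&gt;
# Unlike f1 and f2, these return v, so we can check that they agree&lt;br /&gt;
print(identical(grow(n), prealloc(n)))&lt;br /&gt;
print(system.time(grow(n)))&lt;br /&gt;
print(system.time(prealloc(n)))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;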
&lt;br /&gt;
==Vectorised Operations==&lt;br /&gt;
&lt;br /&gt;
The other principal method for speeding up your R code is to eliminate loops whenever you can.  Many functions and operators in R accept whole vectors as input, rather than just single values, which often lets you avoid a loop altogether.  The examples in the previous section used for loops to step through a vector, squaring each element.  However, you can achieve the same result far more quickly by passing the vector ''en masse'' to the exponentiation operator:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
&amp;gt; system.time(v &amp;lt;- (1:1000000)^2)&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.024   0.004   0.026&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here we've been able to square 1,000,000 items in roughly half the time that the preallocated loop, '''f2''', took to process 30,000!&lt;br /&gt;
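&lt;br /&gt;
A sketch confirming that the loop and the vectorised expression give identical results, with two further examples of vectorised functions.  The function names are illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
n &amp;lt;- 30000&lt;br /&gt;
# Loop version, preallocated as in f2&lt;br /&gt;
loop.squares &amp;lt;- function(n) {&lt;br /&gt;
  v &amp;lt;- numeric(n)&lt;br /&gt;
  for (i in seq_len(n)) v[i] &amp;lt;- i^2&lt;br /&gt;
  v&lt;br /&gt;
}&lt;br /&gt;
# Vectorised version: ^ is applied to every element of the vector at once&lt;br /&gt;
vec.squares &amp;lt;- function(n) seq_len(n)^2&lt;br /&gt;
print(identical(loop.squares(n), vec.squares(n)))&lt;br /&gt;
&lt;br /&gt;
# Many built-in functions are vectorised too, e.g. sum()&lt;br /&gt;
total &amp;lt;- sum(vec.squares(n))&lt;br /&gt;
&lt;br /&gt;
# if/else inside a loop can often be replaced by the vectorised ifelse()&lt;br /&gt;
parity &amp;lt;- ifelse(seq_len(10) %% 2 == 0, &amp;quot;even&amp;quot;, &amp;quot;odd&amp;quot;)&lt;br /&gt;
print(parity)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;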
&lt;br /&gt;
==Calling Functions Written in a Compiled Language (e.g. C or Fortran)==&lt;br /&gt;
&lt;br /&gt;
Another way to gain speed is to rewrite the portions of your R code that profiling shows to be slow in a compiled language, such as C or Fortran, and call them from R.  A good starting point on this topic is:&lt;br /&gt;
&lt;br /&gt;
* http://mazamascience.com/WorkingWithData/?p=1067&lt;br /&gt;
&lt;br /&gt;
=R and HPC=&lt;br /&gt;
&lt;br /&gt;
If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as BlueCrystal:&lt;br /&gt;
&lt;br /&gt;
* http://cran.r-project.org/web/views/HighPerformanceComputing.html&lt;br /&gt;
&lt;br /&gt;
As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to each element of a list of inputs, and then to split the list across several CPU cores or cluster worker nodes.&lt;br /&gt;
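&lt;br /&gt;
The pattern can be sketched in plain, serial R, with no parallel package needed: split a list of inputs into one chunk per worker, apply the function to each chunk, then recombine the results in the original order.  The inputs, the function '''f''' and the number of workers are illustrative assumptions, and here the chunks are processed one after another, where a parallel package would hand each chunk to a different core or node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
# Illustrative inputs and function, standing in for a real workload&lt;br /&gt;
inputs &amp;lt;- 1:10&lt;br /&gt;
f &amp;lt;- function(x) x^2&lt;br /&gt;
n.workers &amp;lt;- 4   # assumed number of cores or worker nodes&lt;br /&gt;
&lt;br /&gt;
# Split the inputs into one contiguous chunk per worker&lt;br /&gt;
chunks &amp;lt;- split(inputs, cut(seq_along(inputs), n.workers, labels = FALSE))&lt;br /&gt;
&lt;br /&gt;
# Each worker would apply f to its own chunk; here the chunks are simply&lt;br /&gt;
# processed one after another&lt;br /&gt;
partial &amp;lt;- lapply(chunks, function(chunk) lapply(chunk, f))&lt;br /&gt;
&lt;br /&gt;
# Recombine the per-chunk results into one list, in the original order&lt;br /&gt;
combined &amp;lt;- unlist(partial, recursive = FALSE, use.names = FALSE)&lt;br /&gt;
print(identical(combined, lapply(inputs, f)))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;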
&lt;br /&gt;
==Multicore==&lt;br /&gt;
&lt;br /&gt;
The '''multicore''' package allows us to make use of several CPU cores within a single machine.  Note, however, that the package does not work on MS Windows computers.&lt;br /&gt;
&lt;br /&gt;
As an example, let's look at the package's '''mclapply''' function, a multicore equivalent of R's built-in list apply function, '''lapply'''.  I saved the following commands into an R script called '''multicore.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(multicore)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
multicore:::detectCores()&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And used the following submission script to run it on bluecrystal phase2:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct version of R loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
R CMD BATCH multicore.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the job had run, I got the following output in the file '''multicore.r.Rout''':&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(multicore)&lt;br /&gt;
&amp;gt; # how many cores are present?&lt;br /&gt;
&amp;gt; multicore:::detectCores()&lt;br /&gt;
[1] 8&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.674   0.007   0.749 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using multicore, within a node)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.301   0.074   0.113 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
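&lt;br /&gt;
In this run, the parallel version took 0.113 s of elapsed time against 0.749 s for the serial version, roughly 6.6 times faster on 8 cores.  The '''multicore''' package has since been superseded by the '''parallel''' package described below, which provides the same '''mclapply()''' function.  A sketch using it, assuming a Unix-like system, that sets the number of forked worker processes explicitly through the '''mc.cores''' argument and checks that the parallel results match the serial ones (the data are smaller than in '''multicore.r''' to keep it quick):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# Smaller data than multicore.r, to keep the example quick&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) rnorm(1000))&lt;br /&gt;
smoother &amp;lt;- function(x) loess.smooth(x, x)&lt;br /&gt;
&lt;br /&gt;
serial &amp;lt;- lapply(data, smoother)&lt;br /&gt;
# mc.cores sets how many forked processes mclapply() uses; forking is not&lt;br /&gt;
# available on MS Windows, where only mc.cores = 1 is supported&lt;br /&gt;
n.cores &amp;lt;- if (.Platform$OS.type == &amp;quot;windows&amp;quot;) 1L else 2L&lt;br /&gt;
par.res &amp;lt;- mclapply(data, smoother, mc.cores = n.cores)&lt;br /&gt;
print(identical(serial, par.res))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;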
&lt;br /&gt;
==Rmpi==&lt;br /&gt;
&lt;br /&gt;
The '''Rmpi''' package allows us to create and use cohorts of message passing processes from within R.  It does so by providing an interface to the MPI (Message Passing Interface) library.&lt;br /&gt;
&lt;br /&gt;
In order to use the Rmpi package on BCp2, you will need the '''ofed/openmpi/gcc/64/1.4.2-qlc''' module loaded.&lt;br /&gt;
&lt;br /&gt;
Here's a short example that I saved as '''Rmpi.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(Rmpi)&lt;br /&gt;
# spawn as many slaves as possible&lt;br /&gt;
mpi.spawn.Rslaves()&lt;br /&gt;
mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
mpi.remote.exec(runif(1))&lt;br /&gt;
mpi.close.Rslaves()&lt;br /&gt;
mpi.quit()&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I submitted the job to BCp2 using the following submission script:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#PBS -l nodes=4:ppn=1,walltime=00:00:05&lt;br /&gt;
&lt;br /&gt;
#! Ensure that we have the correct versions of R and OpenMPI loaded&lt;br /&gt;
module add languages/R-2.15.1&lt;br /&gt;
module add ofed/openmpi/gcc/64/1.4.2-qlc&lt;br /&gt;
&lt;br /&gt;
#! change the working directory (default is home directory)&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
#! Create a machine file (used for multi-node jobs)&lt;br /&gt;
cat $PBS_NODEFILE &amp;gt; machine.file.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
#! Run the R script&lt;br /&gt;
mpirun -np 1 -machinefile machine.file.$PBS_JOBID R CMD BATCH Rmpi.r&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
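and got the following output.  Note that all four slaves returned the same value from '''runif(1)''', which shows that their random number generators started in the same state; if your parallel work relies on random numbers, each worker needs its own seed or random number stream:&lt;br /&gt;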
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(Rmpi)&lt;br /&gt;
&amp;gt; # spawn as many slaves as possible&lt;br /&gt;
&amp;gt; mpi.spawn.Rslaves()&lt;br /&gt;
        4 slaves are spawned successfully. 0 failed.&lt;br /&gt;
master (rank 0, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
slave1 (rank 1, comm 1) of size 5 is running on: u03n098 &lt;br /&gt;
slave2 (rank 2, comm 1) of size 5 is running on: u04n029 &lt;br /&gt;
slave3 (rank 3, comm 1) of size 5 is running on: u04n030 &lt;br /&gt;
slave4 (rank 4, comm 1) of size 5 is running on: u03n074 &lt;br /&gt;
&amp;gt; mpi.remote.exec(mpi.get.processor.name())&lt;br /&gt;
$slave1&lt;br /&gt;
[1] &amp;quot;u03n098&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave2&lt;br /&gt;
[1] &amp;quot;u04n029&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave3&lt;br /&gt;
[1] &amp;quot;u04n030&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$slave4&lt;br /&gt;
[1] &amp;quot;u03n074&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;gt; mpi.remote.exec(runif(1))&lt;br /&gt;
         X1        X2        X3        X4&lt;br /&gt;
1 0.5154871 0.5154871 0.5154871 0.5154871&lt;br /&gt;
&amp;gt; mpi.close.Rslaves()&lt;br /&gt;
[1] 1&lt;br /&gt;
&amp;gt; mpi.quit()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Snow==&lt;br /&gt;
&lt;br /&gt;
Calling MPI routines from within R may be too low level for many people to use comfortably.  Happily, the '''snow''' package provides a higher level abstraction for distributed memory programming from within R.&lt;br /&gt;
&lt;br /&gt;
Here's my example program, which I saved as '''snow.r''':&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(snow)&lt;br /&gt;
# request a cluster of 3 worker nodes&lt;br /&gt;
cl &amp;lt;- makeCluster(3)&lt;br /&gt;
clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
# Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the matrix.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r.  The output was:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; library(snow)&lt;br /&gt;
&amp;gt; # request a cluster of 3 worker nodes&lt;br /&gt;
&amp;gt; cl &amp;lt;- makeCluster(3)&lt;br /&gt;
Loading required package: Rmpi&lt;br /&gt;
        3 slaves are spawned successfully. 0 failed.&lt;br /&gt;
&amp;gt; clusterCall(cl, function() Sys.info()[c(&amp;quot;nodename&amp;quot;,&amp;quot;machine&amp;quot;)])&lt;br /&gt;
[[1]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u01n105&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[2]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u02n014&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[3]]&lt;br /&gt;
 nodename   machine &lt;br /&gt;
&amp;quot;u03n098&amp;quot;  &amp;quot;x86_64&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;gt; # Create a 10 x 10,000 matrix of random numbers&lt;br /&gt;
&amp;gt; data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
&amp;gt; # Map a function over the matrix.  First in serial..&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.711   0.001   0.715 &lt;br /&gt;
&amp;gt; # .. and secondly in parallel (using snow, across a cluster of workers)&lt;br /&gt;
&amp;gt; system.time(x &amp;lt;- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
   user  system elapsed &lt;br /&gt;
  0.259   0.001   0.260 &lt;br /&gt;
&amp;gt; stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
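&lt;br /&gt;
The '''parallel''' package, introduced below, includes functions with the same names as snow's '''makeCluster()''', '''clusterCall()''', '''clusterApply()''' and '''stopCluster()'''.  If you'd like to try this snow-style approach without MPI, here is a sketch assuming a single machine with at least two cores, using a local socket (PSOCK) cluster in place of an MPI one:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# Start 2 worker R processes on this machine, talking over sockets (PSOCK);&lt;br /&gt;
# on BCp2, snow obtained its workers through Rmpi instead&lt;br /&gt;
cl &amp;lt;- makeCluster(2, type = &amp;quot;PSOCK&amp;quot;)&lt;br /&gt;
# clusterCall() runs a function once on every worker&lt;br /&gt;
print(clusterCall(cl, function() Sys.info()[[&amp;quot;nodename&amp;quot;]]))&lt;br /&gt;
&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) rnorm(1000))&lt;br /&gt;
# clusterApply() hands the list elements out to the workers in turn&lt;br /&gt;
par.res &amp;lt;- clusterApply(cl, data, function(x) loess.smooth(x, x))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&lt;br /&gt;
serial &amp;lt;- lapply(data, function(x) loess.smooth(x, x))&lt;br /&gt;
print(identical(serial, par.res))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;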
&lt;br /&gt;
==Parallel==&lt;br /&gt;
&lt;br /&gt;
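The '''parallel''' package, included with R since version 2.14.0, is an amalgamation of functionality from the multicore and snow packages.  Unlike multicore, it can be used on MS Windows, although there its fork-based functions, such as '''mclapply''', cannot use more than one core; on Windows, use a socket cluster created with '''makeCluster()''' instead.&lt;br /&gt;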
&lt;br /&gt;
A trivial translation of our previous multicore example is:&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
# how many cores are present?&lt;br /&gt;
detectCores()&lt;br /&gt;
# Create a list of 10 vectors, each holding 10,000 random numbers&lt;br /&gt;
data &amp;lt;- lapply(1:10, function(x) {rnorm(10000)})&lt;br /&gt;
# Map a function over the list.  First in serial..&lt;br /&gt;
system.time(x &amp;lt;- lapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
# .. and secondly in parallel (using mclapply, within a node)&lt;br /&gt;
system.time(x &amp;lt;- mclapply(data, function(x) {loess.smooth(x,x)}))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
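&lt;br /&gt;
The Rmpi output above showed every slave returning the same random number.  The '''parallel''' package can give each worker its own independent stream of random numbers, all derived from one seed, via '''clusterSetRNGStream()''', which also makes parallel random draws reproducible.  A sketch on a local two-worker cluster (the seed 123 is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source&amp;gt;&lt;br /&gt;
library(parallel)&lt;br /&gt;
cl &amp;lt;- makeCluster(2)&lt;br /&gt;
&lt;br /&gt;
# Give each worker its own L'Ecuyer-CMRG random number stream, seeded from 123&lt;br /&gt;
clusterSetRNGStream(cl, iseed = 123)&lt;br /&gt;
draws1 &amp;lt;- parSapply(cl, 1:2, function(i) runif(1))&lt;br /&gt;
&lt;br /&gt;
# Re-seeding the streams in the same way reproduces the same draws&lt;br /&gt;
clusterSetRNGStream(cl, iseed = 123)&lt;br /&gt;
draws2 &amp;lt;- parSapply(cl, 1:2, function(i) runif(1))&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&lt;br /&gt;
print(draws1)&lt;br /&gt;
print(identical(draws1, draws2))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;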
&lt;br /&gt;
I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.&lt;br /&gt;
&lt;br /&gt;
=Further Reading=&lt;br /&gt;
&lt;br /&gt;
* [http://shop.oreilly.com/product/9780596801717.do R in a Nutshell]&lt;br /&gt;
* [http://shop.oreilly.com/product/0636920021421.do Parallel R]&lt;/div&gt;</summary>
		<author><name>GethinWilliams</name></author>
	</entry>
</feed>