<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4741864931549758726</id><updated>2011-07-28T07:15:18.939-07:00</updated><category term='Krasner'/><category term='web2py'/><category term='MVC'/><category term='MTV'/><category term='django'/><category term='python'/><category term='web frameworks'/><category term='zope'/><category term='turbogears'/><category term='pylons'/><title type='text'>Program-o-Babble</title><subtitle type='html'>I use tools like R, Python, and Hadoop for Big Data analytics.  This is my blog of technical scribbles.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://data-analytics-tools.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://data-analytics-tools.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>miked98</name><uri>http://www.blogger.com/profile/02261350400113392769</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://2.bp.blogspot.com/_GffqQSsYpkk/SqXuzznf7QI/AAAAAAAAAoA/5Ah6uX9I7G0/S220/md_bw2_GLG_png.png'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>7</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4741864931549758726.post-3214504899685016295</id><published>2009-12-23T16:02:00.000-08:00</published><updated>2009-12-23T16:05:51.099-08:00</updated><title type='text'>Three easy functions for manipulating data frames in R</title><content type='html'>&lt;p&gt;I do a lot of basic manipulation of data, so here are two built-in functions (&lt;strong&gt;transform&lt;/strong&gt; and &lt;strong&gt;subset&lt;/strong&gt;) and one library (&lt;strong&gt;sqldf&lt;/strong&gt;) that I use daily (this post originally appeared as &lt;a href="http://stackoverflow.com/questions/1295955/what-is-the-most-useful-r-trick/"&gt;an answer I gave on StackOverflow&lt;/a&gt;).&lt;/p&gt;&lt;h2&gt;create sample sales data&lt;/h2&gt;&lt;pre class="brush: bash"&gt;sales &amp;lt;- expand.grid(country = c('USA', 'UK', 'FR'),&lt;br /&gt;                     product = c(1, 2, 3))&lt;br /&gt;sales$revenue &amp;lt;- rnorm(nrow(sales), mean=100, sd=10)&lt;br /&gt;&lt;br /&gt;&amp;gt; sales&lt;br /&gt;  country product   revenue&lt;br /&gt;1     USA       1 108.45965&lt;br /&gt;2      UK       1  97.07981&lt;br /&gt;3      FR       1  99.66225&lt;br /&gt;4     USA       2 100.34754&lt;br /&gt;5      UK       2  87.12262&lt;br /&gt;6      FR       2 112.86084&lt;br /&gt;7     USA       3  95.87880&lt;br /&gt;8      UK       3  96.43581&lt;br /&gt;9      FR       3  94.59259&lt;br /&gt;&lt;/pre&gt;&lt;h2&gt;use transform() to add a column&lt;/h2&gt;&lt;pre class="brush: bash"&gt;## convert revenue from euros to US dollars (1.434 USD per EUR)&lt;br /&gt;eur2usd &amp;lt;- 1.434&lt;br /&gt;transform(sales, usd = revenue * eur2usd)&lt;br /&gt;&lt;br /&gt;&amp;gt;&lt;br /&gt;&lt;br /&gt;  country product   revenue      usd&lt;br /&gt;1     USA       1 108.45965 155.5311&lt;br /&gt;2      UK       1  
97.07981 139.2125&lt;br /&gt;3      FR       1  99.66225 142.9157&lt;br /&gt;...&lt;br /&gt;&lt;/pre&gt;&lt;h2&gt;use subset() to slice the data&lt;/h2&gt;&lt;pre class="brush: bash"&gt;subset(sales,&lt;br /&gt;       country == 'USA' &amp;amp; product %in% c(1, 2),&lt;br /&gt;       select = c('product', 'revenue'))&lt;br /&gt;&lt;br /&gt;&amp;gt;&lt;br /&gt;  product  revenue&lt;br /&gt;1       1 108.4597&lt;br /&gt;4       2 100.3475&lt;br /&gt;&lt;/pre&gt;&lt;h2&gt;use sqldf() to slice and aggregate with SQL&lt;/h2&gt;&lt;p&gt;The &lt;a href="http://code.google.com/p/sqldf/" rel="nofollow"&gt;sqldf package&lt;/a&gt; provides an SQL interface to R data frames.&lt;/p&gt;&lt;pre class="brush: bash"&gt;##  recast the previous subset() expression in SQL&lt;br /&gt;sqldf('SELECT product, revenue FROM sales \&lt;br /&gt;       WHERE country = "USA" \&lt;br /&gt;       AND product IN (1,2)')&lt;br /&gt;&lt;br /&gt;&amp;gt;&lt;br /&gt;  product  revenue&lt;br /&gt;1       1 108.4597&lt;br /&gt;2       2 100.3475&lt;br /&gt;&lt;/pre&gt;&lt;p&gt;Perform an aggregation with GROUP BY&lt;/p&gt;&lt;pre class="brush: bash"&gt;sqldf('SELECT country, sum(revenue) revenue \&lt;br /&gt;       FROM sales \&lt;br /&gt;       GROUP BY country')&lt;br /&gt;&lt;br /&gt;&amp;gt;&lt;br /&gt;  country  revenue&lt;br /&gt;1      FR 307.1157&lt;br /&gt;2      UK 280.6382&lt;br /&gt;3     USA 304.6860&lt;br /&gt;&lt;/pre&gt;&lt;p&gt;For more sophisticated map-reduce-like functionality on data frames, check out the &lt;a href="http://crantastic.org/packages/plyr" rel="nofollow"&gt;plyr&lt;/a&gt; package.  
And if you find yourself wanting to pull your hair out, I recommend checking out &lt;a href="http://rads.stackoverflow.com/amzn/click/0387747303" rel="nofollow"&gt;Data Manipulation with R&lt;/a&gt;.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4741864931549758726-3214504899685016295?l=data-analytics-tools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-analytics-tools.blogspot.com/feeds/3214504899685016295/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4741864931549758726&amp;postID=3214504899685016295' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/3214504899685016295'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/3214504899685016295'/><link rel='alternate' type='text/html' href='http://data-analytics-tools.blogspot.com/2009/12/three-easy-functions-for-manipulating.html' title='Three easy functions for manipulating data frames in R'/><author><name>miked98</name><uri>http://www.blogger.com/profile/02261350400113392769</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://2.bp.blogspot.com/_GffqQSsYpkk/SqXuzznf7QI/AAAAAAAAAoA/5Ah6uX9I7G0/S220/md_bw2_GLG_png.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4741864931549758726.post-1605757663152577889</id><published>2009-12-10T19:01:00.000-08:00</published><updated>2009-12-11T01:09:59.674-08:00</updated><title type='text'>Hacking Up a Map Function for Bash</title><content type='html'>The map function is an example of an elegant programming pattern loved by &lt;a href="http://lambda-the-ultimate.org/"&gt;language purists&lt;/a&gt;, but 
equally useful to &lt;a href="http://www.joelonsoftware.com/items/2009/09/23.html"&gt;duct tape programmers&lt;/a&gt;.  Map is an example of a &lt;a href="http://en.wikipedia.org/wiki/Higher-order_function"&gt;"higher-order"&lt;/a&gt; function: &amp;nbsp;one that takes other functions as arguments or returns them as results.&lt;br /&gt;&lt;br /&gt;In an age of multi-core machines, the value of a map function resides principally in one feature:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="background-color: #ffe599;"&gt;map functions define operations that are inherently&lt;/span&gt;&lt;span style="background-color: #ffe599;"&gt; parallel&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;In practice, many for loops, namely those whose iterations don't depend on one another, could be rewritten as map tasks. &amp;nbsp;And map's parallel nature is why it is a key element in distributed processing patterns and platforms.&lt;br /&gt;&lt;br /&gt;While many languages have map functions (it's 'map' in Perl, Python, Haskell, and Lisp dialects like Clojure; 'lapply' in R; accessible via 'select' in LINQ), to my disappointment, no built-in map function exists for Bash.&lt;br /&gt;&lt;br /&gt;So I took it upon myself to cook up a bare-bones map function (note: there does exist an amazing set of scripts called &lt;a href="http://blog.last.fm/2009/04/06/mapreduce-bash-script"&gt;bashreduce&lt;/a&gt;, which implement both map and reduce).&lt;br /&gt;&lt;br /&gt;My map function is simple: it applies a given function f to each file in an input directory, and writes the transformed files, under the same names, to an output directory. &amp;nbsp;It runs in parallel, spawning as many mappers as there are CPU cores (or as many as you explicitly set). 
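&lt;br /&gt;&lt;br /&gt;To make the pattern concrete outside of Bash, here is a minimal Python sketch of the same idea, assuming Python 3.5+ and a POSIX shell. It is an illustration, not a port of map.sh, and the helper name map_files and its parameters are hypothetical. It runs a shell command on every file in an input directory, appends each file's path as the command's last argument, and writes the output to a same-named file in a new output directory, using one worker per CPU core by default (falling back to 2).
&lt;pre class="brush: python"&gt;
```python
import os
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def map_files(command, indir, outdir, mappers=None):
    """Run a shell command on every file in indir, several at a time.

    Hypothetical helper sketching the idea behind map.sh (not a port of it):
    the output for indir/NAME is written to outdir/NAME.
    """
    # One worker per CPU core by default, falling back to 2.
    mappers = mappers or os.cpu_count() or 2
    # Like map.sh, refuse to reuse an existing output directory
    # (os.makedirs raises FileExistsError in that case).
    os.makedirs(outdir)
    names = sorted(os.listdir(indir))

    def run_one(name):
        src = os.path.join(indir, name)
        with open(os.path.join(outdir, name), "w") as out:
            # The input path becomes the command's last argument,
            # e.g. "wc -l in/xaa"; stdout goes to the output file.
            subprocess.run(command + " " + shlex.quote(src),
                           shell=True, stdout=out, check=True)
        return name

    # Threads suffice: each one just waits on its own child process,
    # so the commands themselves run in parallel.
    with ThreadPoolExecutor(max_workers=mappers) as pool:
        return list(pool.map(run_one, names))
```
&lt;/pre&gt;
For example, map_files("wc -l", "in", "counts") mirrors map 'wc -l' in counts. A thread pool is enough here because each thread only waits on its own child process. Unlike map.sh, this sketch leaves a partial output directory behind if a command fails; the CalledProcessError is simply raised to the caller.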
&lt;br /&gt;&lt;br /&gt;Of note: should you need to pass a user-defined Bash function to a command that runs it in a sub-shell (as map does), you must first mark it for export to sub-shell environments using the "&lt;b&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;declare -fx&lt;/span&gt;&lt;/b&gt;" command (Bash also accepts 'typeset', the older Korn shell name for declare).  Only child shells that are themselves Bash can see exported functions; on systems where /bin/sh is another shell (such as dash on Ubuntu), 'sh -c' will not find them.&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;span style="font-size: large;"&gt;running parallel jobs with map&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&amp;nbsp;Here's a basic example that shows how to use this map function.    &lt;br /&gt;&lt;pre class="brush: bash"&gt;## make a test 'in' directory consisting of some words&lt;br /&gt;mkdir in&lt;br /&gt;cd in&lt;br /&gt;split -C 500k /usr/share/dict/words&lt;br /&gt;## the number of files depends on the size of your words file; I get&lt;br /&gt;## &amp;gt; ls&lt;br /&gt;## xaa  xab  xac  xad  xae  xaf  xag  xah  xai  xaj&lt;br /&gt;cd ..&lt;br /&gt;map 'wc -l' in counts&lt;br /&gt;## output results into files with same names, but in dir 'counts' &lt;br /&gt;## &amp;gt; ls counts&lt;br /&gt;## xaa  xab  xac  xad  xae  xaf  xag  xah  xai  xaj&lt;br /&gt;&lt;/pre&gt;A more sophisticated example defines a function of our own -- one that unzips, sorts, de-duplicates (N.B. the -u flag on sort), and rezips a file -- and maps it over a set of files.  &lt;br /&gt;&lt;pre class="brush: bash"&gt;gzsort () { gunzip -c "$1" | sort -u | gzip --fast; }&lt;br /&gt;declare -fx gzsort  ##  export to subshell&lt;br /&gt;## test on the same example above&lt;br /&gt;map 'gzip -c' in gzin&lt;br /&gt;map gzsort gzin gzout&lt;br /&gt;&lt;/pre&gt;The most recent version of this &lt;a href="http://github.com/dataspora/big-data-tools/blob/master/map.sh"&gt;map function for Bash&lt;/a&gt; will always be available at my &lt;a href="http://github.com/dataspora/big-data-tools"&gt;GitHub repository for Big Data tools&lt;/a&gt;, but I've also posted the code below to save you a click.  
An earlier version implemented its own work queue, from which mappers pulled the next file as they finished.  But it turns out that xargs now supports the "-P" flag, which specifies how many processes to run in parallel.  In essence, my script is a crufty wrapper around &lt;b&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;xargs&lt;/span&gt;&lt;/b&gt;, but with a cleaner syntax.  &lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-size: large;"&gt;map: &amp;nbsp;not for industrial use&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;One final comment about where I've found this map script useful: &amp;nbsp;I don't suggest using it for the heavy lifting of Big Data -- by which I mean data transforms over terabytes. &amp;nbsp;For that, Hadoop is your friend. &amp;nbsp;But for those in-between tasks, where you'd like to crunch 100GB of log files on an Amazon EC2 c1.xlarge instance -- just for example --&amp;nbsp;this little tool can be the difference between 8 hours (a day) and 1 hour (lunch).&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: bash"&gt;#!/bin/bash&lt;br /&gt;# set -e&lt;br /&gt;CMD=$(basename $0)&lt;br /&gt;HELP=$(cat &amp;lt;&amp;lt;EOF &lt;br /&gt;Usage: $CMD FUNCTION INDIR OUTDIR [MAPPERS]&lt;br /&gt;\n&lt;br /&gt;\n Map a FUNCTION over a set of files in the directory, INDIR, and output&lt;br /&gt;\n results to a directory, OUTDIR, with the same base name.  FUNCTION&lt;br /&gt;\n must accept a file name as its last argument (such as 'sort',  &lt;br /&gt;\n 'wc -l', or 'grep PATTERN').  MAPPERS parallel processes are launched.&lt;br /&gt;\n If MAPPERS is not given, it defaults to the number of CPUs &lt;br /&gt;\n detected on the system, or 2 otherwise. 
&lt;br /&gt;\n&lt;br /&gt;\n Examples:  $CMD sort /tmp/files /tmp/sorted 4&lt;br /&gt;\n&lt;br /&gt;\n           # using map with a user-defined function&lt;br /&gt;\n           gzsort () { gunzip -c $1 | sort | gzip --fast; } &lt;br /&gt;\n           declare -fx gzsort  ##  export to subshell&lt;br /&gt;\n           $CMD gzsort ingz outgz                    &lt;br /&gt;EOF&lt;br /&gt;)&lt;br /&gt;&lt;br /&gt;if [ $# -eq 4 ]; then&lt;br /&gt;    nmap=$4&lt;br /&gt;elif [ $# -eq 3 ]; then   ## guess no. CPUs, default to 2&lt;br /&gt;    nmap=`grep -c '^processor' /proc/cpuinfo 2&amp;gt;/dev/null`&lt;br /&gt;    if [ "${nmap:-0}" -lt 1 ]; then&lt;br /&gt;        nmap=2&lt;br /&gt;    fi&lt;br /&gt;else  ## wrong number of args&lt;br /&gt;    echo -e $HELP&lt;br /&gt;    exit 1&lt;br /&gt;fi&lt;br /&gt; &lt;br /&gt;func=$1&lt;br /&gt;in=$2&lt;br /&gt;out=$3&lt;br /&gt;export func in out nmap&lt;br /&gt; &lt;br /&gt;## make output directory&lt;br /&gt;if [ -d $out ]; then &lt;br /&gt;    echo "output dir $out exists"&lt;br /&gt;    exit 1&lt;br /&gt;else &lt;br /&gt;    mkdir $out&lt;br /&gt;fi&lt;br /&gt; &lt;br /&gt;## run under bash (not sh) so functions exported with declare -fx are visible&lt;br /&gt;ls $in |  xargs -P $nmap -I{} bash -c '$func "$in"/"$1" &amp;gt; "$out"/"$1"' -- {}&lt;br /&gt; &lt;br /&gt;## cleanup in event of any failure&lt;br /&gt;if [ $? 
-ne 0 ]; then&lt;br /&gt;    rm -fr $out&lt;br /&gt;fi&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4741864931549758726-1605757663152577889?l=data-analytics-tools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-analytics-tools.blogspot.com/feeds/1605757663152577889/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4741864931549758726&amp;postID=1605757663152577889' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/1605757663152577889'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/1605757663152577889'/><link rel='alternate' type='text/html' href='http://data-analytics-tools.blogspot.com/2009/12/map-function-for-bash.html' title='Hacking Up a Map Function for Bash'/><author><name>miked98</name><uri>http://www.blogger.com/profile/02261350400113392769</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://2.bp.blogspot.com/_GffqQSsYpkk/SqXuzznf7QI/AAAAAAAAAoA/5Ah6uX9I7G0/S220/md_bw2_GLG_png.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4741864931549758726.post-6947142854544647399</id><published>2009-09-06T01:23:00.000-07:00</published><updated>2009-09-08T21:02:48.601-07:00</updated><title type='text'>Reservoir Sampling Algorithm in Python and Perl</title><content type='html'>Algorithms that perform calculations on evolving data streams, but in fixed memory, have increasing relevance in the Age of Big Data.&lt;br /&gt;&lt;br /&gt;The reservoir sampling algorithm outputs a sample of N lines from a file of undetermined size.  
It does so in a single pass, using memory proportional to N.&lt;br /&gt;&lt;br /&gt;These two features -- (i) a memory footprint that doesn't grow with the input and (ii) a capacity to operate on files of indeterminate size -- make it ideal for working with the very large data sets common in event processing.&lt;br /&gt;&lt;br /&gt;Like many algorithms, it has likely been discovered and implemented more than once, but it was codified in &lt;a href="http://gregable.com/2007/10/reservoir-sampling.html"&gt;Knuth's The Art of Computer Programming&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The trick of this algorithm is to first fill up the sample buffer, and afterwards to probabilistically replace entries in it with later lines of input.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Python version&lt;/b&gt;&lt;br /&gt;&lt;pre class="brush: python"&gt;&lt;br /&gt;#!/usr/bin/python&lt;br /&gt;import sys&lt;br /&gt;import random&lt;br /&gt;&lt;br /&gt;if len(sys.argv) == 3:&lt;br /&gt;    infile = open(sys.argv[2], 'r')&lt;br /&gt;elif len(sys.argv) == 2:&lt;br /&gt;    infile = sys.stdin&lt;br /&gt;else:&lt;br /&gt;    sys.exit(&amp;quot;Usage:  python samplen.py &amp;lt;lines&amp;gt; &amp;lt;?file&amp;gt;&amp;quot;)&lt;br /&gt;&lt;br /&gt;N = int(sys.argv[1])&lt;br /&gt;sample = []&lt;br /&gt;&lt;br /&gt;# i counts from 0, so the current line number is i + 1&lt;br /&gt;for i, line in enumerate(infile):&lt;br /&gt;    if i &amp;lt; N:&lt;br /&gt;        sample.append(line)&lt;br /&gt;    elif random.random() &amp;lt; N / float(i + 1):&lt;br /&gt;        replace = random.randint(0, len(sample) - 1)&lt;br /&gt;        sample[replace] = line&lt;br /&gt;&lt;br /&gt;for line in sample:&lt;br /&gt;    sys.stdout.write(line)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Perl version&lt;/b&gt;&lt;br /&gt;&lt;pre class="brush: c"&gt;&lt;br /&gt;#!/usr/bin/perl -sw&lt;br /&gt;&lt;br /&gt;$IN = 'STDIN' if (@ARGV == 1);&lt;br /&gt;open($IN, '&amp;lt;'.$ARGV[1]) if (@ARGV == 2);&lt;br /&gt;die &amp;quot;Usage:  perl samplen.pl &amp;lt;lines&amp;gt; &amp;lt;?file&amp;gt;\n&amp;quot; if 
(!defined($IN));&lt;br /&gt;&lt;br /&gt;$N = $ARGV[0];&lt;br /&gt;@sample = ();&lt;br /&gt;&lt;br /&gt;while (&amp;lt;$IN&amp;gt;) {&lt;br /&gt;    if ($. &amp;lt;= $N) {&lt;br /&gt; $sample[$.-1] = $_;&lt;br /&gt;    } elsif (($. &amp;gt; $N) &amp;&amp; (rand() &amp;lt; $N/$.)) {&lt;br /&gt; $replace = int(rand(@sample));&lt;br /&gt; $sample[$replace] = $_;&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;print foreach (@sample);&lt;br /&gt;close($IN);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;For example, imagine we are to sample 5 lines randomly from a 6-line file.  Call i the line number of the input (counting from 1), and N the size of sample desired.  For the first 5 lines (where i &amp;lt;= N), our sample fills entirely.  (For the non-Perl hackers:  the current line number i is held by the variable &lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;b&gt;$.&lt;/b&gt;&lt;/span&gt;, just as the special variable &lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;b&gt;$_&lt;/b&gt;&lt;/span&gt; holds the current line value.)&lt;br /&gt;&lt;br /&gt;It's at successive lines of input that the probabilistic sampling starts:  the 6th line has a 5/6 (N/i) chance of being sampled, and if chosen, it replaces one of the 5 previously chosen lines, each with a 1/5 chance.  So each earlier line is evicted with probability (5/6 * 1/5) = 1/6, leaving it a 5/6 chance of staying in the sample.  Thus all 6 lines have an equal chance of being sampled.&lt;br /&gt;&lt;br /&gt;In general, as more lines are seen, the chance that any &lt;i&gt;additional &lt;/i&gt;line is chosen for the sample falls;  but a line that enters the sample early must survive more chances of being replaced than one that enters late.  
These two effects balance out, so that every line of input has the same probability (N/n, for a file of n lines) of ending up in the sample.&lt;br /&gt;&lt;br /&gt;A more sophisticated variation of this algorithm can take per-line weights into account (weighted sampling).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4741864931549758726-6947142854544647399?l=data-analytics-tools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-analytics-tools.blogspot.com/feeds/6947142854544647399/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4741864931549758726&amp;postID=6947142854544647399' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/6947142854544647399'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/6947142854544647399'/><link rel='alternate' type='text/html' href='http://data-analytics-tools.blogspot.com/2009/09/reservoir-sampling-algorithm-in-perl.html' title='Reservoir Sampling Algorithm in Python and Perl'/><author><name>miked98</name><uri>http://www.blogger.com/profile/02261350400113392769</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://2.bp.blogspot.com/_GffqQSsYpkk/SqXuzznf7QI/AAAAAAAAAoA/5Ah6uX9I7G0/S220/md_bw2_GLG_png.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4741864931549758726.post-276468182400799423</id><published>2009-03-12T20:34:00.000-07:00</published><updated>2009-03-12T23:30:48.850-07:00</updated><title type='text'>How to build a dynamic DNS server with Slicehost</title><content type='html'>&lt;p&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Or, How to give your local machine a permanent 
home on the interweb.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;For years I had a method of accessing Linux boxes or Windows machines that lacked permanent IPs.   I would:  (1)  find my IP at some 'whats-my-ip.com' site, (2)  e-mail it to myself, and (3) pray that it wouldn't change while I was away.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Unfortunately, particularly for DSL connections, this faith-based method frequently failed.&lt;/p&gt;&lt;p&gt;In an ideal world, we wouldn't need local storage or local CPU power.  But the cold, hard truth of network latency -- which ever larger data sets don't help -- means that most of us need local machines.  And life is a lot easier when these local machines have -- or at least appear to have -- permanent addresses on the interweb, like "bigdata.dataspora.com" -- even if their actual IPs are changing. &lt;br /&gt;&lt;/p&gt;&lt;p&gt;What follows is the solution that I've implemented for my setup.  Basically it's a cron script that runs on our office server ('bigdata') and checks its WAN IP address hourly.  If this IP address has changed, it updates the DNS record for 'bigdata.dataspora.com' on our top-level server (hosted at Slicehost).  The script lives in the /etc/cron.hourly directory of my local server.  (On Debian and Ubuntu, run-parts skips files in that directory whose names contain a dot, so install the script without its .py extension, e.g. as /etc/cron.hourly/dynamic-dns, and make it executable.)&lt;/p&gt;&lt;p&gt;Slicehost has a nice API that makes this process relatively painless, and this script was modified from their documentation (the link is in the script's comments).  Three important prerequisites for this code to work:  &lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;you need to enable API access via  &lt;a href="https://manage.slicehost.com/api/"&gt;https://manage.slicehost.com/api/&lt;/a&gt;   &lt;/li&gt;&lt;li&gt;you must have a 'bigdata.dataspora.com' (insert your domain) Type A record at Slicehost pointing to some IP address, and... 
&lt;/li&gt;&lt;li&gt;you must have Jared Kuolt's &lt;a href="http://code.google.com/p/pyactiveresource/"&gt;pyactiveresource&lt;/a&gt; Python library installed.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;So, without further ado, here's the little script that makes your local machine a self-updating DNS dynamo!&lt;/p&gt;&lt;br /&gt;&lt;pre  style=" background-border: 1px dashed #999999; line-height: 14px; padding: 5px; overflow: auto; width: 100%font-family:Andale Mono, Lucida Console, Monaco, fixed, monospace;font-size:12px;color:#000000;"&gt;&lt;code&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;#!/usr/bin/python&lt;br /&gt;## dynamic-dns.py&lt;br /&gt;## author:  Michael Driscoll &lt;/span&gt;&lt;mike at="" com=""&gt;&lt;span class="Apple-style-span" style="color: rgb(255, 0, 0);"&gt;&lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;&lt;br /&gt;## 12 mar 2009&lt;br /&gt;##&lt;br /&gt;## A script that dynamically updates DNS records on Slicehost&lt;br /&gt;##&lt;br /&gt;## For more detailed information, see Slicehost's excellent API docs&lt;br /&gt;##   http://articles.slicehost.com/2008/5/13/slicemanager-api-documentation&lt;br /&gt;##&lt;br /&gt;## This code requires Jared Kuolt's pyactiveresource library, via&lt;br /&gt;##   http://code.google.com/p/pyactiveresource/&lt;br /&gt;##&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;from pyactiveresource.activeresource import ActiveResource&lt;br /&gt;import urllib&lt;br /&gt;import os&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## Let's define some constants for securely connecting to the Slicehost API&lt;br /&gt;## API access key and resulting URL string&lt;br /&gt;&lt;/span&gt;api_password = &lt;span class="Apple-style-span" style="color: rgb(0, 153, 0);"&gt;'your-API-key-goes-here'&lt;/span&gt;&lt;br /&gt;api_site = 'https://%s@api.slicehost.com/' % api_password&lt;br /&gt;&lt;br /&gt;&lt;span 
class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## the type and name of the single record I'm updating&lt;br /&gt;&lt;/span&gt;record_type = 'A'&lt;br /&gt;record_name = &lt;span class="Apple-style-span" style="color: rgb(0, 153, 0);"&gt;'&lt;/span&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 153, 0);"&gt;bigdata&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 153, 0);"&gt;'&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## The URL for any server that can return your IP in plaintext&lt;br /&gt;## I hand-rolled mine with a one-line file called 'myip.php':&lt;br /&gt;##  &lt;/span&gt;&lt;span class="Apple-style-span" style="color: rgb(153, 0, 0);"&gt; &lt;/span&gt;&lt;br /&gt;ip_site = 'http://labs.dataspora.com/tools/myip.php'&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## where to store the last ip detected&lt;/span&gt;&lt;br /&gt;ip_path = '/etc/lastip'&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## if last ip exists, read it in&lt;br /&gt;&lt;/span&gt;if (os.path.exists(ip_path)):&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;f = open(ip_path, 'r')&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;lastip = f.read().strip()&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;f.close()&lt;br /&gt;else:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;lastip = ''&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## Let's get our current ip (stripping any trailing newline)&lt;/span&gt;&lt;br /&gt;ip = urllib.urlopen(ip_site).read().strip()&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## If they differ, update our last ip file and Slicehost&lt;/span&gt;&lt;br /&gt;if (ip != lastip):&lt;br /&gt;&lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## update our 
file&lt;/span&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;f = open(ip_path, 'w')&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;f.write(ip)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;f.close()&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## update Slicehost (indented, so this only runs when the IP has changed)&lt;/span&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;class Record(ActiveResource):&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;_site = api_site&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;results = Record.find(record_type=record_type, name=record_name)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;record = results[0]&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;record.data = ip&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;record.save()&lt;br /&gt;&lt;/mike&gt;&lt;/code&gt;&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4741864931549758726-276468182400799423?l=data-analytics-tools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-analytics-tools.blogspot.com/feeds/276468182400799423/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4741864931549758726&amp;postID=276468182400799423' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/276468182400799423'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/276468182400799423'/><link rel='alternate' type='text/html' href='http://data-analytics-tools.blogspot.com/2009/03/how-to-build-your-own-dynamic-dns.html' title='How to build a dynamic DNS server with Slicehost'/><author><name>miked98</name><uri>http://www.blogger.com/profile/02261350400113392769</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' 
src='http://2.bp.blogspot.com/_GffqQSsYpkk/SqXuzznf7QI/AAAAAAAAAoA/5Ah6uX9I7G0/S220/md_bw2_GLG_png.png'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4741864931549758726.post-686049789952239515</id><published>2009-02-26T21:00:00.001-08:00</published><updated>2009-02-26T23:10:04.221-08:00</updated><title type='text'>Four Simple Steps to a Secure Samba Server</title><content type='html'>&lt;p&gt;In this post I describe how you can get Samba to securely serve files from your Linux box in four easy steps, with a 'smb.conf' file just 14 lines long.&lt;/p&gt;&lt;p&gt;I've always found Samba to be unnecessarily complex, and until now, my minimal-effort hack was to set up a world-writable '/share' folder.  But necessity calls (my hard drive clucked the &lt;a href="http://en.wikipedia.org/wiki/Click_of_death"&gt;click of death&lt;/a&gt; last week), and I decided to find a basic Samba setup that does the following:  (i) makes my Linux box the single home for all my files, (ii) allows access to that box from clients running any other OS, and (iii) handles security, file ownership, and permissions properly.  For the approach I take below, you don't need a GUI control panel, just your text editor of choice.&lt;/p&gt;&lt;p&gt;This procedure was modified from the &lt;a href="http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/FastStart.html"&gt;Samba HOWTO and Reference Guide&lt;/a&gt;, specifically the "Secure Read-Write File and Print Server" section.  This is the clearest documentation I have found anywhere on the subject.&lt;/p&gt;&lt;p&gt;So here are my setup and goals.  My setup is a Linux server named 'kube' running Ubuntu 8.04 with the Samba 3.0 package already installed.  
My goal is to read and write to my home directory on &lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;kube&lt;/span&gt;, &lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;/home/mdriscol&lt;/span&gt;, from my Windows machine (XP or Vista) -- or any other client (such as Mac OS X) that lives inside my LAN.&lt;/p&gt;&lt;p&gt;Run the commands below as root (directly or via sudo).&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;1. Create your smb.conf file&lt;/span&gt; (typically found at &lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;/etc/samba/smb.conf&lt;/span&gt;)&lt;/p&gt;&lt;p&gt;Unlike the needlessly complex smb.conf examples you'll find on the web, mine is just a handful of lines.  It's split into three sections; the most relevant is the 'homes' section, which contains directives about how the server's home directories are shared.  I've annotated the file below in red, but leave the ## annotations out of your actual smb.conf: Samba only treats whole lines beginning with # or ; as comments, so a trailing annotation would become part of the setting's value.&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;[global]               &lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;##  global settings&lt;/span&gt;&lt;br /&gt;workgroup = WORKGROUP  &lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;##  sometimes MSHOME&lt;/span&gt;&lt;br /&gt;netbios name = KUBE    &lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;##  name that server is broadcast as&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;[homes]                &lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;##  how /home directories are shared&lt;/span&gt;&lt;br /&gt;comment = Home Directories&lt;br /&gt;valid users = %S       &lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;##  %S is the share name, i.e. the user who owns this home&lt;/span&gt;&lt;br /&gt;read only = No&lt;br 
/&gt;browseable = No&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;[public]               &lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## public dir w/ global read/write&lt;/span&gt;&lt;br /&gt;comment = Data&lt;br /&gt;path = /export         &lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## make sure this exists&lt;/span&gt;&lt;br /&gt;force user = mdriscol  &lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## writes will be assigned this user&lt;/span&gt;&lt;br /&gt;force group = mdriscol  &lt;span class="Apple-style-span" style="color: rgb(204, 0, 0);"&gt;## and this group&lt;/span&gt;&lt;br /&gt;read only = No&lt;/span&gt;&lt;/p&gt;&lt;p&gt;The last section is optional.  It's for a public folder that any authenticated Samba user can write to, but files will be stamped with a default user and group (in this case, me).&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;2.  Create Samba users.&lt;/span&gt;  Because Samba keeps its own list of users and passwords, separate from the server's, you must give a Samba password to each existing Linux user whose home directory you want to share (I keep the passwords the same for sanity's sake) by executing the following as root:&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;  smbpasswd -a mdriscol&lt;/span&gt;&lt;/p&gt;&lt;p&gt;Repeat this for any other users whose home directories you wish to make accessible.&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;3.  Restart the Samba service&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;  /etc/init.d/samba restart&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;4.  
Log in from your client of choice&lt;/span&gt;&lt;/p&gt;&lt;p&gt;For Windows, open the Run dialog with [Windows-Key]+R and enter "\\KUBE", the NetBIOS name you gave your server.  Log in with your Samba password.  Huzzah, it works!&lt;/p&gt;&lt;p&gt;Now you have seamless, secure access to a centralized file server. &lt;br /&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Future steps for me:  (i) Since this entire process was motivated by a hard disk crash, I plan to set up nightly incremental backups of my Ubuntu file server, and (ii) some simple jiggering should allow me to mount this volume remotely anywhere I go.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_GffqQSsYpkk/SaeDDE4scbI/AAAAAAAAAgA/EwuFvG1Io2c/s1600-h/alt-run-samba.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 320px; height: 181px;" src="http://2.bp.blogspot.com/_GffqQSsYpkk/SaeDDE4scbI/AAAAAAAAAgA/EwuFvG1Io2c/s320/alt-run-samba.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307354774753800626" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_GffqQSsYpkk/SaeCL9O7z_I/AAAAAAAAAfw/kRST4T_jaCU/s1600-h/samba-connect-prompt.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 320px; height: 270px;" src="http://4.bp.blogspot.com/_GffqQSsYpkk/SaeCL9O7z_I/AAAAAAAAAfw/kRST4T_jaCU/s320/samba-connect-prompt.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307353827806793714" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_GffqQSsYpkk/SaeBjvWhUdI/AAAAAAAAAfg/SpDOnGtbl4A/s1600-h/hooray-connected.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 193px;" 
src="http://1.bp.blogspot.com/_GffqQSsYpkk/SaeBjvWhUdI/AAAAAAAAAfg/SpDOnGtbl4A/s400/hooray-connected.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307353136885748178" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4741864931549758726-686049789952239515?l=data-analytics-tools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-analytics-tools.blogspot.com/feeds/686049789952239515/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4741864931549758726&amp;postID=686049789952239515' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/686049789952239515'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/686049789952239515'/><link rel='alternate' type='text/html' href='http://data-analytics-tools.blogspot.com/2009/02/four-simple-steps-to-secure-samba.html' title='Four Simple Steps to a Secure Samba Server'/><author><name>miked98</name><uri>http://www.blogger.com/profile/02261350400113392769</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://2.bp.blogspot.com/_GffqQSsYpkk/SqXuzznf7QI/AAAAAAAAAoA/5Ah6uX9I7G0/S220/md_bw2_GLG_png.png'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_GffqQSsYpkk/SaeDDE4scbI/AAAAAAAAAgA/EwuFvG1Io2c/s72-c/alt-run-samba.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4741864931549758726.post-7072645657208984758</id><published>2008-10-10T12:07:00.000-07:00</published><updated>2009-02-26T22:24:23.465-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='MVC'/><category scheme='http://www.blogger.com/atom/ns#' term='Krasner'/><category scheme='http://www.blogger.com/atom/ns#' term='django'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='MTV'/><title type='text'>Why Django is not a pure MVC framework</title><content type='html'>In the acronym jungle of web development, much has been made of the Model-View-Controller (MVC) design pattern.   It's lauded as one of Ruby on Rails' strengths and has clearly influenced Django's design.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_GffqQSsYpkk/SO_6JuveZII/AAAAAAAAAZ8/eJ-LjMu1fNY/s1600-h/Krasner_MVC.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_GffqQSsYpkk/SO_6JuveZII/AAAAAAAAAZ8/eJ-LjMu1fNY/s400/Krasner_MVC.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5255694335237579906" /&gt;&lt;/a&gt;&lt;div&gt;Django's creators, Holovaty and Kaplan-Moss, write in &lt;a href="http://www.djangobook.com/en/1.0/chapter01/"&gt;Chapter 1 of &lt;span class="Apple-style-span" style="font-style: italic;"&gt;The Definitive Guide to Django&lt;/span&gt;&lt;/a&gt; that:&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style=" line-height: 18px; font-family:georgia;"&gt;&lt;blockquote&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;Simply put, MVC defines a way of developing software so that the code for defining and accessing data (the model) is separate from request routing logic (the controller), which in turn is separate from the user interface (the view).&lt;/span&gt;&lt;/blockquote&gt;&lt;/span&gt;&lt;div&gt;&lt;div&gt;This didn't make intuitive sense to me  -- shouldn't the "user interface" be considered the "controller"?  
And there's no mention of templates here at all, even though they are a fundamental component of Django.  Templates seemed to me the most natural fit for the "view" in Django. &lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Looking for some clarity, I went back and read a highly cited 1988 paper by Glenn Krasner and Stephen Pope (the source of the image above), entitled &lt;span class="Apple-style-span" style="font-style: italic;"&gt;"A cookbook for using the model-view controller user interface paradigm in Smalltalk-80"&lt;/span&gt;, which describes the MVC concept as it was originally implemented.   They write:&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Models&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;The model of an application is the domain-specific software simulation or implementation of the application's central structure.  
This can be as simple as an integer (as the model of a counter) or string (as the model of a text editor), or it can be a complex object that is an instance of a subclass [...]&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Views&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;In this metaphor, views deal with everything graphical; they request data from their model, and display the data. [...]&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Controllers&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;Controllers contain the interface between their associated models and views and the input devices (e.g. keyboard, pointing device, time).  Controllers also deal with scheduling interactions with other view-controller pairs [...].&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;A-ha!  This makes more sense.  The views handle how data is displayed, and the controller is the part of the user interface that handles input -- in contrast to what Django's creators stated above.  
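&lt;br /&gt;&lt;br /&gt;To make those three roles concrete, here is a minimal, framework-free Python sketch built on the counter example Krasner and Pope mention. It is only an illustration under my own assumptions: the class names (CounterModel, CounterView, CounterController) and the handle_key method are hypothetical, and the change-notification loop is a simplified stand-in for Smalltalk-80's dependency mechanism, not code from the paper or from Django.

```python
class CounterModel:
    """Model: the application's central state, here just an integer."""

    def __init__(self):
        self.value = 0
        self.dependents = []  # views that want to hear about changes

    def add_dependent(self, view):
        self.dependents.append(view)

    def increment(self, step=1):
        self.value += step
        self.changed()

    def changed(self):
        # Tell every dependent view that the model has changed.
        for view in self.dependents:
            view.update()


class CounterView:
    """View: asks the model for its data and displays it."""

    def __init__(self, model):
        self.model = model
        self.rendered = []  # stands in for the screen; keeps each rendering
        model.add_dependent(self)

    def update(self):
        self.rendered.append("Count: %d" % self.model.value)


class CounterController:
    """Controller: turns input events (key presses) into model operations."""

    def __init__(self, model):
        self.model = model

    def handle_key(self, key):
        if key == "+":
            self.model.increment()
        elif key == "-":
            self.model.increment(-1)
        # Any other key is ignored.


model = CounterModel()
view = CounterView(model)
controller = CounterController(model)
for key in "++-+":
    controller.handle_key(key)
print(view.rendered[-1])  # prints "Count: 2"
```

Note that the view never changes the model and the controller never draws anything. In Django terms, handle_key plays roughly the part of a Django view function (it receives input and updates the model), while CounterView.update plays the part of a template (it only turns model data into output).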
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, just when I thought I had stumped the Django folks, I came across this &lt;a href="http://www.djangobook.com/en/1.0/chapter05/"&gt;concession in Chapter 5 of &lt;span class="Apple-style-span" style="font-style: italic;"&gt;The Definitive Guide to Django&lt;/span&gt;&lt;/a&gt;:&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;&lt;span class="Apple-style-span" style="line-height: 18px; "&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 0, 0);"&gt;If you’re familiar with other MVC Web-development frameworks, such as Ruby on Rails, you may consider Django views to be the “controllers” and Django templates to be the “views.” &lt;/span&gt;&lt;/span&gt;&lt;/blockquote&gt;Indeed, I think this naming is an unfortunate accident, a minor (yet confusing) sin of semantics on Django's part.  Quite clearly, MVC controllers correspond to Django views, and MVC views correspond to Django templates.  
This understanding (and ordering) is reflected in the &lt;a href="http://jeffcroft.com/blog/2007/jan/11/django-and-mtv/"&gt;acronym "MTV" -- Model-Template-View --&lt;/a&gt; that some have used to describe Django.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4741864931549758726-7072645657208984758?l=data-analytics-tools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-analytics-tools.blogspot.com/feeds/7072645657208984758/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4741864931549758726&amp;postID=7072645657208984758' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/7072645657208984758'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/7072645657208984758'/><link rel='alternate' type='text/html' href='http://data-analytics-tools.blogspot.com/2008/10/why-django-is-not-pure-mvc-framework.html' title='Why Django is not a pure MVC framework'/><author><name>miked98</name><uri>http://www.blogger.com/profile/02261350400113392769</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://2.bp.blogspot.com/_GffqQSsYpkk/SqXuzznf7QI/AAAAAAAAAoA/5Ah6uX9I7G0/S220/md_bw2_GLG_png.png'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_GffqQSsYpkk/SO_6JuveZII/AAAAAAAAAZ8/eJ-LjMu1fNY/s72-c/Krasner_MVC.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4741864931549758726.post-2734361835319731148</id><published>2008-10-10T11:45:00.000-07:00</published><updated>2008-10-10T18:07:03.178-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='web frameworks'/><category scheme='http://www.blogger.com/atom/ns#' term='django'/><category scheme='http://www.blogger.com/atom/ns#' term='web2py'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='pylons'/><category scheme='http://www.blogger.com/atom/ns#' term='turbogears'/><category scheme='http://www.blogger.com/atom/ns#' term='zope'/><title type='text'>Python web frameworks -- who's winning?</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_GffqQSsYpkk/SO_8BmCqg6I/AAAAAAAAAaE/YtHG-2c_DPA/s1600-h/python_web_frameworks.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_GffqQSsYpkk/SO_8BmCqg6I/AAAAAAAAAaE/YtHG-2c_DPA/s400/python_web_frameworks.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5255696394486449058" /&gt;&lt;/a&gt;&lt;div&gt;After reading Jeff Croft's &lt;a href="http://jeffcroft.com/blog/2008/sep/06/back-great-frameworks-debate/"&gt;back to the great frameworks debate&lt;/a&gt; post last week, and as someone who is wading into the Django framework, I decided to look for some hard numbers on which Python web frameworks are gaining currency, at least in the eyes of Google.  I compared worldwide Google search volume for django, zope, web2py, and turbogears.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The result is pictured above, and it turns out that Django is at the top and gaining.  
The recent 1.0 release of Django certainly helps, as does &lt;a href="http://code.google.com/p/google-app-engine-django/"&gt;an explicit nod of approval&lt;/a&gt; from Guido van Rossum over at Google's App Engine.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Alas, Python's Django is a long way from having the kind of mindshare that Ruby on Rails still enjoys, but as Robert F. Kennedy said about politics, what's right isn't always popular, and what's popular isn't always right.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4741864931549758726-2734361835319731148?l=data-analytics-tools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-analytics-tools.blogspot.com/feeds/2734361835319731148/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4741864931549758726&amp;postID=2734361835319731148' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/2734361835319731148'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4741864931549758726/posts/default/2734361835319731148'/><link rel='alternate' type='text/html' href='http://data-analytics-tools.blogspot.com/2008/10/python-frameworks-whos-winning.html' title='Python web frameworks -- who&apos;s winning?'/><author><name>miked98</name><uri>http://www.blogger.com/profile/02261350400113392769</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://2.bp.blogspot.com/_GffqQSsYpkk/SqXuzznf7QI/AAAAAAAAAoA/5Ah6uX9I7G0/S220/md_bw2_GLG_png.png'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_GffqQSsYpkk/SO_8BmCqg6I/AAAAAAAAAaE/YtHG-2c_DPA/s72-c/python_web_frameworks.jpg' height='72' width='72'/><thr:total>0</thr:total></entry></feed>
