I do a lot of basic manipulation of data, so here are two built-in functions ( transform , subset ) and one library ( sqldf ) that I use daily (this post originally appeared as an answer I gave on StackOverflow).
create sample sales data
sales <- expand.grid(country = c('USA', 'UK', 'FR'), product = c(1, 2, 3)) sales$revenue <- rnorm(dim(sales)[1], mean=100, sd=10) > sales country product revenue 1 USA 1 108.45965 2 UK 1 97.07981 3 FR 1 99.66225 4 USA 2 100.34754 5 UK 2 87.12262 6 FR 2 112.86084 7 USA 3 95.87880 8 UK 3 96.43581 9 FR 3 94.59259
use transform() to add a column
## transform currency to euros usd2eur <- 1.434 transform(sales, euro = revenue * usd2eur) > country product revenue euro 1 USA 1 108.45965 155.5311 2 UK 1 97.07981 139.2125 3 FR 1 99.66225 142.9157 ...
use subset() to slice the data
subset(sales, country == 'USA' & product %in% c(1, 2), select = c('product', 'revenue')) > product revenue 1 1 108.4597 4 2 100.3475
use sqldf() to slice and aggregate with SQL
The sqldf package provides an SQL interface to R data frames
## recast the previous subset() expression in SQL sqldf('SELECT product, revenue FROM sales \ WHERE country = "USA" \ AND product IN (1,2)') > product revenue 1 1 108.4597 2 2 100.3475
Perform an aggregation or GROUP BY
sqldf('select country, sum(revenue) revenue \ FROM sales \ GROUP BY country') > country revenue 1 FR 307.1157 2 UK 280.6382 3 USA 304.6860
For more sophisticated map-reduce-like functionality on data frames, check out the plyr package. And if find yourself wanting to pull your hair out, I recommend checking out Data Manipulation with R.
No comments:
Post a Comment