Reading big files in R

Posted: April 12th, 2012 | Filed under: R, Statistics

As the lone statistician in my workplace, I end up introducing many people to R. After the inevitable pirate jokes, my coworkers who program in real languages (C++, Java, Python, PHP, etc.) ultimately end up complaining about R, which does a couple of things very well and a lot of things VERY poorly. Each has complained about data frames and reading data into R.

For those who don’t know, when you read in a .csv, the default return type is not an array or a list but a data frame, which takes up far more memory than it should and converts all strings to factors for easier use in regression. This is dumb, and in my experience it will make reading the file take 5x-10x as long and use that much more memory. For a quick fix, set stringsAsFactors=F. If you don’t have column headings, which I normally don’t, set header=F as well:

data = read.csv("datafile.csv", header=F, stringsAsFactors=F)
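For bigger files you can squeeze out more speed by telling read.csv the column types up front with colClasses, so R doesn't have to guess them per column. A minimal sketch (the temp file and its three columns are made up here just so the example runs on its own; in practice you'd point it at your real file):

```r
# Write a tiny sample file so the example is self-contained;
# normally "datafile.csv" would already exist on disk.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2.5,foo", "3,4.5,bar"), tmp)

# colClasses skips per-column type guessing, stringsAsFactors = FALSE
# keeps strings as plain character vectors instead of factors.
data <- read.csv(tmp,
                 header = FALSE,
                 stringsAsFactors = FALSE,
                 colClasses = c("integer", "numeric", "character"))
```

The colClasses vector has to match the file's columns in order; get one wrong and read.csv will throw an error rather than silently coerce.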