Extracting specific lines from a large (compressed) text file

May 27, 2018 in R, Programming

A few days ago a friend asked me the following question: how can one efficiently extract specific lines from a large text file, possibly compressed with gzip? He mentioned that he had tried R functions such as read.table(skip = ...) but found that reading the data was too slow, so he was looking for alternative ways of extracting the data. This is a common task in preprocessing large data sets, since during data exploration we very often want to peek at a small subset of the whole data to gain some insight.
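
As a minimal sketch of one way to approach this task, assuming the wanted line numbers are known in advance, a gzip connection can be read in fixed-size chunks with readLines(), keeping only the requested lines and stopping as soon as the last one has been seen. This is not necessarily the method the full post arrives at, and extract_lines() and chunk_size are hypothetical names chosen for illustration.

```r
# Minimal sketch: return the lines whose 1-based numbers are in `lines`
# from a text file that may be gzip-compressed. The file is read in
# chunks with readLines(), so it never has to fit in memory at once.
# `extract_lines` and `chunk_size` are hypothetical names for illustration.
extract_lines <- function(path, lines, chunk_size = 10000L) {
  lines <- sort(unique(as.integer(lines)))   # results come back in file order
  if (length(lines) == 0L) return(character(0))
  # gzfile() can also open uncompressed files for reading
  con <- gzfile(path, open = "r")
  on.exit(close(con))
  result <- character(0)
  offset <- 0L          # number of lines consumed so far
  last <- max(lines)
  while (offset < last) {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break           # end of file reached
    # requested line numbers that fall inside the current chunk
    hit <- lines[lines > offset & lines <= offset + length(chunk)]
    result <- c(result, chunk[hit - offset])
    offset <- offset + length(chunk)
  }
  result  # line numbers beyond the end of the file are silently dropped
}

# Example: write 100 lines to a temporary gzip file, then pull out three
tmp <- tempfile(fileext = ".gz")
out <- gzfile(tmp, open = "w")
writeLines(paste("line", 1:100), out)
close(out)
picked <- extract_lines(tmp, c(99, 3, 50), chunk_size = 7L)
picked  # c("line 3", "line 50", "line 99")
```

Because reading stops at the largest requested line number, pulling lines from near the start of a huge file is cheap. Lines near the end still require decompressing everything before them, since a plain gzip stream does not support random access.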

Using system fonts in R graphs

January 1, 2014 in Programming, R

This is a pretty old topic in R graphics. A classic article in R News, "Non-standard fonts in PostScript and PDF graphics", describes how to use and embed system fonts in the PDF/PostScript devices. More recently, Winston Chang developed the extrafont package, which makes the procedure much easier. A useful introductory article can be found on the readme page of extrafont, and also on the Revolution blog. Now we have another choice: the showtext package.

Handwriting recognition using R

December 18, 2011 in Programming, R, Statistics

The title is a bit of an exaggeration, since handwriting recognition is an advanced topic in machine learning involving complex techniques and algorithms. In this post I’ll show you a simple demo that recognizes a single digit (0 to 9) using R. The overall process is as follows: you draw a digit in an R graphics device with your mouse, and the program then “guesses” what you have drawn. It is just for FUN.

Windows binary of RMySQL

October 22, 2011 in Programming, R

This binary package supports R 2.13.x (32-bit/64-bit) and MySQL 5.5.16 (32-bit/64-bit).

RMySQL 0.8-0 for MySQL 5.5.16

How to run regression on large datasets in R

October 2, 2011 in Programming, R, Statistics

It’s well known that R is memory-based software, meaning that datasets must be loaded into memory before being manipulated. For small or medium-sized datasets, this doesn’t cause any trouble. However, when you need to deal with larger ones, for instance financial time series or log data from the Internet, memory consumption is always a nuisance. As a simple illustration, you can run the following code in R to allocate a matrix named x and a vector named y.
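
As an illustration of one common way around the memory limit (an assumption on my part, not necessarily what the full post does), least squares coefficients can be computed from chunks of rows: accumulate X'X and X'y over the chunks, then solve (X'X) b = X'y. chunked_ols() and get_chunk() are hypothetical names. In practice get_chunk() would read the next block of rows from disk; the demo keeps everything in memory only so the result can be checked against lm().

```r
# Minimal sketch (assumed approach, not necessarily the post's method):
# ordinary least squares on data processed chunk by chunk. Only the running
# sums X'X and X'y are kept in memory, then the normal equations are solved.
# `chunked_ols` and `get_chunk` are hypothetical names for illustration.
chunked_ols <- function(get_chunk, n_chunks) {
  xtx <- 0
  xty <- 0
  for (i in seq_len(n_chunks)) {
    ch <- get_chunk(i)              # expected: list(x = matrix, y = vector)
    x <- cbind(1, ch$x)             # prepend an intercept column
    xtx <- xtx + crossprod(x)       # accumulates t(x) %*% x
    xty <- xty + crossprod(x, ch$y) # accumulates t(x) %*% y
  }
  drop(solve(xtx, xty))             # coefficients, intercept first
}

# Demo on simulated data split into 10 chunks of 1000 rows each
set.seed(123)
n <- 10000
p <- 3
x <- matrix(rnorm(n * p), n, p)
y <- 0.5 + drop(x %*% c(1, 2, 3)) + rnorm(n)
chunks <- split(seq_len(n), rep(1:10, each = n / 10))
get_chunk <- function(i) {
  list(x = x[chunks[[i]], , drop = FALSE], y = y[chunks[[i]]])
}
beta <- chunked_ols(get_chunk, length(chunks))
# should agree with lm() on the full data, up to floating-point error
all.equal(unname(beta), unname(coef(lm(y ~ x))))
```

Memory use then depends on the chunk size and on the (p + 1) × (p + 1) matrix X'X rather than on the number of rows. Solving the normal equations can lose accuracy when X is ill-conditioned; the biglm package, for example, instead updates a QR decomposition incrementally.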

Yixuan Qiu