A few days ago a friend asked me the following question: how can one efficiently extract specific lines from a large text file, possibly compressed with Gzip? He mentioned that he had tried some R functions such as read.table(skip = ...), but found that reading the data was too slow. Hence he was looking for alternative ways to extract the data.
This is a common task in preprocessing large data sets, since in data exploration, very often we want to peek at a small subset of the whole data to gain some insights.
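As a rough illustration of the task (a Python sketch, not R, and not necessarily the approach the post goes on to describe), one can stream the file and keep only the requested lines instead of parsing everything. The helper name `extract_lines` and the ".gz" suffix check are assumptions made for this example.

```python
import gzip
from itertools import islice


def extract_lines(path, start, count):
    """Return `count` lines beginning at 1-based line number `start`.

    Hypothetical helper for illustration. The file is read as a stream,
    so earlier lines are skipped without being stored, and later lines
    are never read. Files ending in ".gz" are decompressed on the fly.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        wanted = islice(handle, start - 1, start - 1 + count)
        return [line.rstrip("\n") for line in wanted]


# Example call (the file name is a placeholder):
# rows = extract_lines("big_data.txt.gz", start=500000, count=10)
```

Note that a Gzip stream still has to be decompressed from the beginning to reach a given line, so the saving here comes from skipping the parsing and storage of unwanted lines, not from skipping the decompression.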
It has been one year since my last article, and here is a quick post indicating that my blog is not down. Instead, it has a new look thanks to blogdown. Yes, pun intended. :-)
blogdown, mostly written by Yihui, is an R package that can help you rapidly create a static blog or website. The package name has nothing to do with the status of a website (as in “the server is down”), but rather follows the convention of other Markdown-based packages such as rmarkdown and bookdown.
Have you ever tried to find a lightweight yet nice theme for R Markdown documents, like this page?
Themes for R Markdown

With the powerful rmarkdown package, we can easily create a nice HTML document by adding some meta information in the header, for example
---
title: Nineteen Years Later
author: Harry Potter
date: July 31, 2016
output:
  rmarkdown::html_document:
    theme: lumen
---

The html_document engine uses the Bootswatch theme library to support different styles of the document.
A Quick View of Recommender Systems

The main task of a recommender system is to predict the unknown entries in the rating matrix based on the observed values, as shown in the table below:
Each cell containing a number is the rating given by some user to a specific item, while the cells marked with question marks are unknown ratings that need to be predicted. Elsewhere in the literature, this problem is also called collaborative filtering, matrix completion, matrix recovery, etc.
Introduction

I have seen several conversations on the Rcpp-devel mailing list asking how to do numerical integration or optimization in Rcpp. While R does have the functions Rdqags, Rdqagi, nmmin, vmmin, etc. in its API to accomplish such tasks, it is not so straightforward to use them with Rcpp.
For my own research projects I need to do a lot of numerical integration, root finding and optimization, so to make my life a little bit easier, I just created the RcppNumerical package that simplifies these procedures.
In January 2016, I was honored to receive an “Honorable Mention” of the John Chambers Award 2016. This article was written for R-bloggers, whose creator, Tal Galili, kindly invited me to write an introduction to the rARPACK package.
A Short Story of rARPACK

Eigenvalue decomposition is a commonly used technique in numerous statistical problems. For example, principal component analysis (PCA) basically conducts eigenvalue decomposition on the sample covariance matrix of the data: the eigenvalues are the component variances, and the eigenvectors are the variable loadings.
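To make the PCA connection concrete, here is a small Python sketch (not rARPACK itself, and not R) for a toy data set with two variables. The helper names and the data are invented for illustration, and the closed-form solution for a 2x2 symmetric matrix is used only because it needs no libraries; a real problem would call a proper eigen solver.

```python
import math


def sample_covariance_2d(xs, ys):
    """Return (var_x, cov_xy, var_y) using the n - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def eigen_sym_2x2(a, b, c):
    """Eigen-decompose [[a, b], [b, c]] in closed form.

    Returns [(eigenvalue, unit eigenvector), ...], largest eigenvalue first.
    """
    center = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    values = [center + radius, center - radius]
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
        return list(zip(values, vectors))
    result = []
    for lam in values:
        # (b, lam - a) solves (A - lam * I) v = 0 whenever b is nonzero.
        norm = math.hypot(b, lam - a)
        result.append((lam, (b / norm, (lam - a) / norm)))
    return result


# Toy data: four observations of two positively correlated variables.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 4.0]

sxx, sxy, syy = sample_covariance_2d(xs, ys)
components = eigen_sym_2x2(sxx, sxy, syy)

for variance, loading in components:
    print("component variance:", variance, "loadings:", loading)
```

For this data the two component variances come out at roughly 3 and 1/3, and they add up to the total variance of the two variables, which is the usual sanity check for a PCA.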
This semester I’m taking a course in big data computing using Scala/Spark, and we are asked to finish a course project related to big data analysis. Since statistical modeling heavily relies on linear algebra, I investigated some existing libraries in Scala/Java that deal with matrix and linear algebra algorithms.
1. Set-up

Scala/Java libraries are usually distributed as *.jar files. To use them in Scala, we can create a directory to hold them and set an environment variable to let Scala know about this path.
Thanks to the issue report by yufree and Yihui’s kind work, knitr supports using showtext to change fonts in R plots, starting from version 1.6.10 (development version). To demonstrate its usage, this document itself serves as an example. (Rmd source code)
We first do some setup work, mainly setting the options that control the appearance of the plots. Note that if you create plots in PNG format (the default format for HTML output), it is strongly recommended to use the CairoPNG device rather than the default png, since the latter can produce quite ugly plots when used with showtext.
Today I gave a presentation for the GSO (Graduate Student Organization) of our department, mainly about the idea of dynamic documents and their implementation using knitr.
Here are the slides I showed in the talk, written with Markdown and knitr.
This is a pretty old topic in R graphics. A classic article in R NEWS, Non-standard fonts in PostScript and PDF graphics, describes how to use and embed system fonts in the PDF/PostScript device. More recently, Winston Chang developed the extrafont package, which makes the procedure much easier. A useful introduction can be found on the readme page of extrafont, and also on the Revolution blog.
Now, we have another choice: the showtext package.