Distributed Statistical Model Fitting with the ADMM Algorithm

Yixuan Qiu & Rongrong Zhang
2015-12-02

@ STAT 598 Computing for Big Data

Contents

  • What is ADMM?
  • Distributed Computing with ADMM
  • Implementation in Scala/Spark
  • Computing on Real Data

What is ADMM?

What is ADMM?

  • Full name: Alternating Direction Method of Multipliers
  • An algorithm for constrained convex optimization problems
  • Highlighted features:
    • Provides easy-to-apply solutions to many statistical models
    • Can be easily incorporated into distributed computing frameworks
    • A good friend of statistical models

The Problem

  • ADMM is used to solve the following optimization problem:

\[ \begin{aligned} \text{minimize}\quad & f(\theta)+g(z)\\ \text{subject to}\quad & A\theta + Bz = c \end{aligned} \]

  • \( \theta \) and \( z \) are vectors to be optimized
  • \( A \), \( B \) and \( c \) are known coefficient matrices/vectors
  • \( f \) and \( g \) are convex functions

  • Looks bizarre? Let's see two familiar examples

Example — Lasso

\[ \begin{aligned} \min & f(\theta)+g(z)\\ \text{s.t.} & A\theta + Bz = c \end{aligned} \]

  • Solve \( \;\min_{\beta}\frac{1}{2n}\Vert y-X\beta\Vert_2^2+\lambda\Vert\beta\Vert_1 \)
  • \( f(\beta)=\frac{1}{2n}\Vert y-X\beta\Vert_2^2 \)
  • \( g(z)=\lambda\Vert z\Vert_1 \)
  • With constraint \( \;\beta-z=0 \)

Example — Median Regression (Least Absolute Deviations)

\[ \begin{aligned} \min & f(\theta)+g(z)\\ \text{s.t.} & A\theta + Bz = c \end{aligned} \]

  • Solve \( \;\min_{\beta} \Vert y-X\beta \Vert_1 \)
  • \( f(\beta)=0 \)
  • \( g(z)=\Vert z\Vert_1 \)
  • With constraint \( \;X\beta-z=y \)

How Does ADMM Work?

\[ \begin{align} \scriptstyle \theta^{k+1} & \scriptstyle :=\underset{\theta}{\arg\min}\left(f(\theta)+\frac{\rho}{2}\Vert A\theta+Bz^{k}-c+u^{k}/\rho\Vert_{2}^{2}\right)\\ \scriptstyle z^{k+1} & \scriptstyle :=\underset{z}{\arg\min}\left(g(z)+\frac{\rho}{2}\Vert A\theta^{k+1}+Bz-c+u^{k}/\rho\Vert_{2}^{2}\right)\\ \scriptstyle u^{k+1} & \scriptstyle :=u^{k}+\rho(A\theta^{k+1}+Bz^{k+1}-c) \end{align} \]

  • \( \rho\, \) can be any positive number
  • Usually \( \scriptstyle \min_\theta f(\theta)+\frac{\rho}{2}\Vert A\theta+w\Vert_{2}^{2} \) and \( \scriptstyle \min_z g(z)+\frac{\rho}{2}\Vert Bz+w\Vert_{2}^{2} \) are easier to compute than the original problem

Examples

  • Lasso

\[ \begin{align} \scriptstyle \beta^{k+1} & \scriptstyle :=(X'X+\rho I)^{-1}(X'y+\rho z^{k}-u^{k})\\ \scriptstyle z^{k+1} & \scriptstyle :=S_{n\lambda/\rho}(\beta^{k+1}+u^k/\rho)\\ \scriptstyle u^{k+1} & \scriptstyle :=u^{k}+\rho(\beta^{k+1}-z^{k+1}) \end{align} \]
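To see where the \( \beta \)-update comes from, scale the lasso objective by \( n \) so that \( f(\beta)=\frac{1}{2}\Vert y-X\beta\Vert_2^2 \) and the penalty becomes \( n\lambda\Vert z\Vert_1 \); setting the gradient of the \( \theta \)-subproblem to zero then gives

\[ -X'(y-X\beta)+\rho(\beta-z^{k})+u^{k}=0 \quad\Longrightarrow\quad \beta^{k+1}=(X'X+\rho I)^{-1}(X'y+\rho z^{k}-u^{k}) \]

and the scaled penalty \( n\lambda \) is why the soft-thresholding parameter is \( n\lambda/\rho \).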

  • Median Regression

\[ \begin{align} \scriptstyle \beta^{k+1} & \scriptstyle :=(X'X)^{-1}X'(y+z^{k}-u^{k}/\rho)\\ \scriptstyle z^{k+1} & \scriptstyle :=S_{1/\rho}(X\beta^{k+1}-y+u^{k}/\rho)\\ \scriptstyle u^{k+1} & \scriptstyle :=u^{k}+\rho(X\beta^{k+1}-z^{k+1}-y) \end{align} \]

  • where \( S_\kappa \) is the soft-thresholding operator, applied element-wise: \( \scriptstyle S_{\kappa}(a)=\max(0,1-\kappa/|a|)\cdot a \)
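As an illustration, here is a minimal (non-distributed) Breeze sketch of the lasso iterations above; lassoADMM and softThreshold are names chosen here for illustration, and a fixed iteration count stands in for a proper convergence check.

import breeze.linalg.{DenseMatrix, DenseVector, inv}

// Element-wise soft-thresholding operator S_kappa(a)
def softThreshold(a: DenseVector[Double], kappa: Double): DenseVector[Double] =
  a.map(ai => if (math.abs(ai) <= kappa) 0.0 else (1.0 - kappa / math.abs(ai)) * ai)

// Lasso via the ADMM iterations shown above (illustrative sketch)
def lassoADMM(X: DenseMatrix[Double], y: DenseVector[Double], lambda: Double,
              rho: Double = 1.0, maxIter: Int = 500): DenseVector[Double] = {
  val n = X.rows
  val p = X.cols
  // (X'X + rho*I)^{-1} does not change across iterations, so invert it once
  val inverse = inv(X.t * X + DenseMatrix.eye[Double](p) * rho)
  val Xty = X.t * y
  var z = DenseVector.zeros[Double](p)
  var u = DenseVector.zeros[Double](p)
  for (_ <- 1 to maxIter) {
    val beta = inverse * (Xty + z * rho - u)             // beta-step
    z = softThreshold(beta + u / rho, n * lambda / rho)  // z-step
    u = u + (beta - z) * rho                             // dual update
  }
  z  // z carries the exact zeros induced by soft-thresholding
}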

How to Parallelize?

Statistical Models

  • Many statistical models take the following form: \[ \min \sum_{i=1}^N f_i(\theta)+g(\theta) \]
  • Here each \( f_i(\theta) \) is one partition's share of the negative log-likelihood: if the sample is split into \( N \) partitions, the negative log-likelihood is the sum of \( N \) such terms
  • \( g(\theta) \) can be a penalty term or encode prior information
  • Models of this form can be easily parallelized

Parallelized ADMM

  • The previous model can be rewritten as

\[ \begin{aligned} \min & \sum_{i=1}^N f_i(\theta_i)+g(z)\\ \text{s.t.} & \theta_i - z = 0,\ i=1,\ldots,N \end{aligned} \]

  • The iteration steps are

\[ \begin{align} \scriptstyle \theta_i^{k+1} & \scriptstyle :=\underset{\theta_i}{\arg\min}\left(f_i(\theta_i)+\frac{\rho}{2}\Vert \theta_i-z^{k}+u_i^{k}/\rho\Vert_{2}^{2}\right)\\ \scriptstyle z^{k+1} & \scriptstyle :=\underset{z}{\arg\min}\left(g(z)+\frac{N\rho}{2}\Vert z-\bar{\theta}^{k+1}-\bar{u}^{k}/\rho\Vert_{2}^{2}\right)\\ \scriptstyle u_i^{k+1} & \scriptstyle :=u_i^{k}+\rho(\theta_i^{k+1}-z^{k+1}) \end{align} \]
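As a sketch (not the ADMM-Spark code): assuming the data blocks are held in an RDD, one round of these iterations might look as follows; Block, padmmRound, thetaStep, and zStep are illustrative names invented here.

import breeze.linalg.DenseVector
import org.apache.spark.rdd.RDD

// State held by partition i: its local theta_i and dual variable u_i
case class Block(theta: DenseVector[Double], u: DenseVector[Double])

// One round of parallelized ADMM. `thetaStep` solves the local argmin for
// one partition given (z, u_i); `zStep` solves the z-update given
// (theta-bar, u-bar). Both are supplied by the caller.
def padmmRound(blocks: RDD[Block], z: DenseVector[Double], rho: Double,
               thetaStep: (DenseVector[Double], DenseVector[Double]) => DenseVector[Double],
               zStep: (DenseVector[Double], DenseVector[Double]) => DenseVector[Double])
    : (RDD[Block], DenseVector[Double]) = {
  // theta_i-steps run in parallel on the workers
  val updated = blocks.map(b => b.copy(theta = thetaStep(z, b.u))).cache()
  val nBlocks = updated.count().toDouble
  // theta-bar and u-bar are short vectors, so average them on the driver
  val thetaBar = updated.map(_.theta).reduce(_ + _) / nBlocks
  val uBar = updated.map(_.u).reduce(_ + _) / nBlocks
  // z-step on the driver, e.g. soft-thresholding for an L1 penalty
  val zNew = zStep(thetaBar, uBar)
  // dual updates u_i, again in parallel
  val next = updated.map(b => Block(b.theta, b.u + (b.theta - zNew) * rho))
  (next, zNew)
}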

Parallelized ADMM

[Figure: a diagram of P-ADMM]

Implementation in Scala/Spark

ADMM-Spark

  • https://github.com/yixuan/ADMM-Spark
  • Currently used to solve logistic lasso regression
  • Can be easily extended to the general penalized likelihood model \[ \min_{\theta}\, l(\theta)+\lambda\Vert\theta\Vert_1 \]

Structure

Abstract class

  • class PADMML1
  • For general L1-penalized likelihood model \[ \scriptstyle\min_{\theta}\, \sum_{i=1}^N l(\theta;x_i,y_i)+\lambda\Vert\theta\Vert_1 \]
  • Implementing the following iteration algorithm \[ \begin{align} \scriptstyle \theta_i^{k+1} & \scriptstyle :=\underset{\theta_i}{\arg\min}\left(l(\theta_i;x_i,y_i)+\frac{\rho}{2}\Vert \theta_i-z^{k}+u_i^{k}/\rho\Vert_{2}^{2}\right)\\ \scriptstyle z^{k+1} & \scriptstyle :=S_{\lambda/(\rho N)}(\bar{\theta}^{k+1}+\bar{u}^{k}/\rho)\\ \scriptstyle u_i^{k+1} & \scriptstyle :=u_i^{k}+\rho(\theta_i^{k+1}-z^{k+1}) \end{align} \]
  • The \( \theta \)-step is left abstract, to be implemented by derived classes

Derived class

  • class PLogisticLasso extends PADMML1
  • Solves sparse logistic regression
  • Only needs to implement the function for \( \theta \)-step
  • Its \( \theta \)-step calls a logistic ridge regression solver

Solver class

  • class LogisticRidge
  • class LogisticRidgeNative
  • Solves the problem \[ \scriptstyle\min_{\theta}\, l(\theta)+\lambda\Vert\theta-v\Vert_2^2 \]
  • LogisticRidge: pure Java code
  • LogisticRidgeNative: using C++ code to speed up
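For intuition, this subproblem is smooth, so a Newton-type method applies (cf. the Minka reference). Below is a hedged Breeze sketch, not the actual LogisticRidge implementation; forming diag(w) keeps the code short but builds an n-by-n matrix that a real solver would avoid by weighting rows directly.

import breeze.linalg.{DenseMatrix, DenseVector, diag, inv}
import breeze.numerics.sigmoid

// Sketch: Newton iterations for min_theta l(theta) + lambda*||theta - v||_2^2,
// where l is the logistic negative log-likelihood (illustrative only)
def logisticRidge(X: DenseMatrix[Double], y: DenseVector[Double],
                  v: DenseVector[Double], lambda: Double,
                  iters: Int = 25): DenseVector[Double] = {
  val p = X.cols
  var theta = v.copy
  for (_ <- 1 to iters) {
    val mu = sigmoid(X * theta)                      // fitted probabilities
    val grad = X.t * (mu - y) + (theta - v) * (2.0 * lambda)
    val w = mu.map(m => m * (1.0 - m))               // Newton/IRLS weights
    val hess = X.t * diag(w) * X + DenseMatrix.eye[Double](p) * (2.0 * lambda)
    theta = theta - inv(hess) * grad                 // Newton step
  }
  theta
}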

Example Usage

import statr.stat598bd.PLogisticLasso

// Create RDD objects x and y representing
// the data matrix and response vector
// ...

val model = new PLogisticLasso(x, y, sc)
model.set_lambda(2.0)
model.set_opts(max_iter = 500,
               eps_abs = 1e-3,
               eps_rel = 1e-3,
               logs = true)
model.run()

val beta = model.coef

Real Data Computing

Tradeshift Data

  • 1.7 million rows (text blocks) / 145 variables

Text Classification

  • Classify each text block into one or more of 33 labels, such as date, invoice number, or monetary amount
  • We consider the 33rd label

Features

  • Numbers: Continuous/discrete numerical values
  • Boolean: YES (true) or NO (false) values
  • Encrypted text

Overall Flow

  • Preprocess the data by removing text and missing values, and encoding YES/NO values to 1/0
  • Store the cleaned data to HDFS
  • Read the data in Spark and randomly split it into a training set (99%) and a tuning set (1%)
  • Standardize the training set
  • Consider 10 \( \lambda \) values ranging from \( 0.001n \) to \( 0.1n \), equally spaced on the log scale (a sketch of this grid follows)
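For concreteness, here is one way such a grid could be generated (lambdaGrid is a name made up here, not part of ADMM-Spark):

// 10 lambda values from 0.001*n to 0.1*n, equally spaced on the log scale
def lambdaGrid(n: Double, nLambda: Int = 10): Seq[Double] = {
  val (lo, hi) = (math.log(0.001 * n), math.log(0.1 * n))
  (0 until nLambda).map(i => math.exp(lo + i * (hi - lo) / (nLambda - 1)))
}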

Overall Flow - Cont.

  • Fit a logistic lasso regression for each \( \lambda \), and calculate the log-likelihood on the tuning set
  • Select the “optimal” \( \lambda \) (see the sketch after this list)
  • Obtain the final regression coefficients based on the selected \( \lambda \)
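A hedged sketch of this selection loop, where fitModel and tuningLogLik stand in for the actual fitting and evaluation code:

import breeze.linalg.DenseVector

// Fit one model per lambda, score each on the tuning set,
// and keep the lambda/coefficients with the highest log-likelihood
def selectLambda(lambdas: Seq[Double],
                 fitModel: Double => DenseVector[Double],
                 tuningLogLik: DenseVector[Double] => Double): (Double, DenseVector[Double]) =
  lambdas.map(lam => (lam, fitModel(lam)))
         .maxBy { case (_, beta) => tuningLogLik(beta) }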

Timings

  • With 10 / 20 Executors
  • Cleaning data: 41.3 s / 41.3 s
  • Reading data in Spark: 23.6 s / 17.6 s
  • Data standardization: 7.9 s / 7.7 s
  • Setting up model: 9.4 s / 6.6 s
  • Model fitting: (timing plot omitted)

Timings

  • Coding time: FOREVER
[Image: “Multithreading: Theory and Practice”, from http://wrathematics.github.io/RparallelGuide/]

References

  • Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1), 1-122.
  • Minka, T. P. (2003). Algorithms for maximum-likelihood logistic regression.
  • Spark documentation
  • Breeze documentation
  • Java Native Interface (JNI)

Thank you!