# EMPCA notes

#### 16 Mar 2019

Compare nipals/empca with no missing Compare nipals/empca with missing, equal weight Compare nipals/empca with missing, unequal weight

## Complete data

# Python - Coeff (scores)
[[-2.809  0.097  0.244  0.050]
[-1.834  0.286  0.010 -0.135]
[-0.809  0.963 -0.341  0.078]
[-0.155 -1.129  0.548  0.026]
[0.707  -0.723 -0.736 -0.024]
[1.830  -0.290 -0.157  0.030]
[3.070   0.796  0.431 -0.026]]

# m1e <- empca(x=B1, w=B1wt, ncomp=4)
# Un-sweep the eigenvalues to compare to python results
# R round( sweep( m1e$scores, 2, m1e$eig, "*"), 3)
PC1    PC2    PC3    PC4
G1 -2.809  0.097 -0.244  0.050
G2 -1.834  0.286 -0.010 -0.135
G3 -0.809  0.963  0.341  0.078
G4 -0.155 -1.129 -0.548  0.026
G5  0.707 -0.723  0.736 -0.024
G6  1.830 -0.290  0.157  0.030
G7  3.070  0.796 -0.431 -0.026

# Matlab - P (scores)
0.5590   0.0517   0.2210   0.2910
0.3650   0.1520   0.0095  -0.7840
0.1610   0.5120  -0.3080   0.4530
0.0309  -0.6010   0.4950   0.1510
-0.1410  -0.3850  -0.6640  -0.1380
-0.3650  -0.1540  -0.1420   0.1760
-0.6110   0.4230   0.3890  -0.1490

# R round(m1e$scores, 3) PC1 PC2 PC3 PC4 G1 -0.559 -0.052 0.221 -0.291 G2 -0.365 -0.152 0.009 0.784 G3 -0.161 -0.512 -0.308 -0.453 G4 -0.031 0.601 0.495 -0.151 G5 0.141 0.385 -0.664 0.138 G6 0.365 0.154 -0.142 -0.176 G7 0.611 -0.423 0.389 0.149 ## Missing data # Python with initial Identity matrix [[2.791 0.125 0.325 -0.035] [1.528 -0.989 -0.211 0.172] [0.990 -0.651 -0.117 -0.186] [0.159 1.463 0.530 0.020] [-0.628 0.862 -0.730 -0.032] [-1.738 0.406 -0.139 -0.071] [-2.917 -0.712 0.520 -0.047]] Eigvec (loadings) [[-0.309 -0.839 -0.298 0.300] [-0.502 0.014 0.154 -0.615] [-0.470 -0.086 0.766 0.219] [-0.441 0.521 -0.236 0.615] [-0.487 0.128 -0.496 -0.326]] # R R> m2e <- empca(x=B2, w=B2wt, ncomp=4, seed=NULL) # # Un-sweep the eigenvalues to compare to python results R> round( sweep( m2e$scores, 2, m2e$eig, "*"), 3) PC1 PC2 PC3 PC4 G1 -2.791 0.216 -0.356 0.066 G2 -1.528 -0.942 0.187 -0.150 G3 -0.990 -0.620 0.101 0.207 G4 -0.159 1.472 -0.522 -0.032 G5 0.628 0.844 0.744 0.021 G6 1.738 0.351 0.161 0.050 G7 2.917 -0.808 -0.493 0.019 R> round( m2e$loadings, 3)
PC1    PC2    PC3    PC4
E1 0.309 -0.839  0.298 -0.300
E2 0.502  0.014 -0.154  0.615
E3 0.470 -0.086 -0.766 -0.219
E4 0.441  0.521  0.236 -0.615
E5 0.487  0.128  0.496  0.326


## Python

Python code by Bailey (2012), retrieved 1 Mar 2019 from https://github.com/sbailey/empca .

The Python code is difficult to read in places for a person [like me] not well-versed with Python. Three examples:

1. It is not clear what values k takes in for k in range(self.nvec).
2. Gram-Schmidt orthogonalization is accomplished with a pair of nested for loops instead of a function.
3. The Model class structure makes it a bit tricky to figure out what objects have actually been modified inside a function.

The Python code iterates these two EM steps:

1. Calculate the coefficient matrix C.
2. Calculate ALL ncomp principal components P simultaneously (iterate each to convergence). Orthogonalize P.

For a complete-data problem, python and R give similar results. Note the Coeff matrix in python does NOT have eigenvalues swept out of the columns.

For the missing-data problem, the python results are somewhat different from R.

## Matlab

Matlab code by Vicente Parot, retrieved 1 Mar 2019 from https://www.mathworks.com/matlabcentral/fileexchange/45353-empca.

The Matlab code feels similar to R.

The Matlab code calculates principal components sequentially, one at time. For each principal component, the algorithm iterates between these two steps:

1. Calculate C[,h]
2. Calculate P[,h]

While this is a type of EM algorithm, it is not the algorithm described by Bailey (2012) and is considered further.

Bailey, Stephen. 2012. “Principal Component Analysis with Noisy and/or Missing Data.” Publications of the Astronomical Society of the Pacific 124 (919): 1015–23.