# Difference between revisions of "Principal component analysis"

## Latest revision as of 18:03, 28 March 2020

In general, a Principal Component Analysis (PCA) aims at analyzing a data set and discovering a set of coordinates that capture the most representative features of said data. Often the term *PCA classification* is loosely used. PCA is not a classification method: classification itself is performed on the features extracted through PCA.

In *Dynamo*, PCA is the process of finding a reduced set of "eigenvolumes" that allows each particle in the data set to be approximately represented as a combination of these eigenvolumes. With this representation, a generic particle can be described by the contribution of each eigenvolume to the particle, i.e., by a set of "eigencomponents", normally numbering not much more than 20.

Once the particles are represented by small sets of scalars, they can be classified with standard methods like k-means.
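As an illustration of this last step, the following is a minimal sketch of k-means applied to a few rows of eigencomponents. It is not *Dynamo* code: the function name `kmeans`, the toy values in `eigencomponents` and the choice of the first `k` points as initial centres are all assumptions made for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means; returns one class label per point."""
    # Assumption for reproducibility: start from the first k points.
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical eigencomponents: one row per particle, two components each.
eigencomponents = [[0.1, 0.0], [5.0, 5.1], [0.0, 0.2], [5.2, 4.9], [0.2, 0.1]]
labels = kmeans(eigencomponents, k=2)
```

In practice each row would hold as many entries as eigenvolumes were kept, and a more careful initialization would usually be preferred.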

# Operative steps

PCA classifications are most easily handled through *classification workflows*. These projects can be controlled through GUIs or the command line.

However you control the classification project, a PCA-based classification requires completing these steps:

- Selecting the input:
  - a data folder, a table, a mask
- Computing a cross-correlation matrix
- Computing the eigenvalues, eigenvolumes and eigencomponents
- Using the eigencomponents to create a classification

## Input

PCA is computed on a set of aligned particles. Thus, you need a data folder and a table that describes the alignment. In the most common case, you want to focus the classification on a region of the box, so you also need a classification mask.

Additionally, there are some fine tuning parameters that can be passed: particles can be symmetrized, resized or bandpassed.

## Computation of cross-correlation matrix

*Main article: Cross correlation matrix*

All the aligned particles are compared to each other through cross correlation. This produces an NxN matrix for a set of N particles. This is typically the most time-consuming part of the PCA workflow.
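The pairwise comparison can be sketched as follows. This is only an illustration under stated assumptions, not the *Dynamo* implementation: particles are taken to be already aligned and flattened into lists of voxel values, the mask is a list of 0/1 flags of the same length, and the similarity is assumed to be a mean-subtracted, normalized cross correlation. The names `normalized_cc` and `cc_matrix` are hypothetical.

```python
import math


def normalized_cc(a, b, mask):
    """Normalized cross correlation of two flattened volumes inside a mask."""
    va = [x for x, m in zip(a, mask) if m]
    vb = [x for x, m in zip(b, mask) if m]
    mean_a = sum(va) / len(va)
    mean_b = sum(vb) / len(vb)
    da = [x - mean_a for x in va]
    db = [x - mean_b for x in vb]
    norm = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    return sum(x * y for x, y in zip(da, db)) / norm


def cc_matrix(particles, mask):
    """Symmetric NxN matrix of pairwise similarities for N particles."""
    n = len(particles)
    # Diagonal is 1.0, assuming no particle is constant inside the mask.
    ccm = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # only N(N-1)/2 pairs need computing
            ccm[i][j] = ccm[j][i] = normalized_cc(particles[i], particles[j], mask)
    return ccm


# Toy data: three "particles" of four voxels; the last voxel is masked out.
particles = [[1.0, 2.0, 3.0, 9.0], [2.0, 4.0, 6.0, 0.0], [3.0, 2.0, 1.0, 5.0]]
mask = [1, 1, 1, 0]
ccm = cc_matrix(particles, mask)
```

The number of pairs grows quadratically with N, which is consistent with this step dominating the running time of the workflow.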

## Computation of PCA

### Eigenvalues

The cross-correlation matrix is diagonalized, producing a set of eigenvalues that should decay to zero. The slower the decay, the more eigenvolumes will be relevant. This computation is very fast.
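One way to turn the decay of the eigenvalues into a number of relevant eigenvolumes is to count how many leading eigenvalues are needed to reach a given fraction of their total sum. The sketch below assumes non-negative eigenvalues. The function name `components_needed`, the 0.9 threshold and the two example spectra are illustrative assumptions, not *Dynamo* defaults.

```python
def components_needed(eigenvalues, fraction=0.9):
    """Smallest number of leading eigenvalues whose sum reaches
    `fraction` of the total sum (eigenvalues assumed non-negative)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= fraction * total:
            return count
    return len(ordered)


# Hypothetical spectra with the same total but different decay rates.
fast_decay = [8.0, 1.5, 0.3, 0.1, 0.1]
slow_decay = [3.0, 2.5, 2.0, 1.4, 1.1]

needed_fast = components_needed(fast_decay)
needed_slow = components_needed(slow_decay)
```

With these toy values the fast-decaying spectrum needs far fewer components than the slow-decaying one.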

### Eigenvolumes

Each eigenvalue has an associated eigenvector. Eigenvectors are called *eigenvolumes* in this context.
Note that they are only defined inside the classification mask attached to the classification.
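To make the pairing of an eigenvalue with its eigenvector concrete, here is a minimal power-iteration sketch on a toy symmetric matrix. It recovers only the dominant pair and is not how *Dynamo* diagonalizes the matrix, since a full eigensolver would be used in practice. The function name `leading_eigenpair`, the starting vector and the 2x2 matrix are assumptions chosen for the example.

```python
import math


def leading_eigenpair(matrix, iterations=200):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix,
    estimated by power iteration."""
    n = len(matrix)
    # Arbitrary non-zero start, assumed not orthogonal to the eigenvector.
    vec = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        product = [sum(a * x for a, x in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    # Rayleigh quotient v.(A v) gives the eigenvalue for a unit vector v.
    applied = [sum(a * x for a, x in zip(row, vec)) for row in matrix]
    value = sum(v * w for v, w in zip(vec, applied))
    return value, vec


toy_matrix = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues are 3 and 1
value, vector = leading_eigenpair(toy_matrix)
```

The returned pair satisfies the defining relation: applying the matrix to `vector` gives `value` times `vector`, up to rounding.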

### Eigencomponents

*Main article: Eigentable*

This is also a time-consuming step, although much less intensive than the computation of the ccmatrix. Each particle is compared to each eigenvolume.
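A minimal sketch of this comparison follows, assuming that "compared" means a scalar product between the particle and the eigenvolume restricted to the classification mask. That interpretation, the function name `eigencomponent_table` and the toy data are assumptions for illustration only. Volumes are again represented as flattened lists.

```python
def eigencomponent_table(particles, eigenvolumes, mask):
    """One row per particle, one column per eigenvolume. Each entry is
    the scalar product of particle and eigenvolume inside the mask."""

    def inside(volume):
        return [x for x, m in zip(volume, mask) if m]

    masked_eigenvolumes = [inside(e) for e in eigenvolumes]
    table = []
    for particle in particles:
        p = inside(particle)
        table.append(
            [sum(x * y for x, y in zip(p, e)) for e in masked_eigenvolumes]
        )
    return table


# Toy data: two particles, two eigenvolumes, last voxel masked out.
particles = [[1.0, 0.0, 2.0, 7.0], [0.0, 3.0, 1.0, 7.0]]
eigenvolumes = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
mask = [1, 1, 1, 0]
table = eigencomponent_table(particles, eigenvolumes, mask)
```

The cost scales with the number of particles times the number of eigenvolumes, rather than with the square of the number of particles as for the ccmatrix.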