Icasso documentation
About documentation: These web pages give only a general overview of Icasso. Detailed documentation of each function is found in its MATLAB help text, e.g.,
>> help icassoStruct
Introduction
Icasso is based on running FastICA several times (resampling). Icasso
pools all the estimates together and clusters them bottom-up. The
basic idea is that a tight cluster of estimates is a candidate for
containing a "good" estimate. The centroid of such a cluster is
considered a more reliable estimate than any estimate from an
arbitrary run. (Instead of an average as the centroid, Icasso
visualizes and returns a centrotype from each cluster: the original
estimate that is most similar to the other estimates in the same
cluster. You can compute the average by using Icasso functions.)
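The centrotype idea can be illustrated with a short Python sketch. It is not Icasso code: the function name `centrotype` and the toy similarity matrix are assumptions made for this example, and "most similar" is read here as "largest summed similarity to the other members of the cluster".

```python
def centrotype(similarity, members):
    """Return the member with the largest total similarity to the
    other members of the same cluster (illustrative definition)."""
    def total(i):
        return sum(similarity[i][j] for j in members if j != i)
    return max(members, key=total)

# Toy similarity matrix for four estimates (absolute correlations).
S = [
    [1.00, 0.95, 0.90, 0.10],
    [0.95, 1.00, 0.97, 0.12],
    [0.90, 0.97, 1.00, 0.08],
    [0.10, 0.12, 0.08, 1.00],
]
cluster = [0, 1, 2]              # estimates 0-2 form one tight cluster
best = centrotype(S, cluster)    # index of the representative estimate
```

Unlike an average, the value returned is always one of the original estimates.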
The basic procedure
Icasso is a sequential procedure that is split into several phases
(functions). In general, Icasso consists of the following steps:
- Parameters for the estimation algorithm(s) are selected: e.g.,
for FastICA the estimation approach (symmetrical or deflatory),
contrast function, etc. The estimation is run N times using
the selected training parameters. Each time the data is bootstrapped
and/or the initial conditions of the estimation algorithm are changed.
- Mutual similarities between all the estimates are computed. As the
measure of similarity, we use the absolute value of the linear
correlation coefficient between the independent components. The
estimates are clustered according to their mutual
(dis)similarities. In principle, the clustering method can be freely
selected. We apply agglomerative clustering with average-linkage
criterion.
- The clustering is visualized as a dendrogram and a 2D plot. The
user investigates how dense the clusters are. The clustering of the
estimates is expected to yield information on the reliability
(robustness) of estimation. A compact cluster emerges when a similar
estimate repeatedly comes up despite the randomization.
- The user can retrieve the estimates belonging to certain
cluster(s) for further analysis and visualization.
Read illustrative examples on using Icasso in the
publications.
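The similarity and clustering steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `abs_corr` and `average_linkage`, the conversion of similarity to dissimilarity as 1 - similarity, and the toy data are all chosen for this example and are not taken from the Icasso source.

```python
import math

def abs_corr(x, y):
    """Absolute value of the linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return abs(cov) / math.sqrt(vx * vy)

def average_linkage(dissim, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with
    the smallest average pairwise dissimilarity."""
    clusters = [[i] for i in range(len(dissim))]

    def avg(a, b):
        return sum(dissim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(avg(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        _, p, q = min(pairs)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]

# Four toy "estimated components": 0 and 1 agree up to sign, as do 2 and 3.
estimates = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [-1.1, -2.0, -2.9, -4.2, -5.0],
    [1.0, -1.0, 1.0, -1.0, 1.0],
    [-0.9, 1.1, -1.0, 1.0, -1.1],
]
n = len(estimates)
sim = [[abs_corr(estimates[i], estimates[j]) for j in range(n)]
       for i in range(n)]
dissim = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]
clusters = average_linkage(dissim, 2)
```

Taking the absolute value makes the similarity insensitive to the sign of a component, which ICA cannot determine.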
Some parameters
First, you have to select the parameters for FastICA. Of particular
interest here are
- the (reduced) data dimension (d), which may be less than the
original input data dimension (PCA dimension reduction is often
applied in FastICA), and
- the number of ICA estimates (m) extracted on each
resampling cycle.
For Icasso you also have to select
- the resampling mode,
- the number of resampling cycles (N), and
- the number of estimate-clusters (L).
Resampling mode
You can use
- both a different random initial condition and resampling of the data (by bootstrapping) in each resampling cycle,
- a different random initial condition for FastICA in each resampling cycle while keeping the training data set fixed, or
- a fixed initial condition in each cycle while bootstrapping the data every time.
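The three modes differ only in what is randomized per cycle. The following Python sketch makes that explicit; the function `resampling_plan` and the mode labels "both", "randinit" and "bootstrap" are names assumed for this illustration, so check the MATLAB help texts for the actual options.

```python
import random

def resampling_plan(mode, n_samples, n_cycles, seed=0):
    """Return a list of (sample_indices, init_seed), one per cycle.

    mode is 'both', 'randinit' or 'bootstrap' (labels assumed here).
    """
    if mode not in ("both", "randinit", "bootstrap"):
        raise ValueError("unknown resampling mode: %r" % (mode,))
    rng = random.Random(seed)
    fixed_indices = list(range(n_samples))   # the unresampled data set
    fixed_init = rng.randrange(2 ** 31)      # one shared initial condition
    plan = []
    for _ in range(n_cycles):
        if mode in ("both", "bootstrap"):
            # Bootstrap: draw n_samples indices with replacement.
            indices = [rng.randrange(n_samples) for _ in range(n_samples)]
        else:
            indices = fixed_indices
        if mode in ("both", "randinit"):
            init_seed = rng.randrange(2 ** 31)  # new initial condition
        else:
            init_seed = fixed_init
        plan.append((indices, init_seed))
    return plan

plan_randinit = resampling_plan("randinit", n_samples=5, n_cycles=3)
plan_bootstrap = resampling_plan("bootstrap", n_samples=5, n_cycles=3)
plan_both = resampling_plan("both", n_samples=5, n_cycles=3)
```

Each cycle would then run the estimation algorithm on the selected samples, starting from the initial condition derived from `init_seed`.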
Number of resampling cycles (N)
Basically, the more cycles the better. However, Icasso currently uses
hierarchical clustering, which causes a computational bottleneck.
Icasso can currently handle a moderate total number of estimates M,
say, 1000-2000, and consequently a moderate number of resampling
cycles (N). For example, if you extract 15 independent components in
each resampling cycle, 50 resampling cycles might be appropriate:
M = mN = 15 x 50 = 750.
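The same arithmetic can be turned around to pick N for a given budget of estimates. A minimal Python sketch, with helper names (`total_estimates`, `max_cycles`) invented for this example:

```python
def total_estimates(m, n_cycles):
    """Total number of estimates M = m * N."""
    return m * n_cycles

def max_cycles(m, max_total):
    """Largest N such that m * N does not exceed max_total."""
    return max_total // m

M = total_estimates(15, 50)        # 15 components x 50 cycles
N_upper = max_cycles(15, 2000)     # cycles that fit in 2000 estimates
```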
Number of ICA estimates (estimate-clusters) (L)
Often, ICA is performed so that the number of components is the same
as the input data dimension (possibly after PCA dimension reduction),
that is, m=d. If you use L=d=m, you try to find as many estimates as
there are data dimensions, together with the quality index and the
centroid estimate for each of them. The default in Icasso is to set
the number of estimate-clusters L=d.
In FastICA, you can extract fewer independent components than there
are data dimensions (m < d). In Icasso, you can also freely select
the number of estimate-clusters. For example, you can run FastICA in
the deflatory mode and extract, e.g., only one component at each run
but extract several "robust" estimates after Icasso. You can also
group the estimates into a larger or smaller number of
estimate-clusters. Interpreting the results is up to you.
Results
Sources, demixing matrix (W), and mixing matrix (A)
FastICA estimates the demixing matrix (W). In the Icasso procedure
this is done several times, and the estimates are clustered. Icasso
returns a centroid (centrotype) estimate W from each
estimate-cluster. This should represent a more reliable estimate than
any single estimate from one run of FastICA. You can also return all
estimates in a cluster by using appropriate Icasso functions.
However, the results that Icasso gives usually do not form a strictly
orthogonal basis in the whitened data space, since they are directly
the natural centroids (centrotypes) of the estimate-clusters. You
have to orthogonalize the result in an appropriate manner if
necessary.
The mixing matrix A is computed as the pseudoinverse of W, and the
sources are obtained as S=WX from the original data X that is stored
in the Icasso data structure.
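The relations S=WX and A as the pseudoinverse of W can be checked on a toy example. The Python sketch below assumes a square, invertible 2x2 W, in which case the pseudoinverse coincides with the ordinary inverse; the helper functions and the numbers are invented for this illustration and are not Icasso code.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(w):
    """Inverse of a 2x2 matrix; equals the pseudoinverse when it exists."""
    (a, b), (c, d) = w
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Toy mixing model X = A_true * S_true with two sources, three samples.
A_true = [[2.0, 1.0], [1.0, 1.0]]
S_true = [[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]]
X = matmul(A_true, S_true)

W = inverse_2x2(A_true)   # stands in for an estimated demixing matrix
S = matmul(W, X)          # sources: S = W X
A = inverse_2x2(W)        # mixing matrix: (pseudo)inverse of W
```

When W has fewer rows than columns (fewer estimate-clusters than data dimensions), a general pseudoinverse routine is needed instead of `inverse_2x2`.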
Estimate stability index (Iq)
Icasso returns a stability (quality) index (Iq) for each
estimate-cluster. This gives a rank for the corresponding ICA
estimate. In the ideal case of m one-dimensional independent
components, the estimates are concentrated in m compact and
close-to-orthogonal clusters. In this case the index of every
estimate-cluster is (very close to) one. The value drops when the
clusters grow wider and mix up.
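One way to obtain an index with this behaviour is to subtract the average similarity between a cluster and all other estimates from the average similarity inside the cluster. The Python sketch below uses that definition as an assumption; it is not taken from this page, so consult the Icasso help texts and publications for the exact formula. The name `stability_index` is invented for the example.

```python
def stability_index(similarity, members):
    """Average intra-cluster similarity minus average similarity to
    estimates outside the cluster (assumed definition)."""
    inside = set(members)
    outside = [j for j in range(len(similarity)) if j not in inside]
    intra = (sum(similarity[i][j] for i in members for j in members)
             / len(members) ** 2)
    if not outside:
        return intra
    extra = (sum(similarity[i][j] for i in members for j in outside)
             / (len(members) * len(outside)))
    return intra - extra

# Ideal case: two compact, mutually orthogonal clusters.
S_ideal = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# Wider clusters that overlap.
S_mixed = [
    [1.0, 0.6, 0.4, 0.3],
    [0.6, 1.0, 0.5, 0.4],
    [0.4, 0.5, 1.0, 0.7],
    [0.3, 0.4, 0.7, 1.0],
]
iq_ideal = stability_index(S_ideal, [0, 1])
iq_mixed = stability_index(S_mixed, [0, 1])
```

With this definition the ideal case gives exactly one, and the value falls as the cluster widens or becomes more similar to estimates outside it.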
R-index
The R-index should be consulted only in exploratory work (if you wish
to explore different clustering solutions). The R-index is a
heuristic Davies-Bouldin type relative measure for a "natural" number
of clusters. Local minima of this index are "good" solutions in terms
of having mutually isolated "natural" clusters. Like any relative
clustering validity index, it is heuristic and should be used only as
a guideline. If the structure of the estimate space is complex, the
index is of dubious value.
Implementing the procedure using Icasso functions
See the script megdemo for an example.
The first step is to compute randomized ICA estimates N times from
data X using the function icassoEst. The output of this function (we
will use the variable name sR) is called the Icasso result data
structure. It logs all the methods and parameters used in the
process, and the results from the Icasso procedure. You can extract
information from this data structure either directly or by using the
functions icassoResult and icassoGet.
The Icasso functions that perform similarity computation, clustering,
and the 2D projection are collected in icassoExp.
Finally, you can explore the clustering and get the results by
launching icassoShow. You can examine relationships between
estimates and clusters in detail.
Functions whose names start with the string icasso are the main
functions: they use the Icasso result structure as input and/or
output.
icassoEst
- FastICA parameters and resampling
[icassoStruct]
- This is a subfunction automatically called by icassoEst. However, its help text describes the Icasso data structure in detail with reference to the Icasso process. This may be of interest if you are going to extract information directly from the data structure.
icassoExp
- Performs clustering and projections for visualization
icassoShow
- Visualizes and returns results
icassoResult
- Returns results
The function icasso implements the basic procedure from resampling to
visualization in one batch.
More functions and details of the Icasso process
Page maintained by webmaster@cis.hut.fi,
last updated Tuesday, 21-Dec-2010 15:55:12 EET