Linear Projection

Various linear projection methods with explorative data analysis and intelligent data visualization enhancements.

Signals

Inputs:
  • Examples (ExampleTable)

    Input data set.

  • Example Subset (ExampleTable)

    A subset of data instances from Examples.

  • Attribute Selection List

    List of attributes to be shown in the visualization.

Outputs:
  • Selected Examples (ExampleTable)

    A subset of examples that the user has manually selected from the scatterplot.

  • Unselected Examples (ExampleTable)

    All other examples (examples not included in the user’s selection).

  • Attribute Selection List

    List of attributes used in the visualization.

Warning: this widget combines a number of visualization methods that are still under research. Eventually, it will be broken down into a set of simpler widgets, each implementing its own method.

Description

This widget provides an interface to a number of linear projection methods that all deal with class-labeled data and aim to find the two-dimensional projection in which instances of different classes are best separated. Consider, for a start, the projection of the zoo.tab data set (animal species and their features) shown below. Notice that breast-feeding (milk) and hair nicely distinguish mammals from the other organisms, and that laying eggs is something that birds do. This specific visualization was obtained using FreeViz ([1]); the widget also implements an interface to supervised principal component analysis ([2]), partial least squares (for a nice introduction, see [3]), and the RadViz visualization with its associated intelligent data visualization technique, VizRank ([4]).

Linear Projection on zoo data set

Projection search methods are invoked from the Optimization Dialogs box in the Main tab. The other controls in this tab and the controls in the Settings tab work just as they do in other visualization widgets; please refer to the documentation of the Scatter Plot widget for further information.

FreeViz screen shot

The FreeViz button in the Main tab opens a dialog from which four different methods are accessed. The first one is FreeViz, which uses a paradigm borrowed from particle physics: points in the same class attract each other, points from different classes repel each other, and the resulting forces are exerted on the anchors of the attributes, that is, on the unit vectors of each of the dimensional axes. The points themselves cannot move freely (their positions are determined by the projection), but the attribute anchors can, so the optimization is a hill-climbing process that ends with the anchors placed such that the forces are in equilibrium.

The FreeViz optimization dialog is used to invoke the optimization process (Optimize Separation) or to execute a single step of optimization (Single Step). The result of the optimization may depend on the initial placement of the anchors, which can be set in a circle, arbitrarily, or even manually (Set anchor positions). The latter also works at any stage of the optimization, and we recommend playing with this option in order to understand how a change of one anchor affects the positions of the data points. Controls in the Forces box set the parameters that define the type of the forces between the data points (see [1]).

In any linear projection, projections of unit vectors that are very short compared to the others indicate that their associated attributes are not very informative for the particular classification task. Those vectors, that is, their corresponding anchors, may be hidden from the visualization using the controls in the Show anchors box.
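The relation between anchors and data points described above can be illustrated with a minimal sketch. It is not the widget's implementation: the function names circle_anchors and project are hypothetical, the attribute values are assumed to be already scaled to [0, 1], and any further normalization the widget may apply is left out. Each data point is placed at the sum of the anchor vectors weighted by its attribute values, so moving one anchor shifts every point in proportion to its value for that attribute.

```python
import math


def circle_anchors(n_attributes):
    """Place one anchor per attribute evenly on the unit circle."""
    return [
        (math.cos(2 * math.pi * k / n_attributes),
         math.sin(2 * math.pi * k / n_attributes))
        for k in range(n_attributes)
    ]


def project(row, anchors):
    """Project one instance: sum of anchor vectors weighted by its values."""
    x = sum(value * ax for value, (ax, _) in zip(row, anchors))
    y = sum(value * ay for value, (_, ay) in zip(row, anchors))
    return (x, y)


# Hypothetical instance with four attributes scaled to [0, 1].
row = [1.0, 0.0, 0.5, 0.0]
anchors = circle_anchors(4)
before = project(row, anchors)

# Manually shorten and move the first anchor, similar in spirit to
# "Set anchor positions"; the point follows because its first value is 1.0.
anchors[0] = (0.0, 0.2)
after = project(row, anchors)
print(before, after)
```

An anchor that is short relative to the others contributes little to any point's position, which is why such attributes can be hidden without changing the picture much.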

The other two, quite prominent, visualization methods are accessible through FreeViz's Dimensionality Reduction tab (not shown here). These include supervised principal component analysis and the partial least squares method. The general objective of these two approaches is the same as for FreeViz (find a projection that separates data instances of different classes), but because of different optimization methods and differences in their bias, the results may be quite different.

The fourth projection search technique that can be accessed from this widget is the VizRank search algorithm with RadViz visualization ([4]). This is essentially the same visualization and projection search method as implemented in the Radviz widget.
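For orientation, the commonly used RadViz placement rule can be sketched as follows. This is an illustrative assumption rather than the widget's code: the names circle_anchors and radviz_point are hypothetical, and attribute values are assumed to be scaled to [0, 1]. Each point is placed at the average of the anchor positions weighted by its attribute values, so it is pulled toward the anchors of the attributes for which it has high values.

```python
import math


def circle_anchors(n_attributes):
    """Place one anchor per attribute evenly on the unit circle."""
    return [
        (math.cos(2 * math.pi * k / n_attributes),
         math.sin(2 * math.pi * k / n_attributes))
        for k in range(n_attributes)
    ]


def radviz_point(row, anchors):
    """RadViz position: anchors averaged with attribute values as weights."""
    total = sum(row)
    if total == 0:
        # Assumption: an all-zero instance is drawn at the center.
        return (0.0, 0.0)
    x = sum(value * ax for value, (ax, _) in zip(row, anchors)) / total
    y = sum(value * ay for value, (_, ay) in zip(row, anchors)) / total
    return (x, y)


anchors = circle_anchors(4)
# Equal values on all attributes balance out at the center.
center = radviz_point([1.0, 1.0, 1.0, 1.0], anchors)
# A single non-zero attribute puts the point on that attribute's anchor.
on_anchor = radviz_point([1.0, 0.0, 0.0, 0.0], anchors)
# Two equally high attributes put the point halfway between their anchors.
between = radviz_point([1.0, 1.0, 0.0, 0.0], anchors)
print(center, on_anchor, between)
```

Because of the division by the sum of the values, every point stays inside the circle of anchors, which distinguishes this placement from the plain weighted sum of a linear projection.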

Like other point-based visualization widgets, Linear Projection also includes explorative analysis functions (selection of data instances and zooming). These are available in the Zoom / Select toolbox in the Main tab of the widget; see the documentation of the Scatter Plot widget for details.

References

[1] Demsar J, Leban G, Zupan B. FreeViz-An intelligent multivariate visualization approach to explorative analysis of biomedical data. J Biomed Inform 40(6):661-71, 2007.
[2] Koren Y, Carmel L. Visualization of labeled data using linear transformations, in: Proceedings of IEEE Information Visualization 2003 (InfoVis'03), 2003.
[3] Boulesteix A-L, Strimmer K (2006) Partial least squares: a versatile tool for the analysis of high-dimensional genomic data, Briefings in Bioinformatics 8(1): 32-44.
[4] Leban, G., B. Zupan, et al. (2006). “VizRank: Data Visualization Guided by Machine Learning.” Data Mining and Knowledge Discovery 13(2): 119-136.