IN4086P Information visualisation

The aim of this Information Visualisation exercise is to become familiar with a set of standard visualisation methods for multivariate data. Although ParaView and the underlying VTK libraries are well suited for a variety of data types, the XmdvTool software package we will use is specifically geared towards a subset of information visualisation. XmdvTool is developed by the group of Matt Ward at Worcester Polytechnic Institute and can be downloaded at http://davis.wpi.edu/~xmdv.

Although the tool is cross-platform, for this exercise we will use the Windows-only version v7.0 to have access to its full feature set; the most recent version, v8.0, does not seem to offer multiple brushes. Version v7.0 is already installed on the lab machines, so you might have to reboot the machine to log in to Windows.

Targeted time: one afternoon

Note: XmdvTool provides HTML-based help files with a short explanation of every available option. Some of the text and images from the help files are reproduced or paraphrased here. You can open the help files from the application through the Help menu, or browse them in a web browser at the following location:

C:\Program Files\XmdvTool\bin\htmlhelp

During these exercises, assistants might ask you to replicate some steps and explain results. For each of these questions posed here try to adhere to the following points.

1. Getting to know XmdvTool

General introduction

XmdvTool allows users to examine multidimensional data. For this, it provides five different visualisation methods for multivariate data:

  1. Scatterplots
  2. Star Glyphs
  3. Parallel Coordinates
  4. (Dimensional Stacking)
  5. (Pixel-oriented Display)

In these exercises we will mainly concentrate on (1), (2) and (3).

We refer to these basic methods for visualising N-dimensional data as Flat Displays. To explore larger data sets, interactive Hierarchical Displays are built on top of four of the basic displays: hierarchical scatterplots, glyphs, parallel coordinates and dimensional stacking. In the hierarchical displays, all the data items in a data set are organised into a hierarchical cluster tree.

Visualisation and interface basics

Before you start exploring, open the small Iris dataset, named iris.okc (File>>Open, CTRL-O). This dataset was introduced by R. A. Fisher as an example for discriminant analysis. The data report four characteristics (sepal width, sepal length, petal width and petal length) of three species of Iris flower.

http://upload.wikimedia.org/wikipedia/commons/thumb/7/78/Petal-sepal.jpg/300px-Petal-sepal.jpg

The main data window contains a parallel coordinates flat display in which each data point is drawn as a polyline. You can interactively switch between display techniques using the buttons on the right side of the screen. As an extra, you can open multiple display panels in separate, auxiliary views (View>>Auxiliary Display). Using the zoom buttons (top row) you can navigate through the data. Mouse-based interaction is also available for navigation; see the help file. You can select and reorder dimensions through the Manual Dimension tool.
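To make the polyline idea concrete, the following minimal Python sketch maps records to parallel-coordinates polylines. It is not XmdvTool code: the function names are made up for illustration, the three rows are sample records from the well-known Iris data in the common column order (which may differ from the order in iris.okc), and scaling each axis to its own minimum and maximum is an assumption about how such a display is typically drawn.

```python
# Illustrative sketch only, not XmdvTool's implementation.
# Assumed column order; iris.okc may order the dimensions differently.
DIMENSIONS = ["sepal length", "sepal width", "petal length", "petal width"]

# Three sample records from the Iris data, one per species.
ROWS = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]


def axis_ranges(rows):
    """Return a (min, max) pair for every dimension."""
    columns = list(zip(*rows))
    return [(min(column), max(column)) for column in columns]


def to_polyline(row, ranges):
    """Map one record to polyline vertices (axis index, normalised height)."""
    vertices = []
    for axis, (value, (lo, hi)) in enumerate(zip(row, ranges)):
        # Each axis is scaled independently to [0, 1]; a constant
        # dimension is placed halfway up its axis.
        height = 0.5 if hi == lo else (value - lo) / (hi - lo)
        vertices.append((axis, height))
    return vertices


ranges = axis_ranges(ROWS)
polylines = [to_polyline(row, ranges) for row in ROWS]

for line in polylines:
    print([(axis, round(height, 2)) for axis, height in line])
```

Each record becomes one vertex per axis, so reordering the dimensions only changes the order of the vertices, not their heights.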

Familiarize yourself with the viewing and navigation interface by reordering, browsing, zooming and scrolling through the various displays.

question.png

Do you notice any obvious "trends" in the data from just visualising?

Brushing

A major tool in XmdvTool for providing insights into multidimensional spatial relations is the flat brush, which allows users to perform operations on the data points which fall within a user-specified N-D subspace of the total space defined by the data. You can brush directly with the mouse (left-button, or SHIFT left button) and manipulate brush regions. Alternatively, you can access brush functionality from the various brush menus. In the Brush Toolbox you can define multiple brushes, which might be interesting when comparing multiple selections in N-D subspaces.
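Conceptually, a flat brush is an N-D box: one value range per dimension, with a data point selected when it falls inside every range. The Python sketch below illustrates that idea only. The function names, the sample records and the brush bounds are hypothetical, and XmdvTool's own brushes may behave differently in detail (for example at the boundaries).

```python
# Illustrative sketch of an N-D box brush, not XmdvTool's implementation.

def in_brush(point, brush):
    """True if the point lies inside the brush range on every dimension."""
    return all(lo <= value <= hi for value, (lo, hi) in zip(point, brush))


def select(points, brush):
    """Return the points covered by the brush."""
    return [point for point in points if in_brush(point, brush)]


# Sample Iris records (assumed order: sepal length, sepal width,
# petal length, petal width).
POINTS = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]

INF = float("inf")

# Hypothetical brush: any sepal size, but short and narrow petals.
brush = [(-INF, INF), (-INF, INF), (0.0, 2.0), (0.0, 1.0)]

selected = select(POINTS, brush)
print(selected)
```

Defining several brushes then amounts to keeping several such boxes and comparing the sets of points each one selects.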

Read the main help file on the various flat display modes, and consider the strengths and weaknesses of each. Then read the Brushes help file and experiment with drawing brushes in the multiple views.

question.png

How do brush changes in the parallel coordinates display affect the brush in the scatterplot display? And in glyph mode?

question.png

Is there a set of observations with high sepal width, but low values for all other attributes?

question.png

Can you spot clusters of observations? Comment on the effectiveness of the different visual representations (parallel coordinates, scatterplots, glyph mode) with respect to cluster detection.

2. A larger dataset

Load the Cars dataset. For a short description of this dataset and its variables see http://davis.wpi.edu/~xmdv/datasets/cars.html. Many other datasets are available via the Xmdv website, see http://davis.wpi.edu/~xmdv/datasets.html. You can also convert your own data. On the lab machines, some of the example datasets are stored in the following location: C:\Program Files\XmdvTool\data. Use the various display options and brushes to gain insight into the relations between the variables.

question.png

Find one or more interesting relations between attributes. Find one or more clusters and outliers.

Consider the following hypotheses:

  1. Cars with high MPG (Miles-per-gallon) will be mostly 4 cylinder cars with low weight and low acceleration.
  2. American cars are heavier than Japanese cars.

question.png

Can you accept or refute these hypotheses based on these tools?

3. Hierarchical Displays

Hierarchical display modes can be used to visualise large data sets with a multi-resolution approach. Here is a simple description of this approach:

First, Xmdv constructs a hierarchical cluster tree for a data set. Similar data items are grouped into clusters, and similar clusters are grouped into larger clusters. Second, this hierarchy is visualised in the hierarchical display modes. To visualise a data cluster, a mean is used to indicate the average value of all the data items in the cluster, and a band is used to indicate the range of the cluster. You can visualise the data set at different levels of detail and highlight interesting clusters through the structure-based brush.
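The two steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not XmdvTool's algorithm: the sketch uses a simple bottom-up merge of the two clusters whose means are closest (Euclidean distance), and the class and function names are made up. Each node stores the mean and the band (per-dimension minimum and maximum) described above.

```python
# Illustrative sketch of a hierarchical cluster tree with mean and band.
# Assumption: clusters are merged bottom-up by closest means; XmdvTool's
# actual clustering method may differ.
from itertools import combinations
from math import dist


class Cluster:
    def __init__(self, items, children=()):
        self.items = items          # all data items in this cluster
        self.children = children    # sub-clusters (empty for a leaf)
        columns = list(zip(*items))
        # Mean: average value per dimension.
        self.mean = [sum(column) / len(column) for column in columns]
        # Band: (min, max) range per dimension.
        self.band = [(min(column), max(column)) for column in columns]


def build_tree(points):
    """Merge the two closest clusters repeatedly until one root remains."""
    clusters = [Cluster([point]) for point in points]
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: dist(pair[0].mean, pair[1].mean),
        )
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(Cluster(a.items + b.items, (a, b)))
    return clusters[0]


# Four sample Iris records (assumed order: sepal length, sepal width,
# petal length, petal width).
POINTS = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]

root = build_tree(POINTS)
print("root band:", root.band)
for child in root.children:
    print("child mean:", [round(m, 2) for m in child.mean], "band:", child.band)
```

Drawing only the root corresponds to the coarsest level of detail; descending to the children shows more, smaller clusters with narrower bands.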

Reload the Iris-dataset and try out the hierarchical displays (lower set of five buttons on the right of the screen). Open up the Structure-Based Brush for Hierarchical Displays toolbox to edit the brushes.

Please refer to the Structure-Based Brush for Hier. Displays section in the Brushes Menu help file to find out how to control the bands, change the level of detail and interactively play with the hierarchical displays. See also the paper (PDF here) on structure-based brushing for detailed information.

From the help file we see the following description:

structure_brush_dialog.gif

  (a) Hierarchical tree frame
  (b) Contour corresponding to current level-of-detail (in brushed region, it is referred to as brushed cluster radius; in non-brushed region, it is referred to as non-brushed cluster radius)
  (c) Leaf contour approximates shape of hierarchical tree
  (d) Structure-based brush
  (e) Interactive brush handles (left handle and right handle)
  (f) Colormap legend for level-of-detail contour

Experiment with the structure-based brush and hierarchical displays to see the clustering effects. Hint: reposition the brush handles so that they cover the whole pyramid (so you brush the whole space) and enable the Showband tick box. Then move the Brushed Cluster Radius from 0 to 1, either directly with your mouse or via the slider, and observe the hierarchical parallel coordinates, scatterplot and glyph displays to see how data points are clustered.

question.png

Can you find the same (three) clusters in the Iris set as you found in an earlier effort?

question.png

(extra) Look at the Dimension Reduction Tool. Can you see how the hierarchy is built up in this dataset?

You can read more about the dimension reduction tool in XMDV in this article.

4. Air Pollution Dataset

Load the no2.okc dataset from the in4086p data site. This data is a subset of 500 observations from a study in which air pollution is related to traffic volume and meteorological variables. The data were collected by the Norwegian Public Roads Administration at Alnabru in Oslo, Norway, between October 2001 and August 2003 (original source: http://lib.stat.cmu.edu/datasets/, submitted by Magne Aldrin).

The response variable (column 1) consists of hourly values of the logarithm of the concentration of NO2 (particles), measured at Alnabru in Oslo, Norway, between October 2001 and August 2003. The predictor variables (columns 2 to 8) are the logarithm of the number of cars per hour, the temperature 2 metres above ground (degrees C), the wind speed (metres/second), the temperature difference between 25 and 2 metres above ground (degrees C), the wind direction (degrees between 0 and 360), the hour of day and the day number counted from October 1, 2001.

question.png

Traffic:

question.png

Concentration NO2:

5. XmdvTool evaluation

question.png

Briefly evaluate the following:

Xmdv as a tool:

Four different flat visualization options:

Courses/in4086p/Exercises/Information (last edited 2010-11-25 11:00:56 by GerwinDeHaan)