Overview "Scatter"
- Summary: Showing dependencies between two (three) dimensions
- Number of Dimensions: 2 plus 1 encoded by color
- Data Types: Numerical, Nominal, Dates
The plotting facility of RapidMiner is certainly one of its strongest parts. In total, several dozen plotters for data, models, and weights are provided and allow the interactive inspection of your data analysis results. In many data mining projects, an exploratory approach is the first step towards understanding the data and the problem at hand. The visualization of high-dimensional data sets and models is hence an important part, and for exactly this reason we decided to incorporate those powerful plotters into RapidMiner.
We got a lot of requests from RapidMiner users who wonder about the exact meaning of the options of the different plotters. Although some are pretty simple and self-explanatory, others are harder to understand. Today we will start a series of blog entries which will describe all available plotters with their options in order to allow for deeper insights into your data and models. From time to time we will add a new blog entry until all plotters are covered here.
The first plotter is one of the simplest ones: the Scatter plot. It might look simple at first glance, but since several of its options are also part of the other plotters, I consider them very important. First, let's have a look at the plotter:
The scatter plot is a simple two-dimensional plot with two axes: x and y. The x-axis is plotted horizontally and the y-axis vertically. If you plot a data set, each point will be located at the position which corresponds to the values with respect to those two axes.
The axes and the data points are plotted on the right part of the plotter. On the left you can see the plotter controls, which are located there for all plotters and provide all options for the different types of plotters.
The first option is called Plotter. Here you can select the type of the plotter. In this example, the type is Scatter; we will cover the other plotter types in future posts. Directly below the Plotter option you will find two boxes where you can select the attribute (variable, dimension) of your data set or model which should be used for the x-axis and for the y-axis. Both options have to be set; otherwise the plotter will not show anything. By the way, you can use numerical attributes as well as nominal attributes for the axes. Even date attributes are supported!
As you can see, you can also specify whether the selected attribute should be plotted on a log scale. Just check the box below the corresponding axis.
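If you wonder what a log scale does to the position of a point, here is a minimal Python sketch. It is not RapidMiner code; the function name axis_position and the handling of non-positive values are assumptions made purely for illustration.

```python
import math

def axis_position(value, lo, hi, log_scale=False):
    """Return the relative position (0.0 to 1.0) of value on an axis spanning lo..hi."""
    if log_scale:
        # A logarithm is only defined for positive numbers, so this sketch
        # rejects anything else instead of guessing what a plotter would do.
        if value <= 0 or lo <= 0 or hi <= 0:
            raise ValueError("log scale needs strictly positive values")
        value, lo, hi = math.log10(value), math.log10(lo), math.log10(hi)
    return (value - lo) / (hi - lo)

print(axis_position(10, 1, 100))                  # linear: about 0.09, near the left edge
print(axis_position(10, 1, 100, log_scale=True))  # log: 0.5, exactly in the middle
```

On a linear axis from 1 to 100 the value 10 is squeezed towards the lower end, while on the log scale it sits in the middle. This is why the log option helps when the values of an attribute span several orders of magnitude.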
The next option is called Color Column. If you select an attribute of your data or model here, the values of this attribute will be used for determining the color of each of the data points. It does not matter whether the selected column is numerical or nominal; both will work. Below you will find an image showing the well-known Iris data set where the class (the label) was chosen as color column:
You can see that an additional legend is now shown at the top of the plot, indicating the meaning of the colors used. In case of a numerical color column, the legend shows the colors together with the minimum and the maximum values.
You might have noticed the Jitter option. This option is quite useful if several data points are located at the same position in the two-dimensional space. Just move the jitter slider around and watch what happens: the points move a bit in random directions, revealing whether points were hidden beneath others and which ones.
The last two options are pretty simple: Rotate Labels rotates the labels of the x-axis by 90 degrees. Especially if you use a nominal attribute for the x-axis, the values can then be read more easily. Export Image opens a dialog which allows you to export the current plot with all its settings into one of the dozens of supported image formats.
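The difference between a nominal and a numerical color column can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the palette, the blue-to-red gradient, and the function names nominal_colors and numerical_color are made up here and are not the colors or the code RapidMiner actually uses.

```python
def nominal_colors(values, palette=("blue", "green", "red", "orange", "purple")):
    """Assign one palette color to each distinct nominal value, in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            # Reuse the palette from the start if there are more values than colors.
            mapping[v] = palette[len(mapping) % len(palette)]
    return mapping

def numerical_color(value, lo, hi, start=(0, 0, 255), end=(255, 0, 0)):
    """Interpolate linearly between two RGB colors for a value in lo..hi."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))

labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]
print(nominal_colors(labels))          # one legend entry per class
print(numerical_color(1.0, 1.0, 7.0))  # the minimum gets the start color
print(numerical_color(7.0, 1.0, 7.0))  # the maximum gets the end color
```

For a nominal column the legend needs one entry per value, whereas for a numerical column the minimum and the maximum are enough to describe the whole gradient.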
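The idea behind jittering can be sketched as follows. Please read this as a rough illustration and not as RapidMiner's implementation: the function name jitter, the uniform random offsets, and the scaling by the axis range are all assumptions on my side.

```python
import random

def jitter(points, amount, x_range, y_range, seed=0):
    """Shift each (x, y) point by a small random offset.

    amount plays the role of the slider (0.0 means no jitter). The offset is
    scaled by the axis ranges so that it looks similar on both axes.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    moved = []
    for x, y in points:
        dx = rng.uniform(-1, 1) * amount * x_range
        dy = rng.uniform(-1, 1) * amount * y_range
        moved.append((x + dx, y + dy))
    return moved

points = [(1.0, 2.0), (1.0, 2.0), (1.0, 2.0)]  # three points on the same spot
print(jitter(points, amount=0.0, x_range=10, y_range=10))   # unchanged
print(jitter(points, amount=0.02, x_range=10, y_range=10))  # slightly spread out
```

With the slider at zero the three points stay on top of each other and look like a single point. With a small amount they are drawn next to each other, so you can see that there are three of them.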
Zooming and Panning
In general, you can zoom into a RapidMiner plotter by dragging a zooming rectangle indicating which part of the plot should be drawn at a larger scale. The zooming rectangle is shown as a bluish rectangle like the one in the following picture:
In order to zoom in, drag the rectangle from the top left to the bottom right. In order to zoom out, simply drag a rectangle in the opposite direction, i.e. from the bottom right towards the upper left. Just try it; you will quickly get used to it.
If you have zoomed in, you probably want to move around in order to look at other parts of the plot. This movement within a plot is called panning and can be done by holding down the CTRL key while dragging the mouse. The plot will then be moved in the direction in which you dragged the mouse.
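The direction rule can be written down as a small Python sketch. It only illustrates the rule described here. In particular, I assume for this sketch that zooming out restores the full view and that drags in any other direction are ignored; the function name interpret_drag is hypothetical as well.

```python
def interpret_drag(view, start, end, full_view):
    """Return the new visible region (left, top, right, bottom) in screen pixels.

    start and end are the (x, y) positions where the drag began and ended.
    Screen y grows downwards, so "top left to bottom right" means that both
    coordinates increase during the drag.
    """
    (x0, y0), (x1, y1) = start, end
    if x1 > x0 and y1 > y0:
        return (x0, y0, x1, y1)  # zoom in: show only the dragged rectangle
    if x1 < x0 and y1 < y0:
        return full_view         # assumption: zooming out restores the full view
    return view                  # assumption: other directions change nothing

full = (0, 0, 800, 600)
zoomed = interpret_drag(full, (100, 100), (300, 250), full)
print(zoomed)                                                # the dragged rectangle
print(interpret_drag(zoomed, (300, 250), (100, 100), full))  # back to the full view
```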
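Panning boils down to shifting the visible value ranges of both axes. The following Python sketch shows one way to compute this under my own assumptions (a hypothetical function pan, a view given as axis ranges, and a drag measured in pixels); it is not taken from the RapidMiner sources.

```python
def pan(view, drag_dx, drag_dy, plot_width, plot_height):
    """Shift the visible ranges (x_min, x_max, y_min, y_max) by a drag given in pixels."""
    x_min, x_max, y_min, y_max = view
    # Convert the pixel distances into data units for each axis.
    dx = drag_dx / plot_width * (x_max - x_min)
    dy = drag_dy / plot_height * (y_max - y_min)
    # The plot follows the mouse: dragging to the right moves the content to
    # the right, so the visible x-range moves towards smaller values.
    # Screen y grows downwards while data y grows upwards, so dragging down
    # moves the visible y-range towards larger values.
    return (x_min - dx, x_max - dx, y_min + dy, y_max + dy)

# Dragging 50 pixels to the right in a 500 x 500 pixel plot showing 0..10 on both axes
print(pan((0, 10, 0, 10), 50, 0, 500, 500))  # x-range becomes -1..9, y-range is unchanged
```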
I hope that you like this series of plotter explanations. Please let me know what you think and if we should continue with this!