I have some proposals and questions.

1) I suggest that the “Cumulative Variance Plot” in SVD and PCA show “Cumulative Variance” (with values scaled into the range from 0% to 100%) instead of “Proportion of Variance”. The reasoning is that the percentage of explained variance is easier to understand at a glance than the absolute values of the explained variance.

For illustration, the MATLAB documentation also uses percentages in its example (http://www.mathworks.com/help/toolbox/stats/brkgqnt.html#f75476). And if someone still needs the plot with absolute values, it is always possible to copy the values from the “Eigenvalues” tab and plot them manually.
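To make the proposed scaling concrete, here is a minimal sketch in Python of how the percentages could be derived from the eigenvalues. The function name `cumulative_variance_percent` and the input values are my own illustration, not RapidMiner's actual code.

```python
def cumulative_variance_percent(eigenvalues):
    """Return cumulative explained variance scaled to 0..100 %.

    `eigenvalues` are the per-component variances (hypothetical input,
    e.g. copied from the "Eigenvalues" tab).
    """
    # Components are conventionally ordered from largest to smallest variance.
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")

    percentages = []
    running = 0.0
    for value in ordered:
        running += value
        percentages.append(100.0 * running / total)
    return percentages


if __name__ == "__main__":
    # Made-up eigenvalues, only to show the shape of the result.
    print(cumulative_variance_percent([2.0, 1.0, 1.0]))
```

With this scaling the last point of the plot is always 100%, regardless of the absolute magnitude of the eigenvalues.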

Furthermore, I propose that small numerical values up to ±1,000 be printed either as integers (123) or as fixed-point numbers (123.4) rather than in scientific notation (1.23E2). For example, the Cumulative Variance Plot from PCA on the Iris dataset looks ridiculous with scientific notation.

The reasoning behind this proposal is that “1” is much easier to understand than “1.00E0”, and it even takes up less space.

The described problem applies to the “Cumulative Variance Plot” of both PCA and SVD. As a solution, I suggest plotting the y-axis in percent and the x-axis in integers whenever the number of attributes is smaller than 1,000 (or possibly smaller than 100,000, since such a count still takes no more than five digits, while scientific notation takes five alphanumeric symbols and a dot).
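The formatting rule I have in mind could look roughly like the following Python sketch. The name `format_tick` and the `plain_limit` parameter are hypothetical, and Python writes the exponent as "E+04" where RapidMiner prints "E4", so take it only as an illustration of the rule itself.

```python
def format_tick(value, plain_limit=1000):
    """Format an axis label: plain notation for small magnitudes,
    scientific notation otherwise.

    `plain_limit` is the proposed threshold (1,000 here; 100,000 would
    be the more generous variant).
    """
    if abs(value) < plain_limit:
        # Whole numbers are printed as integers, e.g. 1 instead of 1.00E0.
        if float(value).is_integer():
            return str(int(value))
        # Everything else gets one decimal place, e.g. 123.4.
        return f"{value:.1f}"
    # Large magnitudes keep scientific notation.
    return f"{value:.2E}"


if __name__ == "__main__":
    for v in (1, 123.4, 12300):
        print(v, "->", format_tick(v))
```

Raising `plain_limit` to 100000 would keep attribute counts of up to five digits in plain notation as well.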

2) The update manager is not really user-friendly. Why am I asked to confirm the license every time a plugin is updated? If the license had changed I would understand, but if it is still the same the update process becomes irritating (imagine that 10 plugins are installed and updated: it takes at least 10 clicks to finish the update). Hence I propose skipping the confirmation whenever the license hasn’t changed (the change could be detected either by the name/version of the license or by a hash). The update manager is also inconsistent with the rest of RapidMiner: the checkboxes must be clicked twice to select or deselect an item, and it’s not possible to select/deselect an item with the space bar, although that works in the extension manager.
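The hash-based detection could be as simple as the following Python sketch. Both function names and the idea of storing fingerprints of already accepted licenses are assumptions on my side; I don't know how the update manager stores accepted licenses internally.

```python
import hashlib


def license_fingerprint(license_text):
    """Return a SHA-256 fingerprint of the license text.

    Whitespace is collapsed first so that pure re-wrapping of the text
    does not count as a license change (an assumption of this sketch).
    """
    normalized = " ".join(license_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_confirmation(new_license_text, accepted_fingerprints):
    """Ask the user only if this exact license was not accepted before."""
    return license_fingerprint(new_license_text) not in accepted_fingerprints


if __name__ == "__main__":
    # Placeholder license texts, just for the demonstration.
    accepted = {license_fingerprint("Example License, version 1")}
    print(needs_confirmation("Example  License,\nversion 1", accepted))
    print(needs_confirmation("Example License, version 2", accepted))
```

With 10 plugins under the same unchanged license, the user would then confirm at most once instead of 10 times.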

3) It would be great if blocks could be connected not only by dragging a connection line from output to input, but also from input to output. That would make the “wiring” of the schema a bit more user-friendly.

4) Do you plan to extend “Filter Tokens (by POS)” and “Filter Tokens (by POS Ratios)” to other languages such as Portuguese, particularly since they are already supported by OpenNLP (http://opennlp.sourceforge.net/models-1.5/)?