com.rapidminer.operator.io
Class DatabaseExampleSource

java.lang.Object
  extended by com.rapidminer.operator.Operator
      extended by com.rapidminer.operator.io.AbstractReader<ExampleSet>
          extended by com.rapidminer.operator.io.AbstractExampleSource
              extended by com.rapidminer.operator.io.ResultSetExampleSource
                  extended by com.rapidminer.operator.io.DatabaseExampleSource
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ParameterHandler, LoggingHandler

public class DatabaseExampleSource
extends ResultSetExampleSource

This operator reads an ExampleSet from an SQL database. The SQL query can be passed to RapidMiner via a parameter or, for long SQL statements, in a separate file. Note that column names are often case sensitive, and database systems differ in how they treat case.
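Because column names may be case sensitive, a name given for a special attribute (for example via PARAMETER_LABEL_ATTRIBUTE or PARAMETER_ID_ATTRIBUTE) might not match the column name the database reports. The following minimal sketch shows one defensive lookup strategy. It assumes a hypothetical helper, ColumnLookup.findColumn, that prefers an exact match and otherwise accepts a unique case-insensitive match. This is not how DatabaseExampleSource itself resolves names.

```java
import java.util.List;

// Hypothetical helper, not part of the RapidMiner API: resolves a user-supplied
// attribute name against the column names reported by the database.
class ColumnLookup {

    // Returns the column index, or -1 if the name is missing or ambiguous.
    static int findColumn(List<String> columnNames, String wanted) {
        // 1. Prefer an exact, case-sensitive match.
        int exact = columnNames.indexOf(wanted);
        if (exact >= 0) {
            return exact;
        }
        // 2. Fall back to a case-insensitive match, but only if it is unique.
        int found = -1;
        for (int i = 0; i < columnNames.size(); i++) {
            if (columnNames.get(i).equalsIgnoreCase(wanted)) {
                if (found >= 0) {
                    return -1; // ambiguous, e.g. both "Label" and "LABEL" exist
                }
                found = i;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("ID", "Age", "label");
        System.out.println(findColumn(columns, "label")); // 2 (exact match)
        System.out.println(findColumn(columns, "id"));    // 0 (case-insensitive match)
        System.out.println(findColumn(columns, "name"));  // -1 (not found)
    }
}
```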

The most convenient way to define the necessary parameters is the configuration wizard. It determines the most important parameters (database URL and user name) automatically. It also lets you define special attributes such as labels or ids.

Please note that this operator supports two basic working modes:

  1. reading the data from the database and creating an example table in main memory
  2. keeping the data in the database and directly working on the database table

The latter mode is enabled by the parameter "work_on_database". Note that this working mode is still regarded as experimental, and errors may occur. To ensure that data changes are applied correctly, the database working mode is only allowed on a single table, which must be defined with the parameter "table_name". IMPORTANT: If you encounter problems during data updates (e.g. messages that the result set is not updatable), you probably have to define a primary key for your table.
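The rules for the database working mode can be illustrated with plain JDBC. This is a hypothetical sketch, not RapidMiner's implementation. WorkOnDatabaseCheck, validate and openUpdatable are invented names, and "customers" is an example table name. The sketch rejects work_on_database without a table_name. It then requests an updatable result set and checks afterwards whether the driver actually granted one. Some JDBC drivers only grant updatability for tables with a primary key, which matches the advice above.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch, not RapidMiner code.
class WorkOnDatabaseCheck {

    // The database working mode needs exactly one table, given by "table_name".
    static void validate(boolean workOnDatabase, String tableName) {
        if (workOnDatabase && (tableName == null || tableName.trim().isEmpty())) {
            throw new IllegalArgumentException(
                "work_on_database requires the parameter table_name");
        }
    }

    // Requests an updatable result set over the whole table. Drivers may silently
    // downgrade to read-only, so the granted concurrency is checked. On success the
    // caller must close both the result set and resultSet.getStatement().
    static ResultSet openUpdatable(Connection connection, String tableName) throws SQLException {
        Statement statement = connection.createStatement(
            ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
        ResultSet resultSet = statement.executeQuery("SELECT * FROM " + tableName);
        if (resultSet.getConcurrency() != ResultSet.CONCUR_UPDATABLE) {
            resultSet.close();
            statement.close();
            throw new SQLException("Result set for " + tableName
                + " is not updatable; consider defining a primary key");
        }
        return resultSet;
    }

    public static void main(String[] args) {
        validate(false, null);       // memory mode: no table needed
        validate(true, "customers"); // database mode with a table: accepted
        try {
            validate(true, "");      // database mode without a table: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```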

If you are not working directly on the database, the data will be read with an arbitrary SQL query statement (SELECT ... FROM ... WHERE ...) defined by "query" or "query_file". The memory mode is the recommended way of using this operator. This matters especially for subsequent operators such as learning schemes, which often load (most of) the data into main memory during the learning process anyway. In these cases, working directly on the database is not recommended.

Warning
The Java ResultSetMetaData interface does not provide information about the possible values of nominal attributes. As a result, the internal indices to which nominal values are mapped depend on the order in which the values appear in the table. This only causes problems when a process is split into a training process and a separate application or testing process.

For learning schemes that can handle nominal attributes, this is not a problem. If a learning scheme such as an SVM is used with nominal data, however, RapidMiner treats nominal attributes as numerical and uses the indices of the nominal values as their numerical values. An SVM may perform well if there are only two possible values. If a test set is read in another process, however, the nominal values may be assigned different indices, and the trained SVM is then useless.

This is not a problem for label attributes, since the classes can be specified using the classes parameter. Hence, all learning schemes intended for use with nominal data are safe to use.
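The effect described in this warning can be reproduced without a database. The minimal sketch below assumes that indices are assigned in order of first appearance. This is a simplified stand-in for RapidMiner's internal nominal mapping, and FirstAppearanceMapping is a hypothetical name. The sketch shows that the same value receives different indices in two processes that read the rows in different orders.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch, not RapidMiner's implementation: nominal values are mapped to
// indices in the order in which they first appear in the data.
class FirstAppearanceMapping {

    static Map<String, Integer> buildMapping(List<String> columnValues) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (String value : columnValues) {
            // The argument mapping.size() is evaluated before insertion, so each new
            // value gets the next free index; repeated values keep their first index.
            mapping.putIfAbsent(value, mapping.size());
        }
        return mapping;
    }

    public static void main(String[] args) {
        // The same nominal attribute, read by two separate processes in different row orders.
        Map<String, Integer> training = buildMapping(List.of("red", "green", "red", "blue"));
        Map<String, Integer> test = buildMapping(List.of("blue", "red", "green"));

        System.out.println(training); // {red=0, green=1, blue=2}
        System.out.println(test);     // {blue=0, red=1, green=2}
        // "red" is 0 during training but 1 during testing, so a learner that treats
        // the indices as numbers (e.g. an SVM) would misinterpret the test data.
    }
}
```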

Author:
Ingo Mierswa
To do:
Fix the above problem. This may not be possible efficiently, since the Java ResultSet interface does not support it.

Field Summary
static java.lang.String PARAMETER_CLASSES
          The parameter name for "Whitespace separated list of possible class values of the label attribute."
static java.lang.String PARAMETER_DATABASE_SYSTEM
          The parameter name for "Indicates the used database system"
static java.lang.String PARAMETER_DATABASE_URL
          The parameter name for "The complete URL connection string for the database, e.g. 'jdbc:mysql://foo.bar:portnr/database'"
static java.lang.String PARAMETER_PASSWORD
          The parameter name for "Password for the database."
static java.lang.String PARAMETER_QUERY
          The parameter name for "SQL query. If not set, the query is read from the file specified by 'query_file'."
static java.lang.String PARAMETER_QUERY_FILE
          The parameter name for "File containing the query. Only evaluated if 'query' is not set."
static java.lang.String PARAMETER_TABLE_NAME
          The parameter name for "Use this table if work_on_database is true or no other query is specified."
static java.lang.String PARAMETER_USERNAME
          The parameter name for "Database username."
static java.lang.String PARAMETER_WORK_ON_DATABASE
          The parameter name for "If set to true, the data read from the database is NOT copied to main memory."
 
Fields inherited from class com.rapidminer.operator.io.ResultSetExampleSource
PARAMETER_DATAMANAGEMENT, PARAMETER_ID_ATTRIBUTE, PARAMETER_LABEL_ATTRIBUTE, PARAMETER_WEIGHT_ATTRIBUTE
 
Constructor Summary
DatabaseExampleSource(OperatorDescription description)
           
 
Method Summary
 ExampleSet createExampleSet()
          Creates (or reads) the ExampleSet that will be returned by AbstractReader.apply().
protected  DatabaseHandler getConnectedDatabaseHandler()
           
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
 java.sql.ResultSet getResultSet()
          This method reads the file whose name is given, extracts the database access information and the query from it and executes the query.
 void processFinished()
          Called at the end of the process.
 void setNominalValues(java.util.List attributeList, java.sql.ResultSet resultSet, Attribute label)
          Since the ResultSet does not provide information about possible values of nominal attributes, subclasses must set these by implementing this method.
 void tearDown()
          This method is invoked at the end of the data query process.
 
Methods inherited from class com.rapidminer.operator.io.ResultSetExampleSource
createExampleSet
 
Methods inherited from class com.rapidminer.operator.io.AbstractExampleSource
read
 
Methods inherited from class com.rapidminer.operator.io.AbstractReader
apply, getInputClasses, getOutputClasses
 
Methods inherited from class com.rapidminer.operator.Operator
addError, addValue, addWarning, apply, checkDeprecations, checkForStop, checkIO, checkProperties, clearErrorList, cloneOperator, createExperimentTree, createExperimentTree, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, getAddOnlyAdditionalOutput, getApplyCount, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getEncoding, getErrorList, getExperiment, getInnerOperatorsXML, getInput, getInput, getInput, getInputDescription, getIOContainerForInApplyLoopBreakpoint, getIODescription, getLog, getName, getOperatorClassName, getOperatorDescription, getParameter, getParameterAsBoolean, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsString, getParameterList, getParameters, getParameterType, getParent, getProcess, getStartTime, getStatus, getUserDescription, getValue, getValues, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isEnabled, isExpanded, isParallel, isParameterSet, log, logError, logNote, logWarning, performAdditionalChecks, processStarts, register, registerOperator, remove, rename, resume, setApplyCount, setBreakpoint, setEnabled, setExpanded, setInput, setListParameter, setOperatorParameters, setParameter, setParameters, setParent, setUserDescription, toString, unregisterOperator, writeXML
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_WORK_ON_DATABASE

public static final java.lang.String PARAMETER_WORK_ON_DATABASE
The parameter name for "If set to true, the data read from the database is NOT copied to main memory. All operations that change data will modify the database."

See Also:
Constant Field Values

PARAMETER_DATABASE_SYSTEM

public static final java.lang.String PARAMETER_DATABASE_SYSTEM
The parameter name for "Indicates the used database system"

See Also:
Constant Field Values

PARAMETER_DATABASE_URL

public static final java.lang.String PARAMETER_DATABASE_URL
The parameter name for "The complete URL connection string for the database, e.g. 'jdbc:mysql://foo.bar:portnr/database'"

See Also:
Constant Field Values
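The value of PARAMETER_DATABASE_URL follows the usual JDBC URL syntax. The sketch below assumes a MySQL server and uses 3306, MySQL's default port, in place of portnr. It builds such a URL with a helper. The connect method shows how plain JDBC would use the URL together with the values of PARAMETER_USERNAME and PARAMETER_PASSWORD. The class and method names are hypothetical. Because connect needs a MySQL JDBC driver on the classpath, main does not call it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical sketch of a database URL in the format of the example
// 'jdbc:mysql://foo.bar:portnr/database'.
class JdbcUrlExample {

    static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Plain JDBC usage of such a URL; requires a suitable driver at runtime.
    static Connection connect(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) {
        System.out.println(mysqlUrl("foo.bar", 3306, "database")); // jdbc:mysql://foo.bar:3306/database
    }
}
```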

PARAMETER_USERNAME

public static final java.lang.String PARAMETER_USERNAME
The parameter name for "Database username."

See Also:
Constant Field Values

PARAMETER_PASSWORD

public static final java.lang.String PARAMETER_PASSWORD
The parameter name for "Password for the database."

See Also:
Constant Field Values

PARAMETER_QUERY

public static final java.lang.String PARAMETER_QUERY
The parameter name for "SQL query. If not set, the query is read from the file specified by 'query_file'."

See Also:
Constant Field Values

PARAMETER_QUERY_FILE

public static final java.lang.String PARAMETER_QUERY_FILE
The parameter name for "File containing the query. Only evaluated if 'query' is not set."

See Also:
Constant Field Values

PARAMETER_TABLE_NAME

public static final java.lang.String PARAMETER_TABLE_NAME
The parameter name for "Use this table if work_on_database is true or no other query is specified."

See Also:
Constant Field Values
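The precedence between query, query_file and table_name described in the field details above can be summarized in a few lines. This is a hypothetical sketch with invented class and method names. It assumes the query file is UTF-8 encoded. It also assumes that "use this table" corresponds to a SELECT * FROM <table> query, which is not documented behavior.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: 'query' wins, 'query_file' is only read if 'query' is not
// set, and 'table_name' is used when no other query is specified.
class QueryResolution {

    static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    static String resolveQuery(String query, Path queryFile, String tableName) throws IOException {
        if (isSet(query)) {
            return query;
        }
        if (queryFile != null) {
            // Long statements can be kept in a separate file (UTF-8 assumed here).
            return new String(Files.readAllBytes(queryFile), StandardCharsets.UTF_8).trim();
        }
        if (isSet(tableName)) {
            return "SELECT * FROM " + tableName; // assumed fallback
        }
        throw new IllegalArgumentException("Specify query, query_file or table_name");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("query", ".sql");
        Files.write(file, "SELECT a, b FROM t WHERE a > 1\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(resolveQuery("SELECT * FROM x", file, "t")); // 'query' wins
        System.out.println(resolveQuery(null, file, "t"));              // read from 'query_file'
        System.out.println(resolveQuery(null, null, "t"));              // 'table_name' fallback
        Files.delete(file);
    }
}
```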

PARAMETER_CLASSES

public static final java.lang.String PARAMETER_CLASSES
The parameter name for "Whitespace separated list of possible class values of the label attribute."

See Also:
Constant Field Values
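Specifying the class values up front avoids the order dependence described in the class warning. The label indices are then fixed by the parameter rather than by the order of rows in the table. The sketch below is illustrative only, and PredefinedClasses is a hypothetical name. It builds the mapping from a whitespace separated classes value. As a design choice of this sketch, not documented RapidMiner behavior, it rejects label values that were not listed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch, not RapidMiner code: label indices come from the "classes" value,
// so they are the same in every process that uses the same value.
class PredefinedClasses {

    static Map<String, Integer> mappingFromClasses(String classes) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        if (classes.trim().isEmpty()) {
            return mapping; // no classes given
        }
        for (String value : classes.trim().split("\\s+")) {
            mapping.putIfAbsent(value, mapping.size());
        }
        return mapping;
    }

    // This sketch rejects unknown labels instead of assigning them a new index.
    static int indexOf(Map<String, Integer> mapping, String label) {
        Integer index = mapping.get(label);
        if (index == null) {
            throw new IllegalArgumentException("Unknown class value: " + label);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Integer> mapping = mappingFromClasses("positive negative");
        System.out.println(mapping);                      // {positive=0, negative=1}
        System.out.println(indexOf(mapping, "negative")); // 1, regardless of row order
    }
}
```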
Constructor Detail

DatabaseExampleSource

public DatabaseExampleSource(OperatorDescription description)
Method Detail

createExampleSet

public ExampleSet createExampleSet()
                            throws OperatorException
Description copied from class: AbstractExampleSource
Creates (or reads) the ExampleSet that will be returned by AbstractReader.apply().

Overrides:
createExampleSet in class ResultSetExampleSource
Throws:
OperatorException

tearDown

public void tearDown()
Description copied from class: ResultSetExampleSource
This method is invoked at the end of the data query process. Subclasses may want to clean up resources here, e.g. close statements.

Specified by:
tearDown in class ResultSetExampleSource
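The following hypothetical sketch illustrates the kind of cleanup tearDown() is meant for. It collects the JDBC resources opened during a query and closes them in reverse order, and it keeps going even if one close() call fails. QueryResources and closeAll are invented names. The sketch relies only on Connection, Statement and ResultSet implementing AutoCloseable, which holds since JDBC 4.1 (Java 7). In main, lambdas stand in for the real resources.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper (not part of RapidMiner): tracks resources opened while
// querying and closes them in reverse order of opening.
class QueryResources {

    private final Deque<AutoCloseable> opened = new ArrayDeque<>();

    // Register each resource right after opening it.
    void register(AutoCloseable resource) {
        opened.push(resource); // the most recently opened resource is closed first
    }

    // Closes everything. A failing close() does not prevent the others from
    // being closed. Returns the number of close() calls that threw.
    int closeAll() {
        int failures = 0;
        while (!opened.isEmpty()) {
            try {
                opened.pop().close();
            } catch (Exception e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        QueryResources resources = new QueryResources();
        // Stand-ins for a Connection, Statement and ResultSet, opened in that order.
        resources.register(() -> System.out.println("closing connection"));
        resources.register(() -> System.out.println("closing statement"));
        resources.register(() -> System.out.println("closing result set"));
        System.out.println("failures: " + resources.closeAll());
    }
}
```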

setNominalValues

public void setNominalValues(java.util.List attributeList,
                             java.sql.ResultSet resultSet,
                             Attribute label)
                      throws UndefinedParameterError
Description copied from class: ResultSetExampleSource
Since the ResultSet does not provide information about possible values of nominal attributes, subclasses must set these by implementing this method.

Specified by:
setNominalValues in class ResultSetExampleSource
Parameters:
attributeList - List of Attribute
Throws:
UndefinedParameterError

getConnectedDatabaseHandler

protected DatabaseHandler getConnectedDatabaseHandler()
                                               throws OperatorException,
                                                      java.sql.SQLException
Throws:
OperatorException
java.sql.SQLException

getResultSet

public java.sql.ResultSet getResultSet()
                                throws OperatorException
This method reads the file whose name is given, extracts the database access information and the query from it and executes the query. The query result is returned as a ResultSet.

Specified by:
getResultSet in class ResultSetExampleSource
Throws:
OperatorException

processFinished

public void processFinished()
Description copied from class: Operator
Called at the end of the process. The default implementation does nothing.

Overrides:
processFinished in class Operator

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained. Otherwise, it returns special parameters for those input objects that can be prevented from being consumed.

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class ResultSetExampleSource


Copyright © 2001-2009 by Rapid-I