KNIME: FAQ

Sections:

  1. General
  2. Getting Started
  3. Developers FAQ

This is only a small selection of possible questions. If you have a question that is not answered in the FAQ, the quickstart guide, or the extension howto, don't hesitate to write us an email; we will try to answer it and extend our FAQ section.

General

Questions:

What is KNIME, what does KNIME stand for and who has developed KNIME?
KNIME stands for KoNstanz Information MinEr and is pronounced: [naim]. It is developed at the University of Konstanz, Chair for Bioinformatics and Information Mining.
Can I modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part?
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner.
KNIME is available under a dual licensing model. An open source license is available for non-profit use. For commercial usage of KNIME, please contact us at contact@knime.org. Please refer to the Copyright for more information.
How much data can I process with KNIME?
Basically, there are no limits, since the data is buffered in an intelligent way. Nevertheless, some algorithms may require too much time and memory for very large datasets.
I'm getting errors like java.lang.OutOfMemoryError: PermGen space. What is wrong?
This is a known bug in Sun's Java that occurs if many classes are loaded. This sometimes occurs in KNIME/Eclipse if you have many or huge plugins. A workaround is to pass the option -XX:MaxPermSize=128m to the java command. The KNIME product is already using this setting by default. You can also try another Java implementation like the ones from IBM or BEA.
See also Eclipse's and Sun's bug reports.
I try to import a KNIME workflow from a directory or zip-file but after I browsed for the directory (zip-file) the project list is still empty. Why is the project I want to import not listed?
This effect is due to a known missing feature in the import functionality of Eclipse. It is an issue in old versions of KNIME only. As of version 1.2.0 we have implemented a workaround, so these projects are now listed. However, a project can only be imported if no project with the same name exists in the workspace. If there is a name clash, the import wizard unfortunately has to be canceled, the conflicting project renamed, and the import started again.
The Node Description window doesn't work on Linux. It shows the error "System browser cannot be initialized. No nodedescription will be displayed."
KNIME uses the SWT browser widget to display HTML content. This widget requires a proper web browser to be installed. For instance, on SuSE Linux 10.1, installing the package mozilla-xulrunner and exporting an environment variable (export MOZILLA_FIVE_HOME=/usr/lib/xulrunner) fixes the problem. Please note that pointing this variable to, for instance, /usr/lib/firefox or /usr/lib/mozilla is unlikely to work. For other distributions, refer to the SWT FAQ.
When copying meta nodes such as Cross Validation or Meta Nodes x:x, the inner nodes are not copied.
This is a known issue for all meta nodes. KNIME does not copy the internal workflow of meta nodes because of potential index conflicts. A workaround is to create a new meta node of the same type and subsequently copy the internals of the meta node separately. We are working on a fix for this issue.
I'm working with Windows Vista. In the node dialogs, I cannot open a FileChooser to select a data file.
This is a known bug in Java's JFileChooser and the Windows Vista Look & Feel (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6449933). It has been fixed in the Java Runtime Environment versions 1.5.0_11 and 1.6.0, so you have to upgrade your existing Runtime Environment to one of these versions.
The chemistry extensions have changed a lot in version 1.2.0, in particular the molecular parsing nodes, the 2D structure rendering, and the CDK integration. Why?

We changed several things in this area. The reasoning was that we wanted to stop converting between formats automatically (that is, behind the scenes) and leave control over such conversions to the user. Previously, we converted SMILES automatically into a CDK representation, which enabled us to show 2D representations.

Now you can read in, for example, a SMILES string and explicitly translate it into a CDK representation (or something else using Open Babel). (Note that the FileReader now allows you to choose SMILES directly for columns containing string-like elements.)

The reason why you do not see structures in the CDK cell is that we split this functionality. A lot of functions in CDK are somewhat, let's say, "premature", so we are trying not to combine too much into one node. The SMILES->CDK converter therefore does not automatically trigger the 2D coordinate generation; you need to use the appropriate node to do this. Afterwards the CDK cell contains this information and the 2D renderer is available. The same holds for 3D representations, but the CDK 3D coordinate generator crashes frequently, so we did not include it in our standard CDK distribution. However, if you, for example, read SDF files containing 3D coordinate information and convert them to CDK, you will subsequently be able to see those 3D structures.

To summarize: in order to see structures in 2D, you need to read in SMILES or SDF (or generate something via Open Babel), translate those to CDK types using the appropriate translator, add 2D coordinates (if not already included in the SDF representation), and then connect the Interactive View.

Note that the Tripos tools add a specific renderer for SMILES, so once their tools come out you will also be able to see 2D representations for columns holding "just" SMILES.

I get the following error message when trying to update from the v1.1 KNIME Product to version 1.2: "Resulting configuration does not contain the platform."
That is correct. It is not possible to update the 1.1 product to the new KNIME version using the Eclipse built-in update mechanism. You need to download the new product and then install additional features via the update site. Note that the KNIME developer version can be updated without downloading the new version, but we still recommend using Eclipse v3.2 together with GEF v3.2 or later.
I'm running the Gnome window manager under Linux and can't open any Weka Dialog. Why is that and how can I fix it?

The Look and Feel (L&F) implementation that is used under Gnome (com.sun.java.swing.plaf.gtk.GTKLookAndFeel) has a bug, which causes a NullPointerException to be thrown at:
weka.gui.PropertySheetPanel.addPropertyChangeListener(PropertySheetPanel.java:160)

The workaround for this problem is to specify another default L&F when launching the java process by providing the commandline argument
-Dswing.systemlaf=javax.swing.plaf.metal.MetalLookAndFeel
Our next release 1.2.1 will have this fix included. In the current release of KNIME you can enable this option depending on the product you use as follows:

KNIME Product
Either add this line to the .knime.ini file (located in the KNIME directory) or launch the knime.sh executable with the option -vmargs -Dswing.systemlaf...
KNIME Developer Version
Either add this line to the eclipse.ini file or launch the executable eclipse with the option -vmargs -Dswing.systemlaf...
KNIME Runtime Workspace (the one you get when testing your own node implementation)
Add the above-mentioned line to the "Arguments" tab in the run configuration, i.e. choose "Run" -> "Run..." and then select the appropriate run configuration on the left.
Is there any way to run KNIME in batch mode, i.e. only on command line and without graphical user interface?

There is an experimental (and therefore undocumented) command line option that allows you to run KNIME 1.2.1 in batch mode. To see a list of possible arguments, execute the following line on a command prompt (for Linux):

knime.sh -nosplash -application org.knime.product.KNIME_BATCH_APPLICATION

On a Windows system, you need to add two more options to enable system messages (by default any message to System.out gets suppressed):

knime.exe -consoleLog -noexit -nosplash -application org.knime.product.KNIME_BATCH_APPLICATION

The -consoleLog option causes a new window to be opened containing the log messages, and -noexit keeps the window open after the execution has finished. You will need to close the window manually and will, unfortunately, get an error message from the java process, which you can safely ignore. (If you happen to find out how this procedure can be avoided or simplified, please let us know.) Windows users: please remember to add these two options to the command line examples below in order to see KNIME's output messages.

In order to run a (pre-configured) workflow, say Knime_project, contained in the workspace directory, execute (in one line)

knime.sh -nosplash -application org.knime.product.KNIME_BATCH_APPLICATION
         -workflowDir="workspace/Knime_project"

It's also possible to change configuration options of individual nodes contained in the workflow. This comes in handy if you, for instance, want to change the URL of the file that the file reader node reads from. You will need to find out the ID of the target node and the name and type of the option you wish to change. To do so, browse the project directory (for instance workspace/Knime_project) and find the configuration directory of the target node (e.g. "File Reader (#4)", where 4 is the ID of the node). Open the file settings.xml and look for the XML element "model" (all elements nested in "model" can be changed). The individual options should be more or less self-explanatory. In this example we would look for the option "DataURL", which contains a String value. The command line then looks as follows:

knime.sh -nosplash -application org.knime.product.KNIME_BATCH_APPLICATION 
         -workflowDir="workspace/Knime_project" 
         -option=4,DataURL,"file:/home/wiswedel/benchmarks/iris/data.tst",String
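When several node options need to be overridden, typing the arguments by hand gets error-prone. The following is a minimal sketch, not part of KNIME, that assembles the argument list in the format shown above. The class and method names (BatchArgs, option, build) are hypothetical; only the flags that appear in the examples above are used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of KNIME) that assembles batch-mode arguments. */
class BatchArgs {
    private final List<String> args = new ArrayList<>();

    BatchArgs(String workflowDir) {
        args.add("-nosplash");
        args.add("-application");
        args.add("org.knime.product.KNIME_BATCH_APPLICATION");
        args.add("-workflowDir=" + workflowDir);
    }

    /** Adds one argument of the form -option=nodeId,name,value,type. */
    BatchArgs option(int nodeId, String name, String value, String type) {
        args.add("-option=" + nodeId + "," + name + "," + value + "," + type);
        return this;
    }

    /** Returns a copy of the arguments, ready to be appended to the launcher name. */
    List<String> build() {
        return new ArrayList<>(args);
    }
}
```

The resulting list could be passed to a java.lang.ProcessBuilder after the launcher name (knime.sh or knime.exe). In that case the shell quotes from the examples above are not needed, because each list entry is handed over as one argument. How the batch application treats values that themselves contain commas is not covered here.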
Since KNIME 1.3.1 I've had problems with the node description window; it shows an error "Unable to create view: Plug-in org.knime.workbench.helpview was unable to load class org.knime.workbench.helpview.view.HelpView."

This is a bug in KNIME 1.3.1. It occurs in a multi-user environment when KNIME is installed under a different user name (for instance root) than the one running it. KNIME tries to write to the installation directory, which fails because that user does not have write permissions.

This bug was fixed in KNIME 1.3.3.


Getting Started

Questions:

How can I create/use another workspace location?
Call KNIME with the command line argument "-data <workspace_location>". For Windows create a shortcut to the KNIME application, edit its properties and add this option at the end of the "target" field (outside the quotes).
After the orange splash screen I get an error message referring to a log file and that's it. What does this mean?
Check the log file. One known cause is an outdated version of Java on your machine; make sure you update Java to version 5.0 or higher. You can get the latest version from http://java.sun.com/j2se/1.5.0/download.jsp. Alternatively, you can uninstall or disable your current Java installation; KNIME will then use the one that is delivered with the installation.
The workspace is empty. How do I create a new project?
In the Navigator view (left top window) right-click and select "New", then "New KNIME Project". Provide a name for this new project and press the OK button.
The Node Repository shows only a few nodes (or none at all) - is that all?!?
Did you enter something in the search field of the Node Repository view? Click into the edit field at the top of that view and press ESC. This should bring back all nodes included in the installation. If that doesn't help, your installation might be corrupt.
Under Linux I cannot see the table in the output port view, the window shows only blank content. What is wrong here?
This is very likely a problem with the Compiz OpenGL window manager (see the bug report at Sun). Other effects may be empty sub-dialogs, e.g. in the File Reader. In this case you have to switch off Compiz. Another workaround has been posted here (in German), but we haven't tested it ourselves. In short, you have to
  1. Install the libXp package, e.g. with yum -y install libXp.
  2. Add export AWT_TOOLKIT=MToolkit to your ~/.bashrc (or whatever file your shell executes upon startup).


Developers FAQ

Questions:

How do I implement my own Node?
Use the KNIME extension wizard:
Select "New", "Other", and in the category "Konstanz Information Miner" select "Create new KNIME Node-Extension". Follow the instructions of the wizard. For further information refer to the extension howto where it is explained in detail.
In the All-in-One version of KNIME the auto-completion does not work when editing source code (using Ctrl-Space). Why is that?
That's a bug in version 1.1.0 (and any earlier version). The bug fix will be available in the next version of KNIME, i.e. version 1.2.0. The problem is that this shortcut has been defined twice: It is used for auto-completion in the java editor and also a shortcut in the KNIME perspective. To overcome this situation, you need to disable the latter one. To do so, go to "Window", "Preferences", "General", "Keys" and look in the "View" under "KNIME Workflow Editor commands" for the Ctrl-Space shortcut. Select it, and click "Edit" to finally delete this shortcut.
What version of Java do I need?
KNIME requires Java 1.5.0 or any later version. We recommend using at least version 1.5.0_11, since earlier versions of Java have caused problems when opening a file chooser, in particular on Windows Vista. Java is available at http://java.sun.com. You only need the Java runtime environment (JRE), as Eclipse brings along its own compiler (no JDK required). Note that the developer version of KNIME already contains an appropriate JRE; it is contained in the archive under the folder jre.
How do I use KNIME in the All-in-One version? I only see an ordinary Eclipse IDE...
For using KNIME in the All-in-One version you need to open the KNIME perspective, i.e. "Window", "Open Perspective", "Other", "Konstanz Information Miner". Note, the perspective will not contain any node implementation that you have been developing in your current workspace. Please refer to the extension howto to see how to do that.
I used the KNIME Node-Extension wizard which created a new plugin and one Node. Now I want to add a Node in this plugin. How can I do this?
You have to create the necessary classes by hand. At a minimum you need a NodeModel and a NodeFactory; the NodeDialog and the NodeView are optional. Right-click on the existing package where you want to create the classes, choose "New", "Class", and follow the instructions. Note that each class has to extend the corresponding KNIME base class: NodeModel, NodeFactory, NodeDialogPane, or NodeView. Refer to the extension howto for more information about implementing these classes. After you have created all the classes, open the plug-in's "plugin.xml" and go to the tab "Runtime". Add your new package to the "Exported packages" list. In the tab "Extensions", right-click on "org.knime.workbench.repository.node" → "New" → "node", and then fill in the fields of the new list entry accordingly.
My new node appears on the top level of the node repository. How do I put it into a (new) category?

A new category is defined using the extension point org.knime.workbench.repository.categories. Use the Eclipse editor of the "plugin.xml", which is contained in the plugin project, to register this extension point. Then go to the tab Extensions and click the "add..." button to add the org.knime.workbench.repository.categories extension (if not already there). On the new list entry, do a right click and follow "New", "category" as shown in the following screenshot:

Category screenshot


In the "Extension Element Details" section on the right, you need to enter information for the new category, that is:
name: The name as it appears in the node repository
path: The path in the node repository. Put '/' to put it into the root.
level-id: The internal identifier for this category. Enter this id in the node extension to put the node into this category or use it in another category's 'path' element to get sub categories.
after: Controls the ordering of the categories. To find out which category identifiers are available, look into the KNIME plugins.
description: The tooltip text.
icon: The icon of this category.

To finally add nodes to this category, go to each node's extension definition and enter the category identifier (along with a preceding '/') in the category-path.

What is exactly the idea behind the node model, node dialog and the node view? Where are the differences between the node dialog and the node view? Why doesn’t the node dialog write directly to the node model?
The underlying design follows the Model-View-Controller Concept (see also the Extension HowTo). The Dialog really acts as a Controller rather than a view on the model so they are treated quite differently.
  • The views: they require access to the entire model as we do not know (and cannot know) which pieces of the (node) model are needed to display it properly. In some instances (Scatterplot...) it can be very useful to see different plots at the same time (imagine looking at three different combinations of variables).
  • The dialog: it changes settings of the (node) model that control its operation (what to compute...). It has nothing to do with what is subsequently contained in the model itself; this is determined during execution, based on the settings that were changed by the dialog. So why do we not enable the dialog to write directly into the (node)model? There are two reasons:
    1. We want to be able to store those settings when the workflow is saved. If we make sure everything is transported from the dialog to the model in a clearly defined container (the NodeSettings object) we can serialize this object and be sure that we do not lose anything.
    2. More importantly: we want to be able to cancel a dialog and to check that the new settings are correct before writing anything to the model. In order to do this we cannot have the dialog write directly into the model, because then we would not be able to revert to the previous settings. This is also the reason for the separate validate and apply methods. When a dialog wants to apply the new settings, the model validates them first (and rejects them if incorrect), and only afterwards are the settings written to the model in their entirety.
  • The (node)model: in order for us to not only load the configuration (nodes, connections, and node settings) but also the content of the models and the data itself, we need to store what the node created during execution. Since we don't really have control over what happens inside the model during execute() we leave it to the user to write this out to a specific directory. For some nodes it may be sufficient to do nothing because all information is already contained in the NodeSettings (which are stored automatically) - for instance a column filter or a node computing some properties. Also the data provided at the data outport is stored automatically. For other nodes (such as a decision tree) we do need to store the entire tree. Note that this is not necessarily the same as what is transported to the Model-Ports - the tree inside the node also needs to remember which Rows to highlight when a branch is selected. In almost all instances you only need to worry about writing and reading NodeContents if that node provides views. To summarize it:
    • The validateSettings(), loadValidatedSettings() and saveSettings() methods have to be implemented if the node has a dialog (and therefore settings).
    • Use the load/saveInternals() if the node provides views.
    • If the node has a model outport, implement loadModelContent() and saveModelContent().
When should the modelChanged() function be called explicitly?
The modelChanged() function is essentially the notification to all views that a model has changed (reset or execute) inside the MVC-model and is called internally by the framework. Therefore there is no need to call it explicitly.
Why are there different types (for example the SMILES data type in the chem plugin)? Wouldn't it be easier to have only strings and let the node care about the content of the string?
We believe that whenever we have a String that actually represents something else and we want a subsequent node to only operate on strings representing this particular type, we should add this as a specific type X and have either a string-to-X converter (parser...) node or a file-reader that reads only files containing X.
My node generates values based on the values of the input data. Should I add this information or simply output the new values?
Nodes which produce additional columns based on information already existent in the table should, by default, attach this information to the table as a new column. If the node converts the information in one column to another format (parser, binner, ...) it should offer a checkbox (by default disabled): replace original column.
How do I handle errors and exceptions during execution of the node model?
There are basically two ways to handle exceptions and errors that occur during execution:
  • If the error is so severe that no data can be provided at the outport, throw an exception. The node then stays unexecuted and an error icon with the message of that exception is displayed.
  • If something unusual happened or you want to inform the user about some implicitly made decisions you can set a warning with setWarningMessage(String message) in the execute method. The node will be executed but with a warning icon displaying the text of the warning.
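The two cases above can be sketched as follows. This is a toy stand-in that does not use the KNIME API: the class ToyNode and its parsing task are made up for illustration, only the method name setWarningMessage is taken from the text, and an unchecked exception is used to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a node model; it does not use the KNIME API. */
class ToyNode {
    private String warning; // stand-in for the node's warning state

    void setWarningMessage(String message) {
        this.warning = message;
    }

    String getWarningMessage() {
        return warning;
    }

    /** Parses all cells; skips unparsable ones with a warning, fails on empty input. */
    List<Double> execute(List<String> rows) {
        if (rows.isEmpty()) {
            // Severe: no output can be produced, so execution is aborted.
            throw new IllegalStateException("Input table is empty.");
        }
        List<Double> out = new ArrayList<>();
        int skipped = 0;
        for (String cell : rows) {
            try {
                out.add(Double.parseDouble(cell.trim()));
            } catch (NumberFormatException e) {
                skipped++; // unusual but recoverable: keep going
            }
        }
        if (skipped > 0) {
            // Execution succeeds, but the user is told about the implicit decision.
            setWarningMessage("Skipped " + skipped + " unparsable row(s).");
        }
        return out;
    }
}
```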
How do I set the progress bar correctly?
If the progress depends, for example, on the number of rows and there is only one task to do, it can be set in the execute method with (note the cast, which avoids integer division when both counters are integers):
exec.setProgress((double) currentRowNr / numberOfRows, "Processing row nr: " + currentRowNr);
If the task is divided into subtasks, you can create a sub-progress for each one, covering its fraction of the whole task. With two equally long subtasks the code would be:
ExecutionMonitor exec1 = exec.createSubProgress(0.5);
ExecutionMonitor exec2 = exec.createSubProgress(0.5);
task1(input, exec1);
// and task two: the execution context creates the table, exec2 reports its progress
exec.createBufferedDataTable(result, exec2);
Where do I set default values for my user settings?
One good place is the NodeModel's configure method. There you can look at the incoming table spec and the current user values to decide if you can create default values. They also will appear in the node’s dialog as default settings. Sometimes you just can’t guess useful default settings, but you still need to show something, when the dialog opens. In this case the dialog’s loadSettings method is probably the appropriate place to put in these values. If the model has no (default) values, it will not write values into the settings object (in its saveSettings method), thus the dialog will miss these values when it tries to load settings. In that case it needs to set some default (or initial) values to be displayed in its components.
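As a rough illustration of guessing a default in configure, the sketch below keeps an existing user choice and otherwise picks the first numeric column of the incoming spec. It does not use the KNIME API: the table spec is modeled as an ordered map from column name to an "is numeric" flag, and the class name DefaultColumnChooser is hypothetical.

```java
import java.util.Map;

/** Toy stand-in for configure(); a spec is modeled as column name -> isNumeric. */
class DefaultColumnChooser {
    private String selectedColumn; // null until the user (or configure) sets it

    String getSelectedColumn() {
        return selectedColumn;
    }

    void setSelectedColumn(String name) {
        selectedColumn = name;
    }

    /** Keeps an existing user choice; otherwise guesses the first numeric column. */
    void configure(Map<String, Boolean> spec) {
        if (selectedColumn != null) {
            return; // a user value is present, nothing to guess
        }
        for (Map.Entry<String, Boolean> col : spec.entrySet()) {
            if (col.getValue()) {
                selectedColumn = col.getKey();
                return;
            }
        }
        // No sensible default exists; the node cannot be configured.
        throw new IllegalStateException("No numeric column in input table.");
    }
}
```

Because the guessed value is stored in the same field as a user value, it would be written out by the model's save method and would therefore show up in the dialog as the default.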
What is the difference between the configure() and the validateSettings() method?
In validateSettings() you do basic checks on the new values. In configure() you check whether the node can run with the current settings and whether the values are consistent with the incoming table spec. In validateSettings(), settings are rejected only if required values are missing or values are obviously invalid (e.g. you read a negative number when you know the value must be positive, or you get a null or empty string for a column name). You can also check the consistency of the values with each other (for example, a lower bound should be smaller than an upper bound). At this point you can't check the consistency of the settings with respect to the incoming data table; this is done in the configure() method. There you complain if a chosen column name doesn't exist in the incoming table spec or if a selected column has an incorrect data type. If configure() succeeds, the node will be in the executable state.
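The division of labor can be sketched without the KNIME API. In the following toy example, RangeSettings and RangeFilterChecks are hypothetical names, the settings are a plain value object rather than a NodeSettings object, and the incoming table spec is reduced to a list of column names.

```java
import java.util.List;

/** Toy settings for a range filter; not the KNIME NodeSettings class. */
class RangeSettings {
    final String column;
    final double lower;
    final double upper;

    RangeSettings(String column, double lower, double upper) {
        this.column = column;
        this.lower = lower;
        this.upper = upper;
    }
}

class RangeFilterChecks {
    /** Spec-independent checks: required values present and mutually consistent. */
    static void validateSettings(RangeSettings s) {
        if (s.column == null || s.column.isEmpty()) {
            throw new IllegalArgumentException("No column name given.");
        }
        if (s.lower >= s.upper) {
            throw new IllegalArgumentException("Lower bound must be smaller than upper bound.");
        }
    }

    /** Spec-dependent check: the chosen column must exist in the incoming table. */
    static void configure(RangeSettings s, List<String> incomingColumns) {
        if (!incomingColumns.contains(s.column)) {
            throw new IllegalStateException("Column '" + s.column + "' is not in the input table.");
        }
    }
}
```

Settings that pass the first check can still fail the second one, for example after the node has been connected to a different input table.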
Why are validateSettings() and loadValidatedSettings() split into two methods? Isn't that duplicating code?
Sometimes the implementations of both methods indeed look very similar. They are split to ensure that the implementation either takes over the full set of new settings or rejects them entirely. It would be dreadful if, while loading settings, part of the settings were taken over (by assigning them to the internal variables), only to realize halfway through that some values are invalid, ending with an exception. Separating the validation step from the assigning (loading) step adds robustness to the application.
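The all-or-nothing behavior can be illustrated with a toy model. The sketch below does not use the KNIME API: settings are a plain string map, and the class TwoPhaseModel with its apply method is hypothetical. The point is only that the validation phase assigns nothing, so a rejected set of settings leaves the model untouched.

```java
import java.util.Map;

/** Toy model showing the two-phase pattern; it does not use the KNIME API. */
class TwoPhaseModel {
    private int bins = 10;
    private String column = "value";

    int getBins() {
        return bins;
    }

    String getColumn() {
        return column;
    }

    /** Phase 1: read and check everything, assign nothing. */
    void validateSettings(Map<String, String> settings) {
        String col = settings.get("column");
        if (col == null || col.isEmpty()) {
            throw new IllegalArgumentException("Missing column.");
        }
        String rawBins = settings.get("bins");
        if (rawBins == null) {
            throw new IllegalArgumentException("Missing bins.");
        }
        int b;
        try {
            b = Integer.parseInt(rawBins);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("bins is not a number.", e);
        }
        if (b <= 0) {
            throw new IllegalArgumentException("bins must be positive.");
        }
    }

    /** Phase 2: assign; only called with settings that passed phase 1. */
    void loadValidatedSettings(Map<String, String> settings) {
        column = settings.get("column");
        bins = Integer.parseInt(settings.get("bins"));
    }

    /** Validate first, then load, so the model never ends up half-updated. */
    void apply(Map<String, String> settings) {
        validateSettings(settings);
        loadValidatedSettings(settings);
    }
}
```

If the assignments were done inside a single method, the column in this example would already have been overwritten by the time the invalid bin count was detected.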
How can I show debug messages for selected packages only?
KNIME currently uses Log4j for logging. Inside the .metadata directory of your runtime workspace (not your developing workspace), there is a subdirectory called knime with the default log4j configuration in it (log4j.xml). Inside the file there is a small comment about how to enable debug messages for selected packages only. However, enabling debug messages in that way only affects the output written to stdout which will show up in the Console of your Eclipse IDE, but not in the KNIME Console.
If I use the javax.swing.JFileChooser or KNIME's DialogComponentFileChooser, it takes very long to open the dialog or the dialog does not open at all causing KNIME to hang.
This is a known bug in the Java Runtime Environment, see Sun bugreport. The problem occurs if you initialize one of those classes within the constructor or as class member during class creation of your derived NodeDialogPane. Two possible solutions are: (1) Initialize the file chooser on demand, that is, the first time you need to access the file system, or (2) Add the creation to the event dispatching thread by using SwingUtilities.invokeAndWait(new Runnable() { ... });.
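Solution (1) can be sketched as follows. This is only one possible way to defer the creation: the Lazy helper and the class MyDialogSketch are hypothetical, and a real dialog would extend NodeDialogPane, which is omitted here to keep the example self-contained.

```java
import java.util.function.Supplier;
import javax.swing.JFileChooser;

/** Creates a value on first use instead of at construction time. */
class Lazy<T> {
    private final Supplier<T> factory;
    private T value;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    synchronized T get() {
        if (value == null) {
            value = factory.get(); // the factory runs on the first call
        }
        return value;
    }

    synchronized boolean isCreated() {
        return value != null;
    }
}

/** Hypothetical dialog class; a real one would extend NodeDialogPane. */
class MyDialogSketch {
    // No JFileChooser is constructed here; only the recipe is stored.
    private final Lazy<JFileChooser> chooser = new Lazy<>(JFileChooser::new);

    /** Intended to be called from the handler of a "Browse..." button. */
    JFileChooser chooser() {
        return chooser.get();
    }
}
```

With this arrangement, constructing the dialog never touches the file system; the chooser is built only when the user first asks to browse.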
I downloaded the developer All-in-One version and it doesn't seem to use my default Java installation. Why?
We decided to include the java runtime environment (JRE) 1.5.0_11 in the All-in-One version of KNIME. It's contained in the archive under the folder jre. By removing or renaming this folder, you will disable this JRE and Eclipse will (try to) use the JRE installed on your operating system.
How do I include and use external java libraries in my new KNIME plugin?
Follow these steps:
  • Create a lib directory in your KNIME plugin.
  • Copy the file(s) into the lib directory. (Java libraries are packed either as zip or jar archive.)
  • Edit the plugin.xml file with "Plug-in Manifest Editor".
  • Go to the tab "Runtime" and add all necessary libraries to the "Classpath" list in the bottom right corner using the "Add..." button.
  • Go to the tab "Build" and add the files to the list contained in the section "Extra classpath entries".
  • Make sure to have the lib directory selected in both the "Binary Build" and "Source Build" list (in the same tab).
  • (Please note, adding the jar files to the plugins build path, i.e. project context menu → "Java Build Path" → "Libraries" is not necessary.)
You should now be able to use the libraries within your node implementation.
