KNIME: Example Applications
The example applications listed below illustrate some of KNIME's capabilities.
Analysis of a Salary Poll
KNIME has been used to analyze a salary poll from a German computer magazine. Some of the results were presented in an article in this magazine. As in almost all data mining tasks, the major part of the analysis consisted of data preprocessing steps. These included data conversion (currency transformation from SFr to EUR), outlier detection, and column and row filtering.
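The preprocessing steps named above can be sketched in a few lines of plain Python. This is only an illustration, not the workflow used for the poll: the records, the column names, the exchange rate, and the outlier rule (distance from the median measured in median absolute deviations) are all assumptions made for the example.

```python
from statistics import median

# Hypothetical exchange rate; the rate used in the actual analysis is not stated.
CHF_TO_EUR = 0.65


def to_eur(record, rate=CHF_TO_EUR):
    """Data conversion: return a copy of the record with the salary in EUR."""
    converted = dict(record)
    if converted["currency"] == "SFr":
        converted["salary"] = round(converted["salary"] * rate, 2)
        converted["currency"] = "EUR"
    return converted


def remove_outliers(records, factor=3.0):
    """Row filter: drop salaries further than factor * MAD from the median."""
    salaries = [r["salary"] for r in records]
    center = median(salaries)
    mad = median(abs(s - center) for s in salaries)
    return [r for r in records if abs(r["salary"] - center) <= factor * mad]


def keep_columns(records, columns):
    """Column filter: keep only the named columns."""
    return [{c: r[c] for c in columns} for r in records]


# Made-up poll records, for illustration only.
poll = [
    {"id": 1, "salary": 52000, "currency": "EUR", "comment": "a"},
    {"id": 2, "salary": 80000, "currency": "SFr", "comment": "b"},
    {"id": 3, "salary": 48000, "currency": "EUR", "comment": ""},
    {"id": 4, "salary": 55000, "currency": "EUR", "comment": ""},
    {"id": 5, "salary": 900000, "currency": "EUR", "comment": "typo?"},
]

converted = [to_eur(r) for r in poll]
cleaned = keep_columns(remove_outliers(converted), ["id", "salary"])
```

In KNIME each of these functions would correspond to a node in the flow; the order (convert first, then detect outliers) matters because salaries must share one currency before they can be compared.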
Databases and Data Preprocessing
KNIME supports the integration of different databases such as MySQL, Oracle, DB2, and others. Data from different database sources can thus be read (Reader), joined, manipulated, partitioned, and transformed in KNIME, and finally stored in databases again (Writer).
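The read, join, transform, and write sequence can be illustrated with Python's standard sqlite3 module. This is a sketch under assumptions: in-memory SQLite databases stand in for the MySQL, Oracle, or DB2 sources, and the table and column names are invented for the example.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two different database sources,
# a third one for the target database (all hypothetical).
source_a = sqlite3.connect(":memory:")
source_b = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source_a.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
source_a.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Bob"), (3, "Cy")]
)
source_b.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
source_b.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (1, 5.5), (3, 7.0)]
)

# Reader step: fetch one table from each source.
customers = source_a.execute("SELECT id, name FROM customers").fetchall()
orders = source_b.execute("SELECT customer_id, amount FROM orders").fetchall()

# Transform and join step: total order amount per customer, joined on the id.
totals = {}
for customer_id, amount in orders:
    totals[customer_id] = totals.get(customer_id, 0.0) + amount
joined = [(cid, name, totals[cid]) for cid, name in customers if cid in totals]

# Writer step: store the result in the target database.
target.execute("CREATE TABLE customer_totals (id INTEGER, name TEXT, total REAL)")
target.executemany("INSERT INTO customer_totals VALUES (?, ?, ?)", joined)
target.commit()

stored = target.execute(
    "SELECT id, name, total FROM customer_totals ORDER BY id"
).fetchall()
```

The join happens outside the databases because the two tables live in different sources, which is the situation the paragraph above describes.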
KNIME Boosting
The flow below demonstrates the use of KNIME's meta node concept. Pipelining tools usually do not allow loops in a flow, but several algorithms require parts of a workflow to be executed repeatedly. One example is cross validation, where the input table is split into e.g. 10 parts; in each iteration a classifier is trained on nine of them (90% of the rows) and the learned model is used to predict the remaining part. The error rates of the 10 iterations give a good estimate of the quality of the classifier.
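The loop that a cross validation meta node has to execute can be sketched as follows. This is a minimal illustration in plain Python, not KNIME's implementation: the helper names are hypothetical, the folds are formed by simple striding rather than random sampling, and the "classifier" is a toy learner that always predicts the most frequent training label.

```python
def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists; each row is in exactly one test fold."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def train_majority(labels):
    """Toy learner: the 'model' is simply the most frequent training label."""
    return max(sorted(set(labels)), key=labels.count)


def cross_validate(labels, k=10):
    """Return the error rate of each of the k iterations."""
    errors = []
    for train, test in k_fold_indices(len(labels), k):
        model = train_majority([labels[j] for j in train])
        wrong = sum(1 for j in test if labels[j] != model)
        errors.append(wrong / len(test))
    return errors


# Made-up class labels: 70 rows of class "a", 30 rows of class "b".
labels = ["a"] * 70 + ["b"] * 30
fold_errors = cross_validate(labels, k=10)
mean_error = sum(fold_errors) / len(fold_errors)
```

The body of the for loop in cross_validate is the part of the workflow that must run repeatedly, which is what an ordinary loop-free pipeline cannot express.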
Another example is boosting, where the same classifier is trained several hundred times, with each iteration giving more weight to the examples that were misclassified in the previous one. In this way a meta classifier is built from the classifiers of the individual iterations, and its performance is usually superior to that of a single classifier.
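The reweighting idea can be made concrete with a small sketch. The text does not say which boosting variant the KNIME node implements; the code below assumes the common AdaBoost scheme with one-dimensional decision stumps as the base classifier, and the data set and function names are invented for the example. It runs only three rounds instead of several hundred so that the behavior is easy to follow.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump: predict polarity for x <= threshold, else -polarity."""
    return polarity if x <= threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                w
                for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train `rounds` stumps; labels in ys must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples (y * prediction == -1) gain weight.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Meta classifier: weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Made-up data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost(xs, ys, rounds=1)
boosted = adaboost(xs, ys, rounds=3)
single_errors = sum(predict(single, x) != y for x, y in zip(xs, ys))
boosted_errors = sum(predict(boosted, x) != y for x, y in zip(xs, ys))
```

On this toy data a single stump necessarily misclassifies some training examples, while the weighted vote of three stumps classifies all of them correctly.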
The figure shows the cross validation and boosting meta nodes in the main flow at the top. The flow at the bottom is the inner flow of the boosting node.
Virtual High Throughput Screening
(Experimental/Internal use - not all nodes are part of the KNIME distribution)
KNIME has also been successfully applied to vHTS (Virtual High Throughput Screening) data. Most KNIME nodes handle the challenge of processing huge amounts of data (several GB) without difficulty. The results of predicting the activity of as yet untested compounds can be visualized with, e.g., the Enrichment Plotter, which was developed specifically for this purpose (see figure). Another useful tool for inspecting the data is the so-called neighborgram, which shows the neighborhood of data points labeled as "active". (The neighborgrams exist as an additional feature for KNIME and can be downloaded here). The colors of the points indicate the activity of the molecules represented by the data points.
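The kind of curve an enrichment plot displays can be computed as in the sketch below. It assumes the usual definition (compounds are ranked by predicted activity, and the fraction of true actives found is recorded against the fraction of compounds screened); whether the Enrichment Plotter node uses exactly these axes is not stated here, and the scores and labels are made up.

```python
def enrichment_curve(scores, is_active):
    """Return (fraction screened, fraction of actives found) points.

    Compounds are ranked by descending predicted score.
    """
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    found = 0
    points = [(0.0, 0.0)]
    for rank, (_, active) in enumerate(ranked, start=1):
        if active:
            found += 1
        points.append((rank / len(ranked), found / total_actives))
    return points


# Made-up predictions for ten compounds, four of which are truly active.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
is_active = [True, True, False, True, False, False, False, False, True, False]

curve = enrichment_curve(scores, is_active)
```

A curve that rises steeply at the left means that most actives are found after screening only a small part of the compound collection; a random ranking would follow the diagonal on average.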
Cell Miner*
(Experimental/Internal use - not all nodes are part of the KNIME distribution)
KNIME has been used to analyze cell images. A new data cell for images was integrated and a picture file reader node was added to the repository. A segmenter node was implemented to locate cells in the images. Multiple feature extraction nodes were used to extract data for the classifier algorithm. The learner interactively adapts to the different cell types and then classifies large numbers of images.
Interested readers are referred to the publication:
Nicolas Cebron, Michael R. Berthold,
Adaptive Active Classification of Cell Assay Images,
Knowledge Discovery in Databases: PKDD 2006 (PKDD/ECML, Berlin, Germany),
vol. 4213, pp. 79-90, Springer Berlin / Heidelberg, 2006
