KNIME provides many nodes to read and write data from and to different sources in various formats. Besides accessing files it is also possible to connect to databases or Hadoop (Hive, Impala) in order to fetch or write data.
The KNIME software provides various reader nodes to read data from different file formats. The most common formats for storing data in files are CSV-like formats. The File Reader node and, for plain CSV files, the more specific CSV Reader node are used in KNIME to read these kinds of files.
The File Reader node reads all kinds of text files. While the node is being configured, it estimates the proper settings, such as the column separator and the column types. These settings can be adjusted further if the File Reader's guess was not entirely correct.
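The kind of guessing described above can be illustrated outside KNIME with a minimal Python sketch. This is not how the File Reader node is implemented; it only shows the general idea using the standard library's `csv.Sniffer`. The sample text and the helper names `guess_type` and `guess_settings` are hypothetical and exist only for this illustration.

```python
import csv
import io

# Hypothetical sample of a CSV-like file that uses ";" as the separator.
SAMPLE = "id;name;score\n1;alice;3.5\n2;bob;4.0\n3;carol;2.75\n"


def guess_type(values):
    """Return the narrowest type name that fits every value in a column."""
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "str"


def guess_settings(text):
    """Guess the column separator and the per-column types of CSV-like text."""
    # Restrict the candidates to a few common separators.
    dialect = csv.Sniffer().sniff(text, delimiters=";,\t|")
    rows = list(csv.reader(io.StringIO(text), dialect))
    header, body = rows[0], rows[1:]
    columns = list(zip(*body))
    types = {name: guess_type(col) for name, col in zip(header, columns)}
    return dialect.delimiter, types


delimiter, types = guess_settings(SAMPLE)
print(delimiter, types)
```

As in the File Reader dialog, a guess like this is only a starting point: a column of numeric codes might be better treated as strings, which is why the settings remain adjustable.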
Microsoft Excel files can also be read with the XLS Reader node.
Additional file formats, such as PMML, XML, R, PDF, Microsoft Word, ARFF and many more can be accessed with the appropriate reader node.
Similar to the reader nodes, writer nodes write data to files in a variety of formats. Use the CSV Writer node to write data to a file in CSV format or the XLS Writer to write data to an Excel sheet. Again, other formats, such as PMML, XML, or ARFF can also be used to write data to a file.
Many reader and writer nodes support URLs. This means that you can specify a URL from which to read a file or to which to write it. The file can be located on your local machine, on a remote KNIME Server or web server, or on a server that is accessible via SSH. The following video shows you how to use URLs in the writer nodes. The same holds for many reader nodes.
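To illustrate why URL support is convenient, here is a minimal Python sketch (not KNIME code) in which one reading function accepts any URL scheme that the standard library's `urllib` understands. The function name `read_csv_from_url` and the file `example.csv` are hypothetical; a local `file://` URL is used so the example runs without network access, and an `http://` URL could be passed in the same way.

```python
import csv
import io
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_csv_from_url(url):
    """Fetch CSV text from a URL and parse it into a list of rows."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


# Create a small local file so that the example needs no network access.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.csv"
    path.write_text("a,b\n1,2\n", encoding="utf-8")
    local_url = path.as_uri()  # an absolute file:// URL
    rows = read_csv_from_url(local_url)

print(rows)
```

The caller only changes the URL string to switch between a local and a remote file; the reading logic stays the same.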
Arrange your workflows in Workflow Groups (folders) in your workspace. To create a Workflow Group, right-click inside your local workspace in the KNIME Explorer view and select “New Workflow Group”. A folder is created in the workspace directory for each Workflow Group. Workflow Groups can also contain files. Create a Workflow Group called “data”, place your data files in the corresponding folder, and access them from the KNIME Explorer view. When a data file with a known format is dragged and dropped from the KNIME Explorer view into the workflow editor space, the appropriate reader node is automatically created and configured.
KNIME Explorer now supports data as well!
The Database Connector nodes enable you to connect to databases or to a selection of data, and to write and update database tables. To make use of powerful database servers or Hadoop clusters, many nodes provide in-database processing.
A number of extra nodes have been introduced to separate the connection from the table selection phase.
Using the Database Connector node, it is possible to connect to any database for which a JDBC driver is registered in KNIME. In addition, there are dedicated connector nodes, such as the MySQL Connector or the PostgreSQL Connector, that bundle the appropriate JDBC driver, so that no additional driver has to be registered in KNIME. There is a dedicated connector node for HP Vertica too.
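The idea of separating the connection from the table selection, and of processing data inside the database, can be sketched in plain Python. This is only an analogy, not KNIME's implementation: Python's built-in `sqlite3` module with an in-memory database stands in for a JDBC connection to a real server, and the `sales` table and its values are hypothetical.

```python
import sqlite3

# Step 1: connect. An in-memory SQLite database stands in for a real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.5)],
)

# Step 2: select a table and build the query; no rows are fetched yet.
table = "sales"  # hypothetical table name
query = (
    f"SELECT region, SUM(amount) AS total FROM {table} "
    "GROUP BY region ORDER BY region"
)

# Step 3: run the aggregation inside the database, so that only the
# summarized result is transferred to the client.
totals = conn.execute(query).fetchall()
conn.close()

print(totals)
```

Because the aggregation runs where the data lives, the client receives one row per region instead of every row in the table, which is the main benefit of in-database processing on large servers or clusters.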
Some of these database connector nodes belong to the big data extension and can be used to access big data platforms. You can read this article for information about how to connect to a Hadoop cluster via Hive or Impala, run in-database processing, and import processed data into KNIME, or, if you prefer, watch the YouTube video.
The white paper "KNIME opens the doors to Big Data" describes, through a practical example, how any big data platform can be integrated into KNIME for time series analysis.
We compared the runtime performance of the MapReduce execution engine with the Tez execution engine for Hive within KNIME. Here you can see the results.
To find out more about the big data extension, see: KNIME Big Data Extension.
KNIME Labs contains Twitter and Google Analytics API extensions. With these nodes it is possible to connect to Twitter and Google Analytics and to import data from these sources.
The Twitter nodes allow you to search for tweets made over the previous week and download the results. The Google Analytics nodes enable you to download Google Analytics data. Our blog has one article with more information about the Twitter nodes and another about the Google Analytics nodes. Watch the following video about the Twitter and Google API nodes.
Besides the dedicated nodes, any web service API can be accessed via REST service. Check the chapter "Data Preparation: Reading, Enriching, Transforming" in the following white paper.
The Text Processing extension in KNIME Labs includes nodes for reading PDF and Microsoft Word files (PDF Parser and Word Parser) and extracting the text contained in these files. The following video explains how to use them.
The KNIME FAQ page is worth a look if you run into problems, for example if you need to connect to a Microsoft Access database or if you get OutOfMemoryErrors when reading from a database using the Database Reader.