How to Work with Collections, Lists & Sets in KNIME

Presenting Your Collection Cell Cookbook

October 25, 2021
Data basics how-to

Today we would like to show you how to work with what we call collection cells - or lists/sets - in KNIME. These are cells that represent a collection of cells, for example a collection of strings representing a frequent item set or items in a transaction. Read on to find out what the differences are between collection, list, and set!

Fig. 1. The Collection Cookbook workflow: The goal of this workflow is to demonstrate creating and working with collection cells.

Collection, List, or Set - What's the Difference?

KNIME supports two different types of collection cells that differ in the way they handle duplicates and order elements.

  • List: A list cell corresponds to a mathematical sequence in that it contains all elements (including duplicates) in the order in which they were added to the collection cell.
  • Set: A set cell corresponds to a mathematical set in that it contains each element only once, in an arbitrary order.

Example of a List and Set

Let us add the following elements in this order to a collection cell: 1, 2, 2, 3, 3, 4, 5.

A list cell would contain all elements in the order in which they were added, e.g. (1, 2, 2, 3, 3, 4, 5), whereas a set cell would contain only the unique elements in an arbitrary order, e.g. {3, 1, 2, 4, 5}.
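The same behavior can be mimicked outside of KNIME. The following is a minimal Python sketch, not KNIME code: a Python `list` and `set` stand in for the list cell and the set cell, and the variable names are purely illustrative.

```python
# Elements added to the collection, in this order.
elements = [1, 2, 2, 3, 3, 4, 5]

# A list keeps every element, duplicates included, in insertion order.
as_list = list(elements)

# A set keeps each element once; its iteration order is not guaranteed.
as_set = set(elements)

print(as_list)      # [1, 2, 2, 3, 3, 4, 5]
print(len(as_set))  # 5
```

Note that the set is compared by content only; you should not rely on the order in which its elements are printed.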

In the workflow, the “Collection Types” section demonstrates the creation and behavior of the two different collection types.

Fig. 2. The Collection Types section of the workflow demonstrates the creation and behavior of the two different collection types.

Create and Convert Collection Cells

Collection cells are either created manually by the user or are the result of a KNIME node such as the Item Set Finder node. To manually create a collection cell you can combine either the cells of several columns or the cells of several rows into a single collection cell.

Combine cells row wise to create a list or set cell

You can use the GroupBy node to combine the cells of one column across several rows into a single collection cell. The number of rows that are combined, and therefore the number of elements in each collection cell, depends on the selected group columns. If you don't select a group column, all rows are added to a single collection cell. In the aggregation section you select the column you want to use and the collection type you want to create.

  • You can choose to create a list cell that contains all elements in the order they appear in the corresponding group or have them sorted based on their value.
  • You can also create a set cell that contains only the unique values of the selected aggregation column.
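
To make the three aggregation options concrete, here is a rough Python sketch of what row-wise grouping produces. It is not the GroupBy node's implementation; the table, the column names `customer` and `item`, and the helper `group_items` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical input table: one dict per row.
rows = [
    {"customer": "A", "item": "milk"},
    {"customer": "B", "item": "bread"},
    {"customer": "A", "item": "bread"},
    {"customer": "A", "item": "milk"},
]

def group_items(rows, group_column, value_column):
    """Collect the values of value_column for each distinct group value."""
    groups = defaultdict(list)
    for row in rows:
        # Without a group column, every row falls into one single group.
        key = row[group_column] if group_column else None
        groups[key].append(row[value_column])
    return dict(groups)

grouped = group_items(rows, "customer", "item")

as_list = grouped                                            # order of appearance
as_sorted_list = {k: sorted(v) for k, v in grouped.items()}  # sorted by value
as_set = {k: set(v) for k, v in grouped.items()}             # unique values only

print(as_list["A"])         # ['milk', 'bread', 'milk']
print(as_sorted_list["A"])  # ['bread', 'milk', 'milk']
```

Calling `group_items(rows, None, "item")` puts all four items into one collection, which mirrors the case where no group column is selected.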

Convert a collection cell back into individual rows

The node that converts a collection cell back into individual rows is the Ungroup node. It creates a new data row for each collection element. If the collection cell is a list cell, the row order from top to bottom reflects the order of the collection elements from left to right. If you ungroup two collection cells with different numbers of elements at the same time, the shorter collection is padded with missing values.
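The padding behavior is easiest to see with a small example. The sketch below is plain Python rather than KNIME code; the two lists and the column names `item` and `price` are made up, and `None` stands in for a KNIME missing value.

```python
from itertools import zip_longest

# Hypothetical row with two list cells of different lengths.
items = ["milk", "bread", "eggs"]
prices = [0.99, 2.49]

# One output row per element position; None stands in for a missing value.
ungrouped = [
    {"item": item, "price": price}
    for item, price in zip_longest(items, prices, fillvalue=None)
]

for row in ungrouped:
    print(row)
```

The third row has an item but no price, because the `prices` list ran out of elements first.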

Note. By using the GroupBy node and the Ungroup node you can collapse a KNIME table into a single row and expand it again without losing any information. Simply use the GroupBy node with List as the aggregation method for all columns to aggregate all rows into a single row. Later on you can use the Ungroup node on all collection columns to expand the row into a table.
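
As a rough illustration of why this round trip is lossless, consider the following Python sketch. It only models the idea, assuming a small hypothetical table with the columns `id` and `name`; it does not call any KNIME functionality.

```python
# Hypothetical table: one dict per row.
table = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
columns = ["id", "name"]

# Collapse: one list per column, elements kept in row order.
collapsed = {col: [row[col] for row in table] for col in columns}

# Expand: read the lists position by position to rebuild the rows.
expanded = [
    dict(zip(columns, values))
    for values in zip(*(collapsed[col] for col in columns))
]

print(collapsed)          # {'id': [1, 2, 3], 'name': ['a', 'b', 'c']}
print(expanded == table)  # True
```

The round trip works because lists keep duplicates and order. With sets instead of lists, both would be lost and the rows could not be rebuilt.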

Combine cells column wise to create a list or set cell

You can use either the Create Collection Column node or the Column Aggregator node to combine several columns of a single row into a new collection cell.

The Create Collection Column node allows you to create a collection cell from a set of columns that are selected either manually or based on their name or type. Depending on the node setting ("Create a collection type 'set'") the node creates:

  • Either a list cell, which contains the elements in the order they appear in the selected columns from left to right
  • Or a set cell, which contains only the unique elements of the columns

Like the GroupBy node, the Column Aggregator node offers many aggregation methods. Among them, you can create a list cell that contains all elements in the order of the selected columns, a list cell in which the elements are sorted by value, or a set cell that contains only the unique values of the selected columns.
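
The column-wise case can be sketched in the same spirit. In the Python snippet below, the row, the column names `q1` to `q3`, and the helper `create_collection` are hypothetical; the `as_set` flag merely imitates the idea behind the "Create a collection type 'set'" setting.

```python
def create_collection(row, columns, as_set=False):
    """Combine the selected columns of one row into a list or a set."""
    values = [row[col] for col in columns]  # left-to-right column order
    return set(values) if as_set else values

# Hypothetical row with three answer columns.
row = {"q1": "yes", "q2": "no", "q3": "yes"}
selected = ["q1", "q2", "q3"]

list_cell = create_collection(row, selected)
sorted_list_cell = sorted(list_cell)
set_cell = create_collection(row, selected, as_set=True)

print(list_cell)         # ['yes', 'no', 'yes']
print(sorted_list_cell)  # ['no', 'yes', 'yes']
```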

Convert a collection cell back into individual columns

The node that converts a collection cell back into individual columns is the Split Collection Column node. It places each element of the collection in its own column. If the collection cell is a list cell, the order of the elements is maintained when creating the columns.
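
Here is a small Python sketch of the splitting idea. It rests on two assumptions that are not taken from the node description: the output column names (`value_1`, `value_2`, ...) are invented, and shorter lists are padded with `None` as a stand-in for missing values.

```python
def split_collection(cells):
    """Turn each list into one row with a column per element position."""
    width = max((len(cell) for cell in cells), default=0)
    return [
        # Assumed naming scheme and padding; both are illustrative only.
        {f"value_{i + 1}": (cell[i] if i < len(cell) else None) for i in range(width)}
        for cell in cells
    ]

# Hypothetical collection column with two list cells.
list_cells = [["milk", "bread", "eggs"], ["tea"]]
split_rows = split_collection(list_cells)

print(split_rows[0])  # {'value_1': 'milk', 'value_2': 'bread', 'value_3': 'eggs'}
print(split_rows[1])  # {'value_1': 'tea', 'value_2': None, 'value_3': None}
```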

You can try this out in the example workflow. The “Collection creation and conversion” section of it demonstrates row- and column-wise creation and conversion of collection cells, as depicted below.

Fig. 3. The “Collection creation and conversion” section of the workflow demonstrates row- and column-wise creation and conversion of collection cells.

How to Work with Collection Cells

KNIME provides several nodes that work with collection cells. For example, the Column Aggregator node and the GroupBy node provide aggregation methods that create collection cells as well as methods that perform set operations, e.g. union, intersection, exclusive-or, and element counting.
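
If you are unsure what these set operations return, the following Python sketch shows them on two hypothetical shopping carts. It uses Python's built-in set operators and only illustrates the semantics, not the nodes themselves.

```python
# Two hypothetical shopping carts, modeled as sets.
cart_a = {"milk", "bread", "eggs"}
cart_b = {"bread", "eggs", "tea"}

union = cart_a | cart_b          # elements in either cart
intersection = cart_a & cart_b   # elements in both carts
exclusive_or = cart_a ^ cart_b   # elements in exactly one cart
element_count = len(cart_a)      # number of elements in one cart

# sorted() is used only to get a stable order for printing.
print(sorted(union))         # ['bread', 'eggs', 'milk', 'tea']
print(sorted(intersection))  # ['bread', 'eggs']
print(sorted(exclusive_or))  # ['milk', 'tea']
print(element_count)         # 3
```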

The Create Bit Vector node allows you to create a bit vector not only from multiple columns but also from a single collection column, e.g. the item list of a shopping cart. Each unique element is assigned a position in the bit vector, so the length of every resulting bit vector equals the number of unique elements across all collection cells.
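
The encoding can be pictured with a few lines of Python. This is a sketch of the idea only: the carts are made up, and assigning positions in alphabetical order is our assumption for the sake of a reproducible example; the node may order positions differently.

```python
# Hypothetical collection column: one item list per shopping cart.
carts = [["milk", "bread"], ["bread", "eggs", "bread"], ["tea"]]

# Assign each unique element a fixed position. Sorting is an assumption
# made here for reproducibility; the node's actual ordering may differ.
unique_items = sorted({item for cart in carts for item in cart})
positions = {item: i for i, item in enumerate(unique_items)}

def to_bit_vector(cart):
    """Set the bit at the position of every item that occurs in the cart."""
    bits = [0] * len(positions)
    for item in cart:
        bits[positions[item]] = 1
    return bits

bit_vectors = [to_bit_vector(cart) for cart in carts]

print(positions)    # {'bread': 0, 'eggs': 1, 'milk': 2, 'tea': 3}
print(bit_vectors)  # [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
```

All three vectors have length four, the number of unique items across all carts, and the duplicate "bread" in the second cart still sets only one bit.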

The Item Set Finder (Borgelt) node provides several algorithms to search for frequently co-occurring items in a given collection column. The result of the node, the discovered frequent item sets, is represented as a set cell that contains the frequently co-occurring items. The Subset Matcher node allows you to check whether a given subset, such as a discovered frequent item set, exists within a given collection cell, e.g. an item list. For example, you can use the node to discover all transactions that contain a specific frequent item set.
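
The matching step boils down to a subset test. The Python sketch below shows that test on hypothetical transactions; the transaction IDs and the item set are invented, and the code does not reproduce the Subset Matcher node's options.

```python
# Hypothetical transactions: an ID mapped to its item list.
transactions = {
    "t1": ["milk", "bread", "eggs"],
    "t2": ["bread", "tea"],
    "t3": ["eggs", "milk", "milk"],
}

# Hypothetical frequent item set we want to look for.
frequent_item_set = {"milk", "eggs"}

# A transaction matches if every item of the set occurs in its item list.
matches = [
    tid for tid, items in transactions.items()
    if frequent_item_set <= set(items)
]

print(matches)  # ['t1', 't3']
```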

The “Working with collections” section of the example workflow, depicted below, contains a subset of the nodes that support collection cells.

Fig. 4. The “Working with collections” section of the workflow contains a subset of the nodes that support collection cells.

To Summarize the Differences/Similarities between Collection, List, and Set

A collection is the super-type, so both set and list are collections.

A list can contain duplicate entries, as it keeps all elements in the order in which they were added.

A set doesn't contain duplicates, and its entries are stored in an arbitrary order.