In this blog series we’ll be experimenting with the most interesting blends of data and tools. Whether it’s mixing traditional sources with modern data lakes, open-source devops on the cloud with protected internal legacy tools, SQL with noSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT sensor data with idle chatting, we’re curious to find out: will they blend? Want to find out what happens when IBM Watson meets Google News, Hadoop Hive meets Excel, R meets Python, or MS Word meets MongoDB?
Follow us here and send us your ideas for the next data blending challenge you’d like to see at willtheyblend@knime.com.
Authors: Rosaria Silipo and Iris Adä
The Challenge
Thank God it’s Friday! And with Friday, some free time! What shall we do? Watch a movie or read a book? What do the other KNIME users do? Let’s check!
When it comes to KNIME users, the major video source is YouTube; the major reading source is the KNIME blog. So, do KNIME users prefer to watch videos or read blog posts? In this experiment we extract the number of views for both sources and compare them.
YouTube offers REST API access as part of the Google APIs. As for all Google APIs, you do not need a full account if all you want to do is search; an API key is enough. You can request your own API key directly on the Google API Console. Remember to enable the key for the YouTube API services. The available services and the procedure to get an API key are described in these two introductory links:
https://developers.google.com/apis-explorer/?hl=en_US#p/youtube/v3/
https://developers.google.com/youtube/v3/getting-started#before-you-start
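To make the idea of a key-only request concrete, here is a minimal Python sketch of how a keyword search URL for the YouTube Data API v3 can be assembled. It is not taken from the KNIME workflow: the helper name build_search_url and the placeholder key YOUR_API_KEY are assumptions made for illustration, and the sketch only builds the URL without sending the request.

```python
from urllib.parse import urlencode

# Public base URL of the YouTube Data API v3 search resource.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key, query, max_results=50):
    """Return the GET URL for a keyword search restricted to videos.

    Hypothetical helper for illustration; it only assembles the URL.
    """
    params = {
        "part": "snippet",          # ask for title, description, channel
        "q": query,                 # search keywords
        "type": "video",            # skip channels and playlists
        "maxResults": max_results,  # the API accepts at most 50 per page
        "key": api_key,             # API key from the Google API Console
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_search_url("YOUR_API_KEY", "KNIME")
print(url)
```

Sending this URL with any HTTP client (in KNIME, a REST node inside the metanode) should return a JSON document listing the matching videos, provided the key has been enabled for the YouTube API services.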
On YouTube the KNIME TV channel hosts more than 100 tutorial videos. However, on YouTube you can also find a number of other videos about KNIME Analytics Platform posted by community members. For the KNIME users who prefer to watch videos, it could be interesting to know which videos are the most popular, in terms of number of views of course.
The KNIME blog has been around for a few years now and hosts weekly or biweekly content on tips and tricks for KNIME users. Here too, it would be interesting to know which blog posts are the most popular ones among the KNIME users who prefer to read – also in terms of number of views! The numbers for the blog posts can be extracted from the weblog file of the KNIME web site.
YouTube with REST API access on one side and blog page with weblog file on the other side. Will they blend?
Topic. Popularity (i.e. number of views) of blog posts and YouTube videos.
Challenge. Extract metadata from YouTube videos and metadata from KNIME blog posts.
Access Mode. WebLog Reader and REST service.
We are using three YouTube REST API services:
A variation of the same metanode, with name starting with “YouTube API”, is used to invoke all three YouTube REST API services. All metanodes have the same structure:
See figure 1 for the content of a “YouTube API…” metanode.
Figure 1. Sub-workflow in metanode to access YouTube REST API.

The upper branch of the final workflow has been built around such metanodes to access the YouTube REST API and to extract the videos related to the given keywords, their details, and the attached comments.
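As an illustration of the "details" step, the sketch below pulls the number of views out of a JSON response. It assumes the response has the shape returned by the videos resource of the YouTube Data API v3 when snippet and statistics are requested; the post does not say that this exact call is used inside the metanodes. The sample response is hand-written, and its titles and numbers are invented.

```python
import json

# Hand-written sample shaped like a videos response with
# part=snippet,statistics. Titles and numbers are invented.
SAMPLE_RESPONSE = """
{
  "items": [
    {"id": "vid_a", "snippet": {"title": "Example video A"},
     "statistics": {"viewCount": "1200", "commentCount": "3"}},
    {"id": "vid_b", "snippet": {"title": "Example video B"},
     "statistics": {"viewCount": "450"}}
  ]
}
"""


def extract_view_counts(response_text):
    """Map each video title to its view count.

    The API delivers counts as strings, so they are converted to int.
    """
    payload = json.loads(response_text)
    views = {}
    for item in payload.get("items", []):
        title = item["snippet"]["title"]
        views[title] = int(item["statistics"].get("viewCount", 0))
    return views


view_counts = extract_view_counts(SAMPLE_RESPONSE)
print(view_counts)
```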
Note. Language recognition is performed by means of the Tika Language Detector node. Given a text, this node produces a language hypothesis and a confidence measure. We take only comments in English with confidence above 0.8.
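The filtering rule in the note above can be sketched in a few lines of Python. The field names language and confidence and the sample comments are assumptions made for illustration; they do not reproduce the actual output columns of the Tika Language Detector node.

```python
def keep_confident_english(comments, min_confidence=0.8):
    """Keep comments detected as English with confidence above the threshold."""
    return [
        c for c in comments
        if c["language"] == "en" and c["confidence"] > min_confidence
    ]


# Invented sample rows; "language" and "confidence" are assumed field names.
sample_comments = [
    {"text": "Great tutorial, thanks!", "language": "en", "confidence": 0.97},
    {"text": "Sehr hilfreich", "language": "de", "confidence": 0.97},
    {"text": "ok", "language": "en", "confidence": 0.41},
]

english_comments = keep_confident_english(sample_comments)
print([c["text"] for c in english_comments])
```

Only the first comment survives: the second is not in English and the third falls below the confidence threshold.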
The plurality of languages shows how widely KNIME is used around the world. However, we limited ourselves to English just for comprehension reasons.
The KNIME blog is part of the general KNIME web site, so every access to the blog is recorded in the weblog file of the site. This includes the access data for each individual blog post.
The lower branch of this experiment’s workflow focuses on reading, parsing, and extracting information about the KNIME blog posts from the site weblog file.
Note. Actually, as you can see from figure 2, the metanode “Extract Post Entries” contains two loops. The second loop iterates over the blog post titles, one by one, as described above. The first loop, the parallel chunk loop, is just a utility loop, used to parallelize its loop body and so speed up execution.
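Outside of KNIME, the same kind of extraction can be sketched in plain Python. The sketch assumes that the weblog file follows the Apache Common Log Format and that blog posts live under a /blog/ path; both are assumptions, as are the sample lines. It also counts requests per URL path, whereas the workflow loops over the blog post titles.

```python
import re
from collections import Counter

# Matches the quoted request and the status code of a log line, e.g.
# "GET /blog/some-post HTTP/1.1" 200
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def count_blog_views(log_lines, prefix="/blog/"):
    """Count successful GET requests per blog post path."""
    views = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a parsable request line
        if match["method"] != "GET" or match["status"] != "200":
            continue  # ignore other methods and failed requests
        path = match["path"].split("?", 1)[0]  # drop any query string
        if path.startswith(prefix) and len(path) > len(prefix):
            views[path] += 1
    return views


# Invented sample lines in Apache Common Log Format.
sample_log = [
    '203.0.113.5 - - [10/Oct/2016:13:55:36 +0200] "GET /blog/example-post-a HTTP/1.1" 200 2326',
    '203.0.113.6 - - [10/Oct/2016:13:56:01 +0200] "GET /blog/example-post-a?utm=x HTTP/1.1" 200 2326',
    '203.0.113.7 - - [10/Oct/2016:13:57:12 +0200] "GET /blog/example-post-b HTTP/1.1" 200 1804',
    '203.0.113.8 - - [10/Oct/2016:13:58:40 +0200] "GET /about HTTP/1.1" 200 512',
    '203.0.113.9 - - [10/Oct/2016:13:59:03 +0200] "GET /blog/missing HTTP/1.1" 404 209',
]

blog_views = count_blog_views(sample_log)
print(blog_views)
```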
Figure 2. Content of Metanode “Extract Post Entries”.

The upper branch has the data from YouTube, aggregated to show the number of views for the top 10 most viewed KNIME-tagged videos.
The lower branch has the data from the weblog file, aggregated to show the number of views for the top 10 most read KNIME blog posts.
The two datasets are joined through a Joiner node and sent to the workflow report project.
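The aggregation and join can be pictured with a short Python sketch. The post does not say which column the Joiner node joins on, so pairing the two lists by rank is purely an assumption made for illustration, and all names and numbers below are invented.

```python
def top_n(view_counts, n=10):
    """Return the n (name, views) pairs with the most views, highest first."""
    ranked = sorted(view_counts.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


def join_by_rank(videos, posts):
    """Pair the i-th video with the i-th post (assumed join key: rank)."""
    rows = []
    for rank, (video, post) in enumerate(zip(videos, posts), start=1):
        rows.append({
            "rank": rank,
            "video": video[0], "video_views": video[1],
            "post": post[0], "post_views": post[1],
        })
    return rows


# Invented view counts, for illustration only.
video_views = {"Example video A": 1200, "Example video B": 450, "Example video C": 800}
post_views = {"/blog/example-post-a": 300, "/blog/example-post-b": 900}

rows = join_by_rank(top_n(video_views, n=2), top_n(post_views, n=2))
for row in rows:
    print(row)
```

With n=10 the same two functions would produce one table holding both top 10 lists side by side, which is the kind of input a report with two bar charts needs.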
The final workflow is shown in figure 3.
Figure 3. Final workflow. Upper branch connects to YouTube REST API. Lower branch parses weblog file.

For privacy reasons, we could not make this workflow available as it is on the KNIME EXAMPLES server.
However, you can find an example workflow for the WebLog Reader node on the EXAMPLES server under 01_Data_Access/07_WebLog_Files/01_Example_for_Apache_Logfile_Analysis*.
The upper part of this workflow can be found on the EXAMPLES server under 01_Data_Access/05_REST_Web_Services/03_Access_YouTube_REST_API*, without the API key information. You will need to get your own API key, enabled for the YouTube REST API, from the Google API Console.
The report created from the workflow is exported as a PDF document, shown in figure 4. On pages 2 and 3, you will see two bar charts reporting the number of views for the top 10 most viewed YouTube videos and the top 10 most read blog posts, respectively.
Here are the top 10 YouTube videos:
Here are the top 10 KNIME blog posts:
In both lists, we find abundant material for KNIME beginners. The readers of the KNIME blog posts seem to enjoy a post or two about some specific topics, such as gene editing technology or the KNIME Server REST API, but in general they also use the KNIME blog posts to learn new how-to procedures.
The last page of the PDF report contains the word cloud of the comments in English on the YouTube videos. We would like to take the opportunity in this blog post to thank the YouTube watchers for their kind words of appreciation.
In general, we have more views on the YouTube videos than on the blog posts. It is also true that the KNIME TV channel started 4 years ago, while the KNIME blog started only 2 years ago. Since we have not set any time limits, the views are counted from the upload date of each video or post. So, it is hard to draw conclusions about the proportion of KNIME users who prefer watching videos over reading posts.
Summarizing, in this experiment we tried to blend data from a weblog file with metadata from YouTube videos. Again, the most important conclusion is: Yes, they blend!
What about you? Which kind of KNIME user are you? A video watcher or a blog post reader?
News! If you are a video watcher type, we have a new treat for you. A full e-learning course to introduce new and old users to the secrets of KNIME Analytics Platform is now available on the KNIME web site.
This e-learning course consists of a series of short units. Each unit involves a brief YouTube video and often a dedicated exercise to allow data scientists to learn about KNIME Analytics Platform and ETL operations at their own pace.
So, if you are looking for something fun to do this evening, you can start exploring this new e-learning course. The intro page has a lot of information.
Figure 4. Final report as PDF document. On pages 2 and 3, we find two bar charts with the number of views for the most watched KNIME-tagged YouTube videos and for the most read KNIME blog posts, respectively. On the last page, we find the flattering word cloud from the watchers’ comments on the YouTube videos.

If you enjoyed this, please share this generously and let us know your ideas for future blends.
We’re looking forward to the next challenge. There we will find out if we can blend two different SQL dialects: Spark SQL and Hive SQL. Will they blend?
* The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer version 3.2.0 or higher)