Will they blend? The Blog Post Collection


The book “Will they blend? The Blog Post Collection” is now available and downloadable for free at the KNIME Press web page: https://www.knime.org/knimepress/will-they-blend

Data scientist has been named the sexiest job of the 21st century in an article in Harvard Business Review.

One evening in September 2016, when some colleagues and I were sitting around, amiably cleaning up data – as one often does as a data scientist – we started a discussion about what the sexiest job of the 21st century actually entails. In addition to machine learning and parallel computing, it also involves data cleaning, data integration, data preparation, and other more obscure tasks in the data science discipline.

Data integration or, as it is called nowadays, data blending is a key process to enrich the dataset and augment its dimensionality. And since more data often means better models, it is easy to see how data blending has become such an important part of data science. Data blending is an integration process and, like all integration processes, its problem is the diversity of the players involved. So bring on the players!

In a first experiment, we need to get data from a database. The next experiment might require the same data but from another database, which means speaking a slightly different SQL dialect. Including some data from a REST service API could be useful too, and when we’re talking REST APIs, we need to be able to parse and integrate XML and JSON formatted data structures. Of course, we can’t leave out the omnipresent Excel file. But what about a simple text file? It should be possible to integrate the data whatever its original format. The most advanced among us might also need to connect to a big data platform, which raises the question of which one, as they all rely on a slightly different version of Hive and/or Spark.

Integrating data of different types is another thing we data scientists have to take care of: structured and unstructured data, data from a CRM database with texts from a customer experience tool, or perhaps documents from a Kindle with images from a public repository. In these cases, we are dealing less with data blending than with type blending. And mixing data of different types, like images and text frequency measures, can be non-intuitive.

Time blending is another issue. Those of us who are something of a vestige of an older analytics era often have to blend current data with older data from legacy systems. Migrations are costly and resource-intensive, so legacy tools and legacy data easily outlast the hype and survive amidst modern technology.

Lab leaders might dream of having a single data analytics tool for the whole lab, but this is rarely a reality. This quickly takes us from data blending to tool blending. Legacy tools need to interact with a few contemporary tools operating either in the same sector or in slightly different ones. Tool blending is the new frontier of data blending.

After so much discussion, my colleagues and I came to the conclusion that a series of blog posts to share experiences on the blending topic would help many data scientists who are running a real-life instance of the sexiest job of the 21st century.

Digging up blending memories on YouTube, we decided to experiment with the craziest blending tasks in a “Will they blend?” blog post series. All posts from the series have now been collected in a book to pass data blending know-how to the next generation of data scientists.

I hope you will enjoy these blending stories as much as we did.

KNIME Analytics Platform version 3.3.2 is now available.


KNIME Analytics Platform 3.3.2 has been released, providing some minor bug fixes. Check out the changelog. You can try it out for yourself by updating your existing KNIME installation using the "Update KNIME..." action in the "File" menu or by downloading it from here.

KNIME Server 4.4.1 has been released alongside KNIME Analytics Platform 3.3.2. The release includes a number of minor bug fixes. The compatibility matrix and links to the updates are available here.

KNIME Spring Summit 2017 in Berlin - Many Firsts and Lasting Impressions


Guided analytics, the Internet of Things, speech analysis and model process automation were just some of the interesting topics discussed at this year’s KNIME Spring Summit in Berlin. It was also our 10th meeting of this kind, drawing the biggest audience of KNIME enthusiasts yet from around the world.

To the speakers who shared their many original ideas, the KNIME partners who demonstrated new ways of integrating KNIME, and the many enthusiastic KNIME users who came and found out more about our Analytics Platform: a big, warm, heartfelt “Thank you!” from us to everyone who made this Summit one of the best we have had yet.

We had a few new “firsts” in our summit format. We ran parallel tracks for the first time – one track on Life Sciences and one on Connectivity & Customer Intelligence. We also ran an interactive panel session with KNIME users. According to the feedback we got, these seem to have worked well and we’re planning a repeat next year.

The Summit agenda and many of the presentations are posted on our website. Share all of that freely and generously. Whether you made the Summit or missed it, our next opportunity to share will be at the KNIME Fall Summit on November 1-3 in Austin, Texas. Mark your calendars!


Follow KNIME on the Road!


We have been quiet for a while. Maybe you noticed that we kind of disappeared off the meetup map in the last few months of 2016?

Indeed, due to some resource over-booking, we didn’t manage to take KNIME on the road as much as we would have liked in the past few months.

But we are back now! We also bring with us a completely new series of presentations, targeting use cases in different sectors, innovative architectures, and new courses.

We start with a meetup in Stuttgart (Germany) on April 3 on the Industrial Internet of Things and one in San Francisco (CA, USA) on April 5 on a new architecture that includes big data and data lakes alongside KNIME Analytics Platform.

In parallel, we are starting a series of courses for 2017, beginning in Stuttgart (Germany) on April 3 with a course on KNIME usage in IoT.

After that, you will find us in Istanbul (Turkey) on April 5 and in Melbourne (Australia) on April 13 for meetup events; in San Francisco (CA, USA) again on April 25-26-27 and in Zurich (Switzerland) on May 8-9-10 for three days of KNIME courses; and in Bracknell (UK) on April 25 for a Cheminformatics workshop. Further stops follow in Seattle (WA, USA) on May 8, in Budapest (Hungary) at some point in May, in Barcelona (Spain), and so on.

Just keep a look-out for us! We might come close to your workplace and you will not want to miss it!

This is where you can find us in the coming weeks.

Location | Date | Type | Title | Topic
Stuttgart (Germany) | April 3-4-5 | Courses | KNIME Analytics Platform & IoT | IoT, basic and advanced KNIME
Stuttgart (Germany) | April 3 | Meetup | Industry 4.0 | Industry 4.0, IoT, Data Spaces with KNIME
San Francisco (CA, USA) | April 5 | Meetup | Power up KNIME with the Cloud | Big Data, Data Lakes, Azure, deep learning, and KNIME
Istanbul (Turkey) | April 5 | Meetup | Finansta deep learning uygulamalari | Deep learning
Melbourne (Australia) | April 13 | Meetup | Custom Nodes for Bioinformatics | Node development & Bioinformatics
San Francisco (CA, USA) | April 25-26-27 | Courses | KNIME, Big Data, & Server | KNIME Analytics Platform, Server, & Big Data
Bracknell (UK) | April 25 | Workshop | KNIME Cheminformatics Workshop | Cheminformatics & KNIME
Stuttgart (Germany) | April 26 | Meetup | Data Analytics Best Practices | Data Analytics
Zurich (Switzerland) | May 8-9-10 | Courses | KNIME & Server | KNIME Analytics Platform & Server
Seattle (WA, USA) | May 8 | Meetup | KNIME Image Processing | KNIME and ImageJ2

KNIME Strengthens Commitment to Open Source and Completes €20 Million Investment from Invus


Berlin, March 15, 2017 – Today at the KNIME Summit, KNIME.com AG announced that equity investor Invus has invested €20 million in the company to support its ongoing work in transforming the data science industry.

“We’re excited to have an investor that truly understands open source and recognizes the potential for expanding across the enterprise,” says CEO Michael Berthold of KNIME.com AG. “Even though KNIME was already profitable and is growing strongly, we see a huge window of opportunity for our open source strategy combined with our vision of bringing Guided Analytics to the large group of users who have not been able to benefit from using advanced analytics to date.”

KNIME, with a growing group of software companies, believes that opening up previously closed or exclusive platforms, processes, tools, organizational boundaries and idea sourcing can speed up innovation while reducing risk. That understanding provides the basis for KNIME’s software development as well as its approach to working with the community in the analytics space.

Today, KNIME users can be found in large-scale enterprises in over 50 countries and across a wide range of industries including life sciences, financial services, publishers, retailers and e-tailers, manufacturing, consulting firms, government and research.

“We see KNIME as a company with a real competitive advantage and a vision for expanding on it,” say Mario Kaloustian and Wassim Sacre, Managing Directors at Invus and Board Members of KNIME.com AG. “That, combined with passionate founders who care about growing the business long-term, an enthusiastic user community and an excellent platform, adds up to a clear opportunity.”

With independent analysts overwhelmingly acknowledging KNIME as a leader, recent interest from venture capital firms has been high. But it was important to KNIME to work with a partner that would not change the focus, the management team or – most importantly – the open source philosophy that has contributed so much to the company’s success.

“Invus is a perfect match for KNIME: they have a reputation for long-term involvement and have pledged to empower the KNIME management team. They also understand and support our open source model for bringing advanced analytics to many users,” says Berthold. “This will in no way change our dedication to open source or our relationships with our customers while enhancing our ability to help organizations tap into the benefits of advanced analytics.”

For more information, contact KNIME at info@knime.com, or download KNIME and discover it for yourself. KNIME – Open for Innovation.

About Invus

Since 1985, Invus has been an equity investor in companies that seek to transform their industries. Invus partners with owner-managers of private and public companies to help them achieve extraordinary business performance. Over its 30-year history, Invus has achieved both cash-on-cash multiples and annual internal rates of return that are at the very top of the private equity industry. Today Invus manages over $5 billion in assets through an evergreen fund structure and has offices in New York, London, Paris and Hong Kong. Invus has invested in companies across a wide range of industries including consumer products and services, specialty retail, software, biotech, and medical devices. Read more about Invus here.