Using KNIME to Find Out What Your Users Are Thinking

Author: Cathy Pearl, User Experience Consultant
 

Did you know there are nearly 4000 online dating sites out there?  If you’re a Sea Captain, an Ayn Rand fan, or love Star Trek, there is a dating site for you.

I’m not an online dater myself, but I’m fascinated by the science of attraction.  Looking at online dating habits is one way to evaluate modern-day courtship.  Nowadays, 33% of couples have met online (not necessarily through dating sites).  That number is projected to go up to 70% by 2040, which is not surprising, given how much of our lives we spend online these days.

I wanted to find out what people really think about online dating.  I decided to use Twitter data, because it’s free, and it’s plentiful.  I chose to look at Tweets about three of the most popular dating sites out there:  Tinder, OkCupid, and eHarmony.  

Lucky for me, KNIME now has a Twitter node.  It makes it very easy to grab a whole bunch of public Twitter data using whatever keyword searches you want.

Before I get into how to get Twitter data using KNIME, a little background on Twitter:  although the data is free, Twitter limits the amount it will actually give you access to.  Twitter will send back only about 1% of its total firehose of data, so remember, you’re only getting a glimpse of the Twitterverse when you do a data grab.  Hopefully, this 1% is a representative sample of all Tweets.

Also, remember that the data you get on Twitter might be skewed, that is, not necessarily representative of how everyone thinks.  Not everyone Tweets!  Here’s a good article on the pitfalls of relying on Twitter data to make general conclusions.

Ok, let’s dive into getting that Twitter data!

Step 1:  Set Up Your Twitter Account

To get access to Twitter’s (public) data stream, you need a developer account, which is free.  Go to https://dev.twitter.com/ and sign in (if you already have a Twitter account, use those credentials).

From there, go to Tools (at the bottom of the dev.twitter.com home page) and click on “Manage Your Apps”.  Click on “Create New App” and fill in the fields (don’t worry about a website;  you can just put a placeholder URL, like www.placeholder.com).  

Now you have Twitter credentials.

Step 2: Set up your Twitter API Connector

In your KNIME workflow, add a Twitter API Connector node.  To configure, you will need to fill in four fields from your developer account:

Go to the Application Management page on dev.twitter.com by clicking on the application you created in Step 1.  Under Application Settings, click on “manage keys and access tokens”.  

The API key maps to the Consumer Key, and the API secret, to the Consumer Secret.  Access token and Access token secret map to the same names.

See also @KNIME Twitter Nodes about how to set up and use the KNIME Twitter nodes.

Step 3:  Start Searching Twitter

There are different ways to access Twitter data, by looking in the past, or by grabbing Tweets as they are posted.  The KNIME Twitter Search node looks for Tweets matching your search term that have occurred in the past (generally in the last week).  

To make a simple Twitter Search, add a Twitter Search node, and fill in a query, how many rows you would like, and whether you’d prefer recent Tweets or most popular (or a mixture of both).  To learn more about how the query field works, check out https://twitter.com/search-home and click on “operators”.  

Examples

twitter search      containing both "twitter" and "search". This is the default operator.
"happy hour"        containing the exact phrase "happy hour".
love OR hate        containing either "love" or "hate" (or both).
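
If you build queries in a script rather than typing them into the node, the three operators above can be sketched as small Python helpers.  This is only an illustration: the function names are made up for this example and are not part of KNIME or Twitter.

```python
def all_of(*words):
    """Tweets containing every word: separate terms with spaces (the default operator)."""
    return " ".join(words)


def exact_phrase(phrase):
    """Tweets containing the exact phrase: wrap it in straight double quotes."""
    return '"' + phrase + '"'


def any_of(*terms):
    """Tweets containing at least one term: join terms with an upper-case OR."""
    return " OR ".join(terms)


print(all_of("twitter", "search"))
print(exact_phrase("happy hour"))
print(any_of("love", "hate"))
```

The resulting strings are what you would paste into the query field of the Twitter Search node.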

For my Twitter analysis, I wanted to get Tweets using three different sets of terms, and I wanted the distribution to be as even as possible.  So rather than putting “tinder OR eharmony OR okcupid”, I ran them as three separate queries.  To do that, I needed a loop.  Here’s the workflow:

In my Table Creator node, I specified the different ways I wanted to search for the three dating sites:

In the Twitter Search node, I set the number of rows to 10,000.  I chose “recent” for the type of Tweets, because I did not want my data to be biased by only getting the most popular ones.

I used a Cell Replacer node to append the name of the dating site to my data, added a Table Writer node to write out my data, and ran it.
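
For readers who think in code, the loop described above might look roughly like the following Python sketch.  It is not what KNIME runs internally.  The `search_fn` argument is a hypothetical stand-in for the Twitter Search node, and the three query strings are assumptions, since the exact search terms from the Table Creator node are not listed here.

```python
def collect_tweets(queries, search_fn, rows_per_query=10000):
    """Run one search per dating site and label every Tweet with its site.

    queries   -- dict mapping a site label to its search query (assumed terms)
    search_fn -- hypothetical callable (query, max_rows) -> list of Tweet texts,
                 standing in for the Twitter Search node
    """
    labeled = []
    for site, query in queries.items():
        for text in search_fn(query, rows_per_query):
            # Equivalent of appending the dating site name to each row.
            labeled.append({"site": site, "tweet": text})
    return labeled


# Stand-in search function so the sketch runs without Twitter access.
def fake_search(query, max_rows):
    sample = [f"sample tweet about {query}", f"another tweet about {query}"]
    return sample[:max_rows]


queries = {"Tinder": "tinder", "OkCupid": "okcupid", "eHarmony": "eharmony"}
rows = collect_tweets(queries, fake_search, rows_per_query=2)
print(len(rows))
```

Running each site as its own query, instead of one combined OR query, is what keeps the most talked-about site from crowding out the other two.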

My search took a few minutes on my MacBook Pro, but a word of caution:  Twitter limits the number of queries you can run within a given timeframe.  I recommend starting with a very small query to make sure your workflow is working properly, and then increasing the number of rows and queries.  If you get a timeout error from Twitter, you need to wait until the next window of time opens and try again.
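
The "wait for the next window and try again" advice can be sketched in Python as a simple retry wrapper.  Everything here is an assumption for illustration: `RateLimitError` is a hypothetical exception, `search_fn` is a hypothetical search function, and the default wait of 15 minutes is an assumed window length that you should check against Twitter's current documentation.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised by a search function when Twitter refuses a query."""


def search_with_retry(search_fn, query, max_rows, wait_seconds=15 * 60,
                      max_attempts=3, sleep=time.sleep):
    """Call search_fn, waiting and retrying whenever the rate limit is hit."""
    for attempt in range(1, max_attempts + 1):
        try:
            return search_fn(query, max_rows)
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            sleep(wait_seconds)  # wait for the next rate-limit window


# Demo: a fake search that fails once, then succeeds; waits are recorded, not slept.
calls = {"count": 0}


def flaky_search(query, max_rows):
    calls["count"] += 1
    if calls["count"] == 1:
        raise RateLimitError("rate limit exceeded")
    return [f"tweet about {query}"][:max_rows]


waits = []
result = search_with_retry(flaky_search, "tinder", 10, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter lets you test the retry logic without actually waiting.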

Analysis

After I ran my Twitter query, I had about 21,000 Tweets.  I had requested 30,000 (10,000 for each of the three dating sites) but Tinder was the only one to return 10,000.  Tweets about Tinder are much more common than Tweets about eHarmony or OkCupid.

My hypothesis was that people who were Tweeting about Tinder were saying different things than the ones Tweeting about eHarmony or OkCupid.  Tinder is a very popular new app that you “play” on your phone:  you are shown a photo of someone who is in your area, and if you like them, you right-swipe.  If that person has also right-swiped on your photo, it declares you’re a match, and you can chat with him or her.  

Tinder has a reputation as an app used to hook up, not for people looking for serious relationships (although a few marriages have come from Tinder matches).  eHarmony, on the other hand, requires a paid membership, and is generally for people looking for more serious commitments/marriage.  eHarmony requires filling in a lengthy questionnaire, and only shows you a few matches at a time.

I took my Tweets and did some data cleanup (removed retweets, got rid of punctuation, capitalization, numbers, etc).  I also tagged the parts of speech, and kept nouns, verbs, and adjectives.  Finally, I grouped by term frequency, and created my tag clouds.  Here’s Tinder:

And here’s eHarmony:

I was hoping to find lots of immediate and interesting differences, but it didn’t turn out to be quite so simple.
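
For anyone curious what the cleanup and term-frequency steps amount to outside KNIME, here is a minimal Python sketch using only the standard library.  It is not the KNIME Text Processing workflow: the part-of-speech filtering is left out, retweets are assumed to start with "RT @", and the sample Tweets are made up.

```python
import re
from collections import Counter


def is_retweet(text):
    """Assumption: a retweet is any Tweet whose text starts with 'RT @'."""
    return text.startswith("RT @")


def clean_tweet(text):
    """Lower-case a Tweet, replace numbers and punctuation with spaces, split into words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def term_frequencies(tweets):
    """Count how often each term appears across all non-retweet Tweets."""
    counts = Counter()
    for text in tweets:
        if is_retweet(text):
            continue
        counts.update(clean_tweet(text))
    return counts


# Made-up sample Tweets, for illustration only.
tweets = [
    "RT @someone: Tinder is fun!",
    "Met 2 great people on Tinder.",
    "Tinder dates: great or awful?",
]
freq = term_frequencies(tweets)
print(freq.most_common(3))
```

The resulting counts are the kind of term frequencies that a tag cloud is drawn from.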

I found that a lot of the words were related to whatever viral story was going around at the time.  For example, in the Tinder results, the words “fast-growing,” “truth”, and “disruptions” were all related to this New York Times story. And the words “Jamy,” “20-something”, and “quest” were from this woman’s blog about 40 dates in 40 nights.

For eHarmony, popular stories people were Tweeting about included “More couples meet on Twitter than eHarmony” (never found a source for that one), “dimensions” (eHarmony has 29 “personality dimensions” to match people), and “commercials”:  mostly negative comments on eHarmony’s latest ad campaign.

I did find a couple of interesting differences, such as the word “kids”:  it was the 86th most popular term for eHarmony Tweets, and 614th for Tinder!  The word “meet” was also much more popular on eHarmony (9th) than Tinder (50th).
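
A rank comparison like the one above can be computed from two term-frequency tables.  The sketch below is one possible way to do it in Python, not the method used in the KNIME workflow, and the counts are invented purely to show the mechanics.

```python
from collections import Counter


def term_ranks(counts):
    """Map each term to its 1-based rank, most frequent first (ties broken alphabetically)."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {term: rank for rank, (term, _) in enumerate(ordered, start=1)}


def compare_ranks(term, counts_a, counts_b):
    """Return the term's rank in each table, or None where the term does not occur."""
    return term_ranks(counts_a).get(term), term_ranks(counts_b).get(term)


# Invented counts, for illustration only.
eharmony = Counter({"meet": 50, "love": 40, "kids": 12})
tinder = Counter({"swipe": 90, "match": 70, "meet": 20, "love": 15, "kids": 1})

print(compare_ranks("kids", eharmony, tinder))
```

Comparing ranks rather than raw counts helps when one site has far more Tweets than the other, as Tinder did here.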

My plan is now to extend the analysis to a larger amount of data, and in particular to observe how the user experience evolves through Tweets over time, for example by collecting data on a weekly basis.  Over a wider timeframe, it should be possible to see short spikes caused by temporary stories, gossip, and events, and to watch longer-lived trends about dating and dating sites emerge.  On this more stable data, I would like to run some analytics to monitor how topics shift over time for each dating site and for dating preferences in general.  It would also be interesting to use machine learning algorithms to predict the future importance of new dating topics, and ultimately the importance and user niche of each of the observed dating sites.

If you want to know how that turns out, come see me in Berlin at the KNIME UGM2015!  Or check out my blog, Love Data, to find out about sweaty t-shirt parties, common problems with online dating sites (and how to fix them), and how wearable tech can help relationships.

Article Author: Cathy Pearl

Cathy Pearl is a user experience consultant who has designed everything from helicopter pilot simulators at NASA to a conversational iPad app that has Esquire Magazine's style columnist tell you what you should wear on a first date.  She has a B.S. in Cognitive Science from UCSD and an M.S. in Computer Science from Indiana University.  She loves to use KNIME to see what people in the online dating world are Tweeting about.  Check out more of her work on her blog, Love Data: http://lovedatablog.wordpress.com.

Requirements

- KNIME Twitter Nodes

- KNIME Text Processing extension

Further Reading:

- @KNIME Twitter Nodes

- Sentiment Analysis