Balancing privacy and ethical rights when detecting harmful behaviors

Over the past few years, companies like Google, Twitter, and Facebook have had to walk a fine line between policing negative behaviors such as hate speech, terrorism, and fake news, and supporting the basic right to free speech online. With the increase in aggressive, terroristic, cyberbullying, hateful, and racist communication, as well as expressions of depression or suicidal thoughts on social media, there is growing pressure and incentive for these organizations and for researchers (like myself) to develop fully or semi-automated methods to detect these negative behaviors. However, as pointed out in my dissertation, balancing the prevention of these harmful behaviors with privacy and ethical rights is a challenge, and it is a conversation that needs to be had.

In my PhD research, for instance, I aimed to build antisocial behavior detection models that could be incorporated into real-world systems and act as early warning systems for security enforcement organizations or institutions like schools. Although the research respected privacy rights, with these types of detection there is always concern over the access and analysis of personal information, and a worry that it might be just a step away from 'big brother'.

With Facebook, with its almost two billion users and sophisticated algorithms, privacy and ethical rights are always a question mark. This year, for example, it introduced a semi-automated depression detection algorithm that analyzes posts, spots patterns that are potentially suicidal, and sends them to the Facebook team for an appropriate response [1, 2]. Previously, this identification process was entirely manual.

It is true that a lot of data is public and that users on certain platforms give away the rights to their data. This makes it easier to develop these detection algorithms; however, a Facebook or Twitter user may not wish to have their data analyzed for purposes they have not consented to. This was observed, for instance, when it was reported in June 2014 [3] that Facebook had manipulated users' feeds to influence their moods and then observed how that manipulation translated into their status updates. Even though one agrees to Facebook's data policy when using the application, people are often not aware of what is being done with their data, and it is hard to draw the line between what is acceptable and what infringes on privacy rights. An argument could be made, though, that we must simply accept that anything posted on social media platforms is not private.

Ethical and privacy issues raise real concerns that affect the broad enterprise of developing and using algorithms to detect harmful behaviors. If data is public or accessible, should researchers or organizations still ask for consent to use the data for research or development purposes? If yes, from whom should consent be obtained? With so much data available, it is the task of researchers and practitioners to make sure that the data acquired is used only for good. But how do we measure what is good and what might be considered invasive or an infringement of free speech? It might be argued that if data and technology are used to prevent crimes or improve quality of life, then their use should be allowed. Under that reasoning, however, organizations have an excuse to automatically monitor and prevent the posting or sharing of messages that are deemed harmful from their perspective. With such policing, one might wonder whether uprisings like those in Egypt in 2011, which made use of social media to organize, schedule, and spread the protests, would have been possible [4, 5].

It is hard to see what a good solution would be, but to take advice from Ray Kurzweil [6], perhaps the answer to these ethical and privacy concerns is a set of standards established through a broad social discussion between technologists and society, within and across different societies.

 

References:

[1] Matt Burgess, (6 March 2017), How tech giants are using AI to prevent self-harm and suicide, Wired, http://www.wired.co.uk/article/facebook-safety-self-harm-suicide-ai-instagram, (visited on 2017-04-27).

[2] Natt Garun, (1 March 2017), Facebook leverages artificial intelligence for suicide prevention, The Verge, http://www.theverge.com/2017/3/1/14779120/facebook-suicide-prevention-tool-artificial-intelligence-live-messenger, (visited on 2017-04-27).

[3] Kashmir Hill, (28 June 2014), Facebook manipulated 689,003 users' emotions for science, Forbes, https://www.forbes.com/sites/kashmirhill/2014/06/28/facebook-manipulated-689003-users-emotions-for-science/#4736d97197c5, (visited on 2017-04-27).

[4] Erick Schonfeld, (16 February 2011), The Egyptian behind #Jan25: "Twitter is a very important tool for protesters", TechCrunch, https://techcrunch.com/2011/02/16/jan25-twitter-egypt/, (visited on 2017-04-29).

[5] Sam Gustin, (11 February 2011), Social media sparked, accelerated Egypt’s revolutionary fire, Wired, https://www.wired.com/2011/02/egypts-revolutionary-fire/ (visited on 2017-04-29)

[6] Bill Joy and Ray Kurzweil, (12 July 2001), Future shock: High technology and the human prospect, Hoover Institution, http://www.hoover.org/research/future-shock-high-technology-and-human-prospect, (visited on 2017-04-27).

Detecting antisocial behavior in text

The words we use and our writing styles can reveal information about our preferences, thoughts, emotions and intentions. Using this information, I developed machine learning models that can detect antisocial behaviors, such as hate speech and indications of violence, from texts, as part of my recently defended PhD dissertation, titled “Leveraging emotion and word based features for antisocial behavior detection in user-generated content.”

Historically, most attempts to address antisocial behavior have come from educational, social, and psychological perspectives. My PhD research, however, demonstrated the potential of natural language processing techniques for developing state-of-the-art solutions to detect antisocial behavior in written communication.

The research created solutions that can be integrated into web forums or social media websites to automatically or semi-automatically detect potential incidences of antisocial behavior with high accuracy, allowing fast and reliable warnings and interventions to be made before possible acts of violence are committed.

One of the great challenges in detecting antisocial behavior is first defining what precisely counts as antisocial behavior and then determining how to detect such phenomena. Thus, using an exploratory and interdisciplinary approach, I applied natural language processing techniques to identify, extract, and utilize the linguistic features, including emotional features, pertaining to antisocial behavior.
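To make the general approach more concrete, below is a minimal sketch of the kind of supervised text-classification pipeline such work builds on. It is not the dissertation's actual model: plain word n-gram TF-IDF features stand in for the richer lexical and emotion-based features used in the research, and the `texts`/`labels` inputs are a hypothetical annotated corpus.

```python
# A minimal sketch of a supervised antisocial-behavior text classifier.
# Not the dissertation's actual model: TF-IDF word n-grams stand in for the
# richer lexical and emotion features, and the inputs are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def train_antisocial_classifier(texts, labels):
    """Train and evaluate a classifier on labelled posts (1 = antisocial, 0 = not)."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=42, stratify=labels)

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2, lowercase=True),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))
    return model
```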

The research investigated emotions and their role or presence in antisocial behavior. Literature in the fields of psychology and cognitive science shows that emotions have a direct or indirect role in instigating antisocial behavior. Thus, for the analysis of emotions in written language, the research created a novel resource for analyzing emotions. This resource further contributes to sub-fields of natural language processing, such as emotion and sentiment analysis.

A further problem in researching antisocial behavior in written language was that no adequate collection of texts existed, so the research also created a novel corpus of antisocial behavior texts. The corpus allowed, and will continue to allow, deeper insight into how antisocial behavior is expressed in written language.

The study showed that natural language processing techniques can help detect antisocial behavior, which is a step towards its prevention in society. With continued research on the relationships between natural language and societal concerns and with a multidisciplinary effort in building automated means to assess the probability of harmful behavior, much progress can be made.

Doctoral dissertation is available for download at: http://epublications.uef.fi/pub/urn_isbn_978-952-61-2464-3/index_en.html

In the press:

Hilary Lamb (13th April 2017), Computers taught to recognise hate speech and violent language, Engineering and Technology, https://eandt.theiet.org/content/articles/2017/04/computers-taught-to-recognise-hate-speech-and-violent-language/

University of Eastern Finland (12 April 2017), New machine learning models can detect hate speech, violence from texts, ScienceDaily, www.sciencedaily.com/releases/2017/04/170412091222.htm

Terhi Nevalainen, (11th April 2017), Tietokone voi tunnistaa terroristin [A computer can identify a terrorist], Karjalainen, http://www.karjalainen.fi/uutiset/uutis-alueet/kotimaa/item/138639-tietokone-voi-tunnistaa-terroristin

Trump speech generation using Markov chains

Moving from Trump speech analysis (see previous post) to speech generation, in this post I investigate the possibilities of automatically generating text that could plausibly have been spoken by Trump. For the generation, I experimented with Markov chains. Markov chains, named after Andrey Markov, are memoryless mathematical models that describe sequences of possible events, where the probability of each event depends only on the current state. They are used in several real-world applications such as autocomplete suggestions, speech recognition, text identification, path recognition, and many other artificial intelligence tools.

A Markov chain gives the probability of transitioning from one state to any other state. In text generation, a Markov chain learns the probability of the next word or character given the previous one, two, or more words or characters. In this post, I will not delve into the details and mathematical formulation of Markov chains. For a visual description of how Markov chains work, see Victor Powell's site.

As this was my first practical experiment with Markov chains, I began with Python code written by Nevo [1], which I then slightly modified for the purpose of generating Trump-like speeches.
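Since the exact code is Nevo's, here is only a minimal sketch of the idea, assuming the collected speeches have been concatenated into a hypothetical file trump_speeches.txt:

```python
# A minimal word-level Markov chain text generator (not Nevo's original code).
import random
from collections import defaultdict


def build_chain(words, order=3):
    """Map every n-gram (tuple of `order` words) to the words that follow it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, order=3, length=150):
    """Start from a random n-gram and sample successors until `length` words."""
    state = random.choice(list(chain.keys()))
    output = list(state)
    while len(output) < length:
        successors = chain.get(state)
        if not successors:  # dead end: restart from a random state
            state = random.choice(list(chain.keys()))
            continue
        output.append(random.choice(successors))
        state = tuple(output[-order:])
    return " ".join(output)


words = open("trump_speeches.txt", encoding="utf-8").read().split()
chain = build_chain(words, order=3)
print(generate(chain, order=3, length=150))
```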

I started by creating a collection of Trump's speeches. The collection included words from his inauguration speech and his first and final presidential debates. In total, the text collection consisted of 16,667 words. I first trained a Markov chain model on unigrams (one word), then on bigrams (two words) and trigrams (three words), continuously increasing the n-gram order. The model was then given a random word or words to begin from and generated the rest. I limited the length of the generated text to 150 words.

Below are the results. What do you think, do the following texts sound like a speech made by Trump?

A: Text generated from model learned with unigrams

“I am a brilliant community, you tried to happen. And what do not show that. It’s an airplane on to that. I mean, I told them $1,500, and peaceful transfer of state when you right around election time, including a small group came out about young and that’s true throughout the true way you find this? Is it was so bad. First of our country. And I am going to Aleppo? OK, so many different reality exists: mothers and I should have taken the war in every year. You haven’t even worse, and I can’t do damage to be obsolete, because of the likes of them to have led our country, whether a routine audit. And I’ll go forward to myself, “I can’t bring into the clips where so much tougher and look — now, and other country, you would be a disaster.”

B: Text generated from model learned with bigrams

“We share one heart, one home, and one glorious destiny. The oath of allegiance to the manufacturing industry. You go to Mexico and other countries, while allowing for the presidency of the people she wants to make America great again. Thank you. Today’s ceremony, however, has very special meaning because today, we are not keeping up with the nations of the greatest assets anywhere in the world, beyond the United States got along well, that would help you, I would like to… It’s not an accurate one at all. It’s not an accurate one at all. It’s not nice. And I think I’ve ever seen in any government whatsoever. Wrong. CLINTON: … it is… Lester, how much? Typical politician. All talk, no action. Sounds good, doesn’t work. Never going to create new deals.”

C: Text generated from model learned with trigrams

“We, the citizens of America, are now joined in a great national effort to rebuild our country and our economy generally is — look, we owe $20 trillion. We cannot do it any longer, Lester. HOLT: Back to the question, though. How do you bring back — specifically bring back jobs, American manufacturers? How do you miss $6 billion? You ran the State Department, $6 billion was missing. How do you bring back — specifically bring back jobs, American manufacturers? How do you miss $6 billion? You ran the State Department, $6 billion was missing. How do you make them bring the jobs back? Well, the first thing you do is don’t let the jobs leave. The companies are leaving. I could name, I mean, there are thousands of them. They’re leaving, and they’re leaving in bigger numbers than ever.”

As can be seen, the text generated by the model trained on unigrams (A) is nonsensical. With the bigram-trained model (B), the text starts getting a bit better, but still does not make much sense. With the trigram-trained model (C), the text begins to sound like something Trump would say.

From my experiments, trigrams were the highest order possible before the generated text became direct quotations of the training text, at which point the models were no longer generating text per se. I also tried generating longer texts of more than 150 words, but the results were nonsensical. This is because Markov chain models have no memory beyond the set n-gram, so new sentences may be generated on topics unrelated to prior sentences.

Based on this experiment, I would say that Markov chains are not the best approach for generating speeches of considerable length, and future experiments will involve identifying better approaches. For short texts, like tweets, I have observed good results being achieved by recurrent neural networks. As an example, the Trump Twitter bot @DeepDrumpf produces quite Donald Trump-sounding tweets, for instance: "America has never been more harmed by the vote. I made a lot of money on that. I am doing big jobs in places, now everything is Benghazi." Thus, the most likely next experiment will be to try recurrent neural networks for generating longer texts.
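As a pointer for that future experiment, the sketch below outlines a basic character-level LSTM text generator in Keras. This is not @DeepDrumpf's model, just the standard recipe, again assuming a hypothetical trump_speeches.txt file; the hyperparameters are illustrative.

```python
# A minimal character-level LSTM sketch (Keras); hyperparameters are illustrative.
import numpy as np
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

text = open("trump_speeches.txt", encoding="utf-8").read().lower()
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}

# Cut the text into fixed-length sequences, each predicting the next character.
seq_len, step = 40, 3
sequences, next_chars = [], []
for i in range(0, len(text) - seq_len, step):
    sequences.append(text[i:i + seq_len])
    next_chars.append(text[i + seq_len])

# One-hot encode inputs and targets.
X = np.zeros((len(sequences), seq_len, len(chars)), dtype=bool)
y = np.zeros((len(sequences), len(chars)), dtype=bool)
for i, seq in enumerate(sequences):
    for t, ch in enumerate(seq):
        X[i, t, char_to_idx[ch]] = 1
    y[i, char_to_idx[next_chars[i]]] = 1

model = Sequential([
    LSTM(128, input_shape=(seq_len, len(chars))),
    Dense(len(chars), activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, batch_size=128, epochs=20)

# Generate text by repeatedly sampling the next character.
seed = text[:seq_len]
generated = seed
for _ in range(600):
    x = np.zeros((1, seq_len, len(chars)), dtype=bool)
    for t, ch in enumerate(seed):
        x[0, t, char_to_idx[ch]] = 1
    probs = model.predict(x, verbose=0)[0].astype("float64")
    next_ch = chars[np.random.choice(len(chars), p=probs / probs.sum())]
    generated += next_ch
    seed = seed[1:] + next_ch
print(generated)
```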

 

Source:

[1] Omer Nevo (2016). Poetry in Python: Using Markov Chains to generate texts. https://www.youtube.com/watch?v=-51qWZdA8zM

Trump’s usage of adjectives and adverbs

By now, many of us have heard President Donald Trump's speeches, or at least snippets of them. One thing I have noticed, among many others, is that he tends to use a lot of adjectives and adverbs, or at least I always get the impression that there are many of them. Most likely, it is just that the same short adjectives and adverbs are repeated over and over, which makes it sound like he uses them a lot, for example in the following phrases: "…build a very huge wall"; "It's going to be really great"; "so sad, tremendously sad, greatest sadness ever."

Since the overuse of adjectives and adverbs can be seen as embellishment and can clutter sentences pointlessly, especially in formal speeches, I was curious about how Trump actually uses adjectives and adverbs in comparison to, for instance, former president Obama, whose speeches have been said to be more eloquent.

To investigate, I compared transcripts of the inauguration speeches by Trump (2017) and by Obama (2009, 2013), and of their first news press conferences as president-elect (Trump in 2017 and Obama in 2008). The press conference transcripts included the president-elects' answers to questions posed by reporters. The table below summarizes the number of words in each speech. From the table, we can observe a greater range in the number of words used by Trump between his first news press conference and his inauguration speech, while Obama's three speeches are all roughly the same length.

Table: Total number of words in presidential inauguration speeches.

For the analysis of the adjectives and adverbs, I made use of TreeTagger, a tool for annotating text with part-of-speech tags. Part-of-speech (POS) tagging is the process of marking up words in a text with their part of speech, e.g., noun, verb, adjective, adverb, etc. After performing the part-of-speech tagging, I retrieved, for each speech, only the word–POS pairs where the POS tag was an adjective or adverb. From the retrieved list, I performed a comparative analysis of the usage of adjectives and adverbs between Trump and Obama in their inauguration speeches (A) and in their first news press conferences as president-elect (B).
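For readers who want to reproduce a similar count without installing TreeTagger, the rough sketch below does the same kind of extraction with NLTK's Penn Treebank tagger (JJ* tags cover adjectives, RB* adverbs); the transcript file name is a placeholder.

```python
# A rough NLTK-based sketch of the adjective/adverb share of a speech.
# The analysis in this post used TreeTagger; this is only an approximation.
import nltk

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")


def adjective_adverb_share(path):
    """Return (adjective %, adverb %) of the alphabetic tokens in a transcript."""
    text = open(path, encoding="utf-8").read()
    tokens = [t for t in nltk.word_tokenize(text) if t.isalpha()]
    tagged = nltk.pos_tag(tokens)
    adjectives = [w for w, tag in tagged if tag.startswith("JJ")]
    adverbs = [w for w, tag in tagged if tag.startswith("RB")]
    total = len(tokens)
    return 100 * len(adjectives) / total, 100 * len(adverbs) / total


# Hypothetical file name for the 2017 inauguration transcript.
adj_pct, adv_pct = adjective_adverb_share("trump_inauguration_2017.txt")
print(f"adjectives: {adj_pct:.2f}%  adverbs: {adv_pct:.2f}%")
```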

A. Inauguration speeches

The figures below show the distribution of adjectives and adverbs in each inauguration speech, as a percentage of the total words in the speech. From the figures, we can see a higher use of adjectives and adverbs in Trump's inauguration speech than in either of Obama's 2009 and 2013 speeches. Interestingly, Obama's use of adjectives and adverbs remained relatively unchanged across his two speeches.

 

From the results alone, it is difficult to judge whether they indicate under-, normal, or over-usage of adjectives and adverbs. Thus, to get some indication, I included in the figures a LIWC mean score for both adjectives and adverbs. The LIWC mean score was obtained from the popular Linguistic Inquiry and Word Count (LIWC) text analysis tool. The tool includes a dictionary of words built from analyzing over 100,000 files of text, representing over 250 million words. In building the dictionary, it was identified that, on average, adjectives constituted 4.49% of the total words and adverbs 5.27%. From the figures, we can observe that both Trump and Obama used a considerably higher proportion of adjectives than the LIWC mean, while for adverbs, Obama is around the average and Trump is visibly above it.

Unfortunately, since the majority of the text files used in developing the LIWC tool were not political speeches, a broader comparison with other political speeches would be needed to identify what is 'average' in the political context and where Trump's speeches fall.

Exactly which adjectives and adverbs Trump uses was the focus of my next analysis. The figure below reveals the top 20 adjectives and adverbs most frequently used by Trump in his inauguration speech. Using those same 20 words, I identified how frequently they appeared in Obama's speeches. The results are also shown in the figure.

From the figure, we can see that Trump's speech made heavy use of the words "Americans," "again," "back," and "great." This reflects the themes of his inauguration speech, which was America-centric and focused on making America great again and on bringing things back to Americans.

Surprisingly, one of the most frequent terms in the three inauguration speeches is the adverb "not", with Obama using it more often in his speeches than Trump. For example, it was used in Obama's 2013 speech in the following phrases: "our journey is 'not' complete until our wives…"; "'not' complete until our gay brothers…"; "'not' complete until no citizen…"

The words "new" and "now" were also emphasized in all the inauguration speeches, perhaps indicating the presidents' desire to bring in new things now, or to bring in change as president.

B. News press conference speeches

Moving back in time from the inauguration speeches to the then president-elects' first news press conferences, I analyzed how they used adjectives and adverbs. The analysis revealed that in his press conference, Trump's words were 5.95% adjectives and 7.79% adverbs, while Obama's were 6.97% adjectives and 5.61% adverbs.

In addition, I also looked at the top ten most frequent adjectives and adverbs. These are shown in the figures below.

From the figures, we can see differences in the usage of adjectives and adverbs. In particular, the adverb "very" is used significantly more by Trump than by Obama. It was used by Trump in phrases such as "I am going to work very hard"; "I'm very proud…"; "I look very much forward"; "…going to have a very, very elegant day." From the press conference results, we can also see a difference in the length of the adjectives and adverbs used: the average length for Trump is 4 characters, while for Obama it is 5.8 characters.

In summary, this analysis has mostly served to reveal the actual usage of adjectives and adverbs in Trump's speeches. It is interesting to see, for instance, the change in the top adjectives and adverbs used by Trump from the press conference to the inauguration speech. Notably, the adverb "very" was used significantly less in the inauguration speech. In addition, across all the speeches analyzed, Trump tended to use more adverbs relative to adjectives when compared to Obama. However, due to the small sample of speeches analyzed, it is not possible to draw any firm conclusions; further studies would need to be conducted.

 

Sources:

Trump’s 2017 inauguration transcript – https://www.washingtonpost.com/news/the-fix/wp/2017/01/20/donald-trumps-full-inauguration-speech-transcript-annotated/?utm_term=.7c244dd73119

Trump’s 2017 news press conference transcript – https://www.nytimes.com/2017/01/11/us/politics/trump-press-conference-transcript.html?_r=0

Obama’s 2013 inauguration transcript – https://www.theguardian.com/world/2013/jan/21/barack-obama-2013-inaugural-address

Obama’s 2009 inauguration transcript – http://abcnews.go.com/Politics/Inauguration/president-obama-inauguration-speech-transcript/story?id=6689022

Obama’s 2008 news press conference transcript – http://www.washingtonpost.com/wp-dyn/content/article/2009/03/24/AR2009032403036.html

Any change in diversity with the 2017 Oscar nominations?

When the 2015 and 2016 Academy Award nominations were released, many in Hollywood and on social media were deeply offended by the lack of racial diversity among the nominees, especially in the prominent categories of best actor/actress and best supporting actor/actress, where only white actors and actresses were nominated. So did any changes take place in 2016 that are reflected in the 2017 nomination list, especially in the representation of people of color?

After the 2015 nomination announcement, the #OscarSoWhite and #OscarNorms hashtags began trending on Twitter, as that was the first time since 1998 that no black, Hispanic, or Asian actor was nominated for an Academy Award in the acting categories. When the 2016 nominations were released and the acting categories again included only white nominees, the #OscarSoWhite hashtag resurfaced, with an outcry on social media and a boycott of the Oscars spearheaded by Jada Pinkett Smith and Spike Lee. The outcry after the 2016 nominations also resulted from many feeling that 2015 had produced material worthy of nominations and that the Academy had passed over well-reviewed performances in movies such as Creed and Straight Outta Compton and overlooked prominent actors of color like Idris Elba (Beasts of No Nation), Michael B. Jordan (Creed), Will Smith (Concussion), and the many young actors in Straight Outta Compton. Even when these movies did receive nominations, only white people were nominated. For example, for the movie Creed, only Sylvester Stallone was nominated, while the film's black writer-director, Ryan Coogler, and the lead actor, Michael B. Jordan, were not.

Because of the upset over the 2016 nominations, Academy president Cheryl Boone Isaacs ushered in new membership rules and added 683 new members in an effort to diversify a predominantly white, male, and elderly group. The Academy now numbers 6,687 people. These are the first Oscars voted on since this change [source].

So did 2016 bring new opportunities for diversity at the Oscars? The 2017 Oscar nominations were released last week, on the 24th of January, and in this post I analyze the changes, if any, in this year's nominations compared to those of 2015 and 2016. I only analyze the nominees in the acting and directing categories, because many on social media also felt that directors of color had been passed over by the Academy, for example Ava DuVernay for her work on Selma.

Table below summarizes the nominees for the 2017 Oscars in the four acting categories.

In comparing the above nominees to those of the previous two years, the figure below shows the ethnic composition of the nominees in the four acting categories. The figure captures the distribution of white nominees (blue), people of color (orange), and other ethnicities (grey).

From the figure above, we see that there has been a positive change in the number of nominations for people of color in 2017 compared to 2015 and 2016, especially in the best supporting actress category, where the majority of the actresses nominated are black.

I further looked at the changes, if any, among the nominated directors. These include directors nominated in the Best Director category and the directors of the documentaries included in the Best Documentary Feature category. Figure below gives the distribution of white people (blue), people of color (orange) and other ethnicities (grey) in the years 2015 to 2017.

From the figure above, we see that the 2017 nomination list had a higher representation of persons of color than 2015 and 2016, especially in the Best Documentary Feature category. Notably, four out of the five directors of the nominated feature documentaries are persons of color: Ava DuVernay, Raoul Peck, Roger Ross Williams, and Ezra Edelman. However, among the nominated directors of both films and documentaries, males still dominate, with only two female directors in 2015 (Laura Poitras and Rory Kennedy), one in 2016 (Liz Garbus), and one in 2017 (Ava DuVernay), all in the Documentary Feature category. Ava DuVernay is in fact the only director who is both female and a person of color across the three years analyzed.

So, has there been more diversity among the 2017 nominees? Based on the analysis above, this year's nominations definitely have a higher representation of persons of color in the acting and directing categories than the previous two years, with a total increase from zero nominees in 2015 and 2016 to 11 nominees in 2017 (see figure below). The biggest change is in the best supporting actress and best documentary feature categories. However, there is still only a small representation of other ethnicities, even in the 2017 nominations.

There is no reason, however, to attribute these changes to the social media outcry or the boycott. Many of these movies have been years in the making, and all the actors and actresses are on those lists on their own merits. More good-quality movies directed, produced, and acted by a diverse set of people will lead to more diversity in the Oscar nominations.

 

(P.s: If you notice any mistakes in the data above please let me know.)


Using content imagery experiments to increase Netflix viewership

If you are like me, you may have experienced long, tiresome moments browsing and searching for something interesting to watch on Netflix. The majority of this browsing is usually constant scrolling up and down through the images of the content until something catches my eye. When it does, I usually read the synopsis of the title, the actors, and the rating to see if it's worth a try. But often, nothing catches my interest and I quit or settle on an old Friends episode.

Since Netflix is a data-savvy company, I wondered what it did with this user behavior and whether it performed any experiments to see how the imagery affects what content is watched.

Following up on this, I stumbled upon two articles from Netflix detailing the experiments they run with content imagery as a way to improve viewership. Netflix knows that, with the large amount of content they provide, they have a short time to capture the attention and interest of users, particularly since the human brain can process an image in as little as 13 milliseconds [1]. They also understand that the imagery of their content is the most efficient and compelling way to do so [2]. In one of their consumer research studies, done in 2014, they found that imagery was the biggest influence on a user's decision to watch content and that it constituted over 82% of users' focus while browsing. They also found that users spent an average of 1.8 seconds considering each title available on Netflix, which is a very, very short time.

Knowing this, Netflix conducts several experiments to try to find the 'right' imagery that captures the attention and interest of users in that short amount of time. The experiments are usually run using A/B testing (explained here). During the experiments (detailed more here), Netflix collects various measurements such as click-through rate, aggregate play duration, fraction of plays with short duration, fraction of content viewed, etc. [3].
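As a rough illustration of what comparing two artwork variants in such an A/B test can involve (this is not Netflix's actual methodology), the sketch below runs a two-proportion z-test on click-through rates with made-up counts.

```python
# A sketch of comparing click-through rates of two artwork variants with a
# two-proportion z-test. The counts are invented for illustration only.
from math import sqrt

from scipy.stats import norm


def ab_test_ctr(clicks_a, views_a, clicks_b, views_b):
    """Return the z statistic and two-sided p-value for CTR(A) vs CTR(B)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return z, p_value


# Hypothetical numbers: variant B's artwork gets a higher click-through rate.
z, p = ab_test_ctr(clicks_a=480, views_a=12000, clicks_b=560, views_b=12000)
print(f"z = {z:.2f}, p = {p:.4f}")
```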

In this post, however, I would like to share some of their key learnings from conducting imagery experiments as a way to improve their service offering [2].

1. Images showing faces with complex emotions outperform stoic or benign expressions. Netflix identified that seeing a range of emotions actually compels people to watch content more, possibly because complex emotions convey a larger amount of information about the tone or feel of the content. This was observed, for instance, in their testing of the 'right' image for the second season of Unbreakable Kimmy Schmidt.

[Image: Unbreakable Kimmy Schmidt artwork variants]

2. Regional differences still exist and matter for some content and imagery. A good example of the importance of regional presentation of content, and how it can affect its discovery among users around the world, was the Sense8 TV show. Sense8 has an international cast and storylines that give it a diverse appeal and make it resonate with varying types of people. When developing the imagery for Sense8, this diversity was reflected in the final images, which varied considerably between different countries and cultures.

[Image: Sense8 artwork variants by region]

3. Personally, I have thought that villains, more than heroes, make or break a movie, especially action movies. So it was not surprising to me that one of Netflix's findings was that using visible, recognizable characters (and especially polarizing ones) results in more engagement. In particular, Netflix found that their users respond surprisingly well to villainous characters in both the kids and action genres. For instance, in Dragons: Race to the Edge, the two images of villainous characters seen below significantly outperformed all others.

[Image: Dragons: Race to the Edge artwork variants]

4. Three is apparently the upper limit of the cast size to show in the imagery. This is particularly true for small-sized artwork, where a large cast is not as effective in helping users decide to play a title; for huge billboards the opposite might hold. During experimentation, Netflix observed a drop-off in engagement when an image contained more than three people. This finding directly informed their imagery for Orange Is the New Black.

[Image: Orange Is the New Black, Season 1 artwork]

[Image: Orange Is the New Black, Season 2 artwork]

[Image: Orange Is the New Black, Season 3 artwork]

Based on these experiments, it is clear to Netflix that using better images to represent content significantly increases overall streaming hours and engagement from their members [2]. Many of the above four findings are intuitively known by many of us Netflix users, but it is always interesting to read what companies are doing to improve their offering and product experience.

Sources:

[1] Trafton, Anne (16-01-2014), In the Blink of an Eye. MIT News. Retrieved on 23rd July 2016 http://news.mit.edu/2014/in-the-blink-of-an-eye-0116

[2] Nelson, Nick (03-05-2016). The Power of a Picture. Netflix Media Center. Retrieved on 26th June 2016, https://media.netflix.com/en/company-blog/the-power-of-a-picture

[3] Krishnan, Gopal (03-05-2016), Selecting the Best Artwork for Videos through A/B Testing. The Netflix Tech Blog. Retrieved on 26th June 2016, http://techblog.netflix.com/2016/05/selecting-best-artwork-for-videos.html

Keep data as a tool and the brain as the decision maker

Having worked in the area of data and text analysis and experiment-driven software development for a couple of years now, combined with my occasional enjoyment of well-made TV shows, I naturally found myself listening to Sebastian Wernicke's TED talk to the end. The talk is about using data to make (smart) decisions. I found it quite relevant in today's world of big data and increased availability of and access to software usage data. Big data has rapidly moved into many real-life decision-making processes in the workplace, law enforcement, medicine, etc., where serious decisions are being driven or aided by data.

What I liked about Wernicke's TED talk, titled 'How to use data to make a hit TV show', was that it was a reminder that even though access to huge amounts of data has opened up many opportunities in various fields, data is still just a tool, and decisions should not be driven solely by it. Wernicke emphasizes that the thing between our ears, i.e., the brain (provided it has the expertise to make sense of what the data is saying), should be the driver of decision-making.

In the talk, Wernicke illustrates his point with two examples from two very competitive and data-savvy companies, Amazon and Netflix, which both collect and analyze millions of data points from their customers. In one example, big data was used successfully (Netflix with the House of Cards TV show), and in the other, not so successfully (Amazon in its creation of the Alpha House TV show).

As Wernicke explains, when Amazon wanted to create a TV show, they started by taking in various ideas from people. From those ideas, they selected eight TV show candidates. They then made a first episode of each of these eight shows and put them online for free for the public to watch. Roy Price (Head of Amazon Studios) and the team at Amazon recorded everything: when somebody pressed play, pressed pause, what parts they skipped, what parts they repeated, and so on. They collected millions of data points because they wanted to use them to decide which show to make. They did all the data crunching on those millions of collected data points and an answer emerged: "Amazon should do a sitcom about four Republican US Senators." And they made that show. They used the data to drive their decision-making and ended up with a show that was not so successful: Alpha House (IMDb score: 7.6).

Wernicke compares this not-so-successful case of Alpha House to the more successful story of Netflix's creation of House of Cards (IMDb score: 9.0). Netflix's approach was to start by looking at all the data they already had about their viewers, such as viewer ratings, viewing history, and so on. They then used that data to explore and discover little bits and pieces about the audience: what kinds of shows viewers liked, the producers they liked, the actors, etc. With all these pieces collected, they took a risk and decided to license a drama series about a single senator: House of Cards.

Decision-Making

Wernicke uses these two examples to explain the difference in how data can be used to make decisions. He explains that whenever we, as humans, solve complex problems, we are essentially doing two things. The first is to break the problem into bits and pieces so that we can deeply analyze each of them. The second is to put all these bits and pieces back together again to come to a solution, and sometimes this is an iterative process. Data and data analysis are only good for the first part: no matter how powerful they are, they can only help us take a problem apart and understand its pieces. They are not suited to putting those pieces back together again to reach a conclusion. Wernicke points out that there is another tool that can put the pieces back together, and it is available to all of us: the brain. One of the things the brain is good at is putting bits and pieces back together, even with incomplete information, and coming to a good conclusion. But to come to a good conclusion, the brain has to have some expertise.

Of course, data helps us see what we might otherwise miss and find new avenues or business ideas. Thus, in our data analysis work, we must get the balance right. Yes, "more data is better and can deliver brilliant insights, but in the end it has to be integrated by expert human brains for complex issues like producing a brilliant TV show", Wernicke states. Humans as decision makers still need to be part of the data analysis equation, which also means that data scientists need the expertise to draw conclusions that are right more often than not, and to know which risks to take, Wernicke adds. The sentiment of the brain as the decision maker was also echoed by Beverly Wright, executive director of the business analytics center at Georgia Tech, in a keynote panel discussion at the Global Big Data Conference [Source].

From the two examples that Wernicke gives, drawing the right conclusions as a data scientist, or knowing when to take risks, comes with practice. Both companies have learned a lot since those two TV shows were created. For example, this year, 2016, Amazon had two original series nominated for the Golden Globe Awards.