In defense of “nothing interesting”

A tribute to useful, but less interesting research findings

A few years ago, as a Data Scientist, I was presenting an analysis I’d been working on to my co-workers. The presentation went fine and the work was well-received, but I could tell the group was a little underwhelmed. Towards the end of the presentation, one co-worker asked, “Did you find anything that surprised you? Anything we didn’t already know?” I had uncovered some new information, but most of what I’d found was well-aligned with what we already thought to be true.

Still, I understood their sentiment. Any Data Scientist or Researcher will tell you that the most common thing we find when analyzing a dataset is… nothing interesting. It happens constantly. Many of our findings corroborate what we and our business partners already thought to be true, even when we’ve asked the right question. This can be frustrating for Data Scientists, Researchers, and our partners, but finding “nothing interesting” is very different from finding “nothing useful,” and I’m a strong believer that finding nothing interesting after asking the right question is still worthy of celebration.

The Utility-Interest plane

Before diving in, I want to emphasize the difference between a result being interesting and a result being useful. All analytical results (and the questions that spawned them) will fall somewhere in the Utility-Interest plane.

A. Useful results that are also interesting are the holy grail. Findings from these analyses drum up tons of excitement with stakeholders and have the potential to create a huge impact.

B. Useful results that aren’t too interesting are less exciting, but are equally valuable! These are the only types of uninteresting results that are still defensible (the main topic of this post!). Useful, yet uninteresting results often arise when evaluating a hypothesis that everyone had assumed to be true, or when tackling a question that had been answered through other methods in the past.

C. Useless results that aren’t very interesting are just a poor use of time. These come from asking the wrong question, and one to which everyone already knew the answer. These won’t gain much traction with stakeholders, and the primary downside is just wasting your own time.

D. Useless, but interesting results are dangerous. Very dangerous! Useless, yet interesting results arise when finding an exciting answer to the wrong question. Stakeholders can latch onto these findings and invest their own time into addressing a topic that should be lower priority.

By finding “nothing interesting” in the data (i.e., a result in quadrant B) and presenting it to your stakeholders, you’re able to make decisions with more confidence, ask meaningful follow-up questions, and increase stakeholders’ trust in using data in the future.

Knowing when your intuition is right is just as important as knowing when it’s wrong

Asking a question of the data means you’re unsure about something: maybe a course of action to take, the reason behind something happening, or something else. Exploring a dataset and finding no surprises just means that, in this case, your intuition wasn’t too far off.

Even when you and your business partners have some intuition about a problem area, evaluating your hypotheses with data will tell you, with much more certainty, whether your hypotheses were true. Knowing when you’re right is just as important as knowing when you’re not, and by evaluating your hypotheses you’ve learned to either maintain or change course.

Asking meaningful follow-ups

Assuming you asked a worthwhile question of the data, finding “nothing interesting” will help inform what questions you should ask in the future. Any useful finding, whether interesting or not, gives you more confidence in the problem area and refines your area of focus, helping you to ask better questions going forward.
Reinforcing confidence in data

Findings that contradict our intuition can be hard to accept, especially when the findings tell us that not only was our intuition wrong, but that our actions or plans were too. By finding and presenting “nothing interesting,” you help build trust between your stakeholders and the data, making it easier for them to accept information from you in the future, especially when it’s counter to some of their beliefs.

What to do now

Ask the right questions of your data. Of course, the points above only hold true if you’ve asked the right question in the first place. Poor questions can sometimes lead to interesting answers, but the usefulness of these answers will be limited. My favorite way to refine a research question is to brainstorm with a cross-functional group of stakeholders (plus, with this approach, you get stakeholder buy-in at the same time).

Celebrate “nothing interesting.” A finding doesn’t have to be interesting in order to be useful. Next time you find “nothing interesting,” remember to celebrate it.

Further reading

For resources on asking good questions, I really like Asking Great Questions as a Data Scientist by Kristen Kehrer, and How to solve a business problem using data by Laura Ellis. (Please let me know if you have any others!) ...

Feb 19, 2020 · Brian Weinstein

Speaking like a president

Natural language processing on the first 2016 presidential debate

The first debate in the 2016 presidential race was held on September 26. It’s no secret that Clinton and Trump are running on drastically different platforms, but how do they compare when it comes to their speech patterns and word choice?

To quantify this, I dug into the data, using the debate transcript and natural language processing. I measured the sentiment of Clinton’s and Trump’s responses, and examined how emotional their words were throughout the debate. I also looked at each candidate’s most commonly used adjectives. Building off the work of Alvin Chang at Vox, I was also able to examine how the speech patterns of Clinton and Trump each changed when directly responding to the questions and when skirting them.

Sentiment

Using the Google Cloud Natural Language API, I measured the sentiment of each candidate’s answers. The polarity of a response is a measure of how positive or negative it is, and the magnitude indicates how much emotion the words convey. The chart below shows the polarity of each candidate’s responses, weighted by the magnitude.

Trump and Clinton matched each other’s polarity for the first half of the debate, but after his defense of stop-and-frisk around 9:50 PM, Trump’s words became much more negative. Throughout the rest of the debate (during the questions on birtherism, cyber security, homegrown terrorism, nuclear weapons, and Clinton’s looks and stamina) Clinton became more positive and Trump more negative.

The combination of polarity and magnitude gives us the best understanding of each line’s overall sentiment, and each candidate’s most positive and negative responses are posted here.

Braggadocios, and other adjectives

I was also interested in the adjectives each candidate used most frequently during the debate. Using syntax analysis to extract each word’s part of speech, I identified the most-used adjectives of each candidate.
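As a rough illustration of the magnitude-weighted polarity described under Sentiment, here is a minimal Python sketch (the original analysis was done in R, so this is not the author’s code). It assumes “weighted by the magnitude” means the product of the two scores, that polarity lies between -1 and 1, and that magnitude is non-negative. The function name `weighted_polarity` and the sample values are made up for illustration and are not taken from the debate data.

```python
def weighted_polarity(polarity: float, magnitude: float) -> float:
    """Scale a polarity score by its magnitude.

    Assumption: the weighting is a simple product, so a strongly
    emotional response moves further from zero than a mild one.
    """
    return polarity * magnitude


# Hypothetical (speaker, polarity, magnitude) rows; not real debate scores.
responses = [
    ("A", 0.5, 2.0),
    ("B", -0.25, 4.0),
    ("A", 0.0, 3.0),
]

# One weighted score per response, in transcript order.
scores = [(speaker, weighted_polarity(p, m)) for speaker, p, m in responses]

for speaker, score in scores:
    print(speaker, score)
```

Under this reading, a response with neutral polarity scores zero no matter how emotional it is, which is one reason to look at polarity and magnitude together rather than at either alone.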
Answers vs non-answers

As Chang found, the candidates spent a lot of time not answering Holt’s questions: 48% of Clinton’s words and a whopping 69% of Trump’s words were used in non-answers. Using the data Chang compiled, I was able to look at how the candidates’ speech patterns differed when answering and not answering the questions.

Sentence subjects (“I alone can fix it”)

Using part-of-speech tagging, I also identified the subjects of each candidate’s sentences. Clinton was more inclusive in her words, but only when directly responding to questions, where she used the plural “we” more frequently than the singular “I”; the opposite was true when she avoided a response. Trump, on the other hand, was always more likely to use “I” over “we”.

Non-answer phrases

The words each candidate used when directly answering the questions are all, unsurprisingly, highly related to the questions Holt asked. What’s interesting here are the topics the candidates defaulted to when avoiding a response.

A handful of my findings didn’t make it into this post. If you’re interested in more, there’s some additional analysis, including multiple classification models, in the project’s GitHub repo.

The text of this article (excluding this sentence) has polarity -0.4 and magnitude 15.5, so despite my best efforts it’s leaning slightly negative.

Many thanks to Alvin Chang and Vox for their permission to use their annotated transcript, and to Kelsey Scherer for designing the charts and lead image. Analysis was performed in R. Plots were generated using ggplot2, and then styled by Scherer using Sketch. The sentiment scores, part of speech tags, and all of the other NLP datasets can be found in the GitHub repo. ...
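For readers who want to try the “I” versus “we” comparison on their own text, here is a simplified Python sketch. It is not the method used in the post: it counts every occurrence of the two words rather than identifying grammatical subjects with part-of-speech tagging, and it is written in Python rather than R. The helper names and the two sample passages are hypothetical.

```python
import re
from collections import Counter


def pronoun_counts(text: str) -> Counter:
    """Count occurrences of the words "i" and "we" (case-insensitive).

    Simplification: this counts every occurrence, not only sentence
    subjects. Contractions such as "I'm" are split at the apostrophe,
    so they still count toward "i".
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in {"i", "we"})


def we_share(text: str):
    """Fraction of "I"/"we" uses that are "we", or None if neither appears."""
    counts = pronoun_counts(text)
    total = counts["i"] + counts["we"]
    return counts["we"] / total if total else None


# Hypothetical passages standing in for answer and non-answer text.
answer_text = "We need to invest. We will build it, and I will help."
non_answer_text = "I said it first. I always win."

print(pronoun_counts(answer_text), we_share(answer_text))
print(pronoun_counts(non_answer_text), we_share(non_answer_text))
```

Running the same function separately on a speaker’s answers and non-answers gives two shares that can be compared directly, which mirrors the split described in this post.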

Oct 3, 2016 · Brian Weinstein

Mapping the frozen yogurt shop closest to each Manhattan apartment

I love frozen yogurt. When I first moved to New York three years ago, I lived only 1/8 of a mile from the closest froyo shop. The convenience of this 4-minute walk is something I neither appreciated nor utilized enough at the time. After moving to Harlem last year, it’s been harder than ever to satisfy my near-constant craving for this cold candy soup: I’m now a 24-minute walk from the nearest frozen yogurt.

As someone who loves data and has too much time to spare, I decided to find the locations in Manhattan with the highest and lowest froyo density. Inspired by Ben Wellington’s work on I Quant NY, I calculated the distance from every lot in Manhattan to the nearest froyo shop and mapped it out.

The highest density of froyo is right around West 33rd St. and 8th Ave., with three shops within a 1-block radius. The lowest density is right in Harlem. The red circle on the map shows the location farthest from frozen yogurt. The record belongs to 700 Esplanade Gardens Plaza, a co-op right by the 145th St. stop on the 3-train, with a 51-minute trek across Manhattan to the Pinkberry by Columbia.

The map shows all of the froyo shops in Manhattan, and you can click on any lot to find the distance to the closest shop. R code posted here.

All distances in the map are measured using great-circle distance (i.e., “as the crow flies”), computed with the spherical law of cosines. Frozen yogurt locations were found via the Google Places Nearby Search API. The API returned some non-froyo-exclusive shops like Ben and Jerry’s, which I kept in the dataset since they technically serve some frozen yogurt (although we all know these shops don’t really count). I only included froyo shops that were in Manhattan, so some lots may have a closer shop than the one listed if we include those in other boroughs. Manhattan lot locations are from PLUTO. The map was created using CartoDB. Tons of inspiration for this came from Ben Wellington’s work on I Quant NY. ...
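The great-circle calculation mentioned above can be sketched in a few lines. This is a Python illustration, not the R code from the post. It assumes a spherical Earth with a mean radius of 3958.8 miles, and the function names, shop labels, and coordinates are all hypothetical.

```python
import math

# Assumed mean Earth radius in miles (spherical approximation).
EARTH_RADIUS_MILES = 3958.8


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance via the spherical law of cosines.

    Inputs are latitude and longitude in decimal degrees.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_MILES * math.acos(cos_angle)


def nearest_shop(lot, shops):
    """Return (name, distance in miles) for the shop closest to a lot.

    `lot` is a (lat, lon) pair; `shops` is a list of (name, lat, lon).
    """
    return min(
        ((name, great_circle_miles(lot[0], lot[1], lat, lon))
         for name, lat, lon in shops),
        key=lambda pair: pair[1],
    )


# Hypothetical coordinates for illustration only.
shops = [("far shop", 40.80, -73.96), ("near shop", 40.7501, -73.99)]
lot = (40.75, -73.99)
print(nearest_shop(lot, shops))
```

The law-of-cosines form loses precision for points that are very close together, which is why the clamp is there; the haversine formula is a common alternative when most distances are short, as they are within one borough.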

May 31, 2016 · Brian Weinstein