
Five Essential Points on Data Visualization

Data visualization is more than taking data and mixing it with some dazzling art skills. It’s about behavior change and context. You’d better understand your goal and what your users need before you build an analytics dashboard or dataviz. Such a dashboard is a living, continuously evolving thing, not an art piece sitting in a museum (even though it often ends up like that). The following points explain what makes data visualization truly valuable.

It takes you 8 minutes to read.

1. How Not to Data Visualize

At its most common, a data visualization project consists of some data, a vague idea of the user’s actual needs, and a strong conviction that what you’re doing is perfect for the problem. Well, in 99.99% of cases it’s not.

The process looks roughly like this.

It’s very easy to take data, mix it with some sort of “black magic” (read: JavaScript) and have everyone look at it in awe. But what about after the presentation? After the first 10 uses? After a year? What about when something needs to change rapidly (because NOBODY gets the spec right the first time; analytics dashboards are a moving target because user needs are)?

The number one reason for frustration (for users) and failure (for businesses) in data visualization is “data puking”. Don’t just puke data; tell a story that helps people take action (change behavior).

The basic principle is that you will generally see the value in your work more readily than anyone else does. Be very mindful of that.

2. Make it Goal Based

Around 2008 I led a team of young turks who shared a goal: to start a new era in web analytics. At that point, Omniture had not yet been acquired, and Google Analytics was really bad. I mean really bad. Three of us (STATSIT, Nuconomy and KissMetrics) were trying to change things and show the big boys how to do it. I think to some extent we did show them, at least Google. Below is a screenshot from the keynote speech that Avinash Kaushik, then Google’s head evangelist for analytics, gave at the Google Analytics Summit in 2009. In it you can see the goal-setting view of STATSIT.

As the screen shows, almost 10 years ago we had built-in goal setting for ‘behaviors’ and ‘long term’ goals, in addition to what were more commonly referred to as conversions. While ‘behaviors’ have by now become part of the analytics toolkit to a large extent, there is still much room for improvement in ‘long term’ goal tracking. This is just one example of how analytics is nowhere near as goal-driven as it could and should be.

You could follow these steps to avoid headaches later:

1. Identify the user’s current goals
2. Make sure that the visualizations align with those goals
3. Create a simple prototype first
4. After some use, find out what the goals are now
5. Make changes rapidly (this will make users want it badly)
6. Keep iterating
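One way to make steps 1, 2 and 4 concrete is to write down every dashboard view next to the user goal it serves, and to treat any view without a current goal as a candidate for removal. The sketch below assumes a plain list of views and a set of goal strings; the `View` class, the `audit` function and all goal and view names are hypothetical, and this is not a feature of any particular analytics tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class View:
    """One chart or table on a dashboard (hypothetical structure)."""

    name: str
    goal: str | None  # the user goal this view serves, or None if nobody knows


def audit(views: list[View], user_goals: set[str]) -> dict[str, list[str]]:
    """Compare dashboard views with the user's current goals.

    Returns the views that serve no current goal (candidates for removal)
    and the goals that no view addresses yet (candidates for a prototype).
    """
    served = {v.goal for v in views if v.goal in user_goals}
    return {
        "views_without_goal": [v.name for v in views if v.goal not in user_goals],
        "goals_without_view": sorted(user_goals - served),
    }


# Hypothetical goals and views for one pass through steps 1, 2 and 4.
goals = {"reduce churn", "grow trial signups", "improve onboarding"}
dashboard = [
    View("Weekly churn by plan", "reduce churn"),
    View("Pageviews by browser", None),
    View("Trial signups by channel", "grow trial signups"),
    View("Server uptime", "monitor infrastructure"),
]
report = audit(dashboard, goals)
print(report)
```

Rerunning the audit after each round of feedback (steps 4 to 6) keeps the dashboard tied to what users need now rather than what they needed at launch.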

Here is an example of a simple user feedback loop:

Now go do it. Forget bells and whistles and ‘wow factors’ until you figure out what the user actually needs. Note that ‘needs’ and ‘wants’ are often entirely different.

Going back to the point about starting simple, you might ask, how do you know it’s simple enough?

3. Data Visualization is about Behavior Change

What do I want the user to do after looking at my visualization or dashboard? Most of the content, maybe 99% of it, is “nice to know” with a sprinkle of “actionable”. A very simple improvement is color coding in which the color itself already sends a contextually relevant message to the user. For example, something critical is a darker red than an ordinary bug; a nice-to-have is light yellow, whereas a must-have is darker yellow. We don’t see a lot of this.

But we still see a lot of this.

Endless menus, with endless depth and no indication of what to pay attention to (or what not to pay attention to).
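The color coding described above (darker means more urgent within each hue) can be pinned down in a few lines so that every chart uses the same scale. In this minimal sketch the severity names follow the example in the text, while the hex values, the `SEVERITY_COLORS` table and the `color_for` helper are assumptions for illustration. The luminance check uses the WCAG 2 relative luminance formula to confirm that the more urgent color in each pair really is the darker one.

```python
# Hypothetical palette: the severity names come from the example above,
# the hex values are assumptions chosen so that darker = more urgent.
SEVERITY_COLORS = {
    "critical": "#8b0000",      # dark red
    "bug": "#e06666",           # lighter red
    "must_have": "#b8860b",     # dark yellow
    "nice_to_have": "#fff2a8",  # light yellow
}


def color_for(severity: str) -> str:
    """Return the color whose darkness signals how urgent an item is."""
    try:
        return SEVERITY_COLORS[severity]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None


def relative_luminance(hex_color: str) -> float:
    """WCAG 2 relative luminance of an sRGB hex color (0 = black, 1 = white)."""
    digits = hex_color.lstrip("#")
    channels = [int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve before weighting the channels.
    r, g, b = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4 for c in channels]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# Sanity check: within each hue, the more urgent item is the darker one.
assert relative_luminance(color_for("critical")) < relative_luminance(color_for("bug"))
assert relative_luminance(color_for("must_have")) < relative_luminance(color_for("nice_to_have"))
```

Raising an error for an unknown severity, rather than falling back to a default color, keeps an unlabeled item from quietly looking unimportant.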

4. Behavior Change is about Behavior Change

Behavior change is not some buzzword but an actual thing. You can (and should) engineer it exactly as you engineer other aspects of a visualization project. One of the best ways to get started with behavior change engineering is to adopt a framework that is made specifically for the purpose.

The basic premise in behavior change is that you need three things:

#Ability
#Motivation
#Trigger

99(.9)% of all data visualization projects fail to address this appropriately. A good way to start learning how to address it is to think carefully about the kinds of behaviors you want the user to adopt.

Once you know exactly which behaviors you are aiming for, you can identify the relevant abilities, motivations and triggers, and put them together into some genuinely amazing dataviz!
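The three ingredients listed above match BJ Fogg’s Behavior Model, in which a behavior happens when motivation and ability are high enough at the moment a trigger occurs, and high motivation can partly make up for low ability (and vice versa). The sketch below is a deliberately simplified, hypothetical way to review a planned dashboard behavior against that idea: the 0 to 1 scores, the product-based threshold of 0.25 and all names are assumptions for illustration, not part of the model itself.

```python
from dataclasses import dataclass


@dataclass
class DesignedBehavior:
    """One behavior a dashboard should prompt (all fields hypothetical)."""

    behavior: str
    motivation: float  # 0..1: how much the user wants the outcome
    ability: float     # 0..1: how easy the action is from this screen
    trigger: bool      # is there a visible cue telling the user to act now?


ACTION_THRESHOLD = 0.25  # assumed cut-off; tune it for your own product


def likely_to_happen(b: DesignedBehavior) -> bool:
    """Simplified check: a trigger must be present, and motivation and ability
    can compensate for each other (their product must clear the threshold)."""
    return b.trigger and b.motivation * b.ability >= ACTION_THRESHOLD


def diagnose(b: DesignedBehavior) -> str:
    """Name the weakest ingredient so design effort goes there first."""
    if not b.trigger:
        return "add a trigger"
    if likely_to_happen(b):
        return "ok"
    return "raise ability" if b.ability < b.motivation else "raise motivation"


# Hypothetical review of three behaviors a dashboard is meant to prompt.
for planned in [
    DesignedBehavior("fix critical bug", motivation=0.9, ability=0.2, trigger=True),
    DesignedBehavior("review weekly churn", motivation=0.6, ability=0.8, trigger=False),
    DesignedBehavior("approve budget", motivation=0.7, ability=0.5, trigger=True),
]:
    print(planned.behavior, "->", diagnose(planned))
```

In this reading, a highly motivated user facing a hard-to-act-on screen is an ability problem (for example, the critical item is buried three menus deep), not a motivation problem.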

5. Context is Everything

Technology (dashboards / plots / tables) is only as useful as the context it’s in. For example, if you eat nothing but soup and I give you a fork, that is not a great technology at all. For someone eating salad, on the other hand, a fork is a brilliant piece of technology.

Generally, what you think is relevant is far less relevant to others, because you establish relevance based on your own ideas and preferences, while others do so based on theirs.

For example, I needed to create a lot of descriptive statistics tables, but I did not want to leave my development environment, which is IPython / Jupyter. After searching high and low for a solution, I concluded that there was no meaningful way to do it without moving the data out first. I was proficient in two visualization libraries (Matplotlib and Seaborn), Pandas and a bunch of other PyData tools, but this was not something I could do with any of them. So I built it, which I find is often the best way to get exactly what you need, the way you need it. In this case I was the user, so it was easy to know the actual need.

That in turn led to some other really basic plots that I found hard to make with the well-established packages out there:

I turned it into a Python package, astetik, so that anyone else needing descriptive data tables could make them as well. I figured there might be a few such people.
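To give a feel for what a descriptive statistics table involves, here is a minimal sketch that uses only Python’s standard `statistics` module, so it runs in any IPython / Jupyter session without moving the data anywhere. It is not astetik’s implementation or API; the `stats_table` function name and the example values are hypothetical.

```python
from __future__ import annotations

import statistics


def stats_table(columns: dict[str, list[float]]) -> str:
    """Return a plain-text table of descriptive statistics, one row per column."""
    header = (
        f"{'column':<10}{'count':>7}{'mean':>9}{'median':>9}"
        f"{'stdev':>9}{'min':>8}{'max':>8}"
    )
    rows = [header, "-" * len(header)]
    for name, values in columns.items():
        # statistics.stdev needs at least two data points; show nan otherwise
        stdev = statistics.stdev(values) if len(values) > 1 else float("nan")
        rows.append(
            f"{name:<10}{len(values):>7}{statistics.mean(values):>9.2f}"
            f"{statistics.median(values):>9.2f}{stdev:>9.2f}"
            f"{min(values):>8.2f}{max(values):>8.2f}"
        )
    return "\n".join(rows)


# Made-up values, purely to show the layout.
print(stats_table({"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 12.5, 9.0]}))
```

Returning a string rather than printing inside the function keeps the table easy to test, log, or render elsewhere.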

My last example relates to deep learning, the most froth-inducing topic in the world of analytics. Again, I looked at what I could do with the libraries I already knew how to use, and with Keras, scikit-learn or TensorFlow. None of them gave me exactly what I needed as my primary results view for running hundreds or thousands of tests on all kinds of data, and none of them helped me figure out what kind of visualization I would need. So I started with this…

That led me to understand what I actually needed, so I was able to move forward and build it.
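A primary results view for hundreds or thousands of tests can start out very plain: one row per experiment, then the metric averaged for each value every parameter took, plus the best few runs. The sketch below is a hypothetical illustration in plain Python; it is not the API of Keras, scikit-learn, TensorFlow or any tool the author built, and the parameter names and scores are invented.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def summarize(experiments: list[dict], metric: str) -> dict[str, dict]:
    """Average the metric for every value each hyperparameter took.

    Each experiment is one flat dict of parameter values plus the metric.
    """
    grouped: dict = defaultdict(lambda: defaultdict(list))
    for exp in experiments:
        for param, value in exp.items():
            if param != metric:
                grouped[param][value].append(exp[metric])
    return {
        param: {value: mean(scores) for value, scores in by_value.items()}
        for param, by_value in grouped.items()
    }


def top(experiments: list[dict], metric: str, k: int = 3) -> list[dict]:
    """Return the k best experiments by the metric (higher is better)."""
    return sorted(experiments, key=lambda exp: exp[metric], reverse=True)[:k]


# Invented scores, only to show the shape of the view.
experiments = [
    {"lr": 0.01, "batch_size": 32, "val_acc": 0.80},
    {"lr": 0.01, "batch_size": 64, "val_acc": 0.84},
    {"lr": 0.1, "batch_size": 32, "val_acc": 0.70},
    {"lr": 0.1, "batch_size": 64, "val_acc": 0.72},
]
summary = summarize(experiments, "val_acc")
best = top(experiments, "val_acc", k=1)
print(summary)
print(best)
```

Starting from a view this simple makes it easier to notice which plots are actually needed before investing in any of them.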

Generally, what you think is important (for yourself) is just a reflection of your limited understanding of the scope or domain. The more time you put into the initial build, the less likely you are to make significant changes later (according to your actual needs). This is called emotional attachment; avoid it like the plague. Otherwise, the whole thing may end up something like this…

As a parting word, I want to leave you with something to think about. We often think that we’re experts in a given topic, and that really doesn’t help. It’s much better to start with a blank sheet of paper and assume you know nothing at all. Always go back to the needs of the user, to your inability to understand them accurately (even if you are the user), to the way things are always changing, and to how an analytics dashboard or dataviz is not an art piece sitting in a museum (even though it often ends up like that) but a living, continuously evolving thing.

At its best, data visualization can make people’s lives less frustrating and their work more productive, and help them feel more joy in whatever they are doing. Have fun!


By Mikko

Source: Medium.com