Nestoria Interview: Alan Smith of the ONS' Data Visualisation Centre

There's a restriction on how much you can fit into a title, but to be more precise, Alan Smith is the Principal Methodologist of the Office for National Statistics' Data Visualisation Centre.  Meaning he's got access to a lot of interesting statistical data and it's up to him to present it in a visual fashion.  So you can see why he's someone worth putting questions to if you're interested in the how, why and what of data visualisation.

Before we get into the Qs and As, it's important to mention that Alan spoke to us from a personal perspective – these are his views not those of the ONS.  It's also worth noting that his background is in GIS and cartography.  ALAN: My masters is in GIS, though I confess I haven't fired up a big, proper, grown-up GIS for some while. My first maps at college in the States were ink on vellum! Now that was retro...

Ah, but there's often still a place for the old school.  Here are the rest of Alan's answers:

How would you describe your work and how did you come to do it?

The work I do now - pushing ONS forward in terms of data presentation - is a logical extension of what I first joined ONS to do, namely running the corporate mapping service.  There was a tipping point, about 8 or 9 years ago, where we were looking at how traditional GIS software was being ported to the web because we wanted to move ONS into that space. We weren't too impressed with early online GIS solutions, so we started looking at emerging web-specific technologies like SVG and, later, Flash and realised that if we worked with them, we could move beyond the restriction of 'just doing maps' and take on data visualisation in a broader sense. So in 2007 we made the move to set up a new team with the wider remit...

And how would you describe the Data Visualisation Centre and its aim/s and function/s? 

The role is really to provide a link between ONS' data producing teams and the wider world.  On the web front, that means trying to produce something engaging with entire ONS datasets. Behind the scenes, we also have a role to play defining standards and best practice for basic 'safety-first' presentation, which is very important: my team will never produce more than a small proportion of ONS' graphical outputs, as there is a federated approach being adopted for most content creation in ONS.

How does your relationship with the ONS influence what you do and the way you do it?

I think we're naturally closer to the data and that's probably reflected in the form and content of our outputs. While that's usually a good thing, it means that we also work within the same constraints as the data producers - i.e. most of our outputs go directly onto the ONS website - so we have a workflow that's currently working within the constraints of the organisation.

What are some of the Data Visualisation Centre projects that you think have been most effective?  And why would you class them as the most successful?

There are three that spring to mind. Firstly, our population pyramids, which were the first thing that I did which made me realise there was so much potential in looking at 'traditional' types of graphics and breathing new life into them with new media via animation and interactivity.  


Secondly, the CommuterView project, which allowed us to interactively map commuting flows, showed just how powerful a modern web browser is at data handling and rendering, really powerful stuff.  Finally, the animated map of ageing, which has proven flexible as a mapping template for change over time and seems to have gone down very well with users. The idea of 'brushed' data displays - that is, more than one display of the same data linked to each other (such as a chart, a map, a histogram) - is a natural thing to do on the web and this map was a good vehicle for that approach.


Do you ever have moments midway through a project where you realise there may be an even better way to show something?  Or do you come up with many structural options in the early stages?

We tend to produce small, initial prototypes and then iterate rapidly based on our discussions with clients (typically the data producers), so in terms of fundamental approach and symbology we rarely get too far in before changing things completely. Having said that, every project is a learning experience so there's very few projects we'd go back to and do exactly the same way again as you're always trying to move things on.

Where do you sit on the question of style over substance?  Do you believe that data visualisation sits perfectly in the middle, or do you think it still has a very scientific part to play?

The million dollar question! There really is no crime in making something look engaging, even pretty - there really isn't.  But too often it's at the expense of the information in the data, which is what we really care about.
ONS' expertise and insight is always going to be with the data so we're trying to reflect that in the presentation. The key decision for us is about whether visualisations are exploratory (user finds story and, with the way the web works now, shares it) or we want people to focus on a narrative (ONS experts present some insight).  There's room for both, but the risk at the moment is that everything becomes exploratory so I'm quite interested in what happens to the narrative.  Charts on their own can't tell you everything.  A good example recently is a promotional graphic we did for the 2011 Census - we used hyperlinking between the text and the graphic to encourage people to genuinely 'read' the chart.

At the Data Visualisation Centre you get a lot of your data through collaborative relationships with the people who actually collect it.  How closely do you work with the people who harvest the data, and do they ever offer any input into the way the data is finally presented?

We work very closely with the data producers as it is generally their insight into the data that needs to be unlocked as part of a visualisation, so it's incredibly important.

That said, have you ever had any problem getting access to data that you're interested in working with, or do you usually start with the dataset/s and then decide how to present it/ them?

Yes, I think it's quite interesting that people see 'opendata' as a solution just for organisations outside of the public sector! More accessible data will make MY job easier (eg, discoverable data, less data prep by direct access through data APIs) and allow us to share our visualisations more widely.

Do you think that the situation re: data access is opening up, or do you find the government, for example, is still very protective of its data?

See above - in my opinion, it's definitely opening up.  Most ONS outputs have always been free at point of use and so it's really a question about access mechanisms. I can't really comment on other departments other than to say my contacts in them are similarly like-minded people who are keen about getting their data out into the open and used positively by as many people as possible.

What are some of the interesting things you see happening at the convergence point of GIS, mobile and web technology?  And what about when you add social to the mix?

It's very, very exciting. Years ago, I remember getting excited when I saw maps in vending machines on the London Underground - now of course, everyone has a GIS in their pocket, which is extraordinary. This convergence is essentially delivering a connected, personalised web experience, which is great.

Do you think it's these elements (above) that have helped to breathe lots of life into the field of data visualisation or are there other factors that you think were more influential?

It definitely plays a part, but there's something a little more fundamental going on - and it's about data. There is so much data. Too much, really, if that's possible.  And visualisation is simply one of the things that can help abbreviate data into something meaningful. There's other things too - like better data analysis - and convergence between those fields (ie. visual analytics) is very interesting. But right now, it's kind of like a frontier and we're in a stage of learning by doing.

And what particular trend or innovations in data visualisation do you find most exciting at the moment?  What are you watching out for, or hoping for in the future of the field?

Revisiting the concept of the narrative - the interpreted story - which will help balance things against the exploratory. I quite like the idea that things are getting easier to produce as long as we're not getting lazier about what we think is acceptable to produce. I love Google Maps - but if everything became a Google pin map, regardless of whether a pin map is the best way of showing a particular piece of information, then that would be quite sad. The American cartographer Mark Harrower has written some wise cautionary words about this - we need to be enabled by the technology, not led by it necessarily.

Thanks for the cautionary closing, and for sharing some of your thoughts with us.  It's been great to get a perspective from someone who works in a more official data vis. role, and it's good to hear that you think there's many more interesting things to come in this field. 

Filed under  //  data visualisation   interview   nestoria international  
Posted by Kat Parr Mackintosh 

Check your radiation levels

There's a lot of concern over how much radiation will have escaped from the Fukushima Nuclear Plant by the time it's been completely repaired.  So to put things in perspective, here's a broken up version of a chart from Information is Beautiful:

(download)
And a link to another version from xkcd.  I couldn't break this one up in the same way, and anyway, you need to click through to properly appreciate both charts.

If you're concerned, and have found the Information is Beautiful visualisation useful, then you can buy a hi-res PDF of it for a couple of dollars, all of which will go to Japan Crisis Relief.

Filed under  //  data visualisation  
Posted by Kat Parr Mackintosh 

Nestoria Interview: Scott Manley, star gazer and DJ, with an interest in social search

A couple of weeks ago I spent a bit of time, outside of office hours of course, looking at lovely data visualisations.  I found a few that were quite relevant to what we do at Nestoria, but there was also one beautiful visualisation that was more relevant to all of mankind, but less relevant to Nestoria specifically.  It's a visualisation of asteroid discoveries made in the past 30 years, shown chronologically and adding to the spinning heavens. 

Here it is if my explanation isn't clear enough.

Of course after seeing this I wanted to ask Scott Manley, who's the man behind the visualisation, about the whys and hows of his project, so I took the liberty of expanding my remit from interviewing interesting and Nestoria-relevant sorts to interviewing sorts that are interesting and relevant to us as earthlings as well.

Here are my Qs and his As:

I came across you via your beautiful data visualisation of asteroid discoveries over the past 30 years.  Can you tell me how you came to make it?  Both the reasons behind its creation and how you did it in practice?

The video goes back to my student days in the late 1990s when I was fascinated by asteroids and trying to make a case that there was a potential impact threat which needed to be assessed. Telling people that there are thousands of asteroids regularly crossing the Earth's path is one thing, showing them a picture is much more effective, so I created a site http://szyzyg.arm.ac.uk/~spm/neo_map.html which would update with the locations of all known objects. Back then, we knew of fewer than 50,000 asteroids, but we were pretty sure that there are perhaps a billion or so substantial objects in the main belt.

Anyway, the site ran mostly automated, with the sysadmin at Armagh maintaining it after I moved to California. Last year I found someone posting a decade old video to YouTube and decided that it was time to do better, so I dug up the code. There are actually 2 versions. The old one is written in Tcl/Tk and uses a canvas widget to build the image and then dumps it as a PostScript object. This is nice because it provides high level drawing primitives and the resulting image scales nicely for printed publication. It did however take about half an hour to render an image.

The newer version (dating from 1998) is a lean bit of C code. It doesn't have any graphics library; it has a framebuffer in memory which I dump to disk and then shell out to ImageMagick to convert to a final image. This is much faster, fast enough that I could render hundreds of frames and make a very basic video, but it was a far more primitive system: the drawing essentially consists of setting the pixel at coordinates x-y to colour RGB. Over time I've had a few requests from people for various videos and had to add features to the rendering like lines, motion blur and 3d transforms. However, it's always been a process of iteration rather than replacing the whole thing with a 3rd party rendering engine like OpenGL. If I can make that leap then it's possible we'll see this kind of thing at speeds approaching realtime.
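To make the "framebuffer in memory" idea concrete, here is a minimal sketch in Python rather than Scott's C. The function names and the choice of the PPM image format are assumptions made for illustration, not details from the interview.

```python
def make_framebuffer(width, height):
    """Return a black RGB framebuffer: 3 bytes per pixel, stored row by row."""
    return bytearray(width * height * 3)


def set_pixel(fb, width, height, x, y, rgb):
    """Set the pixel at (x, y) to an (r, g, b) tuple; off-screen points are dropped."""
    if 0 <= x < width and 0 <= y < height:
        offset = (y * width + x) * 3
        fb[offset:offset + 3] = bytes(rgb)


def ppm_bytes(fb, width, height):
    """Serialise the framebuffer as a binary PPM (P6) image."""
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(fb)


# A tiny 4x3 frame with one orange dot, standing in for one asteroid.
WIDTH, HEIGHT = 4, 3
fb = make_framebuffer(WIDTH, HEIGHT)
set_pixel(fb, WIDTH, HEIGHT, 1, 2, (255, 128, 0))
set_pixel(fb, WIDTH, HEIGHT, 10, 10, (255, 255, 255))  # off-screen, ignored
image = ppm_bytes(fb, WIDTH, HEIGHT)
```

Writing `image` to a file such as `frame.ppm` would give something an external tool like ImageMagick could convert to another format, which is roughly the "dump to disk, then convert" step described above.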

The actual asteroid data came from Ted Bowell's project at the Lowell Observatory, they compile a file with the most up to date orbital elements of every observed minor planet, this kind of visualization would be a whole lot harder were it not for this data. One thing that's missing is the discovery date, and this information wasn't conveniently available in one place, instead I identified a couple of sources and wrote a Perl script to crawl the sites in question and merge the data into the asteroid database.  For some objects with provisional designations there was no discovery date information readily available and so the script guesses the discovery dates based upon the temporary designation.
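For anyone wondering how a date can be guessed from a designation: modern provisional designations such as "1998 KY26" encode the discovery year and, in the first letter, the half-month of discovery (A for 1-15 January, B for 16-31 January, and so on, skipping the letter I). The sketch below is one way to turn that into an approximate date. It is in Python rather than Perl, the function name is made up for illustration, and it is not Scott's actual script.

```python
import datetime
import re

# Half-month letters run A..Y with I skipped: A = 1-15 Jan, B = 16-31 Jan, ...
HALF_MONTH_LETTERS = "ABCDEFGHJKLMNOPQRSTUVWXY"

# Year, space, half-month letter, order letter (I is not used), optional cycle count.
DESIGNATION = re.compile(r"^(\d{4}) ([A-HJ-Y])[A-HJ-Z]\d*$")


def guess_discovery_date(designation):
    """Return the first day of the half-month encoded in a provisional
    designation, or None if the string is not in that format."""
    match = DESIGNATION.match(designation.strip())
    if match is None:
        return None
    year = int(match.group(1))
    index = HALF_MONTH_LETTERS.index(match.group(2))
    month = index // 2 + 1                 # two half-months per calendar month
    day = 1 if index % 2 == 0 else 16      # first or second half of the month
    return datetime.date(year, month, day)


print(guess_discovery_date("1998 KY26"))  # second half of May 1998
print(guess_discovery_date("Ceres"))      # not a provisional designation
```

The guess is only good to within about two weeks, which is fine for an animation that compresses decades into minutes.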

Now I had all the data, rendering the movie took a few hours and the first versions of the movie weren't quite right, so I'd tweak the settings and try again until I arrived at the final version which you see on youtube.

What did you personally learn from creating this?

First thing I learned is if you're uploading a video to youtube you probably want to make it private until you're sure it's ready. The first version uploaded had no audio, and ran at 60 fps, I primarily uploaded it to check what youtube would do to the video encoding. I knew that all those tiny dots moving in different directions represented something of a pathologically hard case for most encoding systems and I feared it would be unwatchable after youtube got its hands on it.

Well it was watchable enough that it's just passed 800,000 views, it got popular very quickly and there was no chance to replace it with a polished version. The best I could do was add some audio using youtube's audio swap, and I was really lucky to find 'Emergence' by Trifonic, they were local producers from San Francisco and when I put the sound next to the video it just seemed to go well together. Oddly enough there was a scientific explanation for why the music seemed to work well with the video.

Remember I had to scrape all those sites for discovery dates and make guesses for others? When I ran my first test renders I noticed that the discovery rate seemed very bursty rather than smooth, I hadn't expected this and at first I thought the discovery dates were subject to some sort of bias that was being introduced by my scraping or guessing. I tried fixing this in a number of different ways, but couldn't eliminate it.  It wasn't until after the video had been up a few days that a smart person pointed out that there was a pulsing in the discovery rate that was most likely due to the Lunar cycle. When there's a full moon the moonlight makes it harder to see faint objects, so the discovery rate rises and falls on a monthly basis. The video was running at 1 day per frame, 60 frames per second, which meant that the pulsing in the discovery rates was almost 120bpm (beats per minute), and sure enough the music I'd gravitated to was recorded at 60 bpm, with the piano motif driving along at 120bpm.

It was a rare moment where my astronomical experience aligned with my DJ experience.
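The arithmetic behind that coincidence is easy to check. The short sketch below assumes an average full-moon-to-full-moon cycle of about 29.53 days; the function name is invented for illustration.

```python
SYNODIC_MONTH_DAYS = 29.53  # assumed average length of one lunar cycle, in days


def pulse_bpm(days_per_frame, frames_per_second, cycle_days=SYNODIC_MONTH_DAYS):
    """Beats per minute of a cycle lasting `cycle_days` when time is played
    back at the given number of days per frame and frames per second."""
    days_per_second = days_per_frame * frames_per_second
    cycles_per_second = days_per_second / cycle_days
    return cycles_per_second * 60


bpm = pulse_bpm(days_per_frame=1, frames_per_second=60)
print(round(bpm))  # about 122
```

At 1 day per frame and 60 frames per second, one lunar cycle flashes past in just under half a second, which works out to roughly 122 pulses a minute, very close to the 120bpm of the piano motif.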

Genius!  What a nice moment.
Once I'd seen your asteroid video I started poking around and came across the animations you created of what it might look like if an asteroid came close to us, and hit us.  Is this a sort of public service broadcast, or is animation the natural end result when you start visualising this sort of data?

Well, I'm a big proponent of educating the public about the potential dangers due to celestial hazards, it's more like the reverse was true, the images and videos came about as an educational tool to help people visualize just how busy the solar system actually is.

(download)

It's very easy to take simple physics and figure out that the energy released in the impact of even a small asteroid is quite staggering. This is simple high school physics - a 2000 meter object moving at 20km/sec contains the same energy as 800 billion tons of TNT. The consequences of an impact could end life as we know it on earth, as the dinosaurs discovered 65 million years ago. However we are not the dinosaurs, we have awareness of science now, and we can recognize the danger and moreover we have some pretty good ideas on how to avert such an event should we know about it. There have been big impacts throughout Earth's history that have had terrible consequences for the inhabitants of the planet, we're the first species that has come along with the ability to do something about it. But before you go and design your asteroid diverting rocket you have to know where everything is, and truthfully, we'll probably find that there's nothing likely to be a threat in the next hundred years or so. Until we look, though, we won't know.
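The high school physics in question is kinetic energy, one half of mass times speed squared. The sketch below works it through for a 2000 meter wide sphere at 20km/sec. The densities are assumptions (the interview doesn't give one), as is the conventional figure of 4.184 billion joules per ton of TNT, and the function name is made up for illustration.

```python
import math

TNT_TON_JOULES = 4.184e9  # conventional energy of one metric ton of TNT


def impact_energy_tons_tnt(diameter_m, speed_m_s, density_kg_m3):
    """Kinetic energy of a spherical body, expressed in tons of TNT."""
    radius = diameter_m / 2
    volume = (4 / 3) * math.pi * radius ** 3      # cubic meters
    mass = density_kg_m3 * volume                 # kilograms
    kinetic_energy = 0.5 * mass * speed_m_s ** 2  # joules
    return kinetic_energy / TNT_TON_JOULES


# Assumed densities: 3000 kg/m^3 (rocky) and 4000 kg/m^3 (denser body).
rocky = impact_energy_tons_tnt(2000, 20_000, 3000)
dense = impact_energy_tons_tnt(2000, 20_000, 4000)
print(f"{rocky / 1e9:.0f} billion tons of TNT")
print(f"{dense / 1e9:.0f} billion tons of TNT")
```

With these assumptions the answer comes out at several hundred billion tons of TNT, and a density of around 4000 kg/m^3 lands close to the 800 billion quoted above.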

I noticed that you're also involved in web search.  Can you let us know what your interests and goals in search are and how you got into this field?

I work at Topsy.com, we're providing a realtime search engine which indexes data from the social web, because we're aware of the social networks we factor in the influence and expertise of the participants when ranking  content.
Personally I'm forever fascinated with the data that is being shared and how the sharing dynamic can inform us about the quality and relevance of web content that's being discussed.

Thanks very much to Scott for enlightening me - hopefully, Nestorialings, you'll feel the same way I do and are glad I veered slightly off topic.

Filed under  //  data visualisation   interview   interviews   perl  
Posted by Kat Parr Mackintosh