Nestoria Interview: Luke Metcalfe from The Full Wiki
ATTENTION: fact fans and mash up fans alike – this interview is for you.
The person whose knowledge we've been tapping is Luke Metcalfe, founder of The Full Wiki, which is probably the web's largest reference mash up, and NationMaster, which is probably the world's largest hub of international statistics.
I've always felt there's a huge information gap on the web. On the one hand, there's a powerful, data-rich web which is largely inaccessible to ordinary people (downloadable spreadsheets, XML dumps, APIs). On the other hand, you have individual stats, plucked out by experts and special interests, that find their way into news and advertising. Our goal is to empower ordinary people with information that's otherwise outside their reach; to allow them to discover information for themselves.
The Full Wiki map of all the reported UFO spottings mentioned on Wikipedia.
Do you have a background in managing statistics or are you just a fan of facts and figures?
I studied Computer Science and Law. But right from when I started as a webmaster 14 years ago, my big fascination was the data. For me, it harks back to when I was 5 and got my first computer. The most enjoyable thing was that something you coded could give back a result that you couldn't foresee from the code. That's the closest thing to magic, at least for a computer geek like me.
What are some of the most unusual statistics you've combined, or seen users combine?
Oh, good question. I have to get back into a NationMaster frame of mind after working so hard on The Full Wiki. The first thing that pops to mind is that when we released a correlations feature on NationMaster, everyone had their eyes on the murder rate. The top correlates ended up being newspaper circulation and orange juice consumption.
Ingredients for breakfast chat then? This interviewer opted for percentage of income tax paid per person vs. amount of spirits drunk per person.
What's your opinion on current moves to try and open up access to government data?
I think it's excellent. Incompetent governments can manage spin but it's not so easy to manage the fine detail. Stats are still heavily curated before they reach ordinary people though. There's a large role to come for tools to aid in the interpretation of that data.
And are you waiting with bated breath for new data to be released so that you can include it?
To be honest, there's so much to do with what's already out there, I am spared the anxiety of waiting on institutions. That would feel too much like having clients for me. What really fascinates me is the potential of community generated content. Wikipedia is a fantastic encyclopedia but the reality is most articles don't get read from beginning to end. So on The Full Wiki we show the data by other dimensions: timelines, maps, top charts. It brings the info to life.
The Full Wiki has been mentioned on this blog before. How did you come up with the idea for this particular way of looking at information? And are there any other ways you can think of to view Wikipedia?
Yeah, a huge proportion of the population is visual. Every publisher knows you get more readership if you include pictures.
Our brains weren't designed to convert squiggly lines into abstract concepts. So now the web is mainstream, it's moving to more human forms of information ingestion. We imagine images (hence the maps) and follow narratives (hence the timelines). We decide which things are important right now and which not (hence the trending topics). We use conversations to learn (hence the question and answer format of our quizzes). All of these features are making use of the plethora of wiki content.
How do you manage all the information you have, and how do you make sure it remains up to date?
We specialise in doing large scale data cheaply. We use the Google approach of having large numbers of cheap redundant servers. It's a real tightrope ensuring on the one side that you can make use of the largest dataset possible while still creating features iteratively and staying fairly up to date. It requires equal doses of pragmatism and discipline.
Do you have any plans to branch out your business or are you sticking with providing easy access to statistics? (There's a comedy angle to Ask The Brain; do you have any plans for more comic offerings, for example?)
Anything to do with making information accessible interests me. And yes, stats, analytics, data mining - these are always involved in every project. But not necessarily on the front end. There's an elegance to being able to hide all that from the user as I did with Ask The Brain, Fact Bites and The Full Wiki.
Yeah I'd love to do more comedy. You can make points and tickle parts of people's brains in a way you can't with straight reference. But I think the mission to enlighten would always be there. I need the audience to be able to take something away with them when they stand up from the chair, or increasingly, when they roll off the couch.
Web analytics is much easier now than when I did Ask The Brain. I enjoy having a conversation with my audience via the statistics that they generate for me. I respond regularly with site updates.
And lastly, is there any product, site, or app that you'd love to see that technology hasn't quite created the possibility for yet? If so, can you tell us about it and what it would do?
I'd love to see some software that combines the scalability of relational databases and algorithm customisation with the visualisation of RapidMiner and LabVIEW. A LabVIEW for data freaks.
Is there anything else you'd like to talk about?
I'm on a soapbox here a bit, I know, but I wanted to say: the news agonises over slight changes to stats: unemployment figures, crime rates, interest rates. But there are fundamental changes to our society that the stats really make concrete. Like adoption rates and pet ownership. The web is the empowering tool to allow people to look up these long-term trends when they want. It shouldn't be just about what's happening right now, because that can deprive you of context. We're all about providing the context.
You're the interviewee, so we want you on a soapbox! Thanks very much for sharing your thoughts with us. Plenty there to distract us from gainful employment for the next hour or so...
