Showing posts with label Open Source Software. Show all posts

Tuesday, 30 January 2018

Converting UK national grid to latitude longitude Python function from Hannah Fry

I found Hannah Fry's Python function useful with a dataset in national grid coordinates. It's not something that could be done easily in a Tableau calculated field, as it is an iterative calculation that converges to the solution. Hannah Fry does have a Tableau connection though; she was a keynote speaker at the London Tableau Conference on Tour a few years ago. That presentation gave some key insights on Tom Cruise's central upper tooth, as well as showing pictures of two clones of Hannah with symmetrical faces, one based on the left side of her face and one based on the right-hand side!
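To give a flavour of why this doesn't fit in a calculated field, here is a minimal sketch of the iterative step only. It is not Hannah Fry's function verbatim: the names `meridional_arc` and `initial_latitude` are my own, and the constants are the Airy 1830 / National Grid values as I understand them from the Ordnance Survey's published guide, so check them before relying on the output. The result is only the intermediate latitude on the Airy ellipsoid; a full conversion also needs the easting terms and a datum shift to WGS84.

```python
import math

# Airy 1830 ellipsoid and National Grid projection constants (assumed from
# the Ordnance Survey's published guide; verify before real use).
A = 6377563.396          # semi-major axis, metres
B = 6356256.909          # semi-minor axis, metres
F0 = 0.9996012717        # scale factor on the central meridian
LAT0 = math.radians(49)  # latitude of the true origin
N0 = -100000.0           # northing of the true origin, metres


def meridional_arc(lat):
    """Developed arc of the meridian from LAT0 to lat, in metres."""
    n = (A - B) / (A + B)
    d, s = lat - LAT0, lat + LAT0
    return B * F0 * (
        (1 + n + 1.25 * n**2 + 1.25 * n**3) * d
        - (3 * n + 3 * n**2 + 2.625 * n**3) * math.sin(d) * math.cos(s)
        + (1.875 * n**2 + 1.875 * n**3) * math.sin(2 * d) * math.cos(2 * s)
        - (35 / 24) * n**3 * math.sin(3 * d) * math.cos(3 * s)
    )


def initial_latitude(northing, tol=1e-5, max_iter=100):
    """Iterate until the meridional arc matches the northing to within tol metres."""
    lat = LAT0
    for _ in range(max_iter):
        residual = northing - N0 - meridional_arc(lat)
        if abs(residual) < tol:
            break
        # Nudge the latitude by the remaining distance, expressed in radians.
        lat += residual / (A * F0)
    return lat
```

Calling `math.degrees(initial_latitude(180000.0))` gives the intermediate latitude for a northing of 180 km. Each pass feeds the previous estimate back in, which is exactly the kind of loop a calculated field cannot express.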

Wednesday, 29 November 2017

Tableau repositioning itself with regards to data preparation

For those of us who have used Tableau for years, the changes in every version always seem to remove some need for external tools or code for data preparation. Think of the introduction of filled maps, the union feature, the Excel data interpreter, pivot and split, the spatial file connector (don't mention the PDF connector!). While certain Tableau partners and consultants are still keen on the Tableau-Alteryx stack, I'm not convinced of its long-term market viability, and neither was Gartner last time I checked. The latest announcement on Project Maestro is a rather aggressive move from Tableau's side into traditional Alteryx territory.

If you do want my advice, learn some basic scripting, some coding, regular expressions, some Unix or even good editor skills. You can only go so far with 'friendly tools', and you still have to spend a lot of effort learning them, so you might as well learn an open source, transferable skill instead.

Friday, 10 March 2017

Open Data Camp

The 2017 Open Data Camp was at the end of last month in Cardiff. I was told about it in a semi-private exchange with Chris Love. I couldn't make it, but a lot of information is online; the session grid gives you a good overview and links to any info published online. See 'Open data for newbies' for a quick introduction to the subject. Among other things, it makes the vital distinction between public and open data. Data in PDF seems to be the running joke, though last I heard, Tableau is promising a PDF connector!

There are some fascinating sessions, such as 'how to get 1 million people speaking welsh', which of course raises the question of how you can define, measure and model the growth of Welsh speakers. There's also a Minecraft session, which is unsurprising: I saw a demonstration of LiDAR data in Minecraft at last year's Cambridge Dorkbot.


Wednesday, 8 February 2017

The story of Pig

Yahoo! (don't forget the exclamation mark!) nowadays makes the news for all the wrong reasons: takeovers by other companies, being hacked by unnamed state actors several times in the past, not to mention the very dodgy advertising all over their websites. It wasn't always thus.

Cast your minds back to the mid noughties, and you'll remember Yahoo! acquiring the smaller stars of the web 2.0 constellation: Flickr, which they kept, Del.icio.us, which they sold on, and Upcoming, which they retired. Of course Yahoo! already had a history of acquiring companies with great products and messing them up, from GeoCities to LAUNCHcast. But by 2005 it seemed like they were suddenly getting it and becoming cool.

On the technology side, Yahoo! was a pioneer of Big Data, with Open Source projects such as Hadoop (the writer of that first blog post, Jeremy Zawodny, did later sum up the story of that time nicely in his personal blog), Pig and other bits of that ecosystem that became part of an Apache project rather than a proprietary product.

One wonders if they would be better off now had they kept it as their own product. Maybe they would be the giants of cloud computing. Releasing it as open source, though, meant that it became an effective industry standard, with other companies contributing projects such as Hive (in fact, have a look at this blog post that details the use of both Pig and Hive inside Yahoo!. If only someone updated it to add Spark SQL to the mix!). So if Yahoo! goes down, the Apache Hadoop ecosystem will probably survive.

Friday, 3 February 2017

Pig history, features, application and operations

Would you consider a training course with the agenda below?
• History of Pig
• What is Pig and Why Pig
• Pig Vs. MapReduce
• Features of Pig and Its Application
• Pig Data Model and Pig Operations
I think I'll stick to Hive.