Master Data Management (MDM) looks at all areas of an organisation, tracking and recording the information for each division and storing it in one central area, i.e. a data warehouse.  Master Data Management is the technology, tools and processes necessary to create and maintain accurate data lists, ranging from financial information to stock levels to employee details.

When such systems are being developed there are many obstacles to clear, including changes to business processes, internal politics and data ownership; these are just some of the things that have to be addressed before you start looking at technical issues.  The issue of data ownership is one of the biggest hurdles, as each area feels it owns its data and is not willing to look at the bigger picture.  With an MDM system, only those who need the data have access to it.  This is one of the biggest advantages of centralised data sharing: permissions can be granted depending on the individual’s role.  So if you don’t work directly with the data, why should you see it?  This feature is also very important when it comes to maintaining accurate data.  Editing rights can be granted to those who need them.  This reduces the chances of good data being replaced with wrong data.

Even the smallest of companies can generate a large amount of data.  With each department, section or person collecting data, replicated, inaccurate and redundant data accumulates very quickly over time.  A structured data management system can lead to a more efficiently run business.

The following outlines just some of the benefits an MDM system can provide:

Redundant Data
This is one of the main advantages an MDM system offers: it eliminates the collection of redundant data, as the data is stored centrally.  This forces the creation of accurate and specific records.

Data Editing
One data edit eradicates inaccuracies in your data as the edit is reflected throughout the database.  Individual lists are no longer used as all data is recalled from the central database.

Better Analytics
This paves the way for better and more useful analytics as you are not working with redundant or inaccurate data.

Data Consistency
Setting parameters means you only collect the data that is relevant to your organisation.  Again this speeds up processes.

Stronger Security
The data is not accessible by all and access can be granted based on each individual’s role within the company.  This also relates to editing and deleting data.
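The role-based idea can be sketched in a few lines of Python. The roles, datasets and rules below are purely hypothetical examples, not a real MDM product's model:

```python
# Minimal sketch of role-based access control over master data.
# Roles, actions and dataset names here are hypothetical.

PERMISSIONS = {
    "hr_admin": {"read": {"employees"}, "edit": {"employees"}},
    "analyst":  {"read": {"employees", "stock"}, "edit": set()},
}

def can(role, action, dataset):
    """Return True if the role may perform the action on the dataset."""
    return dataset in PERMISSIONS.get(role, {}).get(action, set())

print(can("analyst", "read", "stock"))   # True: analysts may read stock data
print(can("analyst", "edit", "stock"))   # False: no editing rights granted
```

In practice the same check covers reading, editing and deleting, so one permissions table governs the whole lifecycle of a record.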

Although the benefits largely outweigh the disadvantages, like any system there are going to be pitfalls, but these can be avoided with proper planning and research.  Here are some of the pitfalls to avoid when introducing a Master Data Management system:

Ignoring Data Governance
When choosing the MDM model to fit your company it is essential that the data governance policy of that system also fits your needs.  Being able to apply the correct controls and access to the data is essential.

Trying to do too much
Implementing an MDM system can be a seismic change for an organisation.  It addresses the collection and management of company data and the changing of personal habits.  This will take time and careful planning; it won’t happen overnight.

Collecting the wrong data
Your system is destined to fail if the information being stored is irrelevant.  The main concept behind an MDM system is that all the data is of importance to the company.  This is something that has to be agreed upon from the outset.

Cleaning the data outside the system
This spells disaster, as the data being stored is not up-to-date.  If your data is inconsistent, users lose faith in it and bad habits begin: personal lists grow and the business suffers.  It should always be one change updates all – once it has been validated, of course.

To me, one central system is an essential part of any company, as it can only be of benefit to all.






What is big data?

What exactly is big data?  Is this just the next “BIG” buzzword?  Will it follow the same path as Y2K, or is it here to stay?

Big data is exactly what it says: large amounts of stored raw data.  Studies show that the data generated every two days equals the total amount of data that was created up to the year 2003, and over 90% of all the data in the world was generated over the past two years.  So where does all this data go, who has it and what are they doing with it?  Here is a quick guide to how we generate so much data:

Every minute:

  1. We send 204 million emails
  2. Generate 1.8 million Facebook likes
  3. Send 278K Tweets
  4. Upload 200K photos to Facebook

And this is just a drop in what is the big data ocean.

So with all this information floating around how is it being used?

Big data is being used every day across all industries from retail to farming to health and within government bodies.

In industry the use of big data helps businesses understand and target customers.  Yes, those little fobs your local Tesco, SuperValu and petrol station give you are tracking everything you do within their store.  Was it by coincidence that you received the coupon for your favourite shampoo?

With Big Data improving our shopping experiences, improving health systems and aiding the fight against cancer and other serious illnesses, it has now turned its attention to the fight against crime and joined the Los Angeles Police Department.  And no, it does not come in the form of Tom Cruise or Colin Farrell.  After all, police departments around the world have their customers too, and they need targeting.


Professor Jeff Brantingham and a team at UCLA studied over 13 million recorded crimes, spanning 80 years, and applied an algorithm used to predict the likelihood of aftershocks from earthquakes.  The original algorithm, which looks at the probability that aftershocks occur close by in space and time, was developed by Assistant Professor George Mohler.  Applying the theory that aftershocks happen in close proximity, the team took the same approach when looking at crime and human behaviour.  Using the data, they wanted to see if there was any relation.  Strangely enough, the patterns were similar.  Although it couldn’t prevent any of these crimes, as they had already happened, they decided to build on the algorithm and use live data to see if it could predict potential crime hot spots.  With some tweaking, and by joining forces with the company PredPol, they can now predict where crime is likely to happen on a given day.
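The underlying idea is a self-exciting process: each past event temporarily raises the expected rate of new events close by in time.  A rough illustrative sketch in Python follows; the parameters and event times are made up for the example, and this is a toy, not PredPol's actual model:

```python
import math

def intensity(t, past_events, mu=0.5, alpha=0.8, beta=1.0):
    """Expected event rate at time t: a constant background rate mu plus
    a contribution from each past event that decays exponentially."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in past_events if ti < t)

# Hypothetical past event times (in days): a cluster early on.
events = [1.0, 1.2, 5.0]

# The predicted rate spikes just after the cluster and fades later.
print(intensity(1.5, events) > intensity(4.0, events))  # True
```

In the crime setting, the same decaying "aftershock" term is applied over space as well as time, so recent nearby burglaries raise the predicted risk for the surrounding area.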

The software breaks patrols into 12-hour shifts covering 500-foot-square geographical areas where it has predicted criminal activity might occur.  With a computer now effectively acting as a new commanding officer, it took time for patrol officers to warm to their new partner; after all, who knows crime best, a computer or a police officer?  But attitudes changed as the city saw a 33-percent drop in burglaries, a 21-percent drop in violent crimes and a 12-percent decrease in crimes against property.  On Thursday, 13 February 2014, the LAPD’s Foothill area recorded zero crime activity over a 24-hour period, “a day without (recorded) crime”, the first in its fifty-year history.

So Big Data is making the world a healthier, safer and better place to shop.

BBC documentary on LAPD Big Data.

The Power of R


R is an open source statistical programming package that incorporates graphical tools to present your data.  It has become one of the most used business tools for computing statistical information over the past decade.  With R you can import almost any data format, from .CSV to .SAV.  This is why it has become the tool used by most data scientists, and it’s free.  Another important thing about R is that you don’t have to be a programming wizard to work with it.

Here is an excellent January 2009 New York Times article on R by Ashlee Vance.

Jumping in

The best way to learn to swim is to jump in, and that is what I did with R.  The tutorial site Try R gives a gradual introduction to the useful functions of the programming language, as well as being a bit of fun.  There are seven levels in the tutorial, with each level introducing a function with easy-to-follow examples.  Having learned the basic skills to use R without getting wet, let’s see if it can answer one of the most asked questions: who is better, Messi or Ronaldo?



                       Lionel Andres Messi      Cristiano Ronaldo
Born                   24th June 1987 (28)      5th February 1984 (31)
Height                 1.70m                    1.85m
Team                   Barcelona                Real Madrid
Total Goals 2009–2015  232                      225

Nobody can dispute the quality of these fantastic players.  So to break it down we are going to look at their goal scoring ability, how they assist their team and how long it takes them to score one goal.  So let’s have a look at each player.  For this study we will only be looking at data from the 2009/10 season onwards.  That is the year Cristiano Ronaldo joined Real Madrid.

It truly is amazing looking at the firepower of each of these players.  The number of goals each of them scores in one season is more than most centre forwards would score over two.  You also have to take into account that the graph below only shows the goals that were scored in La Liga in each season from 2009 to 2015.  For the period covered Messi has scored 232 goals and Ronaldo 225.  So it’s 1 nil to Messi.
Total Goals Scored



Here we look at the likelihood of each of them scoring over a 90-minute period, and again they break the norm.  On average, most world-class strikers take over 90 minutes to score a goal.  Sergio Aguero, who finished top goal scorer in the Premier League for 2015 with 26 goals, averages a goal every 98 minutes.  But the machines that are Messi and Ronaldo each average a goal every 78 minutes.  It seems the only way to stop them from scoring is not to let them play!  It’s a draw on this one.  Messi still 1 nil up.

Average Minutes per Goal


We come down to the final comparison between the two greats, and very little separates them.  It’s not all about scoring goals, especially if you’re having an off day.  So let’s have a look at how they help their teammates out.  The graph below shows how many assists each player has had in each of the seasons covered, and it shows there really is an I in team, that being Mess(I).  He is a clear winner, contributing an average of 17 assists per season to Ronaldo’s 13.  So based on our analysis Messi is the better player.  You don’t have to agree!

Number of Assists

Below is a heat map showing the full dataset for each player.
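The scoring above can be tallied in a few lines of Python using the figures quoted in this post (La Liga, 2009/10 to 2014/15):

```python
# Tally the three comparisons: total goals, minutes per goal, assists.
# Figures are the ones quoted in the text above.

stats = {
    "Messi":   {"goals": 232, "mins_per_goal": 78, "assists_per_season": 17},
    "Ronaldo": {"goals": 225, "mins_per_goal": 78, "assists_per_season": 13},
}

score = {"Messi": 0, "Ronaldo": 0}
for metric, higher_is_better in [("goals", True),
                                 ("mins_per_goal", False),
                                 ("assists_per_season", True)]:
    m, r = stats["Messi"][metric], stats["Ronaldo"][metric]
    if m == r:
        continue                    # a draw scores nothing for either player
    winner = "Messi" if (m > r) == higher_is_better else "Ronaldo"
    score[winner] += 1

print(score)   # {'Messi': 2, 'Ronaldo': 0}
```

Two wins and a draw for Messi, matching the verdict above.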




Data Quality

So how good is your data and how are you using it?  Like any machine it is only as good as the fuel (data) you put into it.  You won’t get very far putting diesel into a petrol engine.  The same can be said of your database(s).  Your data return will only be useful if you have good input.

Quality information is defined as “information that is suitable for all of an organisation’s purposes, not just my purposes.”

What are the steps to securing good data?

  1. One of the key factors of good information is its relationship with the business in question. Making sure your data links up and answers the questions you want answered in order to provide the service required or the relevant information to your audience.
  2. Being aware of the data you are collecting is an essential part of good data. Having processes in place where data can be checked, cleansed and edited will affect your data quality. Allowing editing or checking without parameters in place puts your data in danger; it can lead to the loss of good data on one person’s judgement.
  3. Discarding documentation and design standards is a large problem. Over time, data quality guidelines are discarded through employee turnover and data familiarity. The attitude to the data can become “when do we ever use this?”. If point one is adhered to, it is useful information for some part of the business. To prevent this, frequent training and updating of data guidelines is necessary.
  4. The main link between all the points above is communication. Without good channels of communication your data quality will suffer. This has to start from the outset, involving everyone who will be using and supplying the data. This can seem like the most logical step, but it can be the biggest hurdle to getting good information.
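Point two above, checking data against agreed parameters before it enters the central store, might look something like this in Python.  The field names and rules are hypothetical:

```python
# Hypothetical sketch: validate a record against agreed quality rules
# before it is accepted into the central store.

RULES = {
    "name":  lambda v: isinstance(v, str) and v.strip() != "",
    "email": lambda v: isinstance(v, str) and "@" in v,
}

def validate(record):
    """Return the list of fields that fail the agreed quality rules."""
    return [field for field, ok in RULES.items() if not ok(record.get(field))]

print(validate({"name": "Anne", "email": "anne@example.com"}))  # []
print(validate({"name": "", "email": "no-at-sign"}))            # ['name', 'email']
```

A record with a non-empty failure list would be rejected or flagged for review rather than silently overwriting good data.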

When dealing with poor data quality, Marsh (2005) summarises the findings from various industry research as follows:

• “88 per cent of all data integration projects either fail completely or significantly over-run their budgets”

• “75 per cent of organisations have identified costs stemming from dirty data”

• “33 per cent of organisations have delayed or cancelled new IT systems because of poor data”

• “$611bn per year is lost in the US in poorly targeted mailings and staff overheads alone”

• “According to Gartner, bad data is the number one cause of CRM system failure”

• “Less than 50 per cent of companies claim to be very confident in the quality of their data”

• “Business intelligence (BI) projects often fail due to dirty data, so it is imperative that BI-based business decisions are based on clean data”

• “Only 15 per cent of companies are very confident in the quality of external data supplied to them”

• “Customer data typically degenerates at 2 per cent per month or 25 per cent annually”

• “Organisations typically overestimate the quality of their data and underestimate the cost of errors”

• “Business processes, customer expectations, source systems and compliance rules are constantly changing. Data quality management systems must reflect this”

• “Vast amounts of time and money are spent on custom coding and traditional methods – usually fire-fighting to dampen an immediate crisis rather than dealing with the long-term problem”

Data Quality 2


What Can Quality Data Do?
Good data is all about providing the tools to get information that will assist in making good decisions. With good quality data you can

  1. Gain information
  2. Gain knowledge
  3. Make decisions
  4. Get results

With the world of data changing rapidly, the amount of data being collected has increased immensely.  As this happens, the data becomes more difficult to manage, which can allow quality levels to drop.  Research has shown that good data quality can improve customer satisfaction, decrease running costs, assist more efficient decision making and increase employee performance and job satisfaction (Kahn et al., 2003; Leo et al., 2002; Redman, 1998).

Data Quality 3



Fusion Tables

Your data
Fusion Tables are used to combine tables and present your data in a more meaningful manner. Obviously you have to get your initial information from somewhere. For this example I used CSO (Central Statistics Office) data from the 2006 and 2011 Censuses.  For each year I looked at the total population and the population aged 15 and over that had lost or given up their previous job.  To present this data more graphically I also required a table containing geographical information for the counties of Ireland.  This file was available on the Irish Independent website.  A link to all datasets used can be found below.

Depending on your data source you may need to clean your data. Once you have a clean data file you need to save the file as a .CSV (Comma Separated Values) file.

Data Links

Irish KMZ Datafile – There is no need to save this file in CSV format.

Analysing your data will tell you what it is saying. This can be simple calculations looking at the averages or medians (mid-points) of parts of your data, which make the data more understandable for your audience. The analysis below compares the population of Ireland in 2006 and 2011, showing the percentage increase in each county.  A further analysis compares the unemployment rates in each county for the same periods.

To calculate the unemployment rate I also needed the total population that was eligible to work.  This information can be found on the CSO website.
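The two calculations used here, percentage change and the unemployment rate, are simple enough to sketch in Python.  The figures in the example are illustrative, not the CSO data:

```python
def pct_change(old, new):
    """Percentage change from an old value to a new one."""
    return (new - old) / old * 100

def unemployment_rate(unemployed, eligible):
    """Unemployed as a percentage of the population eligible to work."""
    return unemployed / eligible * 100

# Illustrative figures: a county growing from 100,000 to 108,000 people
# shows an 8% rise; 13,000 unemployed out of 100,000 eligible is 13%.
print(round(pct_change(100_000, 108_000)))       # 8
print(round(unemployment_rate(13_000, 100_000))) # 13
```

Running the same two functions over every county's 2006 and 2011 figures produces the comparisons shown in the maps and charts below.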

Creating Fusion Tables

  1. You need to have a Google Drive account.
  2. Add the Fusion Tables app
    Settings, Manage Apps, Connect to more apps
  3. Create Your Table
    i) NEW
    ii) Google Fusion Table
  4. After following the steps above you will be prompted to import and name the relevant files.
  5. Repeat steps 3 & 4 above for each table.

Merging Tables
Follow the steps below to merge your data tables.

  1. Open one of the tables you wish to merge.
  2. From the “File” menu choose Merge.
  3. Select the second table you want to merge.
  4. Select the variables that match; this is how the tables join.
  5. Choose which variables you want displayed in the new merged table.
  6. Fusion Tables automatically creates a new table containing the merged data.
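What the merge step does, joining two tables on a matching column, can be sketched in Python.  The county names and figures below are placeholders, not the CSO data:

```python
# Sketch of a Fusion Tables merge: join two tables on a shared key column
# (here the county name). Figures and geometry strings are placeholders.

population = {"Laois": 80559, "Roscommon": 64065}
geometry   = {"Laois": "<kml>...</kml>", "Roscommon": "<kml>...</kml>"}

# Keep only counties present in both tables, combining their columns.
merged = {
    county: {"population": population[county], "geometry": geometry[county]}
    for county in population.keys() & geometry.keys()
}

print(sorted(merged))   # ['Laois', 'Roscommon']
```

Each merged row now carries both the census figure and the map geometry, which is exactly what the heat maps below are built from.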


After merging the total population data with the counties KMZ file you can show the population of each county by assigning a colour code.  The images below show two heat maps created with Fusion Tables, each showing the population by county, one for 2006 and the other for 2011.

2006 v 2011 – Population

Editing the Map
The colour scheme is assigned to each county based on the size of its population. This can be edited using the Change Map function on the Map of Geometry tab. From there you can assign the ranges and colours you wish to use.

POPULATION 2006 v 2011
The maps above show the population spread by county in Ireland using the CSO Census data from 2006 and 2011. In 2008 Ireland suffered a huge property and financial crash triggering one of Ireland’s worst recessions, a recession that Ireland is only starting to come out of seven years later.  During the period 2008–2014 it was widely reported that emigration during the recession depleted the smaller counties of Ireland.  But comparing the census data from 2006 with 2011, there was an average national increase of 8% across all counties.  Laois recorded the highest increase, at 20%.

In 2006 over one in three counties had a population in the region of 54,000.  In 2011 this dropped to just over 25%.  Excluding Dublin and Cork, which have a combined population of 1.8 million, the remaining counties have a median population of 136,640.



As mentioned above, the Irish population had an average growth rate of 8% between 2006 and 2011.  Comparing the population aged 15 years and over tells a different story.  At the time of the 2006 census the Irish economy was booming and employment rates were high.  When the bubble burst in 2008, the small towns and counties were hit the hardest with job losses.  In 2006 the country had an unemployment rate of 4.9%; by the time of the 2011 census the rate had more than doubled, to 13%.  The map and chart below compare the numbers unemployed in 2006 and 2011 by county, and the contrast between the two is staggering.  Counties saw their unemployment numbers more than double, and in some cases they trebled.  In 2006 Roscommon had a registered unemployed population aged 15 years and over of 1,385; by 2011 this figure was 5,409, nearly a fourfold rise.  The average national increase was 290% for the period covered. This was devastating to rural areas.

Unemployed 2006 v 2011