Digging Into Open Data

Is The Data You Need Available for Free?

The open-source software model has proven to be highly successful, in large part due to the belief that by openly collaborating and sharing ideas we can accomplish more and achieve progress faster.  At the same time, governments are realizing that free and readily available data creates economic and social benefits.

Open data is data that anyone can freely use, re-use and redistribute, and its impact is becoming visible. The OD500 Global Network study lists over 300 U.S. companies that use open data to develop products and services and to improve their business. Whether it's an app like Rocketman, which uses real-time next-vehicle arrival data from public transit to improve commuting, or property managers in New York, who benchmark consumption against the detailed (and free) energy and water consumption data published for all non-residential buildings in the city, open data is being used by entrepreneurs, businesses, NGOs and governments. A 2013 McKinsey study estimated the annual impact of open data on global economic growth at more than $3 trillion.

Demographic Data Investigation

As a demonstration of the valuable insights that can be gleaned from open data, I downloaded the 2011 National Household Survey data from the City of Toronto’s website and analyzed it using TIBCO Spotfire (a visualization and analytics platform). I started by plotting, on a map of Toronto, the percentage of each ward’s population aged 15 and up that held a postsecondary certificate, degree, or diploma (let’s call this statistic “education level”). You can see this map below; the redder a ward is, the higher its education level.
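
For readers who would rather script this step than use Spotfire, the “education level” statistic can be sketched in a few lines of Python. The ward names, counts and field names below are made-up placeholders, not figures or column names from the National Household Survey file, so treat this as an illustration of the calculation only.

```python
# Hypothetical ward records; names and numbers are illustrative only
# and do not come from the National Household Survey.
wards = [
    {"ward": "Ward A", "pop_15_plus": 40000, "postsecondary": 30000},
    {"ward": "Ward B", "pop_15_plus": 50000, "postsecondary": 22500},
    {"ward": "Ward C", "pop_15_plus": 32000, "postsecondary": 20000},
]


def education_level(record):
    """Percentage of residents aged 15+ holding a postsecondary credential."""
    return 100.0 * record["postsecondary"] / record["pop_15_plus"]


# Map each ward name to its education level.
levels = {w["ward"]: education_level(w) for w in wards}

# List wards from highest to lowest education level.
for name, pct in sorted(levels.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.1f}%")
```

Sorting the resulting dictionary is also a simple way to pick out the top and bottom wards for a closer look.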

Next I overlaid a bar graph of the occupational breakdown of Toronto’s population:

Occupational breakdown of Toronto’s population

Linking the map and the bar graph allowed me to select wards on the map and dynamically update the bar graph to show the occupational distribution of each ward. The next two images of the bar graph show the occupations of people living in the top four and bottom four wards by education level.

Top Four Wards by Education Level

Bottom Four Wards by Education Level

We can see that in the wards where the education level is highest, a larger share of the population works in business, law, education and healthcare, and a smaller share works in service, trades, transport and equipment operation occupations, than in the city as a whole. Conversely, in the wards with the lowest education levels, the shares of occupations in manufacturing and utilities, and in trades, transport and equipment operation, are all above the city’s averages.
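
The comparison against the municipal levels can also be sketched in code. This is a minimal example assuming per-ward counts of workers by occupation group; the ward names, group labels and numbers are hypothetical placeholders rather than values from the survey.

```python
from collections import Counter

# Hypothetical counts of workers by occupation group per ward (illustrative only).
occupations = {
    "Ward A": {"Business": 500, "Trades": 100, "Sales and service": 400},
    "Ward B": {"Business": 200, "Trades": 400, "Sales and service": 400},
    "Ward C": {"Business": 300, "Trades": 300, "Sales and service": 400},
}


def shares(ward_names):
    """Percentage share of each occupation group across the given wards."""
    totals = Counter()
    for name in ward_names:
        totals.update(occupations[name])
    grand_total = sum(totals.values())
    return {group: 100.0 * count / grand_total for group, count in totals.items()}


city = shares(occupations)      # all wards together: the municipal baseline
selected = shares(["Ward A"])   # e.g. the wards picked on the map

# Positive values mean the group is over-represented in the selected wards.
difference = {group: selected.get(group, 0.0) - city[group] for group in city}

for group, delta in difference.items():
    print(f"{group}: {delta:+.1f} percentage points vs. city")
```

Passing a different list of ward names to `shares` mimics selecting other wards on the linked map.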

Who would care about this? Education and occupation distributions may be useful to advertisers who target customer segments with a certain occupation type or level of education. For example, a postsecondary institution aiming to attract prospective students who are unlikely to have postsecondary education could use this data to understand where in the city its potential students are concentrated, and which occupations people living in those areas currently have.

Next, to understand the relationship between the percentage of a ward’s population who are immigrants and the ward’s average household income, education level and unemployment rate, I plotted the four variables on a scatter plot. Here, the size of a data point represents that ward’s average household income, and the colour represents its education level.

Ward average income vs. education level
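
As a numeric complement to the scatter plot (not something the Spotfire analysis above includes), a Pearson correlation coefficient gives a quick read on how strongly two ward-level variables move together. The sketch below uses placeholder values for education level and average household income; they are invented for illustration and say nothing about the actual Toronto data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-ward values (illustrative only).
education_pct = [45.0, 55.0, 62.5, 75.0]
avg_income = [60000.0, 72000.0, 70000.0, 95000.0]

r = pearson(education_pct, avg_income)
print(f"Correlation between education level and average income: {r:.2f}")
```

A value near 1 or -1 suggests a strong linear relationship and a value near 0 a weak one; with only one data point per ward, it is a rough guide rather than a conclusion.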

To dig further into the distribution of immigrants across the city, I created pie charts that show the percentage of the population that is first-, second-, and third-or-later-generation Canadian for two areas of the city, compared with the city as a whole.

Percentage of first-, second-, and third-generation Canadians in selected wards

The final two graphs show the additional variables that I was able to plot using the map chart’s colour dimension and the bar graph. The multiple variables available on the map chart make it possible to visualize several geographic distributions. For example, plotting Immigrant Population (%) compares the percentage of each ward’s population that is made up of first-generation immigrants. This visualization can be used together with the Non-English Native Language and Immigrant Country variables on the bar graph to form a detailed understanding of where the majority of immigrants live in Toronto, what languages they speak, and what their countries of origin are. This information would be useful to a wide audience, for example, companies selling international calling cards or imported specialty food items, or a bank that wants to hire branch employees who are proficient in locally spoken languages.

From this one data set I was able to extract a lot of interesting and useful information and present it through simple but compelling visuals!

So, how can open data help you? 

Take a look at some of the many applications that have been created using open data in the U.S.A., Canada, and Toronto. Next, you can begin discovering the vast number of sources that are available.

Open Data Sources:

Perhaps you’ve found your data source but are not sure what analysis you should perform, or how to do it. If that’s the case, please get in touch. The Adastra Big Data & Analytics team is highly skilled at transforming data of all types and sizes into valuable insights. Happy digging!