The latest “new” thing in the world of data mining is using “Big Data” to inform public policy. Using data mining methods, we can aid evidence-based decision making by learning what the data can tell us and using this to write or implement policy. Idox are now exploring these methods to look at opportunities for our public policy and research members.
Our initial investigation indicates that using data in this way is still in its infancy: data mining methods are starting to be applied, but so far very few projects have been completed. Published examples include the London Borough of Newham’s property data, which has been combined with numerous other datasets and mined to examine change in property tenure in order to support, amongst other things, the borough’s housing management services. University College London mined Oyster Card data in order to minimize cost for travellers using public transport and to encourage public transport use. The first stage of our research will explore what can be done and what would be useful to members.
As a new member of the Idox staff, I am on a scheme known as a Knowledge Transfer Partnership (KTP), which helps companies engage in this type of research and development. The scheme is celebrating its 40th anniversary this year, having first been formed in 1975 as the Teaching Company Scheme. The KTP programme is funded by 17 public sector organisations and led by Innovate UK, formerly the Technology Strategy Board. The aim is to support UK businesses wanting to improve their competitiveness, productivity and performance by accessing the knowledge and expertise available within UK universities and colleges.
Traditionally based in the engineering and manufacturing industries, partnerships have now branched out into ICT, looking at data analysis, and into creative industries such as design, fashion, music and video games. There are currently 800 partnerships across the UK.
Our research partnership includes an academic institution, the University of Salford, which is on hand to provide support and guidance. It has an outstanding record with regard to innovation, enterprise and skills. Its Informatics Research Centre builds on the history, success and achievements of research in Computer Science and Information Systems over the last 30 years.
Data mining is the process of discovering patterns in large datasets. Its roots are in disciplines such as artificial intelligence, machine learning, statistics and database systems. Its overall goal is to extract information from data and make it understandable, so that it can be used to make decisions. A popular book, “Data mining: Practical machine learning tools and techniques with Java”, covers the most common data mining methods.
The three main data mining methods we will be exploring in the research are association rules, classification and clustering.
- Association rule learning searches for relationships between variables (or attributes) in the dataset. A popular example is a supermarket finding out which products its customers buy together and using this information for marketing purposes. This is also known as market basket analysis.
- Classification is when a dataset has examples grouped into known classes; the task is to assign a new example to one of these known classes. A well-known algorithm performing this task is the Decision Tree algorithm C4.5.
- Clustering performs a similar task to classification, but with clustering we don’t have an assigned ‘class’; instead, examples are grouped according to how similar they are to each other. A technique known as k-means is a popular method.
Other main data mining tasks are regression, summarization and anomaly detection.
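To make the market basket example a little more concrete, here is a minimal sketch in Python of how association rules between pairs of products might be found. The shopping baskets, the function name `pair_rules` and the two thresholds are all made up for illustration, and a real project would more likely use an established algorithm such as Apriori on far larger data. The idea, though, is the same: "support" is the share of baskets containing both items, and "confidence" is how often the second item appears in baskets that contain the first.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) rules for pairs of items.

    The threshold values are assumptions chosen for this toy example.
    """
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is bought together too rarely to be useful
        # Each frequent pair can give a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these made-up baskets, one of the rules found is that everyone who bought milk also bought butter. That is the kind of pattern a supermarket could then use for marketing.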
Although the research is explorative at the moment, I hope to keep you updated with our progress throughout the project. If you have any thoughts or want to find out more, please get in touch.
By Susan Lomax, Data Scientist, Knowledge Transfer Partnership placement