Data science is a new area of practical knowledge that integrates theory and skills from a number of different fields. The key objective of the course is to help students build a repertoire of theoretical knowledge and practical skills that allows them to reason about data science problems in general, to work with raw data, and to extract informative features from it.
Introduction to the interdisciplinary nature of data science. The data scientist’s toolbox. Raw data, and data transformation methods (focus on text, image, sound, and tables of measurements). The different approaches to the analysis of text. Text pre-processing in the ‘Bag of Words’ approach. Matrix factorisation techniques for dimensionality reduction and feature extraction. Feature extraction from images. Introduction to the use of features in machine learning.
Upon completing the course the student:
- has learned about the diversity of raw data formats, e.g. text, images, and bio-signals;
- has acquired the skills to pre-process, clean up and transform raw data and cast them into specific representations, e.g. matrices;
- has acquired the skills to perform feature extraction from pre-processed raw data, particularly in matrix representations via factorisation methods such as principal component analysis;
- has acquired the skills to interpret features extracted from data, as well as basic knowledge to link features to machine learning algorithms that benefit from them.
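
As an illustration of the pipeline these outcomes describe, the sketch below casts a few raw sentences into a ‘Bag of Words’ count matrix and then extracts features from that matrix with principal component analysis, computed here via a singular value decomposition. It is a minimal sketch and not course material: the toy sentences, the simple letters-only tokeniser, and the function names `bag_of_words` and `pca` are all assumptions made for the example.

```python
import re
from collections import Counter

import numpy as np


def bag_of_words(docs):
    """Return (vocabulary, document-term count matrix) for a list of strings."""
    # Assumed pre-processing: lowercase, keep runs of ASCII letters only.
    tokenised = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    index = {tok: j for j, tok in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenised):
        for tok, n in Counter(toks).items():
            counts[i, index[tok]] = n
    return vocab, counts


def pca(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    centred = X - X.mean(axis=0)  # PCA operates on mean-centred columns
    _, S, Vt = np.linalg.svd(centred, full_matrices=False)
    components = Vt[:n_components]   # principal directions, one per row
    scores = centred @ components.T  # extracted features, one row per document
    explained = (S ** 2) / (S ** 2).sum()  # share of variance per component
    return scores, components, explained[:n_components]


# Hypothetical toy corpus, chosen only for illustration.
docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "Images are stored as pixel matrices.",
    "Pixel matrices can be factorised.",
]

vocab, counts = bag_of_words(docs)
scores, components, explained = pca(counts, n_components=2)
print(counts.shape, scores.shape)
```

Each row of `scores` is a two-dimensional feature vector for one document, which could then be passed to a machine learning algorithm. Each row of `components` assigns a weight to every vocabulary word, which is what makes the extracted features interpretable.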