PhD Call: Analytics Learning and Knowledge Mining for Big Data Exploration

Research Fields: Predictive Analytics, Machine & Statistical Learning, Data Exploration, Big Data Intelligence

Description: Data analytics and the adoption of machine and statistical learning methods over large-scale data have become a necessity for gaining insight into complex processes, for validating scientific theories and discoveries, for supporting decision making, and for enhancing strategic planning in areas such as the economy, industry, and healthcare. Query-driven exploration of Big Data, that is, discovering insights, patterns and trends in a timely fashion, is becoming essential in this field. We are looking for an excellent candidate who will pursue a PhD on a novel problem of predictive analytics over distributed data: methods that can be deployed in environments in which data owners and sources restrict access to their data (e.g., due to security or cost reasons) and allow only certain statistical summaries or aggregation functions to be constructed or executed over the data.

Challenges: The central challenge of this research is to gain insight and knowledge solely from interrogation/analytics queries and their results over distributed data. A further challenge is to develop novel query-driven supervised and unsupervised machine and statistical learning techniques over the analytics queries that can mine as much knowledge as possible from restricted-access data. Such query-driven predictive analytics methods aim at multiple levels of knowledge abstraction and information fusion, allowing the system to learn complex functions that map the input to the output indirectly, and to provide evidence that, on big data, sophisticated predictive analytics algorithms can achieve comparable or even better performance than ‘traditional’ methods that explicitly require data access. In addition, the research will concentrate on problems associated with optimally scheduling data access by learning the query patterns over federated data nodes.
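
To give prospective candidates a feel for the setting, the following is a minimal illustrative sketch, not the research team's method. It assumes a one-dimensional hidden data set whose owner answers only range COUNT queries, and a learner that sees nothing but past queries and their answers. The learner then estimates the answer to an unseen query with a simple k-nearest-neighbour average in query space. All names (`make_count_oracle`, `knn_predict`, the query encoding as a `(center, radius)` pair) are hypothetical and chosen for illustration only.

```python
import random


def make_count_oracle(data):
    """Data-owner side: answers range COUNT queries, never exposes rows."""
    def count(center, radius):
        return sum(1 for x in data if abs(x - center) <= radius)
    return count


def knn_predict(training, query, k=5):
    """Estimate a COUNT as the mean answer of the k most similar past queries.

    `training` is a list of ((center, radius), answer) pairs.
    """
    def squared_distance(past_query):
        return ((past_query[0] - query[0]) ** 2
                + (past_query[1] - query[1]) ** 2)

    nearest = sorted(training, key=lambda item: squared_distance(item[0]))[:k]
    return sum(answer for _, answer in nearest) / len(nearest)


# Assumption: synthetic Gaussian data stands in for the restricted data set.
rng = random.Random(0)
hidden_data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
oracle = make_count_oracle(hidden_data)

# The learner only ever observes queries and their aggregate answers.
training = []
for _ in range(500):
    past_query = (rng.uniform(-2.0, 2.0), rng.uniform(0.1, 1.0))
    training.append((past_query, oracle(*past_query)))

# Predict the answer to an unseen query without touching the data,
# then compare against the true answer for evaluation only.
new_query = (0.3, 0.5)
predicted = knn_predict(training, new_query)
actual = oracle(*new_query)
print(f"predicted={predicted:.1f} actual={actual}")
```

The point of the sketch is the separation of roles: only `oracle` touches the data, while `knn_predict` works purely from the query log. The research described above targets far richer models and query types than this nearest-neighbour average.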

Enrolment & Opportunity

The successful candidate will enrol as a PhD student at the School of Computing Science, University of Glasgow, under the supervision of Dr Christos Anagnostopoulos, and will join the Information, Data, Event, and Analytics at Scale (IDEAS) research team of the University of Glasgow, led by Professor Peter Triantafillou. Our research team explores a number of different issues, such as machine and statistical learning in high-dimensional settings, information and knowledge fusion, scalable data access methods based on machine learning, and complex analytics query processing and optimization, with applications to urban data, smart cities, and polyomics data. For a more detailed description, interested candidates may visit: and the list of publications within there.

The University of Glasgow is a world-renowned education and research hub, offering considerable opportunities for training and exposure to query-driven machine learning and analytics, with a number of research teams in the School of Computing Science being active in these and related fields. In addition, the selected candidate will have ample opportunities to participate in the top conferences in distributed computing, large-scale learning and mining, and data engineering.


The ideal candidate will have a background in computer science and some background in either mathematics or statistics. Areas of particular interest include basic statistics, mathematical modelling/optimization, and data fusion. A good understanding of basic machine learning methods and algorithms, as well as an MSc in one of the above areas, will be a considerable plus. Moreover, the candidate will need to become familiar with technologies such as Hadoop, MapReduce, and Spark when experimenting over publicly available big data sets. Programming skills, a good command of English, and the ability to work in a team are required.

Further Information

Questions regarding academic and research aspects of the position should be directed to Dr Christos Anagnostopoulos by e-mail:

For general enquiries about the application process visit our How to Apply page.


Anagnostopoulos, C., and Triantafillou, P. (2015) Learning to Accurately COUNT with Query-Driven Predictive Analytics. In: IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, 29 Oct – 1 Nov 2015.

Anagnostopoulos, C., and Triantafillou, P. (2015) Learning set cardinality in distance nearest neighbours. In: IEEE International Conference on Data Mining (IEEE ICDM 2015), Atlantic City, NJ, USA, 14-17 Nov 2015.

Anagnostopoulos, C., and Hadjiefthymiades, S. (2014) Advanced principal component-based compression schemes for wireless sensor networks. ACM Transactions on Sensor Networks, 11(1), 7.

Anagnostopoulos, C., and Triantafillou, P. (2014) Scaling out Big Data Missing Value Imputations: Pythia vs. Godzilla. In: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14), New York, N.Y., U.S.A, 24-27 Aug 2014, pp. 651-660.