Area of Study / Keywords
Computer Engineering; Software Engineering and Intelligent Systems
I'm an Associate Professor in the Department of Electrical and Computer Engineering at the University of Alberta. I did my PhD work in the Department of Electrical and Computer Engineering at the University of Toronto from 2008 to 2012, and was an M.A.Sc. student in the same department from 2006 to 2008. I received my B.Eng. degree in 2005 from Sun Yat-sen University, China. I now work with my team on cross-disciplinary topics in data mining, applied machine learning, text mining and natural language processing, statistical learning, and networked and distributed systems.
Ongoing Research Projects
Currently, Di is working with his PhD and Master’s students on the following topics, in collaboration with industrial partners and other academic institutions (mainly Tencent, Wedge Networks, Meridian Lightweight Technologies, and the University of Toronto):
- Text modeling, text understanding, text matching, information retrieval, search, event discovery and summarization, natural language generation, and concept extraction.
- Algorithms and platforms for distributed machine learning and data analytics, with a focus on both algorithm design and innovative system development. The specific problems we study include distributed algorithms for training machine learning models, such as deep neural networks and matrix factorization, and machine learning on decentralized data for privacy preservation and efficiency optimization.
- Automated machine learning (AutoML), active learning, data generation, active data selection, and data/model quality assessment.
- Spatio-temporal data mining for socio-economic computing.
To Prospective Students:
I encourage self-motivated students who are interested in working with me to contact me by email 2-3 months before they plan to submit their application to the University of Alberta.
Overview of parallel/distributed computing, including concepts and terminology. Principles of programming with shared memory and synchronization methods. Multithreaded programming with Pthreads and OpenMP. Message-passing computing: the Message Passing Interface library. Design and performance of parallel algorithms. Prerequisites: CMPUT 275 and 379.
Approaches, techniques and tools for data analysis and knowledge discovery. Introduction to machine learning, data mining, and the knowledge discovery process; data storage including database management systems, data warehousing, and OLAP; testing and verification methodologies; data preprocessing including missing data imputation and discretization; supervised learning including decision trees, Bayesian classification and networks, support vector machines, and ensemble methods; unsupervised learning methods including association mining and clustering; information retrieval.