I am an Associate Professor in the Department of Computing Science at the University of Alberta. I earned a PhD from the University of Toronto in 2005 working on Web data management. Since then I have worked on databases, the Web, information retrieval, and natural language processing, with an emphasis on information extraction from semi-structured and unstructured sources.
I am a co-PI of LINCS (Linked Open Data for Canadian Cultural Research), a CFI-funded consortium that builds open knowledge graphs to mobilize cultural and heritage artifacts. I am also a lead researcher with the Scotiabank Artificial Intelligence Research Initiative at the University of Alberta. I was a co-PI and the Leader of the Data Quality Theme of the NSERC Business Intelligence Network. I have supervised 10 PhD students, 17 MSc students, and 3 post-doctoral fellows, who now work as faculty members or as researchers and engineers at technology companies such as Diffbot, Amazon, Intuit, and Borealis AI.
I serve on the Executive Committee of the Canadian Artificial Intelligence Association (CAIAC) and am an Associate Editor of Springer's Distributed and Parallel Databases. I have served as an Associate Editor of the IEEE Transactions on Knowledge and Data Engineering (2015-2018), Elsevier's Computational Intelligence Journal (2015-2018), and the SIGMOD Record (2010-2014). I was the ACM SIGMOD Information Director and Web Editor of the Record from 2006 to 2012. I have served on the program committees of all major conferences on data management, the Web, and natural language processing on multiple occasions. I have co-chaired the Program Committee of the 3rd IEEE International Conference on Data Science and Advanced Analytics, the 28th Canadian Conference on Artificial Intelligence, the 1st and 2nd ACM SIGMOD Workshops on Databases and Social Networks, the 3rd International Workshop on Data Engineering Meets the Semantic Web, and the 5th International XML Database Symposium (co-located with VLDB 2007).
I am the recipient of an Alberta Ingenuity New Faculty Award, an IBM Faculty Award, and the Best Paper Award at the 2010 IEEE Conference on Data Engineering, and I supervised the recipients of the Best Undergraduate Poster Award at the 2012 ACM SIGMOD Conference. I was a Visiting Scientist at the Max Planck Institute for Informatics, Germany, from July 2014 to April 2015, and a Visiting Professor (BIT) at the Free University of Bozen-Bolzano, Italy, during the summer of 2008.
My areas of research are knowledge extraction, data management, information retrieval, and natural language processing. I also work on machine learning methods applied to these fields. I have supervised graduate-level research on named entity recognition, entity typing, and entity disambiguation; open relation extraction from text; understanding social processes in Wikipedia article authoring; mining citation networks; and semi-structured data management.
I am passionate about open linked data and the Semantic Web, and I work with Digital Humanities colleagues on creating, indexing, and processing knowledge graphs built from heritage, cultural, scholarly, and literary works.
Most of the knowledge we acquire, use, and share is expressed in natural language and preserved primarily as textual documents. This course introduces the fundamental algorithms and data structures for organizing and searching large collections of documents, and the techniques for evaluating the quality of search results. The course also covers practical machine-learning algorithms for text and the foundational technologies used by Web search engines. Prerequisites: CMPUT 201 and CMPUT 204 or 275; MATH 125 or equivalent is strongly recommended.
Winter Term 2021
Database design and normalization theory, transaction management, query processing and optimization; support for special data types such as multimedia, spatial data, and XML documents; support for complex applications and data analysis such as data mining, data warehousing, and information retrieval. Prerequisites: CMPUT 201 and CMPUT 204 or 275, and CMPUT 291.
Fall Term 2020; Winter Term 2021