Big Data Analytics

Title: Big Data Analytics   Code: CSCI 4030U

Instructor: Jarek Szlichta, jaroslaw [dot] szlichta [at] uoit [dot] ca

Office hours: Tuesdays 2-3pm (except reading week)

TA: Alexandar Mihaylov, alexandar [dot] mihaylov [at] uoit [dot] net

TA office hours in UA4029 (upon request)

Description: This course covers advanced topics in data processing and analytics, with special emphasis on Big Data. Topics include, but are not limited to, indexing structures for fast information retrieval, query processing algorithms, distributed storage and processing, scalable machine learning and statistical techniques, and trends in modern very-large-scale data systems. Students will gain an understanding of the theoretical foundations and practical design principles of modern Big Data processing systems.

Course Outline: 

  1. Data Mining
  2. Finding Similar Items
  3. Mining Data Streams
  4. Link Analysis
  5. Frequent Itemsets
  6. Clustering
  7. Advertising on the Web
  8. Recommendation Systems
  9. Mining Social-Network Graphs
  10. Dimensionality Reduction
  11. Large-Scale Machine Learning

Marking Scheme: Labs and Project: 30% (10% + 20%), Midterm I: 20%, Participation and Presentation: 10%, Final Midterm: 40%.

Late project submissions: 50% of the mark (within the first week).

Policies: Refer to the UOIT Faculty of Science academic policies.

Required readings: See Blackboard; Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeff Ullman


Lecture Notes (always check newest version of the slides):

1. Introduction PDF

2. Association Rules Mining PDF, with answers PDF

3. Finding Similar Items PDF, with answers PDF

4. Clustering PDF, with answers PDF

5. Social Networks (available through Blackboard, by Dr. Amirali Salehi-Abari)

6. Large Scale Machine Learning PDF

7. Link Analysis + Graph Databases (invited talk by Guilherme Damasio; IBM Centre for Advanced Studies) PDF + IBM CAS talk PDF available through Blackboard

8. Data Platforms and Pattern Mining (invited talk, Globe and Mail, by Dr. Morteza Zihayat)

9. Data Streams PDF; Bringing Order to Big Data (slides on Blackboard)


Labs:

Labs will start in the week of 15th of January

Lab tasks will be posted on Blackboard


Announcements:

  1. Any student who misses an examination without a valid medical reason and documentation will receive a zero for that examination/tutorial. Those with medical documentation will either be given a makeup exam/tutorial or have the weight of the examination (final exam/midterm) added to the final exam.
  2. Midterm, 12th of Feb (Monday), BRING YOUR LAPTOP.
  3. Project part I report, 23rd of Feb (Friday, midnight).
  4. Final midterm, 2nd of April (Monday), BRING YOUR LAPTOP.
  5. Project part II report, 3rd of March (Tuesday, midnight).