Title: Big Data Analytics
Code: CSCI 4030U
Instructor: Jarek Szlichta, jaroslaw [dot] szlichta [at] uoit [dot] ca
Office hours: Mondays 5.00-7.00pm (except reading week)
TA: Alexander Keller, alexander [dot] keller [at] uoit [dot] net
TA office hours in UA4029 from 1pm-2pm upon request
Description: This course covers advanced topics in data processing and analytics, with special emphasis on Big Data. Topics include, but are not limited to, indexing structures for fast information retrieval, query processing algorithms, distributed storage and processing, scalable machine learning and statistical techniques, and trends in modern very-large-scale data systems. Students will gain an understanding of the theoretical foundations and practical design principles of modern Big Data processing systems.
- Data Mining
- Finding Similar Items
- Mining Data Streams
- Link Analysis
- Frequent Itemsets
- Advertising on the Web
- Recommendation Systems
- Mining Social-Network Graphs
- Dimensionality Reduction
- Large-Scale Machine Learning
Marking Scheme: Labs and Project: 30% (10% + 20%), Midterm: 20%, Participation and Presentation: 10%, Final Exam: 40%.
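The weights above can be combined into a course grade with a simple weighted sum. The following is a minimal sketch, not an official grade calculator: it assumes that the "10% + 20%" split means labs are worth 10% and the project 20%, that every component is scored out of 100, and the function name and example scores are hypothetical.

```python
# Weights from the marking scheme, as fractions of the final grade.
# Assumption: in "Labs and Project 30% (10% + 20%)", labs = 10%, project = 20%.
WEIGHTS = {
    "labs": 0.10,
    "project": 0.20,
    "midterm": 0.20,
    "participation_presentation": 0.10,
    "final_exam": 0.40,
}


def course_grade(scores):
    """Return the weighted course grade (0-100) from component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "labs": 80,
    "project": 90,
    "midterm": 70,
    "participation_presentation": 100,
    "final_exam": 60,
}
print(round(course_grade(example_scores), 1))
```

Late penalties and the reweighting that applies to documented absences are not modelled here.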
Late project submissions: 50% of the mark (if submitted within the first week after the deadline).
Policies: Refer to the UOIT Faculty of Science academic policies.
Required readings: See Blackboard; Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeff Ullman
Lecture Notes (always check the newest version of the slides):
2. Association Rules Mining PDF
3. Finding Similar Items PDF
4. IBM Watson Analytics link (PDF materials and data)
5. Clustering PDF
6. Data Platforms and Pattern Mining PDF
7. Large Scale Machine Learning PDF
8. Link Analysis PDF
9. Data Streams PDF
Labs will start in the week of January 25th-29th.
Lab tasks will be posted on Blackboard
- Midterm: February 24th BRING YOUR LAPTOP TO THE MIDTERM!
- Control questions — see Blackboard.
- Any student who misses an examination without a valid medical reason and documentation will receive a zero for that examination/tutorial. Those with medical documentation will either be given a makeup exam/tutorial or will have the weight of the examination (final exam/midterm) added to the final exam.
- The Project I report is due on Monday, March 7th.
- Exam: 14-Apr 3:30PM-6:30PM ERC2056 BRING YOUR LAPTOP TO THE FINAL EXAM.
- Feb 3rd Workshop – IBM Watson Analytics, 3:30PM – 6:30PM
- Invited Speaker: Khush Gill.
- Link to download the Workshop Workbook: https://ibm.app.box.com/s/6zhj770ep5es0fm5t7y7dxa08ugc1y4g
- Other invited speakers: Morteza ZiHayat (IBM & York University), Mehdi Kargar (York University & Dapasoft – Microsoft Gold Partner), and Ken Pu (UOIT).