CIS 6930 / CIS4930: Data Science: Large-scale Advanced Data Analysis

Instructor: Daisy Zhe Wang

Section: 6263 / 6874
Location: Tuesday CSE E119 / Thursday CSE E119
Time: Tuesday 8th-9th periods (3:00-4:55pm) / Thursday 9th period (4:05-4:55pm)
Office hours: Tuesday 10th period / Thursday 10th period (5:00-6:00pm)
Contact: E456 (office), (352) 562-8936 (office phone)


More and more companies are generating large amounts of diverse data (e.g., tweets, logs, click-streams, health care records, and data from mobile phones and sensor networks) and applying sophisticated statistical models and algorithms to support decisions, perform quantitative analysis, and build data-intensive products and services. Examples include Netflix, Google, Facebook, Twitter, Amazon, Fox Interactive, and Splunk. Database systems have traditionally been the de facto framework for scalable data management, querying and analysis. However, the new requirements of deep analysis and big data go beyond the capabilities of traditional database systems. This course will describe real-life applications that require large-scale advanced data analysis; cutting-edge algorithms used for different analysis tasks; and existing data management systems and computing infrastructures developed to scale with both the data and the computation.

In this course, we will discuss recent publications on Data Science, with emphasis on algorithms and systems for large-scale advanced data analysis. Each student will be responsible for presenting one or more such papers in class and for participating in discussions of the papers presented by other students. Each student will also complete a class project, which carries the largest weight in the final grade. Every student should be comfortable with programming and should preferably have prior experience with data management systems, data modeling and analysis.




Information and Database Systems I (CIS 4301) or an equivalent course is a prerequisite. Preferably, you have already taken one of the following courses: COP6726 Database System Implementation, CIS4930DTM Data Mining, or a course in Machine Learning or Natural Language Processing. I will assume that students already have basic knowledge of database systems, data mining, and data analysis, and are comfortable with basic computer programming (e.g., in C or Java).


This course will cover the most recent developments in a broad range of Data Science problems. I would like to focus on algorithms and systems that enable advanced (statistical/machine learning) data analysis. The topics are as follows:


(Subject to minor changes until the first day of the semester.) Grading will be based on the project (55%), presentations (20%), and homework (25%). Class participation and novelty in projects will each be rewarded with a bonus of 5%. Late submissions will lose 20% of the points for each day late.

Textbook and Some Pointers

There is no required textbook for this class. We will use papers as our main source.

Some related links and pointers:

Other Interesting Projects