Assistant Professor

Director, Data Science Research Lab

Computer and Information Science and Engineering (CISE)

College of Engineering, University of Florida

Gainesville, FL 32611

Office: E456 CSE Building

Phone: (352) 562-8936; Fax: (352) 392-1220

Office Hours: Monday/Wednesday 3:50-4:50pm or by appointment


My research interest is in building systems and designing algorithms to support better data analysis, where better can mean: more efficient/scalable, more advanced (using statistical machine learning), more accurate, more interactive, self-improving, and easier to use. I pursue research topics such as probabilistic databases, large-scale advanced data analysis, and query-driven interactive machine learning. Currently, I am particularly interested in bridging data management systems with statistical and probabilistic models and tools.

If you are an undergraduate or graduate student interested in data science research, please see the Prospective Students page.

For more information please visit Data Science Research @ UF. I am also a member of the UFL Database Group.


- In Fall 2014, I am co-teaching *Projects in Data Science* with Dr. Sanjay Ranka; it is the second course in the three-course UF CISE Data Science Curriculum.
- Together with Dr. Tyson Condie at UCLA, I serve as the Proceedings Chair for VLDB 2015.
- The *Knowledge Expansion over Probabilistic Knowledge Bases* paper with my student Yang Chen was accepted and presented at SIGMOD 2014. I also gave an invited talk at the WACCK workshop (Workshop on Automatic Creation and Curation of Knowledge Bases) at SIGMOD 2014.
- I gave a talk on *Knowledge Base Construction from Big Text, Images and Crowds* at a WISE event in June 2014, organized by TRUST at Cornell University with Big Data research as the central theme.

- ProbKB: Large-scale Probabilistic Reasoning over Uncertain Knowledge Bases
- DBlytics: Statistical Text Analysis in DBMS and MPP frameworks
- Archer: Query-Driven Machine Learning
- CAMeL: Leverage Crowd Support in Probabilistic Databases
- Scalable Image/Video Extraction and Retrieval System and Algorithms (Topic-modeling Based Information Retrieval)
- Knowledge Extraction and Exchange Using Medical Notes
- Past Projects

- CA4773/CIS6930, Projects in Data Science, Fall 2014
- CIS6930, Introduction to Data Science/Data Intensive Computing, Spring 2014
- COP5725, Data Management Systems, Fall 2013
- CIS6930, Data Science: Large-scale Advanced Data Analysis, Spring 2013
- COP5725, Data Management Systems, Fall 2012
- CIS4301, Information and Data Management Systems, Spring 2012
- CIS6930, Data Science: Large-scale Advanced Data Analysis, Fall 2011

**Knowledge Expansion over Probabilistic Knowledge Bases**

*In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2014*

Yang Chen, **Daisy Zhe Wang**

**CASTLE: Crowd-Assisted System for Textual Labeling and Extraction**

*In Proceedings of HCOMP 2013*

Sean Goldberg, **Daisy Zhe Wang**, Tim Kraska

**Web-Scale Knowledge Inference Using Markov Logic Networks**

*Proceedings of the ICML Workshop on Structured Learning: Inferring Graphs from Structured and Unstructured Inputs, Atlanta, 2013*

Yang Chen, **Daisy Zhe Wang**

**Knowledge Extraction and Outcome Prediction using Medical Notes**

*Proceedings of the ICML Workshop on Role of Machine Learning in Transforming Healthcare, Atlanta, 2013*

Ryan Cobb, Sahil Puri, Tezcan Ozrazgat Baslanti, Azra Bihorac, **Daisy Zhe Wang**

**GPText: Greenplum Parallel Statistical Text Analysis Framework**

*Proceedings of the SIGMOD Workshop on Data Analytics in the Cloud, New York, 2013*

Kun Li, Christan Grant, **Daisy Zhe Wang**, Sunny Khatri, George Chitouras

**MADden: Query-Driven Statistical Text Analytics**

*Proceedings of ACM CIKM, 2012*

Christan Grant, Jordan Gumbs, Kun Li, **Daisy Zhe Wang**, George Chitouras

**Automatic Knowledge Base Construction using Probabilistic Extraction, Deductive Reasoning, and Human Feedback**

*Proceedings of NAACL-HLT, 2012, short paper*

**Daisy Zhe Wang**, Yang Chen, Sean Goldberg, Christan Grant, and Kun Li

**The MADlib Analytics Library or MAD Skills, the SQL**

*Proceedings of VLDB, 2012*

Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, **Daisy Zhe Wang**, Eugene Fratkin, Aleks Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar

**Hybrid In-Database Inference for Declarative Information Extraction**

*Proceedings of the ACM SIGMOD International Conference on Management of Data, 2011*

**Daisy Zhe Wang**, Michael J. Franklin, Minos Garofalakis, Joseph M. Hellerstein, and Michael L. Wick

**Selectivity Estimation for Extraction Operators over Text Data**

*Proceedings of the 27th IEEE International Conference on Data Engineering (ICDE), 2011*

**Daisy Zhe Wang**, Long Wei, Yunyao Li, Frederick Reiss, and Shivakumar Vaithyanathan

**Querying Probabilistic Information Extraction**

*Proceedings of the 36th International Conference on Very Large Data Bases (VLDB), 2010; PVLDB Vol. 3*

**Daisy Zhe Wang**, Michael J. Franklin, Minos Garofalakis, and Joseph M. Hellerstein

**Probabilistic Declarative Information Extraction**

*Proceedings of the 26th IEEE International Conference on Data Engineering (ICDE), 2010, short paper*

**Daisy Zhe Wang**, Eirinaios Michelakis, Michael J. Franklin, Minos Garofalakis, and Joseph M. Hellerstein

**BayesStore: Managing Large, Uncertain Data Repositories with Probabilistic Graphical Models**

*Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), 2008*

**Daisy Zhe Wang**, Eirinaios Michelakis, Minos Garofalakis, and Joseph M. Hellerstein

**WebTables: Exploring the Power of Tables on the Web**

*Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), 2008*

Michael Cafarella, Alon Halevy, **Daisy Zhe Wang**, Eugene Wu, Yang Zhang

- “Probabilistic Knowledge Base Construction from Big Text, Images and Crowds”
  - TRUST WISE workshop at Cornell University, June 2014
  - UF Big Data Workshop, June 2013

- “Probabilistic Knowledge Base Systems”
  - Invited Talk, WACCK workshop at SIGMOD, June 2014
  - Shanghai Jiao Tong University, China, April 2014
  - ECE Department, University of Florida, October 2013
  - Fudan University, China, August 2013
  - Google Research and EMC, April 2013
  - Rochester Big Data Forum, October 2012

- “Hybrid In-Database Inference for Declarative Information Extraction”
  - SIGMOD Conference, June 15, 2011

- “Selectivity Estimation for Extraction Operators over Text Data”
  - ICDE Conference, April 14, 2011

- “Querying Probabilistic Information Extraction”
  - EMC/Greenplum Seminar, July 11, 2011
  - CSAIL Seminar, MIT, November 17, 2010
  - VLDB Conference, September 2010
  - Database Seminar, University of Toronto, January 5, 2010

- “Probabilistic Declarative Information Extraction”
  - ICDE Conference, March 2010

- “Declarative Information Extraction in a Probabilistic Database System”
  - Info Lab Seminar, Stanford, May 2009

- VLDB Endowment
- ACM SIGMOD
- LaTeX Templates and Guides

**A Parable of Modern Research**

Bob has lost his keys in a room which is dark except for one brightly lit corner.

“Why are you looking under the light, you lost them in the dark!”

“I can only see here.”