Projects and Publications


SHOW Quick Links SORT BY Project Title; 〉

  1. Integration Benchmark Project - THALIA
  2. Morpheus Data Transformation Project
  3. New Technologies for Approximate Query Processing
  4. New Technologies for Online Aggregation
  5. New Technologies in Drug Discovery using Metabolic Networks
  6. Processing Dynamic Event Data and Multi-faceted Knowledge ...
  7. The STU Project: Database Integration of Space, Time, ...
  8. Transnational Digital Government
  9. SEEK: Scalable Extraction of Enterprise Knowledge
  10. Sequence Indexed Maize Transposon Insertion Sites...
  11. Southern Plant Diagnostic Network

SHOW Project Details GROUP BY Faculty Members SORT BY Last Name; 〉

Alin Dobra | Joachim Hammer | Chris Jermaine | Tamer Kahveci | Markus Schneider | Stanley Su



New Technologies for Approximate Query Processing

CISE Faculty: Alin Dobra
CISE Students: Florin Rusu, Amit Dhurandhar
Sponsor: National Science Foundation CAREER Award No. 0448264

Description:
One of the two goals of the project is to advance the state of the art in approximate query processing (AQP), a critical component of analytical processing - a 3.5 billion dollar segment of the software industry. The need for approximate query processing arises from the growing discrepancy between the volume of information that has to be processed and the computational resources or communication capabilities available. Using two computational models, data streaming and distributed computation, the project addresses fundamental problems in AQP: the development of new approximation techniques for data-stream computation, extensions of data-stream algorithms to distributed algorithms that can efficiently query sensor and peer-to-peer networks, and theoretical aspects of AQP that allow AQP techniques to be designed faster and understood better. Part of the project's research goal is the design and implementation of an approximate query processing engine that uses the developed AQP techniques, and the rigorous benchmarking of the software produced. The second goal of the project is educational. It consists, on the one hand, of motivating students to study and pursue careers in databases through bonus points for extra activities and integration of the database curricula with other CS disciplines, and, on the other hand, of integrating approximate query processing into both undergraduate and graduate curricula. The project will have broad impact by developing techniques for efficient processing of large volumes of data - crucial for scientific data processing and homeland security - and by increasing the quality of database education, with a direct impact on the nation's technological leadership. Project Web site: http://www.cise.ufl.edu/~adobra/AQP

Publications:
Visit AQP publications list


Morpheus Data Transformation Project

Faculty: Joachim Hammer (team co-lead), Mike Stonebraker (team co-lead), and Pete Dobbins (project manager)
Current CISE Students: Christan Grant, Dev Oliver, Umut Sargut, Rebecca Wells
Sponsor: Jim Gray and Microsoft Research, The New Hampshire Charitable Foundation - Manchester Region

Description:
The need to integrate collections of independently written database schemas has seriously challenged enterprises and decision-makers across many domains. More precisely, information integration comprises the extraction, transformation, and loading (ETL) of data from disparate systems into a single repository to support data sharing, collaboration, or decision-making (reporting), to name a few uses.

The Morpheus project aims to simplify the transformation component of ETL by making it easy to build, find, and reuse transformations between disparate data types. Our data transformation tool is called Morpheus TCT (Transformation Construction Toolkit) and provides the following components and capabilities:

Morpheus TCT is based on the Postgres DBMS for storage and execution of transforms and leverages the Postgres ADT system. Unlike many existing ETL tools, which require transformations to be performed outside of the repository where the data is stored, Morpheus TCT executes the transformations inside the DBMS, thereby taking advantage of the amenities provided by a modern DBMS, including efficient storage for data and support for transactions and recovery.
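The idea of running a transform inside the DBMS rather than in an external ETL step can be illustrated with a minimal sketch. It is not Morpheus TCT code: SQLite and Python's standard sqlite3 module stand in for Postgres and its ADT system, and the table names, the f_to_c function, and the temperature conversion are all hypothetical. The transform is registered with the database as a user-defined function and applied by a single INSERT ... SELECT, so the data never leaves the database and the load runs inside one transaction.

```python
import sqlite3


def fahrenheit_to_celsius(value):
    """Hypothetical transform between two representations of a temperature."""
    return round((value - 32.0) * 5.0 / 9.0, 2)


def run_in_database_transform():
    """Load source rows, transform them inside the DBMS, and return the target rows."""
    conn = sqlite3.connect(":memory:")
    # Register the transform so SQL statements can call it by name.
    conn.create_function("f_to_c", 1, fahrenheit_to_celsius)
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "CREATE TABLE source_readings (id INTEGER PRIMARY KEY, temp_f REAL)"
        )
        conn.execute(
            "CREATE TABLE target_readings (id INTEGER PRIMARY KEY, temp_c REAL)"
        )
        conn.executemany(
            "INSERT INTO source_readings VALUES (?, ?)",
            [(1, 32.0), (2, 212.0), (3, 98.6)],
        )
        # The transform executes inside the database engine, row by row.
        conn.execute(
            "INSERT INTO target_readings "
            "SELECT id, f_to_c(temp_f) FROM source_readings"
        )
    rows = conn.execute(
        "SELECT id, temp_c FROM target_readings ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_in_database_transform():
        print(row)
```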

Publications:
Visit Morpheus publications list


Integration Benchmark Project - THALIA

Faculty: Joachim Hammer and Mike Stonebraker
Current CISE Students: Oguzhan Topsakal
Sponsor: National Science Foundation under Grant No. 0122193

Description:
THALIA (Test Harness for the Assessment of Legacy information Integration Approaches) is a publicly available testbed and benchmark for testing and evaluating integration technologies. This Web site provides researchers and practitioners with a collection of 40 downloadable data sources representing university course catalogs from computer science departments around the world. The data in the testbed provide a rich source of syntactic and semantic heterogeneities, which we believe still pose the greatest technical challenges to the research community. In addition, this site provides a set of twelve benchmark queries as well as a scoring function for ranking the performance of an integration system.

We hope this site will be useful both to the research community in its efforts to develop new integration technologies and to potential users of existing technologies in evaluating their strengths and weaknesses.

Publications:
Visit THALIA publications list


SEEK: Scalable Extraction of Enterprise Knowledge

CISE Faculty: Joachim Hammer and Mark Schmalz
Current CISE Students: Jungmin Shin and Oguzhan Topsakal
Sponsor: National Science Foundation

Description:
The purpose of the SEEK project is to enable firms of varying size and sophistication to utilize the capabilities of value-adding electronic marketplaces and decision-support tools.

To accomplish this purpose, the SEEK toolkit enables rapid connection to and extraction of data from heterogeneous information sources. SEEK brings together researchers from computer science, construction, and manufacturing. This diverse team is developing a comprehensive approach to extracting and composing knowledge resident in heterogeneous legacy systems. This supports operational decisions in the extended supply chain.

Publications:
Visit SEEK publications list


New Technologies for Online Aggregation

CISE Faculty: Christopher Jermaine
CISE Students: Subramanian Arumugam, Shantanu Joshi, Abhijit Pol, Mingxi Wu
Sponsor: National Science Foundation CAREER Award No. 0347408
Period: 5/15/04 - 4/30/09

Description:
The project is concerned with investigating a promising approach to interactive, large-scale data analysis, called online aggregation (OLA). In OLA, the user is kept informed at all times of the current estimate of the answer to a query over the data, along with a statistical estimate of the accuracy of the answer. The advantage of OLA is that an approximate answer with satisfactory accuracy can often be computed very quickly and inexpensively, at which time the query can be terminated.
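The behavior described above can be sketched for the simplest case, an AVG query over a single column. This is an illustrative sketch only, not the project's system: the function names, the reporting interval, and the normal-approximation confidence interval are assumptions, and the finite-population correction is ignored. Rows are scanned in random order so that every prefix is a random sample; the running mean and variance are maintained with Welford's update, and the scan stops as soon as the interval is narrow enough.

```python
import math
import random


def online_average(values, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) while scanning in random order."""
    order = list(values)
    random.shuffle(order)  # random scan order makes each prefix a random sample
    n, mean, m2 = 0, 0.0, 0.0
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # Welford's running sum of squared deviations
        if n % report_every == 0 or n == len(order):
            variance = m2 / (n - 1) if n > 1 else 0.0
            # Approximate 95% interval for z=1.96 (assumption: CLT applies).
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width


def estimate_until(values, tolerance):
    """Stop the query once the interval half-width drops to the tolerance."""
    result = None
    for result in online_average(values):
        if result[2] <= tolerance:
            break
    return result


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(100.0, 15.0) for _ in range(50000)]
    rows_seen, mean, half_width = estimate_until(data, tolerance=0.5)
    print(f"after {rows_seen} rows: {mean:.2f} +/- {half_width:.2f}")
```

In this sketch the user trades accuracy for time through the tolerance argument: a looser tolerance terminates the scan after fewer rows.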

The project has two goals - technological and educational. The technological goal of the project is designing and implementing a software system for data analysis that is fundamentally based on OLA, as opposed to software that augments the traditional, batched processing architecture with some OLA capabilities. The project reconsiders data management issues such as indexing, query optimization, join algorithms, and user interface design in the context of OLA. Evaluation is via rigorous benchmarking of the software produced. The educational goal of the project is the development of techniques that make use of direct faculty involvement to help ensure retention of beginning engineering students. The project will achieve broad impact by developing techniques with the potential to bring about a fundamental change in the $3.5 billion segment of the software industry now devoted to analytic processing. Also, by investigating ways to ensure the retention and success of tomorrow's engineers, the project will help ensure the technological and scientific leadership of the nation. The project Web site http://www.cise.ufl.edu/~cjermain/OLA will be used to disseminate the results.

Publications:
Visit Online Aggregation publications list


Sequence Indexed Maize Transposon Insertion Sites for Cereal Functional Genomics

Faculty: Tamer Kahveci and Mark Settles
CISE Student: Xuehui Li
Sponsor: UFGI Seed Grant

Description:
The continued improvement of cereal crops requires the identification of novel genes that have an impact on crop traits. Functional genomics technologies enable the identification of these candidates, and gene knockouts are an essential resource for functional genomics analysis. There is a critical need to develop knockout resources for cereal crop species. These resources must be easily accessible to have a significant impact on the plant research community. We will lay the foundation to make gene knockouts from the UniformMu maize transposon mutagenesis population simple and easy to use. Specifically, we will develop transposon Flanking Sequence Tags (FSTs) from germinal transposon insertions. We will develop bioinformatic tools to identify FSTs that most likely correspond to null mutations as well as tools to rapidly design PCR primers for molecular markers to track the insertions. In the process of generating these resources, we will identify knockouts that are tightly linked to seed mutant phenotypes.


New Technologies in Drug Discovery using Metabolic Networks

CISE Faculty: Tamer Kahveci
CISE Student: Padmavati Sridhar
Sponsor: Oak Ridge Associated Universities

Description:
In pharmaceutics, the development of every drug involves two phases: discovery and testing. Drug discovery is an expensive process whose main steps are target identification and lead discovery. A target is a biological molecule (e.g., a compound or an enzyme) that is vital for the survival of a disease-causing microorganism; eliminating it eliminates the microorganism and thereby cures the disease. Enzymes catalyze the reactions that produce essential metabolites (compounds) in the metabolic network of every living organism. It is therefore natural to predict candidate drug targets by locating the enzymes responsible for producing vital metabolites in the network. Testing all possible sets of enzymes is infeasible, both computationally and experimentally, since the number of such sets grows exponentially with the number of enzymes. Fast methods that identify potential enzymatic drug targets will play an important role in the target-identification phase of drug discovery. The goal of this proposal is to devise efficient computational methods to determine the set of enzymes that is the optimal drug target in a metabolic network. More specifically, the goals of this proposal are to 1) develop sensitive and efficient methods to find the best drug target candidates from the topological structure of metabolic networks, and 2) develop approximate methods with quality guarantees that find candidate drug targets in metabolic networks that are too large to analyze using traditional exhaustive search strategies.
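To make the exhaustive baseline concrete, the sketch below enumerates enzyme sets on a toy network. Everything in it is an assumption made for illustration: the four-reaction network, the reading of "optimal" as "smallest set of enzymes whose inhibition stops production of the target compounds", and the function names. It is the brute-force strategy the proposal aims to improve on, not the proposed method; with n enzymes it may examine up to 2^n - 1 sets, which is why it does not scale.

```python
from itertools import combinations

# Hypothetical toy network: (enzyme, substrates, products) per reaction.
REACTIONS = [
    ("E1", {"A"}, {"B"}),
    ("E2", {"B"}, {"C"}),
    ("E3", {"A"}, {"C"}),
    ("E4", {"C"}, {"D"}),
]


def producible(reactions, seeds, inhibited):
    """Return every compound reachable from the seeds when some enzymes are inhibited."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for enzyme, substrates, products in reactions:
            if enzyme in inhibited:
                continue
            # A reaction fires once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def smallest_target_sets(reactions, seeds, targets):
    """Exhaustively find the smallest enzyme sets that block all target compounds."""
    enzymes = sorted({enzyme for enzyme, _, _ in reactions})
    for size in range(1, len(enzymes) + 1):
        hits = [
            set(candidate)
            for candidate in combinations(enzymes, size)
            if not (targets & producible(reactions, seeds, set(candidate)))
        ]
        if hits:
            return hits
    return []


if __name__ == "__main__":
    print(smallest_target_sets(REACTIONS, {"A"}, {"D"}))
    print(smallest_target_sets(REACTIONS, {"A"}, {"C"}))
```

In the toy network, compound C has two production routes, so no single enzyme blocks it and the search has to move on to pairs. That growth in candidate sets is the cost the text refers to.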


The STU Project: Database Integration of Space, Time, and Uncertainty as a Foundation for the Next Generation of GIS

CISE Faculty: Markus Schneider
CISE Students: Alejandro Pauly, Reasey Praing
Sponsor: National Science Foundation CAREER Award No. 0347574

Description:
The goal of the project is to incorporate the three fundamental features of space, time, and uncertainty (STU) into the next generation of Geographical Information Systems (GIS). The STU incorporation greatly increases the expressive power of GIS and broadens the possible spectrum of future GIS applications. The STU project is devoted to the design, implementation, and database integration of new, computational, formal data models (type systems, algebras), query languages, application programming interfaces, and software tools for managing and querying space, space-time, space-uncertainty, and space-time-uncertainty objects. The goal is to create universal, application-neutral, and versatile tools serving as a basis for numerous GIS applications in which the inherent features of space, time, and uncertainty play an important role. These tools will be implemented as extension packages and embedded into extensible commercial database management systems (DBMS). The approach used in this project rests on several extensible algebras, which may cooperate with each other and make it possible to add new types and operations if necessary. The connection of the algebras to commercial DBMS will have a broad impact on the many domains that use large volumes of STU data by improving the processing methods for GIS-related applications. Different and large user groups get the chance to obtain full-fledged database support enhanced by one or several STU algebras. The educational component of this project includes creating and using GIS educational materials and involving students in research by providing students with an appropriate software and hardware environment for their research projects and further education. The project Web site http://www.cise.ufl.edu/research/SpaceTimeUncertainty/STU.html will be used for dissemination of both research and educational material.


Transnational Digital Government

CISE Faculty: Stanley Y. W. Su
Sponsor: National Science Foundation
Period: 5/15/02 - 4/30/06

Description:
This project aims to develop advanced information technologies for achieving transnational information sharing. The part of the project that Su and his students have been working on is the research and development of a distributed query processing system and its integration with an Event-Trigger-Rule Server to achieve information sharing, event notification, rule processing, and process coordination. These two system components have also been integrated with a machine translation system developed at CMU and a conversational interface developed at the University of Colorado. The integrated systems have been demonstrated at the National Conference on Digital Government Research, 2004, at project meetings in Belize and the Dominican Republic, and at a meeting at the Organization of American States.


Southern Plant Diagnostic Network

CISE Faculty: Stanley Y. W. Su
CISE Students: Seema Degwekar
Sponsor: US Department of Agriculture
Period: 1/1/04 - 9/30/06

Description:
The project aims to research and develop a regional network system at the University of Florida, which connects systems in 12 states and Puerto Rico, for the detection and diagnosis of plant health problems. The key functions of the regional center are to extend and support sound public policies, implement environmentally sound prevention and management strategies, and provide leadership and training.


Processing Dynamic Event Data and Multi-faceted Knowledge in a Collaborative Federation

Faculty: Stanley Y. W. Su (CISE), Howard Beck (ABE)
CISE Students: Seema Degwekar, Jeff DePree, Xuelian Xiao
Research Scholar: Chen Zhou
Sponsor: National Science Foundation
Period: 8/16/2006 - 8/15/2009

Description:
The goal of the project is the creation of an infrastructure and the development of technologies to enable the establishment of collaborative federations over the Internet. Each federation is formed by a number of collaborating but autonomous organizations within a country or across national boundaries to 1) publish and subscribe to distributed events of interest and receive notification and event data upon the occurrences of events to aid their decision-making and problem-solving, 2) define, publish, exchange, and apply distributed, heterogeneous knowledge rules to enforce their policies, regulations, and constraints and/or invoke manual and automated tasks, and 3) contribute and share dynamic event data and application operations to coordinate their activities and perform their functions.

Specifically, the project aims to provide a rule markup language; develop techniques and algorithms for processing and managing distributed, multifaceted knowledge to support problem solving and decision-making; define a domain ontology for plant disease and pest diagnostics; research ontology management techniques and reasoning based on the domain ontology; establish a peer-to-peer event- and rule-based system; and apply the developed infrastructure and technologies in the application domain of plant disease and pest diagnostics using USDA's National Plant Diagnostics Network.


