Projects and Publications
〈 SHOW Quick Links SORT BY Project Title; 〉
- Integration Benchmark Project - THALIA
- Morpheus Data Transformation Project
- New Technologies for Approximate Query Processing
- New Technologies for Online Aggregation
- New Technologies in Drug Discovery using Metabolic Networks
- Processing Dynamic Event Data and Multi-faceted Knowledge ...
- The STU Project: Database Integration of Space, Time, ...
- Transnational Digital Government
- SEEK: Scalable Extraction of Enterprise Knowledge
- Sequence Indexed Maize Transposon Insertion Sites...
- Southern Plant Diagnostic Network
〈 SHOW Project Details GROUP BY Faculty Members SORT BY Last Name; 〉
Alin Dobra | Joachim Hammer | Chris Jermaine | Tamer Kahveci | Markus Schneider | Stanley Su
New Technologies for Approximate Query Processing
CISE Faculty: Alin Dobra
CISE Students: Florin Rusu, Amit Dhurandhar
Sponsor: National Science Foundation CAREER Award No. 0448264
Description:
One of the two goals of the project is to advance the state of the art
in approximate query processing (AQP), a critical component of
analytical processing - a 3.5 billion dollar segment of the software
industry. The need for approximate query processing arises from the
growing discrepancy between the volume of information that has to be
processed and the computational resources or communication
capabilities available. Using two computational models, data-streaming
and distributed computation, the project addresses fundamental
problems in AQP such as development of new approximation techniques
for data-stream computation, extensions of data-stream algorithms to
distributed algorithms that can efficiently query sensor and
peer-to-peer networks, and theoretical aspects of AQP that allow the
design of AQP techniques to be accelerated and better understood. Part
of the project's research goal is the design and implementation of an
approximate query processing engine that uses the developed AQP
techniques and the rigorous benchmarking of the software produced. The
second goal of the project is educational and consists of, on one
hand, motivating students to study and pursue careers in databases
through bonus points for extra activities and integration of the
database curricula and other CS disciplines, and, on the other hand,
integration of approximate query processing into both undergraduate
and graduate curricula. The project will have broad impact by
developing techniques for efficient processing of large volumes of
data - crucial for scientific data processing and homeland security
- and by increasing the quality of database education with a direct
impact on the nation's technological leadership.
http://www.cise.ufl.edu/~adobra/AQP
Publications:
Visit AQP publications list
Morpheus Data Transformation Project
Faculty: Joachim Hammer (team co-lead), Mike Stonebraker (team co-lead), and Pete Dobbins (project manager)
Current CISE Students: Christan Grant, Dev Oliver, Umut Sargut, Rebecca Wells
Sponsor: Jim Gray and Microsoft Research, The New Hampshire Charitable Foundation - Manchester Region
Description:
The need to integrate collections of independently written database
schemas has seriously challenged enterprises and decision-makers across
many domains. More precisely, information integration comprises the
extraction, transformation, and loading (ETL) of data from disparate
systems into a single repository to support, among other uses, data
sharing, collaboration, and decision-making (reporting).
The Morpheus project is aimed at simplifying the transformation
component of ETL, making it easy to build, find, and reuse transformations
between disparate data types. Our data transformation tool is called
Morpheus TCT (Transformation Construction Toolkit) and provides the
following components and capabilities:
- A graphical scripting facility that allows composition of transform building blocks from simple primitives (e.g., computation, control, table lookup, byte rearranger). It also facilitates the composition of more complex transforms out of existing (simple) ones.
- A repository in which to store transformations and associated data types
- A sophisticated browsing facility that allows users to discover transforms similar or identical to the one they need, and then modify them to meet those needs.
- A scalable execution environment for performing the actual data transformations.
Morpheus TCT is based on the Postgres DBMS for storage and execution of transforms and leverages the Postgres ADT system. Unlike many existing ETL tools, which require transformations to be performed outside of the repository where the data is stored, Morpheus TCT executes the transformations inside the DBMS, thereby taking advantage of the amenities provided by a modern DBMS, including efficient storage for data and support for transactions and recovery.
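The idea of composing complex transforms out of simple primitives and storing them for reuse can be illustrated with a small sketch. This is not the Morpheus TCT interface: the names compose, scale, register, and find, and the in-memory REPOSITORY dictionary, are hypothetical and chosen only for illustration, and the sketch runs in plain Python rather than inside a DBMS.

```python
from typing import Callable, Dict, Optional, Tuple

Transform = Callable[[float], float]

def scale(factor: float) -> Transform:
    """Primitive 'computation' transform: multiply the input by a constant."""
    return lambda value: value * factor

def compose(*steps: Transform) -> Transform:
    """Build a more complex transform by applying simpler ones in order."""
    def pipeline(value: float) -> float:
        for step in steps:
            value = step(value)
        return value
    return pipeline

# Hypothetical in-memory repository keyed by (source type, target type).
REPOSITORY: Dict[Tuple[str, str], Transform] = {}

def register(source: str, target: str, transform: Transform) -> None:
    """Store a transform so that it can be found and reused later."""
    REPOSITORY[(source, target)] = transform

def find(source: str, target: str) -> Optional[Transform]:
    """Look up a stored transform, or return None if there is none."""
    return REPOSITORY.get((source, target))

# Two simple transforms and one composed from them.
feet_to_inches = scale(12.0)
inches_to_cm = scale(2.54)
feet_to_cm = compose(feet_to_inches, inches_to_cm)

register("feet", "inches", feet_to_inches)
register("inches", "cm", inches_to_cm)
register("feet", "cm", feet_to_cm)
```

A caller would first try find("feet", "cm") and only build a new transform if the lookup returns None, which is the reuse pattern the browsing facility is meant to support.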
Publications:
Visit Morpheus publications list
Integration Benchmark Project - THALIA
Faculty: Joachim Hammer and Mike Stonebraker
Current CISE Students: Oguzhan Topsakal
Sponsor: National Science Foundation under Grant No. 0122193
Description:
THALIA (Test Harness for the Assessment of Legacy information Integration Approaches) is a publicly
available testbed and benchmark for testing and evaluating integration technologies. This Web site
provides researchers and practitioners with a collection of 40 downloadable data sources representing
university course catalogs from computer science departments around the world. The data in the testbed
provide a rich source of syntactic and semantic heterogeneities since we believe they still pose the
greatest technical challenges to the research community. In addition, this site provides a set of
twelve benchmark queries as well as a scoring function for ranking the performance of an integration system.
We hope this site will be useful both to the research community in its efforts to develop new
integration technologies and to potential users of existing technologies in evaluating the
strengths and weaknesses of those technologies.
Publications:
Visit THALIA publications list
SEEK: Scalable Extraction of Enterprise Knowledge
CISE Faculty: Joachim Hammer and Mark Schmalz
Current CISE Students: Jungmin Shin and Oguzhan Topsakal
Sponsor: National Science Foundation
Description:
The purpose of the SEEK project is to enable firms of varying size and sophistication to utilize the
capabilities of value-adding electronic marketplaces and decision-support tools.
To accomplish this purpose, the SEEK toolkit enables rapid connection to and extraction of data
from heterogeneous information sources. SEEK brings together researchers from computer science,
construction, and manufacturing. This diverse team is developing a comprehensive approach to
extracting and composing knowledge resident in heterogeneous legacy systems. This supports operational
decisions in the extended supply chain.
Publications:
Visit SEEK publications list
New Technologies for Online Aggregation
CISE Faculty: Christopher Jermaine
CISE Students: Subramanian Arumugam, Shantanu Joshi, Abhijit Pol, Mingxi Wu
Sponsor: National Science Foundation CAREER Award No. 0347408
Period: 5/15/04 - 4/30/09
Description:
The project is concerned with investigating a promising approach to
interactive, large-scale data analysis, called online aggregation
(OLA). In OLA, the user is kept informed at all times of the current
estimate of the answer to a query over the data, along with a
statistical estimate of the accuracy of the answer. The advantage of
OLA is that an approximate answer with satisfactory accuracy can often
be computed very quickly and inexpensively, at which time the query
can be terminated.
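The following is a minimal sketch of the OLA idea for a single AVG query, not the system built in the project. It assumes the rows can be scanned in random order, uses a normal-approximation confidence interval (z = 1.96 for roughly 95%), and ignores the finite-population correction. The function name online_average and its parameters are hypothetical.

```python
import math
import random

def online_average(values, target_half_width, z=1.96, min_rows=30, seed=0):
    """Estimate AVG(values) while scanning the rows in random order.

    Yields (rows_seen, estimate, half_width) after each row once min_rows
    have been seen, and stops as soon as the confidence-interval half-width
    is no larger than target_half_width.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # a random scan order makes every prefix a random sample

    n, mean, m2 = 0, 0.0, 0.0  # Welford's running mean and squared deviations
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < min_rows:
            continue
        std_err = math.sqrt(m2 / (n - 1) / n)  # sample std. dev. / sqrt(n)
        half_width = z * std_err
        yield n, mean, half_width
        if half_width <= target_half_width:
            return  # accuracy is satisfactory: terminate the query early

# Synthetic table of 100,000 values; the user watches the estimate converge.
data_rng = random.Random(42)
table = [data_rng.uniform(0, 100) for _ in range(100_000)]
for rows_seen, estimate, half_width in online_average(table, target_half_width=1.0):
    pass  # a real interface would refresh the displayed estimate here
print(rows_seen, round(estimate, 2), round(half_width, 2))
```

The loop ends after a small fraction of the table has been read, which is the point of OLA: the query stops once the reported accuracy is good enough for the user.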
The project has two goals - technological and educational. The
technological goal of the project is designing and implementing a
software system for data analysis that is fundamentally based on OLA,
as opposed to software that augments the traditional, batched
processing architecture with some OLA capabilities. The project
reconsiders data management issues such as indexing, query
optimization, join algorithms, and user interface design in the
context of OLA. Evaluation is via rigorous benchmarking of the
software produced. The educational goal of the project is development
of techniques that make use of direct faculty involvement to help
ensure retention of beginning engineering students. The project will
achieve broad impact by developing techniques with potential to bring
about a fundamental change in the $3.5 billion segment of the software
industry now devoted to analytic processing. Also, by investigating
ways to ensure the retention and success of tomorrow's engineers, the
project will achieve broad impact by helping to ensure the
technological and scientific leadership of the nation. The project Web
site http://www.cise.ufl.edu/~cjermain/OLA will be used to
disseminate the results.
Publications:
Visit Online Aggregation publications list
Sequence Indexed Maize Transposon Insertion Sites for Cereal Functional Genomics
Faculty: Tamer Kahveci and Mark Settles
CISE Student: Xuehui Li
Sponsor: UFGI Seed Grant
Description:
The continued improvement of cereal crops requires the
identification of novel genes that have an impact on crop traits.
Functional genomics technologies enable the identification of these
candidates, and gene knockouts are an essential resource for
functional genomics analysis. There is a critical need to develop
knockout resources for cereal crop species. These resources must be
easily accessible to have a significant impact on the plant research
community. We will lay the foundation to make gene knockouts from the
UniformMu maize transposon mutagenesis population simple and easy to
use. Specifically, we will develop transposon Flanking Sequence Tags
(FSTs) from germinal transposon insertions. We will develop
bioinformatic tools to identify FSTs that most likely correspond to
null mutations as well as tools to rapidly design PCR primers for
molecular markers to track the insertions. In the process of
generating these resources, we will identify knockouts that are
tightly linked to seed mutant phenotypes.
New Technologies in Drug Discovery using Metabolic Networks
CISE Faculty: Tamer Kahveci
CISE Student: Padmavati Sridhar
Sponsor: Oak Ridge Associated Universities
Description:
In pharmaceutics, the development of every drug involves
two phases, namely discovery and testing. Drug discovery is an
expensive process where the main steps are target identification and
lead discovery. A target is a biological molecule (e.g., a compound or
an enzyme) which is vital for the survival of a disease-causing
microorganism, the elimination of which will result in eliminating the
microorganism, thereby curing the disease. Enzymes are catalysts of
reactions which result in the production of essential metabolites
(compounds) in the metabolic network of every living
organism. Therefore, it is intuitive to predict candidate drug targets
by locating the enzymes which are responsible for the production of
vital metabolites in the network. Testing all possible sets of enzymes
is infeasible (both computationally and experimentally) since the
number of such sets grows exponentially. Fast methods which identify
potential enzymatic drug-targets will play an important role in the
target-identification phase of drug discovery. The goal of this
proposal is to devise efficient computational methods to determine the
set of enzymes which is the optimal drug target in a metabolic
network. More specifically, the goals of this proposal are:
- Develop sensitive and efficient methods to find best drug target candidates from the topological structure of the metabolic networks.
- Develop approximate methods with quality guarantees that find candidate drug targets in metabolic networks that are too large to analyze using traditional exhaustive search strategies.
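To make the combinatorial problem concrete, here is a toy sketch of the exhaustive baseline that the description calls infeasible for large networks. It is not the project's method. The five-reaction network is invented, and "optimal" is assumed here to mean the smallest set of enzymes whose inhibition stops production of a chosen metabolite.

```python
from itertools import combinations

# Hypothetical toy network: each reaction is (enzyme, substrates, products).
REACTIONS = [
    ("E1", {"A"}, {"B"}),
    ("E2", {"B"}, {"C"}),
    ("E3", {"A"}, {"C"}),
    ("E4", {"C"}, {"T"}),
    ("E5", {"B", "C"}, {"D"}),
]

def producible(reactions, seeds, inhibited):
    """Return all metabolites reachable from seeds when some enzymes are inhibited."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for enzyme, substrates, products in reactions:
            if enzyme in inhibited:
                continue  # this reaction cannot fire without its enzyme
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def smallest_blocking_sets(reactions, seeds, target):
    """Try enzyme sets in order of increasing size; return the first size that works.

    The number of candidate sets grows exponentially with the number of
    enzymes, so this is only usable on very small networks.
    """
    enzymes = sorted({enzyme for enzyme, _, _ in reactions})
    for size in range(1, len(enzymes) + 1):
        hits = [set(combo) for combo in combinations(enzymes, size)
                if target not in producible(reactions, seeds, set(combo))]
        if hits:
            return hits
    return []

print(smallest_blocking_sets(REACTIONS, {"A"}, "C"))
```

In this toy network metabolite C has two production routes, so no single enzyme blocks it and the search has to move on to pairs. With n enzymes there are 2^n - 1 non-empty candidate sets in the worst case, which is why approximate methods with quality guarantees are attractive for real networks.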
The STU Project: Database Integration of Space, Time, and Uncertainty as a Foundation for the Next Generation of GIS
CISE Faculty: Markus Schneider
CISE Students: Alejandro Pauly, Reasey Praing
Sponsor: National Science Foundation CAREER Award No. 0347574
Description:
The goal of the project is to incorporate the three fundamental features of space, time, and
uncertainty (STU) into the next generation of Geographical Information Systems (GIS). The STU
incorporation greatly increases the expressive power of GIS and broadens the possible spectrum of
future GIS applications. The STU project is devoted to the design, implementation, and database
integration of new, computational, formal data models (type systems, algebras), query languages,
application programming interfaces, and software tools for managing and querying space, space-time,
space-uncertainty, and space-time-uncertainty objects. The goal is to create universal, application-neutral,
and versatile tools serving as a basis for numerous GIS applications in which the inherent features of
space, time, and uncertainty play an important role. These tools will be implemented as extension packages
and embedded into extensible commercial database management systems (DBMS). The approach used in this
project rests on several extensible algebras, which may cooperate with each other and make it possible
to add new types and operations if necessary. The connection of the algebras to commercial DBMS will
have a broad impact on the many domains that use large volumes of STU data by improving the processing
methods for GIS-related applications. Different and large user groups get the chance to obtain full-fledged
database support enhanced by one or several STU algebras. The educational component of this project
includes creating and using GIS educational materials and involving students in research by providing
students with an appropriate software and hardware environment for their research projects and further
education. The project Web site http://www.cise.ufl.edu/research/SpaceTimeUncertainty/STU.html will
be used for dissemination of both research and educational material.
Transnational Digital Government
CISE Faculty: Stanley Y. W. Su
Sponsor: National Science Foundation
Period: 5/15/02 - 4/30/06
Description:
This project aims to develop advanced information technologies for achieving transnational information sharing.
The part of the project that Su and his students have been working on is research and development of a
distributed query processing system and its integration with an Event-Trigger-Rule Server to achieve information
sharing, event notification, rule processing and process coordination. These two system components have also been
integrated with a machine translation system developed at CMU and a conversational interface developed at the
University of Colorado. The integrated systems have been demonstrated at the National Conference on Digital
Government Research, 2004 and at project meetings in Belize and the Dominican Republic, and a meeting at the
Organization of American States.
Southern Plant Diagnostic Network
CISE Faculty: Stanley Y. W. Su
CISE Students: Seema Degwekar
Sponsor: US Department of Agriculture
Period: 1/1/04 - 9/30/06
Description:
The project aims to research and develop a regional network system at the University of Florida, which
connects systems in 12 states and Puerto Rico, for the detection and diagnosis of plant health problems.
The key functions of the regional center are to extend and support sound public policies, implement
environmentally sound prevention and management strategies, and provide leadership and training.
Processing Dynamic Event Data and Multi-faceted Knowledge in a Collaborative Federation
CISE Faculty: Stanley Y. W. Su (CISE), Howard Beck (ABE)
CISE Students: Seema Degwekar, Jeff DePree, Xuelian Xiao
Research Scholar: Chen Zhou
Sponsor: National Science Foundation
Period: 8/16/2006 - 8/15/2009
Description:
The goal of the project is the creation of an infrastructure and the development of technologies to enable the
establishment of collaboration federations over the Internet. Each federation is formed by a number of collaborating
but autonomous organizations within a country or across national boundaries to 1) publish and subscribe to distributed
events of interest and receive notification and event data upon the occurrences of events to aid their decision-making
and problem-solving, 2) define, publish, exchange and apply distributed, heterogeneous knowledge rules to enforce their
policies, regulations and constraints and/or invoke manual and automated tasks, and 3) contribute and share dynamic event
data and application operations to coordinate their activities and perform their functions.
Specifically, the project aims to provide a rule markup language, develop techniques and algorithms for processing and
managing distributed, multifaceted knowledge to support problem solving and decision-making, define a domain ontology for
plant disease and pest diagnostics, research ontology management techniques and reasoning based on the domain ontology,
establish a peer-to-peer event- and rule-based system, and apply the developed infrastructure and technologies in the application
domain of plant disease and pest diagnostics using USDA's National Plant Diagnostics Network.
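The publish/subscribe and rule-processing pattern described above can be sketched in a few lines. This is a single-process illustration only, not the project's distributed, peer-to-peer infrastructure or its rule markup language. The class EventBroker, the helper make_rule, the event type "sample_diagnosed", and the sample data are all hypothetical.

```python
from collections import defaultdict

class EventBroker:
    """Minimal in-process publish/subscribe hub (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event type -> list of callbacks

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, data):
        # Notify every subscriber of this event type and pass along the event data.
        for callback in self._subscribers[event_type]:
            callback(data)

def make_rule(condition, action):
    """Build a condition-action rule that is triggered by an event."""
    def rule(data):
        if condition(data):
            action(data)
    return rule

broker = EventBroker()
alerts = []

# One organization subscribes with a rule that enforces its own policy.
broker.subscribe("sample_diagnosed", make_rule(
    condition=lambda d: d["pathogen"] == "soybean rust",
    action=lambda d: alerts.append(f"notify lab: {d['pathogen']} in {d['state']}"),
))

# Other organizations publish events as diagnoses come in.
broker.publish("sample_diagnosed", {"pathogen": "soybean rust", "state": "FL"})
broker.publish("sample_diagnosed", {"pathogen": "none", "state": "GA"})
print(alerts)
```

Only the first event satisfies the rule's condition, so only one notification is recorded. In a federation, the broker and the rules would live on different autonomous sites rather than in one process.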