NEW TECHNOLOGIES FOR ONLINE AGGREGATION EXPLORED WITH NSF CAREER AWARD
November 9, 2004
GAINESVILLE, Fla. --- Imagine that you have a closet, and it is jam-packed with clothes that you never wear but don't want to get rid of. You wear maybe 10 percent of the clothes hanging in the front of your closet, while the remaining 90 percent sit in the back and gather dust. You would hate to throw them away or donate them to charity, because the minute you did, there would be an occasion for that pink ruffled dress to come out of hiding.
That's how it was about 10 years ago with computer data. At the high end, scientific organizations and corporations had huge amounts of data, and they began to archive it with the promise that, at a later date, they would undertake projects to analyze it. Much like the old clothes in the closet, 99 percent of this data is untouched.
Christopher Jermaine, an assistant professor in the Department of Computer and Information Science and Engineering (CISE), recently received a CAREER award from the National Science Foundation (NSF) to address some of the problems that may be causing organizations to underutilize these data archives.
His project provides more than $400,000 in funding over five years and integrates two main components: research and teaching. Jermaine's project, New Technologies for Online Aggregation, is designed to meet two goals: to improve the performance of data analysis and statistical processing in these data warehouses, and to address retention rates in computer science and engineering education.
The NSF-sponsored Faculty Early Career Development (CAREER) Program is one of the most prestigious awards for new faculty members, and awardees are selected on the basis of creative career-development plans that effectively integrate research and education within the context of the mission of their institution, according to NSF's Web site.
"Data wearhouses are actually huge information repositories, that are gigabytes, terabytes or even petabytes in size," Jermaine said.
The current approach to analyzing data of this magnitude is based on software designed for day-to-day operations, technology that is at least 10 years old. When companies began archiving data, they applied what was already available in the industry: online transaction processing (OLTP). OLTP was never designed for archiving or analytical processing, and the result is day-long waits for answers to queries; investigating a problem with numerous variables can take weeks or months.
For example, a car manufacturer might retain information on all the parts it manufactures for a vehicle: the conditions under which the parts were developed, where they were manufactured, what raw materials were used, and into which automobile they were installed. Suppose the company received reports of several failures in the vehicles it manufactures. It would then need to analyze the data to determine whether the failures were random or whether there was some systematic reason behind them. As relationships between variables are explored, a researcher might ask one question and then wait several hours for an answer. With many possible explanations to investigate, the process becomes so time-consuming that it is no longer practical.
Jermaine's project investigates an approach to interactive, large-scale data analysis called online aggregation (OLA). In OLA, the user is given a running estimate of the answer to a query as the data is scanned, along with a statistical estimate of the accuracy of that answer. The main advantage is that an answer within a defined margin of error can be computed quickly and inexpensively.
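To make the idea concrete, here is a minimal sketch of how an OLA-style estimator might work. It is an illustration only, not Jermaine's system: it assumes the data fits in memory and can be read in random order, and the names estimate_average and target_half_width are invented for this example.

```python
import math
import random

Z_95 = 1.96  # normal critical value for a ~95% confidence interval


def estimate_average(values, target_half_width, batch_size=1000):
    """Scan `values` in random order; after each batch, report the running
    mean and a CLT-based 95% confidence interval, stopping early once the
    interval is tighter than `target_half_width`."""
    order = list(range(len(values)))
    random.shuffle(order)  # random scan order makes each row an i.i.d. sample

    n = 0
    total = 0.0
    total_sq = 0.0
    mean = half_width = float("nan")
    for i in order:
        x = values[i]
        n += 1
        total += x
        total_sq += x * x
        if n % batch_size == 0 or n == len(values):
            mean = total / n
            variance = max(total_sq / n - mean * mean, 0.0)
            half_width = Z_95 * math.sqrt(variance / n)
            print(f"after {n:>7} rows: estimate = {mean:10.2f} "
                  f"+/- {half_width:.2f}")
            if half_width <= target_half_width:
                break  # the estimate is already good enough; stop scanning
    return mean, half_width


if __name__ == "__main__":
    # A simulated one-million-row "warehouse" column.
    data = [random.gauss(100.0, 25.0) for _ in range(1_000_000)]
    estimate_average(data, target_half_width=0.5)
```

Reading the rows in random order is what makes each one an independent draw from the table, which in turn is what lets a standard Central Limit Theorem confidence interval describe the accuracy of the running estimate; the scan can stop as soon as that interval is acceptably narrow, rather than only after every row has been read.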
"The goal is to design and implement a software system that is based on OLA, which could have a far-reaching impact in developing techniques with a potential to bring about a fundamental change in the $3.5 billion segment of the software industry now devoted to analytic processing," said Jermaine.
In addition to potentially revolutionizing the analytical software industry, Jermaine's project seeks to address retention issues in computer science and engineering education. The last few decades have seen a long slide in engineering enrollments, said Jermaine, and computer science has been particularly hard hit since the dot-com bubble burst in 2000. One persistent problem contributing to low enrollment is a low retention rate in the computer sciences, owing to the rigor of the curriculum. Jermaine proposes a peer support program and direct faculty involvement to help retain beginning engineering students.
At an early age, Jermaine showed an interest in computers. He remembers his father bringing home from work a computer that was, at the time, incredibly expensive, and he got started playing around on it and trying to program it. Jermaine received his doctorate from Georgia Tech in 2002 and joined the CISE faculty at the University of Florida in 2003.
Writer: Mandelyn Hutcherson, 352-392-4700 ext. 5011, HutchersonM@mail.vetmed.ufl.edu
Source: Chris Jermaine, 352-392-2691, jermain@cise.ufl.edu