University of Florida :: Department of Computer and Information Science and Engineering (CISE)


Research Team Develops Information Integration Tool and Benchmark

September 19, 2005

Gainesville, FL—

Imagine that two large companies merge into a new, larger company. Before the merger, the companies used very different information technology systems to manage their customer databases, accounting, and inventory. The merged company must now operate on a single IT system. This requires integrating a large collection of independently written databases, a task that most enterprises find enormously challenging. Yet if the two IT systems, or at least their data, cannot be consolidated, the success of the new company may be in jeopardy.

Information integration is the process of combining multiple, heterogeneous information technology systems. When organizations merge, grow, or collaborate, mission-critical data may reside in several different IT systems. The goal of information integration is to bring this data together seamlessly.

Given the complexity of such an integration effort (think of the many ways a simple date can be written as a string), few tools can automate the process end to end. Most tools carry out only part of the integration work and require significant customization and human input. Making matters worse, companies spend large sums on tools whose capabilities are difficult to assess, since no effective benchmarks exist to validate their performance.
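To make the date example concrete, here is a minimal Python sketch of one small transform: normalizing date strings written in several formats to a single ISO 8601 form. The list of formats and the `normalize_date` helper are hypothetical and chosen only for illustration; they are not taken from TCT or THALIA.

```python
from datetime import date, datetime

# Hypothetical source formats; a real integration project would have to
# discover these from the data rather than hard-code them.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%B %d, %Y", "%d-%b-%y"]


def normalize_date(raw: str) -> date:
    """Parse a date string written in any of DATE_FORMATS (first match wins)."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date format: {raw!r}")


if __name__ == "__main__":
    samples = ["2005-09-19", "09/19/2005", "19.09.2005",
               "September 19, 2005", "19-Sep-05"]
    for raw in samples:
        print(f"{raw:>20} -> {normalize_date(raw).isoformat()}")
```

The order of `DATE_FORMATS` is itself a decision: a string such as "04/05/2005" is read month-first here (April 5), although many sources would mean 4 May. Choices like this are one reason integration tools still need human input.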

As the world economy continues to grow and evolve, driven in large part by information technology, information integration is often critical to organizational survival.

"Information integration continuously haunts industry," said Hammer.

A team of researchers co-led by Prof. Joachim Hammer of the University of Florida is developing a Transform Construction Tool (TCT) to support the efficient development, management, and reuse of data transforms between heterogeneous databases. Hammer and his colleagues believe that TCT could be a step change in how users such as IT workers and scientists approach information integration. Past research in this area has largely focused on the “attribute matching” problem: identifying which attributes in a pair of schemas represent the same data element, and then automatically mapping one to the other. Although this line of research is well intentioned and valuable, the team argues that it does not solve the central information integration challenge, because the situation it assumes, attributes that can simply be matched and mapped onto each other, rarely exists in practice.
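The attribute-matching idea can be illustrated with a deliberately naive Python sketch that pairs attributes by name similarity, using the standard library's `difflib.SequenceMatcher`. The schemas and the `match_attributes` helper are hypothetical, and matchers from the research literature use far richer evidence than names; the point is only the shape of the output, a set of one-to-one renamings.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def _normalize(name: str) -> str:
    # Compare names case-insensitively and ignore underscores.
    return name.lower().replace("_", "")


def name_similarity(a: str, b: str) -> float:
    """Similarity of two attribute names in [0, 1]."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def match_attributes(source: list[str], target: list[str],
                     threshold: float = 0.6) -> dict[str, str | None]:
    """Map each source attribute to its most similar target attribute.

    Hypothetical, name-only matcher: it can only propose one-to-one
    renamings and cannot express splits, merges, or value conversions.
    A source attribute whose best score is below `threshold` maps to None.
    """
    matches: dict[str, str | None] = {}
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        matches[s] = best if name_similarity(s, best) >= threshold else None
    return matches


if __name__ == "__main__":
    # Hypothetical schemas describing the same kind of course record.
    source = ["course_title", "instructor", "time"]
    target = ["Title", "InstructorFirstName", "InstructorLastName",
              "StartTime", "EndTime"]
    for s, t in match_attributes(source, target).items():
        print(f"{s:>12} -> {t}")
```

On these sample schemas the matcher maps `instructor` to `InstructorLastName` and `time` to `EndTime`. Neither answer is right: the single instructor field has to be split across two target fields, and the time range has to be split into a start and an end. Correspondences like these call for a transform rather than a match, which illustrates the limitation described above.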

As part of the TCT project, Prof. Hammer and his colleagues have also developed THALIA (Test Harness for the Assessment of Legacy Information Integration Approaches), a publicly available testbed and benchmark for evaluating integration tools. THALIA provides researchers and practitioners with more than 40 downloadable test data sources representing university course catalogs from computer science departments around the world. The data in the testbed provide a rich source of integration problems.

"Course catalogs illustrate the integration problem very nicely. Just like in real-world integration problems, the same information is represented in many ways yet course catalogs are easily understood by everybody and are thus ideal for testing," said Hammer.
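As a rough illustration of the kind of problem such catalogs pose, the Python sketch below maps two invented course records, structured differently, into one common schema. The records, their field names, the `Course` target schema, and the `from_source_a`/`from_source_b` transforms are all hypothetical; they are not THALIA's actual sources or format, and the code is not TCT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    """Hypothetical common target schema for integrated catalog entries."""
    dept: str
    number: str
    title: str
    instructor: str  # "First Last"
    days: str
    start: str       # 24-hour "HH:MM"
    end: str         # 24-hour "HH:MM"


def from_source_a(rec: dict) -> Course:
    # Source A packs department and number into one code ("CS 101"),
    # writes names as "Last, First", and gives the meeting time as
    # "DAYS HH:MM-HH:MM".
    dept, number = rec["Code"].split()
    last, first = (part.strip() for part in rec["Instructor"].split(","))
    days, span = rec["Time"].split()
    start, end = span.split("-")
    return Course(dept, number, rec["Title"], f"{first} {last}", days, start, end)


def from_source_b(rec: dict) -> Course:
    # Source B already separates these fields; only the names differ.
    return Course(rec["Dept"], rec["Num"], rec["CourseName"], rec["Lecturer"],
                  rec["Days"], rec["Start"], rec["End"])


if __name__ == "__main__":
    a = {"Code": "CS 101", "Title": "Introduction to Programming",
         "Instructor": "Smith, John", "Time": "MWF 10:40-11:30"}
    b = {"Dept": "CIS", "Num": "3020", "CourseName": "Data Structures",
         "Lecturer": "Jane Doe", "Days": "TR", "Start": "13:55", "End": "14:45"}
    for course in (from_source_a(a), from_source_b(b)):
        print(course)
```

Each source needs its own small transform even though both describe the same kind of entity, and the split and reorder steps in `from_source_a` encode assumptions (for example, that every instructor name contains exactly one comma) that a real catalog could easily violate.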

Dr. Hammer is collaborating on TCT with MIT faculty member Dr. Mike Stonebraker, who is widely recognized as one of the world's foremost experts in database technology and is the founder of Ingres, Illustra, and, most recently, StreamBase. Oguzhan Topsakal, Seok-Won Seong, and Hung-ju Chu, three graduate students in the University of Florida’s Department of Computer and Information Science and Engineering, are also members of the UF TCT research team.

Dr. Hammer is an associate professor in the Department of Computer and Information Science and Engineering at the University of Florida. In addition to TCT, he currently leads projects on knowledge discovery and extraction, data warehousing, and XML-based information integration and management. Hammer received his Ph.D. in Computer Science from the University of Southern California.

Writer: Danny Rigby
