The goal of the benchmark is to provide an objective testbed for integration systems. To that end, the benchmark specifies a list of 12 integration challenges that roughly correspond to the most common heterogeneities among related yet autonomously managed data sources. Additional challenges may be included in the future. Each challenge is expressed as an XQuery formulated against a reference schema. A challenge is successfully met if the integration system can correctly answer the query against BOTH the reference schema and the challenge schema, each of which is named with the query.


Query 1:
--------
List courses taught by instructor 'Mark'.
Reference Schema: gatech.xml
Challenge Schema: cmu.xml

Challenge: Determine that the instructor information in CMU's course catalog can be found in a field called "Lecturer"


Query 2:
--------
Find all database courses that meet at 1:30pm on any given day.
Reference Schema: cmu.xml
Challenge Schema: umb.xml

Challenge: Convert time from a 12-hour clock to a 24-hour clock.
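
As an illustration only, the conversion step could be sketched in Python as follows. The function name to_24_hour and the accepted input spellings ('1:30pm', '1:30 PM', '1:30 p.m.') are assumptions made for this sketch; the benchmark does not prescribe how the time values are written in either catalog.

```python
import re

# Assumed input shape: hour, colon, minutes, then an am/pm marker.
_TIME_12H = re.compile(r"^\s*(\d{1,2}):(\d{2})\s*([ap])\.?m\.?\s*$", re.IGNORECASE)


def to_24_hour(clock_12):
    """Convert a 12-hour time such as '1:30pm' to a 24-hour 'HH:MM' string."""
    match = _TIME_12H.match(clock_12)
    if match is None:
        raise ValueError("not a 12-hour time: %r" % (clock_12,))
    hour = int(match.group(1))
    minute = int(match.group(2))
    meridiem = match.group(3).lower()
    if not (1 <= hour <= 12 and 0 <= minute <= 59):
        raise ValueError("time out of range: %r" % (clock_12,))
    hour = hour % 12          # 12am becomes 0, 12pm stays 12 after the next step
    if meridiem == "p":
        hour += 12
    return "%02d:%02d" % (hour, minute)
```

With this sketch, the '1:30pm' of the query would be compared against '13:30' in a catalog that uses a 24-hour clock.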


Query 3:
--------
Find all courses with the string 'Data Structures' in the title.
Reference Schema: umd.xml
Challenge Schema: brown.xml

Challenge: Map a single string to a combination of an external link (URL) and a string to find a matching value. Note that this query also exhibits a synonym heterogeneity (CourseName vs. Title).


Query 4:
--------
List all database courses that carry more than 10 credit hours.
Reference Schema: cmu.xml
Challenge Schema: ethz.xml

Challenge: Apart from the language conversion issues, the challenge is to develop a mapping that converts the numeric value for credit hours into a string that describes the expected scope of the course.


Query 5:
--------
Find all courses with the string 'database' in the course title.
Reference Schema: umd.xml
Challenge Schema: ethz.xml

Challenge: For each course in the catalog of ETH, convert the German tags into their English counterparts to locate the one representing the course name. Convert the English course title 'Database' into its German counterpart 'Datenbank' or 'Datenbanksystem' and retrieve those courses from ETH whose titles contain that substring.
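
The title-matching half of this challenge can be sketched in Python as shown below. This is a minimal sketch, not a reference solution: the lookup table TERM_TRANSLATIONS and the function matches_translated_term are hypothetical names, only the 'database' entry is taken from the challenge text, and the sample titles in the usage note are invented for illustration. Translating the German tags themselves is not covered here.

```python
# Hypothetical lookup table. Only the 'database' entry comes from the
# challenge description; a real system would need a fuller dictionary.
TERM_TRANSLATIONS = {
    "database": ["Datenbank", "Datenbanksystem"],
}


def matches_translated_term(title, english_term, translations=TERM_TRANSLATIONS):
    """Return True if the title contains any known translation of english_term.

    The comparison is a case-insensitive substring test, mirroring the
    'contains the string' wording of the query.
    """
    candidates = translations.get(english_term.lower(), [])
    lowered = title.lower()
    return any(candidate.lower() in lowered for candidate in candidates)
```

For example, matches_translated_term("Datenbanksysteme I", "database") is true for the invented title "Datenbanksysteme I", while an invented title such as "Informationssysteme" does not match.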


Query 6:
--------
List all textbooks for courses about verification theory.
Reference Schema: toronto.xml
Challenge Schema: cmu.xml

Challenge: Proper treatment of NULL values. In the answer to the query, the integrated result must include the fact that no textbook information was available for CMU's course.


Query 7:
--------
Find all entry-level database courses.
Reference Schema: umich.xml
Challenge Schema: asu.xml

Challenge: Infer whether the course has a prerequisite course from the information that is attached to the description.


Query 8:
--------
List all database courses open to juniors.
Reference Schema: gatech.xml
Challenge Schema: ethz.xml

Challenge: Although one could return a NULL value for 'student classification' for ETH, such an answer is misleading. To deal intelligently with this query one must support more than one kind of NULL. For example, one must distinguish 'data missing but could be present' from 'no corresponding data available'.
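
One possible way to represent the two kinds of NULL is sketched below in Python. The enum Missing, the function classification_for, the field name 'classification', and the flag source_has_field are all hypothetical names chosen for this sketch; the benchmark does not prescribe a representation.

```python
from enum import Enum


class Missing(Enum):
    """Two distinct NULL markers, named after the distinction in the challenge."""
    UNKNOWN = "data missing but could be present"
    NOT_AVAILABLE = "no corresponding data available"


def classification_for(course, source_has_field):
    """Return the student classification of a course, or a typed NULL marker.

    course           -- dict holding one catalog entry (hypothetical shape)
    source_has_field -- True if the source schema models student
                        classification at all (assumed to be known from
                        the schema mapping)
    """
    value = course.get("classification")
    if value is not None:
        return value
    if source_has_field:
        return Missing.UNKNOWN        # schema has the field, this entry lacks it
    return Missing.NOT_AVAILABLE      # schema has no such concept
```

Under this sketch, an entry from a catalog with no notion of student classification yields Missing.NOT_AVAILABLE rather than a plain NULL, so the answer does not suggest that the value merely went unrecorded.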


Query 9:
--------
Find the room in which the database course is held.
Reference Schema: brown.xml
Challenge Schema: umd.xml

Challenge: Determine that room information in the University of Maryland's course catalog is available as part of the time element located under the Section element.


Query 10:
---------
List all instructors for courses on software systems.
Reference Schema: cmu.xml
Challenge Schema: umd.xml

Challenge: Determine that instructor information is stored as part of the 'Section' information in the catalog of the University of Maryland. Specifically, the instructor information must be gathered by extracting the name part from all of the section titles rather than from a single field called 'Lecturer' as in CMU's case.


Query 11:
---------
List instructors for the database course.
Reference Schema: cmu.xml
Challenge Schema: ucsd.xml

Challenge: In the catalog of the University of California, San Diego, instructor information is stored implicitly in the columns labeled 'Fall 2003', 'Winter 2004', etc.


Query 12:
---------
List the title and time for computer networks courses.
Reference Schema: cmu.xml
Challenge Schema: brown.xml

Challenge: Determine that course title, day and time information in the catalog of Brown University are represented as part of the title attribute rather than as separate attributes as in the case of CMU. Extract the correct title, day and time values from the title column in the catalog of Brown University.
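
The extraction step can be sketched in Python with a regular expression. The layout assumed here (title text, then day codes such as 'MWF' or 'TuTh', then a time range such as '10:00-10:50') is a hypothetical format chosen for illustration and is not taken from the Brown catalog; the pattern would have to be adapted to the actual data. The function name split_title is likewise hypothetical.

```python
import re

# Hypothetical layout: "<title> <days> <start>-<end>".
_TITLE_PATTERN = re.compile(
    r"^(?P<title>.+?)\s+"
    r"(?P<days>(?:M|Tu|W|Th|F)+)\s+"
    r"(?P<time>\d{1,2}:\d{2}-\d{1,2}:\d{2})$"
)


def split_title(raw_title):
    """Split a combined title string into (title, days, time).

    Returns None when the string does not follow the assumed layout,
    so callers can tell 'could not parse' apart from an empty value.
    """
    match = _TITLE_PATTERN.match(raw_title.strip())
    if match is None:
        return None
    return match.group("title"), match.group("days"), match.group("time")
```

Under the assumed layout, split_title("Computer Networks MWF 10:00-10:50") separates the string into the title 'Computer Networks', the days 'MWF', and the time '10:00-10:50'.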