Welcome to my PhD topics web page! If you are interested in working on your PhD with me, please contact me by email (mschneid@cise.ufl.edu) to make an appointment. This page describes what I expect from prospective students, how to contact me, how the PhD work proceeds, and which topics are available.

To save both the student’s time and mine, I recommend that the student become somewhat familiar with the main aspects of the desired knowledge listed in each topic description. This means the student should have studied the topic of interest to a limited extent and should at least have a rough idea of what the topic is about. Deep knowledge of the listed areas is not required, but the student must demonstrate that (s)he is willing to acquire the needed knowledge. Later, the student must show that (s)he has understood the required background and is capable of handling the topic of interest and doing research on it.

If you are interested in one of the topics, please contact me by email (mschneid@cise.ufl.edu) to make an appointment. A student who has his or her own topic in mind should also write me an email. The email should contain the student’s resume and explain why (s)he is interested in one or several topics. If a topic needs further explanation after the student’s personal study, feel free to write me an email as well.

In a second phase, building on what was learnt from the literature study, the student starts a conceptual design for solving the problems of the topic. This conceptual design is developed together with me and is based on joint discussions, which take place in my office weekly or every two weeks. The student has to write down the conceptual proposals in reasonable detail; they serve as the basis for the discussions between the student and me. Once we agree on a concept, the student writes a preliminary “final” description of it, which is the second deliverable for me and will later become part of the thesis. The reason is that at the time of designing a concept, the student is fully aware of its details and is thus able to write them all down. If the student were to describe a concept much later, (s)he would have forgotten many of the relevant details and the motivations for the conceptual decisions. Another reason is that the student must practice writing reports and papers from the beginning.

Based on the agreed concept, in a third phase, the student devises an implementation method for it, that is, a collection of data structures and algorithms. This method is described in a document first, which is the third deliverable, before it is implemented in a programming language. The implementation language used is C++.

The subsequent phases are iterations of the second and third phases. Concepts, implementation methods, and implementations are added, modified, and described in written form. I highly recommend keeping all written parts in a single document with a meaningful structure (explicit table of contents). This will later help you find the gaps in your thesis, concepts, and implementations.

In the last phase, the thesis is written or completed. The single document can serve as the basis for it. Basically, you have to revise the document, add parts, and write transitions between different parts. The implementation has to be completed too. During the implementation phases, the code must be documented.

During all phases, it is expected that the student writes research papers together with me as the supervisor.

- Design and Implementation of Three-Dimensional Spatial Data Types and their Integration into Database Systems
- Design and Implementation of Fuzzy Spatial Data Types and their Integration into Database Systems
- Design and Implementation of Spatial Graphs (Networks) and their Integration into Database Systems
- Spatial and Spatio-Temporal Data Warehousing
- Design and Implementation of an Image Algebra and its Integration into Database Systems
- Genomics Algebra: A New Integrating Data Model, Language, and Tool for Processing and Querying Genomic Information

Required knowledge: databases in general; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: spatial databases, computational geometry

Introductory Literature:

Ralf Hartmut Güting. An Introduction to Spatial Database Systems. VLDB Journal (Special Issue on Spatial Database Systems), 7(3), 231-246, 1994.

Markus Schneider. Spatial Data Types for Database Systems - Finite Resolution Geometry for Geographic Information Systems. LNCS 1288, Springer-Verlag, 1997.

Markus Schneider & Brian Weinrich. An Abstract Model of Three-Dimensional Spatial Data Types. *12th ACM Int. Symp. on Advances in Geographic Information Systems* (*ACM GIS*), 67-72, 2004.

Abstract. Spatial database systems and geographical information systems (GIS) are currently only able to represent and manage crisp, determinate spatial objects, that is, spatial objects which have sharply defined boundaries and whose extent is precisely known. Examples are mainly man-made objects like land parcels, states, school districts, and canals. But geographical reality reveals that the boundaries and extent of most spatial objects cannot be precisely determined. Examples are land features such as population density, soil quality, vegetation, oceans, biotopes, deserts, an English-speaking area, clouds, and sandbanks. A possible approach to modeling this kind of indeterminate spatial object is to apply fuzzy set theory. We obtain fuzzy spatial data types like fuzzy points, fuzzy lines, and fuzzy regions. The topic of this PhD project is to design such data types together with a comprehensive collection of operations and predicates defined on them, to develop implementation concepts for them, and to integrate them into database management systems and their query languages.
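To give a rough idea of what a fuzzy spatial data type with operations could look like in C++, here is a minimal sketch. It rests on assumptions that are not part of the topic description: the class name `FuzzyRegion`, the raster representation, and the choice of the standard maximum/minimum operators of fuzzy set theory for union and intersection are all hypothetical and serve only as an illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a fuzzy region stored as a raster of membership
// degrees in [0, 1]. The raster representation is an assumption made for
// brevity, not a representation prescribed by the topic.
class FuzzyRegion {
public:
    FuzzyRegion(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), mu_(rows * cols, 0.0) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Degree to which cell (r, c) belongs to the region.
    double membership(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return mu_[r * cols_ + c];
    }

    // Stores a degree, clamped so that it always stays within [0, 1].
    void setMembership(std::size_t r, std::size_t c, double degree) {
        assert(r < rows_ && c < cols_);
        mu_[r * cols_ + c] = std::max(0.0, std::min(1.0, degree));
    }

private:
    std::size_t rows_, cols_;
    std::vector<double> mu_;
};

// Fuzzy union: cell-wise maximum of the membership degrees.
FuzzyRegion fuzzyUnion(const FuzzyRegion& a, const FuzzyRegion& b) {
    assert(a.rows() == b.rows() && a.cols() == b.cols());
    FuzzyRegion result(a.rows(), a.cols());
    for (std::size_t r = 0; r < a.rows(); ++r)
        for (std::size_t c = 0; c < a.cols(); ++c)
            result.setMembership(
                r, c, std::max(a.membership(r, c), b.membership(r, c)));
    return result;
}

// Fuzzy intersection: cell-wise minimum of the membership degrees.
FuzzyRegion fuzzyIntersection(const FuzzyRegion& a, const FuzzyRegion& b) {
    assert(a.rows() == b.rows() && a.cols() == b.cols());
    FuzzyRegion result(a.rows(), a.cols());
    for (std::size_t r = 0; r < a.rows(); ++r)
        for (std::size_t c = 0; c < a.cols(); ++c)
            result.setMembership(
                r, c, std::min(a.membership(r, c), b.membership(r, c)));
    return result;
}

// Crisp predicate derived from a fuzzy region: does cell (r, c) belong to
// the region with a degree of at least alpha?
bool inAlphaCut(const FuzzyRegion& region, std::size_t r, std::size_t c,
                double alpha) {
    return region.membership(r, c) >= alpha;
}
```

A crisp region is the special case in which every membership degree is either 0 or 1. A real design would also have to cover fuzzy points and fuzzy lines and would probably not be tied to a raster.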

Required knowledge: databases in general; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: spatial databases, fuzzy set theory

Introductory Literature:

Markus Schneider. Uncertainty Management for Spatial Data in Databases: Fuzzy Spatial Data Types. *6th Int. Symp. on Advances in Spatial Databases* (*SSD*), LNCS 1651, Springer-Verlag, 330-351, 1999.

Markus Schneider. Metric Operations on Fuzzy Spatial Objects in Databases. *8th ACM Symp. on Geographic Information Systems* (*ACM GIS*), 21-26, 2000.

Markus Schneider. Fuzzy Topological Predicates, Their Properties, and Their Integration into Query Languages. *9th ACM Symp. on Geographic Information Systems* (*ACM GIS*), 9-14, 2001.

Markus Schneider. A Design of Topological Predicates for Complex Crisp and Fuzzy Regions. *20th Int. Conf. on Conceptual Modeling* (*ER*), 103-116, 2001.

Markus Schneider. Fuzzy Spatial Data Types and Predicates: Their Definition and Integration into Query Languages. *Spatio-Temporal Databases: Flexible Querying and Reasoning*. Springer-Verlag, 265-293, 2004.

Abstract. Spatial graphs are an important spatial concept in maps; they represent, for example, road networks, railway networks, and power networks. This PhD project has three goals. First, the design of an abstract model should define spatial graphs and their properties and identify the most important operations and predicates on them. An example of an important operation on spatial graphs is finding the shortest path from a source to a destination. Such a model will be based on mathematical concepts like point set theory, point set topology, graph theory, and functions. Second, since the abstract model is based on infinite point sets and functions and cannot be implemented directly, a discrete model is needed that yields finite representations for the infinite concepts of the abstract model as well as algorithms for its operations and predicates. Third, the ultimate goal is to incorporate spatial graphs into database systems and their query languages.
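The shortest-path operation mentioned in the abstract can be illustrated with a small C++ sketch of a discrete representation. Everything in it is an assumption made for illustration: the names `Point` and `SpatialGraph`, the restriction to undirected straight-line edges weighted by Euclidean length, and the use of Dijkstra's algorithm. A real discrete model would have to handle curved edges, directed networks, and much more.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sketch of a spatial graph: nodes carry coordinates and
// edges are straight segments weighted by their Euclidean length.
struct Point {
    double x;
    double y;
};

class SpatialGraph {
public:
    // Adds a node at position p and returns its index.
    std::size_t addNode(Point p) {
        nodes_.push_back(p);
        adjacency_.emplace_back();
        return nodes_.size() - 1;
    }

    // Adds an undirected edge between the existing nodes u and v.
    void addEdge(std::size_t u, std::size_t v) {
        assert(u < nodes_.size() && v < nodes_.size());
        const double length = std::hypot(nodes_[u].x - nodes_[v].x,
                                         nodes_[u].y - nodes_[v].y);
        adjacency_[u].push_back(std::make_pair(v, length));
        adjacency_[v].push_back(std::make_pair(u, length));
    }

    // Dijkstra's algorithm. Returns the length of a shortest path from
    // source to target, or infinity if target cannot be reached.
    double shortestPathLength(std::size_t source, std::size_t target) const {
        assert(source < nodes_.size() && target < nodes_.size());
        const double inf = std::numeric_limits<double>::infinity();
        std::vector<double> dist(nodes_.size(), inf);
        typedef std::pair<double, std::size_t> Entry;  // (distance, node)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> >
            queue;
        dist[source] = 0.0;
        queue.push(Entry(0.0, source));
        while (!queue.empty()) {
            const Entry top = queue.top();
            queue.pop();
            const double d = top.first;
            const std::size_t u = top.second;
            if (d > dist[u]) continue;  // outdated queue entry
            if (u == target) return d;
            for (std::size_t i = 0; i < adjacency_[u].size(); ++i) {
                const std::size_t v = adjacency_[u][i].first;
                const double candidate = d + adjacency_[u][i].second;
                if (candidate < dist[v]) {
                    dist[v] = candidate;
                    queue.push(Entry(candidate, v));
                }
            }
        }
        return inf;
    }

private:
    std::vector<Point> nodes_;
    std::vector<std::vector<std::pair<std::size_t, double> > > adjacency_;
};
```

In a database setting, an operation like `shortestPathLength` would be exposed through the query language rather than called directly; how to do that is part of the third goal.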

Required knowledge: databases in general; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: spatial databases, graph theory

Introductory Literature:

(None listed; little literature is available, and the student has to search for it.)

Abstract. Research in data warehousing and on-line analytical processing (OLAP) has produced important technologies for the design, management, and use of information systems for decision support. However, despite the continued success and maturing of the field, much work remains to be done. Given the wealth of models, terminology, and definitions, the first task is to review the most important models and their treatment of the basic concepts, including the notions “dimension”, “fact”, “hierarchy”, “data cube”, and many more. The intent should be to evaluate existing models based on their expressiveness, flexibility, separation of modeling aspects from implementation aspects, etc. With the knowledge gained, the second task is to develop a comprehensive conceptual model that is adapted to the users’ needs and abstracts from implementation aspects. The third task is to identify existing OLAP operators, to get an overview of their capabilities, and to learn how they can be used to manipulate multi-dimensional data (e.g., cube, roll-up, drill-across). The fourth task is to define these OLAP operators, and perhaps new ones that have not been considered so far, on the basis of the model designed in task 2. The fifth task is to identify new applications for data warehousing and OLAP in the spatial and spatio-temporal domain and to extend the model of task 2 correspondingly. The impact of new, advanced, and non-standard data types on the data warehousing concepts has to be explored. Also, new OLAP operations have to be identified and formally defined. The sixth task is to implement the complete model as a data warehouse extension package that can be integrated as a cartridge, datablade, or extender into Oracle, DB2, and Informix.
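To make the roll-up operator mentioned in the third task concrete, here is a minimal C++ sketch. It is only an illustration under simplifying assumptions: the names `Fact`, `Hierarchy`, `Cube`, and `rollUp` are hypothetical, there is a single two-level hierarchy (city to state) on one dimension, the only measure is a sales figure, and aggregation is a plain sum.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical fact with two dimensions (city, month) and one measure.
struct Fact {
    std::string city;
    std::string month;
    double sales;
};

// One level of a dimension hierarchy: child value -> parent value.
typedef std::map<std::string, std::string> Hierarchy;

// Aggregated cube cells, keyed by (state, month).
typedef std::map<std::pair<std::string, std::string>, double> Cube;

// Roll-up along the location dimension: replaces each city by its state
// and sums the measure of all facts that fall into the same cell.
Cube rollUp(const std::vector<Fact>& facts, const Hierarchy& cityToState) {
    Cube cube;
    for (std::size_t i = 0; i < facts.size(); ++i) {
        const Hierarchy::const_iterator parent =
            cityToState.find(facts[i].city);
        // Assumption: the hierarchy covers every city that occurs.
        assert(parent != cityToState.end());
        cube[std::make_pair(parent->second, facts[i].month)] +=
            facts[i].sales;
    }
    return cube;
}
```

For example, facts for Gainesville and Miami in the same month end up in a single cell for Florida. A conceptual model as asked for in task 2 would define such operators independently of any particular storage structure.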

Required knowledge: databases in general; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: data warehouses in general, spatial databases, spatio-temporal databases, computational geometry

Introductory Literature:

Torben B. Pedersen, Christian S. Jensen & Curtis E. Dyreson. A Foundation for Capturing and Querying Complex Multidimensional Data. Information Systems, 26(5), 383-423, 2001.

Maurizio Rafanelli (editor). Multidimensional Databases: Problems and Solutions. Idea Group Publishing, 2003.

Abstract. Multimedia database systems are of interest in many application areas which deal with video, image, audio, text, or graphic data, or any mixture of them. The goal of this topic is to focus exclusively on the image part. We then call these systems image database systems. Images are of particular interest in many applications since they convey large volumes of information visually in a compact manner. Although a large body of knowledge about images exists from a processing standpoint in disciplines like computer graphics, computer vision, and image processing, a study of the literature reveals that not much is known about the conceptual view the user has or should have of image database systems. Simply collecting images in a database and enabling users to browse them does not justify the use of a database system. Some central questions are: What kind of interface should these systems provide to the user? What kind of query languages should be made available? What are the central operations on images? How can images be represented so that they support the identified operations efficiently? Are formats like JPEG, TIFF, and many others appropriate for this purpose? Our idea is to incorporate the answers to all these questions into a type system (we call it an algebra) for images. That is, the first task is to design and implement AN IMage ALgebra called ANIMAL, which provides data types and operations for images. The next step is then to integrate these types and operations into an extensible database system (like Oracle, DB2, or others) and its query language, and thus to create an image database system. Later, extensions are conceivable with respect to image indexing and information retrieval.
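The idea of an algebra, that is, data types together with operations that take and return values of these types, can be sketched in C++ as follows. This is not the design of ANIMAL, which is precisely the open task of the topic. The type `GrayImage` and the operations `invert` and `crop` are hypothetical and were chosen only because they are simple.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image type: an 8-bit grayscale raster in row-major order.
class GrayImage {
public:
    GrayImage(std::size_t width, std::size_t height, std::uint8_t fill = 0)
        : width_(width), height_(height), pixels_(width * height, fill) {}

    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }

    std::uint8_t at(std::size_t x, std::size_t y) const {
        assert(x < width_ && y < height_);
        return pixels_[y * width_ + x];
    }

    void set(std::size_t x, std::size_t y, std::uint8_t value) {
        assert(x < width_ && y < height_);
        pixels_[y * width_ + x] = value;
    }

private:
    std::size_t width_, height_;
    std::vector<std::uint8_t> pixels_;
};

// Operation GrayImage -> GrayImage: replaces each gray value v by 255 - v.
GrayImage invert(const GrayImage& image) {
    GrayImage result(image.width(), image.height());
    for (std::size_t y = 0; y < image.height(); ++y)
        for (std::size_t x = 0; x < image.width(); ++x)
            result.set(x, y,
                       static_cast<std::uint8_t>(255 - image.at(x, y)));
    return result;
}

// Operation GrayImage x rectangle -> GrayImage: extracts the sub-image of
// the given size whose top-left corner is at (x0, y0).
GrayImage crop(const GrayImage& image, std::size_t x0, std::size_t y0,
               std::size_t width, std::size_t height) {
    assert(x0 + width <= image.width() && y0 + height <= image.height());
    GrayImage result(width, height);
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            result.set(x, y, image.at(x0 + x, y0 + y));
    return result;
}
```

Because every operation returns a value of the type again, operations can be nested, for example `crop(invert(image), 1, 0, 2, 1)`. This closure property is what would allow such operations to be composed inside query language expressions.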

Required knowledge: databases in general, computer vision; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: image processing

Introductory Literature:

(None listed; the student has to search for it.)

Abstract. The dramatic increase of mostly semi-structured genomic data, their heterogeneity and high variety, and the increasing complexity of biological applications and methods mean that many very important challenges in biology are now challenges in computing, and here especially in databases. In contrast to the many query-driven approaches advocated in the literature, we propose a new integrating approach that is based on two fundamental pillars. The Genomics Algebra provides an extensible set of high-level genomic data types (GDT) (e.g., genome, gene, chromosome, protein, nucleotide) together with a comprehensive collection of appropriate genomic functions (e.g., translate, transcribe, decode). The Unifying Database allows us to manage the semi-structured contents of publicly available genomic repositories and to transfer these data into GDT values. These values then serve as arguments of Genomics Algebra operations, which are to be embedded into a DBMS query language.
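As a rough illustration of genomic data types with functions such as transcribe and translate, consider the following C++ sketch. The types `Dna`, `Mrna`, and `Protein` and the function signatures are hypothetical and far simpler than the GDTs of the Genomics Algebra. The sketch assumes that the input is the coding strand, that translation starts at the first AUG, and that the standard genetic code applies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, heavily simplified genomic data types.
struct Dna { std::string bases; };         // over the alphabet A, C, G, T
struct Mrna { std::string bases; };        // over the alphabet A, C, G, U
struct Protein { std::string residues; };  // one-letter amino acid codes

// transcribe: Dna -> Mrna. Assumes the coding strand is given, so the
// mRNA has the same sequence with U in place of T.
Mrna transcribe(const Dna& codingStrand) {
    Mrna mrna;
    for (std::size_t i = 0; i < codingStrand.bases.size(); ++i) {
        const char b = codingStrand.bases[i];
        assert(b == 'A' || b == 'C' || b == 'G' || b == 'T');
        mrna.bases.push_back(b == 'T' ? 'U' : b);
    }
    return mrna;
}

// Position of a base in the order U, C, A, G, or -1 if it is invalid.
int baseIndex(char base) {
    switch (base) {
        case 'U': return 0;
        case 'C': return 1;
        case 'A': return 2;
        case 'G': return 3;
        default:  return -1;
    }
}

// translate: Mrna -> Protein. Starts at the first AUG and stops at the
// first stop codon or when fewer than three bases remain.
Protein translate(const Mrna& mrna) {
    // Standard genetic code; codons are ordered by first, second, and
    // third base, each in the order U, C, A, G. '*' marks a stop codon.
    static const std::string code =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    Protein protein;
    std::size_t pos = mrna.bases.find("AUG");
    if (pos == std::string::npos) return protein;  // no start codon
    for (; pos + 3 <= mrna.bases.size(); pos += 3) {
        const int i = baseIndex(mrna.bases[pos]);
        const int j = baseIndex(mrna.bases[pos + 1]);
        const int k = baseIndex(mrna.bases[pos + 2]);
        assert(i >= 0 && j >= 0 && k >= 0);
        const char aminoAcid = code[16 * i + 4 * j + k];
        if (aminoAcid == '*') break;
        protein.residues.push_back(aminoAcid);
    }
    return protein;
}
```

The two functions compose: `translate(transcribe(gene))` maps a gene directly to a protein value. Distinct types for DNA, mRNA, and proteins let the compiler reject meaningless combinations, such as translating a DNA value without transcribing it first.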

Required knowledge: databases in general, bioinformatics; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: biological databases

Introductory Literature:

Joachim Hammer & Markus Schneider. Genomics Algebra: A New, Integrating Data Model, Language, and Tool for Processing and Querying Genomic Information. 1st Biennial Conf. on Innovative Data Systems Research (CIDR), 176-187, 2003.

Last update: August 16, 2006.

Markus Schneider (mschneid@cise.ufl.edu)