PhD Topics

Welcome to my PhD topics web page! If you are interested in working on your PhD with me, please contact me by email (mschneid@cise.ufl.edu) to make an appointment. This page provides an introduction, a description of the general procedure during a PhD, an overview of the topics, and a detailed description of each topic.


Introduction

The topics for PhD projects described below correspond to my current research interests, which are databases in general, spatial databases, spatio-temporal databases, fuzzy databases, spatial and spatio-temporal data warehousing, image databases, and bioinformatics databases. A more detailed list of my research interests and a list of my publications, which can help you understand my research topics, are available on my web pages. A specific and interesting topic that a student has in mind is also welcome but, of course, has to be discussed with me and accepted by me.

To save both the student’s and my time, I recommend that the student become somewhat familiar with the main aspects of the desired knowledge listed in each topic description. This means the student should have made a limited personal study of the topic of interest and should at least have a rough idea of what the topic is about. Deep knowledge of the listed areas is not required, but the student must demonstrate that (s)he is willing to acquire the needed knowledge. Later, the student must show that (s)he has understood the required background and is capable of coping with the topic of interest and doing research on it.

If you are interested in one of the topics, please contact me by email (mschneid@cise.ufl.edu) to make an appointment. A student with a topic of his or her own in mind should also write me an email. The email should contain the student’s resume and explain why (s)he is interested in one or several topics. If a topic needs further explanation (after the student’s personal study), feel free to write me an email too.

General Procedure During a PhD

Once a topic has been agreed upon, the student starts the first phase with a detailed study of related work. The result of this study is a written document that describes, analyzes, and structures the references found, identifies their benefits and drawbacks, and precisely lists the sources of the references at the end. The resulting document is the student’s first deliverable and will be part of the thesis.

Having learnt from the literature study, the student starts the second phase with a conceptual design for solving the problems of the topic. This conceptual design is developed hand in hand with me and is based on joint discussions that take place weekly or every two weeks in my office. The student has to write down the conceptual proposals in a more or less detailed way; they serve as the foundation for the discussions between the student and me. Once a concept has been agreed upon, the student writes a preliminary “final” description of it, which is the second deliverable and will later also be part of the thesis. One reason for this is that, at the time of designing a concept, the student is fully aware of its details and thus able to write them all down. If the student described a previously devised concept at a much later point in time, (s)he would have forgotten many of the relevant details and motivations for the conceptual decisions. Another reason is that the student must practice writing reports and papers from the beginning.

Based on the agreed concept, in a third phase, the student devises an implementation method for it, that is, a collection of data structures and algorithms. This method is described in a document first, which is the third deliverable, before it is implemented in a programming language. The implementation language used is C++.

The next phases are iterations of the second and third phases. Concepts, implementation methods, and implementations are added, modified, and described in written form. It is highly recommended to keep all written parts in a single document with a meaningful structure (explicit table of contents). This will later help the student to find the gaps in the thesis, concepts, and implementations.

In the last phase, the thesis is written or completed. The single document can serve as the basis for it. Essentially, the student has to revise the document, add parts, and write transitions between different parts. The implementation has to be completed too. During the implementation phases, the code must be documented.

During all phases, it is expected that the student writes research papers together with me as the supervisor.

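Topics Overview

Design and Implementation of Three-Dimensional Spatial Data Types and their Integration into Database Systems
Design and Implementation of Fuzzy Spatial Data Types and their Integration into Database Systems
Design and Implementation of Spatial Graphs (Networks) and their Integration into Database Systems
Spatial and Spatio-Temporal Data Warehousing
Design and Implementation of an Image Algebra and its Integration into Database Systems
Genomics Algebra: A New Integrating Data Model, Language, and Tool for Processing and Querying Genomic Information
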
Topics in Detail

Design and Implementation of Three-Dimensional Spatial Data Types and their Integration into Database Systems

Abstract. So far, spatial databases have mainly dealt with the management of two-dimensional geometric data, which has turned out to be sufficient for many spatial applications. However, our world is three-dimensional, and a number of applications can be identified that would greatly benefit from a treatment of three-dimensional data. The idea of this project is therefore to incorporate three-dimensional geometric information into spatial databases and hence into geographical information systems. Our research has so far led to a so-called abstract model that identifies the essential three-dimensional spatial data types, operations, and predicates. In this model, data types, operations, and predicates are formally defined on the basis of mathematical concepts like point set theory and point set topology. The task of this PhD project is to continue this research and to design a so-called discrete model for the abstract model. Since the abstract model is based on infinite point sets and functions and cannot be directly implemented, the goal of the discrete model is to find finite representations for the infinite concepts of the abstract model and algorithms for its operations and predicates. The design of a discrete model has to satisfy a number of design criteria like the generality of the data type definitions, closure properties of the data types, and numerical robustness. Ultimately, this leads to the development of a computational geometry for three-dimensional data. A second important goal of this project is the incorporation of three-dimensional data types, operations, and predicates into database systems and their query languages. The general problem here is how values of varying and very large representation length can be efficiently stored and retrieved in a database.
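
To give a rough idea of what a finite, numerically robust representation could look like, the following C++ sketch places three-dimensional points on an integer grid and evaluates an orientation predicate with exact integer arithmetic. The names Point3, kMaxCoord, inRange, and orientation as well as the coordinate bound are assumptions made for this illustration only; they are not taken from the abstract model, and an actual discrete model may look quite different.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical discrete 3D point: coordinates lie on an integer grid,
// so no floating-point rounding occurs.
struct Point3 {
    std::int64_t x, y, z;
};

// Assumed coordinate bound; it keeps the determinant below far away
// from 64-bit overflow (|det| stays under 5 * 10^16).
constexpr std::int64_t kMaxCoord = 100000;

inline bool inRange(const Point3& p) {
    return p.x >= -kMaxCoord && p.x <= kMaxCoord &&
           p.y >= -kMaxCoord && p.y <= kMaxCoord &&
           p.z >= -kMaxCoord && p.z <= kMaxCoord;
}

// Exact orientation predicate based on the sign of det(b - a, c - a, d - a).
// Returns +1 if the three vectors form a right-handed triple,
// -1 if they form a left-handed triple, and 0 if a, b, c, d are coplanar.
int orientation(const Point3& a, const Point3& b,
                const Point3& c, const Point3& d) {
    assert(inRange(a) && inRange(b) && inRange(c) && inRange(d));
    const std::int64_t ux = b.x - a.x, uy = b.y - a.y, uz = b.z - a.z;
    const std::int64_t vx = c.x - a.x, vy = c.y - a.y, vz = c.z - a.z;
    const std::int64_t wx = d.x - a.x, wy = d.y - a.y, wz = d.z - a.z;
    const std::int64_t det = ux * (vy * wz - vz * wy)
                           - uy * (vx * wz - vz * wx)
                           + uz * (vx * wy - vy * wx);
    return (det > 0) - (det < 0);
}
```

Because the predicate never rounds, two calls on the same four points cannot contradict each other, which is one simple reading of the numerical robustness criterion mentioned above.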

Required knowledge: databases in general; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: spatial databases, computational geometry

Introductory Literature:

Ralf Hartmut Güting. An Introduction to Spatial Database Systems. VLDB Journal (Special Issue on Spatial Database Systems), 3(4), 357-399, 1994.

Markus Schneider. Spatial Data Types for Database Systems - Finite Resolution Geometry for Geographic Information Systems. LNCS 1288, Springer-Verlag, 1997.

Markus Schneider & Brian Weinrich. An Abstract Model of Three-Dimensional Spatial Data Types. 12th ACM Int. Symp. on Advances in Geographic Information Systems (ACM GIS), 67-72, 2004.

 

Design and Implementation of Fuzzy Spatial Data Types and their Integration into Database Systems

Abstract. Spatial database systems and geographical information systems (GIS) are currently only able to represent and manage crisp, determinate spatial objects, that is, spatial objects which have sharply defined boundaries and whose extent is precisely known. Examples are mainly man-made objects like land parcels, states, school districts, and canals. But geographical reality reveals that the boundaries and extent of most spatial objects cannot be precisely determined. Examples are land features such as population density, soil quality, vegetation, oceans, biotopes, deserts, an English-speaking area, clouds, and sandbanks. A possible approach to modeling this kind of indeterminate spatial object is to apply fuzzy set theory. We obtain fuzzy spatial data types like fuzzy points, fuzzy lines, and fuzzy regions. The topic of this PhD project is to design such data types together with a comprehensive collection of operations and predicates defined on them, to develop implementation concepts for them, and to integrate them into database management systems and their query languages.
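
As a minimal illustration of how fuzzy set theory can be applied to regions, the following C++ sketch stores a membership degree in [0, 1] for every cell of a fixed raster and uses the standard fuzzy set operators (maximum for union, minimum for intersection) together with an alpha-cut. The raster representation and the names FuzzyRegion, fuzzyUnion, fuzzyIntersection, and alphaCut are assumptions for this example; the data types to be designed in the project are not restricted to this form.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fuzzy region: each raster cell holds a membership degree
// in [0, 1]; 0 means "not part of the region", 1 means "fully part of it".
struct FuzzyRegion {
    std::size_t width, height;
    std::vector<double> mu;  // membership degrees in row-major order

    FuzzyRegion(std::size_t w, std::size_t h)
        : width(w), height(h), mu(w * h, 0.0) {}

    double at(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        return mu[y * width + x];
    }

    void set(std::size_t x, std::size_t y, double degree) {
        assert(x < width && y < height);
        assert(degree >= 0.0 && degree <= 1.0);
        mu[y * width + x] = degree;
    }
};

// Standard fuzzy union: cell-wise maximum of the membership degrees.
FuzzyRegion fuzzyUnion(const FuzzyRegion& a, const FuzzyRegion& b) {
    assert(a.width == b.width && a.height == b.height);
    FuzzyRegion result(a.width, a.height);
    for (std::size_t i = 0; i < a.mu.size(); ++i)
        result.mu[i] = std::max(a.mu[i], b.mu[i]);
    return result;
}

// Standard fuzzy intersection: cell-wise minimum of the membership degrees.
FuzzyRegion fuzzyIntersection(const FuzzyRegion& a, const FuzzyRegion& b) {
    assert(a.width == b.width && a.height == b.height);
    FuzzyRegion result(a.width, a.height);
    for (std::size_t i = 0; i < a.mu.size(); ++i)
        result.mu[i] = std::min(a.mu[i], b.mu[i]);
    return result;
}

// Alpha-cut: the crisp set of cells whose membership degree is at least alpha.
std::vector<bool> alphaCut(const FuzzyRegion& r, double alpha) {
    std::vector<bool> crisp(r.mu.size());
    for (std::size_t i = 0; i < r.mu.size(); ++i)
        crisp[i] = r.mu[i] >= alpha;
    return crisp;
}
```

Both operations return a FuzzyRegion again, so the type is closed under them, and an alpha-cut turns a fuzzy region back into a crisp one.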

Required knowledge: databases in general; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: spatial databases, fuzzy set theory

Introductory Literature:

Markus Schneider. Uncertainty Management for Spatial Data in Databases: Fuzzy Spatial Data Types. 6th Int. Symp. on Advances in Spatial Databases (SSD), LNCS 1651, Springer-Verlag, 330-351, 1999.

Markus Schneider. Finite Resolution Crisp and Fuzzy Spatial Objects. 9th Int. Symp. on Spatial Data Handling (SDH), 5a.3-17, 2000.

Markus Schneider. Metric Operations on Fuzzy Spatial Objects in Databases. 8th ACM Symp. on Geographic Information Systems (ACM GIS), 21-26, 2000.

Markus Schneider. Fuzzy Topological Predicates, Their Properties, and Their Integration into Query Languages. 9th ACM Symp. on Geographic Information Systems (ACM GIS), 9-14, 2001.

Markus Schneider. A Design of Topological Predicates for Complex Crisp and Fuzzy Regions. 20th Int. Conf. on Conceptual Modeling (ER), 103-116, 2001.

Markus Schneider. Fuzzy Spatial Data Types and Predicates: Their Definition and Integration into Query Languages. Spatio-Temporal Databases: Flexible Querying and Reasoning. Springer-Verlag, 265-293, 2004.

 

Design and Implementation of Spatial Graphs (Networks) and their Integration into Database Systems

Abstract. Spatial graphs are an important spatial concept in maps; they represent, for example, road networks, railway networks, and power networks. This PhD project has three goals. First, the design of an abstract model should give a definition of spatial graphs and their properties and identify the most important operations and predicates on them. An example of an important operation on spatial graphs is finding the shortest path from a source to a destination. Such a model will be based on mathematical concepts like point set theory, point set topology, graph theory, and functions. Second, since the abstract model is based on infinite point sets and functions and cannot be directly implemented, a discrete model is needed that yields finite representations for the infinite concepts of the abstract model and algorithms for its operations and predicates. Third, the ultimate goal is to incorporate spatial graphs into database systems and their query languages.
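
The shortest path operation mentioned above can be sketched in C++ as follows. The sketch assumes a very simple discrete representation (nodes with planar coordinates, undirected straight-line edges weighted by their Euclidean length) and uses Dijkstra's algorithm; the class name SpatialGraph and its member functions are hypothetical and only serve this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical spatial graph: nodes carry planar coordinates, and each
// undirected edge is a straight segment weighted by its Euclidean length.
class SpatialGraph {
public:
    std::size_t addNode(double x, double y) {
        nodes_.push_back({x, y});
        adjacency_.emplace_back();
        return nodes_.size() - 1;
    }

    void addEdge(std::size_t u, std::size_t v) {
        assert(u < nodes_.size() && v < nodes_.size());
        const double length = std::hypot(nodes_[u].first - nodes_[v].first,
                                         nodes_[u].second - nodes_[v].second);
        adjacency_[u].push_back({v, length});
        adjacency_[v].push_back({u, length});
    }

    // Dijkstra's algorithm; returns the length of a shortest path from
    // source to target, or infinity if target is unreachable.
    double shortestPathLength(std::size_t source, std::size_t target) const {
        assert(source < nodes_.size() && target < nodes_.size());
        const double inf = std::numeric_limits<double>::infinity();
        std::vector<double> dist(nodes_.size(), inf);
        using Entry = std::pair<double, std::size_t>;  // (distance, node)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> queue;
        dist[source] = 0.0;
        queue.push({0.0, source});
        while (!queue.empty()) {
            const Entry top = queue.top();
            queue.pop();
            const double d = top.first;
            const std::size_t u = top.second;
            if (d > dist[u]) continue;   // outdated queue entry
            if (u == target) return d;   // target settled
            for (const auto& edge : adjacency_[u]) {
                const double candidate = d + edge.second;
                if (candidate < dist[edge.first]) {
                    dist[edge.first] = candidate;
                    queue.push({candidate, edge.first});
                }
            }
        }
        return dist[target];
    }

private:
    std::vector<std::pair<double, double>> nodes_;  // (x, y) per node
    std::vector<std::vector<std::pair<std::size_t, double>>> adjacency_;
};
```

In a database setting, such an operation would eventually have to be offered through the query language rather than called directly, which is the third goal described above.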

Required knowledge: databases in general; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: spatial databases, graph theory

Introductory Literature:

(none; little is available, and it has to be searched for)

 

Spatial and Spatio-Temporal Data Warehousing

Abstract. Research in data warehousing and on-line analytical processing (OLAP) has produced important technologies for the design, management, and use of information systems for decision support. However, despite the continued success and maturing of the field, much work remains to be done. Given the wealth of models, terminology, and definitions, the first task is to review the most important models and their treatment of the basic concepts, including the notions “dimension”, “fact”, “hierarchy”, “data cube”, and many more. The intent is to evaluate existing models based on their expressiveness, flexibility, separation of modeling aspects from implementation aspects, etc. With the knowledge gained, the second task is to develop a comprehensive conceptual model that is adapted to the users’ needs and abstracts from implementation aspects. The third task is to identify existing OLAP operators (e.g., cube, roll-up, drill-across), to get an overview of their capabilities, and to learn how they can be used to manipulate multi-dimensional data. The fourth task is to define these OLAP operators, and perhaps new ones that have not been considered so far, on the basis of the model designed in task 2. The fifth task is to identify new applications for data warehousing and OLAP in the spatial and spatio-temporal domain and to extend the model of task 2 correspondingly. The impact of new, advanced, and non-standard data types on the data warehousing concepts has to be explored. Also, new OLAP operations have to be identified and formally defined. The sixth task is to implement the complete model as a data warehouse extension package that can be integrated as a cartridge, datablade, or extender into Oracle, Informix, or DB2, respectively.
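
For readers new to OLAP, the following C++ sketch illustrates the roll-up operator on a tiny example: sales facts recorded at the city level of a location dimension are summed up to the country level of a city-to-country hierarchy. The types Fact and Hierarchy, the function rollUp, and the choice of the sum as aggregation function are assumptions for this illustration; they do not anticipate the conceptual model to be developed in task 2.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical fact: one measure (sales) recorded at the finest level
// (city) of a location dimension.
struct Fact {
    std::string city;
    double sales;
};

// Hypothetical hierarchy step of the location dimension: city -> country.
using Hierarchy = std::map<std::string, std::string>;

// Roll-up: aggregates the measure from the city level to the country level
// by summing all facts whose cities belong to the same country.
std::map<std::string, double> rollUp(const std::vector<Fact>& facts,
                                     const Hierarchy& cityToCountry) {
    std::map<std::string, double> totals;
    for (const Fact& fact : facts) {
        const auto parent = cityToCountry.find(fact.city);
        assert(parent != cityToCountry.end());  // every city needs a country
        totals[parent->second] += fact.sales;
    }
    return totals;
}
```

In a spatial data warehouse, the hierarchy would typically be derived from geometric containment (a city region lies inside a country region) rather than from an explicit table as assumed here.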

Required knowledge: databases in general; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: data warehouses in general, spatial databases, spatio-temporal databases, computational geometry

Introductory Literature:

Torben B. Pedersen, Christian S. Jensen & Curtis E. Dyreson. A Foundation for Capturing and Querying Complex Multidimensional Data. Information Systems, 26(5), 383-423, 2001.

Maurizio Rafanelli (editor). Multidimensional Databases: Problems and Solutions. Idea Group Publishing, 2003.

 

Design and Implementation of an Image Algebra and its Integration into Database Systems

Abstract. Multimedia database systems are of interest in many application areas which deal with video, image, audio, text, or graphic data, or any kind of mixture of them. The goal of this topic is to focus exclusively on the image part. We then call these systems image database systems. Images are of particular interest in many applications since they allow the visual transport of large volumes of information in a compact manner. Although a large body of knowledge about images exists from a processing standpoint in disciplines like computer graphics, computer vision, and image processing, a study of the literature reveals that not much is known about the conceptual view the user has or should have of image database systems. Simply collecting images in a database and enabling users to browse them does not justify the use of a database system. Some central questions are: What kind of interface should these systems provide to the user? What kind of query languages should be made available? What are the central operations on images? How can images be represented so that they support the identified operations in an efficient way? Are formats like JPEG, TIFF, and many others appropriate for this purpose? Our idea is to incorporate the answers to all these questions into a type system (we call it an algebra) for images. That is, the first task is to design and implement AN IMage ALgebra called ANIMAL, which provides data types and operations for images. The next step is then to integrate these types and operations into an extensible database system (like Oracle, DB2, or others) and its query language, and thus to create an image database system. Later, extensions are conceivable with respect to image indexing and information retrieval.
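
To make the notion of an algebra for images slightly more concrete, the following C++ sketch defines one image data type and two operations that map images to images. It is only a guess at the general flavor of such a type system: the type GrayImage and the operations invert and crop are hypothetical and are not part of ANIMAL, whose data types and operations are yet to be designed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image data type: an 8-bit grayscale raster.
class GrayImage {
public:
    GrayImage(std::size_t width, std::size_t height, std::uint8_t fill = 0)
        : width_(width), height_(height), pixels_(width * height, fill) {}

    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }

    std::uint8_t at(std::size_t x, std::size_t y) const {
        assert(x < width_ && y < height_);
        return pixels_[y * width_ + x];
    }

    void set(std::size_t x, std::size_t y, std::uint8_t value) {
        assert(x < width_ && y < height_);
        pixels_[y * width_ + x] = value;
    }

private:
    std::size_t width_, height_;
    std::vector<std::uint8_t> pixels_;  // row-major gray values
};

// Operation image -> image: replaces every gray value v by 255 - v.
GrayImage invert(const GrayImage& image) {
    GrayImage result(image.width(), image.height());
    for (std::size_t y = 0; y < image.height(); ++y)
        for (std::size_t x = 0; x < image.width(); ++x)
            result.set(x, y, static_cast<std::uint8_t>(255 - image.at(x, y)));
    return result;
}

// Operation image x rectangle -> image: extracts the sub-image whose
// upper left corner is (x0, y0) and whose size is w x h.
GrayImage crop(const GrayImage& image, std::size_t x0, std::size_t y0,
               std::size_t w, std::size_t h) {
    assert(x0 + w <= image.width() && y0 + h <= image.height());
    GrayImage result(w, h);
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            result.set(x, y, image.at(x0 + x, y0 + y));
    return result;
}
```

Since every operation returns a GrayImage, operations can be nested, for example invert(crop(image, 1, 1, 2, 1)), which is the kind of composability a query language embedding would rely on.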

Required knowledge: databases in general, computer vision; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: image processing

Introductory Literature:

(none, has to be searched for)

 

Genomics Algebra: A New Integrating Data Model, Language, and Tool for Processing and Querying Genomic Information

Abstract. The dramatic increase of mostly semi-structured genomic data, their heterogeneity and high variety, and the increasing complexity of biological applications and methods mean that many very important challenges in biology are now challenges in computing, and especially in databases. In contrast to the many query-driven approaches advocated in the literature, we propose a new integrating approach that is based on two fundamental pillars. The first pillar, the Genomics Algebra, provides an extensible set of high-level genomic data types (GDTs) (e.g., genome, gene, chromosome, protein, nucleotide) together with a comprehensive collection of appropriate genomic functions (e.g., translate, transcribe, decode). The second pillar, the Unifying Database, allows us to manage the semi-structured contents of publicly available genomic repositories and to transfer these data into GDT values. These values then serve as arguments of Genomics Algebra operations, which are supposed to be embedded into a DBMS query language.
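
The idea of genomic data types with genomic functions defined on them can be sketched in C++ as follows. The three wrapper types Dna, Mrna, and Protein and the signatures of transcribe and translate are assumptions for this example and do not reflect the actual GDT definitions. The sketch further assumes that the DNA value is the coding strand written with the letters A, C, G, and T, that translation starts at the first base, and that the standard genetic code applies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical genomic data types, modeled as thin wrappers around strings.
struct Dna { std::string bases; };         // coding strand, letters A, C, G, T
struct Mrna { std::string bases; };        // letters A, C, G, U
struct Protein { std::string residues; };  // one-letter amino acid codes

// transcribe: Dna -> Mrna. Under the coding-strand assumption, every
// thymine (T) is simply replaced by uracil (U).
Mrna transcribe(const Dna& dna) {
    Mrna rna{dna.bases};
    for (char& base : rna.bases)
        if (base == 'T') base = 'U';
    return rna;
}

// translate: Mrna -> Protein. Reads codons from the first base onward and
// stops at the first stop codon or when fewer than three bases remain.
Protein translate(const Mrna& rna) {
    // Standard genetic code with codon positions ordered U, C, A, G;
    // '*' marks a stop codon.
    static const std::string order = "UCAG";
    static const std::string aminoAcids =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    Protein protein;
    for (std::size_t i = 0; i + 3 <= rna.bases.size(); i += 3) {
        std::size_t index = 0;
        for (std::size_t k = 0; k < 3; ++k) {
            const std::size_t position = order.find(rna.bases[i + k]);
            assert(position != std::string::npos);  // only A, C, G, U allowed
            index = index * 4 + position;
        }
        const char aminoAcid = aminoAcids[index];
        if (aminoAcid == '*') break;
        protein.residues.push_back(aminoAcid);
    }
    return protein;
}
```

Because the result type of transcribe is the argument type of translate, the two functions compose as translate(transcribe(dna)), which mirrors how nested algebra operations could appear inside a query.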

Required knowledge: databases in general, bioinformatics; programming skills in C++, Oracle 10g

Desired (but not required) knowledge: biological databases

Introductory Literature:

Joachim Hammer & Markus Schneider. Genomics Algebra: A New, Integrating Data Model, Language, and Tool for Processing and Querying Genomic Information. 1st Biennial Conf. on Innovative Data Systems Research (CIDR), 176-187, 2003.


Last update: August 16, 2006.
Markus Schneider (mschneid@cise.ufl.edu)