Comparing the performance of indexing schemes and other strategies in two database systems for data warehousing applications
We propose a comparative analysis of two database systems, Teradata and PostgreSQL, with respect to how they handle common data warehousing scenarios. A test schema will be populated with test data in both databases, and a series of queries will then be run to illustrate the strengths and weaknesses of each system's underlying architecture, indexing schemes, and related features.
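To show the kind of indexing effect the queries are meant to expose, here is a minimal sketch. It uses SQLite, which ships with Python, purely as a stand-in: its planner and output are not those of PostgreSQL or Teradata. The cut-down `lineitem` table, the synthetic rows, and the index name `idx_lineitem_shipdate` are illustrative assumptions. The sketch prints the query plan for a one-year ship-date range predicate before and after a B-tree index is created on that column.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical cut-down LINEITEM table holding only the columns the query touches.
conn.execute("CREATE TABLE lineitem (l_orderkey INTEGER, l_quantity REAL, l_shipdate TEXT)")
# Synthetic rows (not TPC-H data) with ship dates spread over 1990-1997.
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?)",
    [(k, k % 50, f"199{k % 8}-0{k % 9 + 1}-15") for k in range(1, 1001)],
)

# A one-year range predicate on the ship date, typical of warehouse queries.
QUERY = (
    "SELECT SUM(l_quantity) FROM lineitem "
    "WHERE l_shipdate >= '1994-01-01' AND l_shipdate < '1995-01-01'"
)

before = query_plan(conn, QUERY)  # no index yet: the planner must scan the whole table
conn.execute("CREATE INDEX idx_lineitem_shipdate ON lineitem (l_shipdate)")
after = query_plan(conn, QUERY)   # the range predicate can now use the B-tree index

print("without index:", before)
print("with index:   ", after)
```

In PostgreSQL the corresponding tool is `EXPLAIN` (or `EXPLAIN ANALYZE`), and Teradata offers an `EXPLAIN` request modifier, so the same before-and-after comparison can be made natively on each system.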
IT 603 (CS 631) course project
This will involve studying and understanding the features specific to each of the two systems and using them wherever feasible.
In addition to the project report, the complete numerical data collected and the comparative insights obtained will be presented in the form of a set of slides.
We have access to a Teradata machine in the school. For PostgreSQL, we expect to install it on our lab computers. (We will try to install it on one of our servers, but we cannot be sure this will be possible.)
Because the two test machines differ in hardware, operating system, storage capabilities, and so on, directly comparing their query execution times would not be meaningful. Instead, the comparison will highlight the differing architectures of the two DBMSs.
Teradata has a highly parallel architecture that can be described as "database aware": its parallelism is tightly integrated with the database engine. This integration is what gives it good performance in high-volume data warehousing applications.
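Teradata spreads each table's rows across its parallel units (AMPs) by hashing the primary index. Each AMP then works only on the rows it owns, and the partial results are combined. The toy Python model below mimics that idea and nothing more. CRC32 as the hash, a direct modulo mapping to units, four units, and a sequential loop in place of real parallel execution are all simplifying assumptions. Teradata's actual hashing algorithm and hash-map mechanism are not reproduced here. The synthetic `o_orderkey`/`o_totalprice` rows are also hypothetical.

```python
import zlib
from collections import defaultdict

NUM_UNITS = 4  # hypothetical number of parallel units (AMPs in Teradata terms)


def unit_for(key, num_units=NUM_UNITS):
    """Map a primary-index value to a unit. CRC32 stands in for Teradata's own hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_units


def distribute(rows, key_col):
    """Place each row on the unit that owns the hash of its key column."""
    units = defaultdict(list)
    for row in rows:
        units[unit_for(row[key_col])].append(row)
    return units


def parallel_sum(units, value_col):
    """Each unit sums only its local rows (one after another here, concurrently in a
    real MPP system); the partial sums are then combined into the final answer."""
    partials = {u: sum(r[value_col] for r in local) for u, local in units.items()}
    return partials, sum(partials.values())


# Synthetic ORDERS-like rows keyed on o_orderkey (not TPC-H data).
rows = [{"o_orderkey": k, "o_totalprice": 10.0 * k} for k in range(1, 101)]
units = distribute(rows, "o_orderkey")
partials, total = parallel_sum(units, "o_totalprice")

print({u: len(local) for u, local in sorted(units.items())})  # rows held by each unit
print(total)  # 50500.0
```

Because every row with a given key always hashes to the same unit, work such as scans and aggregations can proceed on all units at once without them coordinating until the final combine step. A skewed key would overload one unit, which is one reason primary-index choice matters on such systems.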
We will use the TPC-H schema and a subset of the queries defined in that benchmark.
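For a sense of what one benchmark query looks like, here is a minimal sketch of TPC-H Q6 (the "Forecasting Revenue Change" query) with its validation parameters DATE = 1994-01-01, DISCOUNT = 0.06 and QUANTITY = 24. It is adapted to SQLite syntax, which has no `date '...' + interval '1' year` form. It also assumes dates stored as ISO-8601 text. The five sample rows are made up to exercise each predicate and are not benchmark data.

```python
import sqlite3

# TPC-H Q6 with validation parameters, rewritten for SQLite date handling.
Q6 = """
SELECT SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= '1994-01-01'
  AND l_shipdate < date('1994-01-01', '+1 year')
  AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
  AND l_quantity < 24
"""

# Made-up rows covering each predicate (not benchmark data).
SAMPLE_ROWS = [
    # (l_quantity, l_extendedprice, l_discount, l_shipdate)
    (10, 1000.0, 0.06, "1994-03-15"),  # qualifies
    (5, 500.0, 0.06, "1994-12-31"),    # qualifies
    (30, 2000.0, 0.06, "1994-05-01"),  # rejected: l_quantity >= 24
    (12, 800.0, 0.06, "1995-01-02"),   # rejected: shipped after the one-year window
    (20, 400.0, 0.09, "1994-07-07"),   # rejected: discount outside 0.05..0.07
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem (l_quantity REAL, l_extendedprice REAL, "
    "l_discount REAL, l_shipdate TEXT)"
)
conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

(revenue,) = conn.execute(Q6).fetchone()
print(revenue)  # 1000*0.06 + 500*0.06, i.e. 90 up to floating-point rounding
```

In the actual experiments the same query would run in each system's own date syntax against data produced by the TPC-H `dbgen` generator. Q6 is a single-table scan with selective range predicates, so it is a natural candidate for comparing indexing and parallel-scan strategies.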