Data Management for Data Sciences
An introductory course covering advanced relational database topics and the issues involved in managing non-relational data sets.
This three-credit introductory course teaches students techniques and processes for managing large data sets. It builds on knowledge gained in IST 210 (Organization of Data). The course has two major components:
- Advance students’ knowledge of relational databases and their skills in using SQL and database indexing
- Introduce NoSQL databases, such as document-oriented, key-value, column-oriented, and graph databases, as well as Hadoop systems
In the first component, the course reviews the techniques learned in IST 210, strengthens students’ skills in writing SQL queries, and introduces indexing and scalability issues in relational databases.
While relational databases are still widely used, the need to store big data and many different types of data has driven the emergence of a new class of non-relational databases, commonly referred to as NoSQL databases. This course will introduce the real-world needs for NoSQL databases and the characteristics that distinguish them from relational databases. We will introduce both the concepts behind NoSQL databases and how those concepts are implemented in actual database systems. We will focus on three main NoSQL data models: key-value, column family, and document. You will learn the concepts behind these data models and how to use them in database systems that implement them. We will also introduce graph databases, Hadoop systems, and data warehousing. Finally, we will present the criteria decision makers should consider when choosing between relational and non-relational databases, as well as techniques for selecting the NoSQL database that best addresses a specific use case.