Database Systems

Introduction

A database management system (DBMS) is a collection of data and an integrated set of programs that access that data. The collection of data is often referred to as the database.

Example: Dickinson keeps information about each student: name, Banner ID number, standing, address

...

this data makes up the database

programs (or parts of programs) are used to add new students, to change a student’s address or

standing, to retrieve information about a student ...

Goals of a DBMS:

manage large bodies of information

provide convenient and effiffifficient ways to store and access information

secure information against system failure or tampering

permit data to be shared among multiple users

 

1 File Processing Systems

Why are DBMSs needed? Alternative: store information in fifiles, and write programs as needed to modify these fifiles and retrieve information.

A file processing system is a collection of files and programs that access/modify these files. Typically, new files and programs are added over time (by different programmers) as new information needs to be stored and new ways to access information are needed.

Problems with file processing systems:

data redundancy and inconsistency

difficulty of accessing data

problems with concurrent access

Example: assume I’m paying for groceries with my MAC card at the same time my pay check is being deposited (and my bank uses a file processing system): withdrawal program deposit program.

 

It is difficult to prevent such problems unless programs (example: withdrawal and deposit) are coordinated or integrated.

atomicity problems - ensuring that a system failure during a database update does not leave the database in an inconsistent state

1security problems

not all users should have access to all data

example: bank payroll personnel shouldn’t know my checking account balance

difficult to enforce security in an ad hoc system

integrity problems

data may need to satisfy certain conditions, called consistency constraints

example: account balances should never fall below $0

difficult to enforce/add/change consistency constraints in a file processing system

DBMSs were developed to remedy these problems.

 

2 Data Abstraction

abstraction:

(Random House College Dictionary) the act of taking away or separating

hiding details that aren’t needed for a particular purpose

data abstraction - hiding details of how data is stored and maintained

Levels of data abstraction (from low to high):

physical level

describes how data is actually stored on physical media

example: files of records or objects

needed by implementors and maintainers of DBMSs

logical level

describes what data is stored and how data is interrelated

example: data is seen as tables or a graph or tree structure

needed by database administrators (and “power users”) who must see all of the data

view level

describes some part or subset of the database

simplifies the database for users who only need part

useful for controlling access to data

example: payroll personnel have a view that lets them see employee data, but not customer data

can have as many views as needed. example: one for payroll personnel, one for tellers, one for officers ..

views can overlap

the logical level is like a view that encompasses the entire database

 

3 Instances and Schemas

The data stored in the database at any given time is an instance of the database. The overall design of the database is the database schema.

Example:

 

is an instance of a database with schema (name, account number, balance)

Database systems have schemas at each level of abstraction:

the physical schema is the design at the physical level, i.e. as a file of records of a particular type

the logical schema is the design at the logical level. Example: (name, account number, balance) is a logical schema.

a subschema is a design at the view level. For example, (name, account number) is a subschema of (name, account number, balance)

As with levels of abstraction, there is:

one physical schema

one logical schema

as many subschemas as needed

 

4 Data Models

A data model is a collection of conceptual tools for describing:

data

data relationships

data semantics

consistency constraints

Data models:

provide a way of thinking about data that isn’t linked to the implementation of the database

are used at the logical and view levels (roughly)

Data models of current or historical interest include:

the network and hierarchical data models (obsolete)

the relational model

the entity-relationship model

the object-oriented and object-relational models

the semi-structured data model (XML)

 

4.1 The Entity-Relationship and Object-Oriented Models

 

In the entity-relationship model:

data is viewed as sets of entities that represent things in the real world, and sets of relationships among entities

entities in the system are distinct and uniquely identifiable

example: customer entities can be uniquely identified by social security number

data constraints can be specified explicitly

The object-oriented model is similar, except that:

objects play the role of entities

relationships among objects are not emphasized

constructing new kinds of objects from old kinds (via inheritance) is emphasized

 

4.2 The Relational Model

In the relational model:

sets of entities and sets of relationships between entities are represented as tables

each table has multiple (named) columns

each data item is represented as a fixed format record stored in a row of such a table

For example, a table storing customer information could have columns for the customer name, address and

social security number. Each row in such a table is then a record representing one customer.

Databases are often designed using the entity-relationship model, and then the design is translated to

the relational model for implementation.

The object-relational model is a combination of the object and relational models. Objects of a class can

be stored in a table with one column for each field (data member) of the class.

 

 

5 Database Languages

 

5.1 Data Definition Language

A data definition language (DDL) is the language used to define and modify the logical schema of the database.

often used to define/modify subschemas (views) and to specify consistency constraints

the definition of the logical schema (written in the DDL) is compiled into a file or set of tables called the data dictionary

the data dictionary is consulted for schema information whenever data is read or modified

a separate language called the data storage and definition language is used to access the physical schema of the database

 

5.2 Data Manipulation Language

A data manipulation language (DML) is the language used at the logical and view levels to retrieve, insert, delete and modify information stored in the database. A query is a request for information retrieval.

For example (in English): list all customers who have an account balance over $500. A query language is the part of the DML used for retrieving information. The terms query language and data manipulation language are often used interchangeably. Example: SQL is a DML (and also a DDL), but is usually called a query language.

DMLs are classified by the kind of query language they include:

procedural DMLs require the user to specify what data is needed and to provide an algorithm for getting the data. The algorithm is written in the query language.

non-procedural DMLs require only that the user specify what data is needed.

DDLs manipulate schemas, while DMLs manipulate instances

 

6 Data Storage and Querying

The implementation of a DBMS is typically divided into two functional components: the storage manager and the query processor.

 

6.1 The Storage Manager

The storage manager is the component that provides the interface between the physical level and the logical level.

Components of the storage manager:

the file manager

the buffer manager, which attempts to minimize file reads and writes. Note that disk access is much slower than main memory access.

the authorization and integrity manager, which:

checks that consistency constraints are satisfied, and takes some action when they aren’t

prevents unauthorized access to data, typically through a password and security classification system

the transaction manager, which:

logs database updates during normal operation

detects when information in the database or data dictionary is lost or corrupted due to disk crash, power failure, software errors ...

restores the database to a previous consistent state using the log

The database manager for a small system typically does not implement all of these functions.

 

6.2 The Query Processor

Components of the query processor:

the DDL interpreter, which handles DDL statements and records the effect in the data dictionary

the DML compiler, which translates DML statements into low-level commands to the query evaluation engine. The DML compiler also does query optimization - evaluating alternative query evaluation plans, and choosing the plan with the lowest cost (typically in terms of disk accesses).

the query evaluation engine, which executes the query evaluation plan produced by the DML

compiler.

 

7 Transaction Management

A transaction is a sequence of DML commands that forms a logical unit of work. Example: transferring money from one bank account to another.

Critical properties of transactions:

atomicity - a transaction must execute completely or not at all (in terms of the fifinal effffect on the database state).

consistency - once a transaction completes successfully, the database must be in a consistent state.

The database may be in an inconsistent state while a transaction is executing. For example, during a transaction that transfers money from account A to account B, the total funds held by the bank may be inconsistent (after account A has been debited, but before account B has been credited). Note that such consistency is largely the responsibility of the application programmer.

isolation - a transaction must not be affected by other transactions that are executing concurrently.

durability - once a transaction completes successfully, its effect must persist even in the presence of system failures.

These properties are known as the ACID properties.

 

 

 

8 Data Mining

Data mining is the process of analyzing large volumes of data to fifind useful patterns (for example, that young males who purchase video game systems are also likely to purchase HD televisions). Roughly speaking, this process has three steps:

 

1. preprocessing data (often from multiple databases) to put it into the format expected by the data mining algorithm

2. running the data mining algorithm, which typically uses statistical techniques to find patterns in the data

3. manually inspecting the patterns found to determine which are useful

The SQL:1999 standard specifies constructs to support data mining, and many commercial tools for data analysis and visualization are available.

 

9 Database Architecture

Modern database systems typically employ a client-server architecture:

the database system itself runs on a server machine

users interact with client machines, which connect to the server via a network

Application programs typically employ either a two-tier or a three-tier architecture:

two-tier architecture

the application program resides partially or entirely on the client machine

the program communicates with the server using standards such as ODBC and JDBC

three-tier architecture

the application program communicates with an application server (Oracle WebLogic, IBM Websphere, JBoss), but contains no direct database calls

the application server communicates with the database server

As the application server contains reusable business logic and Web interface components, this is an effective strategy for developing large applications.

 

10 Database Users and Administrators

10.1 Database Users

application programmers

application programmers write application programs in some programming language (called the host language) which permits the use of DML commands

application programs are used to ease common or complicated interaction with the DBMS

examples: generating paychecks, doing account transfers, ...

techniques for executing DML commands from within host languages:

*using an application programming interface (API) that provides procedures/methods for sending DML commands to the database. Examples:

· Open Database Connectivity (ODBC) - an API defined by Microsoft for the C programming language

· Java Database Connectivity (JDBC)

extending the host language syntax to embed DML calls within the program.

· special syntax is used to separate DML commands from host language commands

· a DML precompiler handles embedded DML commands before the program is compiled

specialized languages that combine imperative control structures (loops, if statements, procedures) with DML statements

· often called rapid application development (RAD) tools or fourth-generation languages

· usually provided with the DBMS

· usually allow rapid GUI development with forms and reports

· examples: Visual Basic for Applications (from Microsoft) and PL/SQL (from Oracle)

sophisticated users: interact with the DBMS using the DML directly.

specialized users: interact with the DBMS in specialized ways:

CAD

expert systems

graphical or audio data

temporal data

...

naive users: interact with the database through application programs (including Web interfaces).

 

10.2 Database Administrator

The database administrator (DBA) is the person(s) in control of the DBMS. One goal of DBMSs is to permit centralized control.

Typical responsibilities of the database administrator:

schema definition

storage structure and access method definition

schema and physical organization modification

granting authorization for data access

integrity constraint specification

maintenance and backups

 

 

 

Comments

Popular posts from this blog

Osun State University in Collaboration with British Computer Society

Database PowerPoint Presentation Lectures by Dr Patrick Ozoh Dept of ICT Osun State University

Osun State University in Collaboration with British Computer Society