Managing Data Resources

Managing Data Resources

Managing Data Resources 1 The Name of the Game Information is a valuable resource. It is expensive to collect, maintain, and use. The goal of database management it to maximize the benefits gained from information maximize the accuracy of information minimize the costs associated with information 2

Keeping Track of Things Entity - person, place, thing or event on which we maintain information. Attribute - A single piece of information describing a particular entity. 3 Data Hierarchy

Database - a collection of related files File - a collection of uniform records Record - a collection of related fields Field - a collection of bytes Byte (& words) Bit 4 Terminology Generic Database Spreadsheet ----- TableTable/Sheet

Entity Record Row Attribute Field Column 5 Key Field(Attribute) A key field is an attribute that uniquely identifies a record in a file. Examples: SSN, NAID The values in the key field MUST be unique. It is possible to use several fields to form a

composite key. Example: Lastname + firstname + middlename 6 Natural Keys It is convenient and desirable to use attributes which naturally occur with an entity as a key. Example - most students have a SSN by the time they enroll at NDSU, so the SSN would be natural key. 7 Accessing Information

Lookup items(records) by the value of their key. Methods of access: Sequential Access Direct Access Indexed Sequential Access 8 Ordered vs. Unordered A database file (collection of records) may be: ordered - physically arranged in the file so that the key field increases (or decreases) in a sequential fashion.

unordered - physically arranged in the file so the key field has no ordered relation with the preceding or succeeding key. 9 Costs & Benefits of Ordering In general a record can be found faster in an ordered list than in an unordered list. Ill use the term file & list interchangeably. In general you can turn an unordered list into an ordered list by sorting. Sorting is a cost of keeping a list ordered. In this course we will generally be dealing with ordered lists.

10 Sequential Access Look at key of first record in file, if not the target then look at next record, if not the target then look at next record, If file has N records on average will have to look at N/2 records to find a random target. Question - Why not just skip over some of the records?

11 Sequential Access An employee database might use SSN as the key field. If the target SSN is 540-12-3763, and the first record SSN is 120-11-0007, then how many records should you skip? This is why sequential access has to look at every record. 12 Sequential Access Historically data was stored on tapes. Tapes store information sequentially and

only allow for sequential access. DASD (disks drives) can also store files sequentially. Files are written to the disk track-by-track, cylinder-by-cylinder in a physically contiguous fashion. 13 Direct Access Direct access means that given a value for the key attribute the system can move directly to the corresponding record without having to look at an intervening records in the file. Direct access requires that the system know the physical location of the target

record on the disk. 14 Hashing Algorithms To find the physical location on the disk a computation is performed on the key value which yields a unique physical address for the corresponding record. Perfect hashing algorithms get you to a unique address. Imperfect algorithms may hash several keys to the same address. 15 Hashing Example

Suppose that I were using SSN as the key and wanted to keep track of 100 entities. Select 101 (a prime number closest to the number of records) and divide this into the SSN. Remainder will always be a number between 0 and 100. 16 Hashing Example The remainder represents the disk address. A remainder of 52 could represent cylinder 5, surface 2 If two or more SSNs have the same

remainder (hash to the same address) this is called a collision. Essentially these records are then searched sequentially. 17 Direct Access Note The physical addresses in Direct Access have no relation to the sequential order of the keys. For any two adjacent sequential keys there is no guarantee about the relationship between their physical locations on the disk, they may not be physically contiguous. 18

Sequential vs. Direct Access Sequential Access good when you want to process all records in key order, next record is always ready to be read/written. Direct Access good when you want to process records in a random order, next record can be found directly. 19 Indexed Sequential Access Method (ISAM) Combines a sequential file with one or more

levels of indexes. Each index relates a physical location to the highest key value stored in that location. You find physical location by looking in each level of the index and then sequentially searching the last physical location. 20 ISAM In the library the books are laid out sequentially by call number (the key). Look at floor index to determine the floor Look at shelf index to determine the shelf Sequentially search the shelf

21 ISAM ISAM tries to give the best of both worlds. When you want to process items sequentially you have an underlying sequential file. When you want direct access you go through the indexes to get close, then a small sequential search at end. 22 Traditional File Systems Also called:

flat file organization data file approach Typically an organization or a department within an organization would develop their applications and associated data files in an independent fashion. 23 Problems with Traditional Files Data Redundancy conflicting data Program-Data Dependence lack of flexibility

Lack of Data Sharing no common names for attributes & entities Poor Security 24 DBMS Approach Database Management Systems approach places a common interface between the users of data (the application programs) and the data files. 25

DBMS Components Data Definition Language, DDL Data Manipulation Language, DML Structured Query Language, SQL Data Dictionary, DD 26 Logical & Physical Views Logical View how the user sees the data Physical View how the data is physically saved on the storage

media The DBMS gives each user their own logical view while storing the data using a single physical view. 27 Advantages of DBMS Complexity & Confusion reduced all data stored in single centralized physical view Data redundancy & inconsistency reduced data dictionary shows what data elements are available, data element only present once

Program-data dependence reduced each user can get desired logical view 28 Advantages of DBMS Security single point of access to data Reduced cost initial purchase cost of DBMS and related staff are high, but savings in future development and maintenance usually offset these costs Access & Flexibility DML usually provides easier access to data

29 Designing Databases Hierarchical Data Model Network Data Model Relational Data Model 30 Hierarchical Data Model Author 1 Book 1 Publisher A

Book 2 Publisher B Book 3 Publisher A 31 Hierarchical Data Model Data records are broken into segments Each segment contains some attributes Segments are arranged into a hierarchical tree-like structure Physical locations pointers join related segments into records

Child segments can only have one parent 32 Network Data Model Author 1 Book 1 Publisher A Book 2 Book 3 Publisher B

33 Network Data Model Same organization as hierarchical data model Except that a child segment can have multiple parents 34 Relational Data Model Author 1 Book 1

Author 2 Book 2 Author 3 Book 3 Book 4 Publisher 1 Book 5 Publisher 2

35 Relating Fields A1 Author 1 Book 1 A1 P1 A2 Author 2 Book 2 A3 P2

A3 Author 3 Book 3 A2 P2 Book 4 A1 P2 Book 5 A1 P1 P1 Publisher 1

P2 Publisher 2 36 Relating Fields A1 Author 1 Book 1 A1 P1 A2 Author 2 Book 2

A3 P2 A3 Author 3 Book 3 A2 P2 Book 4 A1 P2 Book 5 A1 P1

P1 Publisher 1 P2 Publisher 2 37 Relational Data Model Author-table ID Author Book-table A1 Author 1

Title AID A2 Author 2 Book 1 A1 P1 A3 Author 3 Book 2 A3 P2

Publisher-table Book 3 A2 P2 ID Publisher Book 4 A1 P2 P1 Publisher 1

Book 5 A1 P1 PID P2 Publisher 2 38 Relational Data Model Data Records are broken into segments Each segment contains some attributes Segments are arranged in tables There are NO physical location pointers between tables

Relations between tables are implied by relating fields 39 Relations Generated When Asked Relationships between segments are not predefined by pointers in the relational model. Tables are JOINed together to display relationships.

JOINs occur at query time. Tables must have a common data element to be joined. 40 Example JOIN Select Author, Title, Publisher FROM Author-table, Book-table, Publisher-table WHERE Author-table.ID = Book-table.AID, and Book-table.PID = Publisher-table.ID

41 Results of Join Answer-table Author Author 1 Author 1 Author 1 Author 2 Author 3 Title Book 1 Book 4 Book 5

Book 3 Book 4 Publisher Publisher 1 Publisher 2 Publisher 1 Publisher 2 Publisher 2 42 Relational Model Operations Selection select which rows to display

Projection select which columns to display Join combine two or more tables 43 Types of Relations 1-1 1-to-1 1-n 1-to-many

n-n many-to-many 44 Name of the game Using the relational model, Represent each type of relationship as simply as possible (using the fewest tables), with a minimum of duplicated data, and with a minimum of wasted space (empty fields) 45

Tables needed for 1-1 Book Author Author1 Author2 Author3 Title Book1 Book2 Book3 46 Tables needed for 1-n

Author ID 1 2 3 Name Author1 Author2 Author3 Book ID 1 1

2 3 2 Title Book1 Book2 Book3 Book4 Book5 47 Tables needed for n-n Writes

Author ID Name 1 Author1 2 Author2 3 Author3 AID 1 1 2 2 3 3 3 1

2 BID 1 2 1 2 1 5 4 5 5 Book ID

1 2 3 4 5 Title Book1 Book2 Book3 Book4 Book5 48

Advantages & Disadvantages Hierarchical & Network Data Models faster for pre-defined queries slower for ad-hoc queries inflexible, more expensive to maintain Relational Data Models flexible, less expensive to maintain most queries require joins and are slower than pre-defined queries mentioned above 49 Entity-relationship diagram A conceptual model useful in database design.

Illustrates the relationships between various entities in the database. Entities are represented by rectangles. Relationships represented by diamonds. Attributes can be assigned to both entities and relationships. 50 ER-Diagram Authors ID Last_Name First_Name Middle_Name DOB

DOD n write n Books n Title Date Edition

publish 1 Publishers Name Address Phone 51 Centralized Database All database files are stored on a central computer. All database processing is performed by the central computer. Problems can overload central system

not very fault tolerant communications costs can be high 52 Distributed Databases Distributed Processing processing is performed locally by processors connected by a communications network. Distributed Databases the physical files that make up the database are stored in more than one location 53

Distributed Databases Duplicate Database each location has its own copy of the entire database. Partitioned Database each location has a copy of the portion of the database that it needs. 54 Distributed Databases Central Index Records are stored locally, but a centralized index is maintained to quickly located any

record. Ask-the-network Records are stored locally and the network must be polled each time a record is needed. 55 Data Warehousing A database with associated reporting and query tools, that stores current and historical data extracted from various operational systems and consolidated for management reporting and analysis.

56 A Data Warehouse... Sits on top of existing isolated legacy systems, islands of information, to provide an enterprise-wide database. Provides single platform, standardized access to current operational data and historical data (not normally maintained on legacy systems). 57 Obstacles to Database Implementation

Organizational structural changes political changes Cost/benefit considerations Placement of Data Management Function need data administration and planning at highest possible organizational level 58

Recently Viewed Presentations

  • Prime Factorization - S S Sabol

    Prime Factorization - S S Sabol

    Factor. The numbers that are multiplied to get a product. 15 = 3 x 5. 3 and 5 are factors of 15. 2 x 9 = 18. 2 & 9 are factors of 18. 3 x 2 =6. 3 &...
  • Alkali Metals

    Alkali Metals

    alkali metal - An element that belongs to group 1 of the periodic table. hydroxide - The alkali produced by the reaction between an alkali metal and water. It is a compound ion with a charge of -1. metal chloride...
  • Background - Connecticut

    Background - Connecticut

    DPH envisions adding a section to the RCH license that contains special requirements specific to the HCBS settings requirements and has agreed to meet with representatives of the associations to discuss concerns. 8/6/2018. Department of Social Services
  • PHT313 Lecture 1 2nd Term 1436-1437

    PHT313 Lecture 1 2nd Term 1436-1437

    Enterotube ® II contains 12 different agars enabling the performance of a total of 15 biochemical tests as well as an enclosed inoculating wire. I- Escherichia coli. General characteristics: Gram-negative . Motile . rods . Non-spore forming,Facultative anaerobic, Oxidase -ve.
  • 2. Descriptive metadata Standards - The Library of Congress

    2. Descriptive metadata Standards - The Library of Congress

    MARC/AACR2 cataloging of Internet resources 2. Overview of basic concepts for 21st century for bibliographic control, including specific metadata standards and applications 3. Thesaurus design principles, building a controlled vocabulary, and examination of selected controlled vocabularies 4. Digital library design,...
  • Colonial America 1587 to 1770 - Leon County Schools

    Colonial America 1587 to 1770 - Leon County Schools

    South Returns to U.S. Congress. In 1865, many of the Southern states elected new government officials and sent them to Washington, D.C. They included many former Confederate leaders.
  • GOLD SPONSORS - Americollect

    GOLD SPONSORS - Americollect

    Provide the Final Notice with 30 days to pay before initiating ECAs. How will your organization notify your collection agency if you approved an application and ECAs have to be reversed and a notice was sent to the responsible individual...
  • Integrated Education and Training

    Integrated Education and Training

    Integrated Education and Training. Roundtable Discussion. NOTE: To change the image on this slide, select the picture and delete it. Then click the Pictures icon in the placeholder to insert your own image.