Simon Willison’s Weblog

Managing data

These notes cover chapter one of “Data Management” by Richard T. Watson.

Organisations generate, store and process large amounts of data. Data management is the management of this organisational memory.

Individuals use data management techniques as well. To-do lists, calendars and address books are all examples of external memories (as opposed to the brain, which is internal memory). These three examples share some common characteristics: they provide a standard format for storing specific kinds of information, and they are organised in a way that accommodates rapid retrieval. They also require a trade-off between speed and size: a pocket calendar cannot contain as much detailed information as a desk calendar but is far more convenient.

Organisations use all kinds of methods for storing data—filing cabinets, notice boards, computer systems and people are all frequently used. Storage devices are organised for rapid data entry and retrieval. Various types of information systems can be used:

Transaction processing system
TPS: collects and stores data from routine transactions
Management information system
MIS: converts data from a TPS into information for managing an organisation
Decision support system
DSS: supports managerial decision making by providing models for processing and analysing data
Executive information system
EIS: provides information helpful to senior management for strategic decision making and organisation performance monitoring
On-line analytical processing
OLAP: presents multidimensional logical views of data
Data mining
Uses statistical analysis to uncover hidden trends and relationships in data
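
As a rough illustration of what a "multidimensional logical view" might look like, here is a minimal Python sketch. The transaction records, the field names (region, quarter, amount) and the roll_up helper are all invented for this example and do not come from the book. It sums one measure across whichever dimensions are requested.

```python
from collections import defaultdict

# Hypothetical transaction records, of the kind a TPS might collect.
transactions = [
    {"region": "North", "quarter": "Q1", "amount": 120},
    {"region": "North", "quarter": "Q2", "amount": 80},
    {"region": "South", "quarter": "Q1", "amount": 200},
    {"region": "South", "quarter": "Q1", "amount": 50},
]

def roll_up(rows, dimensions, measure="amount"):
    """Sum `measure` for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        # The key is one value per requested dimension, e.g. ("North", "Q1").
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# Two views of the same data: by region and quarter, then by region alone.
by_region_quarter = roll_up(transactions, ["region", "quarter"])
by_region = roll_up(transactions, ["region"])
print(by_region_quarter)
print(by_region)
```

The same records answer different questions depending on which dimensions are kept, which is the general idea behind an OLAP view. A real OLAP product would do far more than this.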

Desirable attributes of data stored by an organisation include:

Shareable
The data can be accessed by more than one person at a time. Low-volatility data often has many copies printed and distributed, which can cause problems if changes are made.
Transportable
The data can easily be moved to the people who need to access it
Secure
Data is a valuable resource and must be kept safe. Backups should also be made of valuable data.
Accurate
The information must be reliable and precise
Timely
The data must be current and up-to-date, especially if it is time sensitive
Relevant
The data must be appropriate to the decisions being made by the organisation

An organisation’s memory exists in many different places, including less obvious areas such as the organisation’s roles and culture. Successfully designing and applying a data management solution is therefore a highly complex, “wicked” problem.

People

People are an essential part of an organisation’s memory. The organisational culture is the collection of beliefs, values and attitudes that affect the behaviour of organisation members. People establish social networks of contacts, remembering who they can go to with specific problems or to access specific information. These networks are rarely documented. Knowledge of how to use organisational memory (who to ask about what, where to look things up) is called metamemory, but is often referred to as learning the ropes. Standard operating procedures are also an important part of organisational memory as they help organisations remember how to perform routine activities.

Documents

Documents are a common medium for storing organisational data, and can take many different forms such as memos, manuals and reports. Hypertext allows the creation of non-linear documents, which help readers find the information they are looking for more rapidly but take longer to prepare.

Models

Organisations sometimes build mathematical models to describe their business, which can be placed in the broad category of DSS and used to aid the decision making process.

Organisations often try to capture the knowledge of their employees in the form of rules and semantic nets stored in a knowledge base, which is another type of organisational memory.

External memories

There are companies that provide access to information as a service to other organisations. This information adds to the external memory of a company.

Problems with Data Management systems

Redundancy
The same data can be stored in several different systems around an organisation. A classic example of this is a customer’s address, which may end up stored in various customer related databases.
Lack of data control
Poor management of data, usually because different data is managed by different departments with little consistency across a large organisation.
Poor interface
Poor interfaces to data management systems can make the data virtually useless to the people who need to access it.
Delays
Decision makers need fast access to data. Poorly designed data management systems can lead to delays.
Lack of reality
The data stored by an organisation must be relevant to the real world. Often systems will not store or provide access to the information needed by decision makers.
Lack of data integration
Data is frequently spread across many systems in an organisation. Integrating it is a long and complex process, best handled in an incremental manner as new systems are developed.
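
The redundancy problem described above can be sketched in a few lines of Python. The two "databases", the customer ids and the find_conflicts helper are hypothetical, made up purely to show how copies of the same address can drift apart once they live in separate systems.

```python
# Hypothetical copies of customer addresses held by two departments.
sales_db = {"C001": "12 High Street", "C002": "4 Mill Lane"}
billing_db = {"C001": "12 High Street", "C002": "9 Park Road"}

def find_conflicts(system_a, system_b):
    """Return the customer ids whose address differs between the two systems."""
    shared_ids = system_a.keys() & system_b.keys()
    return sorted(cid for cid in shared_ids if system_a[cid] != system_b[cid])

conflicts = find_conflicts(sales_db, billing_db)
print(conflicts)
```

Here customer C002 has moved and only one system was updated, so neither department can be sure which address is correct. Keeping a single shared copy avoids the question entirely.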

A brief history of Data Management

  • 1950s—file systems
  • 1960s—Hierarchical DBMS
  • 1970s—Network DBMS, followed by Relational DBMS
  • 1990s—Object-oriented DBMS

Data, information and knowledge

Data
Raw, unsummarised and unanalysed facts. Data is of very little use to decision makers as it contains far too much detail. Before it can be used it must be converted into information.
Information
Data that has been processed into a meaningful form.
Knowledge
The capacity to use information—requires education and experience.
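
As a small, hedged illustration of converting data into information, the Python sketch below reduces a list of raw figures to a short summary. The sales values and the summarise function are assumptions invented for the example, not anything taken from the book.

```python
# Hypothetical raw data: the value of each individual sale, in whole pounds.
daily_sales = [20, 5, 40, 5, 10]

def summarise(values):
    """Reduce raw values to a few figures a decision maker can read at a glance."""
    return {
        "count": len(values),
        "total": sum(values),
        "average": sum(values) / len(values),
        "largest": max(values),
    }

information = summarise(daily_sales)
print(information)
```

Deciding what to do about those figures (whether an average sale of that size is good or bad, for instance) is where knowledge comes in, and no amount of summarising supplies it.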

One person’s information can be another person’s data, depending on the level of detail required. We finish with some atrocious ASCII-art.

             |-----------|\
             | Knowledge | \ Interpretation
 |------|    |-----------|  \
 | Data |\        |          \||----------|
 |------| \       | Request    | Decision |
           \      |          _ |----------|
Conversion  \|   \|/         /|
           |-------------|  /
           | Information | / Interpretation
           |-------------|/

This is Managing data by Simon Willison, posted on 30th September 2002.

Previously hosted at http://simon.incutio.com/archive/2002/09/30/managingData