Paths Are for People

A path is the structural relationship between an IMAGE/SQL master dataset and an IMAGE/SQL detail dataset linked through a common search item. A chain is a collection of detail entries that have the same search value; the head of the chain is in the master entry for that search value.

Every path you add to a TurboIMAGE database adds overhead, slows response, and makes your database more prone to broken chains. It seldom makes sense to add an inquiry path strictly for a batch program. Paths should be reserved for on-line users.

To decide how many inquiry paths a dataset should have, you need to know its volatility: how often do people ask questions about the data (read access) versus how often do they modify the same data (write access)? If the read-write ratio is low, you should not have many search paths, because the overhead of updating them during the writes will overwhelm the benefit during the reads. However, if the read-write ratio is high, you can afford many more search keys. For example, a program that takes a second or more to write a record because it has dozens of keys per record can make perfect sense. Suppose you write records only 500 times a day: at about one second per write, the total CPU time will be less than 10 minutes each day. If the system is also called upon to search for over a million records a day, the heavy use of keys will make the overall application work better. The only way to be confident that you have properly indexed your data is to measure, measure, measure.
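The arithmetic behind the 500-writes example can be checked with a short sketch. The function name and the assumption that each write costs exactly one second, all of it CPU time, are illustrative only; real figures should come from measuring your own system.

```python
def daily_write_minutes(writes_per_day: int, seconds_per_write: float) -> float:
    """Total time spent on writes per day, in minutes.

    Assumption: every write costs the same amount of time.
    """
    return writes_per_day * seconds_per_write / 60.0


# 500 writes a day at an assumed one second each.
minutes = daily_write_minutes(500, 1.0)
print(f"{minutes:.1f} minutes per day")  # 8.3 minutes per day
```

Even if each write took somewhat longer than a second, the daily total would stay small next to the time that a million keyed reads save.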

The purpose of a path is to divide a dataset into small, manageable subsets of entries that can be retrieved, displayed, and updated easily. A path should divide a dataset into many short chains, rather than a few long chains. Paths are never absolutely necessary. You can accomplish the same inquiry by a serial scan of the entire dataset. The only reason for a path is to make the retrieval faster than a serial scan. However, if the number of unique search values is less than the number of entries per disc block, a serial scan will be faster than a chained retrieval. That is because a chained read takes one disc read per entry, while a serial read takes only 1/N disc reads per entry (where N is the blocking factor). With fewer than N unique search values, the average chain holds more than 1/N of the dataset, so following one chain costs more disc reads than scanning every block once.
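The break-even point between a chained retrieval and a serial scan can be sketched as follows. This is a simplified model, not a measurement: it assumes entries are spread evenly across the search values, that every chained entry costs one disc read, and it ignores the master read, caching, and empty slots in the dataset. The function names and the sample figures are hypothetical.

```python
import math


def chained_disc_reads(total_entries: int, unique_values: int) -> float:
    """Disc reads to follow one average chain, at one read per entry.

    Assumption: entries are spread evenly across the search values.
    """
    return total_entries / unique_values


def serial_disc_reads(total_entries: int, blocking_factor: int) -> int:
    """Disc reads to scan the whole dataset, one read per block."""
    return math.ceil(total_entries / blocking_factor)


entries = 100_000
blocking_factor = 20  # N: entries per disc block (assumed figure)

for unique_values in (10, 20, 1_000):
    chained = chained_disc_reads(entries, unique_values)
    serial = serial_disc_reads(entries, blocking_factor)
    winner = "serial scan" if serial < chained else "chained read"
    print(f"{unique_values:>5} values: chained={chained:>8.0f} "
          f"serial={serial:>6} -> {winner}")
```

Under these assumptions, 10 search values give 10,000 chained reads against 5,000 serial reads, so the scan wins; with 1,000 search values the chain needs only 100 reads and the path pays for itself.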