Chapter 5. All models are false. But some are still useful

A newspaper account of a murder omits many of the grisly details, but it is still informative.

A useful fact about models is that they are all wrong in a strict sense, if only because they are incomplete. If a model is examined closely enough, it will invariably be found to differ from what it represents. Thus, a newspaper article about a murder tells you the victim's name, age and sex. It fortunately omits the exact locations of the knife wounds and the total volume of blood spilt on the floor, and omits details deliberately kept secret to aid in identifying the murderer. The list of ketchup ingredients tells you that the ketchup contains more tomatoes than sugar, but it conveniently overlooks certain details, such as the miscellaneous insect parts and feces in the bottle.

Never be surprised to learn that a model of interest to you is incomplete, hence is "false." Furthermore, it often takes very little effort to determine how a particular model is false. The reasoning we applied to a newspaper article and the ketchup ingredients could easily be repeated for the models given in the last chapter. Even with no previous exposure to the scientific method, it is usually easy to identify several important ways that any given model differs from what it represents.

The fact that a model is wrong does not mean it is useless. We continually use false models, such as the newspaper article and the ketchup ingredients discussed above. Because no model is one hundred percent correct, refusing to use a model merely because it is false is tantamount to refusing to use any models at all. The objective in using the scientific method is to distinguish useful models from useless ones.

How models are useful

The main reason that all models are incomplete/false is that they are simplifications. The ways in which they are simplifications may not be essential for certain purposes (the simplifications may in fact make the model useful). The budget for a corporation, for example, only approximates the actual expenditures, income and profits. Clearly, no matter how many accountants a corporation hires, and no matter how carefully these accountants work, it is impossible to prepare a budget that is completely correct. To predict income exactly, one would have to know exactly how many gizmos the company will make and sell in the next year, and what price each one would fetch. This prediction depends on details of the economy, on how each prospective customer will behave in the coming year, and so forth. Because these facts cannot be known when the budget is prepared, it is inaccurate, or false. Their faults notwithstanding, budgets are also universally used. A budget allows a corporation to make decisions and to plan, and thus to achieve higher profits than would be possible without the budget.

Although every corporation acknowledges that a budget plan will not predict exactly the future financial exchanges (hence the budget is a false model), there are limits to what kinds of false statements a corporation will tolerate in its budget. The omission of some events, such as a minor unexpected price increase of raw material, may be of no consequence. But other false aspects create more serious problems, as with a serious underestimate of production costs, or a failure to pay employees enough to avoid a strike.

As a second example, consider a court trial over an automobile accident in which one car rear-ended another. The trial is a model of the accident itself. It is incomplete because eyewitnesses forget and make mistakes, because medical diagnoses of whiplash can be in error, and because a picture of an intersection will invariably differ from the intersection itself. These problems notwithstanding, the judge and jury, after listening to the evidence presented in a trial, will usually have a pretty good idea of what happened. Hence the trial is a useful model.

Models in science

The models that scientists use are no different from the models you use in everyday life. They are simultaneously false and useful. Learning even a small amount about scientific models can be quite useful in detecting major limitations of scientific approaches. This knowledge enables one to pose relevant questions to those who developed the model.

The USDA food pyramid mentioned previously is useful as a model of a health-effective diet even though it condenses thousands of scientific studies about diet and health into a single picture (it is a summary model). This reduction must have resulted in the loss of substantial information, considering all the words, data, nuances and caveats in the original papers. Despite this, the food pyramid is an effective tool for communicating the results of a wide range of scientific studies to large numbers of people with varying backgrounds and levels of scientific sophistication. In fact, it is much more effective at this task than are the original scientific papers.

Biologists use animal models in developing new medical treatments (a type of physical model). Pharmaceutical manufacturers test the safety of new drugs using rats and mice before giving the drugs to humans, and heart surgeons develop new surgical techniques on dogs before trying them on humans. These models, however useful in preventing humans from taking unsafe drugs or having untried surgery performed on them, are nonetheless imperfect. Animals do not respond to drugs in exactly the same way humans do (think of how cats and humans respond to catnip). And while it might be useful for a heart surgeon to practice on a dog, performing surgery on a dog is clearly not the same as performing surgery on a human.

Pieces and parts as models

It seems fairly straightforward to consider two Suburbans off the same assembly line as models of each other, or the renovated Ft. Davis Historic Site as a model of the cavalry fort of the 1800s. Many of us also accept without question that a picture of Abe Lincoln is a model of him. All of these models are obviously false: the picture of Abe is not the man; the modern Ft. Davis is not populated with cavalry soldiers, nor is it concerned with Indian raids. One Suburban is probably a good model of the other for many purposes, but there are countless differences between the two when it comes to how tightly bolts are fastened, which parts will fail first, and inherent weaknesses in the materials.

Consider now an airliner crash. TWA flight 800 exploded in mid-air just off Long Island only a few days before the Atlanta Olympics. Eyewitnesses reported seeing trails of orange behind the plane, suggestive of a missile. Early speculations focused on a bomb, both because airliners don't just blow up in mid-air by themselves, and because the Atlanta Olympics provided the kind of public focus that terrorists often target. Our airports switched to tightened security measures, and President Clinton and Congress responded by passing expensive legislation to increase airport security. In the long run, we don't know what caused the crash, but odds seem to favor an equipment malfunction rather than a bomb.

What, then, are models of the cause of this crash? We would certainly want to include reconstructions of this crash -- the assembled wreckage and any computer simulations of how the same kind of plane explodes from a bomb. We should also include as a model the deliberate detonation of another plane, which could be studied to understand how a plane breaks apart in mid-air (such a deliberate detonation has been contemplated). These are all models of the crash. But models exist at much smaller levels as well. A single piece of wreckage could give the valuable clue of bomb residue or metal twisted in a particular way, diagnostic of what went wrong. Eyewitness accounts of the crash, data recorders from the plane, knowledge of the sources of baggage put on the plane (were bags from other flights transferred to TWA 800?), and even the timing of the accident with the Olympic games are all pieces of information or pieces of physical evidence that could shed light on the cause of this crash. They are thus all models of the cause.

It may seem strange to call a single piece of wreckage or information about baggage a model, in that these particular models are clearly only portions of the entirety of the crash. Yet any model has countless differences from what it represents. A reconstruction of the whole plane from the recovered wreckage may seem more complete than all the pieces, and it may be more complete in many senses. Nonetheless, even the entire assembled wreck is a far cry from the actual accident -- it is not in the air, flying; there are no passengers, nor is there any way to retrieve the lost lives, and so on. So it is misleading to suppose that a single piece of wreckage or bit of information is too insignificant to be a model but that the sum of these insignificant parts is a legitimate model.

The important issue here is that some models are more USEFUL than others. One piece of wreckage may be more useful than 1000 other pieces in understanding whether a bomb went off. What makes a model useful is explained next.


ARC: Why we use particular models

The goal determines model usefulness. Models vary in their usefulness. Some are so different from what they represent that we just refuse to use them. Yet there is no such thing as an intrinsically good or bad model without considering its context. A model is judged against the goal, and a model may be good for some purposes and bad for others. Consequently, the standards we use to decide the utility of false models are extremely diverse. An algebra problem that your math instructor assigns is a model of the problems you will be asked to solve in some careers. Because your future boss will never ask you to work a problem exactly like those at the end of the chapters in your algebra book, each problem you work in class is a false model of this future need. The relevant question is not whether the model is false, but whether the false aspects of the model seriously degrade its usefulness. The answer to this question will vary depending on how you use the model. Thus an algebra course might be a usefully false model for accountants, business managers and engineers, but a hopelessly false model for artists.

More generally, the usefulness of a model depends on the problem to which it is applied (the goal of the work). Thus, any model may be useful for some purposes, and it will invariably be useless for other purposes. When considering the value of a model, it is therefore essential to know its application. In many cases, this point is obvious - a nuclear physicist would not be the least interested in using GM's annual budget model to predict the behavior of elementary particles. However, the match between model and goal applies on a much finer scale as well. For example, the file of previous exams owned by a fraternity may be very useful for some classes but not others.

The criteria for model acceptability can be classified in many ways. Here we will recognize 3 criteria: Accuracy, Repeatability, and Convenience, or ARC. Acceptance of a model depends on a combination of all 3 criteria, though there is no universal rule for assessing the relative benefits of one criterion versus the others. And we are not looking for one model that simultaneously best satisfies all three criteria. In fact, we often use several different models of any one thing to overcome the limitations of any single model -- there is NOT a most useful model for any particular goal.

Accuracy. This is the most obvious criterion to use in accepting or rejecting a model. After all, if we are trying to represent something, we hope that our model does a good job of actually representing what is intended. Accuracy is the measure of how well the results from the model will enable us to predict the real situation. This criterion is thus easy to grasp, and we will move on to the next criterion.

Repeatability. This criterion combines two numerical properties of the model: the number of times the model can be observed or manipulated, as well as the similarity (consistency) of results from one trial to the next. Mice offer a good model to study cancer-causing agents in humans because they possess repeatability -- thousands of mice can be tested, and the results from one batch of mice tend to reflect results from other batches (results are consistent). Likewise, methods for industrial testing of products are geared toward repeatability in both senses, because the test will be performed many times and the outcome needs to be consistent.

Certain types of models lack this property. Attempts to understand unique events after-the-fact are often based on models lacking repeatability. Wreckage of a plane crash provides various models of the crash -- pieces of the plane, for example -- but these models lack or are weak in repeatability. Eyewitness accounts also lack this property, and as such, are limited in an important respect.

Convenience. This third criterion covers time, cost, and ethics. In an ideal world, we might imagine that cost is no problem. Yet, budgets dictate that we make the best use of the money. And time constraints dictate that we get answers sooner rather than later. We use mice instead of monkeys for initial tests of foods and drugs because mice are more convenient than monkeys (in cost, time to results, and ethics). Virtually all models used to test products on a large scale are chosen with a heavy emphasis on convenience. Some such models seem ridiculous because so much accuracy has been sacrificed in favor of convenience (e.g., condom testing). But at some level, most models sacrifice accuracy to achieve convenience.
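
One way to make the trade-off among the three criteria concrete is a toy scoring exercise. The sketch below is purely illustrative: the 0-to-1 scores, the weights, and the names `ArcScores`, `usefulness`, and `best_model` are assumptions introduced here, not values from any study. A weighted sum is also only one of many possible ways to combine the criteria, since there is no universal rule for weighing one criterion against another. The sketch shows only that two models can swap places when the goal, expressed here as weights, changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArcScores:
    """Hypothetical 0-to-1 values for accuracy, repeatability, and convenience."""
    accuracy: float
    repeatability: float
    convenience: float


def usefulness(scores: ArcScores, weights: ArcScores) -> float:
    """Weighted sum of the three criteria; the weights stand in for the goal."""
    return (scores.accuracy * weights.accuracy
            + scores.repeatability * weights.repeatability
            + scores.convenience * weights.convenience)


# Made-up scores, chosen only to illustrate the trade-off described in the text.
models = {
    "mice": ArcScores(accuracy=0.5, repeatability=0.9, convenience=0.8),
    "monkeys": ArcScores(accuracy=0.8, repeatability=0.4, convenience=0.2),
}

# Two hypothetical goals: a cheap first screen versus a late-stage check.
goals = {
    "initial screening": ArcScores(accuracy=0.2, repeatability=0.3, convenience=0.5),
    "final confirmation": ArcScores(accuracy=0.9, repeatability=0.05, convenience=0.05),
}


def best_model(goal: str) -> str:
    """Return the name of the model with the highest score for the given goal."""
    weights = goals[goal]
    return max(models, key=lambda name: usefulness(models[name], weights))


for goal in goals:
    print(goal, "->", best_model(goal))
```

With these made-up numbers, mice come out ahead for the screening goal and monkeys for the confirmation goal. Neither model is better in itself; the ranking follows from the goal.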

An example. Consider a detailed example of the conflict between accuracy and convenience. If our goal is to understand cancer in humans, we might use genetic studies of humans, monkeys, rodents, yeast, and/or bacteria. How does each of these models rank on the scales of accuracy and convenience?

Model       Accuracy rank   Convenience rank
humans      1               5
monkeys     2               4
rodents     3               3
yeast       4               2
bacteria    5               1

There is an inverse relationship between accuracy and convenience. All of these model organisms are useful -- yeast and bacteria might be the most useful for some purposes because they are cheap, easy to manipulate, and do not raise ethical issues in experimentation (strong on convenience). The genetics of yeast is similar enough to that of humans (with respect to the control of cell division) that many breakthroughs in cancer research have come from studies of yeast. Obviously, model accuracy is greatest with humans, monkeys next, and so on. So we can have very useful (convenient) models that are much less accurate than other models.
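
The inverse relationship in the table above can be checked with a standard statistic, the Spearman rank correlation, which is +1 when two rankings agree perfectly and -1 when one is the exact reverse of the other. The sketch below applies the usual formula for untied ranks to the five rows of the table; the function name `spearman_rho` is introduced here for illustration.

```python
# Ranks copied from the table above: (accuracy rank, convenience rank).
ranks = {
    "humans": (1, 5),
    "monkeys": (2, 4),
    "rodents": (3, 3),
    "yeast": (4, 2),
    "bacteria": (5, 1),
}


def spearman_rho(pairs):
    """Spearman rank correlation for untied ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    pairs = list(pairs)
    n = len(pairs)
    sum_d_squared = sum((a - b) ** 2 for a, b in pairs)
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


rho = spearman_rho(ranks.values())
print(rho)  # -1.0, a perfect inverse relationship between the two rankings
```

The value of -1 reflects how the ranks in this table were assigned. Real measurements of accuracy and convenience would rarely line up this neatly.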

Model incompleteness is the basis for improvement

Most models deemed useful at some point in history have only a temporary life, i.e., they are replaced by better ones after a while (returning to the point that science does not prove models to be true). For example, today's technology enables companies to assess financial status faster and more accurately than in the past, so that their budgets incorporate different components now than 20 years ago. In science, most successful models are improved through time as well. The better models simply address and overcome some of the falsity or incompleteness in their predecessors.

One can often anticipate how a model may eventually be improved merely by contemplating how it is incomplete. The goal is not to eliminate all incompleteness in a model, but rather to correct the more serious limitations. It is often possible to anticipate how a model may ultimately be rendered obsolete merely by thinking about its limitations. In teaching you to think about models, we will therefore emphasize their limitations. Table 4.1 illustrates how some common models can be trivially wrong versus seriously wrong.

How some models can be false

Model: Medical diagnosis
Minor flaws: Although no two patients are identical, most individual details can be neglected
Major flaws: Incorrect diagnosis results in patient death or malpractice

Model: Credit rating
Minor flaws: Small details of personal finances are omitted
Major flaws: Omission of major debts or credits that have a big impact on personal finances

Model: Test for heroin
Minor flaws: The test assays various chemicals other than heroin itself, but these compounds are minor constituents in the body
Major flaws: Eating poppy seeds before the test gives the impression that you have illegally taken drugs

Model: Income tax return
Minor flaws: Small transactions are overlooked
Major flaws: Omission of large sources of income which can result in a financial penalty or incarceration

Model: College exam score
Minor flaws: A lucky guess results in a few points on a subject that the student was not prepared for
Major flaws: Exam score is totaled incorrectly

Model: New car
Minor flaws: Slight differences exist between different cars of the same model
Major flaws: The car you purchase is a lemon

Model: Space shuttle flight
Minor flaws: Each flight faces different problems, which are usually fixable
Major flaws: The shuttle explodes

An example: Rodent models of cancer

Drugs and food additives for human consumption need to be tested for their possible abilities to cause cancer. One of the important tests involves feeding the substance to rodents (rats or mice) and assessing cancer rates.

Consider how to use an animal model efficiently and ethically when testing whether a food additive causes cancer. There are several considerations in setting up a study:

i) What organism to use -- humans are the most accurate, but experimentation is costly and (depending on the experiment) possibly unethical. Bacteria and yeast are inexpensive and free of ethical considerations, but they are single-celled and cannot become cancerous. Rodents offer a good compromise between humans and simpler organisms.
Given this choice of a rodent model, other issues arise:
ii) how many rats -- fewer is cheaper and more ethical from the perspective of animal welfare, but an inadequate sample size may fail to detect an effect and thus expose children to harm
iii) how long to monitor them -- your cost increases with the duration of the study
iv) the dose -- higher doses make it easier to detect small cancer effects and hasten the development of cancers
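
The pull between sample size and dose in items ii) and iv) can be illustrated with a little probability. The sketch below assumes, as a simplification, that each animal independently develops a tumor with the same probability; the chance that at least one of n animals shows a tumor is then 1 - (1 - p)^n. The tumor rates and the function name `chance_of_seeing_any_tumor` are hypothetical, chosen only for illustration. A real study would also compare against a control group and background cancer rates, which this sketch ignores.

```python
def chance_of_seeing_any_tumor(tumor_rate: float, n_animals: int) -> float:
    """Probability that at least one of n independent animals develops a tumor."""
    return 1.0 - (1.0 - tumor_rate) ** n_animals


# Hypothetical per-animal tumor rates: a weak effect at a low dose and a
# stronger effect at a high dose. These numbers are made up for illustration.
low_dose_rate = 0.01
high_dose_rate = 0.10

for n in (10, 50, 200):
    low = chance_of_seeing_any_tumor(low_dose_rate, n)
    high = chance_of_seeing_any_tumor(high_dose_rate, n)
    print(f"{n:4d} animals: low dose {low:.2f}, high dose {high:.2f}")
```

Under these assumptions, a small group at a low dose will usually show no tumors at all even when the effect is real. That is why a study must either use many animals or raise the dose.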

The compromise achieved amid these conflicting issues is to feed rodents large amounts of the substance -- enough that a few of the animals actually die from it.

Bruce Ames has pointed out a serious limitation of this model -- the cancer-causing potential of the substance in humans may be overestimated by the rates in mice. High doses typically kill lots of cells in the mouse, even if the mouse survives. The mouse responds by replacing those cells -- by dividing at higher than normal rates. These increased cell division rates, all by themselves, can lead to increases in cancer. (Cancer is an abnormally high cell division rate, and cells become predisposed to become cancerous when they divide.) Feeding mice so much of an otherwise harmless substance that their bodies must produce lots of new cells may give the mice an elevated cancer rate; people would not eat high enough doses to affect cancer rates.

It is not clear what to do about this limitation of the rodent model of cancer. We need some way of testing food additives, drugs, and other compounds before we give them to humans. The approach used is to employ a hierarchy of models -- start with tests on bacteria to determine whether a substance causes mutations, then try rodents, then try other mammals, and eventually humans. We are not likely to abandon this model unless a superior one comes along, but we can and do supplement the limitations of this model with additional ones (that have other shortcomings).

A template for models

To facilitate use of the information in the last two chapters, we offer a template for models. This template is intended to help you think about the different aspects of models whenever you read (or think) about uses of the scientific method. Any news article describing medical research or a news article on studies into business practices can be analyzed from this perspective. In the next few chapters, we will describe some biological examples primarily from the perspective of models, and this template will be used in summarizing those presentations.

Model Template

MODEL

KIND (Abstract, Physical, or Sampling)

USE

STATUS (Accepted, Rejected, or Uncertain)

LIMITATIONS

The kind of model is either abstract, sampling, or physical; if a model does not fall into any of these 3 classes, we will not worry about the class (there are too many ways to classify models for us to bother classifying all of them). The use (application) is the problem for which the model is used. For example, a business (financial) plan is applied to managing company money, a monkey might be used as a model of humans in understanding AIDS, and so forth. The status of a model indicates whether it is currently regarded as useful (accepted), rejected, or is in dispute (uncertain). For example, the model that X-rays directly cause bacterial mutation would be accepted for the set of experiments in which irradiation of bacteria leads to mutation but would be rejected for the experiments in which irradiation of Petri dishes alone leads to mutation.

The last item, limitations, is of interest only for models whose status is accepted or uncertain. It is important to realize that all models have limitations and that these limitations may ultimately lead to the model's rejection when we have better data. By highlighting limitations of currently-accepted models, we should be constantly alert to possible revisions of the model that may be even more useful.
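
For readers who like to keep such notes in a structured form, the template can be sketched as a small record type. This is only one possible layout: the names `ModelRecord`, `Kind`, `Status`, and `limitations_of_interest` are introduced here for illustration. In the example entry, the model and use come from the text, and the kind follows the earlier description of animal models as physical models. The accepted status is assumed for the sake of the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    ABSTRACT = "abstract"
    PHYSICAL = "physical"
    SAMPLING = "sampling"


class Status(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    UNCERTAIN = "uncertain"


@dataclass
class ModelRecord:
    """One filled-in copy of the model template."""
    model: str
    use: str
    status: Status
    kind: Optional[Kind] = None  # None when the model fits none of the 3 classes
    limitations: List[str] = field(default_factory=list)

    def limitations_of_interest(self) -> List[str]:
        """Limitations matter only while the model is accepted or uncertain."""
        if self.status is Status.REJECTED:
            return []
        return self.limitations


# Example entry; the accepted status is an assumption made for illustration.
monkey = ModelRecord(
    model="monkey",
    use="understanding AIDS in humans",
    status=Status.ACCEPTED,
    kind=Kind.PHYSICAL,
    limitations=["animals do not respond to drugs exactly as humans do"],
)
print(monkey.limitations_of_interest())
```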

All models must be refutable

One of the most widely publicized features of the scientific method is that scientific models must be refutable or falsifiable. By this it is meant that observations can be imagined that would cause the model to be rejected. The criterion of falsifiability differs from our claim that all models are false. Typical examples of unfalsifiable models are of the form that various demons or spirits control all events in the world, or that some person has mystical properties. These models are not falsifiable, because it is not possible to even imagine data which would call for their rejection.

How does falsifiability fit into our framework? The goal of any application of the scientific method invariably involves predicting the unknown (explaining future results) or manipulating the future (increasing profits). But because an unfalsifiable model admits all possible outcomes (since nothing is inconsistent with it), an unfalsifiable model cannot be improved upon. Consequently, unfalsifiable models are useless.

Unfalsifiable models are common outside of science. Some market analysts have a flair for "explaining" the ups and downs of the market after-the-fact. Regardless of the market changes, these nightly reports profess to account for all the ups and downs, and there is no pattern that can't be explained. Of course, as long as these reports don't attempt to forecast the directions in the future, they can never be challenged (of course, some analysts do predict future trends). Prophecies also tend to lack falsifiability. These statements are often couched in extremely vague terms, and only in retrospect are people able to "interpret" them to make sense. To be falsifiable, a prophecy needs to be specific enough that we know in advance what to expect (e.g., the world will collide with a large comet on the morning of your 3rd exam).
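
The difference between a forecast and an after-the-fact explanation can be sketched in a toy form. Below, a model is represented simply as the set of outcomes it permits, and it counts as falsifiable if at least one imaginable outcome falls outside that set. The three-outcome world and the names `IMAGINABLE_OUTCOMES`, `is_falsifiable`, and `is_refuted_by` are assumptions made for illustration, not a formal definition.

```python
# Every outcome we can imagine observing, in a deliberately tiny toy world.
IMAGINABLE_OUTCOMES = frozenset({"market rises", "market falls", "market is flat"})


def is_falsifiable(permitted_outcomes: frozenset) -> bool:
    """A model is falsifiable if some imaginable outcome would contradict it."""
    return bool(IMAGINABLE_OUTCOMES - permitted_outcomes)


def is_refuted_by(permitted_outcomes: frozenset, observed: str) -> bool:
    """An observation refutes a model when the model did not permit it."""
    return observed not in permitted_outcomes


forecast = frozenset({"market rises"})       # commits to one outcome in advance
after_the_fact_story = IMAGINABLE_OUTCOMES   # "explains" whatever happens

print(is_falsifiable(forecast))              # True
print(is_falsifiable(after_the_fact_story))  # False
```

The forecast can be refuted by a falling market and so can be tested and improved. The after-the-fact story permits every outcome, so no observation can ever count against it.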