Observations are made of the workings of the universe. This sounds big and grand, but an observation can be something as simple as, "My tummy hurts." Observations represent the taking in of information from the world. Observations may be direct, taken in by a person's own senses; you probably trust observations you have personally made with your own senses. However, there are a couple of ways that observations can be indirect. One type of indirect observation involves the use of technology. Machines allow you to see things you can't otherwise see (like with microscopes or telescopes), hear things outside the human hearing range, or even detect things our senses aren't built for (like magnetic fields). We trust the machines to give us accurate information. The other type of indirect observation involves using other people's observations, when they tell or write about them. We can develop ideas based on what someone else has seen, or heard, or detected with a machine.
|
Sometimes even our senses can be fooled: optical illusions.
Artistic images taken with microscopes.
The unreliability of eyewitnesses.
|
From those observations, explanations are formed. Such an explanation is a hypothesis. If you see something happen and think you know why it happened, you have formed a hypothesis. For science, though, a good hypothesis needs to have two critical features:
A good hypothesis should lead to good predictions. "If this hypothesis is true, then this should happen..." It isn't enough to ask, "What'll happen when we do this...?" You need to produce predictions for the second critical feature -
A good hypothesis should be testable. This is part of the basic concept of science: it's all about the testing of ideas. It may look like science, it might sound like science, but if it isn't testable, it isn't really science. Often an idea is too big or complex to test, and must be split into testable bits.
|
The myths on Mythbusters are hypotheses; the tests are sort of scientific.
|
Tests of hypotheses follow particular forms. The "meat" of science is designing the tests for hypotheses. Designing a good test takes a lot of skill and imagination, and carrying one out often involves making adjustments as things veer away from the plan. Tests may take the form of controlled experiments, which usually take place in laboratories, or field tests, which happen out in the world and are trickier to design. It is common to use models as substitutes for subjects that can't really be tested in a lab: mice may be used to see what a new drug's toxicity levels are, or computer simulations are used for weather and climate systems.
Tests should be focused. A test should address a particular aspect of a question, and aspects within the question must be clearly defined. "Is being friendly to a stranger likely to get them to help you?" is an interesting question, but to test it, you need some particular behavior that will be your "friendly" term, and a clear idea of what form of "help" you'll be looking for. Even the "stranger" part of the test is open to definition: how different from your tester do you want the strangers to be? Sometimes two tests that seem to have conflicting results actually had very different definitions for what looked like the same factor.
Tests require a comparison test. The test above needs a comparison, maybe two - will you get help from a stranger if you are not friendly, or are even unfriendly? If you do, then the help doesn't seem related to your friendliness at all. You could only know that with comparison tests. The classic comparison in experiments is called a control, and the classic control test duplicates the experimental test with the factor being tested removed. The factor being tested is the experimental variable (or one of them, but the only one we'll be discussing here). Many experiments can't follow this classic pattern, since removing the tested factor by itself may not be possible, and many control tests just vary the variable, or check the impact of confounding factors (defined below). In field tests, running real controls or even good comparisons can be impractical or impossible; this makes the conclusions from such tests less reliable.
Tests should address recognizable confounding factors. There are almost always aspects of a test that might affect your results but aren't what you're testing - those are confounding factors. Many confounding factors are part of the experimental procedure, and their effects on results are called artifacts. For instance, testing a new drug requires two test groups - both get "treated," but the controls don't get the drug in the pill or shot. They must get the treatment, though, to control for the placebo effect: just the act of treating people will improve the conditions of some members of the test group, enough to show up in the results. If both groups get treated, it's assumed that the placebo effect is equal in the two groups. As drug tests have developed over the last century, part of the control design involved a single blind, where the case patients and control patients were not told which group they were in. This made sense, since knowing whether your treatment was "real" or not would affect the placebo effect. Then researchers found that if the administering doctors know who is in which group, they can subtly give it away to the patients, and tests became double blind, where the doctors also don't know who is in which group (the treatments are randomly split up before the doctors get them). All sorts of things can be confounding factors, and sometimes they aren't recognized until the tests are under way. A common confounding factor is investigator bias: researchers see what they expect to see. A philosophical concept called postmodernism addresses how a person's own internal influences, from personality, upbringing, and culture, can strongly affect the way they see the world; this can also affect how researchers form their hypotheses, design their experiments, and see their own results. There are often ethical limitations on what may be done, which is a type of postmodern bias.
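The random, coded group assignment behind a double-blind design can be sketched in a few lines. This is a hypothetical illustration - the subject names, kit codes, and half-and-half split are assumptions for the sketch, not part of any real trial protocol:

```python
import random

def double_blind_assign(subjects, seed=None):
    """Randomly split subjects into drug and placebo groups.

    Returns coded kit IDs for each subject plus a separate
    unblinding key; neither patients nor administering doctors
    see the key, so neither knows who is in which group.
    """
    rng = random.Random(seed)
    shuffled = subjects[:]
    rng.shuffle(shuffled)            # random assignment, not doctor's choice
    half = len(shuffled) // 2
    key = {}                         # code -> group; sealed until the trial ends
    assignments = {}                 # subject -> code; all anyone else sees
    for i, subject in enumerate(shuffled):
        code = f"KIT-{i:03d}"
        key[code] = "drug" if i < half else "placebo"
        assignments[subject] = code
    return assignments, key

assignments, key = double_blind_assign(["s1", "s2", "s3", "s4"], seed=42)
```

Because the split is made before the doctors ever see the kits, there is nothing for them to give away to the patients.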
Tests should be reproducible by others. If other people can't repeat your experiment and get similar results, then something odd is going on - you could be "steering" your results without being aware of it, or your particular test has an unrecognized confounding factor that changes for other testers.
|
Mythbusters talking-to-plants design, with control but limited scope. Plus, the test develops a big confounding factor.
How confounding factors can affect cancer research.
Control group definition, with example.
Why is the placebo effect stronger now than it used to be?
Using single and double-blind in a different context.
|
Getting reliable results is partly a matter of chance - if you test a drug on one person and it really helps them, or that one person dies, how much do you know about the effects of your drug? Could you even say that the drug caused the death? Good tests require repetition. A reliable drug test should use as many subjects as possible to reduce the impact of "oddball" results. Sometimes it's not about a lot of subjects, but about doing the test over and over to see how often certain results occur.
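The effect of repetition can be seen in a quick simulation. The numbers here are invented - a 70% success rate and the group sizes are assumptions made for the sketch:

```python
import random

def trial(success_rate, n, rng):
    """Simulate treating n subjects; return the observed success fraction."""
    return sum(rng.random() < success_rate for _ in range(n)) / n

rng = random.Random(0)
true_rate = 0.7  # assumed: the drug really helps 70% of patients

# One subject per trial: the result can only be 0.0 or 1.0,
# so an "oddball" outcome tells you almost nothing.
small = [trial(true_rate, 1, rng) for _ in range(5)]

# A thousand subjects per trial: repeated runs cluster near the true rate.
large = [trial(true_rate, 1000, rng) for _ in range(5)]
```

With one subject the outcomes swing between "always works" and "never works"; with a thousand, every repetition lands close to the underlying 70%.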
Results are usually statistically analyzed. Data is gathered - something is counted or measured during the course of the test or at its end - but how do you know what the numbers mean? There are many ways to process the numbers, some of which are particularly suited to certain types of tests. This variety makes checking conclusions difficult, especially if it isn't entirely clear just how the numbers have been crunched. This is another way that two apparently similar tests can come to very different conclusions. Statistics can also be used to distort results until they look like they support the hypothesis, and sometimes the researchers don't even know they've done it - they have just changed statistical methods until they got a "good fit" with their data.
Because so much math is used in science, scientists prefer quantitative data, data in number form, to qualitative data, data in a more subjective form. If you were in a study of painkillers, you would likely be asked to rate your pain on some sort of defined number scale - your pain is qualitative, but the rating converts it into quantitative data.
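That conversion can be sketched with a toy example. The pain ratings and group sizes below are entirely made up for illustration:

```python
# Hypothetical pain ratings (0 = no pain, 10 = worst imaginable) from a
# made-up painkiller study. The subjective experience of pain becomes
# quantitative data once each subject places it on a defined scale.
drug_group    = [3, 2, 4, 3, 2, 5, 3, 4]
placebo_group = [6, 5, 7, 4, 6, 5, 7, 6]

def mean(ratings):
    return sum(ratings) / len(ratings)

# Once the ratings are numbers, the groups can be compared statistically;
# a simple first step is the difference in average pain.
difference = mean(placebo_group) - mean(drug_group)
```

A real analysis would go further - asking, statistically, how likely a difference this size is to appear by chance - but it can only do so because the qualitative experience was first turned into numbers.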
|
Trying to figure out if results are reliable, reproducible.
Is there a reproducibility problem in cancer research?
Introduction to experimental statistics.
|