Biology - Molecules and Cells





CHAPTER 3 - How Research is Supposed to Get Done




We accept today that science follows certain rules and processes that make it a dependable source of information, but those rules have not always been in place.  Until as recently as the 1600s, for instance, it was widely believed that living things could arise spontaneously from non-living, dead, or waste materials (this is called spontaneous generation), because people saw such materials "generate" living things such as mold or maggots, and no one thought to test whether this was truly what was happening.  In 1668, Italian naturalist Francesco Redi set out to test the idea with decaying meat in two containers:  one open to the air, the other sealed.  The meat in the open container eventually became infested with maggots;  the sealed meat did not.  When critics insisted that it was the sealing of the second container that kept spontaneous generation from occurring, Redi redid the test with an open container and one covered with cheesecloth, through which air could circulate (he suspected what we now know, that flies were the actual source of the maggots), and the cheesecloth-covered sample produced no maggots.  Even as these aspects of spontaneous generation were recognized as wrong, the idea lingered:  when germs were first discovered, they were at first thought to be a spontaneous product of sick tissues, rather than independently-living organisms that reproduce in the body.

How research "killed" spontaneous generation.


More on Redi's experiments (Blog post).


Someone recreates the maggot experiment (Day One Video).

It was a long road from that basic test to today's scientific method, but some of the approach Redi used persists:  modern science is about testing suspected explanations of one's observations, which can be made directly through one's own senses, indirectly through instruments, or second-hand from someone else's direct observations.  An explanation for one or more observations is properly called a hypothesis.  A hypothesis should produce testable predictions or it isn't much use scientifically, and the tests are most reliably done under controlled conditions.  Alongside any hypothesis stands the null hypothesis, the proposition that the suspected effect does not actually exist;  testing can wind up supporting either one.

Introduction to scientific method.

A silly song, but it's got the bits in it.

In biology, complete control over conditions is hard to achieve, but scientists still strive for it.  If no alternative exists, testing may be done as field studies:  well-planned, organized series of observations that look for evidence of the hypothesis' predictions.  Controlled experiments may be done in a laboratory environment with different test groups, similar to how Redi did his experiment.  One group, the experimental group, is specifically set up to test some critical aspect (the variable, or independent variable) of the hypothesis;  another group, the control group, duplicates the experimental group but removes the variable (or, if that isn't possible, changes it in some significant way).  In Redi's second test, the experimental group was the cloth-covered containers (the cloth barrier, which let air through but blocked flies, was the variable), with the control being the containers with no cloth over them.  One expects a critical difference in the results to uphold the hypothesis - the outcome being measured is called the dependent variable.

There are courses specifically in field study (pdf).

Parts of an experiment (video - slides with explanation).

Results, usually in some sort of number form (quantitative data, as opposed to non-number qualitative data), are collected from each group and compared.  The comparison is absolutely critical - just running an experimental group is possible (we could give a new headache remedy to a group of 100 people with headaches and record how much their symptoms improved), but how would you know whether your results were directly connected to your variable - how many headaches would have improved on their own, or improved just because the subjects were given a pill and expected improvement?  Improvement based solely on such expectations is called the placebo effect, a placebo being an "empty" treatment.  In a proper experiment, a control group would have been treated identically, given "identical" pills with the remedy ingredient removed;  the difference in effects between the two groups can then be attributed to the ingredient itself.  The placebo effect is considered an artifact, the result of one type of confounding factor, discussed below.  When a result arises from the way that a test is done, but isn't actually connected to the variable, that result is called an experimental artifact.

Quantitative methods (video).

A discussion about making qualitative data - ancient texts - quantitative for comparison purposes.

More on the placebo effect.

Why is the placebo effect stronger now than it used to be?




Modern science is based upon a descendant of that original scientific method, with some additions and minor changes.  A good experiment should be clearly designed and stated, and reproducible, so that someone else running the same test will get approximately the same results.  Since redoing someone else's work is rarely funded or publishable, though, it is rarely done except when the procedures are built upon or doubted.

Research also generally is subject to peer review, scrutiny by others in the same field, usually when results are being published (in peer-reviewed journals) but sometimes at other stages of the process.  Peer review can be a double-edged sword:  on the one hand, it should help to assure that research is being properly done and conclusions make sense, but on the other hand, one's peers may not be ready for innovative or unusual ideas or approaches.

Data issues.

More on peer review.

How open should the review be?

Pros and cons.

Modern biology, including medical research, can be confusing for a number of reasons, especially for the general public.  Often different studies seem to be completely at odds with one another, when in reality they were not studying quite the same thing, or the results were misinterpreted by the media.  How data is collected can affect results (how would the headache study above be influenced if the rating system ran from "1 = barely there" to "10 = the worst headache you could imagine"?), and experiments with living organisms are affected by a wide range of confounding factors, other things that might be influencing the results.  One of the most common confounding factors is pure chance - if the mouse you've picked to test happens to be particularly prone to cancer, anything you test will look cancerous - which is why, whenever possible, test groups must be of sufficient size.  If you use 100 mice, that one cancer-prone mouse will not significantly affect your averaged results.  Conclusions based on a single instance or a very limited group are said to be based upon anecdotal evidence and are not considered reliable.  You know the basic logic here from real life:  just because you were lucky enough to get away with something once doesn't mean you'll always be able to get away with it.

When is it truly confounding?

Confounding factors can get very technical.

Chance and bias as confounding factors.

Dangers of anecdotal evidence.

Anecdotal and placebo collide.
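The role of group size can be illustrated with a small simulation (all numbers here are invented for illustration):  a single extreme individual dominates the average of a small group but barely nudges the average of a large one.

```python
import random

random.seed(42)

def mean_with_outlier(group_size):
    # Typical measurements cluster near 10; one extreme "cancer-prone"
    # individual reads 100 (values are invented for illustration).
    values = [random.gauss(10, 1) for _ in range(group_size - 1)] + [100]
    return sum(values) / group_size

small = mean_with_outlier(5)     # one outlier among 5 drags the mean far above 10
large = mean_with_outlier(100)   # the same outlier barely moves a 100-value mean
print(round(small, 1), round(large, 1))
```

The small group's average lands nowhere near the typical value of 10, while the large group's average stays close to it - the same logic behind using 100 mice instead of one.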

Obviously, if test subjects knew they were receiving a placebo, that would influence their responses;  this is why they are not told, producing what is called a blind test.  It was determined decades ago, however, that if the people giving out the treatments themselves knew which were real and which were placebos, they tended to treat the patients differently, sending subtle messages that might alter patient responses and results.  To eliminate those confounding factors, modern drug tests are double-blind:  those giving the treatments deal with numbered samples packaged and recorded elsewhere, not knowing which are real and which are not - there's no way they can alert the patients, even unconsciously, if they don't know which dose is which.  In some cases, the data is also analyzed by a statistician who has no idea which subjects belong to which group - a triple-blind test.

More on experimental blinding.

Blind versus open testing.

More on placebo.

A triple-blind test - can you spot the problems?
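The numbered-sample bookkeeping behind a double-blind test can be sketched in a few lines (the codes, counts, and labels here are all hypothetical):  a coordinator codes the samples and keeps the master key, while the clinicians see only anonymous numbers.

```python
import random

random.seed(0)

# Hypothetical trial: 5 real doses and 5 placebos, coded by a coordinator.
treatments = ["drug"] * 5 + ["placebo"] * 5
codes = list(range(101, 111))
random.shuffle(codes)

# The master key stays with the coordinator, away from the clinic.
master_key = dict(zip(codes, treatments))

# Clinicians (and patients) ever see only the numbers - nothing about
# the numbers reveals which samples are real.
blinded_labels = sorted(master_key)
print(blinded_labels)
```

Only after all responses are recorded is the master key consulted to sort results into experimental and control groups.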

A confounding factor that is easy to overlook is experimenter bias.  The hypothesis is the experimenter's idea, but they need to be careful not to let that influence how they set up the test, how they define their terms, or how they interpret their results.  The idea that science cannot be separated from the minds of the scientists - their culture, their prejudices, their expectations - is linked to a broader concept applied to many fields:  postmodernism.  The statement, "I wouldn't have believed it if I hadn't seen it," has a flip side, stated by Ashleigh Brilliant as, "I wouldn't have seen it if I hadn't believed it."

More about postmodernism.

Ashleigh Brilliant's website.

A researcher tries to recognize potential confounding factors while designing an experiment, and either eliminate them or set up separate control tests to determine or eliminate their influence, but researchers can't anticipate everything.  Often peer review will reveal a possible confounding factor never recognized, and it's back to running the test again.


Having quantitative data allows statistical analysis.  You can compare raw results of experimental and control tests directly, but if the groups are at all unequal in size, some math is necessary to compare them:  converting totals to means (or rates) allows different-sized groups to be compared, and standard deviations measure how spread out each group's results are.
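As a sketch of such a comparison, using invented headache-improvement scores for two unequal-sized groups:

```python
from statistics import mean, stdev

# Hypothetical improvement scores (0-10) from the headache example;
# the groups are deliberately unequal in size.
experimental = [6, 7, 5, 8, 6, 7, 9, 6]   # 8 subjects got the real remedy
control      = [4, 5, 3, 6, 4, 5]         # 6 subjects got the placebo

# Raw totals can't be compared (8 subjects vs 6), but means can.
diff = mean(experimental) - mean(control)
print(round(mean(experimental), 2), round(mean(control), 2), round(diff, 2))

# Standard deviations show how spread out each group's scores are.
print(round(stdev(experimental), 2), round(stdev(control), 2))
```

Here the experimental mean (6.75) exceeds the control mean (4.5); whether that difference of 2.25 means anything is a question of statistical significance, discussed below.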

Analysis commonly looks for patterns in data.  Things that seem to have connected rates of change are correlated.  This often implies a connection, but not necessarily a causal one (hence the commonly-used phrase, "correlation is not causation").

The best experiments incorporate randomization into the design;  where confounding factors exist, randomly distributing such parameters between test groups should average out their influences.
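Random assignment can be sketched briefly (subject names and group sizes are hypothetical):  shuffling the subjects before splitting them into groups spreads unknown confounding factors - age, genetics, plain luck - roughly evenly across both.

```python
import random

random.seed(1)

# Hypothetical pool of 20 subjects to be split into two groups of 10.
subjects = [f"subject_{i:02d}" for i in range(1, 21)]

# Shuffle first, so neither group is systematically different
# (e.g. no "first volunteers in one group, latecomers in the other").
random.shuffle(subjects)

experimental = subjects[:10]
control = subjects[10:]
print(experimental)
print(control)
```

No one chooses which subject lands where, so any one subject's quirks are as likely to end up in one group as the other.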

The real question is, of course:  do your results support your hypothesis, or are they just products of chance?  Statistical significance is used as a measure for this, and data gets more statistically reliable with many repetitions, as stated above.  Another way of looking at the data is finding a p value, a measure that compares your data against what would be expected if the null hypothesis were true.  Figuring out just what "null data" should look like is tricky, though.  Generally, a p value below 0.05 suggests that the null hypothesis is unsupported (although a stricter threshold of 0.005 has been proposed).  Design and measurement parameters all have an effect here, and sometimes an approach called p-hacking is used to wring publishable-looking results out of bad data.
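One intuitive way to estimate a p value is a permutation test, sketched here with the same kind of invented headache scores used earlier:  if the null hypothesis were true and the group labels didn't matter, reshuffling the labels should produce a difference as large as the observed one fairly often.  The p value estimates how often.

```python
import random

random.seed(0)

# Hypothetical improvement scores; the observed difference in means is 2.25.
experimental = [6, 7, 5, 8, 6, 7, 9, 6]
control = [4, 5, 3, 6, 4, 5]
observed = sum(experimental) / len(experimental) - sum(control) / len(control)

# Simulate the null hypothesis: pool the scores, shuffle the labels,
# and count how often chance alone matches or beats the observed gap.
pooled = experimental + control
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    fake_exp, fake_ctl = pooled[:8], pooled[8:]
    if sum(fake_exp) / 8 - sum(fake_ctl) / 6 >= observed:
        count += 1

p_value = count / trials
print(p_value)
```

With these particular invented numbers the p value comes out well below 0.05, so the null hypothesis would be considered unsupported; with noisier or smaller data it easily would not.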


Additional Information Links


A blog about homeopathy trials that does a nice job explaining the requirements of medical testing.

An article with a historical perspective on how basic science works - better to be wrong than to let somebody fake your evidence.

An interesting perspective piece on science and values.

Research and the importance of being stupid.

A fairly bizarre page on research done with marshmallow peeps that sort of follows scientific method but uses groups that are too small to eliminate chance as a confounding factor.

Several views of ideas that persisted long after being scientifically shown to be false.

Public outreach site to help folks make sense of science.

Presentation - how to lie with charts.

The original paper on the Dunning-Kruger effect, roughly how folks with very little knowledge get convinced that they know more than experts - a very common problem in science.


Terms and Concepts

In the order they were covered.

Spontaneous generation
Francesco Redi
Scientific method
Field tests
Controlled experiments
Experimental group
Control group
Quantitative vs qualitative data
Placebo effect
Peer review
Confounding factors
Chance, role of
Test group - need for numbers / size
Anecdotal evidence
Blind & double-blind tests



General Biology 2 - Molecules and Cells

Copyright 2013 - 2020, Michael McDarby.

Reproduction and/or dissemination without permission is prohibited.  Linking to this page is fine.


