Sat Feb 20 09:59:35 EST 2010
RESEARCH DATA GATHERING TECHNIQUES
By R. Stewart Ellis, PhD
This is designed to complement, not replace, the lectures and class
discussions about R.D.G.T.s. Examples given in class are not repeated in
detail and some new examples are included. Several similar essays were found
online, but all of them were either too detailed in examples or in concepts to
fit in very well with what I want you to get out of this, so I wrote my own.
There are 4 main R.D.G.T.s in the social sciences: experiment, survey,
archival research and participant observation. Although some refer to
these as "methods" I believe it is better to call them techniques unless one
is also going to talk in relative detail about the entire research process:
literature review, theory construction, hypothesis formation, data collection
design and analysis of the data collected.
Empirical disciplines, which all sciences have to be, require data to be
collected. Although all empirical disciplines started out as "natural
history", collecting the data by casual observation of nature, most of them
have progressed by developing systematic standards of data collection and
interpretation to reduce the casualness of the observations. Systematic
collection of data guided by the scientific method is used to tease out the
variables that are significant from the ones that are less significant,
irrelevant and/or confusing.
The most systematic technique of data collection is the laboratory experiment,
where the researcher hopes to be able to control all of the variables.
Since physics and chemistry (including biochemistry) have gone
the farthest in developing the experimental "method", they are (sometimes
inappropriately) taken as the gold standard of what science has to be like.
Although it should be obvious that great strides have been made in many of the
other physical sciences where laboratory experiment is not possible such as
astronomy, geology and many areas of biology, the social sciences are
sometimes criticized as not being real sciences because they are not based
enough on experiment, particularly lab experiment. While small issues of
human social behavior can be studied in the artificial setting of a
laboratory, attempts to study larger issues have frequently produced
results that are very controversial because it is not completely clear that
all of the variables were actually either being controlled or even
While variables sometimes can be manipulated in a "field" setting, field
experiments can be contaminated by factors the experimenter is unaware of.
- Provides numerical results in most cases.
- Focus on trivial behaviors that are of low importance to the subject can
result in more natural behavior.
- Careful matching of experimental and control group members can give
increased confidence in the results.
- Generally replicable.
- Ethical guidelines prevent many experiments. Informed consent of the
subjects required by scientific societies, universities and funding
agencies alerts the subjects and puts them on their guard.
- Humans are very self-conscious when they know they are being observed and
may intentionally or unintentionally alter their behavior.
- The higher the "risk" in the experiment the higher the
- In trying to control the situation, the behaviors that are being measured
may become so limited that there is little point to the experiment.
- Failure to clearly isolate and manipulate variables between a control and
an experimental group can give an illusion of an effect.
- The experimental subjects may not properly reflect the population and
therefore prevent generalization of the experimental results to the
population.
Surveys are any form of questioning of a sample of people with hopes of
getting an insight into a larger population of people. Since people are
either being asked to fill out a questionnaire, or are being asked by a
door-to-door interviewer or by someone on the telephone, or responding to
questions from a computer, surveys usually depend on self-reporting of the
data by the subject. In the past door-to-door or phone interviewers were
often instructed to fill out some information by observation, such as age,
gender or "race", or impressions of the subject while responding: slow?
hesitant? answer with a rising tone? facial responses such as looking down
or into the eyes of the interviewer? Trained interviewers also can easily
follow a branching survey. With today's phone-robot and online surveys there
is very little door-to-door surveying done and the phone robots cannot usually
discern gender or speaking accent. However, computers and phone robot
interviewers can time responses and note changes to answers, and phone robots
could (I do not know whether they do) register rising tone answers. And
computerized surveys can easily be programmed to branch depending on the
Since surveys almost always deal with a sample, one of the most critical
issues is to understand the sample. Many people mistakenly believe the larger
the sample the more likely it is to be random, but the Literary Digest
Presidential Poll of 1936 is a good demonstration that that is not true. In
that election George Gallup stunned everyone when he correctly predicted the
outcome of the election with a much smaller, but well constructed and
understood, sample. Since the 1960's national elections have been
successfully predicted with samples of ~1200. In addition, other national
surveys of opinion or self-reported behavior have been successfully studied
with similar sample sizes. Ironically, surveys of local and state-level
opinions or behavior usually need samples nearly as large, 500-1000.
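
The sample sizes quoted above can be checked against the usual
margin-of-error approximation for a simple random sample. The following is a
minimal sketch in Python and is not part of the original lecture material. The
function names, the 60/40 split of opinion and the "mailing list" are
hypothetical, chosen only for illustration. It is meant to suggest two things:
the approximate margin of error depends on the size of the sample rather than
the size of the population (which is why local surveys need samples nearly as
large as national ones), and a very large sample drawn from a skewed list can
be much further off than a small random one.

```python
import math
import random


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from
    a simple random sample of size n (large-population approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


def build_population():
    """Hypothetical electorate of 100,000 voters as (supports_a, on_list).
    60% support candidate A overall, but only 40% of the 30,000 voters
    who appear on the (made-up) mailing list do."""
    return ([(True, True)] * 12_000 + [(False, True)] * 18_000
            + [(True, False)] * 48_000 + [(False, False)] * 22_000)


def share_supporting_a(sample):
    """Fraction of a sample that supports candidate A."""
    return sum(1 for supports_a, _ in sample if supports_a) / len(sample)


def compare_polls(seed=1936):
    """Return (big_biased_estimate, small_random_estimate)."""
    rng = random.Random(seed)
    population = build_population()
    list_members = [voter for voter in population if voter[1]]
    big_biased = rng.sample(list_members, 20_000)  # large sample, skewed list
    small_random = rng.sample(population, 1_200)   # small sample, random
    return share_supporting_a(big_biased), share_supporting_a(small_random)


if __name__ == "__main__":
    for n in (500, 1_000, 1_200):
        print(n, round(margin_of_error(n), 3))
    print(compare_polls())
```

Under these assumptions a sample of 1200 gives a margin of error of a little
under 3 percentage points, and the large sample taken only from the list
should land near 40% even though true support is 60%.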
Small samples frequently are stratified, which means that the less numerous
categories of people in the population are oversampled to ensure they are
included, while larger categories such as white men are undersampled because
they are much more likely to be included even if undersampled. To compute
numerical values for the different answers, the answers of the oversampled
and undersampled categories are multiplied by weighting factors to
come closer to the population projection.
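
The weighting step can be sketched as follows. This is only an illustration
under assumed numbers, not a description of any particular polling
organization's procedure: the group labels, the population shares and the
function names are all hypothetical. Each group's weight is taken to be its
share of the population divided by its share of the sample, so oversampled
groups count for less and undersampled groups count for more.

```python
from collections import Counter


def stratum_weights(sample_groups, population_shares):
    """Weight for each group = population share / share of the sample."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def weighted_proportion(responses, weights):
    """responses is a list of (group, answered_yes) pairs.
    Returns the weighted fraction of 'yes' answers."""
    total = sum(weights[group] for group, _ in responses)
    yes = sum(weights[group] for group, answered_yes in responses
              if answered_yes)
    return yes / total


# Hypothetical survey: a small group (10% of the population) is oversampled
# so that it makes up half of the 200 respondents.
responses = ([("small", True)] * 60 + [("small", False)] * 40
             + [("large", True)] * 30 + [("large", False)] * 70)
population_shares = {"small": 0.10, "large": 0.90}

weights = stratum_weights([g for g, _ in responses], population_shares)
unweighted = sum(1 for _, yes in responses if yes) / len(responses)
weighted = weighted_proportion(responses, weights)

if __name__ == "__main__":
    print("unweighted:", round(unweighted, 2))  # 90 of 200 said yes
    print("weighted:", round(weighted, 2))      # 0.10 * 0.60 + 0.90 * 0.30
```

In this made-up case the raw sample says 45% "yes", but after weighting the
projection for the population is 33%, because the oversampled small group was
much more likely to answer "yes".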
In addition, small samples are usually done on a forced basis, which means the
subjects have been selected to represent certain characteristics and if the
first person meeting those characteristics declines to participate, s/he will
be replaced by someone else with the same characteristics. For example, since
place of residence is frequently strongly correlated with other social factors
such as ethnicity, income, education, age, etc., addresses are sometimes
chosen as means of getting respondents fitting a certain set of
characteristics. The instructions to the door-to-door (d2d) interviewer might
be to interview the residents of the house on the northwest corner of each
block in a particular neighborhood. If no one is home or they refuse to
participate, go 4 doors to the left until success is achieved. For computer
or telephone interviews, phone numbers are often chosen because they are in
particular neighborhoods, so the same algorithm can be used.
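
The replacement rule for the d2d interviewer can be sketched as a short
routine. This is a hedged illustration only: the function name, the idea of
numbering the houses around a block, wrapping around the block, and giving up
after a fixed number of attempts are all assumptions added here, not part of
any actual interviewer instructions.

```python
def find_respondent(will_participate, start=0, step=4, max_tries=None):
    """Forced-sampling replacement rule (hypothetical sketch).

    will_participate: list of booleans, one per house on the block, True if
        someone is home and agrees to be interviewed.
    start: index of the designated house (e.g. the northwest corner).
    step: how many doors to move after each failure.
    Returns the index of the house interviewed, or None if every attempt
    fails within max_tries (default: one attempt per house on the block).
    """
    n = len(will_participate)
    tries = n if max_tries is None else max_tries
    house = start
    for _ in range(tries):
        if will_participate[house]:
            return house
        # "To the left" is modeled as a lower house index, wrapping around
        # the block; this is an assumption made for the illustration.
        house = (house - step) % n
    return None


# Hypothetical block of 12 houses where only house 4 will take part.
block = [False] * 12
block[4] = True

if __name__ == "__main__":
    print(find_respondent(block, start=0))  # tries houses 0, 8, then 4
```

Note that with 12 houses and a step of 4 the interviewer only ever reaches
houses 0, 8 and 4, so a real set of instructions would need some other rule
for what to do when all of those fail.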
- Anonymity can encourage people to be more forthcoming than in an
experiment or other kind of direct observation
- Easy to analyze with modern data management and analysis software. Some
analysis tools are usually installed on most PCs.
- Relatively cheap to mail out questionnaires or one-time keys to take
the survey online.
- Even forced sampling is more easily accomplished with computers using
auto dialing routines.
- Interviews by humans or computers can be better than questionnaires in
capturing information about how the respondents reacted to the individual
- As we lose privacy with the increase in commercial and governmental data
mining it will be increasingly easy to collect behavior without the
knowledge or consent of the people being studied.
- If anonymity is assured, there are relatively few ethical issues in
scientific polling. Political push polling is another matter entirely.
- Regardless of perceived anonymity people do, both willingly and
unwillingly, misreport their characteristics or behavior to be more
acceptable than they actually are.
- People do try to figure out what you are trying to get at and may be
influenced by that in their answers.
- Particularly with questionnaires that are distributed through the mail,
even though the initial sample to whom things were mailed may be either
properly random or properly stratified, the sampling may be disrupted by
differential response rates by different sorts of people. For example,
on politically charged topics people who feel strongly one way or the
other participate more heavily than do the people in the middle who are
usually more numerous. On lifestyle issues people whose behavior may
seem extreme or radical compared to the norm are sometimes more likely to
fill out the questionnaire.
- Interviews, which are more likely to use forced sampling to maintain
randomness or structure of the sample even in the face of people who
decline to participate, are more expensive since they involve more
members of the team who have to be trained and frequently paid to conduct
the interviews. D2d interviews are even more time consuming and thus
- Ultimately, even when reporting behavior, surveys are more about thoughts
and ideas than the actual behavior.
- As we lose privacy with the increase in commercial data mining, our
behavior will increasingly be reported without our knowledge or
consent.
In the past archival research was thought to be primarily the domain of
historians, but
increasingly after the mid-1900's, some historians themselves began to think
of their discipline as a kind of social science, and historical data began to
be used by sociologists, anthropologists, economists, political scientists and
cultural geographers. People often think of the past as something that is
easily reconstructed from written records, but the cross fertilization of all
these fields raised such basic questions as "What is a document of the past?"
Folktales? Myths? Legends? Oral histories? Architecture? Art? "Why are
some things from the past preserved and others are not?" Might behavior in
the past be revealed by looking in new ways for remains of that past
As a basic example, historians and social scientists from sociology and
anthropology had for a long time assumed that the modern nuclear family,
increasingly today becoming a broken nuclear family, was a result of the
breakdown of traditional society caused by the industrial revolution.
Historians' views of families of the past were distorted by their past
concentration on the history of the elites, the ones who wrote letters, kept
diaries and were frequently documented by others. Even the artifacts produced
for and used by the elites are more likely to survive over the years because
of their quality, value and historical significance. Such families tended to
be multigenerational and even to be extended with several married children
sharing country estates and elaborate urban residences, depending on the
Sociologists' views were distorted by their tendency to focus on contemporary
social problems of whatever period the individual sociologist was living in.
Comparing the sometimes broken families of the mid to late 1800s, which had
been studied at that time, with the increasingly broken families of the 20th
century, without adequate detailed attention to the histories of those poorer
people before industrialization, led sociologists to infer a regression line
back to an imagined state of the peasants living in the past in families like
those of the wealthy.
Anthropologists' views were distorted by their tendency to treat the
aboriginal people in all of the conquered parts of the world that the
Europeans had colonized in the Age of Exploration and the Industrial
Revolution as though those natives had been living from time immemorial in the
manner in which they appeared to the Europeans in the 18th and 19th centuries.
All three of these disciplines began to realize at about the same time that
the history of these "people without history" was perhaps very different from
what had been assumed based on the history of "people with history."
The problem was how to recover the history where there seemed to be none.
Social scientists began to look for different documents and for previously
overlooked information in documents that had previously been used to study the
behavior of the elites. For example, travel passes issued by slave owners to
slaves to visit neighboring plantations, followed by transfers of slaves
between plantation owners which had not been looked at seriously before,
revealed a pattern of courting and family formation among some African
American slaves that hitherto had not been noticed by historians.
Content analysis of paintings could be used to reveal past attitudes and
behaviors, such as the changing depictions of children in paintings from
medieval to modern times, suggesting that in the past children went straight
from being babies to being workers on farms and in mills and factories.
Another of the tools is archeology (I prefer the simpler spelling) which
treats physical remnants of past behavior as documents. With the advent of
radiological and chemical dating techniques following World War II it was
possible to dig up an ancient Native American, African or English peasant
house from several hundred years ago and have some confidence in how old it
is. Increasing sophistication in recording the materials coming out of
archeological sites allowed fuller reconstruction of the activities at the
time the sites were "alive", including how many people lived there.
By the 1950's archeologists were using punch card sorters to analyze the
distributions of artifacts dug out of the ground, and by the 1960's some were
fortunate enough to have access to computers to catalog, store and analyze
their data. Sites that were being lived in were excavated using archeological
techniques to check what can be learned using just archeology versus
archeology and other data-gathering techniques together. These kinds of
comparisons helped archeologists make better sense of the artifacts they were
able to recover and the way in which the artifacts had originally been
distributed in the sites. As computers become more powerful, reconstructions
and simulations become easier to do and more complete.
DNA mapping is being correlated with other evidence of past behavior giving
better insights into migrations and other demographic dynamics of the past.
Satellite mapping is revealing past civilizations in such unlikely places as
the depths of the Amazon jungle and the Western Sahara.
Ground penetrating radar can be used to locate and map buildings buried under
the surface of the earth without having to dig and disrupt the present-day use
of the land.
Better understanding of accomplishments of non-Western peoples before contact
with Europeans and the frequent serious disruptions of native societies even
before they came into direct contact with Europeans is increasing our
awareness of unintended consequences of our actions and the fragility of even
the most-developed civilizations in the face of environmental and economic
- The people being studied cannot alter their behavior because of the
current study or the presence of the current researcher. However the
researcher needs to be aware of the context in which the "document" of
the past was created, but that has always been the essence of the
historical method, historical criticism. What are the likely sources of
bias, sampling error or other possible sources of distortion?
- Some sources of data are readily available either in microfilm or
microfiche or in electronically readable format, some even readily
available online for free. Examples: Mormon genealogical database, GSS,
various government surveys. The entire body of ancient Greek documents
has been available in electronically readable form since at least the
1970's. Even for sources that are not online or in portable format,
indexes for more and more sources of data are going online making it
possible for more researchers to locate archival materials all around the
- With imagination it is possible to study more things through archives
than once thought possible.
- It is frequently one of the least expensive ways for the lone scholar
with limited funds to study significant problems.
- Even though research subjects are not intruded on in any way there are
still ethical issues inherent in possibly making people or authorities
aware of information that has been considered private. The U.S. census
enforces a strict 72-year confidentiality rule on its raw data, but the
Mormon Church exposes data about families and even living individuals in
its database that are still considered private or confidential by the
people in question. This has led some organizations to issue takedown
requests with the LDS Church, for example Jewish groups requesting the
takedown of Holocaust data and other Jewish material.
- As an extension of the previous point, some material may be so closely
guarded that only people who agree to follow guidelines set by those who
control the material are given access. While on the surface this seems
fair and reasonable, it has interfered with some important potential
studies. For example, Sigmund Freud's descendants still guard his papers
so jealously that scholars who raise legitimate questions about Freud's
methods and interpretations cannot gain access to his papers where those
things might be made more clear. Those who do have access sometimes are
not as critical as they should be of Freud's work.
- You can only study what has left traces, although imagination can
sometimes find traces where none were thought to exist.
- For archives that are not portable, travel and living expenses can be
considerable. If the archives have not been catalogued the researcher
will have to spend a lot of time cataloguing and calendaring the
materials. At least the archeologist frequently gets to take his/her
materials back to the home lab and spend years cataloguing and analyzing
- Archeology as a means of accessing "archives" of remnants of past
behavior can be horribly expensive. Add to the travel and living
expenses for the lone researcher listed above those of all the members of
the team who are working away from home. Then add the costs of local
labor, consumable tools such as shovels, picks and trowels and equipment
rental for earth-moving equipment. Also add the costs of satellite
imaging (going down in recent years), GP radar, radiological and chemical
tests. I am sure I am missing some major categories.
- Archeological sites and non-portable archives frequently become
unavailable for long periods of time because of military and political
conflicts, or because of budget cutbacks for the institutions housing the
documents or artifacts.
The basic premise of participant observation (P.O.) is that it is possible for
the researcher to observe behavior in real time in its full context without
having to depend on self-reporting, the accidents of behavior or attitudes
being documented or the accidents of artifacts of past behavior being
preserved to be studied in the future.
- It is possible to gain an understanding of the complete context of the
behavior that is being studied.
- Every aspect of the society/culture of the small community can be studied
if the period of study is long enough. This is called the holistic
- Can be carried out by a lone researcher.
- Lengthy presence in the community makes it less likely that the research
subjects will continue to be able to try to mislead the researcher about
their real behavior. What is recognized as unacceptable behavior by the
outside world or by members of the local community is more likely to be
exposed. On the flipside, posing behavior designed to impress outsiders
in a negative or positive fashion is frequently discontinued by the
research subjects as the researcher becomes part of the daily situation.
- Lots of studies of similar communities in lots of different parts of the
world to compare findings with.
- The ethics of P.O. deal mostly with questions of disrupting the lives of
the research subjects during the research or as a consequence of the
publication of the research. Additional ethical dilemmas have to do with
how much to reveal. Lengthy observation with no consent by the subjects
is rarely attempted, but would be considered unethical today by most
social scientists. Overinvolvement in the lives of the subjects raises
questions of both ethics and bias.
- Gaining rapport may be difficult, taking as long as several months to
several years. Some studies have been abandoned because of this
- Time consuming. The best studies last a year or longer, during which the
researcher frequently has reduced or no income. This requires
researchers to compete for grants to support the research.
- Can be dangerous. Exotic diseases, poisonous animals, ferocious beasts,
violent people, food that is quite at odds with personal preferences or
past experience all can be problems.
- Small sample size, particularly if only one researcher is carrying out
the study. Is my neighborhood or village comparable to other similar
- Researcher bias can influence findings. One researcher of a village in
Mexico saw a cooperative community in which people worked for the common
good, sometimes sacrificing their own opportunities. A later researcher
of the same village found people constantly engaging in gossip and
criticism of others, particularly those who seemed to be doing better.
Which one was right? Both. Subsequent studies of peasant villages in
similar circumstances even in other societies show that members of the
peasant community frequently coerce cooperation from their neighbors
through criticism and other negative informal social sanctions. We are
fortunate to have the two studies, but if either study had to be
evaluated on its own, we would be suspicious of it and probably not be
misled despite the fame and reputation of both scholars.
© 2010, R. Stewart Ellis
Edit History: Mon Feb 22 20:19:57 EST 2010 - add ethics bullet for P.O.
Mon Mar 8 12:24:44 EST 2010 - 2 minor editing errors per a student
Tue Apr 27 11:59:06 EDT 2010 - added missing prepositions
Thu Feb 3 14:34:44 EST 2011 - split infinitive, sent. frag.