Sat Feb 20 09:59:35 EST 2010

RESEARCH DATA GATHERING TECHNIQUES

By R. Stewart Ellis, PhD

This is designed to complement, not replace, the lectures and class discussions about R.D.G.T.s. Examples given in class are not repeated in detail and some new examples are included. Several similar essays were found online, but all of them were either too detailed in examples or in concepts to fit in very well with what I want you to get out of this, so I wrote my own.

There are 4 main R.D.G.T.s in the social sciences: experiment, survey, archival research and participant observation. Although some refer to these as "methods", I believe it is better to call them techniques unless one is also going to talk in relative detail about the entire research process: literature review, theory construction, hypothesis formation, data collection design and analysis of the data collected.

Empirical disciplines, which all sciences have to be, require data to be collected. Although all empirical disciplines started out as "natural history", collecting the data by casual observation of nature, most of them have progressed by developing systematic standards of data collection and interpretation to reduce the casualness of the observations. Systematic collection of data guided by the scientific method is used to tease out the variables that are significant from the ones that are less significant, irrelevant and/or confusing.

EXPERIMENT

The most systematic technique of data collection is the laboratory experiment, where the researcher hopes to control all variables. Since physics and chemistry (including biochemistry) have gone the farthest in developing the experimental "method", they are (sometimes inappropriately) taken as the gold standard of what science has to be like. Although it should be obvious that great strides have been made in many of the other physical sciences where laboratory experiment is not possible, such as astronomy, geology and many areas of biology, the social sciences are sometimes criticized as not being real sciences because they are not based enough on experiment, particularly lab experiment. While small issues of human social behavior can be studied in the artificial setting of a laboratory, attempts to study larger issues have frequently produced very controversial results because it is not completely clear that all of the variables were actually being either controlled or even measured.

While variables sometimes can be manipulated in a "field" setting, field experiments can be contaminated by factors the experimenter is unaware of.

Strengths:

  1. Provides numerical results in most cases.
  2. Focus on trivial behaviors can yield results that are of low importance to the subject, resulting in more natural behavior.
  3. Careful matching of experimental and control group members can give increased confidence in the results.
  4. Generally replicable.

Weaknesses:

  1. Ethical guidelines prevent many experiments. Informed consent of the subjects required by scientific societies, universities and funding agencies alerts the subjects and puts them on their guard.
  2. Humans are very self-conscious when they know they are being observed and may intentionally or unintentionally alter their behavior.
  3. The higher the "risk" in the experiment the higher the self-consciousness.
  4. In trying to control the situation, the behaviors that are being measured may become so limited that there is little point to the experiment.
  5. Failure to clearly isolate and manipulate variables between a control and an experimental group can give an illusion of an effect.
  6. The experimental subjects may not properly reflect the population and therefore prevent generalization of the experimental results to the population.

SURVEY

Surveys are any form of questioning of a sample of people with hopes of getting an insight into a larger population of people. Since people are either being asked to fill out a questionnaire, or are being asked by a door-to-door interviewer or by someone on the telephone, or responding to questions from a computer, surveys usually depend on self-reporting of the data by the subject. In the past door-to-door or phone interviewers were often instructed to fill out some information by observation, such as age, gender or "race", or impressions of the subject while responding: slow? hesitant? answer with a rising tone? facial responses such as looking down or into the eyes of the interviewer? Trained interviewers also can easily follow a branching survey. With today's phone-robot and online surveys there is very little door-to-door surveying done and the phone robots cannot usually discern gender or speaking accent. However, computers and phone robot interviewers can time responses and note changes to answers, and phone robots could (I do not know whether they do) register rising tone answers. And computerized surveys can easily be programmed to branch depending on the previous answers.
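
To make the idea of a branching survey concrete, here is a minimal sketch of how a computerized questionnaire might choose the next question from the previous answer. The questions, the dictionary layout and the function name are all invented for illustration; real survey software will differ.

```python
# Hypothetical three-question survey. Each question lists which question
# comes next for a given answer; an answer with no entry ends the survey.
SURVEY = {
    "voted": {
        "text": "Did you vote in the last election?",
        "next": {"yes": "method", "no": "reason"},
    },
    "method": {"text": "Did you vote in person or by mail?", "next": {}},
    "reason": {"text": "What kept you from voting?", "next": {}},
}

def questions_asked(survey, first, answers):
    """Return the ids of the questions one respondent actually sees."""
    path = []
    current = first
    while current is not None:
        path.append(current)
        answer = answers.get(current)
        # Look up the follow-up question; None means the survey is over.
        current = survey[current]["next"].get(answer)
    return path

print(questions_asked(SURVEY, "voted", {"voted": "no", "reason": "work"}))
```

A respondent who answers "no" is never shown the question about how they voted, which is the same thing a trained interviewer does when skipping ahead on a paper form.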

Since surveys almost always deal with a sample, one of the most critical issues is to understand the sample. Many people mistakenly believe the larger the sample the more likely it is to be representative, but the Literary Digest Presidential Poll of 1936 is a good demonstration that that is not true. In that election George Gallup stunned everyone when he correctly predicted the outcome of the election with a much smaller, but well constructed and understood, sample. Since the 1960's national elections have been successfully predicted with samples of ~1200. In addition, national opinion and self-reported behavior have been successfully studied with similar sample sizes. Ironically, surveys of local and state-level opinions or behavior usually need samples nearly as large, 500-1000.
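
A rough calculation shows why a well-drawn sample of about 1200 can serve a whole nation, and why a state or city needs a sample nearly as large. The sketch below uses the standard margin-of-error formula for a proportion and assumes simple random sampling, a 95% confidence level (z = 1.96) and the worst case of an evenly split opinion (p = 0.5). The function name and the population figures are illustrative assumptions, not numbers from any actual poll.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: only matters when the sample is a
        # noticeable fraction of the whole population.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# Illustrative population sizes (assumed, not from the lecture).
for label, pop in [("nation", 300_000_000), ("state", 5_000_000), ("city", 100_000)]:
    moe = margin_of_error(1200, pop)
    print(label, round(100 * moe, 2), "percentage points")
```

Under these assumptions the margin of error is a little under 3 percentage points in all three cases, because it depends almost entirely on the size of the sample rather than the size of the population.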

Small samples frequently are stratified, which means that the less numerous categories of people in the population are oversampled to ensure they are included, while larger categories such as white men are undersampled because they are much more likely to be included even if undersampled. To compute numerical values for the different answers, the answers of the oversampled and undersampled categories are multiplied by a weighting factor to come closer to the population projection.
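
The weighting step can be sketched in a few lines. This is one common way to do it, assuming each category's weight is its share of the population divided by its share of the sample; the category names and all of the counts are made up for illustration.

```python
def stratum_weights(population_share, sample_counts):
    """Weight = share of the population / share of the sample, per category."""
    total = sum(sample_counts.values())
    return {
        group: population_share[group] / (sample_counts[group] / total)
        for group in sample_counts
    }

def weighted_proportion(population_share, sample_counts, yes_counts):
    """Estimate the population's share of 'yes' answers from a stratified sample."""
    weights = stratum_weights(population_share, sample_counts)
    weighted_yes = sum(weights[g] * yes_counts[g] for g in sample_counts)
    weighted_n = sum(weights[g] * sample_counts[g] for g in sample_counts)
    return weighted_yes / weighted_n

# Hypothetical numbers: the "minority" category is 20% of the population
# but was oversampled to make up half of the interviews.
population_share = {"majority": 0.8, "minority": 0.2}
sample_counts = {"majority": 100, "minority": 100}
yes_counts = {"majority": 60, "minority": 20}

raw = sum(yes_counts.values()) / sum(sample_counts.values())
weighted = weighted_proportion(population_share, sample_counts, yes_counts)
print(round(raw, 2), round(weighted, 2))
```

With these invented counts the unweighted figure is 40% "yes", while the weighted figure is about 52%, which is what one would expect if 60% of the majority and 20% of the minority in the full population answered "yes".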

In addition, small samples are usually done on a forced basis, which means the subjects have been selected to represent certain characteristics and if the first person meeting those characteristics declines to participate, s/he will be replaced by someone else with the same characteristics. For example, since place of residence is frequently strongly correlated with other social factors such as ethnicity, income, education, age, etc., addresses are sometimes chosen as means of getting respondents fitting a certain set of characteristics. The instructions to the door-to-door (d2d) interviewer might be to interview the residents of the house on the northwest corner of each block in a particular neighborhood. If no one is home or they refuse to participate, go 4 doors to the left until success is achieved. For computer or telephone interviews, phone numbers are often chosen because they are in particular neighborhoods, so the same algorithm can be used.
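
The replacement rule given to the interviewer is really a small algorithm. Here is a sketch of it, under some assumptions of my own: the houses on a block are listed in walking order starting from the northwest corner, the walk wraps around the block, and the interviewer gives up on the block once the rule leads back to a house already tried. The list of houses is invented.

```python
def find_respondent(houses, start=0, step=4):
    """Return the position of the first willing household, or None.

    houses: list of booleans in walking order around the block,
            True meaning someone is home and agrees to participate.
    start:  position of the corner house (assumed to be first in the list).
    step:   how many doors to move after each failure.
    """
    tried = set()
    position = start
    while position not in tried:
        tried.add(position)
        if houses[position]:
            return position
        # Nobody home or a refusal: move 'step' doors along, wrapping around.
        position = (position + step) % len(houses)
    return None  # the rule has cycled without finding a respondent

# Hypothetical block of ten houses.
block = [False, True, False, False, False, True, False, False, True, False]
print(find_respondent(block))
```

The point of fixing the rule in advance is that the interviewer never chooses which door to knock on, so the replacement household is picked by the design and not by convenience.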

Strengths:

  1. Anonymity can encourage people to be more forthcoming than in an experiment or other kind of direct observation.
  2. Easy to analyze with modern data management and analysis software. Some analysis tools are usually installed on most PCs.
  3. Relatively cheap to mail out questionnaires or one-time keys to take the survey online.
  4. Even forced sampling is more easily accomplished with computers using auto dialing routines.
  5. Interviews by humans or computers can be better than questionnaires in capturing information about how the respondents reacted to the individual questions.
  6. As we lose privacy with the increase in commercial and governmental data mining it will be increasingly easy to collect behavior without the knowledge or consent of the people being studied.
  7. If anonymity is assured, there are relatively few ethical issues in scientific polling. Political push polling is another matter entirely.

Weaknesses:

  1. Regardless of perceived anonymity people do, both wittingly and unwittingly, misreport their characteristics or behavior to be more acceptable than they actually are.
  2. People do try to figure out what you are trying to get at and may be influenced by that in their answers.
  3. Particularly with questionnaires that are distributed through the mail, even though the initial sample to whom things were mailed may be either properly random or properly stratified, the sampling may be disrupted by differential response rates by different sorts of people. For example, on politically charged topics people who feel strongly one way or the other participate more heavily than do the people in the middle who are usually more numerous. On lifestyle issues people whose behavior may seem extreme or radical compared to the norm are sometimes more likely to fill out the questionnaire.
  4. Interviews, which are more likely to use forced sampling to maintain randomness or structure of the sample even in the face of people who decline to participate, are more expensive since they involve more members of the team who have to be trained and frequently paid to conduct the interviews. D2d interviews are even more time consuming and thus more expensive.
  5. Ultimately, even when reporting behavior, surveys are more about thoughts and ideas than the actual behavior.
  6. As we lose privacy with the increase in commercial data mining our behavior will increasingly be reported without our knowledge or consent.
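
Weakness 3 above, differential response rates, can be illustrated with simple arithmetic. The sketch below compares the mix of people a questionnaire was mailed to with the mix of people who send it back. The three categories, the numbers mailed and the response rates are all invented for illustration.

```python
def respondent_mix(groups):
    """Compare each category's share of the mailing with its share of the returns.

    groups maps a label to (number mailed, response rate).
    Returns label -> (share of mailed sample, share of returned questionnaires).
    """
    mailed_total = sum(n for n, _ in groups.values())
    returned = {label: n * rate for label, (n, rate) in groups.items()}
    returned_total = sum(returned.values())
    return {
        label: (groups[label][0] / mailed_total, returned[label] / returned_total)
        for label in groups
    }

# Hypothetical mailing on a politically charged topic.
groups = {
    "strongly for": (200, 0.6),
    "middle": (600, 0.2),
    "strongly against": (200, 0.6),
}

for label, (mailed, came_back) in respondent_mix(groups).items():
    print(label, round(100 * mailed), "% mailed,", round(100 * came_back), "% of returns")
```

In this made-up case the people in the middle are 60% of the mailing but only about a third of the returned questionnaires, even though the mailing itself was properly drawn.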

ARCHIVAL RESEARCH

In the past this was thought to be primarily the domain of historians, but increasingly after the mid-1900's, some historians themselves began to think of their discipline as a kind of social science, and historical data began to be used by sociologists, anthropologists, economists, political scientists and cultural geographers. People often think of the past as something that is easily reconstructed from written records, but the cross fertilization of all these fields raised such basic questions as "What is a document of the past?" Folktales? Myths? Legends? Oral histories? Architecture? Art? "Why are some things from the past preserved and others are not?" Might behavior in the past be revealed by looking in new ways for remains of that past behavior?

As a basic example, historians and social scientists from sociology and anthropology had for a long time assumed that the modern nuclear family, increasingly today becoming a broken nuclear family, was a result of the breakdown of traditional society caused by the industrial revolution.

Historians' views of families of the past were distorted by their past concentration on the history of the elites, the ones who wrote letters, kept diaries and were frequently documented by others. Even the artifacts produced for and used by the elites are more likely to survive over the years because of their quality, value and historical significance. Such families tended to be multigenerational and even to be extended with several married children sharing country estates and elaborate urban residences, depending on the season.

Sociologists' views were distorted by their tendency to focus on contemporary social problems of whatever period the individual sociologist was living in. Comparing the sometimes broken families of the mid to late 1800s which had been studied at that time, to the increasingly broken families of the 20th century without adequate detailed attention to the histories of those poorer people before industrialization led sociologists to infer a regression line back to an imagined state of the peasants living in the past in families like those of the wealthy.

Anthropologists' views were distorted by their tendency to treat the aboriginal people in all of the conquered parts of the world that the Europeans had colonized in the Age of Exploration and the Industrial Revolution as though those natives had been living from time immemorial in the manner in which they appeared to the Europeans in the 18th and 19th centuries.

All three of these disciplines began to realize at about the same time that the history of these "people without history" was perhaps very different from what had been assumed based on the history of "people with history."

The problem was how to recover the history where there seemed to be none? Social scientists began to look for different documents and for previously overlooked information in documents that had previously been used to study the behavior of the elites. For example, travel passes issued by slave owners to slaves to visit neighboring plantations, followed by transfers of slaves between plantation owners which had not been looked at seriously before, revealed a pattern of courting and family formation among some African American slaves that hitherto had not been noticed by historians.

Content analysis of paintings could be used to reveal past attitudes and behaviors, such as the changing depictions of children in paintings from medieval to modern times, suggesting that in the past children went straight from being babies to being workers on farms and in mills and factories.

Another of the tools is archeology (I prefer the simpler spelling) which treats physical remnants of past behavior as documents. With the advent of radiological and chemical dating techniques following World War II it was possible to dig up an ancient Native American, African or English peasant house from several hundred years ago and have some confidence in how old it is. Increasing sophistication in recording the materials coming out of archeological sites allowed fuller reconstruction of the activities at the time the sites were "alive", including how many people lived there.

By the 1950's archeologists were using punch card sorters to analyze the distributions of artifacts dug out of the ground, and by the 1960's some were fortunate enough to have access to computers to catalog, store and analyze their data. Sites that were being lived in were excavated using archeological techniques to check what can be learned using just archeology versus archeology and other data-gathering techniques together. These kinds of comparisons helped archeologists make better sense of the artifacts they were able to recover and the way in which the artifacts had originally been distributed in the sites. As computers become more powerful, reconstructions and simulations become easier to do and more complete.

DNA mapping is being correlated with other evidence of past behavior giving better insights into migrations and other demographic dynamics of the past.

Satellite mapping is revealing past civilizations in such unlikely places as the depths of the Amazon jungle and the Western Sahara.

Ground penetrating radar can be used to locate and map buildings buried under the surface of the earth without having to dig and disrupt the present-day use of the land.

Better understanding of accomplishments of non-Western peoples before contact with Europeans and the frequent serious disruptions of native societies even before they came into direct contact with Europeans is increasing our awareness of unintended consequences of our actions and the fragility of even the most-developed civilizations in the face of environmental and economic change.

Strengths:

  1. The people being studied cannot alter their behavior because of the current study or the presence of the current researcher. However the researcher needs to be aware of the context in which the "document" of the past was created, but that has always been the essence of the historical method, historical criticism. What are the likely sources of bias, sampling error or other possible sources of distortion?
  2. Some sources of data are readily available either in microfilm or microfiche or in electronically readable format, some even readily available online for free. Examples: Mormon genealogical database, GSS, various government surveys. The entire body of ancient Greek documents has been available in electronically readable form since at least the 1970's. Even for sources that are not online or in portable format, indexes for more and more sources of data are going online making it possible for more researchers to locate archival materials all around the world.
  3. With imagination it is possible to study more things through archives than once thought possible.
  4. It is frequently one of the least expensive ways for the lone scholar with limited expenses to study significant problems.

Weaknesses:

  1. Even though research subjects are not intruded on in any way there are still ethical issues inherent in possibly making people or authorities aware of information that has been considered private. The U.S. census enforces a strict 72-year-confidentiality rule on its raw data, but the Mormon Church exposes data about families and even living individuals in its database that are still considered private or confidential by the people in question. This has led some organizations to issue takedown requests with the LDS Church, for example Jewish groups requesting the takedown of Holocaust data and other Jewish material.
  2. As an extension of the previous point, some material may be so closely guarded that only people who agree to follow guidelines set by those who control the material are given access. While on the surface this seems fair and reasonable, it has interfered with some important potential studies. For example, Sigmund Freud's descendants still guard his papers so jealously that scholars who raise legitimate questions about Freud's methods and interpretations cannot gain access to his papers where those things might be made more clear. Those who do have access sometimes are not as critical as they should be of Freud's work.
  3. You can only study what has left traces, although imagination can sometimes find traces where none were thought to exist.
  4. For archives that are not portable, travel and living expenses can be considerable. If the archives have not been catalogued the researcher will have to spend a lot of time cataloguing and calendaring the materials. At least the archeologist frequently gets to take his/her materials back to the home lab and spend years cataloguing and analyzing them.
  5. Archeology as a means of accessing "archives" of remnants of past behavior can be horribly expensive. Add to the travel and living expenses for the lone researcher listed above those of all the members of the team who are working away from home. Then add the costs of local labor, consumable tools such as shovels, picks and trowels and equipment rental for earth-moving equipment. Also add the costs of satellite imaging (going down in recent years), GP radar, radiological and chemical tests. I am sure I am missing some major categories.
  6. Archeological sites and non-portable archives frequently become unavailable for long periods of time because of military and political conflicts, or because of budget cutbacks for the institutions housing the documents or artifacts.

PARTICIPANT OBSERVATION

The basic premise of participant observation (P.O.) is that it is possible for the researcher to observe behavior in real time in its full context without having to depend on self-reporting, the accidents of behavior or attitudes being documented or the accidents of artifacts of past behavior being preserved to be studied in the future.

Strengths:

  1. It is possible to gain an understanding of the complete context of the behavior that is being studied.
  2. Every aspect of the society/culture of the small community can be studied if the period of study is long enough. This is called the holistic approach.
  3. Can be carried out by a lone researcher.
  4. Lengthy presence in the community makes it less likely that the research subjects will continue to be able to try to mislead the researcher about their real behavior. What is recognized as unacceptable behavior by the outside world or by members of the local community is more likely to be exposed. On the flipside, posing behavior designed to impress outsiders in a negative or positive fashion is frequently discontinued by the research subjects as the researcher becomes part of the daily situation.
  5. Lots of studies of similar communities in lots of different parts of the world to compare findings with.

Weaknesses:

  1. The ethics of P.O. deal mostly with questions of disrupting the lives of the research subjects during the research or as a consequence of the publication of the research. Additional ethical dilemmas have to do with how much to reveal. Lengthy observation with no consent by the subjects is rarely attempted, but would be considered unethical today by most social scientists. Overinvolvement in the lives of the subjects raises questions of both ethics and bias.
  2. Gaining rapport may be difficult, taking as long as several months to several years. Some studies have been abandoned because of this difficulty.
  3. Time consuming. The best studies last a year or longer, during which the researcher frequently has reduced or no income. This requires researchers to compete for grants to support the research.
  4. Can be dangerous. Exotic diseases, poisonous animals, ferocious beasts, violent people, food that is quite at odds with personal preferences or past experience all can be problems.
  5. Small sample size, particularly if only one researcher is carrying out the study. Is my neighborhood or village comparable to other similar ones elsewhere?
  6. Researcher bias can influence findings. One researcher of a village in Mexico saw a cooperative community in which people worked for the common good, sometimes sacrificing their own opportunities. A later researcher of the same village found people constantly engaging in gossip and criticism of others, particularly those who seemed to be doing better. Which one was right? Both. Subsequent studies of peasant villages in similar circumstances even in other societies show that members of the peasant community frequently coerce cooperation from their neighbors through criticism and other negative informal social sanctions. We are fortunate to have the two studies, but if either study had to be evaluated on its own, we would be suspicious of it and probably not be misled despite the fame and reputation of both scholars.

© 2010, R. Stewart Ellis
Edit History:  Mon Feb 22 20:19:57 EST 2010 - add ethics bullet for P.O.
               Mon Mar 8 12:24:44 EST 2010 - 2 minor editing errors per a student
               Tue Apr 27 11:59:06 EDT 2010 - added missing prepositions
               Thu Feb  3 14:34:44 EST 2011 - split infinitive, sent. frag.
               Mon Mar 10 17:06:12 EDT 2014 - spelling of members.