THREE STAGES OF DATA ANALYSIS
The three major stages of data analysis can be described as follows.
Getting to Know the Data
In the first stage we want to become familiar with the data. This is an exploratory or investigative stage (Tukey, 1977). We inspect the data carefully, get a feel for it, and even, as some experts have said, “make friends” with it (Hoaglin, Mosteller, & Tukey, 1991, p. 42). Questions we ask include, What is going on in this number set? Are there errors in the data? Do the data make sense or are there reasons for “suspecting fishiness” (Abelson, 1995, p. 78)? Visual displays of distributions of numbers are important at this stage. What do the data look like? Only when we have become familiar with the general features of the data, have checked for errors, and have assured ourselves that the data make sense, should we proceed to the second stage.
• We begin data analysis by examining the general features of the data and
editing or “cleaning” the data as necessary.
• It is important to check carefully for errors such as missing or impossible
values (e.g., numbers outside the range of a given scale), as well as outliers.
• A stem-and-leaf display is particularly useful for visualizing the general
features of a data set and for detecting outliers.
• Data can be effectively summarized numerically, pictorially, or verbally;
good descriptions of data frequently use all three modes.
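The checks described in the points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores, the 0-100 scale, and the function names screen_scores and stem_and_leaf are hypothetical and do not come from the text. The sketch flags missing and impossible values, then prints a stem-and-leaf display of the remaining scores so that any outlier stands apart from the rest.

```python
from collections import defaultdict


def screen_scores(scores, low, high):
    """Split raw scores into missing positions, impossible values, and valid values."""
    missing = [i for i, s in enumerate(scores) if s is None]
    impossible = [s for s in scores if s is not None and not (low <= s <= high)]
    valid = [s for s in scores if s is not None and low <= s <= high]
    return missing, impossible, valid


def stem_and_leaf(values):
    """Build stem-and-leaf rows for non-negative integers (stem = tens, leaf = ones)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    rows = []
    # Include empty stems so that gaps in the distribution remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}")
    return rows


# Hypothetical scores on an assumed 0-100 scale; None marks a missing response.
raw = [45, 52, 58, 61, 63, 67, None, 70, 71, 74, 78, 140, 82, 12]
missing, impossible, valid = screen_scores(raw, low=0, high=100)
print("missing at positions:", missing)
print("impossible values:", impossible)
for row in stem_and_leaf(valid):
    print(row)
```

In this made-up data set the score of 12 sits alone on its stem, separated from the other scores by two empty stems, which is the kind of pattern that would prompt a closer look at that observation.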
Summarizing the Data
In the second stage we seek to summarize the data in a meaningful way. The use of descriptive statistics and creation of graphical displays are important at this stage. How should the data be organized? Which ways of describing and summarizing the data are most informative? What happened in this study as a function of the factors of interest? What trends and patterns do we see? Which graphical display best reveals these trends and patterns? When the data are appropriately summarized, we are ready to move to the confirmation stage.
• Measures of central tendency include the mean, median, and mode.
• Important measures of dispersion or variability are the range and standard
deviation.
• The standard error of the mean is the standard deviation of the theoretical
sampling distribution of means and is a measure of how well we have estimated the population mean.
• Effect size measures are important because they provide information about
the strength of the relationship between the independent variable and the dependent variable that is independent of sample size.
• An important effect size measure when comparing two means is Cohen’s d.
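The summary measures listed above can be computed with Python's standard library. The sketch below rests on assumptions: the two groups of scores are invented for illustration, the names describe and cohens_d are hypothetical, the standard deviation uses n - 1 in the denominator, and Cohen's d is computed with a pooled standard deviation, which is one common definition and may differ slightly from the formula a particular textbook uses.

```python
import math
import statistics as st


def describe(scores):
    """Return common measures of central tendency and variability."""
    n = len(scores)
    sd = st.stdev(scores)  # sample standard deviation (n - 1 in the denominator)
    return {
        "n": n,
        "mean": st.mean(scores),
        "median": st.median(scores),
        "mode": st.mode(scores),
        "range": max(scores) - min(scores),
        "sd": sd,
        "sem": sd / math.sqrt(n),  # estimated standard error of the mean
    }


def cohens_d(group1, group2):
    """Cohen's d: difference between two means divided by the pooled SD."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * st.variance(group1) + (n2 - 1) * st.variance(group2)
    ) / (n1 + n2 - 2)
    return (st.mean(group1) - st.mean(group2)) / math.sqrt(pooled_var)


# Hypothetical scores for two conditions of an experiment.
group_a = [2, 4, 4, 6, 9]
group_b = [1, 3, 3, 5, 8]

stats_a = describe(group_a)
print(stats_a)
print("Cohen's d:", round(cohens_d(group_a, group_b), 3))
```

Because d expresses the mean difference in standard deviation units, its expected value does not grow with the number of participants, which is why it complements rather than duplicates the standard error.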
Confirming What the Data Reveal
In the third stage we decide what the data tell us about behavior. Do the data confirm our tentative claim (research hypothesis) made at the beginning of the study? What can we claim based on the evidence? Sometimes we look for a categorical, yes-no judgment, and act as judge and jury to render a verdict. Do we have evidence to convict? Yes or no: Is the effect real? At this stage we may use various statistical techniques to counter arguments that our results are simply “due to chance.” Null hypothesis testing, when appropriate, is performed at this stage of analysis. Our evaluation of the data, however, need not always lead us to a categorical judgment about the data (e.g., Schmidt, 1996). We don’t, in other words, have to attempt a definitive statement about the “truth” of the results. Our claim about behavior may be based on an evaluation of the probable range of effect sizes for the variable of interest. What, in other words, is likely to happen when this variable is present? Confidence intervals are particularly recommended for this kind of evaluation (e.g., Cohen, 1995; Hunter, 1997; Loftus, 1996).
The confirmation process actually begins at the first or exploratory stage of data analysis, when we first get a feel for what our data are like. As we examine the general features of the data, we start to appreciate what we found. In the summary stage we learn more about trends and patterns among the observations. This provides feedback that helps to confirm our hypotheses. The final step in data analysis is called the confirmation stage to emphasize that it is typically at this point when we come to a decision about what the data mean. Information obtained at each stage of data analysis, however, contributes to this confirmatory process (e.g., Tukey, 1977).
An important approach to confirming what the data are telling us is to construct confidence intervals for the population parameter, such as a mean or difference between two means.
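A confidence interval for a mean, or for a difference between two means, can be sketched as follows. This is a rough illustration rather than a recommended procedure: the data and the function names mean_ci and mean_diff_ci are hypothetical, and the critical value comes from the normal distribution, which is a large-sample approximation. With samples as small as the ones shown here, a critical value from the t distribution (taken from a table or a statistics package) would give a somewhat wider interval.

```python
import math
from statistics import NormalDist, mean, stdev, variance


def critical_z(confidence):
    """Two-sided critical value from the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def mean_ci(scores, confidence=0.95):
    """Approximate confidence interval for a population mean."""
    m = mean(scores)
    sem = stdev(scores) / math.sqrt(len(scores))
    margin = critical_z(confidence) * sem
    return m - margin, m + margin


def mean_diff_ci(group1, group2, confidence=0.95):
    """Approximate confidence interval for the difference between two independent means."""
    diff = mean(group1) - mean(group2)
    se = math.sqrt(
        variance(group1) / len(group1) + variance(group2) / len(group2)
    )
    margin = critical_z(confidence) * se
    return diff - margin, diff + margin


# Hypothetical scores for two conditions of an experiment.
group_a = [2, 4, 4, 6, 9]
group_b = [1, 3, 3, 5, 8]

print("CI for mean of group A:", mean_ci(group_a))
print("CI for difference A - B:", mean_diff_ci(group_a, group_b))
```

An interval for the difference that includes zero, as it does for these invented scores, indicates that the data are compatible with no difference between the population means.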
THE ANALYSIS STORY
butler (not the cook) might have done it. Abelson (1995) makes a similar point regarding a research argument:
High-quality evidence, embodying sizeable, well-articulated and general effects, is necessary for a statistical argument to have maximal persuasive impact, but it is not sufficient. Also vital are the attributes of the research story embodying the argument. (p. 13)
Consequently, when data analysis is completed, we must construct a coherent narrative that explains our findings, counters opposing interpretations, and justifies our conclusions. In Chapters 12 and 13 we’ll return to the analysis story when we introduce guidelines to help you develop an appropriate narrative for your research study.
COMPUTER-ASSISTED DATA ANALYSIS
• Researchers typically use computers to carry out the statistical analysis
of data.
• Carrying out statistical analyses using computer software requires that the
researcher have a good knowledge of research design and statistics.
Most researchers have ready access to computers that include appropriate software to carry out the statistical analysis of data sets. The ability to set up and carry out an analysis using a statistical software package and the ability to interpret the output are essential skills that must be learned by researchers. Some of the more popular software packages are known by abbreviations like BMDP, SAS, SPSS, and STATA. You likely have access to one or more of these programs on the computers in your psychology department or at your campus computer center, or perhaps even on your laptop.
ILLUSTRATION: DATA ANALYSIS FOR AN EXPERIMENT COMPARING MEANS
How many words do you know? That is, what is the size of your vocabulary? You may have asked yourself this question as you prepared for college entrance exams such as the SAT or ACT, or perhaps it crossed your mind as you thought about preparing for professional school exams such as the LSAT or GRE, as all of these exams emphasize vocabulary knowledge. Surprisingly, estimating a person’s vocabulary size is a complex task (e.g., Anglin, 1993; Miller & Wakefield, 1993). Problems immediately arise, for instance, when we begin to think about what we mean by a “word.” Is “play, played, playing” one word or three? Are we interested in highly technical or scientific words, including six-syllable names of chemical compounds? What about made-up words, or the name of your dog, or the word you use to call your significant other? One rather straightforward approach is to ask how many words a person knows in a dictionary of the English language. But even here we run into difficulties because dictionaries vary in size and scope, and thus results will vary depending on the specific dictionary that was used to select a word sample. And, of course, estimates of vocabulary knowledge will vary depending on how knowledge is tested. Multiple-choice tests will reveal more knowledge than will tests requiring written definitions of words.