
Chapter 5: Psychological Measurement

Reliability and Validity of Measurement

Learning Objectives

  • Define reliability, including the different types and how they are assessed.
  • Define validity, including the different types and how they are assessed.
  • Describe the kinds of evidence that would be relevant to assessing the reliability and validity of a particular measure.

Again, measurement involves assigning scores to individuals so that they represent some characteristic of the individuals. But how do researchers know that the scores actually represent the characteristic, especially when it is a construct like intelligence, self-esteem, depression, or working memory capacity? The answer is that they conduct research using the measure to confirm that the scores make sense based on their understanding of the construct being measured. This is an extremely important point. Psychologists do not simply  assume  that their measures work. Instead, they collect data to demonstrate  that they work. If their research does not demonstrate that a measure works, they stop using it.

As an informal example, imagine that you have been dieting for a month. Your clothes seem to be fitting more loosely, and several friends have asked if you have lost weight. If at this point your bathroom scale indicated that you had lost 10 pounds, this would make sense and you would continue to use the scale. But if it indicated that you had gained 10 pounds, you would rightly conclude that it was broken and either fix it or get rid of it. In evaluating a measurement method, psychologists consider two general dimensions: reliability and validity.

Reliability

Reliability  refers to the consistency of a measure. Psychologists consider three types of consistency: over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).

Test-Retest Reliability

When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time.  Test-retest reliability  is the extent to which this is actually the case. For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week. This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent.

Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on the same group of people at a later time, and then looking at the test-retest correlation between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing Pearson’s r. Figure 5.2 shows the correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered two times, a week apart. Pearson’s r for these data is +.95. In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.

Figure 5.2 Test-retest correlation between two administrations of the Rosenberg Self-Esteem Scale (score at Time 1 on the x-axis, score at Time 2 on the y-axis), showing fairly consistent scores.
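For readers who want to try the computation, a minimal sketch in Python is shown below. The scores are hypothetical (they are not the Figure 5.2 data), and the example assumes the NumPy and SciPy libraries are available.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical self-esteem scores for the same eight people, tested one week apart
time1 = np.array([22, 25, 18, 30, 27, 24, 19, 28])
time2 = np.array([23, 24, 17, 29, 28, 25, 20, 27])

r, p = pearsonr(time1, time2)          # Pearson's r between the two administrations
print(f"Test-retest r = {r:+.2f}")     # values of +.80 or greater are conventionally "good"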

Again, high test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions. But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern.

Internal Consistency

A second kind of reliability is  internal consistency , which is the consistency of people’s responses across the items on a multiple-item measure. In general, all the items on such measures are supposed to reflect the same underlying construct, so people’s scores on those items should be correlated with each other. On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that they have a number of good qualities. If people’s responses to the different items are not correlated with each other, then it would no longer make sense to claim that they are all measuring the same underlying construct. This is as true for behavioural and physiological measures as for self-report measures. For example, people might make a series of bets in a simulated game of roulette as a measure of their level of risk seeking. This measure would be internally consistent to the extent that individual participants’ bets were consistently high or low across trials.

Like test-retest reliability, internal consistency can only be assessed by collecting and analyzing data. One approach is to look at a  split-half correlation . This involves splitting the items into two sets, such as the first and second halves of the items or the even- and odd-numbered items. Then a score is computed for each set of items, and the relationship between the two sets of scores is examined. For example, Figure 5.3 shows the split-half correlation between several university students’ scores on the even-numbered items and their scores on the odd-numbered items of the Rosenberg Self-Esteem Scale. Pearson’s  r  for these data is +.88. A split-half correlation of +.80 or greater is generally considered good internal consistency.

Figure 5.3 Split-half correlation on the Rosenberg Self-Esteem Scale (score on the even-numbered items on the x-axis, score on the odd-numbered items on the y-axis), showing fairly consistent scores.
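As an illustration only, the following sketch simulates responses from 50 hypothetical people to a 10-item scale and then computes the odd-even split-half correlation; the simulated data stand in for real questionnaire responses.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
trait = rng.normal(0, 1, size=50)                            # latent trait for 50 hypothetical people
items = trait[:, None] + rng.normal(0, 0.7, size=(50, 10))   # 10 noisy items per person

odd_half = items[:, 0::2].sum(axis=1)    # total score on items 1, 3, 5, 7, 9
even_half = items[:, 1::2].sum(axis=1)   # total score on items 2, 4, 6, 8, 10

r, _ = pearsonr(odd_half, even_half)
print(f"Split-half r = {r:+.2f}")        # +.80 or greater is generally considered good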

Perhaps the most common measure of internal consistency used by researchers in psychology is a statistic called  Cronbach’s α  (the Greek letter alpha). Conceptually, α is the mean of all possible split-half correlations for a set of items. For example, there are 252 ways to split a set of 10 items into two sets of five. Cronbach’s α would be the mean of the 252 split-half correlations. Note that this is not how α is actually computed, but it is a correct way of interpreting the meaning of this statistic. Again, a value of +.80 or greater is generally taken to indicate good internal consistency.
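In practice, α is usually computed from item and total-score variances rather than by averaging every possible split-half correlation. The short function below is a sketch of that standard variance-based formula; the items argument is assumed to be a respondents-by-items matrix like the simulated one above.

import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                               # number of items
    item_variances = items.var(axis=0, ddof=1)       # variance of each item across respondents
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Example: cronbach_alpha(items) applied to the simulated 50 x 10 matrix above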

Inter-rater Reliability

Many behavioural measures involve significant judgment on the part of an observer or a rater.  Inter-rater reliability  is the extent to which different observers are consistent in their judgments. For example, if you were interested in measuring university students’ social skills, you could make video recordings of them as they interacted with another student whom they were meeting for the first time. Then you could have two or more observers watch the videos and rate each student’s level of social skills. To the extent that each participant does in fact have some level of social skills that can be detected by an attentive observer, different observers’ ratings should be highly correlated with each other. Inter-rater reliability would also have been measured in Bandura’s Bobo doll study. In this case, the observers’ ratings of how many acts of aggression a particular child committed while playing with the Bobo doll should have been highly positively correlated. Inter-rater reliability is often assessed using Cronbach’s α when the judgments are quantitative or an analogous statistic called Cohen’s κ (the Greek letter kappa) when they are categorical.
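The sketch below shows one way Cohen's κ can be computed for two raters who assign categorical codes to the same set of cases; the codes and the two raters are invented for the example.

import numpy as np

def cohens_kappa(rater1, rater2):
    """Agreement between two raters' categorical codes, corrected for chance agreement."""
    rater1, rater2 = np.asarray(rater1), np.asarray(rater2)
    categories = np.union1d(rater1, rater2)
    p_observed = np.mean(rater1 == rater2)   # proportion of cases on which the raters agree
    p_chance = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Two hypothetical observers coding the same ten behaviours as aggressive (A) or not (N)
rater_a = ["A", "N", "A", "A", "N", "A", "N", "N", "A", "A"]
rater_b = ["A", "N", "A", "N", "N", "A", "N", "N", "A", "A"]
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")   # prints kappa = 0.80 for these codes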

Validity

Validity is the extent to which the scores from a measure represent the variable they are intended to. But how do researchers make this judgment? We have already considered one factor that they take into account—reliability. When a measure has good test-retest reliability and internal consistency, researchers should be more confident that the scores represent what they are supposed to. There has to be more to it, however, because a measure can be extremely reliable but have no validity whatsoever. As an absurd example, imagine someone who believes that people’s index finger length reflects their self-esteem and therefore tries to measure self-esteem by holding a ruler up to people’s index fingers. Although this measure would have extremely good test-retest reliability, it would have absolutely no validity. The fact that one person’s index finger is a centimetre longer than another’s would indicate nothing about which one had higher self-esteem.

Discussions of validity usually divide it into several distinct “types.” But a good way to interpret these types is that they are other kinds of evidence—in addition to reliability—that should be taken into account when judging the validity of a measure. Here we consider four basic kinds: face validity, content validity, criterion validity, and discriminant validity.

Face Validity

Face validity  is the extent to which a measurement method appears “on its face” to measure the construct of interest. Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities. So a questionnaire that included these kinds of items would have good face validity. The finger-length method of measuring self-esteem, on the other hand, seems to have nothing to do with self-esteem and therefore has poor face validity. Although face validity can be assessed quantitatively—for example, by having a large sample of people rate a measure in terms of whether it appears to measure what it is intended to—it is usually assessed informally.

Face validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to. One reason is that it is based on people’s intuitions about human behaviour, which are frequently wrong. It is also the case that many established measures in psychology work quite well despite lacking face validity. The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality characteristics and disorders by having people decide whether each of 567 different statements applies to them—where many of the statements do not have any obvious relationship to the construct that they measure. For example, the items “I enjoy detective or mystery stories” and “The sight of blood doesn’t frighten me or make me sick” both measure the suppression of aggression. In this case, it is not the participants’ literal answers to these questions that are of interest, but rather whether the pattern of the participants’ responses to a series of questions matches those of individuals who tend to suppress their aggression.

Content Validity

Content validity  is the extent to which a measure “covers” the construct of interest. For example, if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then his measure of test anxiety should include items about both nervous feelings and negative thoughts. Or consider that attitudes are usually defined as involving thoughts, feelings, and actions toward something. By this conceptual definition, a person has a positive attitude toward exercise to the extent that he or she thinks positive thoughts about exercising, feels good about exercising, and actually exercises. So to have good content validity, a measure of people’s attitudes toward exercise would have to reflect all three of these aspects. Like face validity, content validity is not usually assessed quantitatively. Instead, it is assessed by carefully checking the measurement method against the conceptual definition of the construct.

Criterion Validity

Criterion validity  is the extent to which people’s scores on a measure are correlated with other variables (known as  criteria ) that one would expect them to be correlated with. For example, people’s scores on a new measure of test anxiety should be negatively correlated with their performance on an important school exam. If it were found that people’s scores were in fact negatively correlated with their exam performance, then this would be a piece of evidence that these scores really represent people’s test anxiety. But if it were found that people scored equally well on the exam regardless of their test anxiety scores, then this would cast doubt on the validity of the measure.

A criterion can be any variable that one has reason to think should be correlated with the construct being measured, and there will usually be many of them. For example, one would expect test anxiety scores to be negatively correlated with exam performance and course grades and positively correlated with general anxiety and with blood pressure during an exam. Or imagine that a researcher develops a new measure of physical risk taking. People’s scores on this measure should be correlated with their participation in “extreme” activities such as snowboarding and rock climbing, the number of speeding tickets they have received, and even the number of broken bones they have had over the years. When the criterion is measured at the same time as the construct, criterion validity is referred to as concurrent validity ; however, when the criterion is measured at some point in the future (after the construct has been measured), it is referred to as predictive validity (because scores on the measure have “predicted” a future outcome).
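To make the logic concrete, the sketch below simulates scores on a hypothetical test-anxiety measure together with two criteria, one concurrent (an exam written the same week) and one predictive (the final course grade); all numbers are invented.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n = 100
test_anxiety = rng.normal(50, 10, n)                           # scores on the hypothetical new measure
exam_grade = 85 - 0.4 * test_anxiety + rng.normal(0, 6, n)     # concurrent criterion
course_grade = 80 - 0.3 * test_anxiety + rng.normal(0, 8, n)   # predictive criterion, measured later

for label, criterion in [("exam grade (concurrent)", exam_grade),
                         ("course grade (predictive)", course_grade)]:
    r, _ = pearsonr(test_anxiety, criterion)
    print(f"r(test anxiety, {label}) = {r:+.2f}")   # negative correlations would support criterion validity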

Criteria can also include other measures of the same construct. For example, one would expect new measures of test anxiety or physical risk taking to be positively correlated with existing measures of the same constructs. This is known as convergent validity .

Assessing convergent validity requires collecting data using the measure. Researchers John Cacioppo and Richard Petty did this when they created their self-report Need for Cognition Scale to measure how much people value and engage in thinking (Cacioppo & Petty, 1982) [1] . In a series of studies, they showed that people’s scores were positively correlated with their scores on a standardized academic achievement test, and that their scores were negatively correlated with their scores on a measure of dogmatism (which represents a tendency toward obedience). In the years since it was created, the Need for Cognition Scale has been used in literally hundreds of studies and has been shown to be correlated with a wide variety of other variables, including the effectiveness of an advertisement, interest in politics, and juror decisions (Petty, Briñol, Loersch, & McCaslin, 2009) [2] .

Discriminant Validity

Discriminant validity , on the other hand, is the extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct. For example, self-esteem is a general attitude toward the self that is fairly stable over time. It is not the same as mood, which is how good or bad one happens to be feeling right now. So people’s scores on a new measure of self-esteem should not be very highly correlated with their moods. If the new measure of self-esteem were highly correlated with a measure of mood, it could be argued that the new measure is not really measuring self-esteem; it is measuring mood instead.

When they created the Need for Cognition Scale, Cacioppo and Petty also provided evidence of discriminant validity by showing that people’s scores were not correlated with certain other variables. For example, they found only a weak correlation between people’s need for cognition and a measure of their cognitive style—the extent to which they tend to think analytically by breaking ideas into smaller parts or holistically in terms of “the big picture.” They also found no correlation between people’s need for cognition and measures of their test anxiety and their tendency to respond in socially desirable ways. All these low correlations provide evidence that the measure is reflecting a conceptually distinct construct.
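A rough sketch of how convergent and discriminant evidence might look side by side, again with simulated data: a new self-esteem measure should correlate strongly with an established self-esteem measure and only weakly with a measure of a distinct construct such as mood.

import numpy as np

rng = np.random.default_rng(2)
n = 120
self_esteem = rng.normal(0, 1, n)                      # latent construct
new_measure = self_esteem + rng.normal(0, 0.5, n)      # hypothetical new self-esteem scale
established = self_esteem + rng.normal(0, 0.5, n)      # an existing self-esteem scale
mood = rng.normal(0, 1, n)                             # conceptually distinct construct

r_convergent = np.corrcoef(new_measure, established)[0, 1]    # expected to be high
r_discriminant = np.corrcoef(new_measure, mood)[0, 1]         # expected to be near zero
print(f"convergent r = {r_convergent:+.2f}, discriminant r = {r_discriminant:+.2f}")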

Key Takeaways

  • Psychological researchers do not simply assume that their measures work. Instead, they conduct research to show that they work. If they cannot show that they work, they stop using them.
  • There are two distinct criteria by which researchers evaluate their measures: reliability and validity. Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to.
  • Validity is a judgment based on various types of evidence. The relevant evidence includes the measure’s reliability, whether it covers the construct of interest, and whether the scores it produces are correlated with other variables they are expected to be correlated with and not correlated with variables that are conceptually distinct.
  • The reliability and validity of a measure is not established by any single study but by the pattern of results across multiple studies. The assessment of reliability and validity is an ongoing process.
Exercises

  • Practice: Ask several friends to complete the Rosenberg Self-Esteem Scale. Then assess its internal consistency by making a scatterplot to show the split-half correlation (even- vs. odd-numbered items). Compute Pearson’s r too if you know how.
  • Discussion: Think back to the last college exam you took and think of the exam as a psychological measure. What construct do you think it was intended to measure? Comment on its face and content validity. What data could you collect to assess its reliability and criterion validity?
  • Cacioppo, J. T., & Petty, R. E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42, 116–131.
  • Petty, R. E., Briñol, P., Loersch, C., & McCaslin, M. J. (2009). The need for cognition. In M. R. Leary & R. H. Hoyle (Eds.), Handbook of individual differences in social behaviour (pp. 318–329). New York, NY: Guilford Press.

Glossary

Reliability: The consistency of a measure.

Test-retest reliability: The consistency of a measure over time.

Test-retest correlation: The consistency of a measure on the same group of people at different times.

Internal consistency: The consistency of people’s responses across the items on a multiple-item measure.

Split-half correlation: A method of assessing internal consistency by splitting the items into two sets and examining the relationship between them.

Cronbach’s α: A statistic in which α is the mean of all possible split-half correlations for a set of items.

Inter-rater reliability: The extent to which different observers are consistent in their judgments.

Validity: The extent to which the scores from a measure represent the variable they are intended to.

Face validity: The extent to which a measurement method appears to measure the construct of interest.

Content validity: The extent to which a measure “covers” the construct of interest.

Criterion validity: The extent to which people’s scores on a measure are correlated with other variables that one would expect them to be correlated with.

Criteria: In reference to criterion validity, the variables that one would expect to be correlated with the measure.

Concurrent validity: When the criterion is measured at the same time as the construct.

Predictive validity: When the criterion is measured at some point in the future (after the construct has been measured).

Convergent validity: When a new measure positively correlates with existing measures of the same construct.

Discriminant validity: The extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct.

Research Methods in Psychology - 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


3.3 Research Methods

Sociologists examine the social world, see a problem or interesting pattern, and set out to study it. Just as Matthew Desmond did when he approached his study of evictions in Milwaukee, researchers must decide what methodology to use when designing a study.

Planning the research design is a key step in any sociological study. Sociologists generally choose from widely used methods of social investigation:

  • primary source data collection, such as surveys, participant observation, ethnography, case studies, unobtrusive observations, and experiments
  • secondary data analysis, or use of existing sources

There are benefits and limitations to every research method. The topic of study and your research question strongly influence the methodology you select. When you are conducting research, think about the best way to gather knowledge about your topic. For instance, think of yourself as an architect. An architect needs a blueprint to build a house; as a sociologist, your blueprint is your research design, including your data collection method.

When entering a particular social environment, a researcher must be thoughtful. There are times to remain anonymous and times to be overt. Occasionally we conduct covert research, where people do not know they are being observed. Can you think of times when this would be the best approach to data collection?

Making sociologists’ presence invisible is not always realistic for other reasons. That option is not available to a researcher studying prison behaviors or early education. Researchers can’t just stroll into prisons or kindergarten classrooms and observe behaviors without attracting attention. In situations like these, other methods are needed. Researchers choose methods that best suit their study topics, that protect research participants or subjects, and that fit with their overall approaches to research.

3.3.1 Which Method to Use? Qualitative, Quantitative, or Mixed Methodology in Social Science Research

Quantitative research tends to refer to research that uses numerical data; the social world and experiences are translated into numbers that can be examined mathematically through statistical analysis. For example, through a survey we can learn a great deal about large populations, but we might miss some of the interactional processes and other data better collected through direct observation. Qualitative research tends to work with non-numerical data and attempts to understand the experiences of individuals and groups from their own perspectives. With qualitative approaches, researchers examine how groups participate in their own meaning making and development of culture. Researchers who take this approach may use ethnography, in-depth interviews, focus groups, and/or content analysis to examine social life. Qualitative data may involve the reading of texts and images. In the next section, we will explore some of these methodologies in greater detail.

Mixed methods research refers to the process of combining more than one method when conducting sociological research. This approach may help researchers gain a better understanding of the topic they are studying. Some research, like community-based research, focuses on improving social conditions in local communities by establishing partnerships between organizations and researchers.

Sociologists consider the benefits and limitations of each method to determine how they will design their study. For example, Desmond (2016) used ethnographic research to learn about the experiences of families in poverty who experienced eviction. Ethnographic research, or ethnography, refers to a qualitative research method in which a researcher observes a social setting to provide descriptions of a group, society, or organization. He lived and worked in the communities where the people he studied lived and talked with them about their experiences. This qualitative approach offers great insight into lived experiences and interactions that are observable. Desmond paired his qualitative approach with quantitative methods, specifically statistical analysis, to learn more about larger patterns related to evictions in the United States. He learned that what he observed in the families he studied was part of a larger trend in the country—evictions create more poverty for people who have low incomes. In the next section, you will learn how researchers use reliability, validity, and generalizability to evaluate studies.

3.3.2 Evaluating Research Methodologies

3.3.2.1 Reliability of Studies

Researchers design studies to maximize reliability , which refers to how likely research results are to be replicated if the study is reproduced. Reliability increases the likelihood that what is found for one person will hold for others in the group, or that what happens in one situation will happen again in another. Baking is a science, for instance. When a cook follows a recipe and measures ingredients with a baking tool, such as a measuring cup, the same results are obtained each time the same recipe and the same type of tool are used. The measuring cup introduces accuracy into the process. If a person uses a less accurate tool, such as their hand, to measure ingredients rather than a cup, the same result may not be replicated. Accurate tools and methods increase reliability.

3.3.2.2 Validity of Studies

Researchers also strive for validity , which refers to how well the study measures what it was designed to measure. To produce reliable and valid results, sociologists develop an operational definition, that is, they define each concept, or variable, in terms of the physical or concrete steps it takes to objectively measure it. The operational definition identifies an observable condition of the concept. By operationalizing the concept, all researchers can collect data in a systematic or replicable manner.

3.3.2.3 Generalizability of Studies

Generalizability , or the extent to which findings from a study can be applied to a larger population or to different circumstances, is another factor that some researchers strive for. As you learned in this chapter, not all research methods are designed to produce generalizable results; qualitative research instead offers depth and nuance on the topic being studied.

3.3.3 Licenses and Attributions for Research Methods

“Research Methods” second paragraph, first two sentences of fourth paragraph, first four sentences of fifth paragraph edited for consistency and brevity from “2.2 Research Methods” by Tonja R. Conerly, Kathleen Holmes, Asha Lal Tamang in Openstax Sociology 3e , which is licensed under CC BY 4.0 . Access for free at https://openstax.org/books/introduction-sociology-3e/pages/2-2-research-methods

All other content in this section is original content by Jennifer Puentes and is licensed under CC BY 4.0 .

“Which Method to Use? Qualitative, Quantitative, or Mixed Methodology in Social Science Research” is original content by Jennifer Puentes and is licensed under CC BY 4.0 .

Ethnography definition from the Open Education Sociology Dictionary is licensed under CC BY-SA 4.0 .

“Evaluating Research Methodologies” edited and remixed from “2.1 Approaches to Social Research” by Tonja R. Conerly, Kathleen Holmes, Asha Lal Tamang in Openstax Sociology 3e , which is licensed under CC BY 4.0 . Access for free at https://openstax.org/books/introduction-sociology-3e/pages/2-1-approaches-to-sociological-research

Generalizability definition from the Open Education Sociology Dictionary is licensed under CC BY-SA 4.0 .

Sociology in Everyday Life Copyright © by Matt Gougherty and Jennifer Puentes. All Rights Reserved.


Research Methodology

  • First Online: 29 June 2019


Vaneet Kaur

Part of the book series: Innovation, Technology, and Knowledge Management (ITKM)


The chapter presents the methodology employed to examine the framework developed during the literature review for the purpose of the present study. In light of the research objectives, the chapter sets out the ontology, epistemology, and methodology adopted. The research is based on a positivist philosophy, which postulates that phenomena of interest in the social world can be studied as concrete cause-and-effect relationships, following a quantitative research design and a deductive approach. Consequently, the present study uses the existing body of literature to deduce relationships between constructs and develops a strategy to test the proposed theory, with the ultimate objective of confirming and building upon existing knowledge in the field. The chapter then presents a roadmap for the study that lays out the journey towards achieving the research objectives in a series of well-defined, logical steps. The processes followed for building the survey instrument and for the sampling design are laid down in a similar manner. While the survey design enumerates the various methods adopted along with justifications, the sampling design sets forth the target population, sampling frame, sampling units, sampling method, and suitable sample size for the study. The chapter also spells out the operational definitions of the key variables before exhibiting the three-stage research process followed in the present study. In the first stage, the questionnaire was developed based upon key constructs from various theories and researchers in the field. Thereafter, the draft questionnaire was refined with the help of a pilot study, and its reliability and validity were tested. Finally, in light of the results of the pilot study, the questionnaire was finalized and the final data were collected. In doing so, the step-by-step process of gathering data from various sources is presented. Towards the end, the chapter turns the spotlight on the statistical methods employed for the analysis of data, along with the rationale for selecting the specific techniques used to present the outcomes of the present research.



Mulaik, S. A., James, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stilwell, C. D. (1989). Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105 (3), 430–445.

Murphy, T. H., & Terry, H. R. (1998). Faculty needs associated with agricultural distance education. Journal of Agricultural Education, 39 , 17–27.

Murphy, C., Hearty, C., Murray, M., & McCaul, C. (2005). Patient preferences for desired post-anaesthesia outcomes-a comparison with medical provider perspective: A-40. European Journal of Anaesthesiology (EJA), 22 , 11.

Nair, A., Rustambekov, E., McShane, M., & Fainshmidt, S. (2014). Enterprise risk management as a dynamic Capability: A test of its effectiveness during a crisis. Managerial and Decision Economics, 35 , 555–566.

Nandan, S. (2010). Determinants of customer satisfaction on service quality: A study of railway platforms in India. Journal of Public Transportation, 13 (1), 6.

NASSCOM Indian IT-BPM Industry Report. (2016). NASSCOM Indian IT-BPM Industry Report 2016. Retrieved January 11, 2017 from http://www.nasscom.in/itbpm-sector-india-strategic-review-2016 .

Nedzinskas, Š. (2013). Dynamic capabilities and organizational inertia interaction in volatile environment. Retrieved from http://archive.ism.lt/handle/1/301 .

Nguyen, T. N. Q. (2010). Knowledge management capability and competitive advantage: An empirical study of Vietnamese enterprises.

Nguyen, N. T. D., & Aoyama, A. (2014). Achieving efficient technology transfer through a specific corporate culture facilitated by management practices. The Journal of High Technology Management Research, 25 (2), 108–122.

Nguyen, Q. T. N., & Neck, P. A. (2008, July). Knowledge management as dynamic capabilities: Does it work in emerging less developed countries. In Proceedings of the 16th Annual Conference on Pacific Basin Finance, Economics, Accounting and Management (pp. 1–18).

Nieves, J., & Haller, S. (2014). Building dynamic capabilities through knowledge resources. Tourism Management, 40 , 224–232.

Nirmal, R. (2016). Indian IT firms late movers in digital race. Retrieved February 19, 2017 from http://www.thehindubusinessline.com/info-tech/indian-it-firms-late-movers-in-digital-race/article8505379.ece .

Numthavaj, P., Bhongmakapat, T., Roongpuwabaht, B., Ingsathit, A., & Thakkinstian, A. (2017). The validity and reliability of Thai Sinonasal outcome Test-22. European Archives of Oto-Rhino-Laryngology, 274 (1), 289–295.

Obwoge, M. E., Mwangi, S. M., & Nyongesa, W. J. (2013). Linking TVET institutions and industry in Kenya: Where are we. The International Journal of Economy, Management and Social Science, 2 (4), 91–96.

Oktemgil, M., & Greenley, G. (1997). Consequences of high and low adaptive capability in UK companies. European Journal of Marketing, 31 (7), 445–466.

Ouyang, Y. (2015). A cyclic model for knowledge management capability-a review study. Arabian Journal of Business and Management Review, 5 (2), 1–9.

Paloniemi, R., & Vainio, A. (2011). Legitimacy and empowerment: Combining two conceptual approaches for explaining forest owners’ willingness to cooperate in nature conservation. Journal of Integrative Environmental Sciences, 8 (2), 123–138.

Pant, S., & Lado, A. (2013). Strategic business process offshoring and Competitive advantage: The role of strategic intent and absorptive capacity. Journal of Information Science and Technology, 9 (1), 25–58.

Paramati, S. R., Gupta, R., Maheshwari, S., & Nagar, V. (2016). The empirical relationship between the value of rupee and performance of information technology firms: Evidence from India. International Journal of Business and Globalisation, 16 (4), 512–529.

Parida, V., Oghazi, P., & Cedergren, S. (2016). A study of how ICT capabilities can influence dynamic capabilities. Journal of Enterprise Information Management, 29 (2), 1–22.

Parkhurst, K. A., Conwell, Y., & Van Orden, K. A. (2016). The interpersonal needs questionnaire with a shortened response scale for oral administration with older adults. Aging & Mental Health, 20 (3), 277–283.

Payne, A. A., Gottfredson, D. C., & Gottfredson, G. D. (2006). School predictors of the intensity of implementation of school-based prevention programs: Results from a national study. Prevention Science, 7 (2), 225–237.

Pereira-Moliner, J., Font, X., Molina-Azorín, J., Lopez-Gamero, M. D., Tarí, J. J., & Pertusa-Ortega, E. (2015). The holy grail: Environmental management, competitive advantage and business performance in the Spanish hotel industry. International Journal of Contemporary Hospitality Management, 27 (5), 714–738.

Persada, S. F., Razif, M., Lin, S. C., & Nadlifatin, R. (2014). Toward paperless public announcement on environmental impact assessment (EIA) through SMS gateway in Indonesia. Procedia Environmental Sciences, 20 , 271–279.

Pertusa-Ortega, E. M., Molina-Azorín, J. F., & Claver-Cortés, E. (2010). Competitive strategy, structure and firm performance: A comparison of the resource-based view and the contingency approach. Management Decision, 48 (8), 1282–1303.

Peters, M. D., Wieder, B., Sutton, S. G., & Wake, J. (2016). Business intelligence systems use in performance measurement capabilities: Implications for enhanced competitive advantage. International Journal of Accounting Information Systems, 21 (1–17), 1–17.

Protogerou, A., Caloghirou, Y., & Lioukas, S. (2011). Dynamic capabilities and their indirect impact on firm performance. Industrial and Corporate Change, 21 (3), 615–647.

Rapiah, M., Wee, S. H., Ibrahim Kamal, A. R., & Rozainun, A. A. (2010). The relationship between strategic performance measurement systems and organisational competitive advantage. Asia-Pacific Management Accounting Journal, 5 (1), 1–20.

Reuner, T. (2016). HfS blueprint Report, ServiceNow services 2016, excerpt for Cognizant. Retrieved February 2, 2017 from https://www.cognizant.com/services-resources/Services/hfs-blueprint-report-servicenow-2016.pdf .

Ríos, V. R., & del Campo, E. P. (2013). Business research methods: Theory and practice . Madrid: ESIC Editorial.

Sachitra, V. (2015). Review of Competitive advantage measurements: The case of agricultural firms. IV, 303–317.

Sahney, S., Banwet, D. K., & Karunes, S. (2004). Customer requirement constructs: The premise for TQM in education: A comparative study of select engineering and management institutions in the Indian context. International Journal of Productivity and Performance Management, 53 (6), 499–520.

Sampe, F. (2012). The influence of organizational learning on performance in Indonesian SMEs.

Sarlak, M. A., Shafiei, M., Sarlak, M. A., Shafiei, M., Capability, M., Capability, I., & Competitive, S. (2013). A research in relationship between entrepreneurship, marketing Capability, innovative Capability and sustainable Competitive advantage. Kaveh Industrial City, 7 (8), 1490–1497.

Saunders, M., Lewis, P., & Thornhill, A. (2012). Research methods for business students . Pearson.

Schiff, J. H., Fornaschon, S., Schiff, M., Martin, E., & Motsch, J. (2005). Measuring patient dissatisfaction with anethesia care: A-41. European Journal of Anaesthesiology (EJA), 22 , 11.

Schwartz, S. J., Coatsworth, J. D., Pantin, H., Prado, G., Sharp, E. H., & Szapocznik, J. (2006). The role of ecodevelopmental context and self-concept in depressive and externalizing symptoms in Hispanic adolescents. International Journal of Behavioral Development, 30 (4), 359–370.

Scott, V. C., Sandberg, J. G., Harper, J. M., & Miller, R. B. (2012). The impact of depressive symptoms and health on sexual satisfaction for older couples: Implications for clinicians. Contemporary Family Therapy, 34 (3), 376–390.

Shafia, M. A., Shavvalpour, S., Hosseini, M., & Hosseini, R. (2016). Mediating effect of technological innovation capabilities between dynamic capabilities and competitiveness of research and technology organisations. Technology Analysis & Strategic Management, 28 , 1–16. https://doi.org/10.1080/09537325.2016.1158404 .

Shahzad, K., Faisal, A., Farhan, S., Sami, A., Bajwa, U., & Sultani, R. (2016). Integrating knowledge management (KM) strategies and processes to enhance organizational creativity and performance: An empirical investigation. Journal of Modelling in Management, 11 (1), 1–34.

Sharma, A. (2016). Five reasons why you should avoid investing in IT stocks. Retrieved February 19, 2017 from http://www.businesstoday.in/markets/company-stock/five-reasons-why-you-should-avoid-investing-in-infosys-tcs-wipro/story/238225.html .

Sharma, J. K., & Singh, A. K. (2012). Absorptive capability and competitive advantage: Some insights from Indian pharmaceutical Industry. International Journal of Management and Business Research, 2 (3), 175–192.

Shepherd, R. M., & Edelmann, R. J. (2005). Reasons for internet use and social anxiety. Personality and Individual Differences, 39 (5), 949–958.

Singh, R., & Khanduja, D. (2010). Customer requirements grouping–a prerequisite for successful implementation of TQM in technical education. International Journal of Management in Education, 4 (2), 201–215.

Small, M. J., Gupta, J., Frederic, R., Joseph, G., Theodore, M., & Kershaw, T. (2008). Intimate partner and nonpartner violence against pregnant women in rural Haiti. International Journal of Gynecology & Obstetrics, 102 (3), 226–231.

Srivastava, M. (2016). IT biggies expect weaker Sept quarter. Retrieved February 19, 2017 from http://www.business-standard.com/article/companies/it-biggies-expect-weaker-sept-quarter-116100400680_1.html .

Stoten, D. W. (2016). Discourse, knowledge and power: The continuing debate over the DBA. Journal of Management Development, 35 (4), 430–447.

Sudarvel, J., & Velmurugan, R. (2015). Semi month effect in Indian IT sector with reference to BSE IT index. International Journal of Advance Research in Computer Science and Management Studies, 3 (10), 155–159.

Sylvia, M., & Terhaar, M. (2014). An approach to clinical data Management for the Doctor of nursing practice curriculum. Journal of Professional Nursing, 30 (1), 56–62.

Tabachnick, B. G., & Fidell, L. S. (2007). Multivariate analysis of variance and covariance. Using Multivariate Statistics, 3 , 402–407.

Teece, D. J. (2014). The foundations of Enterprise performance: Dynamic and ordinary capabilities in an (economic) theory of firms. The Academy of Management Perspectives, 28 (4), 328–352.

Teece, D. J., Pisano, G., & Shuen, A. (1997). Dynamic capabilities and strategic management. Strategic Management Journal, 18 (7), 509–533.

Thomas, J. B., Sussman, S. W., & Henderson, J. C. (2001). Understanding “strategic learning”: Linking organizational learning, knowledge management, and sensemaking. Organization Science, 12 (3), 331–345.

Travis, S. E., & Grace, J. B. (2010). Predicting performance for ecological restoration: A case study using Spartina alterniflora. Ecological Applications, 20 (1), 192–204.

Tseng, S., & Lee, P. (2014). The effect of knowledge management capability and dynamic capability on organizational performance. Journal of Enterprise Information Management, 27 (2), 158–179.

Turker, D. (2009). Measuring corporate social responsibility: A scale development study. Journal of Business Ethics, 85 (4), 411–427.

Vanham, D., Mak, T. N., & Gawlik, B. M. (2016). Urban food consumption and associated water resources: The example of Dutch cities. Science of the Total Environment, 565 , 232–239.

Visser, P. S., Krosnick, J. A., & Lavrakas, P. J. (2000). Survey research. In H.T. Reis & C.M. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 223-252). New York: Cambridge.

Vitale, G., Sala, F., Consonni, F., Teruzzi, M., Greco, M., Bertoli, E., & Maisano, P. (2005). Perioperative complications correlate with acid-base balance in elderly trauma patients: A-37. European Journal of Anaesthesiology (EJA), 22 , 10–11.

Wang, C. L., & Ahmed, P. K. (2004). Leveraging knowledge in the innovation and learning process at GKN. International Journal of Technology Management, 27 (6/7), 674–688.

Wang, C. L., Senaratne, C., & Rafiq, M. (2015). Success traps, dynamic capabilities and firm performance. British Journal of Management, 26 , 26–44.

Wasswa Katono, I. (2011). Student evaluation of e-service quality criteria in Uganda: The case of automatic teller machines. International Journal of Emerging Markets, 6 (3), 200–216.

Wasylkiw, L., Currie, M. A., Meuse, R., & Pardoe, R. (2010). Perceptions of male ideals: The power of presentation. International Journal of Men's Health, 9 (2), 144–153.

Wilhelm, H., Schlömer, M., & Maurer, I. (2015). How dynamic capabilities affect the effectiveness and efficiency of operating routines under high and Low levels of environmental dynamism. British Journal of Management , 1–19.

Wilkens, U., Menzel, D., & Pawlowsky, P. (2004). Inside the black-box : Analysing the generation of Core competencies and dynamic capabilities by exploring collective minds. An organizational learning perspective. Management Review, 15 (1), 8–27.

Willemsen, M. C., & de Vries, H. (1996). Saying “no” to environmental tobacco smoke: Determinants of assertiveness among nonsmoking employees. Preventive Medicine, 25 (5), 575–582.

Williams, M., Peterson, G. M., Tenni, P. C., & Bindoff, I. K. (2012). A clinical knowledge measurement tool to assess the ability of community pharmacists to detect drug-related problems. International Journal of Pharmacy Practice, 20 (4), 238–248.

Wintermark, M., Huss, D. S., Shah, B. B., Tustison, N., Druzgal, T. J., Kassell, N., & Elias, W. J. (2014). Thalamic connectivity in patients with essential tremor treated with MR imaging–guided focused ultrasound: In vivo Fiber tracking by using diffusion-tensor MR imaging. Radiology, 272 (1), 202–209.

Wipro Annual Report. (2015). Wipro annual report 2014–15. Retrieved February 16, 2017 from http://www.wipro.com/documents/investors/pdf-files/Wipro-annual-report-2014-15.pdf .

Wu, J., & Chen, X. (2012). Leaders’ social ties, knowledge acquisition capability and firm competitive advantage. Asia Pacific Journal of Management, 29 (2), 331–350.

Yamane, T. (1967). Elementary Sampling Theory Prentice Inc. Englewood Cliffs. NS, USA, 1, 371–390.

Zahra, S., Sapienza, H. J., & Davidsson, P. (2006). Entrepreneurship and dynamic capabilities: A review, model and research agenda. Journal of Management Studies, 43 (4), 917–955.

Zaied, A. N. H. (2012). An integrated knowledge management capabilities framework for assessing organizational performance. International Journal of Information Technology and Computer Science, 4 (2), 1–10.

Zakaria, Z. A., Anuar, H. S., & Udin, Z. M. (2015). The relationship between external and internal factors of information systems success towards employee performance: A case of Royal Malaysia custom department. International Journal of Economics, Finance and Management, 4 (2), 54–60.

Zheng, S., Zhang, W., & Du, J. (2011). Knowledge-based dynamic capabilities and innovation in networked environments. Journal of Knowledge Management, 15 (6), 1035–1051.

Zikmund, W. G., Babin, B. J., Carr, J. C., & Griffin, M. (2010). Business research methods . Mason: South Western Cengage Learning.


Reliability and Quality of Service Evaluation Methods for Rural Highways: A Guide (2024)

Chapter 1: Introduction


Rural highways account for a very significant portion of the National Highway System and serve many vital mobility purposes, such as the following:

• Connectivity between major urban areas.
• Access to rural recreational areas (e.g., mountains, lakes, oceans).
• Access to special events held in rural areas (e.g., concerts, regional festivals).
• Evacuation route for extreme events (e.g., natural disasters).
• Diversion of traffic from another disrupted route.

Despite the importance of rural highways, infrastructure funding to improve operations is often more limited for rural highways than it is for congested urban roadways. Thus, to ensure effective investment of such funding, it is essential for highway agencies to identify locations of poor operations and consider appropriate mitigation measures. For this to be possible, an agency needs traffic analysis methods that allow for examination of short sections of highway (e.g., a passing zone, signalized intersection) individually and also within the context of an extended length (many miles) of highway.

Rural highways, which often span distances of 20 to 60 miles between urban areas, may consist of segments with a variety of cross-section elements (two-lane highway, multilane highway, passing lane sections) as well as intersections with different traffic controls (signal control, stop control, roundabouts with yield control). A sample of these component roadway configurations is shown in Figure 1.1. These component roadway types are described in detail in Chapter 2. These highways are usually more varied in horizontal and vertical alignment than urban roadways.

Figure 1.1. A sample of rural highway component roadway configurations.

The Highway Capacity Manual (HCM7), the standard reference for traffic analysis methodologies, contains analysis methods for all the individual segments or intersections that may constitute a rural highway; however, it does not include a method, or guidance, for connecting the individual roadway segments into a connected, cohesive, facility-level analysis (TRB 2022). It is important to continue to extend the capabilities of the HCM7 analysis methodologies, particularly at the facility level, so that roadway design and traffic engineers have the analysis tools they need to perform accurate and comprehensive facility evaluations. Furthermore, analysis at the facility level is consistent with the fact that drivers typically evaluate the quality of their trip over its entire length, not just in separate segments.

1.1 Purpose of the Guide

This Guide is intended to assist transportation agencies charged with monitoring, maintaining, and improving rural highways of regional or statewide importance, specifically with the evaluation of rural highways in three areas:

• Motorized vehicle traffic operations per HCM7 analysis methods.
• Motorized vehicle traffic operations per probe vehicle data analysis methods.
• Overview of alternative analysis methods, to the HCM7, for bicycles and recommendations for future bicycle operations’ research needs.

With the heterogeneity of cross-section (i.e., roadway segment type) composition over such distances and the disparate HCM7 service measures (density, follower density, delay) across these segment types, the process for performing an HCM7 facility analysis across the variety of contiguous segments contained within a rural highway facility is not necessarily straightforward. This Guide proposes an analysis framework for assessing the level of service (LOS) of automobiles on a long rural highway facility (20+ miles). In addition to LOS, several other facility-level performance measures are presented along with discussion about the analysis context in which such measures are useful for evaluating overall traffic operations along the route.

While simulation is always an option for analyzing a stretch of rural highways, the level of effort would be high for typical rural highway distances considered for analysis. In some situations, simulation may be warranted, but the methodology described in this Guide would still be a good first step and may even be sufficient. This methodology would also be much more efficient for performing “what-if” scenario testing, where relative differences in results are the primary concern.

Reliability analyses on freeways and arterials are typically based on HCM7 guidance on scenario generation and predictive reliability. However, because of the typically limited data availability on rural highways, the reliability analysis in this report is focused on historical probe vehicle data and meant to be used in conjunction with the HCM7 automobile LOS methodology.

Demand for cycling in urban areas and on rural highways is on the increase, yet it is not clear which analysis procedures are best suited to cover large rural highway facilities or statewide analyses. This Guide summarizes existing HCM7 analysis methods used for the bicycle mode as well as two popular alternatives: Level of Stress and Bicycle Compatibility Index. In addition, it proposes recommendations for future bicycle operations research needs based on two qualitative surveys.

This Guide is intended to serve as a companion to the HCM.

1.2 Guide Scope and Limitations

The facility-level analysis is important in assessing current conditions along important corridors for people and goods movement. The analysis methodology is also useful for evaluating facility performance for situations where significant changes in traffic demand or capacity may occur, such as in the following scenarios:

• Evaluation of a rural highway route to handle potential evacuation traffic demand (e.g., forest fire, hurricane).
• Evaluation of a rural highway route to handle potential diversion traffic demand due to an alternative route being closed or restricted due to construction, an incident (e.g., truck rollover), or a natural disaster (e.g., landslide/avalanche, flooding).
• Evaluation of a rural highway route to handle short-term spikes in traffic demand due to recreational activities (e.g., weekend ski season, Labor Day weekend beach travel, concert/festival).
• Evaluation of a rural highway route to handle short-term spikes in traffic demand and/or heavy vehicles due to season-specific activities (e.g., crop harvesting in farming regions).
• Evaluation of a rural highway route to handle a large increase in traffic demand as projected to occur as part of construction of a large generator for regional economic development (e.g., tribal casino, Amazon distribution warehouse).

The parameters of the scope for this project generally required that the developed LOS evaluation methodology make use of the existing analysis methodologies within the HCM. However, to facilitate the development of a facility-level evaluation methodology, it was necessary to develop a few new computational procedures, largely to connect component pieces of highway into a single facility for evaluation purposes. Furthermore, some planning-level simplifications, such as the classification of terrain and the treatment of signal progression along an arterial, were implemented. Such simplifications were included to (1) reduce the segmentation process effort and/or (2) reduce the complexity of the calculation process where the return on such precision is minimized for the relatively long lengths of rural highway.

This Guide also introduces a method for evaluating rural highway operations with the use of probe vehicle data. Over the last decade, the spatial and temporal coverage of probe vehicle data available from third-party vendors has improved immensely. Many state agencies now pay for subscriptions to providers of such data and are making use of the data to supplement their traditional data sources (e.g., fixed-point sensors) for assessing and managing traffic operations on their roadways. These data generally consist of average travel times/speeds and correspond to a sample of the vehicles traveling along a given roadway segment; thus, an important limitation of this data source is that it does not include flow rate. To obtain flow rates, agencies may supplement the probe vehicle data with fixed-point sensors or conduct field data collection using portable sensors on a regular basis. Another limitation of probe data is that their accuracy drops in low-traffic conditions, such as late night or early morning when fewer samples are available. The spatial resolution of the probe data might also pose a challenge, particularly when analyzing very short segments. Probe data are typically reported for predefined segments of roadway—often referred to as traffic message channels (TMCs). The TMCs’ length usually ranges from approximately 0.6 to 2 miles, and the TMC boundaries do not necessarily match those used for other traffic analysis purposes, such as for segments as defined by the HCM. The temporal resolution of the measurements ranges from approximately 1 to 5 minutes. Currently, the quantity and quality of probe vehicle data are much greater for urban areas than for rural areas. This gap, however, will continue to narrow with time.

The automobile LOS methodology presented in this Guide is not intended to handle oversaturated traffic flow conditions. For multilane and two-lane highway segments, the HCM7 analysis methodologies do not include any mechanism to deal with traffic demand exceeding capacity. In some instances, short periods of demand exceeding capacity can be accounted for in the intersection analysis methodologies. The HCM7 should be consulted for further information on this topic.

1.3 Guide Organization

This Guide is organized into three parts. Part I focuses on analysis methodology descriptions and consists of the following chapters:

• Chapter 1—Introduction. This chapter provides an overview of the Guide’s purpose, scope, and limitations. It also discusses the format of the Guide and how the user community can contribute to the content. Further, the chapter provides a summary of the research behind this Guide.
• Chapter 2—Rural Highway LOS for Automobiles. This chapter provides the methodology used to assess traffic operational quality and LOS for the automobile mode on rural highways.
• Chapter 3—Automobile Travel Time Reliability. This chapter describes methods for quantifying travel time reliability, from a historical perspective, based on probe vehicle travel speed measurements.
• Chapter 4—Bicycle Operations Analysis on Rural Highways. This chapter provides an overview of commonly used methods for assessing bicycle operations, recommendations for which methods are most appropriate for certain bicycle analysis situations, and recommendations for enhancements to the commonly used analysis methods.

Part II consists of Chapters 5 through 12. It provides an overview of the component HCM7 analysis methodologies that are incorporated in the rural highway analysis methodology for automobiles. This part does not replicate the full content of the relevant HCM7 analysis methodologies but rather summarizes the chapters and sections that are used within the rural highway analysis methodology. This material will be updated as necessary to reflect updates to the HCM.

Part III consists of Chapters 13 through 19. It focuses on case studies using real-world routes to demonstrate the analysis methodologies in the Guide. The material is contained in a separate part of the report to facilitate the inclusion of additional case studies in the future.

1.4 Supporting Resources and Tools

Several complementary resources are provided with this Guide.

LOS Calculation Software and Case Study Input Files. The LOS calculation methodology described in this Guide is available in the software tool HCM-CALC. This program can be downloaded from https://github.com/swash17/HCM-CALC. The Computational Engine chapter also provides an overview of the software tool. Input data files for the case studies for the HCM-CALC software are also available.

Scripting Code/Tools for “Reliability” Calculations/Output. Scripts, written in Python programming code (https://www.python.org/about/), to process probe vehicle data and produce a variety of visualizations are provided. More information is provided in Chapter 3.

KML Files for Case Studies. For each of the case studies, supporting information for the segmentation process is included in a KML file. [A Keyhole Markup Language (KML) file contains geographic and supporting data for use with geographic software visualization tools. More information can be found at https://en.wikipedia.org/wiki/Keyhole_Markup_Language.] Detailed information about the KML files is provided in the introduction to Part 3: Case Studies.

More information about these resources is contained in Chapter 15.

This Guide was developed through NCHRP Project 08-135: “Reliability and Quality of Service Evaluation Methods for Rural Highways.” A conduct of research report was also produced for this project, published as NCHRP Web-Only Document 392: Developing a Guide for Rural Highways: Reliability and Quality of Service Evaluation Methods (Washburn et al. 2024). That report contains additional details about the development of the material contained in this Guide.
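
The Chapter 3 reliability scripts themselves are not reproduced in this excerpt. As a rough companion to the probe vehicle data discussion in Section 1.2, the sketch below illustrates the general kind of percentile-based travel time reliability summary that can be computed from segment-level probe speeds. The metric definitions (95th percentile travel time, planning time index, buffer index) are standard reliability measures, but the input format, field names, segment length, and free-flow reference used here are illustrative assumptions and are not taken from the Guide's actual scripts or data schema.

# Illustrative only: a percentile-based travel time reliability summary for one
# probe-data segment (e.g., one TMC). Inputs and field names are hypothetical.
from statistics import mean, quantiles

def reliability_summary(speeds_mph, segment_miles, free_flow_mph):
    """Summarize travel time reliability for one segment.

    speeds_mph    : per-interval average probe speeds (e.g., 5-minute bins)
    segment_miles : segment (TMC) length in miles
    free_flow_mph : assumed free-flow reference speed
    """
    # Convert each interval's average speed to a travel time in minutes.
    travel_times = [60.0 * segment_miles / s for s in speeds_mph if s > 0]

    mean_tt = mean(travel_times)
    # quantiles(..., n=20) returns the 5th, 10th, ..., 95th percentile cut points.
    tt_95 = quantiles(travel_times, n=20)[-1]
    free_flow_tt = 60.0 * segment_miles / free_flow_mph

    return {
        "mean_travel_time_min": round(mean_tt, 2),
        "95th_pct_travel_time_min": round(tt_95, 2),
        # Planning time index: 95th percentile travel time / free-flow travel time.
        "planning_time_index": round(tt_95 / free_flow_tt, 2),
        # Buffer index: extra time, relative to the mean, a traveler should budget.
        "buffer_index": round((tt_95 - mean_tt) / mean_tt, 2),
    }

# Example: a 1.2-mile segment, a 55 mph free-flow reference, and made-up speeds.
print(reliability_summary([52, 48, 55, 33, 50, 41, 54, 47], 1.2, 55.0))

In practice, probe records would first be grouped by segment and analysis period and screened for low-sample intervals, consistent with the data limitations noted in Section 1.2, before summaries of this kind are computed and visualized.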




Part 3: Using quantitative methods

11. Quantitative measurement

Chapter Outline

  • Conceptual definitions (17 minute read)
  • Operational definitions (36 minute read)
  • Measurement quality (21 minute read)
  • Ethical and social justice considerations (15 minute read)

Content warning: examples in this chapter contain references to ethnocentrism, toxic masculinity, racism in science, drug use, mental health and depression, psychiatric inpatient care, poverty and basic needs insecurity, pregnancy, and racism and sexism in the workplace and higher education.

11.1 Conceptual definitions

Learning Objectives

Learners will be able to…

  • Define measurement and conceptualization
  • Apply Kaplan’s three categories to determine the complexity of measuring a given variable
  • Identify the role previous research and theory play in defining concepts
  • Distinguish between unidimensional and multidimensional concepts
  • Critically apply reification to how you conceptualize the key variables in your research project

In social science, when we use the term  measurement , we mean the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating. At its core, measurement is about defining one’s terms in as clear and precise a way as possible. Of course, measurement in social science isn’t quite as simple as using a measuring cup or spoon, but there are some basic tenets on which most social scientists agree when it comes to measurement. We’ll explore those, as well as some of the ways that measurement might vary depending on your unique approach to the study of your topic.

An important point here is that measurement does not require any particular instruments or procedures. What it does require is a systematic procedure for assigning scores, meanings, and descriptions to individuals or objects so that those scores represent the characteristic of interest. You can measure phenomena in many different ways, but you must be sure that how you choose to measure gives you information and data that lets you answer your research question. If you’re looking for information about a person’s income, but your main points of measurement have to do with the money they have in the bank, you’re not really going to find the information you’re looking for!

The question of what social scientists measure can be answered by asking yourself what social scientists study. Think about the topics you’ve learned about in other social work classes you’ve taken or the topics you’ve considered investigating yourself. Let’s consider Melissa Milkie and Catharine Warner’s study (2011) [1] of first graders’ mental health. In order to conduct that study, Milkie and Warner needed to have some idea about how they were going to measure mental health. What does mental health mean, exactly? And how do we know when we’re observing someone whose mental health is good and when we see someone whose mental health is compromised? Understanding how measurement works in research methods helps us answer these sorts of questions.

As you might have guessed, social scientists will measure just about anything that they have an interest in investigating. For example, those who are interested in learning something about the correlation between social class and levels of happiness must develop some way to measure both social class and happiness. Those who wish to understand how well immigrants cope in their new locations must measure immigrant status and coping. Those who wish to understand how a person’s gender shapes their workplace experiences must measure gender and workplace experiences (and get more specific about which experiences are under examination). You get the idea. Social scientists can and do measure just about anything you can imagine observing or wanting to study. Of course, some things are easier to observe or measure than others.


Observing your variables

In 1964, philosopher Abraham Kaplan (1964) [2] wrote The   Conduct of Inquiry,  which has since become a classic work in research methodology (Babbie, 2010). [3] In his text, Kaplan describes different categories of things that behavioral scientists observe. One of those categories, which Kaplan called “observational terms,” is probably the simplest to measure in social science. Observational terms are the sorts of things that we can see with the naked eye simply by looking at them. Kaplan roughly defines them as conditions that are easy to identify and verify through direct observation. If, for example, we wanted to know how the conditions of playgrounds differ across different neighborhoods, we could directly observe the variety, amount, and condition of equipment at various playgrounds.

Indirect observables , on the other hand, are less straightforward to assess. In Kaplan’s framework, they are conditions that are subtle and complex that we must use existing knowledge and intuition to define. If we conducted a study for which we wished to know a person’s income, we’d probably have to ask them their income, perhaps in an interview or a survey. Thus, we have observed income, even if it has only been observed indirectly. Birthplace might be another indirect observable. We can ask study participants where they were born, but chances are good we won’t have directly observed any of those people being born in the locations they report.

Sometimes the measures that we are interested in are more complex and more abstract than observational terms or indirect observables. Think about some of the concepts you’ve learned about in other social work classes—for example, ethnocentrism. What is ethnocentrism? Well, from completing an introduction to social work class you might know that it has something to do with the way a person judges another’s culture. But how would you  measure  it? Here’s another construct: bureaucracy. We know this term has something to do with organizations and how they operate but measuring such a construct is trickier than measuring something like a person’s income. The theoretical concepts of ethnocentrism and bureaucracy represent ideas whose meanings we have come to agree on. Though we may not be able to observe these abstractions directly, we can observe their components.

Kaplan referred to these more abstract things that behavioral scientists measure as constructs.  Constructs  are “not observational either directly or indirectly” (Kaplan, 1964, p. 55), [4] but they can be defined based on observables. For example, the construct of bureaucracy could be measured by counting the number of supervisors that need to approve routine spending by public administrators. The greater the number of administrators that must sign off on routine matters, the greater the degree of bureaucracy. Similarly, we might be able to ask a person the degree to which they trust people from different cultures around the world and then assess the ethnocentrism inherent in their answers. We can measure constructs like bureaucracy and ethnocentrism by defining them in terms of what we can observe. [5]
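
To make the idea of defining constructs in terms of observables concrete, here is a minimal, purely illustrative sketch in Python. The indicators and scoring rules are invented for teaching purposes; they are not drawn from Kaplan or from any validated instrument.

# Illustrative only: scoring two constructs from observable indicators.
# The indicator choices and scoring rules are invented, not validated measures.

def bureaucracy_score(approvals_required):
    """More sign-offs required for routine spending -> higher bureaucracy score."""
    return approvals_required

def ethnocentrism_score(trust_ratings):
    """Lower average trust in people from other cultures -> higher ethnocentrism.

    trust_ratings: ratings from 1 (no trust) to 5 (complete trust).
    """
    avg_trust = sum(trust_ratings) / len(trust_ratings)
    return 6 - avg_trust  # reverse-score so that higher values mean more ethnocentrism

print(bureaucracy_score(4))              # four supervisors must approve routine spending
print(ethnocentrism_score([2, 3, 1, 2])) # low trust ratings produce a higher score

The point is not the arithmetic but the translation: each abstract construct is tied to something countable or ratable that a researcher could actually observe.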

The idea of coming up with your own measurement tool might sound pretty intimidating at this point. The good news is that if you find something in the literature that works for you, you can use it (with proper attribution, of course). If there are only pieces of it that you like, you can reuse those pieces (with proper attribution and describing/justifying any changes). You don’t always have to start from scratch!

Look at the variables in your research question.

  • Classify them as direct observables, indirect observables, or constructs.
  • Do you think measuring them will be easy or hard?
  • What are your first thoughts about how to measure each variable? No wrong answers here, just write down a thought about each variable.


Measurement starts with conceptualization

In order to measure the concepts in your research question, we first have to understand what we think about them. As an aside, the word concept has come up quite a bit, and it is important to be sure we have a shared understanding of that term. A concept is the notion or image that we conjure up when we think of some cluster of related observations or ideas. For example, masculinity is a concept. What do you think of when you hear that word? Presumably, you imagine some set of behaviors and perhaps even a particular style of self-presentation. Of course, we can’t necessarily assume that everyone conjures up the same set of ideas or images when they hear the word masculinity. While there are many possible ways to define the term and some may be more common or have more support than others, there is no universal definition of masculinity. What counts as masculine may shift over time, from culture to culture, and even from individual to individual (Kimmel, 2008). This is why defining our concepts is so important.

Not all researchers clearly explain their theoretical or conceptual framework for their study, but they should! Without understanding how a researcher has defined their key concepts, it would be nearly impossible to understand the meaning of that researcher’s findings and conclusions. Back in Chapter 7, you developed a theoretical framework for your study based on a survey of the theoretical literature in your topic area. If you haven’t done that yet, consider flipping back to that section to familiarize yourself with some of the techniques for finding and using theories relevant to your research question. Continuing with our example on masculinity, we would need to survey the literature on theories of masculinity. After a few queries on masculinity, I found a wonderful article by Wong (2010) [6] that reviewed eight years of the journal Psychology of Men & Masculinity and analyzed how often different theories of masculinity were used. Not only can I get a sense of which theories are more accepted and which are more marginal in the social science on masculinity, but I am also able to identify a range of options from which I can find the theory or theories that will inform my project.

Identify a specific theory (or more than one theory) and how it helps you understand…

  • Your independent variable(s).
  • Your dependent variable(s).
  • The relationship between your independent and dependent variables.

Rather than completing this exercise from scratch, build from your theoretical or conceptual framework developed in previous chapters.

In quantitative methods, conceptualization involves writing out clear, concise definitions for our key concepts. These are the kind of definitions you are used to, like the ones in a dictionary. A conceptual definition involves defining a concept in terms of other concepts, usually by making reference to how other social scientists and theorists have defined those concepts in the past. Of course, new conceptual definitions are created all the time because our conceptual understanding of the world is always evolving.

Conceptualization is deceptively challenging—spelling out exactly what the concepts in your research question mean to you. Following along with our example, think about what comes to mind when you read the term masculinity. How do you know masculinity when you see it? Does it have something to do with men or with social norms? If so, perhaps we could define masculinity as the social norms that men are expected to follow. That seems like a reasonable start, and at this early stage of conceptualization, brainstorming about the images conjured up by concepts and playing around with possible definitions is appropriate. However, this is just the first step. At this point, you should be beyond brainstorming for your key variables because you have read a good amount of research about them.

In addition, we should consult previous research and theory to understand the definitions that other scholars have already given for the concepts we are interested in. This doesn’t mean we must use their definitions, but understanding how concepts have been defined in the past will help us to compare our conceptualizations with how other scholars define and relate concepts. Understanding prior definitions of our key concepts will also help us decide whether we plan to challenge those conceptualizations or rely on them for our own work. Finally, working on conceptualization is likely to help in the process of refining your research question to one that is specific and clear in what it asks. Conceptualization and operationalization (next section) are where “the rubber meets the road,” so to speak, and you have to specify what you mean by the question you are asking. As your conceptualization deepens, you will often find that your research question becomes more specific and clear.

If we turn to the literature on masculinity, we will surely come across work by Michael Kimmel , one of the preeminent masculinity scholars in the United States. After consulting Kimmel’s prior work (2000; 2008), [7] we might tweak our initial definition of masculinity. Rather than defining masculinity as “the social norms that men are expected to follow,” perhaps instead we’ll define it as “the social roles, behaviors, and meanings prescribed for men in any given society at any one time” (Kimmel & Aronson, 2004, p. 503). [8] Our revised definition is more precise and complex because it goes beyond addressing one aspect of men’s lives (norms), and addresses three aspects: roles, behaviors, and meanings. It also implies that roles, behaviors, and meanings may vary across societies and over time. Using definitions developed by theorists and scholars is a good idea, though you may find that you want to define things your own way.

As you can see, conceptualization isn’t as simple as applying any random definition that we come up with to a term. Defining our terms may involve some brainstorming at the very beginning. But conceptualization must go beyond that, to engage with or critique existing definitions and conceptualizations in the literature. Once we’ve brainstormed about the images associated with a particular word, we should also consult prior work to understand how others define the term in question. After we’ve identified a clear definition that we’re happy with, we should make sure that every term used in our definition will make sense to others. Are there terms used within our definition that also need to be defined? If so, our conceptualization is not yet complete. Our definition includes the concept of “social roles,” so we should have a definition for what those mean and become familiar with role theory to help us with our conceptualization. If we don’t know what roles are, how can we study them?

Let’s say we do all of that. We have a clear definition of the term masculinity with reference to previous literature and we also have a good understanding of the terms in our conceptual definition…then we’re done, right? Not so fast. You’ve likely met more than one man in your life, and you’ve probably noticed that they are not the same, even if they live in the same society during the same historical time period. This could mean there are dimensions of masculinity. In terms of social scientific measurement, concepts can be said to have multiple dimensions when there are multiple elements that make up a single concept. With respect to the term masculinity, dimensions could be based on gender identity, gender performance, sexual orientation, etc. In any of these cases, the concept of masculinity would be considered to have multiple dimensions.

While you do not need to spell out every possible dimension of the concepts you wish to measure, it is important to identify whether your concepts are unidimensional (and therefore relatively easy to define and measure) or multidimensional (and therefore require multi-part definitions and measures). In this way, how you conceptualize your variables determines how you will measure them in your study. Unidimensional concepts are those that are expected to have a single underlying dimension. These concepts can be measured using a single measure or test. Examples include simple concepts such as a person’s weight, time spent sleeping, and so forth. 

One frustrating thing is that there is no clear demarcation between concepts that are inherently unidimensional or multidimensional. Even something as simple as age could be broken down into multiple dimensions including mental age and chronological age, so where does conceptualization stop? How far down the dimensional rabbit hole do we have to go? Researchers should consider two things. First, how important is this variable in your study? If age is not important in your study (maybe it is a control variable), it seems like a waste of time to do a lot of work drawing from developmental theory to conceptualize this variable. A unidimensional measure from zero to dead is all the detail we need. On the other hand, if we were measuring the impact of age on masculinity, conceptualizing our independent variable (age) as multidimensional may provide a richer understanding of its impact on masculinity. Second, your conceptualization will lead directly to your operationalization of the variable, and once your operationalization is complete, make sure someone reading your study could follow how your conceptual definitions informed the measures you chose for your variables.

Write a conceptual definition for your independent and dependent variables.

  • Cite and attribute definitions to other scholars, if you use their words.
  • Describe how your definitions are informed by your theoretical framework.
  • Place your definition in conversation with other theories and conceptual definitions commonly used in the literature.
  • Are there multiple dimensions of your variables?
  • Are any of these dimensions important for you to measure?


Do researchers actually know what we’re talking about?

Conceptualization proceeds differently in qualitative research compared to quantitative research. Since qualitative researchers are interested in the understandings and experiences of their participants, it is less important for them to find one fixed definition for a concept before starting to interview or interact with participants. The researcher’s job is to accurately and completely represent how their participants understand a concept, not to test their own definition of that concept.

If you were conducting qualitative research on masculinity, you would likely consult previous literature like Kimmel’s work mentioned above. From your literature review, you may come up with a  working definition  for the terms you plan to use in your study, which can change over the course of the investigation. However, the definition that matters is the definition that your participants share during data collection. A working definition is merely a place to start, and researchers should take care not to think it is the only or best definition out there.

In qualitative inquiry, your participants are the experts (sound familiar, social workers?) on the concepts that arise during the research study. Your job as the researcher is to accurately and reliably collect and interpret their understanding of the concepts they describe while answering your questions. Conceptualization of concepts is likely to change over the course of qualitative inquiry, as you learn more information from your participants. Indeed, getting participants to comment on, extend, or challenge the definitions and understandings of other participants is a hallmark of qualitative research. This is the opposite of quantitative research, in which definitions must be completely set in stone before the inquiry can begin.

The contrast between qualitative and quantitative conceptualization is instructive for understanding how quantitative methods (and positivist research in general) privilege the knowledge of the researcher over the knowledge of study participants and community members. Positivism holds that the researcher is the “expert,” and can define concepts based on their expert knowledge of the scientific literature. This knowledge is in contrast to the lived experience that participants possess from experiencing the topic under examination day-in, day-out. For this reason, it would be wise to remind ourselves not to take our definitions too seriously and be critical about the limitations of our knowledge.

Conceptualization must be open to revisions, even radical revisions, as scientific knowledge progresses. While I’ve suggested consulting prior scholarly definitions of our concepts, you should not assume that prior, scholarly definitions are more real than the definitions we create. Likewise, we should not think that our own made-up definitions are any more real than any other definition. It would also be wrong to assume that just because definitions exist for some concept, the concept itself exists beyond some abstract idea in our heads. Building on the paradigmatic ideas behind interpretivism and the critical paradigm, the assumption that our abstract concepts exist in some concrete, tangible way is known as reification. Attending to reification highlights the power dynamics behind how we can create reality by how we define it.

Returning again to our example of masculinity, think about how our notions of masculinity have developed over the past few decades, and how different and yet how similar they are to patriarchal definitions throughout history. Conceptual definitions become more or less popular based on the power arrangements inside social science and the broader world. Western knowledge systems are privileged, while others are viewed as unscientific and marginal. The historical domination of social science by white men from WEIRD countries meant that definitions of masculinity were imbued with their cultural biases and were designed, explicitly and implicitly, to preserve their power. This has inspired movements for cognitive justice as we seek to use social science to achieve global development.

Key Takeaways

  • Measurement is the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating.
  • Kaplan identified three categories of things that social scientists measure including observational terms, indirect observables, and constructs.
  • Some concepts have multiple elements or dimensions.
  • Researchers often use measures previously developed and studied by other researchers.
  • Conceptualization is a process that involves coming up with clear, concise definitions.
  • Conceptual definitions are based on the theoretical framework you are using for your study (and the paradigmatic assumptions underlying those theories).
  • Whether your conceptual definitions come from your own ideas or the literature, you should be able to situate them in terms of other commonly used conceptual definitions.
  • Researchers should acknowledge the limited explanatory power of their definitions for concepts and how oppression can shape what explanations are considered true or scientific.

Think historically about the variables in your research question.

  • How has our conceptual definition of your topic changed over time?
  • What scholars or social forces were responsible for this change?

Take a critical look at your conceptual definitions.

  • How might participants define terms for themselves differently, based on their daily experience?
  • On what cultural assumptions are your conceptual definitions based?
  • Are your conceptual definitions applicable across all cultures that will be represented in your sample?

11.2 Operational definitions

  • Define and give an example of indicators and attributes for a variable
  • Apply the three components of an operational definition to a variable
  • Distinguish between levels of measurement for a variable and how those differences relate to measurement
  • Describe the purpose of composite measures like scales and indices

Conceptual definitions are like dictionary definitions. They tell you what a concept means by defining it using other concepts. In this section we will move from the abstract realm (theory) to the real world (measurement). Operationalization is the process by which researchers spell out precisely how a concept will be measured in their study. It involves identifying the specific research procedures we will use to gather data about our concepts. If conceptually defining your terms means looking at theory, how do you operationally define your terms? By looking for indicators of when your variable is present or not, more or less intense, and so forth. Operationalization is probably the most challenging part of quantitative research, but once it’s done, the design and implementation of your study will be straightforward.


Operationalization works by identifying specific  indicators that will be taken to represent the ideas we are interested in studying. If we are interested in studying masculinity, then the indicators for that concept might include some of the social roles prescribed to men in society such as breadwinning or fatherhood. Being a breadwinner or a father might therefore be considered indicators  of a person’s masculinity. The extent to which a man fulfills either, or both, of these roles might be understood as clues (or indicators) about the extent to which he is viewed as masculine.

Let’s look at another example of indicators. Each day, Gallup researchers poll 1,000 randomly selected Americans to ask them about their well-being. To measure well-being, Gallup asks these people to respond to questions covering six broad areas: physical health, emotional health, work environment, life evaluation, healthy behaviors, and access to basic necessities. Gallup uses these six factors as indicators of the concept that they are really interested in, which is well-being .

Identifying indicators can be even simpler than the examples described thus far. Political party affiliation is another relatively easy concept for which to identify indicators. If you asked a person what party they voted for in the last national election (or gained access to their voting records), you would get a good indication of their party affiliation. Of course, some voters split tickets between multiple parties when they vote and others swing from party to party each election, so our indicator is not perfect. Indeed, if our study were about political identity as a key concept, operationalizing it solely in terms of who they voted for in the previous election leaves out a lot of information about identity that is relevant to that concept. Nevertheless, it’s a pretty good indicator of political party affiliation.

Choosing indicators is not an arbitrary process. As described earlier, utilizing prior theoretical and empirical work in your area of interest is a great way to identify indicators in a scholarly manner. And your conceptual definitions will point you in the direction of relevant indicators. Empirical work will give you some very specific examples of how the important concepts in an area have been measured in the past and what sorts of indicators have been used. Often, it makes sense to use the same indicators as previous researchers; however, you may find that some previous measures have potential weaknesses that your own study will improve upon.

All of the examples in this chapter have dealt with questions you might ask a research participant on a survey or in a quantitative interview. If you plan to collect data from other sources, such as through direct observation or the analysis of available records, think practically about what the design of your study might look like and how you can collect data on various indicators feasibly. If your study asks about whether the participant regularly changes the oil in their car, you will likely not observe them directly doing so. Instead, you will likely need to rely on a survey question that asks them the frequency with which they change their oil or ask to see their car maintenance records.

  • What indicators are commonly used to measure the variables in your research question?
  • How can you feasibly collect data on these indicators?
  • Are you planning to collect your own data using a questionnaire or interview? Or are you planning to analyze available data like client files or raw data shared from another researcher’s project?

Remember, you need raw data. Your research project cannot rely solely on the results reported by other researchers or the arguments you read in the literature. A literature review is only the first part of a research project, and your review of the literature should inform the indicators you end up choosing when you measure the variables in your research question.

Unlike conceptual definitions, which are built from other concepts, an operational definition consists of the following components: (1) the variable being measured and its attributes, (2) the measure you will use, and (3) how you plan to interpret the data collected from that measure to draw conclusions about the variable you are measuring.

Step 1: Specifying variables and attributes

The first component, the variable, should be the easiest part. At this point in quantitative research, you should have a research question that has at least one independent and at least one dependent variable. Remember that variables must be able to vary. For example, the United States is not a variable. Country of residence is a variable, as is patriotism. Similarly, if your sample only includes men, gender is a constant in your study, not a variable. A  constant is a characteristic that does not change in your study.

When social scientists measure concepts, they sometimes use the language of variables and attributes. A  variable refers to a quality or quantity that varies across people or situations. Attributes  are the characteristics that make up a variable. For example, the variable hair color would contain attributes like blonde, brown, black, red, gray, etc. A variable’s attributes determine its level of measurement. There are four possible levels of measurement: nominal, ordinal, interval, and ratio. The first two levels of measurement are  categorical , meaning their attributes are categories rather than numbers. The latter two levels of measurement are  continuous , meaning their attributes are numbers.


Levels of measurement

Hair color is an example of a nominal level of measurement.  Nominal measures are categorical, and those categories cannot be mathematically ranked. As a brown-haired person (with some gray), I can’t say for sure that brown-haired people are better than blonde-haired people. As with all nominal levels of measurement, there is no ranking order between hair colors; they are simply different. That is what constitutes a nominal level of measurement. Gender and race are also measured at the nominal level.

What attributes are contained in the variable  hair color ? While blonde, brown, black, and red are common colors, some people may not fit into these categories if we only list these attributes. My wife, who currently has purple hair, wouldn’t fit anywhere. This means that our attributes were not exhaustive. Exhaustiveness  means that all possible attributes are listed. We may have to list a lot of colors before we can meet the criteria of exhaustiveness. Clearly, there is a point at which exhaustiveness has been reasonably met. If a person insists that their hair color is  light burnt sienna , it is not your responsibility to list that as an option. Rather, that person would reasonably be described as brown-haired. Perhaps listing a category for  other color  would suffice to make our list of colors exhaustive.

What about a person who has multiple hair colors at the same time, such as red and black? They would fall into multiple attributes. This violates the rule of  mutual exclusivity , in which a person cannot fall into two different attributes. Instead of listing all of the possible combinations of colors, perhaps you might include a  multi-color  attribute to describe people with more than one hair color.
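To make these criteria concrete, here is a minimal Python sketch (the attribute list, coding rule, and responses are invented for illustration) of assigning each response to exactly one attribute from an exhaustive list:

```python
# Hypothetical coding scheme for the nominal variable "hair color".
HAIR_COLOR_ATTRIBUTES = {"blonde", "brown", "black", "red", "gray",
                         "multi-color", "other color"}

def code_hair_color(response: str) -> str:
    """Assign exactly one attribute to a response.

    Exhaustiveness: every response maps to some attribute ("other color"
    catches anything not listed). Mutual exclusivity: exactly one
    attribute is returned per person.
    """
    response = response.strip().lower()
    if "," in response or "and" in response.split():
        return "multi-color"              # more than one color reported
    if response in HAIR_COLOR_ATTRIBUTES:
        return response
    return "other color"                  # e.g., "light burnt sienna"

print(code_hair_color("Brown"))               # brown
print(code_hair_color("red and black"))       # multi-color
print(code_hair_color("light burnt sienna"))  # other color
```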

Making sure attributes are mutually exclusive and exhaustive is about making sure all people are represented in the data record. For many years, the only attributes for gender were male or female. Now, our understanding of gender has evolved to encompass more attributes that better reflect the diversity in the world. Children of parents from different races were often classified as one race or the other, even if they identified with both cultures. The option to select bi-racial or multi-racial on a survey not only more accurately reflects the racial diversity in the real world but also validates and acknowledges people who identify in that manner. If we did not measure race in this way, we would leave the data record empty for people who identify as biracial or multiracial, impairing our search for truth.

Unlike nominal-level measures, attributes at the  ordinal  level can be rank ordered. For example, someone’s degree of satisfaction in their romantic relationship can be ordered by rank. That is, you could say you are not at all satisfied, a little satisfied, moderately satisfied, or highly satisfied. Note that even though these have a rank order to them (not at all satisfied is certainly worse than highly satisfied), we cannot calculate a mathematical distance between those attributes. We can simply say that one attribute of an ordinal-level variable is more or less than another attribute.

This can get a little confusing when using rating scales . If you have ever taken a customer satisfaction survey or completed a course evaluation for school, you are familiar with rating scales. “On a scale of 1-5, with 1 being the lowest and 5 being the highest, how likely are you to recommend our company to other people?” That surely sounds familiar. Rating scales use numbers, but only as a shorthand, to indicate what attribute (highly likely, somewhat likely, etc.) the person feels describes them best. You wouldn’t say you are “2” likely to recommend the company, but you would say you are not very likely to recommend the company. Ordinal-level attributes must also be exhaustive and mutually exclusive, as with nominal-level variables.

At the  interval   level, attributes must also be exhaustive and mutually exclusive and there is equal distance between attributes. Interval measures are also continuous, meaning their attributes are numbers, rather than categories. IQ scores are interval level, as are temperatures in Fahrenheit and Celsius. Their defining characteristic is that we can say how much more or less one attribute differs from another. We cannot, however, say with certainty what the ratio of one attribute is in comparison to another. For example, it would not make sense to say that a person with an IQ score of 140 has twice the IQ of a person with a score of 70. However, the difference between IQ scores of 80 and 100 is the same as the difference between IQ scores of 120 and 140.

While we cannot say that someone with an IQ of 140 is twice as intelligent as someone with an IQ of 70 because IQ is measured at the interval level, we can say that someone with six siblings has twice as many as someone with three because number of siblings is measured at the ratio level. Finally, at the ratio   level, attributes are mutually exclusive and exhaustive, attributes can be rank ordered, the distance between attributes is equal, and attributes have a true zero point. Thus, with these variables, we can  say what the ratio of one attribute is in comparison to another. Examples of ratio-level variables include age and years of education. We know that a person who is 12 years old is twice as old as someone who is 6 years old. Height measured in meters and weight measured in kilograms are good examples. So are counts of discrete objects or events such as the number of siblings one has or the number of questions a student answers correctly on an exam. The differences between each level of measurement are visualized in Table 11.1.

Table 11.1 Criteria for Different Levels of Measurement
Criterion                         | Nominal | Ordinal | Interval | Ratio
Exhaustive                        |    X    |    X    |    X     |   X
Mutually exclusive                |    X    |    X    |    X     |   X
Rank-ordered                      |         |    X    |    X     |   X
Equal distance between attributes |         |         |    X     |   X
True zero point                   |         |         |          |   X

Levels of measurement = levels of specificity

We have spent time learning how to determine our data’s level of measurement. Now what? How could we use this information to help us as we measure concepts and develop measurement tools? First, the types of statistical tests that we are able to use are dependent on our data’s level of measurement. With nominal-level measurement, for example, the only available measure of central tendency is the mode. With ordinal-level measurement, the median or mode can be used as measures of central tendency. Interval and ratio-level measurement are typically considered the most desirable because they permit any measure of central tendency to be computed (i.e., mean, median, or mode). Also, ratio-level measurement is the only level that allows meaningful statements about ratios of scores. The higher the level of measurement, the more complex statistical tests we are able to conduct. This knowledge may help us decide what kind of data we need to gather, and how.
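To make the link between level of measurement and permissible summary statistics concrete, here is a short Python sketch using the standard library's statistics module and made-up data:

```python
import statistics

# Hypothetical responses from ten participants.
hair_color   = ["brown", "blonde", "brown", "black", "red",
                "brown", "gray", "blonde", "brown", "black"]     # nominal
satisfaction = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]                    # ordinal codes (1-4)
iq_scores    = [85, 92, 100, 101, 103, 107, 110, 118, 125, 140]  # interval
num_siblings = [0, 0, 1, 1, 2, 2, 2, 3, 4, 6]                    # ratio

# Nominal: only the mode is meaningful.
print(statistics.mode(hair_color))        # brown

# Ordinal: mode or median, but not a meaningful mean.
print(statistics.median(satisfaction))    # 3.0

# Interval and ratio: mean, median, and mode are all meaningful.
print(statistics.mean(iq_scores))         # 108.1
print(statistics.mean(num_siblings))      # 2.1

# Ratios are only meaningful at the ratio level: six siblings really is
# twice three, but an IQ of 140 is not "twice" an IQ of 70.
```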

That said, we have to balance this knowledge with the understanding that sometimes collecting data at a higher level of measurement could negatively impact our studies. For instance, sometimes providing answers in ranges may make prospective participants feel more comfortable responding to sensitive items. Imagine that you were interested in collecting information on topics such as income, number of sexual partners, number of times someone used illicit drugs, etc. You would have to think about the sensitivity of these items and determine whether it would make more sense to collect some data at a lower level of measurement (e.g., asking whether they are sexually active or not (nominal) versus their total number of sexual partners (ratio)).

Finally, when analyzing data, researchers sometimes find a need to change a variable’s level of measurement. For example, a few years ago, one of my students was interested in studying the relationship between mental health and life satisfaction. This student used a variety of measures. One item asked about the number of mental health symptoms, reported as the actual number. When analyzing the data, my student examined the mental health symptom variable and noticed that she had two groups: those with no or one symptom and those with many symptoms. Instead of using the ratio-level data (the actual number of mental health symptoms), she collapsed her cases into two categories, few and many, and used this variable in her analyses. It is important to note that you can move data from a higher level of measurement to a lower level; however, you cannot move a lower level to a higher level.
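A small Python sketch (with hypothetical symptom counts) of collapsing ratio-level data into a lower, categorical level, as in the example above:

```python
# Number of mental health symptoms reported by each participant (ratio level).
symptom_counts = [0, 1, 0, 7, 9, 1, 12, 0, 8, 10]

# Collapse into two categories: "few" (0-1 symptoms) vs. "many" (2 or more).
# Note the information loss: the reverse transformation is impossible.
symptom_groups = ["few" if n <= 1 else "many" for n in symptom_counts]

print(symptom_groups)
# ['few', 'few', 'few', 'many', 'many', 'few', 'many', 'few', 'many', 'many']
```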

  • Check that the variables in your research question can vary…and that they are not constants or one of many potential attributes of a variable.
  • Think about the attributes your variables have. Are they categorical or continuous? What level of measurement seems most appropriate?


Step 2: Specifying measures for each variable

Let’s pick a social work research question and walk through the process of operationalizing variables to see how specific we need to get. I’m going to hypothesize that residents of a psychiatric unit who are more depressed are less likely to be satisfied with care. Remember, this would be an inverse relationship—as depression increases, satisfaction decreases. In this question, depression is my independent variable (the cause) and satisfaction with care is my dependent variable (the effect). Now that we have identified our variables, their attributes, and levels of measurement, we can move on to the second component: the measure itself.

So, how would you measure my key variables: depression and satisfaction? What indicators would you look for? Some students might say that depression could be measured by observing a participant’s body language. They may also say that a depressed person will often express feelings of sadness or hopelessness. In addition, a satisfied person might be happy around service providers and often express gratitude. While these factors may indicate that the variables are present, they lack coherence. What this “measure” is actually saying is “I know depression and satisfaction when I see them.” While you are likely a decent judge of depression and satisfaction, in a research study you need to specify how you plan to measure your variables. Your judgment is subjective, based on your own idiosyncratic experiences with depression and satisfaction. It could not be replicated by another researcher, and it cannot be applied consistently to a large group of people. Operationalization requires that you come up with a specific and rigorous measure for identifying who is depressed or satisfied.

Finding a good measure for your variable depends on the kind of variable it is. Variables that are directly observable don’t come up very often in my students’ classroom projects, but they might include things like taking someone’s blood pressure, marking attendance or participation in a group, and so forth. To measure an indirectly observable variable like age, you would probably put a question on a survey that asked, “How old are you?” Measuring a variable like income might require some more thought, though. Are you interested in this person’s individual income or the income of their family unit? This might matter if your participant does not work or is dependent on other family members for income. Do you count income from social welfare programs? Are you interested in their income per month or per year? Even though indirect observables are relatively easy to measure, the measures you use must be clear in what they are asking, and operationalization is all about figuring out the specifics of what you want to know. For more complicated constructs, you will need composite measures that use multiple indicators to measure a single variable.

How you plan to collect your data also influences how you will measure your variables. For social work researchers using secondary data like client records as a data source, you are limited by what information is in the data sources you can access. If your organization uses a given measurement for a mental health outcome, that is the one you will use in your study. Similarly, if you plan to study how long a client was housed after an intervention using client visit records, you are limited by how their caseworker recorded their housing status in the chart. One of the benefits of collecting your own data is being able to select the measures you feel best exemplify your understanding of the topic.

Measuring unidimensional concepts

The previous section mentioned two important considerations: how complicated the variable is and how you plan to collect your data. With these in hand, we can use the level of measurement to further specify how you will measure your variables and consider specialized rating scales developed by social science researchers.

Measurement at each level

Nominal measures assess categorical variables. These measures are used for variables or indicators that have mutually exclusive attributes, but that cannot be rank-ordered. Nominal measures ask about the variable and provide names or labels for different attribute values like social work, counseling, and nursing for the variable profession. Nominal measures are relatively straightforward.

Ordinal measures often use a rating scale, which is an ordered set of responses that participants choose from. Figure 11.1 shows several examples. The number of response options on a typical rating scale is usually five or seven, though it can range from three to eleven. Five-point scales are best for unipolar scales where only one construct is tested, such as frequency (Never, Rarely, Sometimes, Often, Always). Seven-point scales are best for bipolar scales where there is a dichotomous spectrum, such as liking (Like very much, Like somewhat, Like slightly, Neither like nor dislike, Dislike slightly, Dislike somewhat, Dislike very much). For bipolar questions, it is useful to offer an earlier question that branches respondents into an area of the scale; if asking about liking ice cream, first ask “Do you generally like or dislike ice cream?” Once the respondent chooses like or dislike, refine the answer by offering them the relevant choices from the seven-point scale. Branching improves both reliability and validity (Krosnick & Berent, 1993). [9] Although you often see scales with numerical labels, it is best to present only verbal labels to the respondents and convert them to numerical values in the analyses. Avoid partial labels and overly long or overly specific labels. In some cases, the verbal labels can be supplemented with (or even replaced by) meaningful graphics. The last rating scale shown in Figure 11.1 is a visual-analog scale, on which participants make a mark somewhere along the horizontal line to indicate the magnitude of their response.
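As a brief illustration of presenting verbal labels but analyzing numeric codes, the following Python sketch (the label sets, coding values, and responses are hypothetical) converts rating-scale responses to numbers after data collection:

```python
# Verbal labels shown to respondents, converted to codes only for analysis.
FREQUENCY_CODES = {  # unipolar, five points
    "Never": 1, "Rarely": 2, "Sometimes": 3, "Often": 4, "Always": 5,
}
LIKING_CODES = {     # bipolar, seven points
    "Dislike very much": -3, "Dislike somewhat": -2, "Dislike slightly": -1,
    "Neither like nor dislike": 0,
    "Like slightly": 1, "Like somewhat": 2, "Like very much": 3,
}

responses = ["Sometimes", "Often", "Never", "Often", "Always"]
coded = [FREQUENCY_CODES[r] for r in responses]
print(coded)                           # [3, 4, 1, 4, 5]
print(sum(coded) / len(coded))         # 3.4 (treating the codes as roughly interval)
print(LIKING_CODES["Like somewhat"])   # 2
```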


Interval measures are those where the values are not only rank-ordered, but are also equidistant from adjacent attributes. For example, on the temperature scale (in Fahrenheit or Celsius), the difference between 30 and 40 degrees Fahrenheit is the same as that between 80 and 90 degrees Fahrenheit. Likewise, if you have a scale that asks for respondents’ annual income using the following attributes (ranges): $0 to 10,000, $10,000 to 20,000, $20,000 to 30,000, and so forth, this is also an interval measure, because the mid-points of the ranges (i.e., $5,000, $15,000, $25,000, etc.) are equidistant from each other. The intelligence quotient (IQ) scale is also an interval measure, because the measure is designed such that the difference between IQ scores of 100 and 110 is supposed to be the same as between 110 and 120 (although we do not really know whether that is truly the case). Interval measures allow us to examine “how much more” one attribute is when compared to another, which is not possible with nominal or ordinal measures. You may find researchers who “pretend” (incorrectly) that ordinal rating scales are actually interval measures so that they can use different statistical techniques for analyzing them. As we will discuss in the latter part of the chapter, this is a mistake because there is no way to know whether the difference between a 3 and a 4 on a rating scale is the same as the difference between a 2 and a 3. Those numbers are just placeholders for categories.

Ratio measures have all the qualities of nominal, ordinal, and interval scales and, in addition, have a “true zero” point (where the value zero implies a lack or non-availability of the underlying construct). Think about how to measure the number of people working in human resources at a social work agency. It could be one, several, or none (if the agency contracts out for those services). Measuring interval and ratio data is relatively easy, as people either select or input a number for their answer. If you ask a person how many eggs they purchased last week, they can simply tell you they purchased a dozen eggs at the store, two at breakfast on Wednesday, or none at all.

Commonly used rating scales in questionnaires

The level of measurement will give you the basic information you need, but social scientists have also developed specialized instruments for use in questionnaires, a common tool used in quantitative research. As we mentioned before, if you plan to source your data from client files or previously published results, you are limited to the measures and indicators recorded in those sources.

Although Likert scale is a term colloquially used to refer to almost any rating scale (e.g., a 0-to-10 life satisfaction scale), it has a much more precise meaning. In the 1930s, researcher Rensis Likert (pronounced LICK-ert) created a new approach for measuring people’s attitudes (Likert, 1932). [10] It involves presenting people with several statements—including both favorable and unfavorable statements—about some person, group, or idea. Respondents then express their agreement or disagreement with each statement on a 5-point scale: Strongly Agree, Agree, Neither Agree nor Disagree, Disagree, Strongly Disagree. Numbers are assigned to each response and then summed across all items to produce a score representing the attitude toward the person, group, or idea. For items that are phrased in an opposite direction (e.g., negatively worded statements instead of positively worded statements), reverse coding is used so that the numerical scoring of statements also runs in the opposite direction. The entire set of items came to be called a Likert scale, as indicated in Table 11.2 below.

Unless you are measuring people’s attitude toward something by assessing their level of agreement with several statements about it, it is best to avoid calling it a Likert scale. You are probably just using a rating scale. Likert scales allow for more granularity (more finely tuned response) than yes/no items, including whether respondents are neutral to the statement. Below is an example of how we might use a Likert scale to assess your attitudes about research as you work your way through this textbook.

Table 11.2 Likert scale
Statement                                                            | Strongly Agree | Agree | Neither Agree nor Disagree | Disagree | Strongly Disagree
I like research more now than when I started reading this book.     |                |       |                            |          |
This textbook is easy to use.                                        |                |       |                            |          |
I feel confident about how well I understand levels of measurement. |                |       |                            |          |
This textbook is helping me plan my research proposal.              |                |       |                            |          |
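Below is a minimal Python sketch of how Likert items like these might be scored; the respondent's answers and the negatively worded item are invented for illustration:

```python
# 5-point agreement codes: Strongly Disagree = 1 ... Strongly Agree = 5.
# One hypothetical respondent's answers to four attitude items.
responses = {
    "I like research more now than when I started reading this book": 4,
    "This textbook is easy to use": 5,
    "Research methods are a waste of my time": 2,   # hypothetical negatively worded item
    "This textbook is helping me plan my research proposal": 4,
}
NEGATIVELY_WORDED = {"Research methods are a waste of my time"}

def score_item(item: str, value: int) -> int:
    # Reverse code negatively worded items so that higher always means a more
    # favorable attitude toward research (6 - value flips 1<->5 and 2<->4).
    return 6 - value if item in NEGATIVELY_WORDED else value

total = sum(score_item(item, value) for item, value in responses.items())
print(total)   # 4 + 5 + (6 - 2) + 4 = 17, out of a possible 20
```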

Semantic differential scales are composite (multi-item) scales in which respondents are asked to indicate their opinions or feelings toward a single statement using pairs of adjectives framed as polar opposites. Whereas the Likert scale above asks how much participants agree or disagree with a statement, a semantic differential scale asks participants to indicate how they feel about a specific item. This makes the semantic differential scale an excellent technique for measuring people’s attitudes or feelings toward objects, events, or behaviors. Table 11.3 is an example of a semantic differential scale that was created to assess participants’ feelings about this textbook.

Table 11.3 Semantic differential scale
           | Very much | Somewhat | Neither | Somewhat | Very much |
Boring     |           |          |         |          |           | Exciting
Useless    |           |          |         |          |           | Useful
Hard       |           |          |         |          |           | Easy
Irrelevant |           |          |         |          |           | Applicable
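One common (though not the only) scoring convention assigns each adjective pair a value from 1 at the negative pole to 5 at the positive pole and then averages across pairs; here is a small Python sketch with invented ratings:

```python
# Each adjective pair is scored 1 (left pole) to 5 (right pole).
# Hypothetical ratings from one respondent about this textbook.
pair_ratings = {
    ("Boring", "Exciting"): 4,
    ("Useless", "Useful"): 5,
    ("Hard", "Easy"): 3,
    ("Irrelevant", "Applicable"): 4,
}

# A simple composite: the mean across pairs; higher = more favorable feelings.
composite = sum(pair_ratings.values()) / len(pair_ratings)
print(composite)   # 4.0
```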

Another type of composite scale, the Guttman scale, was designed by Louis Guttman and uses a series of items arranged in increasing order of intensity (least intense to most intense) of the concept. This type of scale allows us to understand the intensity of beliefs or feelings. Each item in the Guttman scale below has a weight (not indicated on the tool itself) that varies with the intensity of that item, and the weighted combination of the responses is used as an aggregate measure of an observation.

Example Guttman Scale Items

  • I often felt the material was not engaging                               Yes/No
  • I was often thinking about other things in class                     Yes/No
  • I was often working on other tasks during class                     Yes/No
  • I will work to abolish research from the curriculum              Yes/No

Notice how the items move from lower intensity to higher intensity. A researcher reviews the yes answers and creates a score for each participant.
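A brief Python sketch of combining the yes answers into a single score; the weights and responses here are hypothetical, since real Guttman weights come from formal scale development:

```python
# Items ordered from least to most intense; the weights are illustrative only.
items = [
    ("I often felt the material was not engaging", 1),
    ("I was often thinking about other things in class", 2),
    ("I was often working on other tasks during class", 3),
    ("I will work to abolish research from the curriculum", 4),
]

# One respondent's yes/no answers, in the same order as the items.
answers = [True, True, False, False]

# The weighted sum of the "yes" responses serves as the aggregate intensity score.
score = sum(weight for (text, weight), yes in zip(items, answers) if yes)
print(score)   # 1 + 2 = 3
```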

Composite measures: Scales and indices

Depending on your research design, your measure may be something you put on a survey or pre/post-test that you give to your participants. For a variable like age or income, one well-worded question may suffice. Unfortunately, most variables in the social world are not so simple. Depression and satisfaction are multidimensional concepts. Relying on a single indicator like a question that asks “Yes or no, are you depressed?” does not encompass the complexity of depression, including issues with mood, sleeping, eating, relationships, and happiness. There is no easy way to delineate between multidimensional and unidimensional concepts, as it’s all in how you think about your variable. Satisfaction could be validly measured using a unidimensional ordinal rating scale. However, if satisfaction were a key variable in our study, we would need a theoretical framework and conceptual definition for it. That means we’d probably have more indicators to ask about like timeliness, respect, sensitivity, and many others, and we would want our study to say something about what satisfaction truly means in terms of our other key variables. However, if satisfaction is not a key variable in your conceptual framework, it makes sense to operationalize it as a unidimensional concept.

For more complicated measures, researchers use scales and indices (sometimes called indexes) to measure their variables because they assess multiple indicators to develop a composite (or total) score. Composite scores provide a much greater understanding of concepts than a single item could. Although we won’t delve too deeply into the process of scale development, we will cover some important topics for you to understand how scales and indices developed by other researchers can be used in your project.

Although scales and indices exhibit differences (which we will discuss later), they have several features in common:

  • Both are ordinal measures of variables.
  • Both can order the units of analysis in terms of specific variables.
  • Both are composite measures .


The previous section discussed how to measure respondents’ responses to predesigned items or indicators belonging to an underlying construct. But how do we create the indicators themselves? The process of creating the indicators is called scaling. More formally, scaling is a branch of measurement that involves the construction of measures by associating qualitative judgments about unobservable constructs with quantitative, measurable metric units. Stevens (1946) [11] said, “Scaling is the assignment of objects to numbers according to a rule.” This process of measuring abstract concepts in concrete terms remains one of the most difficult tasks in empirical social science research.

The outcome of a scaling process is a scale, which is an empirical structure for measuring items or indicators of a given construct. Understand that multidimensional “scales”, as discussed in this section, are a little different from “rating scales” discussed in the previous section. A rating scale is used to capture the respondents’ reactions to a given item on a questionnaire. For example, an ordinally scaled item captures a value from “strongly disagree” to “strongly agree.” Attaching a rating scale to a statement or instrument is not scaling. Rather, scaling is the formal process of developing scale items, before rating scales can be attached to those items.

If creating your own scale sounds painful, don’t worry! For most multidimensional variables, you would likely be duplicating work that has already been done by other researchers in a branch of science called psychometrics. You do not need to create a scale for depression because scales such as the Patient Health Questionnaire (PHQ-9), the Center for Epidemiologic Studies Depression Scale (CES-D), and Beck’s Depression Inventory (BDI) have been developed and refined over many years to measure variables like depression. Similarly, scales such as the Patient Satisfaction Questionnaire (PSQ-18) have been developed to measure satisfaction with medical care. As we will discuss in the next section, these scales have been shown to be reliable and valid. While you could create a new scale to measure depression or satisfaction, a rigorous study would pilot test and refine that new scale over time to make sure it measures the concept accurately and consistently. This level of rigor is often unachievable in student research projects because of the cost and time involved in pilot testing and validation, so using existing scales is recommended.

Unfortunately, there is no good one-stop shop for psychometric scales. The Mental Measurements Yearbook provides a searchable database of measures for social science variables, though it is woefully incomplete and often does not contain the full documentation for scales in its database. You can access it from a university library’s list of databases. If you can’t find anything there, your next stop should be the methods section of the articles in your literature review. The methods section of each article will detail how the researchers measured their variables, and often the results section is instructive for understanding more about the measures. In a quantitative study, researchers may have used a scale to measure key variables and will provide a brief description of that scale, its name, and maybe a few example questions. If you need more information, look at the results section and the tables discussing the scale to get a better idea of how the measure works. Looking beyond the articles in your literature review, searching Google Scholar using queries like “depression scale” or “satisfaction scale” should also provide some relevant results. For example, when searching for documentation for the Rosenberg Self-Esteem Scale (which we will discuss in the next section), I found a report from researchers investigating acceptance and commitment therapy which details this scale and many others used to assess mental health outcomes. If you find the name of a scale somewhere but cannot find its documentation (all questions and answers plus how to interpret the scale), a general web search with the name of the scale and “.pdf” may bring you to what you need. Or, to get professional help with finding information, always ask a librarian!

Unfortunately, these approaches do not guarantee that you will be able to view the scale itself or get information on how it is interpreted. Many scales cost money to use and may require training to properly administer. You may also find scales that are related to your variable but would need to be slightly modified to match your study’s needs. You could adapt a scale to fit your study; however, changing even small parts of a scale can influence its accuracy and consistency. While it is perfectly acceptable in student projects to adapt a scale without testing it first (time may not allow you to do so), pilot testing is always recommended for adapted scales, and researchers seeking to draw valid conclusions and publish their results must take this additional step.

An index is a composite score derived from aggregating measures of multiple concepts (called components) using a set of rules and formulas. It is different from a scale. Scales also aggregate measures; however, these measures examine different dimensions or the same dimension of a single construct. A well-known example of an index is the consumer price index (CPI), which is computed every month by the Bureau of Labor Statistics of the U.S. Department of Labor. The CPI is a measure of how much consumers have to pay for goods and services (in general) and is divided into eight major categories (food and beverages, housing, apparel, transportation, healthcare, recreation, education and communication, and “other goods and services”), which are further subdivided into more than 200 smaller items. Each month, government employees call all over the country to get the current prices of more than 80,000 items. Using a complicated weighting scheme that takes into account the location and probability of purchase for each item, analysts then combine these prices into an overall index score using a series of formulas and rules.
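As a toy illustration of how an index combines weighted components into one score (the components, values, and weights below are invented and are not the CPI's actual method), consider this short Python sketch:

```python
# Hypothetical components of a simple cost-of-living index for one month.
# Each component: (observed price level, weight reflecting its share of spending).
components = {
    "food and beverages": (210.0, 0.15),
    "housing":            (320.0, 0.42),
    "transportation":     (190.0, 0.18),
    "other":              (150.0, 0.25),
}

# Weighted combination of the components into a single index score.
index_score = sum(price * weight for price, weight in components.values())
print(round(index_score, 1))   # 237.6
```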

Another example of an index is the Duncan Socioeconomic Index (SEI). This index is used to quantify a person’s socioeconomic status (SES) and is a combination of three concepts: income, education, and occupation. Income is measured in dollars, education in years or degrees achieved, and occupation is classified into categories or levels by status. These very different measures are combined to create an overall SES index score. However, SES index measurement has generated a lot of controversy and disagreement among researchers.

The process of creating an index is similar to that of a scale. First, conceptualize (define) the index and its constituent components. Though this appears simple, there may be a lot of disagreement about which components (concepts or constructs) should be included in or excluded from an index. For instance, in the SES index, isn’t income correlated with education and occupation? And if so, should we include one component only or all three? Reviewing the literature, using theories, and/or interviewing experts or key stakeholders may help resolve this issue. Second, operationalize and measure each component. For instance, how will you categorize occupations, particularly since some occupations may have changed over time (e.g., there were no web developers before the Internet)? Third, create a rule or formula for calculating the index score. This process may involve a lot of subjectivity, so validating the index score using existing or new data is important.

Scale and index development are often taught in their own course in doctoral education, so it is unreasonable to expect yourself to develop a consistently accurate measure within the span of a week or two. Using available indices and scales is recommended for this reason.

Differences between scales and indices

Though indices and scales yield a single numerical score or value representing a concept of interest, they are different in many ways. First, indices often comprise components that are very different from each other (e.g., income, education, and occupation in the SES index) and are measured in different ways. Conversely, scales typically involve a set of similar items that use the same rating scale (such as a five-point Likert scale about customer satisfaction).

Second, indices often combine objectively measurable values such as prices or income, while scales are designed to assess subjective or judgmental constructs such as attitude, prejudice, or self-esteem. Some argue that the sophistication of the scaling methodology makes scales different from indexes, while others suggest that indexing methodology can be equally sophisticated. Nevertheless, indexes and scales are both essential tools in social science research.

Scales and indices seem like clean, convenient ways to measure different phenomena in social science, but just like with a lot of research, we have to be mindful of the assumptions and biases underneath. What if a scale or an index was developed using only White women as research participants? Is it going to be useful for other groups? It very well might be, but when using a scale or index on a group for whom it hasn’t been tested, it will be very important to evaluate the validity and reliability of the instrument, which we address in the rest of the chapter.

Finally, it’s important to note that while scales and indices are often made up of nominal or ordinal items, when we combine them into composite scores, we typically treat those scores as interval/ratio variables.

  • Look back to your work from the previous section, are your variables unidimensional or multidimensional?
  • Describe the specific measures you will use (actual questions and response options you will use with participants) for each variable in your research question.
  • If you are using a measure developed by another researcher but do not have all of the questions, response options, and instructions needed to implement it, put it on your to-do list to get them.


Step 3: How you will interpret your measures

The final stage of operationalization involves setting the rules for how the measure works and how the researcher should interpret the results. Sometimes, interpreting a measure can be incredibly easy. If you ask someone their age, you’ll probably interpret the results by noting the raw number (e.g., 22) someone provides and whether it is lower or higher than other people’s ages. However, you could also recode that person into age categories (e.g., under 25, 20-29 years old, Generation Z, etc.). Even scales may be simple to interpret. If there is a scale of problem behaviors, one might simply add up the number of behaviors checked off, with a total of 1-5 indicating low risk of delinquent behavior, 6-10 indicating moderate risk, and so on. How you choose to interpret your measures should be guided by how they were designed, how you conceptualize your variables, the data sources you used, and your plan for analyzing your data statistically. Whatever measure you use, you need a set of rules for how to take any valid answer a respondent provides and interpret it in terms of the variable being measured.

For more complicated measures like scales, refer to the information provided by the scale’s author for how to interpret it. If you can’t find enough information from the scale’s creator, look at how the results of that scale are reported in the results sections of research articles. For example, Beck’s Depression Inventory (BDI-II) uses 21 statements to measure depression, and respondents rate each item on a scale of 0-3. The results for each question are added up, and the respondent is placed into one of three categories: low levels of depression (1-16), moderate levels of depression (17-30), or severe levels of depression (31 and over).
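A short Python sketch of this kind of interpretation rule, using the cutoffs described above and invented item ratings:

```python
# 21 hypothetical item ratings, each 0-3, from one respondent.
item_ratings = [1, 2, 0, 1, 1, 2, 1, 0, 1, 2, 1,
                1, 0, 1, 2, 1, 1, 0, 1, 1, 1]

total = sum(item_ratings)

# Interpretation rule from the text: 1-16 low, 17-30 moderate, 31+ severe.
if total <= 16:
    category = "low levels of depression"
elif total <= 30:
    category = "moderate levels of depression"
else:
    category = "severe levels of depression"

print(total, category)   # 21 moderate levels of depression
```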

One common mistake I see often is that students will introduce another variable into their operational definition. This is incorrect. Your operational definition should mention only one variable—the variable being defined. While your study will certainly draw conclusions about the relationships between variables, that’s not what operationalization is. Operationalization specifies what instrument you will use to measure your variable and how you plan to interpret the data collected using that measure.

Operationalization is probably the trickiest component of basic research methods, so please don’t get frustrated if it takes a few drafts and a lot of feedback to get to a workable definition. At the time of this writing, I am in the process of operationalizing the concept of “attitudes towards research methods.” Originally, I thought that I could gauge students’ attitudes toward research methods by looking at their end-of-semester course evaluations. As I became aware of the potential methodological issues with student course evaluations, I opted to use focus groups of students to measure their common beliefs about research. You may recall some of these opinions from Chapter 1 , such as the common beliefs that research is boring, useless, and too difficult. After the focus group, I created a scale based on the opinions I gathered, and I plan to pilot test it with another group of students. After the pilot test, I expect that I will have to revise the scale again before I can implement the measure in a real social work research project. At the time I’m writing this, I’m still not completely done operationalizing this concept.

  • Operationalization involves spelling out precisely how a concept will be measured.
  • Operational definitions must include the variable, the measure, and how you plan to interpret the measure.
  • There are four different levels of measurement: nominal, ordinal, interval, and ratio (in increasing order of specificity).
  • Scales and indices are common ways to collect information and involve using multiple indicators in measurement.
  • A key difference between a scale and an index is that a scale contains multiple indicators for one concept, whereas an index combines multiple concepts (components).
  • Using scales developed and refined by other researchers can improve the rigor of a quantitative study.

Use the research question that you developed in the previous chapters and find a related scale or index that researchers have used. If you have trouble finding the exact phenomenon you want to study, get as close as you can.

  • What is the level of measurement for each item on each tool? Take a second and think about why the tool’s creator decided to include these levels of measurement. Identify any levels of measurement you would change and why.
  • If these tools don’t exist for what you are interested in studying, why do you think that is?

11.3 Measurement quality

  • Define and describe the types of validity and reliability
  • Assess for systematic error

The previous sections provided insight into measuring concepts in social work research. We discussed the importance of identifying concepts and their corresponding indicators as a way to help us operationalize them. In essence, we now understand that when we think about our measurement process, we must be intentional and thoughtful in the choices that we make. This section is all about how to judge the quality of the measures you’ve chosen for the key variables in your research question.

Reliability

First, let’s say we’ve decided to measure alcoholism by asking people to respond to the following question: Have you ever had a problem with alcohol? If we measure alcoholism this way, then it is likely that anyone who identifies as an alcoholic would respond “yes.” This may seem like a good way to identify our group of interest, but think about how you and your peer group might respond to this question. Would participants respond differently after a wild night out, compared to any other night? Could an infrequent drinker’s current headache from last night’s glass of wine influence how they answer the question this morning? How would that same person respond to the question before consuming the wine? In each case, the same person might respond differently to the same question at different points, so it is possible that our measure of alcoholism has a reliability problem.  Reliability  in measurement is about consistency.

One common problem of reliability with social scientific measures is memory. If we ask research participants to recall some aspect of their own past behavior, we should try to make the recollection process as simple and straightforward for them as possible. Sticking with the topic of alcohol intake, if we ask respondents how much wine, beer, and liquor they’ve consumed each day over the course of the past 3 months, how likely are we to get accurate responses? Unless a person keeps a journal documenting their intake, there will very likely be some inaccuracies in their responses. On the other hand, we might get more accurate responses if we ask a participant how many drinks of any kind they have consumed in the past week.

Reliability can be an issue even when we’re not reliant on others to accurately report their behaviors. Perhaps a researcher is interested in observing how alcohol intake influences interactions in public locations. They may decide to conduct observations at a local pub by noting how many drinks patrons consume and how their behavior changes as their intake changes. What if the researcher has to use the restroom, and the patron next to them takes three shots of tequila during the brief period the researcher is away from their seat? The reliability of this researcher’s measure of alcohol intake depends on their ability to physically observe every instance of patrons consuming drinks. If they are unlikely to be able to observe every such instance, then perhaps their mechanism for measuring this concept is not reliable.

The following subsections describe the types of reliability that are important for you to know about, but keep in mind that you may see other approaches to judging reliability mentioned in the empirical literature.

Test-retest reliability

When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time. Test-retest reliability is the extent to which this is actually the case. For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week. This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent.

Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on the  same group of people at a later time. Unlike an experiment, you aren’t giving participants an intervention but trying to establish a reliable baseline of the variable you are measuring. Once you have these two measurements, you then look at the correlation between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing the correlation coefficient. Figure 11.2 shows the correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered two times, a week apart. The correlation coefficient for these data is +.95. In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.

A scatterplot with scores at time 1 on the x-axis and scores at time 2 on the y-axis, both ranging from 0 to 30. The dots on the scatter plot indicate a strong, positive correlation.

Again, high test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions. But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern.
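Before moving on, here is a minimal Python sketch of computing a test-retest correlation between two administrations of a measure; the scores are invented:

```python
from statistics import correlation  # Pearson's r; available in Python 3.10+

# Hypothetical self-esteem scores for eight students, measured at time 1
# and again one week later.
time1 = [22, 25, 18, 30, 27, 15, 24, 20]
time2 = [21, 26, 17, 29, 28, 16, 23, 21]

r = correlation(time1, time2)
print(round(r, 2))   # 0.98; correlations of +.80 or greater suggest good test-retest reliability
```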

Internal consistency

Another kind of reliability is internal consistency, which is the consistency of people’s responses across the items on a multiple-item measure. In general, all the items on such measures are supposed to reflect the same underlying construct, so people’s scores on those items should be correlated with each other. On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that they have a number of good qualities. If people’s responses to the different items are not correlated with each other, then it would no longer make sense to claim that they are all measuring the same underlying construct. This is as true for behavioral and physiological measures as for self-report measures. For example, people might make a series of bets in a simulated game of roulette as a measure of their level of risk seeking. This measure would be internally consistent to the extent that individual participants’ bets were consistently high or low across trials. A statistic known as Cronbach’s alpha provides a way to measure how well the items of a scale relate to one another.
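A small Python sketch of computing Cronbach's alpha from an item-by-respondent matrix of invented scores, using one common formula based on item and total-score variances:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha: item_scores holds one list per item, each containing
    that item's scores for the same respondents in the same order."""
    k = len(item_scores)
    sum_item_variances = sum(pvariance(item) for item in item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]  # per-respondent totals
    return (k / (k - 1)) * (1 - sum_item_variances / pvariance(totals))

# Hypothetical responses of six people to four items rated on the same 1-5 scale.
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [3, 5, 2, 4, 1, 5],
    [4, 5, 3, 5, 2, 4],
]
print(round(cronbach_alpha(items), 2))   # 0.95: responses are highly consistent across items
```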

Interrater reliability

Many behavioral measures involve significant judgment on the part of an observer or a rater. Interrater reliability is the extent to which different observers are consistent in their judgments. For example, if you were interested in measuring university students’ social skills, you could make video recordings of them as they interacted with another student whom they are meeting for the first time. Then you could have two or more observers watch the videos and rate each student’s level of social skills. To the extent that each participant does, in fact, have some level of social skills that can be detected by an attentive observer, different observers’ ratings should be highly correlated with each other.
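A brief Python sketch of one simple way to check interrater reliability for numeric ratings, correlating two observers' invented scores (for categorical judgments, a statistic such as Cohen's kappa would be used instead):

```python
from statistics import correlation  # Pearson's r; available in Python 3.10+

# Two observers independently rate the same ten students' social skills (1-10).
rater_a = [7, 5, 8, 6, 9, 4, 7, 8, 5, 6]
rater_b = [6, 5, 8, 7, 9, 3, 7, 7, 5, 6]

print(round(correlation(rater_a, rater_b), 2))   # 0.93: the two raters' judgments largely agree
```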

Validity

Validity, another key element of assessing measurement quality, is the extent to which the scores from a measure represent the variable they are intended to measure. But how do researchers make this judgment? We have already considered one factor that they take into account—reliability. When a measure has good test-retest reliability and internal consistency, researchers should be more confident that the scores represent what they are supposed to. There has to be more to it, however, because a measure can be extremely reliable but have no validity whatsoever. As an absurd example, imagine someone who believes that people’s index finger length reflects their self-esteem and therefore tries to measure self-esteem by holding a ruler up to people’s index fingers. Although this measure would have extremely good test-retest reliability, it would have absolutely no validity. The fact that one person’s index finger is a centimeter longer than another’s would indicate nothing about which one had higher self-esteem.

Discussions of validity usually divide it into several distinct “types.” But a good way to interpret these types is that they are other kinds of evidence—in addition to reliability—that should be taken into account when judging the validity of a measure.

Face validity

Face validity is the extent to which a measurement method appears “on its face” to measure the construct of interest. Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities. So a questionnaire that included these kinds of items would have good face validity. The finger-length method of measuring self-esteem, on the other hand, seems to have nothing to do with self-esteem and therefore has poor face validity. Although face validity can be assessed quantitatively—for example, by having a large sample of people rate a measure in terms of whether it appears to measure what it is intended to—it is usually assessed informally.

Face validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to. One reason is that it is based on people’s intuitions about human behavior, which are frequently wrong. It is also the case that many established measures in psychology work quite well despite lacking face validity. The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality characteristics and disorders by having people decide whether each of its 567 statements applies to them, even though many of the statements do not have any obvious relationship to the construct that they measure. For example, the items “I enjoy detective or mystery stories” and “The sight of blood doesn’t frighten me or make me sick” both measure the suppression of aggression. In this case, it is not the participants’ literal answers to these questions that are of interest, but rather whether the pattern of the participants’ responses to a series of questions matches those of individuals who tend to suppress their aggression.

Content validity

Content validity is the extent to which a measure “covers” the construct of interest. For example, if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then their measure of test anxiety should include items about both nervous feelings and negative thoughts. Or consider that attitudes are usually defined as involving thoughts, feelings, and actions toward something. By this conceptual definition, a person has a positive attitude toward exercise to the extent that they think positive thoughts about exercising, feel good about exercising, and actually exercise. So to have good content validity, a measure of people’s attitudes toward exercise would have to reflect all three of these aspects. Like face validity, content validity is not usually assessed quantitatively. Instead, it is assessed by carefully checking the measurement method against the conceptual definition of the construct.

Criterion validity

Criterion validity is the extent to which people’s scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with. For example, people’s scores on a new measure of test anxiety should be negatively correlated with their performance on an important school exam. If it were found that people’s scores were in fact negatively correlated with their exam performance, then this would be a piece of evidence that these scores really represent people’s test anxiety. But if it were found that people scored equally well on the exam regardless of their test anxiety scores, then this would cast doubt on the validity of the measure.

A criterion can be any variable that one has reason to think should be correlated with the construct being measured, and there will usually be many of them. For example, one would expect test anxiety scores to be negatively correlated with exam performance and course grades and positively correlated with general anxiety and with blood pressure during an exam. Or imagine that a researcher develops a new measure of physical risk taking. People’s scores on this measure should be correlated with their participation in “extreme” activities such as snowboarding and rock climbing, the number of speeding tickets they have received, and even the number of broken bones they have had over the years. When the criterion is measured at the same time as the construct, criterion validity is referred to as concurrent validity ; however, when the criterion is measured at some point in the future (after the construct has been measured), it is referred to as predictive validity (because scores on the measure have “predicted” a future outcome).
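As a hedged sketch of how this evidence might be gathered, the code below correlates invented test-anxiety scores with exam grades collected at the same time (a concurrent validity check). A predictive validity check would follow the same logic, except the criterion (for example, final course grades) would be collected at a later point in time.

```python
import numpy as np

# Invented data for eight students: scores on a new test-anxiety measure and
# grades on an exam taken at the same time.
anxiety = np.array([12, 25, 8, 30, 18, 22, 5, 27])
exam_grade = np.array([88, 71, 92, 60, 80, 74, 95, 66])

r = np.corrcoef(anxiety, exam_grade)[0, 1]
print(round(r, 2))  # a clearly negative r is evidence of criterion validity
```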

Discriminant validity

Discriminant validity , on the other hand, is the extent to which scores on a measure are not  correlated with measures of variables that are conceptually distinct. For example, self-esteem is a general attitude toward the self that is fairly stable over time. It is not the same as mood, which is how good or bad one happens to be feeling right now. So people’s scores on a new measure of self-esteem should not be very highly correlated with their moods. If the new measure of self-esteem were highly correlated with a measure of mood, it could be argued that the new measure is not really measuring self-esteem; it is measuring mood instead.

Increasing the reliability and validity of measures

We have reviewed the types of errors and how to evaluate our measures based on reliability and validity considerations. However, what can we do while selecting or creating our tool to minimize the potential for error? Many of our options were covered in our discussion about reliability and validity. Nevertheless, the following list provides a quick summary of things that you should do when creating or selecting a measurement tool. While not all of these will be feasible in your project, it is important to implement the ones that are practical in your research context.

Make sure that you engage in a rigorous literature review so that you understand the concept that you are studying. This means understanding the different ways that your concept may manifest itself. This review should include a search for existing instruments. [12]

  • Do you understand all the dimensions of your concept? Do you have a good understanding of the content dimensions of your concept(s)?
  • What instruments exist? How many items are on the existing instruments? Are these instruments appropriate for your population?
  • Are these instruments standardized? Note: If an instrument is standardized, that means it has been rigorously studied and tested.

Consult content experts to review your instrument. This is a good way to check the face validity of your items. Additionally, content experts can also help you understand the content validity. [13]

  • Do you have access to a reasonable number of content experts? If not, how can you locate them?
  • Did you provide a list of critical questions for your content reviewers to use in the reviewing process?

Pilot test your instrument on a sufficient number of people and get detailed feedback. [14] Ask your group to provide feedback on the wording and clarity of items. Keep detailed notes and make adjustments BEFORE you administer your final tool.

  • How many people will you use in your pilot testing?
  • How will you set up your pilot testing so that it mimics the actual process of administering your tool?
  • How will you receive feedback from your pilot testing group? Have you provided a list of questions for your group to think about?

Provide training for anyone collecting data for your project. [15] You should provide those helping you with a written research protocol that explains all of the steps of the project. You should also problem solve and answer any questions that those helping you may have. This will increase the chances that your tool will be administered in a consistent manner.

  • How will you conduct your orientation/training? How long will it be? What modality?
  • How will you select those who will administer your tool? What qualifications do they need?

When thinking of items, use a higher level of measurement, if possible. [16] This will provide more information and you can always downgrade to a lower level of measurement later.

  • Have you examined your items and the levels of measurement?
  • Have you thought about whether you need to modify the type of data you are collecting? Specifically, are you asking for information that is too specific (at a higher level of measurement) which may reduce participants’ willingness to participate?

Use multiple indicators for a variable. [17] Think about the number of items that you will include in your tool.

  • Do you have enough items? Enough indicators? The correct indicators?

Conduct an item-by-item assessment of multiple-item measures. [18] When you do this assessment, think about each word and how it changes the meaning of your item.

  • Are there items that are redundant? Do you need to modify, delete, or add items?


Types of error

As you can see, measures never perfectly describe what exists in the real world. Good measures demonstrate validity and reliability but will always have some degree of error. Systematic error (also called bias) causes a measure to consistently produce incorrect data in one direction or another, usually due to an identifiable process. Imagine you created a measure of height, but you didn’t put an option for anyone over six feet tall. If you gave that measure to your local college or university, some of the taller students might not be measured accurately. In fact, you would be under the mistaken impression that the tallest person at your school was six feet tall, when in actuality there are likely people taller than six feet at your school. This error seems innocent, but if you were using that measure to help you build a new building, those people might hit their heads!
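The height example can be simulated in a few lines. In the sketch below, which uses invented numbers, the instrument cannot record anything above six feet, so every error it introduces pushes the average in the same direction; that one-directional distortion is the hallmark of systematic error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented "true" heights (in inches) for 1,000 students.
true_heights = rng.normal(loc=68, scale=4, size=1000)

# A flawed instrument that cannot record anything above six feet (72 inches).
measured = np.minimum(true_heights, 72)

# The ceiling produces systematic error: the measured mean is consistently too low.
print(round(true_heights.mean(), 2), round(measured.mean(), 2))
```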

A less innocent form of error arises when researchers word questions in a way that might cause participants to think one answer choice is preferable to another. For example, if I were to ask you “Do you think global warming is caused by human activity?” you would probably feel comfortable answering honestly. But what if I asked you “Do you agree with 99% of scientists that global warming is caused by human activity?” Would you feel comfortable saying no, if that’s what you honestly felt? I doubt it. That is an example of a  leading question , a question with wording that influences how a participant responds. We’ll discuss leading questions and other problems in question wording in greater detail in Chapter 12 .

In addition to error created by the researcher, your participants can cause error in measurement. Some people will respond without fully understanding a question, particularly if the question is worded in a confusing way. Let’s consider another potential source of error. If we asked people if they always washed their hands after using the bathroom, would we expect people to be perfectly honest? Polling people about whether they wash their hands after using the bathroom might only elicit what people would like others to think they do, rather than what they actually do. This is an example of social desirability bias, in which participants in a research study want to present themselves in a positive, socially desirable way to the researcher. People in your study will want to seem tolerant, open-minded, and intelligent, but their true feelings may be closed-minded, simple, and biased. Participants may lie in this situation. This occurs often in political polling, which may show greater support for a candidate from a minority race, gender, or political party than actually exists in the electorate.

A related form of bias is called acquiescence bias, also known as “yea-saying.” It occurs when people say yes to whatever the researcher asks, even when doing so contradicts previous answers. For example, a person might say yes to both “I am a confident leader in group discussions” and “I feel anxious interacting in group discussions.” Those two responses are unlikely to both be true for the same person. Why would someone do this? Similar to social desirability, people want to be agreeable and nice to the researcher asking them questions, or they might ignore contradictory feelings when responding to each question. You could interpret this as someone saying “yeah, I guess.” Respondents may also have cultural reasons for agreeing, such as trying to “save face” for themselves or for the person asking the questions. Regardless of the reason, the results of your measure don’t match what the person truly feels.

So far, we have discussed sources of error that come from choices made by respondents or researchers. Systematic errors will result in responses that are incorrect in one direction or another. For example, social desirability bias usually means that the number of people who say they will vote for a third party in an election is greater than the number of people who actually vote for that candidate. Systematic errors such as these can be reduced, but random error can never be eliminated. Unlike systematic error, which biases responses consistently in one direction or another, random error is unpredictable and does not result in scores that are consistently higher or lower on a given measure. Instead, random error is more like statistical noise, which will likely average out across participants.

Random error is present in any measurement. If you’ve ever stepped on a bathroom scale twice and gotten two slightly different results, maybe a difference of a tenth of a pound, then you’ve experienced random error. Maybe you were standing slightly differently or had a fraction of your foot off of the scale the first time. If you were to take enough measures of your weight on the same scale, you’d be able to figure out your true weight. In social science, if you gave someone a scale measuring depression on a day after they lost their job, they would likely score differently than if they had just gotten a promotion and a raise. Even if the person were clinically depressed, our measure is subject to influence by the random occurrences of life. Thus, social scientists speak with humility about our measures. We are reasonably confident that what we found is true, but we must always acknowledge that our measures are only an approximation of reality.
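The bathroom-scale example can be sketched the same way. In the simulation below, again with invented values, each reading is the true weight plus a small random error; any single reading is slightly off, but the errors largely cancel out when many readings are averaged, which is why random error behaves like noise rather than bias.

```python
import numpy as np

rng = np.random.default_rng(1)

true_weight = 150.0
# Each reading is the true weight plus a small random error (e.g., shifts in stance).
readings = true_weight + rng.normal(loc=0, scale=0.3, size=50)

# A single reading is a little off, but the average across readings is very close to the truth.
print(round(readings[0], 2), round(readings.mean(), 2))
```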

Humility is important in scientific measurement, as errors can have real consequences. At the time I’m writing this, my wife and I are expecting our first child. Like most people, we used a pregnancy test from the pharmacy. If the test said my wife was pregnant when she was not pregnant, that would be a false positive . On the other hand, if the test indicated that she was not pregnant when she was in fact pregnant, that would be a  false negative . Even if the test is 99% accurate, that means that one in a hundred women will get an erroneous result when they use a home pregnancy test. For us, a false positive would have been initially exciting, then devastating when we found out we were not having a child. A false negative would have been disappointing at first and then quite shocking when we found out we were indeed having a child. While both false positives and false negatives are not very likely for home pregnancy tests (when taken correctly), measurement error can have consequences for the people being measured.

  • Reliability is a matter of consistency.
  • Validity is a matter of accuracy.
  • There are many types of validity and reliability.
  • Systematic error may arise from the researcher, participant, or measurement instrument.
  • Systematic error biases results in a particular direction, whereas random error can be in any direction.
  • All measures are prone to error and should be interpreted with humility.

Use the measurement tools you located in the previous exercise. Evaluate the reliability and validity of these tools. Hint: You will need to go into the literature to “research” these tools.

  • Provide a clear statement regarding the reliability and validity of these tools. What strengths did you notice? What were the limitations?
  • Think about your target population . Are there changes that need to be made in order for one of these tools to be appropriate for your population?
  • If you decide to create your own tool, how will you assess its validity and reliability?

11.4 Ethical and social justice considerations

  • Identify potential cultural, ethical, and social justice issues in measurement.

With your variables operationalized, it’s time to take a step back and look at how measurement in social science impacts our daily lives. As we will see, how we measure things is shaped by power arrangements inside our society and, more insidiously, because measurement helps establish what counts as scientifically true, measures have their own power to influence the world. Just like reification in the conceptual world, how we operationally define concepts can reinforce or fight against oppressive forces.


Data equity

How we decide to measure our variables determines what kind of data we end up with in our research project. Because scientific processes are a part of our sociocultural context, the same biases and oppressions we see in the real world can be manifested or even magnified in research data. Jagadish and colleagues (2021) [19] present four dimensions of data equity that are relevant to consider: representation of non-dominant groups within data sets; how data are collected, analyzed, and combined across datasets; equitable and participatory access to data; and the outcomes associated with data collection. Historically, we have mostly focused on measures producing outcomes that are biased in one way or another, and this section reviews many such examples. However, it is important to note that equity must also come from designing measures that respond to questions like:

  • Are groups historically suppressed from the data record represented in the sample?
  • Are equity data gathered by researchers and used to uncover and quantify inequity?
  • Are the data accessible across domains and levels of expertise, and can community members participate in the design, collection, and analysis of the public data record?
  • Are the data collected used to monitor and mitigate inequitable impacts?

So, it’s not just about whether measures work for one population or another. Data equity is about the context in which data are created, starting with how we measure people and things. We agree with these authors that data equity should be considered within the context of automated decision-making systems, and that researchers should recognize the broader literature on the role of administrative systems in creating and reinforcing discrimination. To combat the inequitable processes and outcomes we describe below, researchers must foreground equity as a core component of measurement.

Flawed measures & missing measures

At the end of every semester, students in just about every university classroom in the United States complete similar student evaluations of teaching (SETs). Since every student is likely familiar with these, we can recognize many of the concepts we discussed in the previous sections. There are a number of rating-scale questions that ask you to rate the professor, class, and teaching effectiveness on a scale of 1-5. Scores are averaged across students and used to determine the quality of teaching delivered by the faculty member. SET scores are often a principal factor in how faculty are reappointed to teaching positions. Would it surprise you to learn that student evaluations of teaching are of questionable quality? If your instructors are assessed with a biased or incomplete measure, how might that impact your education?

Most often, student scores are averaged across questions and reported as a final average. This average is used as one factor, often the most important factor, in a faculty member’s reappointment to teaching roles. We learned in this chapter that rating scales are ordinal, not interval or ratio, and the data are categories not numbers. Although rating scales use a familiar 1-5 scale, the numbers 1, 2, 3, 4, & 5 are really just helpful labels for categories like “excellent” or “strongly agree.” If we relabeled these categories as letters (A-E) rather than as numbers (1-5), how would you average them?

Averaging ordinal data is methodologically dubious, as the numbers are merely a useful convention. As you will learn in Chapter 14 , taking the median value is what makes the most sense with ordinal data. Median values are also less sensitive to outliers. So, a single student who has strong negative or positive feelings towards the professor could bias the class’s SETs scores higher or lower than what the “average” student in the class would say, particularly for classes with few students or in which fewer students completed evaluations of their teachers.
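A small, invented example illustrates why the median is the safer summary for ordinal ratings. In the sketch below, a single extreme rating drags the class mean down, while the median still reflects what the typical student reported.

```python
import numpy as np

# Hypothetical SET ratings (1 = poor ... 5 = excellent) from a small class.
ratings = np.array([4, 4, 5, 4, 4, 4, 5, 1])  # one strongly negative outlier

print(np.mean(ratings))    # 3.875: pulled down by the single outlier
print(np.median(ratings))  # 4.0: closer to the typical student's rating
```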

We care about teaching quality because more effective teachers will produce more knowledgeable and capable students. However, student evaluations of teaching are not particularly good indicators of teaching quality and are not associated with the independently measured learning gains of students (i.e., test scores, final grades) (Uttl et al., 2017). [20] This speaks to the lack of criterion validity. Higher teaching quality should be associated with better learning outcomes for students, but across multiple studies stretching back years, there is no association that cannot be better explained by other factors. To be fair, there are scholars who find that SETs are valid and reliable. For a thorough defense of SETs as well as a historical summary of the literature see Benton & Cashin (2012). [21]

Even though student evaluations of teaching often contain dozens of questions, researchers often find that the questions are so highly interrelated that one concept (or factor, as it is called in a factor analysis) explains a large portion of the variance in teachers’ scores on student evaluations (Clayson, 2018). [22] Personally, based on completing SETs myself, I believe that factor is probably best conceptualized as student satisfaction, which is obviously worthwhile to measure, but is conceptually quite different from teaching effectiveness or whether a course achieved its intended outcomes. The lack of a clear operational and conceptual definition for the variable or variables being measured in student evaluations of teaching also speaks to a lack of content validity. Researchers check content validity by comparing the measurement method with the conceptual definition, but without a clear conceptual definition of the concept measured by student evaluations of teaching, it’s not clear how we can know our measure is valid. Indeed, the lack of clarity around what is being measured in teaching evaluations impairs students’ ability to provide reliable and valid evaluations. So, while many researchers argue that the class average SET scores are reliable in that they are consistent over time and across classes, it is unclear what exactly is being measured even if it is consistent (Clayson, 2018). [23]

As a faculty member, there are a number of things I can do to influence my evaluations and disrupt validity and reliability. Since SET scores are associated with the grades students perceive they will receive (e.g., Boring et al., 2016), [24] guaranteeing everyone a final grade of A in my class will likely increase my SET scores and my chances at tenure and promotion. I could time an email reminder to complete SETs with releasing high grades for a major assignment to boost my evaluation scores. On the other hand, student evaluations might be coincidentally timed with poor grades or difficult assignments that will bias student evaluations downward. Students may also infer I am manipulating them and give me lower SET scores as a result. To maximize my SET scores and my chances at tenure and promotion, I also need to select which courses I teach carefully. Classes that are more quantitatively oriented generally receive lower ratings than more qualitative and humanities-driven classes, which makes my decision to teach social work research a poor strategy (Uttl & Smibert, 2017). [25] The only manipulative strategy I will admit to using is bringing food (usually cookies or donuts) to class during the period in which students are completing evaluations. Measurement is impacted by context.

As a white cis-gender male educator, I am adversely impacted by SETs because of their sketchy validity, reliability, and methodology. The other flaws with student evaluations actually help me while disadvantaging teachers from oppressed groups. Heffernan (2021) [26] provides a comprehensive overview of the sexism, racism, ableism, and prejudice baked into student evaluations:

“In all studies relating to gender, the analyses indicate that the highest scores are awarded in subjects filled with young, white, male students being taught by white English first language speaking, able-bodied, male academics who are neither too young nor too old (approx. 35–50 years of age), and who the students believe are heterosexual. Most deviations from this scenario in terms of student and academic demographics equates to lower SET scores. These studies thus highlight that white, able-bodied, heterosexual, men of a certain age are not only the least affected, they benefit from the practice. When every demographic group who does not fit this image is significantly disadvantaged by SETs, these processes serve to further enhance the position of the already privileged” (p. 5).

The staggering consistency of studies examining prejudice in SETs has led to some rather superficial reforms, like reminding students in the written instructions given before SETs not to submit racist or sexist responses. Yet, even though we know that SETs are systematically biased against women, people of color, and people with disabilities, the overwhelming majority of universities in the United States continue to use them to evaluate faculty for promotion or reappointment. From a critical perspective, it is worth considering why university administrators continue to use such a biased and flawed instrument. SETs produce data that make it easy to compare faculty to one another and track faculty members over time. Furthermore, they offer students a direct opportunity to voice their concerns and highlight what went well.

Because students have the greatest knowledge about what happened in the classroom and whether it met their expectations, providing students with open-ended questions is the most productive part of SETs. Personally, I have found focus groups written, facilitated, and analyzed by student researchers to be more insightful than SETs. MSW student activists and leaders may look for ways to evaluate faculty that are more methodologically sound and less systematically biased, creating institutional change by replacing or augmenting traditional SETs in their department. There is very rarely student input on the criteria and methodology for teaching evaluations, yet students are the most impacted by helpful or harmful teaching practices.

Students should fight for better assessment in the classroom because well-designed assessments provide documentation to support more effective teaching practices and discourage unhelpful or discriminatory practices. Flawed assessments like SETs can lead to a lack of information about problems with courses, instructors, or other aspects of the program. Think critically about what data your program uses to gauge its effectiveness. How might you introduce areas of student concern into how your program evaluates itself? Are there issues with food or housing insecurity, mentorship of nontraditional and first-generation students, or other issues that faculty should consider when they evaluate their program? Finally, as you transition into practice, think about how your agency measures its impact and how it privileges or excludes client and community voices in the assessment process.

Let’s consider an example from social work practice. Let’s say you work for a mental health organization that serves youth impacted by community violence. How should you measure the impact of your services on your clients and their community? Schools may be interested in reducing truancy, self-injury, or other behavioral concerns. However, by centering delinquent behaviors in how we measure our impact, we may be inattentive to the role of trauma, family dynamics, and other cognitive and social processes beyond “delinquent behavior.” Indeed, we may bias our interventions by focusing on things that are not as important to clients’ needs. Social workers want to make sure their programs are improving over time, and we rely on our measures to indicate what to change and what to keep. If our measures present a partial or flawed view, we lose our ability to establish and act on scientific truths.

While writing this section, one of the authors wrote this commentary article addressing potential racial bias in social work licensing exams. If you are interested in an example of missing or flawed measures that relates to systems your social work practice is governed by (rather than SETs which govern our practice in higher education) check it out!

You may also be interested in similar arguments against the standard grading scale (A-F), and why grades (numerical, letter, etc.) do not do a good job of measuring learning. Think critically about the role that grades play in your life as a student, your self-concept, and your relationships with teachers. Your test and grade anxiety is due in part to how your learning is measured. Those measurements end up becoming an official record of your scholarship and allow employers or funders to compare you to other scholars. The stakes for measurement are the same for participants in your research study.


Self-reflection and measurement

Student evaluations of teaching are just like any other measure. How we decide to measure what we are researching is influenced by our backgrounds, including our culture, implicit biases, and individual experiences. For me as a middle-class, cisgender white woman, the decisions I make about measurement will probably default to ones that make the most sense to me and others like me, and thus measure characteristics about us most accurately if I don’t think carefully about it. There are major implications for research here because this could affect the validity of my measurements for other populations.

This doesn’t mean that standardized scales or indices, for instance, won’t work for diverse groups of people. What it means is that researchers must not ignore difference in deciding how to measure a variable in their research. Doing so may serve to push already marginalized people further into the margins of academic research and, consequently, social work intervention. Social work researchers, with our strong orientation toward celebrating difference and working for social justice, are obligated to keep this in mind for ourselves and encourage others to think about it in their research, too.

This involves reflecting on what we are measuring, how we are measuring, and why we are measuring. Do we have biases that impacted how we operationalized our concepts? Did we include stakeholders and gatekeepers in the development of our concepts? This can be a way to gain access to vulnerable populations. What feedback did we receive on our measurement process and how was it incorporated into our work? These are all questions we should ask as we are thinking about measurement. Further, engaging in this intentionally reflective process will help us maximize the chances that our measurement will be accurate and as free from bias as possible.

The NASW Code of Ethics discusses social work research and the importance of engaging in practices that do not harm participants. This is especially important considering that many of the topics studied by social workers are those that are disproportionately experienced by marginalized and oppressed populations. Some of these populations have had negative experiences with the research process: historically, their stories have been viewed through lenses that reinforced the dominant culture’s standpoint. Thus, when thinking about measurement in research projects, we must remember that the way in which concepts or constructs are measured will impact how marginalized or oppressed persons are viewed. It is important that social work researchers examine current tools to ensure appropriateness for their population(s). Sometimes this may require researchers to use existing tools. Other times, this may require researchers to adapt existing measures or develop completely new measures in collaboration with community stakeholders. In summary, the measurement protocols selected should be tailored and attentive to the experiences of the communities to be studied.

Unfortunately, social science researchers do not do a great job of sharing their measures in a way that allows social work practitioners and administrators to use them to evaluate the impact of interventions and programs on clients. Few scales are published under an open copyright license that allows other people to view them for free and share them with others. Instead, the best way to find a scale mentioned in an article is often to simply search for it in Google with “.pdf” or “.docx” in the query to see if someone posted a copy online (usually in violation of copyright law). As we discussed in Chapter 4, this is an issue of information privilege, or the structuring impact of oppression and discrimination on groups’ access to and use of scholarly information. As a student at a university with a research library, you can access the Mental Measurements Yearbook to look up scales and indexes that measure client or program outcomes, while researchers unaffiliated with university libraries cannot do so. Similarly, the vast majority of scholarship in social work and allied disciplines does not share measures, data, or other research materials openly, a best practice in open and collaborative science. It is important to underscore these structural barriers to using valid and reliable scales in social work practice. An invalid or unreliable outcome test may cause ineffective or harmful programs to persist or may worsen existing prejudices and oppressions experienced by clients, communities, and practitioners.

But it’s not just about reflecting and identifying problems and biases in our measurement, operationalization, and conceptualization—what are we going to  do about it? Consider this as you move through this book and become a more critical consumer of research. Sometimes there isn’t something you can do in the immediate sense—the literature base at this moment just is what it is. But how does that inform what you will do later?

A place to start: Stop oversimplifying race

We will address many more of the critical issues related to measurement in the next chapter. One way to get started in bringing cultural awareness to scientific measurement is through a critical examination of how we analyze race quantitatively. There are many important methodological objections to how we measure the impact of race. We encourage you to watch Dr. Abigail Sewell’s three-part workshop series called “Nested Models for Critical Studies of Race & Racism” for the Inter-university Consortium for Political and Social Research (ICPSR). She discusses how to operationalize and measure inequality, racism, and intersectionality and critiques researchers’ attempts to oversimplify or overlook racism when we measure concepts in social science. If you are interested in developing your social work research skills further, consider applying for financial support from your university to attend an ICPSR summer seminar like Dr. Sewell’s where you can receive more advanced and specialized training in using research for social change.

  • Part 1: Creating Measures of Supraindividual Racism (2-hour video)
  • Part 2: Evaluating Population Risks of Supraindividual Racism (2-hour video)
  • Part 3: Quantifying Intersectionality (2-hour video)
  • Social work researchers must be attentive to personal and institutional biases in the measurement process that affect marginalized groups.
  • What is measured and how it is measured is shaped by power, and social workers must be critical and self-reflective in their research projects.

Think about your current research question and the tool(s) that you will use to gather data. Even if you haven’t chosen your tools yet, think of some that you have encountered in the literature so far.

  • How does your positionality and experience shape what variables you are choosing to measure and how you measure them?
  • Evaluate the measures in your study for potential biases.
  • If you are using measures developed by another researcher, investigate whether they have been shown to be valid and reliable in other studies and across cultures.

Media Attributions

  • jose-martin-ramirez-carrasco-z2tinW7Z6Bw-unsplash © José Martín Ramírez Carrasco
  • simone-pellegrini-L3QG_OBluT0-unsplash © Simone Pellegrini
  • man-5573925_1280 © Mohamed Hassan
  • detective-152085_1280 © OpenClipart-Vectors
  • tommy-van-kessel-BXFY8_iii9M-unsplash © Tommy van Kessel
  • markus-winkler-htShI76GLDM-unsplash © Markus Winkler
  • Figure 11.1 Example Rating Scales for Closed-Ended Questionnaire Items © Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton is licensed under a CC BY-NC-SA (Attribution NonCommercial ShareAlike) license
  • survey-4441595_1920 © Christina Smith
  • mockup-graphics-i1iqQRLULlg-unsplash © Mockup Graphics
  • Test-retest reliability © Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton is licensed under a CC BY-NC-SA (Attribution NonCommercial ShareAlike) license
  • dartboard-5518055_1920 © FlitsArt
  • error-63628_1920 © Gerd Altmann
  • mitchell-griest-ImgBdiGAl4c-unsplash © Mitchell Griest
  • man-5732103_1280 © Mohamed Hassan
  • Milkie, M. A., & Warner, C. H. (2011). Classroom learning environments and the mental health of first grade children. Journal of Health and Social Behavior, 52 , 4–22 ↵
  • Kaplan, A. (1964). The conduct of inquiry: Methodology for behavioral science . San Francisco, CA: Chandler Publishing Company. ↵
  • Earl Babbie offers a more detailed discussion of Kaplan’s work in his text. You can read it in: Babbie, E. (2010). The practice of social research (12th ed.). Belmont, CA: Wadsworth. ↵
  • In this chapter, we will use the terms concept and construct interchangeably. While each term has a distinct meaning in research conceptualization, we do not believe this distinction is important enough to warrant discussion in this chapter. ↵
  • Wong, Y. J., Steinfeldt, J. A., Speight, Q. L., & Hickman, S. J. (2010). Content analysis of Psychology of men & masculinity (2000–2008).  Psychology of Men & Masculinity ,  11 (3), 170. ↵
  • Kimmel, M. (2000).  The  gendered society . New York, NY: Oxford University Press; Kimmel, M. (2008). Masculinity. In W. A. Darity Jr. (Ed.),  International  encyclopedia of the social sciences  (2nd ed., Vol. 5, p. 1–5). Detroit, MI: Macmillan Reference USA ↵
  • Kimmel, M. & Aronson, A. B. (2004).  Men and masculinities: A-J . Denver, CO: ABL-CLIO. ↵
  • Krosnick, J.A. & Berent, M.K. (1993). Comparisons of party identification and policy preferences: The impact of survey question format.  American Journal of Political Science, 27 (3), 941-964. ↵
  • Likert, R. (1932). A technique for the measurement of attitudes.  Archives of Psychology,140 , 1–55. ↵
  • Stevens, S. S. (1946). On the Theory of Scales of Measurement.  Science ,  103 (2684), 677-680. ↵
  • Sullivan G. M. (2011). A primer on the validity of assessment instruments. Journal of graduate medical education, 3 (2), 119–120. doi:10.4300/JGME-D-11-00075.1 ↵
  • Engel, R. & Schutt, R. (2013). The practice of research in social work (3rd. ed.) . Thousand Oaks, CA: SAGE. ↵
  • Engel, R. & Schutt, R. (2013). The practice of research in social work (3rd. ed.). Thousand Oaks, CA: SAGE. ↵
  • Jagadish, H. V., Stoyanovich, J., & Howe, B. (2021). COVID-19 Brings Data Equity Challenges to the Fore. Digital Government: Research and Practice ,  2 (2), 1-7. ↵
  • Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation ,  54 , 22-42. ↵
  • Benton, S. L., & Cashin, W. E. (2014). Student ratings of instruction in college and university courses. In Higher education: Handbook of theory and research  (pp. 279-326). Springer, Dordrecht. ↵
  • Clayson, D. E. (2018). Student evaluation of teaching and matters of reliability.  Assessment & Evaluation in Higher Education ,  43 (4), 666-681. ↵
  • Clayson, D. E. (2018). Student evaluation of teaching and matters of reliability. Assessment & Evaluation in Higher Education ,  43 (4), 666-681. ↵
  • Boring, A., Ottoboni, K., & Stark, P. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness.  ScienceOpen Research . ↵
  • Uttl, B., & Smibert, D. (2017). Student evaluations of teaching: teaching quantitative courses can be hazardous to one’s career. PeerJ, 5 , e3299. ↵
  • Heffernan, T. (2021). Sexism, racism, prejudice, and bias: a literature review and synthesis of research surrounding student evaluations of courses and teaching.  Assessment & Evaluation in Higher Education , 1-11. ↵

The process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena under investigation in a research study.

In measurement, conditions that are easy to identify and verify through direct observation.

In measurement, conditions that are subtle and complex that we must use existing knowledge and intuition to define.

Conditions that are not directly observable and represent states of being, experiences, and ideas.

A mental image that summarizes a set of similar observations, feelings, or ideas

developing clear, concise definitions for the key concepts in a research question

concepts that are comprised of multiple elements

concepts that are expected to have a single underlying dimension

assuming that abstract concepts exist in some concrete, tangible way

process by which researchers spell out precisely how a concept will be measured in their study

Clues that demonstrate the presence, intensity, or other aspects of a concept in the real world

unprocessed data that researchers can analyze using quantitative and qualitative methods (e.g., responses to a survey or interview transcripts)

a characteristic that does not change in a study

The characteristics that make up a variable

variables whose values are organized into mutually exclusive groups but whose numerical values cannot be used in mathematical operations.

variables whose values are mutually exclusive and can be used in mathematical operations

The lowest level of measurement; categories cannot be mathematically ranked, though they are exhaustive and mutually exclusive

Exhaustive categories are options for closed-ended questions that allow for every possible response (every respondent should be able to find an answer that fits them).

Mutually exclusive categories are options for closed-ended questions that do not overlap, so people fit into only one category or another, not both.

Level of measurement that follows nominal level. Has mutually exclusive categories and a hierarchy (rank order), but we cannot calculate a mathematical distance between attributes.

An ordered set of responses that participants must choose from.

A level of measurement that is continuous, can be rank ordered, is exhaustive and mutually exclusive, and for which the distance between attributes is known to be equal, but which has no true zero point.

The highest level of measurement. It has mutually exclusive categories, a hierarchy (order), and an absolute zero, and its values can be added, subtracted, multiplied, and divided.

measuring people’s attitude toward something by assessing their level of agreement with several statements about it

Composite (multi-item) scales in which respondents are asked to indicate their opinions or feelings toward a single statement using different pairs of adjectives framed as polar opposites.

A composite scale using a series of items arranged in increasing order of intensity of the construct of interest, from least intense to most intense.

measurements of variables based on more than one indicator

An empirical structure for measuring items or indicators of the multiple dimensions of a concept.

a composite score derived from aggregating measures of multiple concepts (called components) using a set of rules and formulas

The ability of a measurement tool to measure a phenomenon the same way, time after time. Note: Reliability does not imply validity.

The extent to which scores obtained on a scale or other measure are consistent across time

The consistency of people’s responses across the items on a multiple-item measure. Responses about the same underlying construct should be correlated, though not perfectly.

The extent to which different observers are consistent in their assessment or rating of a particular characteristic or item.

The extent to which the scores from a measure represent the variable they are intended to.

The extent to which a measurement method appears “on its face” to measure the construct of interest

The extent to which a measure “covers” the construct of interest, i.e., its comprehensiveness in measuring the construct.

The extent to which people’s scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with.

A type of criterion validity. Examines how well a tool provides the same scores as an already existing tool administered at the same point in time.

A type of criterion validity that examines how well your tool predicts a future criterion.

The extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct.

(also known as bias) refers to when a measure consistently outputs incorrect data, usually in one direction and due to an identifiable process

When a participant's answer to a question is altered due to the way in which a question is written. In essence, the question leads the participant to answer in a specific way.

Social desirability bias occurs when we create questions that lead respondents to answer in ways that don't reflect their genuine thoughts or feelings to avoid being perceived negatively.

In a measure, when people say yes to whatever the researcher asks, even when doing so contradicts previous answers.

Unpredictable error that does not result in scores that are consistently higher or lower on a given measure but are nevertheless inaccurate.

when a measure indicates the presence of a phenomenon, when in reality it is not present

when a measure does not indicate the presence of a phenomenon, when in reality it is present

the group of people whose needs your study addresses

The value in the middle when all our values are placed in numerical order. Also called the 50th percentile.

individuals or groups who have an interest in the outcome of the study you conduct

the people or organizations who control access to the population you want to study

Graduate research methods in social work Copyright © 2021 by Matthew DeCarlo, Cory Cummings, Kate Agnelli is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Leveraging supply chain reaction time: the effects of big data analytics capabilities on organizational resilience enhancement in the auto-parts industry.

chapter 3 research methods reliability

1. Introduction

  • Q1: Do BDA capabilities play a significant role in reducing organizations’ reaction time in the face of unexpected incidents or disruptions within their supply chains?
  • Q2: Furthermore, what effects does this expedited response have on enhancing organizations’ resilience?
  • Q3: Does a firm’s position within the supply chain influence the effects BDA capabilities have on reaction time and organizational resilience?
  • Q4: Are the effects of BDA capabilities on reaction time and organizational resilience greater for companies with longer Industry 4.0 journeys and more intense absorption of Industry 4.0 smart technologies?
  • BDACs’ role in disruption response: Although the concept that BDA capabilities can reduce reaction time during supply chain disruptions may seem intuitive, there is a lack of empirical evidence demonstrating the extent to which BDACs directly influence the speed of response to unforeseen incidents. Particularly important and novel for our investigation is the description of the effects of BDACs on different and specific forms of latencies—data latency, analytical latency, and decision latency—which are fundamental components of reaction time.
  • Effects on organizational resilience: The literature has yet to fully explore the systemic relationships between reaction time, facilitated by BDACs, and the enhancement of organizational resilience, particularly during supply chain disruptions.
  • Position in the supply chain and BDAC impact: Knowledge is lacking regarding how a firm’s position within the supply chain (e.g., first-tier vs. indirect suppliers) influences the effect of BDACs on reaction time and organizational resilience, as many specificities can be taken into account, such as risk exposure, the quality and speed of information flow, power dynamics, and supply chain resource allocation priorities, amongst other relevant dimensions.
  • Influence of Industry 4.0 journey length: There are not yet enough studies to assert whether the level of technological readiness of firms, in relation to the absorption of smart technologies from Industry 4.0, would have any influence on the potential effects of BDA capabilities on reaction times to disruptions in the supply chain and the degree of firms’ resilience in such events.

2. Theoretical Background and Research Hypotheses

2.1. bda capabilities, 2.2. reaction time, 2.3. organizational resilience, 2.4. smart manufacturing technologies of industry 4.0, 3. research methodology, 3.1. research design and sample characterization, 3.2. measures, 3.3. data analysis, 4. findings, 4.1. descriptive data analysis, 4.2. measurement model validation, 4.3. structural model assessment, 5. discussion, 5.1. managerial contributions, 5.2. theoretical contributions, 6. conclusions, author contributions, institutional review board statement, informed consent statement, data availability statement, conflicts of interest, appendix a. descriptive statistics, convergent validity, and reliability tests.

Big Data Analytics CapabilitiesBDAC 1The organization uses cloud services to process and analyze data.0.5590.4160.8750.8530.8430.23980.3818
BDAC 2 0.664 4.100.652
BDAC 3The organization has access to a large amount of unstructured data (from sources, such as social networks, websites, videos, images, among others) that can be quickly analyzed by its technicians and/or data scientists.0.551 3.481.073
BDAC 4The organization can effectively integrate internal and external data from multiple sources (especially with suppliers and direct customers).0.587 3.560.897
BDAC 5The organization has professionals in different areas of the company with the necessary skills and experience to analyze data, using this knowledge in the execution of their tasks/activities.0.658 3.770.717
BDAC 6The organization provides training in decision-support systems (such as data mining and use of artificial intelligence for predictive analysis, among others).0.542 3.490.855
BDAC 7The managers involved in data analysis in the company have a good grasp of the information requirements of different area or process managers within the organization, as well as those of its suppliers and customers.0.730 3.920.722
BDAC 8The managers involved in data analysis in the company are capable of analyzing data collaboratively with both area and process managers within the organization and its suppliers and customers.0.699 3.870.727
BDAC 9The managers involved in data analysis in the company are capable of anticipating and being proactive in considering the information needs of various area or process managers in the company, as well as those of the organization’s suppliers and customers.0.566 3.770.723
BDAC10The organization considers data to be a valuable asset for the business and for managing its processes in the supply chain.0.833 4.160.788
Reaction TimeREA 1When a disruptive or unplanned event occurs, the organization has fairly quickly access to data and information about the event.0.7470.5180.8650.8220.8153.660.889
REA 2When a disruptive or unplanned event occurs, the organization is able to fairly quickly analyze data and gather information about the event.0.729 4.040.659
REA 3When a disruption or unplanned event occurs, the organization makes decisions fairly quickly once it has access to data and analyzes data about the event. 0.693 4.140.794
REA 4In our company, the data are dynamically updated, allowing a real-time view of the different processes and/or areas of the organization.0.689 3.940.702
REA 5There is a governance structure in place in the company to monitor and identify disruption events and put into action plans to mitigate the effects of these events.0.794 3.970.717
REA 6Your value chain partners share with your company an aligned vision as to how to proceed and discuss actions to be implemented in disruption situations.0.657 4.080.768
Organizational ResilienceRES 1In the face of a disruptive or unplanned event, the organization was able to respond to the disruptive situation in a way that quickly restored normal production flows.0.7610.6270.8700.8030.8013.880.915
RES 2In the face of a disruptive or unplanned event, the organization was well-prepared to deal with potential financial effects caused by the disruption.0.760 4.110.764
RES 3In the face of a disruptive or unplanned event, the organization was able to maintain a satisfactory level of connectivity with other agents in the supply chain during the period of impact of the disruption.0.845 3.790.739
RES 4In the face of a disruptive or unplanned event, the organization was able to maintain a satisfactory level of functioning of its internal functions.0.798 4.130.678


Dimension | Classification | Number of Responses | %
Working experience in the company | More than 5 years | 239 | 90.90
 | Between 3 and 5 years | 24 | 9.13
Position in the organization | Director/manager of operations, logistics, and supply chain | 91 | 34.6
 | CEO, general director, superintendent | 60 | 22.8
 | Purchasing assistant/supervisor | 41 | 15.6
 | Operations, logistics, and supply chain assistant/supervisor | 39 | 14.8
 | Financial assistant/supervisor | 13 | 4.94
 | Director/manager of marketing | 11 | 4.18
 | Director/commercial and sales manager | 8 | 3.04
Company's position | Direct supplier of parts/modules or systems to automakers | 161 | 61.2
 | Supplier of parts/modules or systems to other suppliers in the automakers' supply chain | 102 | 38.8
Number of employees | 100 to 499 | 43 | 16.3
 | 500 to 999 | 131 | 49.8
 | 1000 or greater | 89 | 33.8
Annual gross revenue (in US million dollars) | More than USD 18 and less than USD 60 | 54 | 20.5
 | More than USD 60 and less than or equal to USD 100 | 107 | 40.7
 | More than USD 100 and less than USD 140 | 18 | 6.84
 | More than USD 140 and less than USD 200 | 41 | 15.6
 | More than USD 200 | 43 | 16.3
Digital transformation journey | Has not initiated | 23 | 8.75
 | Less than 3 years | 27 | 10.27
 | More than 3 and up to 5 years | 112 | 42.6
 | More than 5 and up to 7 years | 14 | 5.32
 | More than 7 and up to 10 years | 26 | 9.89
 | More than 10 years | 61 | 23.19
Area or department responsible for the digital transformation process | CEO, general director, superintendent | 49 | 31.6
 | Director/manager of operations, logistics, and supply chain | 48 | 30.9
 | Operations, logistics, and supply chain supervisor | 33 | 21.3
 | Financial assistant/supervisor | 13 | 8.3
 | Commercial and sales manager | 8 | 5.1
 | Purchasing assistant/supervisor | 4 | 2.5
 | None | 108 | 41.1
Measurement Item | BDA Capabilities | Reaction Time | Organizational Resilience
BDAC 1 | 0.5592 | 0.2398 | 0.3818
BDAC 2 | 0.6639 | 0.4703 | 0.5462
BDAC 3 | 0.5514 | 0.3774 | 0.3848
BDAC 4 | 0.5868 | 0.3305 | 0.2311
BDAC 5 | 0.6576 | 0.2470 | 0.3620
BDAC 6 | 0.5422 | 0.2580 | 0.2664
BDAC 7 | 0.7301 | 0.4557 | 0.3726
BDAC 8 | 0.6986 | 0.5312 | 0.4932
BDAC 9 | 0.5655 | 0.4991 | 0.5160
BDAC 10 | 0.8327 | 0.4543 | 0.4722
REA 1 | 0.6699 | 0.7472 | 0.5488
REA 2 | 0.4145 | 0.7285 | 0.6298
REA 3 | 0.5154 | 0.6930 | 0.5912
REA 4 | 0.2721 | 0.6886 | 0.3926
REA 5 | 0.4190 | 0.7942 | 0.5289
REA 6 | 0.3398 | 0.6569 | 0.4950
RES 1 | 0.5062 | 0.5813 | 0.7611
RES 2 | 0.6386 | 0.6730 | 0.7596
RES 3 | 0.4812 | 0.5753 | 0.8447
RES 4 | 0.3910 | 0.5256 | 0.7980
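
A quick way to read the cross-loading matrix is to confirm that every item loads more strongly on its own construct than on the other two, the usual cross-loading criterion for discriminant validity. A minimal sketch using a few rows copied from the table above:

```python
# Sketch: cross-loading check for discriminant validity. Each item should have
# its highest loading on its own construct. Values are copied from the table
# above (abridged; the full table lists all 20 items).

cross_loadings = {
    # item: (BDA capabilities, Reaction time, Organizational resilience)
    "BDAC 1": (0.5592, 0.2398, 0.3818),
    "BDAC 9": (0.5655, 0.4991, 0.5160),
    "REA 1":  (0.6699, 0.7472, 0.5488),
    "REA 2":  (0.4145, 0.7285, 0.6298),
    "RES 2":  (0.6386, 0.6730, 0.7596),
    "RES 4":  (0.3910, 0.5256, 0.7980),
}

own_column = {"BDAC": 0, "REA": 1, "RES": 2}  # column index of each item's own construct

for item, loadings in cross_loadings.items():
    own = own_column[item.split()[0]]
    status = "OK" if loadings[own] == max(loadings) else "check"
    print(f"{item}: own-construct loading {loadings[own]:.4f} ({status})")
```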
Construct | BDA Capabilities | Reaction Time | Organizational Resilience
BDA capabilities | - | - | -
Reaction time | 0.69 | - | -
Organizational resilience | 0.75 | 0.90 | -
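
The caption of this matrix is not part of the excerpt. The blank diagonal and the fact that the 0.90 entry exceeds the latent correlation implied by the corresponding path coefficient (0.752, reported below) suggest these are heterotrait-monotrait (HTMT) ratios, but that reading is an inference. For illustration only, the sketch below computes an HTMT ratio from a small hypothetical item correlation matrix; it is not the study's data.

```python
# Sketch: how an HTMT (heterotrait-monotrait) ratio between two constructs is
# computed from an item correlation matrix. The matrix below is HYPOTHETICAL
# and only illustrates the formula.

import numpy as np

# Hypothetical correlations among 2 items of construct A and 2 items of construct B
# (order: A1, A2, B1, B2).
R = np.array([
    [1.00, 0.55, 0.48, 0.50],
    [0.55, 1.00, 0.52, 0.47],
    [0.48, 0.52, 1.00, 0.60],
    [0.50, 0.47, 0.60, 1.00],
])
idx_a = [0, 1]
idx_b = [2, 3]

hetero = R[np.ix_(idx_a, idx_b)].mean()                              # between-construct item correlations
mono_a = R[np.ix_(idx_a, idx_a)][np.triu_indices(len(idx_a), k=1)].mean()  # within-A correlations
mono_b = R[np.ix_(idx_b, idx_b)][np.triu_indices(len(idx_b), k=1)].mean()  # within-B correlations

htmt = hetero / np.sqrt(mono_a * mono_b)
print(f"HTMT ~ {htmt:.2f}")  # values below roughly 0.85-0.90 are usually read as discriminant validity
```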
Path | R² | Adjusted R²
BDA capabilities -> reaction time | 0.406 | 0.404
Reaction time -> organizational resilience | 0.566 | 0.564
Hypothesis | Relationship | Path Coefficient | t-Value | Hypothesis Supported
H1 | BDA capabilities -> reaction time | 0.637 | 12.959 | Yes
H2 | Reaction time -> organizational resilience | 0.752 | 27.436 | Yes
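
In PLS-SEM reporting, t-values of this kind are usually obtained by bootstrapping: the model is re-estimated on resampled data and the original path coefficient is divided by the bootstrap standard error. The sketch below illustrates that generic procedure on synthetic data, with an ordinary standardized regression standing in for the PLS path; it is not the study's actual estimation.

```python
# Sketch: bootstrap t-value for a single standardized path, on SYNTHETIC data.
# A simple standardized regression (correlation) stands in for the PLS path.

import numpy as np

rng = np.random.default_rng(42)
n = 263                                        # same sample size as the survey
x = rng.normal(size=n)                         # stand-in for the BDA-capabilities score
y = 0.6 * x + rng.normal(scale=0.8, size=n)    # stand-in for the reaction-time score

def std_beta(x, y):
    """Standardized slope of y on x (equal to their correlation)."""
    return np.corrcoef(x, y)[0, 1]

beta_hat = std_beta(x, y)
boot = np.array([
    std_beta(x[idx], y[idx])
    for idx in (rng.integers(0, n, size=n) for _ in range(5000))  # resample with replacement
])
t_value = beta_hat / boot.std(ddof=1)          # estimate divided by bootstrap SE
print(f"path ~ {beta_hat:.3f}, bootstrap t ~ {t_value:.1f}")
```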
Construct Name | Reaction Time | Organizational Resilience
BDA capabilities | 0.6845 | -
Reaction time | - | 1.3019
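
The header of this small table is not included in the excerpt, but its two values are consistent with Cohen's f² effect sizes derived from the R² values reported above: with a single exogenous predictor, f² = R² / (1 − R²). The sketch below recomputes them; the small discrepancies come from rounding of the reported R² values.

```python
# Sketch: Cohen's f-squared effect size from R-squared. With one predictor the
# excluded-predictor R-squared is zero, so f^2 reduces to R^2 / (1 - R^2).

def f_squared(r2_included: float, r2_excluded: float = 0.0) -> float:
    return (r2_included - r2_excluded) / (1.0 - r2_included)

print(f_squared(0.406))  # ~ 0.684 for BDA capabilities -> reaction time (reported 0.6845)
print(f_squared(0.566))  # ~ 1.304 for reaction time -> organizational resilience (reported 1.3019)
```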
Constructs | Cluster | Size | Median | p-Value
BDA capabilities | OEM direct supplier | 160 | 3.60 | 0.475
 | OEM indirect supplier | 103 | 3.90 |
Reaction time | OEM direct supplier | 160 | 4.17 | <0.001
 | OEM indirect supplier | 103 | 3.83 |
Organizational resilience | OEM direct supplier | 160 | 4.25 | <0.001
 | OEM indirect supplier | 103 | 3.75 |
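
The excerpt does not state which test produced the p-values for these median comparisons. A common choice for comparing two independent groups on ordinal scale scores is the Mann-Whitney U test; the sketch below shows how such a p-value would be obtained, using synthetic placeholder scores with the group sizes from the table.

```python
# Sketch: a Mann-Whitney U comparison of two independent groups, as one common
# way to obtain p-values like those in the table above. The scores are SYNTHETIC
# placeholders; the article's raw data and exact test are not shown here.

import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
direct = rng.normal(loc=4.2, scale=0.5, size=160)    # stand-in: OEM direct suppliers
indirect = rng.normal(loc=3.8, scale=0.5, size=103)  # stand-in: OEM indirect suppliers

stat, p = mannwhitneyu(direct, indirect, alternative="two-sided")
print(f"U = {stat:.0f}, p = {p:.4g}")
```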
Duration of Digital Transformation Journey | Cluster 1 (n = 161): n | % | Cluster 2 (n = 102): n | %
Not initiated | 23 | 14.29 | 0 | 0.00
Less than 3 years | 19 | 11.80 | 8 | 7.84
More than 3 and up to 5 years | 83 | 51.55 | 29 | 28.43
More than 5 and up to 7 years | 8 | 4.97 | 6 | 5.88
More than 7 and up to 10 years | 7 | 4.35 | 19 | 18.63
More than 10 years | 21 | 13.04 | 40 | 39.22
Average Score of the Intensity of Absorption of Smart Manufacturing Technologies of I4.0 (1–5) | Cluster 1 (n = 161) (%) | Cluster 2 (n = 102) (%)
1 to 2 | 0% | 0%
2 to 3 | 23% | 0%
3 to 4 | 77% | 63%
4 to 5 | 0% | 37%
Constructs | Cluster | Size | Median | p-Value
BDA capabilities | 1 | 161 | 3.50 | <0.001
 | 2 | 102 | 4.25 |
Reaction time | 1 | 161 | 3.83 | <0.001
 | 2 | 102 | 4.17 |
Organizational resilience | 1 | 161 | 4.00 | <0.001
 | 2 | 102 | 4.50 |

Source: Bronzo, Marcelo, Marcelo Werneck Barbosa, Paulo Renato de Sousa, Noel Torres Junior, and Marcos Paulo Valadares de Oliveira. 2024. "Leveraging Supply Chain Reaction Time: The Effects of Big Data Analytics Capabilities on Organizational Resilience Enhancement in the Auto-Parts Industry." Administrative Sciences 14 (8): 181. https://doi.org/10.3390/admsci14080181

