David Collier and James Mahoney, “Insights and pitfalls: selection bias in qualitative research,” World Politics 49 (1996), pp. 56-91

Main Argument: The authors examine the circumstances under which selection bias can arise in small-N comparative analysis. Selection bias is a common and potentially serious problem, and qualitative researchers in international and comparative studies need to understand the consequences of selecting extreme cases of the outcome they wish to explain. Strategies for dealing with bias involve tradeoffs.
Key Definitions:
Bias: systematic error that is expected to occur in a given context of research
Error: generally taken to mean any difference between an estimated value and the “true” value of a variable or parameter
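In conventional statistical notation, the two definitions can be written as below. The symbols are an assumption for illustration and are not taken from the article: \theta is the “true” value of a parameter and \hat{\theta} is an estimate of it.

```latex
\text{error} = \hat{\theta} - \theta,
\qquad
\text{bias} = \mathbb{E}[\hat{\theta}] - \theta
```

On this reading, error is a property of a single estimate, while bias is the error that remains on average over repeated applications of the same research design.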
== Notes: ==
The concern is with selecting cases on the basis of a particular outcome, esp. when the outcomes are all the same or nearly the same, i.e., deliberate selection of cases on the dependent variable (DV).
Rogowski suggests that some of the most influential studies in comparative politics have managed to produce valuable findings even though they violate norms of case selection proposed by the literature on selection bias.
Selecting Cases on the DV:
*  Selection bias can arise from any mode of selection that is correlated with the dependent variable (that is, tending to select cases that have higher, or lower, values on that variable), once the effect of the explanatory variables included in the analysis is removed
*  Self-selection of individuals into the categories of an explanatory variable
*  Selection of cases above or below a particular value on the overall distribution of cases that is considered relevant to the research question = truncation [59]
*  Choosing observations in a way that constrains variation on the dependent variable reduces the slope estimate (it is pulled toward zero) –> selection on the independent variable (IV) does not have the same effect
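The truncation point above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not from the article: a linear relationship with a true slope of 1.0, normally distributed noise, and an arbitrary cutoff of 1.0. The function names `ols_slope` and `simulate_slopes` are hypothetical.

```python
import random


def ols_slope(pairs):
    """Bivariate least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


def simulate_slopes(n=20000, true_slope=1.0, cutoff=1.0, seed=0):
    """Compare slope estimates from the full set of cases and two subsets."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    cases = []
    for _ in range(n):
        x = rng.gauss(0, 1)                   # explanatory variable (IV)
        y = true_slope * x + rng.gauss(0, 1)  # outcome (DV) with noise
        cases.append((x, y))

    # Truncation on the DV: keep only cases with a high outcome.
    high_dv = [(x, y) for x, y in cases if y > cutoff]
    # Selection on the IV: keep only cases with a high explanatory value.
    high_iv = [(x, y) for x, y in cases if x > cutoff]

    return {
        "full": ols_slope(cases),
        "truncated_on_dv": ols_slope(high_dv),
        "selected_on_iv": ols_slope(high_iv),
    }


if __name__ == "__main__":
    for label, slope in simulate_slopes().items():
        print(f"{label}: {slope:.2f}")
```

Under these assumptions, the slope estimated from the cases truncated on the DV falls well below the true slope, while the estimates from the full set and from the cases selected on the IV stay close to it. The comparison between the selected subset and the fuller set is the same kind of check described later in these notes for Geddes.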
Generalization:
*  If researchers seek to make causal inferences, they should be concerned about how their cases relate to the larger comparison set
Appropriate Frame of Comparison:
*  Garfinkel’s “contrast space” helps guide the selection of cases while keeping the relevant range of variation on the DV in view
*  The contrast space is derived from the research question
*  Causal heterogeneity might mean that qualitative researchers trade off more cases …. may lead to examination of a limited range of variance on the DV [67]
*  More general theories are also more vulnerable to problems of conceptual validity, because extending the theory to broader contexts may result in conceptual stretching
*  Scholars engaged in new forms of theoretical modeling in the social sciences might maintain that it is in fact possible to develop valid concepts at a high level of generality across what might appear to be heterogeneous contexts, and that the models in which these concepts are embedded, if appropriately applied, can perform well across a broad range of cases in terms of the criteria of parsimony, accuracy, and causality –> the tradeoff may not be necessary [69]
Selection Bias & Within-Case Analysis:
*  Qualitative researchers may try to overcome methodological problems through within-case analysis: “discerning,” process analysis, pattern matching, process tracing, and causal narrative
*  Taking many observations of the same case over time increases the number of observations, which addresses the degrees-of-freedom problem (Campbell) but NOT the selection bias problem
No-Variance Designs & Other Problems in Small-N Analysis:
*  No-variance designs, such as Mill’s method of agreement, are a particular concern; also noted by KKV (1994) & Geddes (1990)
*  The outcome to be explained can be conceptualized as a dichotomous or a continuous variable –> no variance on the DV
*  No-variance designs are routinely employed in conjunction with counterfactual analysis
*  Cases with no variance on the IV are also problematic
*  There is also disagreement over what the DV actually is and the scope of its variation
*  Geddes’ article on selection bias: compare the inference derived from the initial set of cases with a parallel inference based on additional cases that are not selected on the dependent variable [79] –> C&M critique Geddes, however
*  Selecting on the dependent variable can also result from the choice of end point in time-series data
Main conclusions:
*  Selection bias is a serious problem in small-N studies too
*  Need to define the frame of reference against which the full variation of the DV is assessed
*  Comparing cases with larger sets of cases that exhibit greater variation on the DV is useful
*  Random sampling may cause more problems than it solves in small-N studies
*  Qualitative designs that lack variance on the DV are vulnerable to selection bias, as in the problem of complexification based on extreme cases [90]