The importance and effect of sample size
Written by Sarah Small
Posted on Tuesday, 27.10.2015
When conducting research on your customers, patients or products, it is generally impossible, or at least impractical, to collect data on every person or object of interest to you. Instead, we take a sample (or subset) of the population of interest and learn what we can from that sample about the population.
There are many things that can affect how well our sample reflects the population, and therefore how valid and reliable our conclusions will be. In this blog, we introduce some of the key concepts to consider when conducting a survey, including confidence levels and margins of error, power and effect sizes. (See the glossary below for some helpful definitions of these terms.) Crucially, we'll see that all of these are affected by how large a sample you take, i.e. the sample size.
Confidence and margin of error
Let's start with an example where we are simply estimating a characteristic of our population and want to see how our sample size affects the accuracy of our estimate.
Our sample size dictates the amount of information we have and therefore, in part, determines our precision or level of confidence in our sample estimates. An estimate always has an associated level of uncertainty, which depends on the underlying variability of the data as well as the sample size. The more variable the population, the greater the uncertainty in our estimate. Similarly, the larger the sample size, the more information we have and the smaller our uncertainty.
Suppose we want to estimate the proportion of adults in the UK who own a smartphone. We could take a sample of 100 people and ask them. Note: it is important to consider how the sample is selected to ensure that it is unbiased and representative of the population; we will cover this topic in another blog.
If 59 out of 100 people own a smartphone, we estimate the proportion in the UK to be 59/100 = 59%. We can also construct an interval around this point estimate to express our uncertainty in it, i.e. our margin of error. For example, a 95% confidence interval for our estimate based on our sample of size 100 ranges from 49.36% to 68.64% (which can be calculated with our free online calculator). Alternatively, we can express this interval by saying that our estimate is 59% with a margin of error of ±9.64%. This is a 95% confidence interval, which means that the procedure used to build it captures the true proportion 95% of the time. In other words, if we collected 100 different samples from the population and built an interval from each, about 95 of those 100 intervals would contain the true proportion.
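The interval quoted above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the simple normal-approximation (Wald) interval; the calculator mentioned in the text may use a different method, and the function name `proportion_ci` is ours, chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Returns (lower, upper, margin_of_error), all as fractions.
    """
    p_hat = successes / n
    # Two-sided critical value, roughly 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


low, high, margin = proportion_ci(59, 100)
print(f"{low:.2%} to {high:.2%} (margin of error ±{margin:.2%})")
```

The Wald interval is the easiest to compute by hand, but it can behave poorly for very small samples or for proportions close to 0% or 100%, where other interval methods are usually preferred.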
What if we increased our sample size by going out and asking more people?
Let's say we survey another 900 people and find that, overall, 590 out of 1,000 people own a smartphone. Again, our estimate of the proportion in the general population is 590/1000 = 59%. However, our confidence interval for the estimate has now narrowed considerably to 55.95% to 62.05%, a margin of error of ±3.05%; see Figure 1 below. Because we have more data and therefore more information, our estimate is more precise.
Figure 1
As the sample size increases, our confidence in our estimate increases, our uncertainty decreases, and we have greater precision. This is illustrated by the narrowing of the confidence intervals in the figure above. If we took this to the limit and surveyed our entire population of interest, we'd get the true value we're trying to estimate: the true proportion of adults in the UK who own a smartphone, and we'd have no uncertainty in our estimate.
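To see how quickly the interval narrows, the sketch below tabulates the margin of error for a few sample sizes. It assumes the same observed proportion of 59% at every sample size and uses the normal approximation with a critical value of 1.96; the sample sizes beyond 1,000 are hypothetical and only there to show the trend.

```python
from math import sqrt

Z_95 = 1.96  # approximate critical value for 95% confidence


def margin_of_error(p_hat, n, z=Z_95):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * sqrt(p_hat * (1 - p_hat) / n)


for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: ±{margin_of_error(0.59, n):.2%}")
```

The margin shrinks in proportion to one over the square root of the sample size, so each extra respondent buys a little less precision than the one before.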
Power and effect size
Increasing our sample size can also give us greater power to detect differences. Suppose that, in the example above, we were also interested in whether the proportions of men and women who own a smartphone are different.
We can estimate the sample proportions for men and women separately and then calculate the difference. When we originally sampled 100 people, suppose that these were made up of 50 men and 50 women, of whom 25 and 34 own a smartphone, respectively. The proportions of men and women who own a smartphone in our sample are therefore 25/50 = 50% and 34/50 = 68%, with fewer men than women owning a smartphone. The difference between these two proportions is called the observed effect size. In this case, we observe that the proportion of smartphone owners is 18 percentage points lower for men than for women.
Given such a small sample of the population, is this observed effect significant, or could the proportions be the same for men and women, with the observed effect simply due to chance?
To investigate this, we can use a statistical test; in this case we use what is called the "binomial test of equal proportions" or "two-proportion z-test". We find that there is not enough evidence to establish a difference between men and women, and the result is not considered statistically significant. The probability of observing a gender effect of 18 percentage points or more if there really were no difference between men and women is greater than 5%, which is relatively likely, so the data do not provide any real evidence that the true proportions of men and women who own smartphones are different. This 5% cut-off is commonly used and is called the "significance level" of the test. It is chosen before performing a test and is the probability of committing a type I error, that is, finding a statistically significant result when in reality there is no difference in the population.
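As a rough illustration, the test can be sketched in Python using only the standard library. This assumes the pooled two-proportion z-test without a continuity correction; statistical packages sometimes apply a correction by default and will then report a somewhat larger p-value. The function name `two_proportion_z_test` is ours.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test.

    Returns the z statistic and the two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under the null hypothesis both groups share one proportion
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 25 of 50 men and 34 of 50 women own a smartphone
z, p_value = two_proportion_z_test(25, 50, 34, 50)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```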
What happens if we increase our sample size and add an additional 900 people to our sample?
Let's assume there are now 500 men and 500 women in total, of whom 250 and 340 respectively own a smartphone. Our estimates of the proportions of men and women who own a smartphone are 250/500 = 50% and 340/500 = 68%. The effect size, i.e. the difference between the proportions, is the same as before (50% - 68% = -18 percentage points), but, more importantly, we have more data to support this estimate of the difference. Using the test of equal proportions again, we find that the result is statistically significant at the 5% significance level. By increasing our sample size, we have increased our power to detect the difference between the proportions of men and women who own a smartphone in the UK.
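The sketch below runs the same kind of test on both samples to show why the conclusion changes even though the observed difference does not. It assumes a pooled two-proportion z-test without continuity correction; `z_test` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


z_small, p_small = z_test(25, 50, 34, 50)      # 100 people in total
z_large, p_large = z_test(250, 500, 340, 500)  # 1,000 people in total

print(f"n = 100:  z = {z_small:.2f}, p = {p_small:.2g}")
print(f"n = 1000: z = {z_large:.2f}, p = {p_large:.2g}")
print(f"ratio of z statistics: {z_large / z_small:.3f}")
```

With the proportions held fixed, the standard error shrinks with the square root of the sample size, so a tenfold larger sample multiplies the z statistic by about 3.16 (the square root of 10).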
Figure 2 is a graph showing the observed proportions of men and women along with their respective 95% confidence intervals. We can clearly see that the confidence intervals for our estimates for men and women narrow considerably as the sample size increases. With a sample size of just 100, the confidence intervals overlap, giving little indication that the proportions of men and women are really different. On the other hand, with the larger sample size of 1000, there is a clear gap between the two intervals and strong evidence that the proportions of men and women are indeed different.
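The overlap described here can be checked numerically. This is a sketch assuming normal-approximation intervals with a critical value of 1.96; the helper names are ours. Note that comparing intervals for overlap is a more conservative check than a formal test of the difference, so it is best treated as a visual aid.

```python
from math import sqrt

Z_95 = 1.96  # approximate critical value for 95% confidence


def wald_interval(successes, n, z=Z_95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


for men_owners, women_owners, n_per_group in ((25, 34, 50), (250, 340, 500)):
    men = wald_interval(men_owners, n_per_group)
    women = wald_interval(women_owners, n_per_group)
    print(
        f"n per group = {n_per_group}: "
        f"men {men[0]:.1%} to {men[1]:.1%}, "
        f"women {women[0]:.1%} to {women[1]:.1%}, "
        f"overlap: {intervals_overlap(men, women)}"
    )
```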
The binomial test above essentially looks at how much these pairs of intervals overlap, and if the overlap is small enough, we conclude that there really is a difference. (Note: the data in this blog are for illustrative purposes only; see this article for the results of a real survey on smartphone usage from earlier this year.)
Figure 2
If the effect size is small, you'll need a large sample size to detect the difference; otherwise, the randomness in your samples will mask the effect. Essentially, any difference will fall well within the associated confidence intervals and you won't be able to detect it. The ability to detect a particular effect size is known as statistical power. More formally, statistical power is the probability of finding a statistically significant result given that there really is a difference (or effect) in the population. See our recent blog post "Depression in men 'regularly ignored'" for another example of the effect of sample size on the probability of finding a statistically significant result.
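To make the idea of power concrete, the sketch below approximates the power of a two-sided two-proportion z-test. It assumes, purely for illustration, that the true proportions are exactly 50% and 68% and that both groups are the same size; it uses the usual normal approximation and ignores the negligible chance of a significant result in the wrong direction. The function name `approx_power` is ours.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error assumed by the test (no difference, pooled proportion)
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error of the difference when the assumed effect is real
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return NormalDist().cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


for n in (50, 500):
    print(f"n per group = {n}: power is roughly {approx_power(0.50, 0.68, n):.0%}")
```

Under these assumptions, a study with 50 people per group would miss a real 18-point difference more often than it would find it, while 500 per group would almost always detect it.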
So, larger sample sizes give more reliable results with greater precision and power, but they also cost more time and money. For this reason, you should always perform a sample size calculation before conducting a survey to ensure that your sample is large enough to draw meaningful conclusions, without wasting resources by sampling more than you really need. We have put together some free online statistical calculators to help you carry out your own statistical calculations, including sample size calculations for estimating a proportion and comparing two proportions.
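A sample size calculation for comparing two proportions can be sketched as follows. This assumes the standard normal-approximation formula for two equal-sized groups and a two-sided test; the planning values (50% versus 68%, 80% or 90% power) are illustrative, and the calculators mentioned above may use a different method. The function name is ours.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, power=0.80, alpha=0.05):
    """Approximate sample size per group to detect p1 versus p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    # Round up: a fractional respondent is not possible
    return ceil(numerator / (p1 - p2) ** 2)


n_80 = n_per_group_two_proportions(0.50, 0.68, power=0.80)
n_90 = n_per_group_two_proportions(0.50, 0.68, power=0.90)
print(f"80% power: {n_80} per group; 90% power: {n_90} per group")
```

Asking for more power, a stricter significance level, or the ability to detect a smaller difference all push the required sample size up.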
Glossary
Margin of error – This is the level of precision you require. It is the range within which the value you are trying to measure is estimated to lie and is usually expressed in percentage points (e.g. ±2%). A smaller margin of error requires a larger sample size.
Confidence level – This conveys the amount of uncertainty associated with an estimate. It is the probability that the confidence interval (margin of error around the estimate) contains the true value you are trying to estimate. A higher confidence level requires a larger sample size.
Power – This is the probability of finding statistically significant evidence of a difference between groups, given that there is a difference in the population. Greater power requires a larger sample size.
Effect size – This is the estimated difference between the groups that we observe in our sample. To detect a difference with a given power, a smaller effect size requires a larger sample size.
FAQs
What is the importance of sample size in statistics?
Sample size refers to the number of participants or observations included in a study. This number is usually represented by n. The size of a sample influences two statistical properties: 1) the precision of our estimates and 2) the power of the study to draw conclusions.
What is the importance of determining the sample size and sampling method in a research study?
Determining the appropriate sample size is one of the most important factors in statistical analysis. If the sample size is too small, it may not yield valid results or adequately represent the realities of the population being studied.
What is an impact of sample size when conducting sample preparation?
The size of our sample dictates the amount of information we have and therefore, in part, determines our precision or level of confidence in our sample estimates.
What are the two important factors needed to be considered in determining the sample size?
In general, three or four factors must be known or estimated to calculate sample size: (1) the effect size (usually the difference between two groups); (2) the population standard deviation (for continuous data); (3) the desired power of the experiment to detect the postulated effect; and (4) the significance level.
What are the advantages of sample size?
The larger the study sample size, the smaller the margin of error. Larger sample sizes allow researchers to control the risk of reporting false-negative or false-positive findings. The greater the number of samples, the greater the precision of the results.
What are the factors that affect sample size in statistics?
The factors affecting sample size are study design, method of sampling, and outcome measures: effect size, standard deviation, study power, and significance level.
What is the most important consideration in determining sample size?
- Know how variable the population is that you want to measure. ...
- Know how precise the population statistics need to be. ...
- Know exactly how confident you must be in the results.
The primary goal of sampling is to create a representative sample, one in which the smaller group (sample) accurately represents the characteristics of the larger group (population). If the sample is well selected, the sample will be generalizable to the population. There are many ways to obtain a sample.
What are the benefits of large sample size in research?
The advantages of a large sample size when interpreting significant results are that it allows a more precise estimate of the treatment effect, and that it is usually easier to assess the representativeness of the sample and to generalize the results.
How does sample size impact a statistical test?
The sample size, or the number of participants in your study, has an enormous influence on whether or not your results are significant. The larger the actual difference between the groups (e.g. in student test scores), the smaller the sample we'll need to find a significant difference (i.e. p ≤ 0.05).
How does sample size impact effect size?
Small studies tend to report larger effect sizes than large studies, and effect sizes in small studies are more variable than in large studies. One study found that the variability of effect sizes diminished with increasing sample size.
How does sample size affect statistical significance?
Sample size is important because larger samples offer more precise estimates of the true population value.
What is the most important factor in selecting a sample?
A good sample should be a representative subset of the population we are interested in studying, with each member of the population having an equal chance of being randomly selected into the study.
What are the strategies for determining sample size?
Step 1: Set precision and confidence levels. Step 2: Calculate the effective sample size. Step 3: Determine design effect and adjust sample size. Step 4: Estimate eligibility and completion rates; adjust sample size.
What are the 4 ways to determine the sample size?
- Step 1 Find out the size of the population. ...
- Step 2 Determine the margin of error. ...
- Step 3 Set confidence level. ...
- Step 4 Use a formula to find sample size.
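The four steps above can be sketched in code. The list does not say which formula to use, so this example assumes one common choice for estimating a proportion: Cochran's formula with a finite population correction, taking p = 0.5 as the most conservative guess. The function name and the example inputs (a population of 5,000, a ±5% margin, 95% confidence) are illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(population, margin, confidence=0.95, p=0.5):
    """Sample size needed to estimate a proportion to within ±margin.

    Uses Cochran's formula with a finite population correction.
    p = 0.5 gives the largest (most conservative) sample size.
    """
    # Step 3: the confidence level sets the critical value
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 4: sample size for an effectively infinite population...
    n0 = z**2 * p * (1 - p) / margin**2
    # ...then corrected for the population size found in Step 1
    return ceil(n0 / (1 + (n0 - 1) / population))


n = sample_size_for_proportion(population=5_000, margin=0.05)
print(f"Required sample size: {n}")
```

For very large populations the correction has almost no effect, which is why national surveys can use samples that are a tiny fraction of the population.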
In statistics, the sample size is the number of individual observations used in a study or experiment. For example, if we survey 50 people in a city about their TV viewing, then the sample size is 50.
What is the impact of increasing the sample size on sampling error?
In general, larger sample sizes decrease the sampling error; however, this decrease is not directly proportional. As a rough rule of thumb, you need to increase the sample size fourfold to halve the sampling error.
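The fourfold rule follows from the sampling error being proportional to one over the square root of the sample size. A quick check, assuming the standard error of a sample proportion and an arbitrary illustrative proportion of 59%:

```python
from math import sqrt


def standard_error(p, n):
    """Standard error of a sample proportion."""
    return sqrt(p * (1 - p) / n)


se_n = standard_error(0.59, 250)
se_4n = standard_error(0.59, 4 * 250)
print(f"Error ratio after quadrupling the sample: {se_n / se_4n:.2f}")
```

More generally, cutting the error by a factor of k requires k squared times as many observations.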
What are the three most important considerations related to sampling?
Such considerations include an understanding of: the reasons for and objectives of sampling; the relationship between accuracy and precision; and the reliability of estimates with varying sample size.
What is statistical sample size determination?
Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample.
Why is it important to also consider the effect size?
Effect size helps readers understand the magnitude of differences found, whereas statistical significance examines whether the findings are likely to be due to chance. Both are essential for readers to understand the full impact of your work.
Why does having a large sample size give more reliable results?
The first reason a large sample size is beneficial is simple: larger samples more closely approximate the population. Because the primary goal of inferential statistics is to generalize from a sample to a population, it is less of an inference if the sample size is large.
How does large sample size affect statistical significance?
The greater the sample size, the more likely we are to find a statistically significant difference between groups, but that doesn't mean the effect we find is meaningful. With sufficiently large sample sizes, we can find statistically significant differences between almost any two groups.
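A small numerical example shows how this can happen. The figures are made up for illustration: two groups of 500,000 people whose proportions differ by only 0.3 percentage points, compared with a pooled two-proportion z-test (no continuity correction).

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: 50.0% versus 50.3% in two very large groups
n = 500_000
difference = 251_500 / n - 250_000 / n
p_value = two_sided_p_value(250_000, n, 251_500, n)
print(f"difference = {difference:.1%}, p-value = {p_value:.4f}")
```

The difference is statistically significant at the 5% level, yet whether a 0.3-point gap matters in practice is a separate question that only the effect size and the context can answer.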
What is a good sample size?
A good maximum sample size is usually around 10% of the population, as long as this does not exceed 1000. For example, in a population of 5000, 10% would be 500. In a population of 200,000, 10% would be 20,000, which exceeds 1000, so the maximum in that case would be 1000.
How does changing the sample size impact power?
As the sample size increases, so does the power of the significance test. This is because a larger sample size narrows the distribution of the test statistic: the standard error is reduced, which shrinks the acceptance region and in turn increases power.
How would you best describe the relationship between effect size and statistical significance?
Effect size is not the same as statistical significance: significance indicates whether a result could plausibly be due to chance, while effect size tells you how large, and so how important, the result is.
Why is a sample size of 30 important?
A sample size of around 30 is a common rule of thumb: it is often taken to be large enough for the sampling distribution of the mean to be approximately normal, so that the usual confidence intervals and tests can be applied. The higher your sample size, the more likely the sample will be representative of your population.
What are three reasons why samples are used in statistics?
Samples are used to make inferences about populations. Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.
What is the impact of small sample size?
A study with a sample size that is too small may produce inconclusive results and could also be considered unethical, because exposing human subjects or lab animals to the possible risks associated with research is only justifiable if there is a realistic chance that the study will yield useful information.
What are the advantages of small sample size?
Small samples have a tremendous advantage in that highly sophisticated and accurate measurements can be made with all the precautions in place. Measurement errors and biases can be more easily identified and controlled in a small sample.
Does the sample size matter?
A larger sample size should hypothetically lead to more accurate or representative results, but when it comes to surveying large populations, bigger isn't always better. In fact, trying to collect results from a larger sample size can add costs without significantly improving your results.